Re: [HACKERS] Proposal to add --single-row to psql

2013-05-13 Thread Peter Geoghegan
On Sun, May 12, 2013 at 1:18 PM, Jim Nasby j...@nasby.net wrote:
 FWIW, I've wanted the ability to plug into the parser not for an extension,
 but so that I could programmatically enforce certain coding conventions.

Depending on the exact requirements, that probably wouldn't be too
difficult. It'd likely entail adding a hook for plugins to inspect raw
parse trees.

The pg_stat_statements core infrastructure changes for 9.2 added a
hook that allowed pg_stat_statements to inspect the post-parse
analysis (query) tree. That wouldn't be suitable for what you want,
though, because that tree loses many of the details that you are
likely to care about but are inconsequential to the semantics of the
query. For example, you couldn't tell from the post-parse analysis
tree if the query text was originally "select all, the, columns from
table" or "select * from table".
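
To illustrate the plugin-hook pattern, here is a minimal sketch of how an
extension attaches to the existing post-parse-analysis hook (the one
pg_stat_statements uses). A raw-parse-tree hook, if we added one, would
presumably be registered the same way; this is only an illustration, not
code from core:

/*
 * Minimal sketch of a module using the post-parse-analysis hook added in 9.2.
 * A raw-parse-tree hook for enforcing coding conventions would be new code.
 */
#include "postgres.h"
#include "fmgr.h"
#include "parser/analyze.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
convention_check_hook(ParseState *pstate, Query *query)
{
	/*
	 * Inspect the (post-parse-analysis) query tree here; ereport(ERROR, ...)
	 * could be used to reject statements that violate a coding convention.
	 */

	/* chain to any previously installed hook */
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query);
}

void
_PG_init(void)
{
	prev_post_parse_analyze_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = convention_check_hook;
}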


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-13 Thread Kyotaro HORIGUCHI
Hello,

  auth_failed() in src/backend/libpq/auth.c intentionally logs nothing for
  STATUS_EOF status (ie, client closed the connection without responding).
  But it looks like the PAM code path doesn't have a way to return that
  status code, even when pam_passwd_conv_proc() knows that that's what
  happened, and intentionally printed no log message itself (around line
  1870 in HEAD).  If there's another response code we could return through
  the PAM layer, this could be fixed, and I think it should be.
 
 So if I get this correctly, does this mean the only thing that needs
 to be fixed is unnecessary logging or is there a problem with
 authentication exchange itself in case of PAM? Also, when you say PAM
 layer, is that pam_passwd_conv_proc() that needs to be able to return
 an alternative status code?

Following is the point server requests psql to send password when
PAM is enabled.

backend/libpq/auth.c:1861
   if (strlen(passwd) == 0)
   {
 /*
  * Password wasn't passed to PAM the first time around -
  * let's go ask the client to send a password, which we
  * then stuff into PAM.
  */
 sendAuthRequest(pam_port_cludge, AUTH_REQ_PASSWORD);
 passwd = recv_password_packet(pam_port_cludge);
 if (passwd == NULL)
 {
   /*
* Client didn't want to send password.  We
* intentionally do not log anything about this.
*/
   goto fail;
...
return PAM_CONV_ERR;


This code seems to expect psql to send the password without
closing the current connection. On the other hand, psql does as
follows.

bin/psql/startup.c: 227
pset.db = PQconnectdbParams(keywords, values, true);
free(keywords);
free(values);

if (PQstatus(pset.db) == CONNECTION_BAD &&
  PQconnectionNeedsPassword(pset.db) &&
  password == NULL &&
  pset.getPassword != TRI_NO)
{
  PQfinish(pset.db);
  password = simple_prompt(password_prompt, 100, false);
  new_pass = true;
}

psql immediately disconnects the current connection and reconnects
with this new password, so PAM_CONV_ERR is observed in the server.

It seems to be a kind of protocol mismatch. The client shouldn't
disconnect on a password request, or the server should accommodate what
psql does. Is this wrong?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion failure

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 06:07, Amit Kapila wrote:

On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:

Heikki said in the first message in this thread that he suspected
the cause of the failure he had seen to be a wrong TLI on which the
checkpointer runs. Nevertheless, the patch you suggested for me
looks like it fixes it. Moreover, (one of?) the failures from the same
cause looks fixed with the patch.


There were 2 problems:
1. There was some issue in walsender logic due to which after promotion in
some cases it hits assertion or error
2. During fast promotion, checkpoint gets created with wrong TLI

He has provided 2 different patches
fix-standby-promotion-assert-fail-2.patch and
fast-promotion-quick-fix.patch.
Among 2, he has already committed fix-standby-promotion-assert-fail-2.patch
(http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f49
75c99e52984f7ee81b47d137b5b4751)


That's correct.


Is the point of this discussion that the patch may leave some
glitch in the timing of timeline-related changes, and that Heikki saw a
manifestation of that?


AFAIU, the committed patch has some gap in the overall scenario, which is
the fast promotion issue.


Right, the fast promotion issue is still there.

Just to get us all on the same page again: Does anyone see a problem 
with a fresh git checkout, with the fast-promotion-quick-fix.patch 
applied? 
(http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If 
you do, please speak up. As far as I know, the already-committed patch, 
together with fast-promotion-quick-fix.patch, should fix all known 
issues (*).


I haven't committed a fix for the issue I reported in this thread, 
because I'm not 100% on what the right fix for it would be. 
fast-promotion-quick-fix.patch seems to do the trick, but at least the 
comments need to be updated, and I'm not sure if there are some related 
corner cases that it doesn't handle. Simon?


(*) Well, almost. This one is still pending: 
http://www.postgresql.org/message-id/cab7npqrhucuud012gcb_taafrixx2wion_zfxqcvlurab8d...@mail.gmail.com


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Heikki Linnakangas

On 09.05.2013 18:24, Robert Haas wrote:

On Sat, May 4, 2013 at 3:59 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Given the lack of any good alternative proposals, I think we should
press ahead with this approach.  We still need doc updates and fixes
for the affected contrib module(s), though.  Also, in view of point 2,
it seems like the extensions code should test for and reject an attempt
to set a relocatable extension's schema to pg_catalog.  Otherwise you'd
be likely to get not-too-intelligible errors from the extension script.


I looked into this a bit more.  It seems that adminpack is OK: it
already qualifies all of the objects it creates with the pg_catalog
schema.  With the previous patch applied, all of the built-in
extensions seem to install OK (except for uuid-ossp which I'm not set
up to build, but it doesn't look like a problem).  make check-world
also passes.  (adminpack actually has no regression tests, not even a
test that the extension installs, but it does.)

I looked for a suitable place to update the documentation and I don't
have any brilliant ideas.  The point that we're going to forbid
relocatable extensions from being created in pg_catalog seems too
minor to be worth noting; we don't document every trivial error
message that can occur for every command in the manual.  The point
that we won't create ANY objects in pg_catalog unless the CREATE
statement schema-qualifies the object name seems like it would be
worth pointing out, but I don't see an obvious place to do it.
Suggestions?

In the attached new version of the patch, I added an explicit check to
prevent relocatable extensions from being created in pg_catalog.


Do we really want to forbid that? What's wrong with doing e.g.:

 create extension btree_gist schema pg_catalog;

I would consider btree_gist, along with many other extensions, as 
core-enough parts of the system that you'd want to install them in 
pg_catalog, to make them readily available for everyone. Besides, not 
allowing that would break backwards-compatibility; I bet there are a lot 
of people doing exactly that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast promotion failure

2013-05-13 Thread Amit Kapila
On Monday, May 13, 2013 1:13 PM Heikki Linnakangas wrote:
 On 13.05.2013 06:07, Amit Kapila wrote:
  On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
  Heikki said in the first message in this thread that he suspected
  the cause of the failure he had seen to be a wrong TLI on which the
  checkpointer runs. Nevertheless, the patch you suggested for me
  looks like it fixes it. Moreover, (one of?) the failures from the same
  cause looks fixed with the patch.
 
  There were 2 problems:
  1. There was some issue in walsender logic due to which after
 promotion in
  some cases it hits assertion or error
  2. During fast promotion, checkpoint gets created with wrong TLI
 
  He has provided 2 different patches
  fix-standby-promotion-assert-fail-2.patch and
  fast-promotion-quick-fix.patch.
  Among 2, he has already committed fix-standby-promotion-assert-fail-
 2.patch
 
 (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa
 66f49
  75c99e52984f7ee81b47d137b5b4751)
 
 That's correct.
 
  Is the point of this discussion that the patch may leave some
  glitch in the timing of timeline-related changes, and that Heikki saw a
  manifestation of that?
 
  AFAIU, the committed patch has some gap in the overall scenario, which is
  the fast promotion issue.
 
 Right, the fast promotion issue is still there.
 
 Just to get us all on the same page again: Does anyone see a problem
 with a fresh git checkout, with the fast-promotion-quick-fix.patch
 applied?
 (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If
 you do, please speak up. As far as I know, the already-committed patch,
 together with fast-promotion-quick-fix.patch, should fix all known
 issues (*).
 
 I haven't committed a fix for the issue I reported in this thread,
 because I'm not 100% on what the right fix for it would be.
 fast-promotion-quick-fix.patch seems to do the trick, but at least the
 comments need to be updated, and I'm not sure if there are some related
 corner cases that it doesn't handle. Simon?

The patch provided will unnecessarily call InitXLOGAccess() twice for the
end-of-recovery checkpoint. It doesn't matter w.r.t. performance, but the
purpose of calling LocalSetXLogInsertAllowed() and InitXLOGAccess() will be
almost the same, or am I missing something?

One more thing: I think after fast promotion, it should either set the
timeline or give an error in the CreateCheckPoint() function before it
reaches the check mentioned by you in your initial mail.
if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
elog(ERROR, "can't create a checkpoint during recovery");
Shouldn't it set the timeline in the above check (RecoveryInProgress()), or
when RecoveryInProgress() is called before CreateCheckPoint()?

With Regards,
Amit Kapila.







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-13 Thread Amit Langote
 This code seems to expect psql to send the password without
 closing the current connection. On the other hand, psql does as
 follows.

 bin/psql/startup.c: 227
pset.db = PQconnectdbParams(keywords, values, true);
free(keywords);
free(values);

if (PQstatus(pset.db) == CONNECTION_BAD &&
  PQconnectionNeedsPassword(pset.db) &&
  password == NULL &&
  pset.getPassword != TRI_NO)
{
  PQfinish(pset.db);
  password = simple_prompt(password_prompt, 100, false);
  new_pass = true;
}

 psql immediately disconnects the current connection and reconnects
 with this new password, so PAM_CONV_ERR is observed in the server.

 It seems to be a kind of protocol mismatch. The client shouldn't
 disconnect on a password request, or the server should accommodate what
 psql does. Is this wrong?

In fact, this is the behavior with all the authentication methods that
require a password. But it is only in the case of PAM authentication
that auth_failed() logs an error when the first connection attempt is made
(without a password), since STATUS_EOF is not passed to it in that
case.
If we did not drop the connection (unlike what we do now) and
re-attempted the connection with the password added to conn, would the
backend's authentication state still be waiting for the password? Can
we do away without having to create a second connection?
--
Amit Langote


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] lock support for aarch64

2013-05-13 Thread Mark Salter
I used the following patch to add lock support for aarch64. It is just a
copy of the ARM support based on gcc builtins. PostgreSQL built with
this patch passes the various tests.

diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index d4a783f..624a73b 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -335,6 +335,25 @@ tas(volatile slock_t *lock)
 #endif  /* __arm__ */
 
 
+/*
+ * Use gcc builtins for AArch64.
+ */
+#if defined(__aarch64__)
+#define HAS_TEST_AND_SET
+
+#define TAS(lock) tas(lock)
+
+typedef int slock_t;
+
+static __inline__ int
+tas(volatile slock_t *lock)
+{
+   return __sync_lock_test_and_set(lock, 1);
+}
+
+#define S_UNLOCK(lock) __sync_lock_release(lock)
+#endif /* __aarch64__ */
+
 /* S/390 and S/390x Linux (32- and 64-bit zSeries) */
 #if defined(__s390__) || defined(__s390x__)
 #define HAS_TEST_AND_SET




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread k...@rice.edu
On Sun, May 12, 2013 at 03:46:00PM -0500, Jim Nasby wrote:
 On 5/10/13 1:06 PM, Jeff Janes wrote:
 Of course the paranoid DBA could turn off restart_after_crash and do a 
 manual investigation on every crash, but in that case the database would 
 refuse to restart even in the case where it perfectly clear that all the 
 following WAL belongs to the recycled file and not the current file.
 
 Perhaps we should also allow for zeroing out WAL files before reuse (or just 
 disable reuse). I know there's a performance hit there, but the reuse idea 
 happened before we had bgWriter. Theoretically the overhead creating a new 
 file would always fall to bgWriter and therefore not be a big deal.
 -- 
 Jim C. Nasby, Data Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net
 

Unless something has changed dramatically, creating new files involves a LOT
more overhead than reusing existing files. My two cents.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread k...@rice.edu
On Sun, May 12, 2013 at 07:41:26PM -0500, Jon Nelson wrote:
 On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote:
  On 5/10/13 1:06 PM, Jeff Janes wrote:
 
  Of course the paranoid DBA could turn off restart_after_crash and do a
  manual investigation on every crash, but in that case the database would
  refuse to restart even in the case where it perfectly clear that all the
  following WAL belongs to the recycled file and not the current file.
 
 
  Perhaps we should also allow for zeroing out WAL files before reuse (or just
  disable reuse). I know there's a performance hit there, but the reuse idea
  happened before we had bgWriter. Theoretically the overhead creating a new
  file would always fall to bgWriter and therefore not be a big deal.
 
 For filesystems like btrfs, re-using a WAL file is suboptimal to
 simply creating a new one and removing the old one when it's no longer
 required. Using fallocate (or posix_fallocate) (I have a patch for
 that!) to create a new one is - by my tests - 28 times faster than the
 currently-used method.
 
 
 --
 Jon
 

What about for less cutting (bleeding) edge filesystems?

Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Better LWLocks with compare-and-swap (9.4)

2013-05-13 Thread Heikki Linnakangas
I've been working on-and-off on the WAL-insert scaling patch. It's in 
pretty good shape now, and I'll post it shortly, but one thing I noticed 
is that it benefits a lot from using an atomic compare-and-swap 
instruction for the contention-critical part.


I realized that we could also use compare-and-swap to make LWLocks scale 
better. The LWLock struct is too large to compare-and-swap atomically, 
but we can still use CAS to increment/decrement the shared/exclusive 
counters, when there's no need to manipulate the wait queue. That would 
help with workloads where you have a lot of CPUs, and a lot of backends 
need to acquire the same lwlock in shared mode, but there's no real 
contention (ie. few exclusive lockers).
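
To make the idea concrete, a rough sketch of the lock-free fast path for a
shared acquisition could look like the following (field names follow the
attached work-in-progress patch; this is an illustration only -- the real
code still needs the queued slow path and a retry/delay policy):

/*
 * Illustrative only: try to take the lock in shared mode with a single
 * compare-and-swap, assuming the whole LWLock state fits in one 64-bit
 * word (the LWLock_cas_t union from the WIP patch).  On failure the
 * caller must retry or fall back to the spinlock-protected slow path.
 */
static bool
LWLockCondAcquireSharedCAS(volatile LWLock_cas_t *casval)
{
	LWLock_cas_t oldval;
	LWLock_cas_t newval;

	oldval.val = casval->val;
	if (oldval.f.mutex || oldval.f.exclusive != 0 || oldval.f.waiters)
		return false;			/* take the slow path instead */

	newval = oldval;
	newval.f.shared++;

	/* succeeds only if nobody changed the word since we read it */
	return __sync_val_compare_and_swap(&casval->val, oldval.val, newval.val)
		   == oldval.val;
}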


pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, 
when I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux 
machine:


-  64.09%  postgres  postgres   [.] tas
   - tas
  - 99.83% s_lock
 - 53.22% LWLockAcquire
+ 99.87% GetSnapshotData
 - 46.78% LWLockRelease
  GetSnapshotData
+ GetTransactionSnapshot
+   2.97%  postgres  postgres   [.] tas
+   1.53%  postgres  libc-2.13.so   [.] 0x119873
+   1.44%  postgres  postgres   [.] GetSnapshotData
+   1.29%  postgres  [kernel.kallsyms]  [k] arch_local_irq_enable
+   1.18%  postgres  postgres   [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of 
ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a 
surprisingly steep drop in performance once you go beyond 29 clients 
(attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My 
theory is that after that point all the cores are busy, and processes 
start to be sometimes context switched while holding the spinlock, which 
kills performance. Has anyone else seen that pattern? Curiously, I don't 
see that when connecting pgbench via TCP over localhost, only when 
connecting via unix domain sockets. Overall performance is higher over 
unix domain sockets, so I guess the TCP layer adds some overhead, 
hurting performance, and also affects scheduling somehow, making the 
steep drop go away.


Using a compare-and-swap instruction in place of spinning solves the 
problem (green line in attached graph). This needs some more testing 
with different workloads to make sure it doesn't hurt other scenarios, 
but I don't think it will. I'm also waiting for a colleague to test this 
on a different machine, where he saw a similar steep drop in performance.


The attached patch is still work-in-progress. There needs to be a 
configure test and fallback to spinlock if a CAS instruction is not 
available. I used the gcc __sync_val_compare_and_swap() builtin 
directly, that needs to be abstracted. Also, in the case that the wait 
queue needs to be manipulated, the code spins on the CAS instruction, 
but there is no delay mechanism like there is on a regular spinlock; 
that needs to be added in somehow.


- Heikki
attachment: pgbench-lwlock-cas-local-clients-sets.png

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 4f88d3f..0d92a30 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -36,13 +36,27 @@
 /* We use the ShmemLock spinlock to protect LWLockAssign */
 extern slock_t *ShmemLock;
 
+/*
+ * The part of LWLock struct that needs to be swappable with an atomic
+ * compare-and-swap instruction. Assuming we have an atomic 64-bit CAS
+ * instruction, sizeof(cas_t) must be <= 8.
+ */
+typedef union
+{
+	struct
+	{
+		bool	mutex;			/* Protects LWLock and queue of PGPROCs */
+		bool	releaseOK;		/* T if ok to release waiters */
+		bool	waiters;		/* T if wait queue is not empty (head != NULL) */
+		char	exclusive;		/* # of exclusive holders (0 or 1) */
+		uint32	shared;			/* # of shared holders (0..MaxBackends) */
+	} f;
+	uint64 val;
+} LWLock_cas_t;
 
 typedef struct LWLock
 {
-	slock_t		mutex;			/* Protects LWLock and queue of PGPROCs */
-	bool		releaseOK;		/* T if ok to release waiters */
-	char		exclusive;		/* # of exclusive holders (0 or 1) */
-	int			shared;			/* # of shared holders (0..MaxBackends) */
+	LWLock_cas_t		casval;
 	PGPROC	   *head;			/* head of list of waiting PGPROCs */
 	PGPROC	   *tail;			/* tail of list of waiting PGPROCs */
 	/* tail is undefined when head is NULL */
@@ -267,6 +281,9 @@ CreateLWLocks(void)
 	char	   *ptr;
 	int			id;
 
+	StaticAssertStmt(sizeof(LWLock_cas_t) == sizeof(uint64),
+	 "unexpected size of LWLock_cas_t");
+
 	/* Allocate space */
 	ptr = (char *) ShmemAlloc(spaceLocks);
 
@@ -283,10 +300,10 @@ CreateLWLocks(void)
 	 */
	for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
 	{
-		SpinLockInit(&lock->lock.mutex);
-		lock->lock.releaseOK = true;
-		lock->lock.exclusive = 0;
-		lock->lock.shared = 0;
+		lock->lock.casval.val = 0; /* clear possible padding */
+		lock->lock.casval.f.releaseOK = true;
+		

Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 15:39, Mark Salter wrote:

I used the following patch to add lock support aarch64. It is just a
copy of the arm support based on gcc builtins. Postgresql built with
this patch passes the various tests.


I think this needs an #ifdef HAVE_GCC_INT_ATOMICS, like the ARM codepath.

If we are to support aarch64, it would be good to have an aarch64 
machine on the buildfarm. Could you arrange that? See 
http://buildfarm.postgresql.org/


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Jon Nelson
On Mon, May 13, 2013 at 7:49 AM, k...@rice.edu k...@rice.edu wrote:
 On Sun, May 12, 2013 at 07:41:26PM -0500, Jon Nelson wrote:
 On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote:
  On 5/10/13 1:06 PM, Jeff Janes wrote:
 
  Of course the paranoid DBA could turn off restart_after_crash and do a
  manual investigation on every crash, but in that case the database would
  refuse to restart even in the case where it perfectly clear that all the
  following WAL belongs to the recycled file and not the current file.
 
 
  Perhaps we should also allow for zeroing out WAL files before reuse (or 
  just
  disable reuse). I know there's a performance hit there, but the reuse idea
  happened before we had bgWriter. Theoretically the overhead creating a new
  file would always fall to bgWriter and therefore not be a big deal.

 For filesystems like btrfs, re-using a WAL file is suboptimal to
 simply creating a new one and removing the old one when it's no longer
 required. Using fallocate (or posix_fallocate) (I have a patch for
 that!) to create a new one is - by my tests - 28 times faster than the
 currently-used method.


 What about for less cutting (bleeding) edge filesystems?

The patch would be applicable for any filesystem which implements the
fallocate/posix_fallocate calls in an efficient fashion. xfs and ext4
would both work, if I recall properly. I'm certain there are others,
and the technique would probably work on other operating systems like
the *BSDs, etc.. Additionally, it's improbable that there would be a
performance hit for other filesystems versus simply writing zeroes,
since that's the approach that is taken anyway as a fallback.  Another
win is reduction in fragmentation. I would love to be able to turn off
WAL recycling to perform more useful testing.

--
Jon


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Andres Freund
On 2013-05-12 19:41:26 -0500, Jon Nelson wrote:
 On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote:
  On 5/10/13 1:06 PM, Jeff Janes wrote:
 
  Of course the paranoid DBA could turn off restart_after_crash and do a
  manual investigation on every crash, but in that case the database would
  refuse to restart even in the case where it perfectly clear that all the
  following WAL belongs to the recycled file and not the current file.
 
 
  Perhaps we should also allow for zeroing out WAL files before reuse (or just
  disable reuse). I know there's a performance hit there, but the reuse idea
  happened before we had bgWriter. Theoretically the overhead creating a new
  file would always fall to bgWriter and therefore not be a big deal.
 
 For filesystems like btrfs, re-using a WAL file is suboptimal to
 simply creating a new one and removing the old one when it's no longer
 required. Using fallocate (or posix_fallocate) (I have a patch for
 that!) to create a new one is - by my tests - 28 times faster than the
 currently-used method.

I don't think the comparison between just fallocate()ing and what we
currently do is fair. fallocate() doesn't guarantee that the file is the
same size after a crash, so you would still need an fsync() or we
couldn't use fdatasync() anymore. And I'd guess the benefits aren't all
that big anymore in that case?
That said, using posix_fallocate seems like a good idea in lots of
places inside pg, it's just not all that easy to do in some of the
places.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_upgrade fails, mismatch of relation OID - 9.1.9 to 9.2.4

2013-05-13 Thread Steve Singer

On 05/11/2013 04:58 PM, Bruce Momjian wrote:

On Fri, May 10, 2013 at 08:03:38PM -0400, Bruce Momjian wrote:

OK, this verifies that the table had a lot of DDL churn.  I have no idea
how to pursue this further because I am unsure how we are going to
replicate the operations performed on this table in the past, as you
mentioned much of this was before your time on the job.

Evan, I suggest you force a toast table on the table by doing:

ALTER TABLE bpm.setupinfo ADD COLUMN dummy TEXT;

Then drop the column.  That will create a toast table and will allow
pg_upgrade to succeed.

FYI, I did test adding a TEXT column and altering a column to TEXT on
Postgres 9.1, and both created a toast table.  I still have no clue
about what would have caused the missing toast table.



I once saw a case where a varchar(x) column was changed to something 
larger by manually updating the catalog with an UPDATE statement on 
pg_attribute.atttypmod. Everything was fine until they tried pg_upgrade, 
which failed because the DDL to create the table from pg_dump with the 
larger column creates a table that has a toast table, whereas the original 
table in the 8.3 cluster did not have a toast table.


Steve




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Jon Nelson
On Mon, May 13, 2013 at 8:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-05-12 19:41:26 -0500, Jon Nelson wrote:
 On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote:
  On 5/10/13 1:06 PM, Jeff Janes wrote:
 
  Of course the paranoid DBA could turn off restart_after_crash and do a
  manual investigation on every crash, but in that case the database would
  refuse to restart even in the case where it perfectly clear that all the
  following WAL belongs to the recycled file and not the current file.
 
 
  Perhaps we should also allow for zeroing out WAL files before reuse (or 
  just
  disable reuse). I know there's a performance hit there, but the reuse idea
  happened before we had bgWriter. Theoretically the overhead creating a new
  file would always fall to bgWriter and therefore not be a big deal.

 For filesystems like btrfs, re-using a WAL file is suboptimal to
 simply creating a new one and removing the old one when it's no longer
 required. Using fallocate (or posix_fallocate) (I have a patch for
 that!) to create a new one is - by my tests - 28 times faster than the
 currently-used method.

 I don't think the comparison between just fallocate()ing and what we
 currently do is fair. fallocate() doesn't guarantee that the file is the
 same size after a crash, so you would still need an fsync() or we
 couldn't use fdatasync() anymore. And I'd guess the benefits aren't all
 that big anymore in that case?

fallocate (16MB) + fsync is still almost certainly faster than
write+write+write... + fsync.
The test I performed at the time did exactly that .. posix_fallocate + pg_fsync.
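
For anyone who wants to reproduce that comparison outside the backend, a
minimal standalone sketch of the allocate-then-fsync approach could look like
this (file name and size are arbitrary; the real patch would presumably do the
equivalent in the WAL-segment creation path in xlog.c):

/*
 * Standalone sketch: pre-allocate a 16MB (WAL-segment-sized) file with
 * posix_fallocate() and fsync it, instead of writing 16MB of zeroes.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define SEGMENT_SIZE (16 * 1024 * 1024)

int
main(void)
{
	int		fd;
	int		rc;

	fd = open("testsegment", O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0)
	{
		perror("open");
		return 1;
	}

	/* posix_fallocate returns an errno value directly, not -1 plus errno */
	rc = posix_fallocate(fd, 0, SEGMENT_SIZE);
	if (rc != 0)
	{
		fprintf(stderr, "posix_fallocate failed: %d\n", rc);
		return 1;
	}

	if (fsync(fd) != 0)
	{
		perror("fsync");
		return 1;
	}

	close(fd);
	return 0;
}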

 That said, using posix_fallocate seems like a good idea in lots of
 places inside pg, its just not all that easy to do in some of the
 places.

I should not derail this thread any further. Perhaps, if interested
parties would like to discuss the use of fallocate/posix_fallocate, a
new thread might be more appropriate?


--
Jon


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Mark Salter
On Mon, 2013-05-13 at 16:15 +0300, Heikki Linnakangas wrote:
 On 13.05.2013 15:39, Mark Salter wrote:
  I used the following patch to add lock support aarch64. It is just a
  copy of the arm support based on gcc builtins. Postgresql built with
  this patch passes the various tests.
 
 I think this needs an #ifdef HAVE_GCC_INT_ATOMICS, like the ARM codepath.

Yes. Would it be better to do something like:

+/*
+ * Use gcc builtins if available.
+ */
+#if !defined(HAS_TEST_AND_SET) && defined(HAVE_GCC_INT_ATOMICS)
+#define HAS_TEST_AND_SET
+
+#define TAS(lock) tas(lock)
+
+typedef int slock_t;
+
+static __inline__ int
+tas(volatile slock_t *lock)
+{
+   return __sync_lock_test_and_set(lock, 1);
+}
+
+#define S_UNLOCK(lock) __sync_lock_release(lock)
+#endif
+
 /* Blow up if we didn't have any way to do spinlocks */
 #ifndef HAS_TEST_AND_SET
 #error PostgreSQL does not have native spinlock support on this platform.  To 
continue the compilation, rerun configure using --disable-spinlocks.  However, 
performance will be poor.  Please report this to pgsql-b...@postgresql.org.


 
 If we are to support aarch64, it would be good to have an aarch64 
 machine on the buildfarm. Could you arrange that? See 
 http://buildfarm.postgresql.org/
 

Right now, all we have is a simulator. It takes about 24hrs to build and
test the Fedora RPM. Hardware will start to be available soon. But yes,
I think we could arrange to add to the buildfarm.

--Mark




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Andres Freund
On 2013-05-13 08:45:41 -0500, Jon Nelson wrote:
 On Mon, May 13, 2013 at 8:32 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-05-12 19:41:26 -0500, Jon Nelson wrote:
  On Sun, May 12, 2013 at 3:46 PM, Jim Nasby j...@nasby.net wrote:
   On 5/10/13 1:06 PM, Jeff Janes wrote:
  
   Of course the paranoid DBA could turn off restart_after_crash and do a
   manual investigation on every crash, but in that case the database would
   refuse to restart even in the case where it perfectly clear that all the
   following WAL belongs to the recycled file and not the current file.
  
  
   Perhaps we should also allow for zeroing out WAL files before reuse (or 
   just
   disable reuse). I know there's a performance hit there, but the reuse 
   idea
   happened before we had bgWriter. Theoretically the overhead creating a 
   new
   file would always fall to bgWriter and therefore not be a big deal.
 
  For filesystems like btrfs, re-using a WAL file is suboptimal to
  simply creating a new one and removing the old one when it's no longer
  required. Using fallocate (or posix_fallocate) (I have a patch for
  that!) to create a new one is - by my tests - 28 times faster than the
  currently-used method.
 
  I don't think the comparison between just fallocate()ing and what we
  currently do is fair. fallocate() doesn't guarantee that the file is the
  same size after a crash, so you would still need an fsync() or we
  couldn't use fdatasync() anymore. And I'd guess the benefits aren't all
  that big anymore in that case?
 
 fallocate (16MB) + fsync is still almost certainly faster than
 write+write+write... + fsync.
 The test I performed at the time did exactly that .. posix_fallocate + 
 pg_fsync.
Sure, the initial file creation will be faster. But are the actual
individual wal writes (small, frequently fdatasync()ed) still faster?
That's the critical path currently.
Whether it is pretty much depends on how the filesystem manages
allocated but not initialized blocks...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_upgrade fails, mismatch of relation OID - 9.1.9 to 9.2.4

2013-05-13 Thread Bruce Momjian
On Sun, May 12, 2013 at 10:43:41PM -0400, Evan D. Hoffman wrote:
 Adding & dropping a column resolved the problem.  Currently vacuuming the new
 cluster.  Thanks for your help everybody!

Great!  Someday we will figure out what caused it.

---


 
 
 On Sat, May 11, 2013 at 4:58 PM, Bruce Momjian br...@momjian.us wrote:
 
 On Fri, May 10, 2013 at 08:03:38PM -0400, Bruce Momjian wrote:
  On Fri, May 10, 2013 at 12:36:21PM -0400, Evan D. Hoffman wrote:
   pg.dropped.16 INTEGER /* dummy */,
   pg.dropped.17 INTEGER /* dummy */,
   pg.dropped.18 INTEGER /* dummy */,
   pg.dropped.19 INTEGER /* dummy */,
   pg.dropped.20 INTEGER /* dummy */,
   pg.dropped.21 INTEGER /* dummy */,
   pg.dropped.22 INTEGER /* dummy */,
   pg.dropped.23 INTEGER /* dummy */,
   pg.dropped.24 INTEGER /* dummy */,
   pg.dropped.25 INTEGER /* dummy */,
   pg.dropped.26 INTEGER /* dummy */,
   ha27 character varying(10) DEFAULT 'UNKNOWN'::character varying,
   pg.dropped.28 INTEGER /* dummy */,
   dr29 character varying(10)
 
  OK, this verifies that the table had a lot of DDL churn.  I have no idea
  how to pursue this further because I am unsure how we are going to
  replicate the operations performed on this table in the past, as you
  mentioned much of this was before your time on the job.
 
  Evan, I suggest you force a toast table on the table by doing:
 
ALTER TABLE bpm.setupinfo ADD COLUMN dummy TEXT;
 
  Then drop the column.  That will create a toast table and will allow
  pg_upgrade to succeed.
 
  FYI, I did test adding a TEXT column and altering a column to TEXT on
  Postgres 9.1, and both created a toast table.  I still have no clue
  about what would have caused the missing toast table.
 
 --
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +
 
 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_upgrade fails, mismatch of relation OID - 9.1.9 to 9.2.4

2013-05-13 Thread Bruce Momjian
On Mon, May 13, 2013 at 09:36:51AM -0400, Steve Singer wrote:
 On 05/11/2013 04:58 PM, Bruce Momjian wrote:
 On Fri, May 10, 2013 at 08:03:38PM -0400, Bruce Momjian wrote:
 OK, this verifies that the table had a lot of DDL churn.  I have no idea
 how to pursue this further because I am unsure how we are going to
 replicate the operations performed on this table in the past, as you
 mentioned much of this was before your time on the job.
 
 Evan, I suggest you force a toast table on the table by doing:
 
 ALTER TABLE bpm.setupinfo ADD COLUMN dummy TEXT;
 
 Then drop the column.  That will create a toast table and will allow
 pg_upgrade to succeed.
 FYI, I did test adding a TEXT column and altering a column to TEXT on
 Postgres 9.1, and both created a toast table.  I still have no clue
 about what would have caused the missing toast table.
 
 
 I once saw a case where a varchar(x) column was changed to something
 larger by manually updating the catalog with an UPDATE statement on
 pg_attribute.atttypmod. Everything was fine until they tried
 pg_upgrade which failed because the DDL to create the table from
 pg_dump with the larger column creates a table that had a toast
 table but the original table in the 8.3 cluster did not have a toast
 table.

That is a good point. We used to tell users they could manually update
pg_attribute to increase the length of a column --- that obviously will
not work anymore with pg_upgrade, and now that we have an ALTER TABLE
that can handle it without a table rewrite since PG 9.1, there is no
reason for users to need to adjust pg_attribute.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Tom Lane
Fabien COELHO coe...@cri.ensmp.fr writes:
 Would you be okay if there is one/a few effective create/drop (some tests 
 check that the create or drop fails e.g. depending on permissions, which 
 ISTM is not tested anywhere else), so that tests for various ALTER 
 DATABASE commands are combined together onto these databases?

TBH, I do not see that such tests are worth adding, if they are going to
significantly slow down the core regression tests.  Those tests are run
probably hundreds of times a day, in aggregate across all Postgres
developers.  Adding ten seconds or whatever this would add is a major
cost, while the benefit appears trivial.

We could consider adding expensive low-value tests like these to some
alternate regression target that's only exercised by buildfarm members,
perhaps.  But I think there's probably a point of diminishing returns
even in that context.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)

2013-05-13 Thread Merlin Moncure
On Mon, May 13, 2013 at 7:50 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 The attached patch is still work-in-progress. There needs to be a configure
 test and fallback to spinlock if a CAS instruction is not available. I used
 the gcc __sync_val_compare_and_swap() builtin directly, that needs to be
 abstracted. Also, in the case that the wait queue needs to be manipulated,
 the code spins on the CAS instruction, but there is no delay mechanism like
 there is on a regular spinlock; that needs to be added in somehow.

These are really interesting results. Why is the CAS method so much
faster than TAS?  Did you see any contention?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Merlin Moncure
On Mon, May 13, 2013 at 8:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 13.05.2013 15:39, Mark Salter wrote:

 I used the following patch to add lock support aarch64. It is just a
 copy of the arm support based on gcc builtins. Postgresql built with
 this patch passes the various tests.


 I think this needs an #ifdef HAVE_GCC_INT_ATOMICS, like the ARM codepath.

I'm starting to wonder why we don't always use gcc atomics if they are
available and an assembly implementation is not (and maybe, even if it
is).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize

2013-05-13 Thread Noah Misch
A memory chunk allocated through the existing palloc.h interfaces is limited
to MaxAllocSize (~1 GiB).  This is best for most callers; SET_VARSIZE() need
not check its own 1 GiB limit, and algorithms that grow a buffer by doubling
need not check for overflow.  However, a handful of callers are quite happy to
navigate those hazards in exchange for the ability to allocate a larger chunk.

This patch introduces MemoryContextAllocHuge() and repalloc_huge() that check
a higher MaxAllocHugeSize limit of SIZE_MAX/2.  Chunks don't bother recording
whether they were allocated as huge; one can start with palloc() and then
repalloc_huge() to grow the value.  To demonstrate, I put this to use in
tuplesort.c; the patch also updates tuplestore.c to keep them similar.  Here's
the trace_sort from building the pgbench_accounts primary key at scale factor
7500, maintenance_work_mem = '56GB'; memtuples itself consumed 17.2 GiB:

LOG:  internal sort ended, 48603324 KB used: CPU 75.65s/305.46u sec elapsed 
391.21 sec

Compare:

LOG:  external sort ended, 1832846 disk blocks used: CPU 77.45s/988.11u sec 
elapsed 1146.05 sec

This was made easier by tuplesort growth algorithm improvements in commit
8ae35e91807508872cabd3b0e8db35fc78e194ac.  The problem has come up before
(TODO item Allow sorts to use more available memory), and Tom floated the
idea[1] behind the approach I've used.  The next limit faced by sorts is
INT_MAX concurrent tuples in memory, which limits helpful work_mem to about
150 GiB when sorting int4.

I have not added variants like palloc_huge() and palloc0_huge(), and I have
not added to the frontend palloc.h interface.  There's no particular barrier
to doing any of that.  I don't expect more than a dozen or so callers, so most
of the variations might go unused.
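
For illustration, a caller of the proposed interface that grows a buffer by
doubling past the old 1 GiB cap might look like this (hypothetical snippet,
not from the patch; need_more_space() is a made-up placeholder, and with
_huge allocations the caller must now watch for Size overflow itself):

	Size	allocated = 1024 * 1024;
	char   *buf = palloc(allocated);

	while (need_more_space(buf, allocated))
	{
		allocated *= 2;			/* caller is responsible for overflow checks */
		buf = repalloc_huge(buf, allocated);
	}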

The comment at MaxAllocSize said that aset.c expects doubling the size of an
arbitrary allocation to never overflow, but I couldn't find the code in
question.  AllocSetAlloc() does double sizes of blocks used to aggregate small
allocations, so maxBlockSize had better stay under SIZE_MAX/2.  Nonetheless,
that expectation does apply to dozens of repalloc() users outside aset.c, and
I preserved it for repalloc_huge().  64-bit builds will never notice, and I
won't cry for the resulting 2 GiB limit on 32-bit.

Thanks,
nm

[1] http://www.postgresql.org/message-id/19908.1297696...@sss.pgh.pa.us

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 557,562  AllocSetDelete(MemoryContext context)
--- 557,566 
   * AllocSetAlloc
   *Returns pointer to allocated memory of given size; memory is 
added
   *to the set.
+  *
+  * No request may exceed:
+  *MAXALIGN_DOWN(SIZE_MAX) - ALLOC_BLOCKHDRSZ - ALLOC_CHUNKHDRSZ
+  * All callers use a much-lower limit.
   */
  static void *
  AllocSetAlloc(MemoryContext context, Size size)
*** a/src/backend/utils/mmgr/mcxt.c
--- b/src/backend/utils/mmgr/mcxt.c
***
*** 451,464  MemoryContextContains(MemoryContext context, void *pointer)
header = (StandardChunkHeader *)
((char *) pointer - STANDARDCHUNKHEADERSIZE);
  
!   /*
!* If the context link doesn't match then we certainly have a non-member
!* chunk.  Also check for a reasonable-looking size as extra guard 
against
!* being fooled by bogus pointers.
!*/
!   if (header->context == context && AllocSizeIsValid(header->size))
!   return true;
!   return false;
  }
  
  /*
--- 451,457 
header = (StandardChunkHeader *)
((char *) pointer - STANDARDCHUNKHEADERSIZE);
  
!   return header->context == context;
  }
  
  /*
***
*** 735,740  repalloc(void *pointer, Size size)
--- 728,790 
  }
  
  /*
+  * MemoryContextAllocHuge
+  *Allocate (possibly-expansive) space within the specified 
context.
+  *
+  * See considerations in comment at MaxAllocHugeSize.
+  */
+ void *
+ MemoryContextAllocHuge(MemoryContext context, Size size)
+ {
+   AssertArg(MemoryContextIsValid(context));
+ 
+   if (!AllocHugeSizeIsValid(size))
+   elog(ERROR, "invalid memory alloc request size %lu",
+(unsigned long) size);
+ 
+   context->isReset = false;
+ 
+   return (*context->methods->alloc) (context, size);
+ }
+ 
+ /*
+  * repalloc_huge
+  *Adjust the size of a previously allocated chunk, permitting a 
large
+  *value.  The previous allocation need not have been huge.
+  */
+ void *
+ repalloc_huge(void *pointer, Size size)
+ {
+   StandardChunkHeader *header;
+ 
+   /*
+* Try to detect bogus pointers handed to us, poorly though we can.
+* Presumably, a pointer that isn't MAXALIGNED isn't pointing at an
+* allocated chunk.
+*/
+   

[HACKERS] Parallel Sort

2013-05-13 Thread Noah Misch
It would be great if one client session could take advantage of multiple CPU
cores.  EnterpriseDB wishes to start the trek into this problem space for 9.4
by implementing parallel internal (i.e. not spilling to disk) sort.  This
touches on a notable subset of the infrastructure components we'll need for
parallel general query.  My intent is to map out the key design topics, hear
about critical topics I hadn't considered, and solicit feedback on the quality
of the high-level plan.  Full designs for key pieces will come later.


* Worker Backends

A multi-process model, rather than a multi-thread, is best for introducing
parallelism to a PostgreSQL session.  With threading, all in-memory state
would become shared by default; with processes, in-memory data structures
remain unshared until we explicitly select them for sharing.  Some data
structures will require new sharing, but I expect unshared ones to remain the
typical case.  I refer to these additional processes as worker backends.  They
will have much in common with ordinary client backends, including a database
connection and a PGPROC.  They will have no client socket; instead, they will
take direction from the client-connected backend, termed the master, via
shared memory.

Each worker needs to make SnapshotNow visibility decisions coherent with the
master.  For sorting, this allows us to look up comparison functions, even
when the current transaction created or modified those functions.  This will
also be an essential building block for any parallelism project that consults
user tables.  Implementing this means copying the subtransaction stack and the
combocid hash to each worker.  For the sake of completeness, we should also
copy the global MVCC snapshot data (sorting probably won't care).  It also
means forbidding, while a parallel task is in flight, operations that affect
the transaction state:

- CommandCounterIncrement()
- GetCurrentCommandId(true)
- AssignTransactionId()
- subtransaction-management functions

I don't intend to commit us to a design fundamentally antagonistic to, for
example, starting a subtransaction in a worker backend.  That being said, I
think we could achieve a lot in the parallel query space before we would deem
it important to relax those restrictions.

Workers will copy all GUCs from the master.  bttextcmp() needs lc_collate, and
we'll inevitably reach the point of needing broad GUC coverage as we make more
things run in parallel.  Further GUC changes will be forbidden while a
parallel task is in flight.  (More generally, anything that mutates
backend-local state without reference to shared state will either need to be
forbidden or converted to route through shared state.)

The heavyweight locking mechanism will need to be aware of the association
between the master and its workers.  For sorting, workers will acquire
heavyweight locks on system catalogs only.  It is almost enough for the
workers to behave as though they are independent of the master for locking
purposes.  But an undetected deadlock would arise if the master holds an
AccessExclusiveLock on a system catalog a worker needs.

We can allocate a small amount of permanent shared memory for coordination
among a group of processes, but sorting will benefit from a region as large as
maintenance_work_mem.  Expect on-demand memory sharing.


* Identifying Parallel-Compatible Functions

Not all functions can reasonably run on a worker backend.  We should not
presume that a VOLATILE function can tolerate the unstable execution order
imposed by parallelism, though a function like clock_timestamp() is perfectly
reasonable to run that way.  STABLE does not have that problem, but neither
does it constitute a promise that the function implementation is compatible
with parallel execution.  Consider xid_age(), which would need code changes to
operate correctly in parallel.  IMMUTABLE almost guarantees enough; there may
come a day when all IMMUTABLE functions can be presumed parallel-safe.  For
now, an IMMUTABLE function could cause trouble by starting a (read-only)
subtransaction.  The bottom line is that parallel-compatibility needs to be
separate from volatility classes for the time being.

I'm not sure what the specific answer here should look like.  Simply having a
CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely satisfying, because
the rules are liable to loosen over time.


* Planner  Similar Issues

We should decide whether to actually sort in parallel based on the comparator
cost and the data size.  The system currently has no information on comparator
cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
but bttextcmp is at least 1000x slower than btint4cmp.  Let's improve the
procost estimates of all core B-tree and hash operators.  This will have other
benefits, but we will need to be cognizant of the risk of upsetting setups
that have tuned cpu_operator_cost based on the present situation.

The choice of whether to parallelize can 

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Andres Freund
Hi,

Interesting! Need to think about most, but one piece immediately came to
mind:

On 2013-05-13 10:28:59 -0400, Noah Misch wrote:
 Each worker needs to make SnapshotNow visibility decisions coherent with the
 master.  For sorting, this allows us to look up comparison functions, even
 when the current transaction created or modified those functions.

I don't really see how you can achieve that given how SnapshotNow
works. There's nothing protecting you against one backend seeing changes
made by another transaction while another doesn't see them. SnapshotNow
doesn't even guarantee consistency within a single backend during a
single scan...
If you are meaning the above to just apply to changes made by the local
master backend, sure I can see that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Fabien COELHO


Hello,


Would you be okay if there is one/a few effective create/drop (some tests
check that the create or drop fails e.g. depending on permissions, which
ISTM is not tested anywhere else), so that tests for various ALTER
DATABASE commands are combined together onto these databases?


TBH, I do not see that such tests are worth adding, if they are going to
significantly slow down the core regression tests.  Those tests are run
probably hundreds of times a day, in aggregate across all Postgres
developers.  Adding ten seconds or whatever this would add is a major
cost, while the benefit appears trivial.



We could consider adding expensive low-value tests like these to some
alternate regression target that's only exercised by buildfarm members,
perhaps.  But I think there's probably a point of diminishing returns
even in that context.


I'm not sure that the tests are low value, because a commit that would 
generate a failure on a permission check test would be a potential 
security issue for Pg.


My personal experience in other contexts is that small sanity checks help 
avoid blunders at a small cost. It is also worthwhile to have significant 
functional tests, such as replication and so on...


As for the cost, if the proposed tests are indeed too costly, which is not 
necessarily the case from what I have seen, I do not think that it would be 
a great problem to have two sets of tests, with one a superset of the 
other, following some convention.


It is enough that these tests are run once in a while to raise an alert if 
need be, especially before a release, and not necessarily on every "make 
check" of every developer in the world, so that we get some value at very 
low cost. So, as you suggest implicitly, maybe "make check" and "make 
longcheck" or "make shortcheck" in the test infrastructure would do the 
trick?


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Each worker needs to make SnapshotNow visibility decisions coherent with the
 master.  For sorting, this allows us to look up comparison functions, even
 when the current transaction created or modified those functions.  This will
 also be an essential building block for any parallelism project that consults
 user tables.  Implementing this means copying the subtransaction stack and the
 combocid hash to each worker.

 [ ... and GUC settings, and who knows what else ... ]

This approach seems to me to be likely to guarantee that the startup
overhead for any parallel sort is so large that only fantastically
enormous sorts will come out ahead.

I think you need to think in terms of restricting the problem space
enough so that the worker startup cost can be trimmed to something
reasonable.  One obvious suggestion is to forbid the workers from
doing any database access of their own at all --- the parent would
have to do any required catalog lookups for sort functions etc.
before forking the children.

I think we should also seriously think about relying on fork() and
copy-on-write semantics to launch worker subprocesses, instead of
explicitly copying so much state over to them.  Yes, this would
foreclose ever having parallel query on Windows, but that's okay
with me (hm, now where did I put my asbestos longjohns ...)

Both of these lines of thought suggest that the workers should *not*
be full-fledged backends.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Andres Freund
On 2013-05-13 16:52:08 +0200, Fabien COELHO wrote:
 
 Hello,
 
 Would you be okay if there is one/a few effective create/drop (some tests
 check that the create or drop fails e.g. depending on permissions, which
 ISTM is not tested anywhere else), so that tests for various ALTER
 DATABASE commands are combined together onto these databases?
 
 TBH, I do not see that such tests are worth adding, if they are going to
 significantly slow down the core regression tests.  Those tests are run
 probably hundreds of times a day, in aggregate across all Postgres
 developers.  Adding ten seconds or whatever this would add is a major
 cost, while the benefit appears trivial.
 
 We could consider adding expensive low-value tests like these to some
 alternate regression target that's only exercised by buildfarm members,
 perhaps.  But I think there's probably a point of diminishing returns
 even in that context.
 
 I'm not sure that the tests are low value, because a commit that would
 generate a failure on a permission check test would be a potential security
 issue for Pg.

 As for the cost, if the proposed tests are indeed too costly, what is not
 necessarily the case for what I have seen, I do not think that it would be a
 great problem to have two set of tests, with one a superset of the other,
 with some convention.

Well, tests like the permission tests aren't the expensive part; the actual
CREATE/DROP DATABASE you do is. The latter is essentially already exercised
by the buildfarm.
So, trimming the patch to do only the fast stuff should be less
controversial?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Greg Stark
On Mon, May 13, 2013 at 2:49 PM, Andres Freund and...@2ndquadrant.com wrote:
 Sure, the initial file creation will be faster. But are the actual
 individual wal writes (small, frequently fdatasync()ed) still faster?
 That's the critical path currently.
 Whether it is pretty much depends on how the filesystem manages
 allocated but not initialized blocks...

In ext4, AIUI, it doesn't actually pick target blocks. It just adjusts
the accounting so it knows that many blocks will be needed for this
file and guarantees they'll be available. If you read from them it
knows to provide zeros. So in theory the performance in the critical path
would be worse, but I think by an insignificant amount.

The reason Postgres pre-allocates the blocks is not a performance
optimization. It's for safety: to guarantee -- as best as possible --
that it won't get a write error when the time comes to write to the
file, and especially to guarantee that the disk won't suddenly turn
out to be full.

It seems possible that some file systems would not protect you against
media errors nearly as well when using it. It might take time to respond to
a media error, and in a poorly written filesystem it might even be
reported to the application even though there's no need. But media
errors can occur at any time, even after the initial write, so I don't
think this should be a blocker.  I think posix_fallocate is good
enough for us and I would support using it.
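
For reference, a minimal sketch of how segment preallocation could look with
posix_fallocate(), with a fallback to a zero-fill write loop; the
HAVE_POSIX_FALLOCATE guard, the helper name and the simplified error handling
are illustrative assumptions, not actual xlog.c code:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)		/* default XLOG segment size */

static int
preallocate_wal_segment(int fd)
{
#ifdef HAVE_POSIX_FALLOCATE
	/* space is reserved up front; later writes can't fail with ENOSPC */
	return posix_fallocate(fd, 0, WAL_SEG_SIZE);
#else
	char		zbuf[8192];
	off_t		written;

	memset(zbuf, 0, sizeof(zbuf));
	for (written = 0; written < WAL_SEG_SIZE; written += sizeof(zbuf))
	{
		if (write(fd, zbuf, sizeof(zbuf)) != (ssize_t) sizeof(zbuf))
			return -1;			/* caller inspects errno */
	}
	return 0;
#endif
}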

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Andres Freund
On 2013-05-13 10:57:39 -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  Each worker needs to make SnapshotNow visibility decisions coherent with the
  master.  For sorting, this allows us to look up comparison functions, even
  when the current transaction created or modified those functions.  This will
  also be an essential building block for any parallelism project that 
  consults
  user tables.  Implementing this means copying the subtransaction stack and 
  the
  combocid hash to each worker.
 
  [ ... and GUC settings, and who knows what else ... ]
 
 This approach seems to me to be likely to guarantee that the startup
 overhead for any parallel sort is so large that only fantastically
 enormous sorts will come out ahead.

I think if this is the way to go - and I am not sure it is - we need to
use some worker pool whose workers are then (re-)used every time someone
needs to do a sort. That would be easier if backends could switch
databases...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Andres Freund
On 2013-05-13 16:03:11 +0100, Greg Stark wrote:
 On Mon, May 13, 2013 at 2:49 PM, Andres Freund and...@2ndquadrant.com wrote:
  Sure, the initial file creation will be faster. But are the actual
  individual wal writes (small, frequently fdatasync()ed) still faster?
  That's the critical path currently.
  Whether it is pretty much depends on how the filesystem manages
  allocated but not initialized blocks...
 
 In ext4 aIui it doesn't actually pick target blocks. It just adjusts
 the accounting so it knows that many blocks will be needed for this
 file and guarantees they'll be available. If you read from them it
 knows to provide 0s. So in theory the performance in the critical path
 would be worse but I think by an insignificant amount.
 
 The reason Postgres pre-allocates the blocks is not for the
 performance optimization. It's for safety. To guarantee -- as best as
 possible -- that it won't get a write error when the time comes to
 write to it. Especially to guarantee that the disk won't suddenly turn
 out to be full.

posix_fallocate() guarantees that. And if you fsync() the file
afterwards its even supposed to still have enough space after the crash.

After a successful call to posix_fallocate(), subsequent writes to
bytes in the specified range are guaranteed not to fail because of lack
of disk space.


 It seems possible that some file systems would not protect you against
 media errors nearly as well using it.

True. The same probably is true for modifications of existing files for
those fancy COW filesystems...

 I think posix_fallocate is good enough for us and I would support
 using it.

Me too, although this isn't the place where I'd be really excited to see
a patch implementing it properly ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Tom Lane
Mark Salter msal...@redhat.com writes:
 I used the following patch to add lock support for aarch64. It is just a
 copy of the ARM support based on gcc builtins. PostgreSQL built with
 this patch passes the various tests.

Couldn't we just do

-#if defined(__arm__) || defined(__arm)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)

in the existing ARM code block?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 17:26, Merlin Moncure wrote:

I'm starting to wonder why we don't always use gcc atomics if they are
available and an assembly implementation is not (and maybe, even if it
is).


That was discussed a while ago, but there were a lot of claims that 
__sync_lock_test_and_set is broken on various platforms: 
http://www.postgresql.org/message-id/flat/5642.1324482...@sss.pgh.pa.us#5642.1324482...@sss.pgh.pa.us. 
The situation might've improved since then, but I guess we should proceed 
cautiously.
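
For reference, the gcc-builtin spinlock path under discussion is essentially
the following sketch, close to what the existing ARM block already does under
HAVE_GCC_INT_ATOMICS and what an __aarch64__ build would use as well:

typedef int slock_t;

#define TAS(lock) tas(lock)

static __inline__ int
tas(volatile slock_t *lock)
{
	/* returns the previous value; non-zero means the lock was already held */
	return __sync_lock_test_and_set(lock, 1);
}

#define S_UNLOCK(lock) __sync_lock_release(lock)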


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MemoryContextAllocHuge(): selectively bypassing MaxAllocSize

2013-05-13 Thread Pavel Stehule
+1

Pavel
Dne 13.5.2013 16:29 Noah Misch n...@leadboat.com napsal(a):

 A memory chunk allocated through the existing palloc.h interfaces is limited
 to MaxAllocSize (~1 GiB).  This is best for most callers; SET_VARSIZE() need
 not check its own 1 GiB limit, and algorithms that grow a buffer by doubling
 need not check for overflow.  However, a handful of callers are quite happy
 to navigate those hazards in exchange for the ability to allocate a larger
 chunk.

 This patch introduces MemoryContextAllocHuge() and repalloc_huge() that check
 a higher MaxAllocHugeSize limit of SIZE_MAX/2.  Chunks don't bother recording
 whether they were allocated as huge; one can start with palloc() and then
 repalloc_huge() to grow the value.  To demonstrate, I put this to use in
 tuplesort.c; the patch also updates tuplestore.c to keep them similar.
 Here's the trace_sort from building the pgbench_accounts primary key at
 scale factor 7500, maintenance_work_mem = '56GB'; memtuples itself consumed
 17.2 GiB:

 LOG:  internal sort ended, 48603324 KB used: CPU 75.65s/305.46u sec
 elapsed 391.21 sec

 Compare:

 LOG:  external sort ended, 1832846 disk blocks used: CPU 77.45s/988.11u
 sec elapsed 1146.05 sec

 This was made easier by tuplesort growth algorithm improvements in commit
 8ae35e91807508872cabd3b0e8db35fc78e194ac.  The problem has come up before
 (TODO item "Allow sorts to use more available memory"), and Tom floated the
 idea[1] behind the approach I've used.  The next limit faced by sorts is
 INT_MAX concurrent tuples in memory, which limits helpful work_mem to about
 150 GiB when sorting int4.

 I have not added variants like palloc_huge() and palloc0_huge(), and I have
 not added to the frontend palloc.h interface.  There's no particular barrier
 to doing any of that.  I don't expect more than a dozen or so callers, so
 most of the variations might go unused.

 The comment at MaxAllocSize said that aset.c expects doubling the size of an
 arbitrary allocation to never overflow, but I couldn't find the code in
 question.  AllocSetAlloc() does double sizes of blocks used to aggregate
 small allocations, so maxBlockSize had better stay under SIZE_MAX/2.
 Nonetheless, that expectation does apply to dozens of repalloc() users
 outside aset.c, and I preserved it for repalloc_huge().  64-bit builds will
 never notice, and I won't cry for the resulting 2 GiB limit on 32-bit.
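
 To illustrate the usage pattern this enables, here is a simplified sketch
 (not the actual tuplesort.c change; the helper name is made up and overflow
 checks are omitted):

 static void
 grow_memtuples_sketch(Tuplesortstate *state)
 {
 	Size		newsize = (Size) state->memtupsize * 2;

 	/* repalloc_huge() checks MaxAllocHugeSize (SIZE_MAX/2), not MaxAllocSize */
 	state->memtuples = (SortTuple *)
 		repalloc_huge(state->memtuples, newsize * sizeof(SortTuple));
 	state->memtupsize = newsize;
 }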

 Thanks,
 nm

 [1] http://www.postgresql.org/message-id/19908.1297696...@sss.pgh.pa.us

 --
 Noah Misch
 EnterpriseDB http://www.enterprisedb.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Robins Tharakan
I believe Tom, Andres and Fabien all have valid points.

Net-net, I believe the tests, although non-negotiable, are not required to
be in make check. For now, it's the slow tests that are the pain point
here, so I will try to prune them and submit once again.

Whether they go into make check or not is obviously not my call, but to me
their importance is undoubted, since I am sure they increase code coverage.
That is in fact 'how' I create these tests: I look at untouched code and
write new tests that exercise it.

I will come back with a faster script (preferably with minimal
CREATE/DROP).

--
Robins Tharakan


On 13 May 2013 20:30, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-05-13 16:52:08 +0200, Fabien COELHO wrote:
 
  Hello,
 
  Would you be okay if there is one/a few effective create/drop (some
 tests
  check that the create or drop fails e.g. depending on permissions,
 which
  ISTM is not tested anywhere else), so that tests for various ALTER
  DATABASE commands are combined together onto these databases?
  
  TBH, I do not see that such tests are worth adding, if they are going to
  significantly slow down the core regression tests.  Those tests are run
  probably hundreds of times a day, in aggregate across all Postgres
  developers.  Adding ten seconds or whatever this would add is a major
  cost, while the benefit appears trivial.
 
  We could consider adding expensive low-value tests like these to some
  alternate regression target that's only exercised by buildfarm members,
  perhaps.  But I think there's probably a point of diminishing returns
  even in that context.
 
  I'm not sure that the tests are low value, because a commit that would
  generate a failure on a permission check test would be a potential
 security
  issue for Pg.

  As for the cost, if the proposed tests are indeed too costly, what is not
  necessarily the case for what I have seen, I do not think that it would
 be a
  great problem to have two set of tests, with one a superset of the other,
  with some convention.

 Well, tests like permission tests aren't the expensive part. The actual
 CREATE/DROP DATABASE you do is. The latter essentially are already
 tested by the buildfarm already.
 So, trimming the patch to do only the fast stuff should be less
 controversial?

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 18:14, Tom Lane wrote:

Mark Salter msal...@redhat.com writes:

I used the following patch to add lock support for aarch64. It is just a
copy of the ARM support based on gcc builtins. PostgreSQL built with
this patch passes the various tests.


Couldn't we just do

-#if defined(__arm__) || defined(__arm)
+#if defined(__arm__) || defined(__arm) || defined(__aarch64__)

in the existing ARM code block?


That would imply falling back to swpb instruction also on aarch64, which 
won't work.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 09.05.2013 18:24, Robert Haas wrote:
 In the attached new version of the patch, I added an explicit check to
 prevent relocatable extensions from being created in pg_catalog.

 Do we really want to forbid that?

The only alternative I see is the one proposed in
http://www.postgresql.org/message-id/12365.1358098...@sss.pgh.pa.us:

 I think maybe what we should do is have namespace.c retain an explicit
 notion that the first schema listed in search_path didn't exist, and
 then throw errors if any attempt is made to create objects without an
 explicitly specified namespace.

which is also pretty grotty.  Robert pointed out that it's inconsistent
with the traditional behavior of the default setting $user, public.
Now, we could continue to treat $user as a special case, but that's just
stacking more kluges onto the pile.

BTW, looking back over the thread, I notice we have also not done
anything about this newly-introduced inconsistency:

regression=# create table pg_catalog.t(f1 int);
CREATE TABLE
regression=# drop table pg_catalog.t;
ERROR:  permission denied: t is a system catalog

Surely allow_system_table_mods should allow both or neither of those.

Perhaps, if we fixed that, the need to prevent table creations in
pg_catalog via search_path semantics changes would be lessened.

I believe the DROP prohibition is mainly there to prevent
drop table pg_catalog.pg_proc;
ERROR:  permission denied: pg_proc is a system catalog
but that thinking predates the invention of pg_depend.  If this
check were removed, you'd still be prevented from dropping pg_proc
because it's pinned.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lock support for aarch64

2013-05-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 13.05.2013 18:14, Tom Lane wrote:
 Couldn't we just do
 -#if defined(__arm__) || defined(__arm)
 +#if defined(__arm__) || defined(__arm) || defined(__aarch64__)

 That would imply falling back to swpb instruction also on aarch64, which 
 won't work.

It doesn't work on current ARM32 chips either, but no one has complained
about the existing coding.  At least on ARM64 there'd be a likelihood
that you'd get an assembler error rather than an executable that crashes
at launch.

I suppose we could consider adding

#ifdef __aarch64__
#error ...
#endif

in the section for not-HAVE_GCC_INT_ATOMICS, but I'm doubtful it's
worth the trouble.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 10:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 This approach seems to me to be likely to guarantee that the startup
 overhead for any parallel sort is so large that only fantastically
 enormous sorts will come out ahead.

 I think you need to think in terms of restricting the problem space
 enough so that the worker startup cost can be trimmed to something
 reasonable.  One obvious suggestion is to forbid the workers from
 doing any database access of their own at all --- the parent would
 have to do any required catalog lookups for sort functions etc.
 before forking the children.

 I think we should also seriously think about relying on fork() and
 copy-on-write semantics to launch worker subprocesses, instead of
 explicitly copying so much state over to them.  Yes, this would
 foreclose ever having parallel query on Windows, but that's okay
 with me (hm, now where did I put my asbestos longjohns ...)

 Both of these lines of thought suggest that the workers should *not*
 be full-fledged backends.

Eventually, PostgreSQL needs not only parallel sort, but a more
general parallel query facility.  The goal here is not to design
something specific to parallel sort, but to provide a general
infrastructure for server-side parallelism.  If we restrict ourselves
to a design where syscache lookups aren't possible from a worker
backend, I have trouble seeing how that's ever gonna work.  That's a
very low-level facility that a lot of things rely on.  Even if you
could make the shuttling of requests between master and slave
transparent to the backend code, it's cutting into the amount of
actual parallelizable stuff, and adding very significantly to the
overhead.

I don't see any reason to panic about the worker startup cost.  I
don't know whether the stuff that Noah mentioned copying will take 10
microseconds or 100 milliseconds, but there are plenty of sorts that
take large numbers of seconds or minutes to happen, so there's still
plenty of opportunity for win there.  By definition, the things that
you want to run in parallel are the ones that take a long time if you
don't run them in parallel.  Now, of course, if we can reduce the cost
of starting new backends (e.g. by keeping them around from one
parallel operation to the next, or by starting them via fork), that
will expand the range of cases where parallelism is a win.  But I
think we could win in plenty of interesting real-world cases even if
it took a full second to initialize each new worker, and surely it
won't be nearly that bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 09.05.2013 18:24, Robert Haas wrote:
 In the attached new version of the patch, I added an explicit check to
 prevent relocatable extensions from being created in pg_catalog.

 Do we really want to forbid that?

 The only alternative I see is the one proposed in
 http://www.postgresql.org/message-id/12365.1358098...@sss.pgh.pa.us:

Let me propose another alternative: it would be relatively
straightforward to allow this to work differently in extension scripts
than it does in general; it's already contingent on whether we're in
bootstrap-processing mode, and there's no reason we couldn't add some
other flag that gets set during extension-script processing, if
there isn't one already, and make it contingent on that also.  I think
what we're concerned with is mostly preventing accidental object
creation in pg_catalog, and allowing extensions to be created there
wouldn't interfere with that.

 I believe the DROP prohibition is mainly there to prevent
 drop table pg_catalog.pg_proc;
 ERROR:  permission denied: pg_proc is a system catalog
 but that thinking predates the invention of pg_depend.  If this
 check were removed, you'd still be prevented from dropping pg_proc
 because it's pinned.

Well, +1 for relaxing that restriction, no matter what else we do.
But that only makes an accidental restore into pg_catalog less sucky,
not less likely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 13, 2013 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 Do we really want to forbid that?

 The only alternative I see is the one proposed in
 http://www.postgresql.org/message-id/12365.1358098...@sss.pgh.pa.us:

 Let me propose another alternative: it would be relatively
 straightforward to allow this to work differently in extension scripts
 than it does in general;

That's just making the rules even more impossibly complicated (rules
that aren't documented, and we've not found any obvious place to
document them, so people aren't going to read whatever we do come up
with...)

The original objective of commit 880bfc328 was to simplify the rules
about search_path interpretation.  I'm not really happy about adding
a bunch of different, but just as obscure, rules in the name of making
things easier to use.  We'd be better off just reverting that patch IMO.

 I believe the DROP prohibition is mainly there to prevent
 drop table pg_catalog.pg_proc;
 ERROR:  permission denied: pg_proc is a system catalog
 but that thinking predates the invention of pg_depend.  If this
 check were removed, you'd still be prevented from dropping pg_proc
 because it's pinned.

 Well, +1 for relaxing that restriction, no matter what else we do.
 But that only makes an accidental restore into pg_catalog less sucky,
 not less likely.

Another way to fix that inconsistency is to consider that
allow_system_table_mods should gate table creations not just drops in
pg_catalog.  I'm not real sure why this wasn't the case all along ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
I wrote:
 Another way to fix that inconsistency is to consider that
 allow_system_table_mods should gate table creations not just drops in
 pg_catalog.  I'm not real sure why this wasn't the case all along ...

Uh, scratch that last comment: actually, allow_system_table_mods *did*
gate that, in every existing release.  I bitched upthread about the fact
that this was changed in 9.3, and did not hear any very satisfactory
defense of the change.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [GENERAL] pg_upgrade fails, mismatch of relation OID - 9.1.9 to 9.2.4

2013-05-13 Thread Jerry Sievers
Bruce Momjian br...@momjian.us writes:

 On Fri, May 10, 2013 at 08:03:38PM -0400, Bruce Momjian wrote:

 On Fri, May 10, 2013 at 12:36:21PM -0400, Evan D. Hoffman wrote:
  pg.dropped.16 INTEGER /* dummy */,
  pg.dropped.17 INTEGER /* dummy */,
  pg.dropped.18 INTEGER /* dummy */,
  pg.dropped.19 INTEGER /* dummy */,
  pg.dropped.20 INTEGER /* dummy */,
  pg.dropped.21 INTEGER /* dummy */,
  pg.dropped.22 INTEGER /* dummy */,
  pg.dropped.23 INTEGER /* dummy */,
  pg.dropped.24 INTEGER /* dummy */,
  pg.dropped.25 INTEGER /* dummy */,
  pg.dropped.26 INTEGER /* dummy */,
  ha27 character varying(10) DEFAULT 'UNKNOWN'::character varying,
  pg.dropped.28 INTEGER /* dummy */,
  dr29 character varying(10)
 
 OK, this verifies that the table had a lot of DDL churn.  I have no idea
 how to pursue this further because I am unsure how we are going to
 replicate the operations performed on this table in the past, as you
 mentioned much of this was before your time on the job.
 
 Evan, I suggest you force a toast table on the table by doing:
 
  ALTER TABLE bpm.setupinfo ADD COLUMN dummy TEXT;
 
 Then drop the column.  That will create a toast table and will allow
 pg_upgrade to succeed.

 FYI, I did test adding a TEXT column and altering a column to TEXT on
 Postgres 9.1, and both created a toast table.  I am still have no clues
 about what would have caused the missing toast table.

Possibly manual catalog updates to change a varchar(N) to text and
whoopsie!  That may be one explanation.

   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com

   + It's impossible for everything to be true. +


 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Another way to fix that inconsistency is to consider that
 allow_system_table_mods should gate table creations not just drops in
 pg_catalog.  I'm not real sure why this wasn't the case all along ...

 Uh, scratch that last comment: actually, allow_system_table_mods *did*
 gate that, in every existing release.  I bitched upthread about the fact
 that this was changed in 9.3, and did not hear any very satisfactory
 defense of the change.

It disallowed it only for tables, and not for any other object type.
I found that completely arbitrary.  It's perfectly obvious that people
want to be able to create objects in pg_catalog; shall we adopt a rule
that you can put extensions there, as long as those extensions don't
happen to contain tables?  That is certainly confusing and arbitrary.

I suppose we could add a GUC, separate from allow_system_table_mods,
to allow specifically adding and dropping objects in pg_catalog.  It
would be consistent, and there would sure be a place to document it.
And it would make it easy to emit the right error-hint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Fabien COELHO



As for the cost, if the proposed tests are indeed too costly, what is not
necessarily the case for what I have seen, I do not think that it would be a
great problem to have two set of tests, with one a superset of the other,
with some convention.


Well, tests like permission tests aren't the expensive part. The actual
CREATE/DROP DATABASE you


Not me, but Robins Tharakan.

do is. The latter essentially are already tested by the buildfarm 
already.


Yep, that is what Tom was arguing.


So, trimming the patch to do only the fast stuff should be less
controversial?


Yes, I think so.

I think that allowing expensive tests such as a full archive or 
replication setup would be nice as well, so maybe having several level of 
tests would be a good thing anyway.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Andres Freund
On 2013-05-13 12:59:47 -0400, Robert Haas wrote:
 On Mon, May 13, 2013 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I wrote:
  Another way to fix that inconsistency is to consider that
  allow_system_table_mods should gate table creations not just drops in
  pg_catalog.  I'm not real sure why this wasn't the case all along ...
 
  Uh, scratch that last comment: actually, allow_system_table_mods *did*
  gate that, in every existing release.  I bitched upthread about the fact
  that this was changed in 9.3, and did not hear any very satisfactory
  defense of the change.
 
 It disallowed it only for tables, and not for any other object type.
 I found that completely arbitrary.  It's perfectly obvious that people
 want to be able to create objects in pg_catalog; shall we adopt a rule
 that you can put extension there, as long as those extensions don't
 happen to contain tables?  That is certainly confusing and arbitrary.

Why don't we just prohibit deletion/modification for anything below
FirstNormalObjectId instead of using the schema as a restriction? Then
we can allow creation for tables as well.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote:
 It disallowed it only for tables, and not for any other object type.
 I found that completely arbitrary.  It's perfectly obvious that people
 want to be able to create objects in pg_catalog; shall we adopt a rule
 that you can put extension there, as long as those extensions don't
 happen to contain tables?  That is certainly confusing and arbitrary.

 Why don't we just prohibit deletion/modification for anything below
 FirstNormalObjectId instead of using the schema as a restriction? Then
 we can allow creation for tables as well.

We currently do, but that led to problems with $SUBJECT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Andres Freund
On 2013-05-13 13:04:52 -0400, Robert Haas wrote:
 On Mon, May 13, 2013 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote:
  It disallowed it only for tables, and not for any other object type.
  I found that completely arbitrary.  It's perfectly obvious that people
  want to be able to create objects in pg_catalog; shall we adopt a rule
  that you can put extension there, as long as those extensions don't
  happen to contain tables?  That is certainly confusing and arbitrary.
 
  Why don't we just prohibit deletion/modification for anything below
  FirstNormalObjectId instead of using the schema as a restriction? Then
  we can allow creation for tables as well.
 
 We currently do, but that led to problems with $SUBJECT.

But we currently don't allow the drop, which is confusingly
inconsistent. And allowing object creation within pg_catalog only from
within extension scripts and not from normal SQL sounds like a *very*
poor workaround that will give problems to quite a few people upgrading
from earlier releases, especially from releases where we didn't have
extensions.

And I don't see why allowing consistent relation creation/removal in
pg_catalog conflicts with fixing the issue at hand?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 13, 2013 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Uh, scratch that last comment: actually, allow_system_table_mods *did*
 gate that, in every existing release.  I bitched upthread about the fact
 that this was changed in 9.3, and did not hear any very satisfactory
 defense of the change.

 It disallowed it only for tables, and not for any other object type.
 I found that completely arbitrary.

No doubt, but nonetheless the name of the GUC is allow_system_TABLE_mods,
not allow_system_object_mods.  And removing just one part of its
longstanding functionality, in a way that creates the type of pg_dump
hazard this thread started with, is as arbitrary as things get.

We don't have time anymore to redesign this for 9.3, so I think just
putting back that one error check might be a reasonable fix for now.

 I suppose we could add a GUC, separate from allow_system_table_mods,
 to allow specifically adding and dropping objects in pg_catalog.

+1 ... for 9.4.  Or maybe the right thing is to replace all these tests
with checks on whether the objects are pinned (which, again, already
happens for the DROP case).  It's not immediately clear to me why we
need any of these restrictions for non-pinned objects.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Heikki Linnakangas

On 13.05.2013 19:59, Robert Haas wrote:

On Mon, May 13, 2013 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

I wrote:

Another way to fix that inconsistency is to consider that
allow_system_table_mods should gate table creations not just drops in
pg_catalog.  I'm not real sure why this wasn't the case all along ...


Uh, scratch that last comment: actually, allow_system_table_mods *did*
gate that, in every existing release.  I bitched upthread about the fact
that this was changed in 9.3, and did not hear any very satisfactory
defense of the change.


It disallowed it only for tables, and not for any other object type.
I found that completely arbitrary.  It's perfectly obvious that people
want to be able to create objects in pg_catalog; shall we adopt a rule
that you can put extension there, as long as those extensions don't
happen to contain tables?  That is certainly confusing and arbitrary.


Makes sense to me, actually. It's quite sensible to put functions, 
operators, etc. in pg_catalog, especially if they're part of an 
extension. But I can't think of a good reason for putting a table in 
pg_catalog. Maybe some sort of control data for an extension, but that 
seems like a kludge. Its contents wouldn't be included in pg_dump, for 
example.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 13, 2013 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote:
 Why don't we just prohibit deletion/modification for anything below
 FirstNormalObjectId instead of using the schema as a restriction? Then
 we can allow creation for tables as well.

 We currently do, but that led to problems with $SUBJECT.

AFAIR there are no code restrictions based on OID value.  We've got
restrictions based on things being in pg_catalog or not, and we've got
restrictions based on things being marked pinned in pg_depend.

Looking at the OID range might be a reasonable proxy for pinned-ness,
though, and it would certainly be a lot cheaper than a lookup in
pg_depend.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Andres Freund
On 2013-05-13 13:40:57 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, May 13, 2013 at 1:03 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  Why don't we just prohibit deletion/modification for anything below
  FirstNormalObjectId instead of using the schema as a restriction? Then
  we can allow creation for tables as well.
 
  We currently do, but that led to problems with $SUBJECT.

 AFAIR there are no code restrictions based on OID value.  We've got
 restrictions based on things being in pg_catalog or not, and we've got
 restrictions based on things being marked pinned in pg_depend.

 Looking at the OID range might be a reasonable proxy for pinned-ness,
 though, and it would certainly be a lot cheaper than a lookup in
 pg_depend.

It might need a slight change in GetNewObjectId() though:
	if (IsPostmasterEnvironment)
	{
		/* wraparound in normal environment */
		ShmemVariableCache->nextOid = FirstNormalObjectId;
		ShmemVariableCache->oidCount = 0;
	}
	else
	{
		/* we may be bootstrapping, so don't enforce the full range */
		if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))
		{
			/* wraparound in standalone environment? */
			ShmemVariableCache->nextOid = FirstBootstrapObjectId;
			ShmemVariableCache->oidCount = 0;
		}
	}

I think we shouldn't check IsPostmasterEnvironment here but instead
IsBootstrapProcessingMode(), since we otherwise can generate OIDs below
FirstNormalObjectId in --single mode. IMO that should be fixed
independently though; given the comment it looks like either an
oversight or a check predating the existence of
IsBootstrapProcessingMode().


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Noah Misch
On Mon, May 13, 2013 at 04:39:01PM +0200, Andres Freund wrote:
 On 2013-05-13 10:28:59 -0400, Noah Misch wrote:
  Each worker needs to make SnapshotNow visibility decisions coherent with the
  master.  For sorting, this allows us to look up comparison functions, even
  when the current transaction created or modified those functions.
 
 I don't really see how you can achieve that given how SnapshotNow
 works. There's nothing protecting you against one backend seeing changes
 made by another transaction while another doesn't see them. SnapshotNow
 doesn't even guarantee consistency within a single backend during a
 single scan...
 If you are meaning the above to just apply to changes made by the local
 master backend, sure I can see that.

Yes; it only makes the workers consistent with the master to the extent that
the master is consistent with itself.  However, your comment makes me see that
this may not be enough.  For an object not protected by locks, if parallel
execution probes the syscache N times where the current code probes it only
once, we'll introduce new anomalies.  I don't think this affects sorting in
particular, because we already make little effort to behave sanely when a
comparison function is redefined mid-sort.  It seems likely that this will
need a better answer sooner or later as we move into the parallelism space.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I think we shouldn't check IsPostmasterEnvironment here but instead
 IsBootstrapProcessingMode() since we otherwise can generate oids below
 FirstNormalObjectId in --single mode.

That is, in fact, exactly what we want to do and must do during initdb.
If you change anything about this code you'll break the way the
post-bootstrap initdb steps assign OIDs.

(The comments about wraparound are slightly misleading, since those
code blocks also execute during the first OID assignment in normal
mode and the first one in post-bootstrap initdb processing, respectively.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Andres Freund
On 2013-05-13 14:35:47 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I think we shouldn't check IsPostmasterEnvironment here but instead
  IsBootstrapProcessingMode() since we otherwise can generate oids below
  FirstNormalObjectId in --single mode.
 
 That is, in fact, exactly what we want to do and must do during initdb.
 If you change anything about this code you'll break the way the
 post-bootstrap initdb steps assign OIDs.

Well, then we should use some other way to distinguish between those two
cases. If you currently execute CREATE TABLE or something else in
--single user mode, the database cannot safely be pg_upgraded anymore,
since the OIDs might already be used in a freshly initdb'ed cluster in
the new version.
Or am I missing something here?

Dropping and recreating an index in --single mode isn't that
uncommon...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-05-13 14:35:47 -0400, Tom Lane wrote:
 That is, in fact, exactly what we want to do and must do during initdb.
 If you change anything about this code you'll break the way the
 post-bootstrap initdb steps assign OIDs.

 Well, then we should use some other way to discern from those both
 cases. If you currently execute CREATE TABLE or something else in
 --single user mode the database cannot safely be pg_upgraded anymore
 since the oids might already be used in a freshly initdb'ed cluster in
 the new version.

[ shrug... ]  In the list of ways you can break your system in --single
mode, that one has got to be exceedingly far down the list.

 DROPing and recreating a new index in --single mode isn't that
 uncommon...

Surely you'd just REINDEX it instead.  Moreover, if it isn't a system
index already, why are you doing this in --single mode at all?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Andres Freund
On 2013-05-13 14:48:52 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-05-13 14:35:47 -0400, Tom Lane wrote:
  That is, in fact, exactly what we want to do and must do during initdb.
  If you change anything about this code you'll break the way the
  post-bootstrap initdb steps assign OIDs.
 
  Well, then we should use some other way to discern from those both
  cases. If you currently execute CREATE TABLE or something else in
  --single user mode the database cannot safely be pg_upgraded anymore
  since the oids might already be used in a freshly initdb'ed cluster in
  the new version.
 
 [ shrug... ]  In the list of ways you can break your system in --single
 mode, that one has got to be exceedingly far down the list.

Well, sure there are loads of ways where you can intentionally break
things. But I'd say that it's not exactly obvious that CREATE INDEX
can break things.

  DROPing and recreating a new index in --single mode isn't that
  uncommon...
 
 Surely you'd just REINDEX it instead.  Moreover, if it isn't a system
 index already, why are you doing this in --single mode at all?

The last case I had was that an index was corrupted in a way that
autovacuum got stuck on the corrupt index and wasn't killable. Without
single mode it was hard to be fast enough to drop the index before
autovac grabbed the lock again.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Simon Riggs
On 13 May 2013 15:28, Noah Misch n...@leadboat.com wrote:

 The heavyweight locking mechanism will need to be aware of the association
 between the master and its workers.

Not sure I can see why that would be true.

ISTM that the workers need to be restricted in various ways compared to a
full-functioned master. If the workers can do things that conflict
with the master and/or each other, we're going to have all kinds of
hurt.

The key is going to be deciding what the restrictions are and then
working out how to enforce them.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Marko Kreen
On Sat, May 04, 2013 at 10:57:44PM +0200, Dimitri Fontaine wrote:
 Other than adminpack, I know of PGQ installing their objects in
 pg_catalog. They only began doing that when switching to the CREATE
 EXTENSION facility. And they set relocatable to false.

FYI - PgQ and related modules install no objects into pg_catalog.

I used schema='pg_catalog' because I had trouble getting schema='pgq'
to work.  I wanted the 'pgq' schema to live and die with the extension,
and that was the only way I got it to work on 9.1.

-- 
marko



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.3 prepare plans - possible incompatibility

2013-05-13 Thread Pavel Stehule
Hello

I am not sure that replanning prepared plans due to a change of search_path
is expected in all use cases. At a minimum it should be mentioned as a
possible incompatibility.

Use case - move record from schema A to schema B with prepared statements

SET search_path = 'A';
PREPARE x(int) AS DELETE FROM tab WHERE id = $1 RETURNING *;

SET search_path = 'B';
PREPARE y(int, int) AS INSERT INTO tab VALUES($1, $2);

EXECUTE x(10);
EXECUTE y(10,20);

This use case will not work in 9.3.

Regards

Pavel


Re: [HACKERS] Add more regression tests for dbcommands

2013-05-13 Thread Fabien COELHO



Would try to revert with a faster script (preferably with minimal
CREATE/DROP).


Yes. I just checked with a few create database/drop database. One cycle 
takes about 1 full second on my laptop, so I must agree that it is slow.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-05-13 14:48:52 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 DROPing and recreating a new index in --single mode isn't that
 uncommon...

 Surely you'd just REINDEX it instead.  Moreover, if it isn't a system
 index already, why are you doing this in --single mode at all?

 The last case I had was that an index was corrupted in a way that
 autovacuum got stuck on the corrupt index and wasn't killable. Without
 single mode it was hard to be fast enough to drop the index before
 autovac grabbed the lock again.

Meh.  Actually, after looking closer at xlog.c, the OID counter starts
out at FirstBootstrapObjectId, which is not what I'd been thinking.
So a value less than that must indicate wraparound, which presumably
never happens during initdb.  We could just change the code to

	if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))
	{
		/* wraparound while in standalone environment */
		ShmemVariableCache->nextOid = FirstNormalObjectId;
		ShmemVariableCache->oidCount = 0;
	}

which is a bit asymmetric-looking but should do the right thing in all
cases.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Kohei KaiGai
2013/5/13 Noah Misch n...@leadboat.com

 * Planner & Similar Issues

 We should decide whether to actually sort in parallel based on the
 comparator cost and the data size.  The system currently has no information
 on comparator cost: bt*cmp (and indeed almost all built-in functions) all
 have procost=1, but bttextcmp is at least 1000x slower than btint4cmp.
 Let's improve the procost estimates of all core B-tree and hash operators.
 This will have other benefits, but we will need to be cognizant of the risk
 of upsetting setups that have tuned cpu_operator_cost based on the present
 situation.

 The choice of whether to parallelize can probably be made in a manner
 similar to the choice to do an external sort: the planner guesses the
 outcome for costing purposes, but the actual decision is made at execution
 time.  The planner would determine a tuple count cutoff at which parallelism
 becomes favorable, and tuplesort would check that to establish its actual
 decision.

This overlaps with a problem I have been thinking about: off-loading
CPU-bound workloads, which I partially tried to implement in the writable
foreign table feature.
Not only sorting: I think it would be worthwhile to have the capability to
push heavy workloads (like sorts, aggregates or complex target lists) out to
external computing resources.
However, I doubt whether the decision to parallelize should be made at
execution time rather than at the plan stage. For example, when we have a
large enough number of records and a 10-core multiprocessor, the wise plan
may be a parallel data load across 10 processors, a partial sort on each of
them individually, and then a merge sort. That needs a fundamentally
different tree structure from the traditional single-processor plan tree.
So it seems to me we should add an enhancement that allows injecting
special-purpose parallel processing nodes into the plan tree.
What is your opinion?

Thanks,


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Simon Riggs
On 13 May 2013 15:57, Tom Lane t...@sss.pgh.pa.us wrote:

 I think you need to think in terms of restricting the problem space

+1

 One obvious suggestion is to forbid the workers from
 doing any database access of their own at all --- the parent would
 have to do any required catalog lookups for sort functions etc.

+1

 I think we should also seriously think about relying on fork() and
 copy-on-write semantics to launch worker subprocesses, instead of
 explicitly copying so much state over to them.  Yes, this would
 foreclose ever having parallel query on Windows, but that's okay
 with me (hm, now where did I put my asbestos longjohns ...)

If we relied on some kind of inherited state we could easily make the
mistake of relying on something that isn't actually being maintained
correctly in the worker. Luckily (?) that is exactly the stance we
need to make this work on Windows. Other than that, releasing on
Windows in a later release sounds sensible; otherwise we'll just delay
the development so much that it still happens in the later timeframe,
and the chance of an earlier release on Linux/BSD will have been missed.

For example, the idea of managing separate subtransactions in each
worker sounds nuts. Impressive, if you're already thinking about
parallel DML that can self recover halfway through a statement and
then continue processing, but that's a little advanced. The way to
think about this is as a 10 year journey, not as a single feature.

-1 for forking

 Both of these lines of thought suggest that the workers should *not*
 be full-fledged backends.

+1 to the idea of workers != masters

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-13 Thread Simon Riggs
On 13 May 2013 14:45, Jon Nelson jnelson+pg...@jamponi.net wrote:

 I should not derail this thread any further. Perhaps, if interested
 parties would like to discuss the use of fallocate/posix_fallocate, a
 new thread might be more appropriate?

Sounds like a good idea. Always nice to see a fresh take on earlier ideas.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Better handling of archive_command problems

2013-05-13 Thread Peter Geoghegan
The documentation says of continuous archiving:

While designing your archiving setup, consider what will happen if
the archive command fails repeatedly because some aspect requires
operator intervention or the archive runs out of space. For example,
this could occur if you write to tape without an autochanger; when the
tape fills, nothing further can be archived until the tape is swapped.
You should ensure that any error condition or request to a human
operator is reported appropriately so that the situation can be
resolved reasonably quickly. The pg_xlog/ directory will continue to
fill with WAL segment files until the situation is resolved. (If the
file system containing pg_xlog/ fills up, PostgreSQL will do a PANIC
shutdown. No committed transactions will be lost, but the database
will remain offline until you free some space.)

I think that it is not uncommon for archiving to fall seriously
behind, risking a serious loss of availability. When this happens, the
DBA has to fight against the clock to fix whatever problem there is
with continuous archiving, hoping to catch up and prevent a PANIC
shutdown. This is a particularly unpleasant problem to have.

At Heroku, we naturally monitor the state of continuous archiving on
all clusters under our control. However, when faced with this
situation, sometimes the least-worst option to buy time is to throttle
Postgres using a crude mechanism: issuing repeated SIGSTOP and SIGCONT
signals to all Postgres processes, with the exception of the archiver
auxiliary process. Obviously this is a terrible thing to have to do,
principally because it slows almost everything right down. It would be
far preferable to just slow down the writing of WAL segments when
these emergencies arise, since that alone is what risks causing a
PANIC shutdown when XLogWrite() cannot write WAL. Even if the pg_xlog
directory is on the same filesystem as database heap files, it is
obviously the case that throttling WAL will have the effect of
throttling operations that might cause those heap files to be
enlarged. Reads (including the writes that enable reads, like those
performed by the background writer and backends to clean dirty
buffers) and checkpointing are not affected (though of course
checkpointing does have to write checkpoint WAL records, so perhaps
not quite).

What I'd like to propose is that we simply sit on WALWriteLock for a
configured delay in order to throttle the writing (though not the
insertion) of WAL records. I've drafted a patch that does just that -
it has the WAL Writer optionally sleep on the WALWriteLock for some
period of time once per activity cycle (avoiding WAL Writer
hibernation). If this sounds similar to commit_delay, that's because
it is almost exactly the same. We just sleep within the WAL Writer
rather than a group commit leader backend because that doesn't depend
upon some backend hitting the XLogFlush()/commit_delay codepath. In a
bulk loading situation, it's perfectly possible for no backend to
actually hit XLogFlush() with any sort of regularity, so commit_delay
cannot really be abused to do what I describe here. Besides, right now
commit_delay is capped so that it isn't possible to delay for more
than 1/10th of a second.

What I've proposed here has the disadvantage of making activity rounds
of the WAL Writer take longer, thus considerably increasing the window
in which any asynchronous commits will actually make it out to disk.
However, that's a problem inherent in any throttling of WAL writing as
described here (XLogBackgroundFlush() itself acquires WALWriteLock
anyway), so I don't imagine that there's anything that can be done
about that other than having a clear warning. I envisage
this feature as very much a sharp tool to be used by the DBA only when
they are in a very tight bind. Better to at least be able to handle
read queries in the event of having this problem, and to not throttle
longer running transactions with some writes that don't need to make
it out to disk right away. I also have a notion that we can usefully
throttle WAL writing less aggressively than almost or entirely
preventing it. I have an idea that a third party monitoring daemon
could scale up or down the throttling delay as a function of how full
the pg_xlog filesystem is. It might be better to modestly throttle WAL
writing for two hours in order to allow continuous archiving to catch
up, rather than sharply curtailing WAL writing for a shorter period.
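
As a rough illustration of that last idea (purely a sketch: the function
name, the thresholds, and the 500 ms ceiling are arbitrary assumptions,
not part of any proposal here), an external monitor could compute a delay
from the pg_xlog filesystem's fill level like this:

#include <sys/statvfs.h>

/*
 * Sketch: map how full the pg_xlog filesystem is to a suggested throttle
 * delay in milliseconds.  Thresholds and ceiling are arbitrary.
 */
static long
suggested_throttle_delay_ms(const char *pg_xlog_path)
{
	struct statvfs vfs;
	double		used;

	if (statvfs(pg_xlog_path, &vfs) != 0)
		return 0;				/* can't tell; don't throttle */

	used = 1.0 - (double) vfs.f_bavail / (double) vfs.f_blocks;

	if (used < 0.70)
		return 0;				/* plenty of space: no throttling */
	if (used > 0.95)
		used = 0.95;

	/* ramp linearly from 0 ms at 70% full up to 500 ms at 95% full */
	return (long) (500.0 * (used - 0.70) / 0.25);
}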

Has anyone else thought about approaches to mitigating the problems
that arise when an archive_command continually fails, and the DBA must
manually clean up the mess?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] local_preload_libraries logspam

2013-05-13 Thread Peter Geoghegan
On Tue, May 7, 2013 at 7:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It seems reasonable to me to reduce it to DEBUG1 level.

Attached patch renders all "loaded library ..." messages DEBUG1,
regardless of whether local_preload_libraries or
shared_preload_libraries is involved, and regardless of EXEC_BACKEND.
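
In other words, the change is essentially of this shape - demoting the
existing ereport calls from LOG to DEBUG1 - though the exact call sites
and message wording in the patch may differ slightly:

/* before: emitted at LOG level whenever a preload library is loaded */
ereport(LOG,
		(errmsg("loaded library \"%s\"", filename)));

/* after: only visible at DEBUG1 and above */
ereport(DEBUG1,
		(errmsg("loaded library \"%s\"", filename)));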

-- 
Peter Geoghegan


load_level.2013_05_13.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incomplete description of pg_start_backup?

2013-05-13 Thread Dmitry Koterov
Could you please provide a bit more detailed explanation of how it works?

And how could Postgres write to the middle of data files that are being
archived during an active pg_start_backup? If it could, there might be a
case where part of an archived data file contains overwritten information
from the future, while the WAL files contain only information like "I want
to write X to Z", not "I have overwritten the following X with Y at
position Z". Appending is much better here, because unexpected appended
data from the future can simply be ignored.

On Wednesday, May 1, 2013, Jeff Janes wrote:

 On Tue, Apr 30, 2013 at 3:24 PM, Dmitry Koterov dmi...@koterov.ru wrote:

 I think that at
 http://www.postgresql.org/docs/current/static/functions-admin.html and
 http://www.postgresql.org/docs/current/static/continuous-archiving.html two
 important points on how pg_start_backup() works are missing:

 1. After pg_start_backup() and till pg_stop_backup() VACUUM is denied
 (e.g. autovacuum is turned off), so the new data is always appended to data
 files, is never written at their middle.


 This is not the case.  Autovacuum continues to run during the backup.



 This allows to archive the data directory using any copying tools (rsync,
 tar, cp etc.). If you forget to call pg_stop_backup() by accident, data
 files will grow forever. So pg_start_backup() switches the database to
 append-only mode which is safe to backup without stopping (data files
 temporarily become append-only, WAL logs are append-only always).


 No, it doesn't work that way.  I don't know why appending would be any
 safer than normal updates would be anyway.  WAL replay fixes up any
 problems that might arise.


 2. After pg_start_backup() and till pg_stop_backup() full_page_writes is
 forced to be ON.


 Effectively yes, this is documented in one of your links above (and is one
 of the reasons vacuuming during the backup is not a problem)

 Cheers,

 Jeff




[HACKERS] Add more regression tests for ASYNC

2013-05-13 Thread Robins Tharakan
Hi,

Patch to add more regression tests to ASYNC (/src/backend/commands/async).
Takes code coverage from 39% to 75%.

Any and all feedback is obviously welcome.

P.S.: Please let me know whether these tests are useless or would not be
committed owing to unrelated reasons, and also whether these tests should
be clubbed together (bundled in groups of 2-3) and submitted as a single
patch.
--
Robins Tharakan


regress_async.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] erroneous restore into pg_catalog schema

2013-05-13 Thread Stephen Frost
* Marko Kreen (mark...@gmail.com) wrote:
 On Sat, May 04, 2013 at 10:57:44PM +0200, Dimitri Fontaine wrote:
  Other than adminpack, I know of PGQ installing their objects in
  pg_catalog. They only began doing that when switching to the CREATE
  EXTENSION facility. And they set relocatable to false.
 
 FYI - PgQ and related modules install no objects into pg_catalog.
 
 I used schema='pg_catalog' because I had trouble getting schema='pgq'
 to work.  I wanted 'pgq' schema to live and die with extension,
 and that was only way I got it to work on 9.1.

I've read through this thread and I think you're the only person here
that I actually agree with...  I like the idea of having a schema that
lives & dies with an extension.  imv, putting random objects (of ANY
kind) into pg_catalog is a bad idea.  Sure, it's convenient because it's
always in your search_path, but that, imv, means we should have a way to
say "these schemas are always in the search_path", not that we should
encourage people to dump crap into pg_catalog.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-13 Thread Marti Raudsepp
Hi list,

While testing out PostgreSQL 9.3beta1, I stumbled upon a problem
installing some extensions (pgTAP and semver among others):

% make
[...]
% make DESTDIR=/tmp/foo install
[...]
/usr/bin/install -c -m 644 ./sql/semver--unpackaged--0.2.1.sql
./sql/semver--0.2.4--0.3.0.sql ./sql/semver--0.2.1--0.2.4.sql
./sql/semver--0.3.0.sql ./sql/semver--0.3.0.sql
'/tmp/foo/usr/share/postgresql/extension/'
/usr/bin/install: will not overwrite just-created
‘/tmp/foo/usr/share/postgresql/extension/semver--0.3.0.sql’ with
‘./sql/semver--0.3.0.sql’
make: *** [install] Error 1

I traced the problem down to commit
9db7ccae2000524b72a4052352cbb5407fb53b02 Use system install program
when available and usable. It turns out that 'install' from coreutils
8.21 complains when it's told to install the same source file twice.

It's caused by this common pattern in extension makefiles:
DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql

The wildcard will match semver--0.3.0.sql too, in addition to the
direct reference, causing it to be duplicated (unless make install
is called from a clean directory).

The attached 1st patch fixes this case by putting $(sort ...) around
the DATA install command, which drops duplicate filenames. While there
are other install commands, the chances of them containing duplicate
names seem lower.

I'm not sure if this is worth worrying about, but there's still a
problem when DATA and DATA_built are separated, e.g:
DATA = $(wildcard sql/*--*.sql)
DATA_built = sql/$(EXTENSION)--$(EXTVERSION).sql

This can't be solved with a sort since DATA_built doesn't use
$(addprefix). But the install could be split into 2 separate commands
(2nd patch).

Regards,
Marti


1_pgxs_sort_data.patch
Description: Binary data


2_pgxs_separate_data_built.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-13 Thread Kyotaro HORIGUCHI
 In fact, this is the behavior with all the authentication methods that
 require a password. But, it is only in the case of PAM authentication
 that auth_failed() logs error when first connection attempt is made
 (without password), since the STATUS_EOF is not passed to it in that
 case.

Well, if we are allowed to use a bit ugly way, the attached patch
seems to cope with this issue. As far as I can see there's no
problem since pg_fe_sendauth() refuses to send an empty password.

Any suggestions?

 If we did not drop the connection (unlike what we do now) and
 re-attempted connection with the password added to conn, would the
 backend's authentication state still be waiting for the password? Can
 we do away without having to create a second connection?

Sorry, I read that incorrectly. I had misread the code after
sendAuthRequest in pam_passwd_conv_proc, but it is indeed used.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 3a041d9..304df03 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1816,6 +1816,7 @@ auth_peer(hbaPort *port)
  * PAM conversation function
  */
 
+static int pam_auth_eof = false;
 static int
 pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
                      struct pam_response ** resp, void *appdata_ptr)
@@ -1824,6 +1825,8 @@ pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
     struct pam_response *reply;
     int         i;
 
+    pam_auth_eof = false;
+
     if (appdata_ptr)
         passwd = (char *) appdata_ptr;
     else
@@ -1872,6 +1875,7 @@ pam_passwd_conv_proc(int num_msg, const struct pam_message ** msg,
                  * Client didn't want to send password.  We
                  * intentionally do not log anything about this.
                  */
+                pam_auth_eof = true;
                 goto fail;
             }
             if (strlen(passwd) == 0)
@@ -1986,6 +1990,16 @@ CheckPAMAuth(Port *port, char *user, char *password)
 
     retval = pam_authenticate(pamh, 0);
 
+    /*
+     * pam_auth_eof is true when the client disconnected at the password
+     * request in pam_passwd_conv_proc().
+     */
+    if (pam_auth_eof)
+    {
+        pam_passwd = NULL;      /* Unset pam_passwd */
+        return STATUS_EOF;
+    }
+
     if (retval != PAM_SUCCESS)
     {
         ereport(LOG,
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-13 Thread Jon Nelson
Pertinent to another thread titled
[HACKERS] corrupt pages detected by enabling checksums
I hope to explore the possibility of using fallocate (or
posix_fallocate) for new WAL file creation.

Most modern Linux filesystems support fast fallocate/posix_fallocate,
reducing extent fragmentation (where extents are used) and frequently
offering a pretty significant speed improvement. In my tests, using
posix_fallocate (followed by pg_fsync) is at least 28 times quicker
than using the current method (which writes zeroes followed by
pg_fsync).
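
For illustration, the preallocation step amounts to something like the
sketch below (the function name is made up; error handling, the GUC, and
the fallback to the existing zero-fill path are omitted):

#include <fcntl.h>

/*
 * Sketch: preallocate a new WAL segment with posix_fallocate() instead of
 * writing XLogSegSize bytes of zeroes.  Returns 0 on success or an errno
 * value on failure, in which case a caller could fall back to zero-filling.
 */
static int
preallocate_wal_segment(int fd, off_t seg_size)
{
	return posix_fallocate(fd, 0, seg_size);
}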

I have written up a patch to use posix_fallocate in new WAL file
creation, including configuration by way of a GUC variable, but I've
not contributed to the PostgreSQL project before. Therefore, I'm
fairly certain the patch is not formatted properly and does not conform
to the appropriate style guides. Currently, the patch is based on 9.2,
and is quite small - 3.6 KiB.

Advice on how to proceed is appreciated.

--
Jon


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-13 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 While testing out PostgreSQL 9.3beta1, I stumbled upon a problem
 installing some extensions (pgTAP and semver among others):
 ...
 I traced the problem down to commit
 9db7ccae2000524b72a4052352cbb5407fb53b02 Use system install program
 when available and usable. It turns out that 'install' from coreutils
 8.21 complains when it's told to install the same source file twice.

TBH, I thought that was a dangerous idea from the get-go.  My vote for
fixing this would be to revert that change, not try to hack all the
makefiles to work around it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logging of PAM Authentication Failure

2013-05-13 Thread Amit Langote
 Well, if we are allowed to use a bit ugly way, the attached patch
 seems to cope with this issue. As far as I can see there's no
 problem since pg_fe_sendauth() refuses to send an empty password.

 Any suggestions?

That seems to do the trick. This probably solves the problem that I
originally posted about.

 Sorry, I've read there incorrectly. I had understood the code
 after sendAuthRequest in pam_passwd_conv_proc but it is used
 indeed.

Though, I am still not sure why we drop the existing connection and
start all over again with the newly entered password. This seems to
leave the protocol state machine (as in PQconnectPoll()) halfway
through (after pg_fe_sendauth() fails) in the first connection attempt
for auth requests requiring a password (or others, too?). Although
sticking to this design may be due to problems with doing otherwise
that I am unaware of.


--
Amit Langote


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-13 Thread Peter Eisentraut
On Tue, 2013-05-14 at 04:12 +0300, Marti Raudsepp wrote:
 It's caused by this common pattern in extension makefiles:
 DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql
 
What is the point of this?  Why have the wildcard and then the
non-wildcard term?

I think using wildcard is bad style and you should fix it.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] commit fest schedule for 9.4

2013-05-13 Thread Peter Eisentraut
In the last two years, the first commit fest started in June, which is
about a month from now.  If we are going to do that again, we should
clarify that as soon as possible.  And if we are not, then we should
also clarify that, because some people are probably expecting that we
are.

So, any thoughts on the commit fest schedule, any need for deviations,
or same old?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest schedule for 9.4

2013-05-13 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 In the last two years, the first commit fest started in June, which is
 about a month from now.  If we are going to do that again, we should
 clarify that as soon as possible.  And if we are not, then we should
 also clarify that, because some people are probably expecting that we
 are.

 So, any thoughts on the commit fest schedule, any need for deviations,
 or same old?

Usually this is settled at the PGCon developers' meeting ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fallocate / posix_fallocate for new WAL file creation (etc...)

2013-05-13 Thread David Fetter
On Mon, May 13, 2013 at 08:54:39PM -0500, Jon Nelson wrote:
 Pertinent to another thread titled
 [HACKERS] corrupt pages detected by enabling checksums
 I hope to explore the possibility of using fallocate (or
 posix_fallocate) for new WAL file creation.
 
 Most modern Linux filesystems support fast fallocate/posix_fallocate,
 reducing extent fragmentation (where extents are used) and frequently
 offering a pretty significant speed improvement. In my tests, using
 posix_fallocate (followed by pg_fsync) is at least 28 times quicker
 than using the current method (which writes zeroes followed by
 pg_fsync).
 
 I have written up a patch to use posix_fallocate in new WAL file
 creation, including configuration by way of a GUC variable, but I've
 not contributed to the PostgreSQL project before. Therefore, I'm
 fairly certain the patch is not formatted properly and does not conform
 to the appropriate style guides. Currently, the patch is based on 9.2,
 and is quite small - 3.6 KiB.
 
 Advice on how to proceed is appreciated.

Thanks for hopping in!

Please re-base the patch vs. git master, as new features like this go
there.  Please also send along the tests you're doing so others can
riff.  Tests that find any weak points are also good.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-05-13 Thread Andrew Dunstan


On 05/13/2013 10:27 PM, Peter Eisentraut wrote:

On Tue, 2013-05-14 at 04:12 +0300, Marti Raudsepp wrote:

It's caused by this common pattern in extension makefiles:
DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql


What is the point of this?  Why have the wildcard and then the
non-wildcard term?

I think using wildcard is bad style and you should fix it.



I've been caught by it. I'm not sure why the wildcard is a bad idea -
don't we want any update sql files that are present to be installed? I
don't see any reason to have the redundant addition of the current
version's sql file, though.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] commit fest schedule for 9.4

2013-05-13 Thread Peter Eisentraut
On Mon, 2013-05-13 at 22:38 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  In the last two years, the first commit fest started in June, which is
  about a month from now.  If we are going to do that again, we should
  clarify that as soon as possible.  And if we are not, then we should
  also clarify that, because some people are probably expecting that we
  are.
 
  So, any thoughts on the commit fest schedule, any need for deviations,
  or same old?
 
 Usually this is settled at the PGCon developers' meeting ...

But it doesn't have to, does it?

If there is nothing to change then we might as well settle it here and
save the meeting for more interesting things.  But if someone wants to
argue for adjustments, it'd be better to collect arguments here rather
than argue all of it out in 15 mins or so without any preparation.

(The same goes for all agenda items in that meeting.  At one point there
was a guideline to discuss items on the mailing list before the
meeting.)




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better handling of archive_command problems

2013-05-13 Thread Daniel Farina
On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan p...@heroku.com wrote:
 Has anyone else thought about approaches to mitigating the problems
 that arise when an archive_command continually fails, and the DBA must
 manually clean up the mess?

Notably, the most common problem in this vein suffered at Heroku has
nothing to do with archive_command failing, and everything to do with
the ratio of block device write performance (hence, backlog) versus
the archiving performance.  When CPU is uncontended it's not a huge
deficit, but it is there and it causes quite a bit of stress.

Archive commands failing are definitely a special case there, where it
might be nice to bring write traffic to exactly zero for a time.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Sort

2013-05-13 Thread Michael Paquier
On Mon, May 13, 2013 at 11:28 PM, Noah Misch n...@leadboat.com wrote:

 * Identifying Parallel-Compatible Functions

 Not all functions can reasonably run on a worker backend.  We should not
 presume that a VOLATILE function can tolerate the unstable execution order
 imposed by parallelism, though a function like clock_timestamp() is
 perfectly
 reasonable to run that way.  STABLE does not have that problem, but neither
 does it constitute a promise that the function implementation is compatible
 with parallel execution.  Consider xid_age(), which would need code
 changes to
 operate correctly in parallel.  IMMUTABLE almost guarantees enough; there
 may
 come a day when all IMMUTABLE functions can be presumed parallel-safe.  For
 now, an IMMUTABLE function could cause trouble by starting a (read-only)
 subtransaction.  The bottom line is that parallel-compatibility needs to be
 separate from volatility classes for the time being.

I am not sure that this problem is limited to functions only; it extends
to all the expressions and clauses of queries that could be shipped and
evaluated on the worker backends when fetching tuples used to accelerate
a parallel sort. Imagine for example a LIMIT clause that worker backends
can use to limit the number of tuples to sort in the final result.
In some ways, Postgres-XC has faced (and is still facing) similar
challenges, and they have been partially solved there.

 I'm not sure what the specific answer here should look like.  Simply having a
 CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely satisfying, because
 the rules are liable to loosen over time.

Having a flag would be enough to control parallelism, but couldn't we
also determine whether the execution of a function can be shipped safely
to a worker based on its volatility alone? Immutable functions are
presumably safe, as they do not modify the database state and always
give the same result, while volatile and stable functions are definitely
not. For such reasons, it would be better to keep things simple and rely
on simple rules to determine whether a given expression can be executed
safely on a backend worker.
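
As a sketch of such a simple rule (the function name below is made up for
illustration, and this only covers plain function calls, not whole
expressions), the check could be based entirely on the existing volatility
classification in pg_proc:

#include "postgres.h"
#include "catalog/pg_proc.h"
#include "utils/lsyscache.h"

/*
 * Sketch: treat only IMMUTABLE functions as safe to ship to a worker
 * backend; STABLE and VOLATILE functions stay on the leader.
 */
static bool
function_is_shippable(Oid funcid)
{
	return func_volatile(funcid) == PROVOLATILE_IMMUTABLE;
}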
-- 
Michael