Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-27 Thread Kyotaro HORIGUCHI
Hello, This is a new version of the patch formerly known as
'alternative storage for libpq'.

- Changed the concept to 'Alternative Row Processor' from
  'Storage handler'. Symbol names are also changed.

- The callback function is modified following the comment.

- Due to time constraints, I did only minimal checking of this
  patch. The purpose of this patch is to show the new implementation.

- Performance is not measured for this patch for the same
  reason. I will do that next Monday.

- The meaning of PGresAttValue is changed. The field 'value' now
  contains a value withOUT a terminating zero. This change seems to
  have no effect on any other portion of the whole PostgreSQL source
  tree, from what I've seen.


  I would like to propose better one-shot API with:
  
  void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);
...
  1) Pass-through processing does not need to care about unnecessary
 per-row allocations.
  
  2) Handlers that want a copy of the row (like regular libpq)
 can optimize allocations by having a global view of the row.
 (E.g. one allocation for row header + data.)

I expect the new implementation is far better than the
original.
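
Here is a minimal sketch of how I expect a client to use this (not
tested; count_rows and run_counted are made-up names, and I assume
PQgetRowProcessorParam(res) returns the param given to
PQregisterRowProcessor(); per this version of the patch the callback
returns any non-NULL pointer on success and NULL on out-of-memory):

#include <libpq-fe.h>

static void *
count_rows(PGresult *res, PGresAttValue *columns)
{
	int *nrows = (int *) PQgetRowProcessorParam(res);

	(*nrows)++;				/* columns[i].value / columns[i].len hold the raw fields */
	return columns;			/* any non-NULL pointer: row consumed successfully */
}

static int
run_counted(PGconn *conn, const char *sql)
{
	int			nrows = 0;
	PGresult   *res;

	PQregisterRowProcessor(conn, count_rows, &nrows);
	res = PQexec(conn, sql);	/* rows pass through count_rows instead of being stored */
	PQclear(res);
	return nrows;
}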

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..c47af3a 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams  157
 PQping        158
 PQpingParams  159
 PQlibVersion  160
+PQregisterRowProcessor	  161
+PQgetRowProcessorParam	  163
+PQsetRowProcessorErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..93803d5 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->rowProcessor = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5077,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
+{
+	conn->rowProcessor = func;
+	conn->rowProcessorParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..5d78b39 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,7 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-
+static void *pqAddTuple(PGresult *res, PGresAttValue *columns);
 
 /* 
  * Space management for PGresult.
@@ -160,6 +160,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessor = pqAddTuple;
+	result->rowProcessorParam = NULL;
+	result->rowProcessorErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +197,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->rowProcessor)
+		{
+			result->rowProcessor = conn->rowProcessor;
+			result->rowProcessorParam = conn->rowProcessorParam;
+		}
 	}
 	else
 	{
@@ -445,7 +454,7 @@ PQsetvalue(PGresult *res, int tup_num, int field_num, char *value, int len)
 		}
 
 		/* add it to the array */
-		if (!pqAddTuple(res, tup))
+		if (pqAddTuple(res, tup) == NULL)
 			return FALSE;
 	}
 
@@ -701,7 +710,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +764,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -829,12 +836,17 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 
 /*
  * pqAddTuple
- *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns the pointer to the new tuple if OK, NULL if not enough
+ *	  memory to add the row.
  */
-int
-pqAddTuple(PGresult *res, PGresAttValue *tup)
+void *
+pqAddTuple(PGresult *res, PGresAttValue *columns)
 {
+	PGresAttValue *tup;
+	int nfields = res->numAttributes;
+	int i;
+
 	if (res->ntups >= res->tupArrSize)
 	{
 		/*
@@ -858,13 +870,39 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 			newTuples = (PGresAttValue **)
 				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
-			return FALSE;		/* malloc or realloc failed */
+			return NULL;		/* malloc or realloc failed */
 		

Re: [HACKERS] Inline Extension

2012-01-27 Thread Peter van Hardenberg
On Thu, Jan 26, 2012 at 3:48 PM, David E. Wheeler da...@justatheory.com wrote:
 On Jan 26, 2012, at 9:40 AM, Dimitri Fontaine wrote:

 Not for 9.2, but I can't help thinking that if we could manage to host
 the .so module itself in the catalogs, we could solve updating it in a
 transactional way and more importantly host it per-database, rather than
 having the modules work per major version (not even per cluster) and the
 extension mechanism work per-database inside each cluster. But that's
 work for another release.

 +1 Cloud vendors will *love* this.


Confirmed.

Let me share my perspective. I'll begin by describing the current
state of runtime code dependency management for comparison.

In Ruby, any user can push an application to our platform which relies
on any/every ruby gem ever released (give or take). These gems may
require exact releases of other gems, have elaborate system
dependencies, and/or natively compiled code components. This is thanks
to the rubygems.org repository, the gem system, and recently but
crucially, the bundler system for resolving and isolating
dependencies. Releasing a new gem takes moments and I have personally
released a half dozen of no real consequence to the world which I use
from time to time.

In contrast, the idea that any person or team of people could possibly
review the literally hundreds of gems released each day is no longer
plausible. The only feasible solution for providing a robust service
is to engineer a solution which can be operated from inside the
cluster to install any library whatsoever.

Our aim is, put simply, to be able to support every extension in the
world, at once, under cascading replication, across major catalogue
upgrades. We hope this ideal is shared by the community at large,
since our problems are generally the same as other users, just writ
large.

-pvh

PS: As an aside, because of the many problems with in-cluster
multi-tenancy (to pick just one example, resource isolation between
users) I have no security concerns with giving users every ability to
execute code as the cluster owner's UNIX user. On our service we do
not restrict our users' access to superuser out of spite, but to reduce
the available surface area for self-destruction.

-- 
Peter van Hardenberg
San Francisco, California
Everything was beautiful, and nothing hurt. -- Kurt Vonnegut



Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-27 Thread Dean Rasheed
On 23 January 2012 20:14, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 2012/1/23 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 3, 2012 at 2:49 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 jup, we can continue in enhancing step by step.

 I changed the patch and now ALTER TABLE, ALTER INDEX, ALTER SEQUENCE and
 ALTER VIEW have an IF EXISTS clause

 ALTER FOREIGN TABLE should be parallel as well, I think.


 refreshed + ALTER FOREIGN TABLE IF EXISTS ... support


I just noticed this copy-and-paste error in the ALTER FOREIGN TABLE docs:

IF EXISTS:

  Do not throw an error if the sequence does not exist. A notice is issued
  in this case.

That should be "foreign table", not "sequence".

Regards,
Dean



Re: [HACKERS] Inline Extension

2012-01-27 Thread Cédric Villemain
 So I'm going to prepare the next version of the patch with this design:

 - in catalog extension scripts for inline extension

   pg_extension_script(extoid, oldversion, version, script)

   oldversion is null when create extension is used,
   except when using the create extension from 'unpackaged' form

 Would you keep all the migration scripts used over time to upgrade from one 
 version to another?

Yes, that is the idea. You can then replay the whole stack of SQL needed
to go from version foo to version bar (again, we don't care about the
version format here).

 - same as we have -t, add -e --extension to pg_dump so that you can
   choose to dump only a given extension

 Also --exclude-extension?

It might be the default.
We need something to dump the content of
pg_catalog.pg_extension_script (or whatever table is going to contain
SQL code), per extension or all.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation



Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-27 Thread Heikki Linnakangas

On 27.01.2012 11:57, Dean Rasheed wrote:

I just noticed this copy-and-paste error in the ALTER FOREIGN TABLE docs:

IF EXISTS:

   Do not throw an error if the sequence does not exist. A notice is issued
   in this case.

That should be "foreign table", not "sequence".


Thanks, fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-27 Thread Robert Haas
On Thu, Jan 26, 2012 at 7:01 PM, Dan Scales sca...@vmware.com wrote:
 I'm not sure why you moved the checksum calculation (PageSetVerificationInfo) 
 to mdwrite() rather than smgrwrite().  If there were ever another storage 
 backend, it would have to duplicate the checksum check, right?  Is there a 
 disadvantage to putting it in smgrwrite()?

The smgr and md layers don't currently know anything about the page
format, and I imagine we want to keep it that way.  It seems like the
right place for this is in some higher layer, like bufmgr.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Group commit, revised

2012-01-27 Thread Heikki Linnakangas

On 26.01.2012 04:10, Robert Haas wrote:

On Wed, Jan 25, 2012 at 3:11 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Attached is a patch to do that. It adds a new mode to
LWLockConditionalAcquire(), LW_EXCLUSIVE_BUT_WAIT. If the lock is free, it
is acquired and the function returns immediately. However, unlike normal
LWLockConditionalAcquire(), if the lock is not free it goes to sleep until
it is released. But unlike normal LWLockAcquire(), when the lock is
released, the function returns *without* acquiring the lock.
 ...


I think you should break this off into a new function,
LWLockWaitUntilFree(), rather than treating it as a new LWLockMode.
Also, instead of adding lwWaitOnly, I would suggest that we generalize
lwWaiting and lwExclusive into a field lwWaitRequest, which can be set
to 1 for exclusive, 2 for shared, 3 for wait-for-free, etc.  That way
we don't have to add another boolean every time someone invents a new
type of wait - not that that should hopefully happen very often.  A
side benefit of this is that you can simplify the test in
LWLockRelease(): keep releasing waiters until you come to an exclusive
waiter, then stop.  This possibly saves one shared memory fetch and
subsequent test instruction, which might not be trivial - all of this
code is extremely hot.


Makes sense. Attached is a new version, doing exactly that.
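
The caller-side pattern for the new function then looks like this
(sketch only; work_already_done() and do_work() are placeholders, and
the real usage is the XLogFlush() hunk in the attached patch):

	for (;;)
	{
		if (work_already_done())
			break;
		if (LWLockWaitUntilFree(WALWriteLock, LW_EXCLUSIVE))
		{
			/* got the lock: recheck, then do the flush ourselves */
			if (!work_already_done())
				do_work();
			LWLockRelease(WALWriteLock);
			break;
		}
		/*
		 * The lock was released without us acquiring it: someone else may
		 * have done the work for us, so loop back and recheck.
		 */
	}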


 We probably want to benchmark this carefully
to make sure that it doesn't cause a performance regression even just
from this:

+   if (!proc->lwWaitOnly)
+       lock->releaseOK = false;

I know it sounds crazy, but I'm not 100% sure that that additional
test there is cheap enough not to matter.  Assuming it is, though, I
kind of like the concept: I think there must be other places where
these semantics are useful.


Yeah, we have to be careful with any overhead in there, it can be a hot 
spot. I wouldn't expect any measurable difference from the above, 
though. Could I ask you to rerun the pgbench tests you did recently with 
this patch? Or can you think of a better test for this?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
***
*** 327,333  MarkAsPreparing(TransactionId xid, const char *gid,
  	proc->databaseId = databaseid;
  	proc->roleId = owner;
  	proc->lwWaiting = false;
! 	proc->lwExclusive = false;
  	proc->lwWaitLink = NULL;
  	proc->waitLock = NULL;
  	proc->waitProcLock = NULL;
--- 327,333 
  	proc->databaseId = databaseid;
  	proc->roleId = owner;
  	proc->lwWaiting = false;
! 	proc->lwWaitMode = 0;
  	proc->lwWaitLink = NULL;
  	proc->waitLock = NULL;
  	proc->waitProcLock = NULL;
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 2118,2140  XLogFlush(XLogRecPtr record)
  	/* initialize to given target; may increase below */
  	WriteRqstPtr = record;
  
! 	/* read LogwrtResult and update local state */
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
  
  		SpinLockAcquire(xlogctl->info_lck);
  		if (XLByteLT(WriteRqstPtr, xlogctl->LogwrtRqst.Write))
  			WriteRqstPtr = xlogctl->LogwrtRqst.Write;
  		LogwrtResult = xlogctl->LogwrtResult;
  		SpinLockRelease(xlogctl->info_lck);
- 	}
  
! 	/* done already? */
! 	if (!XLByteLE(record, LogwrtResult.Flush))
! 	{
! 		/* now wait for the write lock */
! 		LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  		LogwrtResult = XLogCtl->Write.LogwrtResult;
  		if (!XLByteLE(record, LogwrtResult.Flush))
  		{
--- 2118,2160 
  	/* initialize to given target; may increase below */
  	WriteRqstPtr = record;
  
! 	/*
! 	 * Now wait until we get the write lock, or someone else does the
! 	 * flush for us.
! 	 */
! 	for (;;)
  	{
  		/* use volatile pointer to prevent code rearrangement */
  		volatile XLogCtlData *xlogctl = XLogCtl;
  
+ 		/* read LogwrtResult and update local state */
  		SpinLockAcquire(xlogctl->info_lck);
  		if (XLByteLT(WriteRqstPtr, xlogctl->LogwrtRqst.Write))
  			WriteRqstPtr = xlogctl->LogwrtRqst.Write;
  		LogwrtResult = xlogctl->LogwrtResult;
  		SpinLockRelease(xlogctl->info_lck);
  
! 		/* done already? */
! 		if (XLByteLE(record, LogwrtResult.Flush))
! 			break;
! 
! 		/*
! 		 * Try to get the write lock. If we can't get it immediately, wait
! 		 * until it's released, and recheck if we still need to do the flush
! 		 * or if the backend that held the lock did it for us already. This
! 		 * helps to maintain a good rate of group committing when the system
! 		 * is bottlenecked by the speed of fsyncing.
! 		 */
! 		if (!LWLockWaitUntilFree(WALWriteLock, LW_EXCLUSIVE))
! 		{
! 			/*
! 			 * The lock is now free, but we didn't acquire it yet. Before we
! 			 * do, loop back to check if someone else flushed the record for
! 			 * us already.
! 			 */
! 			continue;
! 		}
! 		/* Got the 

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-27 Thread Robert Haas
On Thu, Jan 26, 2012 at 11:36 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 I'm not surprised that you weren't able to measure a performance
 regression from the binary bloat.  Any such regression is bound to be
 very small and probably quite difficult to notice most of the time;
 it's really the cumulative effect of many binary-size-increasing
 changes we're worried about, not each individual one.  Certainly,
 trying to shrink the binary is micro-optimization at its finest, but
 then so is inlining comparators.  I don't think it can be
 realistically argued that increasing the size of the binary
 arbitrarily will never get us in trouble, much like (for a typical
 American family) spending $30 to have dinner at a cheap restaurant
 will never break the budget.  But if you do it every day, it gets
 expensive (and fattening).

 Sure. At the risk of stating the obvious, and of repeating myself, I
 will point out that the true cost of increasing the size of the binary
 is not necessarily linear - it's a complex equation. I hope that this
 doesn't sound flippant, but if some naive person were to look at just
 the increasing binary size of Postgres and its performance in each
 successive release, they might conclude that there was a positive
 correlation between the two (since we didn't add flab to the binary,
 but muscle that pulls its own weight and then some).

I completely agree.  So the point is that, when faced with a patch that
adds an atypically large number of CPU instructions, we ought to ask
ourselves whether those instructions are pulling their weight.  By way
of comparison, the latest posted version of Tom's generalized
index-only paths patch adds 14kB to the resulting executable (on my
Mac).  Yours adds 55kB.  We might ask ourselves whether the benefits
of this patch are four times greater than the benefits of Tom's patch.
 It's pretty hard to compare them directly since they're doing very
different things, and all of this is completely subjective, but I
doubt it.  On the other hand, there is no rule that every byte of code
that gets committed must be made of solid gold, either.  So, once
again: judgement call.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Group commit, revised

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 8:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Yeah, we have to be careful with any overhead in there, it can be a hot
 spot. I wouldn't expect any measurable difference from the above, though.
 Could I ask you to rerun the pgbench tests you did recently with this patch?
 Or can you think of a better test for this?

I can't do so immediately, because I'm waiting for Nate Boley to tell
me I can go ahead and start testing on that machine again.  But I will
do it once I get the word.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-27 Thread Peter Geoghegan
Uh, obviously I meant causal relationship and not correlation.

On 27 January 2012 13:37, Robert Haas robertmh...@gmail.com wrote:
 I completely agree.  So the point is that, when faced with a patch that
 adds an atypically large number of CPU instructions, we ought to ask
 ourselves whether those instructions are pulling their weight.  By way
 of comparison, the latest posted version of Tom's generalized
 index-only paths patch adds 14kB to the resulting executable (on my
 Mac).  Yours adds 55kB.  We might ask ourselves whether the benefits
 of this patch are four times greater than the benefits of Tom's patch.
  It's pretty hard to compare them directly since they're doing very
 different things, and all of this is completely subjective, but I
 doubt it.  On the other hand, there is no rule that every byte of code
 that gets committed must be made of solid gold, either.  So, once
 again: judgement call.

Well, I don't think it's all that subjective - it's more the case that
it is just difficult, or it gets that way as you consider more
specialisations.

As for what types/specialisations may not make the cut, I'm
increasingly convinced that floats (in the following order: float4,
float8) should be the first to go. Aside from the fact that we cannot
use their specialisations for anything like dates and timestamps,
floats are just way less useful than integers in the context of
database applications, or at least those that I've been involved with.
As important as floats are in the broad context of computing, it's
usually only acceptable to store data in a database as floats within
scientific applications, and only then when their limitations are
well-understood and acceptable. I think we've all heard anecdotes at
one time or another, involving their limitations not being well
understood.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Well, I don't think it's all that subjective - it's more the case that
 it is just difficult, or it gets that way as you consider more
 specialisations.

Sure it's subjective.  Two well-meaning people could have different
opinions without either of them being wrong.  If you do a lot of
small, in-memory sorts, more of this stuff is going to seem worthwhile
than if you don't.

 As for what types/specialisations may not make the cut, I'm
 increasingly convinced that floats (in the following order: float4,
 float8) should be the first to go. Aside from the fact that we cannot
 use their specialisations for anything like dates and timestamps,
 floats are just way less useful than integers in the context of
 database applications, or at least those that I've been involved with.
 As important as floats are in the broad context of computing, it's
 usually only acceptable to store data in a database as floats within
 scientific applications, and only then when their limitations are
 well-understood and acceptable. I think we've all heard anecdotes at
 one time or another, involving their limitations not being well
 understood.

While we're waiting for anyone else to weigh in with an opinion on the
right place to draw the line here, do you want to post an updated
patch with the changes previously discussed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Dry-run mode for pg_archivecleanup

2012-01-27 Thread Robert Haas
On Sun, Jan 15, 2012 at 5:05 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 On Sun, Jan 15, 2012 at 3:02 PM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:

 My actual intention was to have the filename as output of the command, in
 order to easily pipe it to another script. Hence my first choice was to
 use the stdout channel, considering also that pg_archivecleanup in dry-run
 mode is harmless and does not touch the content of the directory.

 Oh, right - I should have re-read your initial email before diving
 into the patch. That all makes sense given your intended purpose. I
 guess your goal of constructing some simple way to pass the files
 which would be removed on to another script is a little different than
 what I initially thought the patch would be useful for, namely as a
 testing/debugging aid for an admin.

 Perhaps both goals could be met by making use of '--debug' together
 with '--dry-run'. If they are both on, then an additional message like
 "pg_archivecleanup: would remove file ..." would be printed to
 stderr, along with just the filename printed to stdout you already
 have.
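
For concreteness, the combined behaviour Josh describes would amount to
something like this inside the removal loop (a sketch only; dryrun,
debug and WALFilePath are illustrative names, not taken from the actual
patch):

	if (dryrun)
	{
		printf("%s\n", WALFilePath);		/* stdout: easy to pipe to another script */
		if (debug)
			fprintf(stderr, "%s: would remove file \"%s\"\n",
					progname, WALFilePath);
		continue;							/* don't actually unlink anything */
	}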

This email thread seems to have trailed off without reaching a
conclusion.  The patch is marked as Waiting on Author in the
CommitFest application, but I'm not sure that's accurate.  Can we try
to nail this down?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Caching for stable expressions with constant arguments v6

2012-01-27 Thread Robert Haas
On Mon, Jan 16, 2012 at 12:06 PM, Marti Raudsepp ma...@juffo.org wrote:
 Here's v6 of my expression caching patch.

The patch is not attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-27 Thread Merlin Moncure
On Fri, Jan 27, 2012 at 2:57 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@oss.ntt.co.jp wrote:
 Hello, This is a new version of the patch formerly known as
 'alternative storage for libpq'.

I took a quick look at the patch and the docs.  Looks good; I agree
with the rationale and implementation.  I see you covered the PQsetvalue
case, which is nice.  I expect libpq C API clients coded for
performance will immediately gravitate to this API.

 - The meaning of PGresAttValue is changed. The field 'value' now
  contains a value withOUT terminating zero. This change seems to
  have no effect on any other portion within the whole source
  tree of postgresql from what I've seen.

This is a minor point of concern.  This function was exposed to
support libpqtypes (which your stuff complements very nicely by the
way) and I quickly confirmed removal of the null terminator didn't
cause any problems there.  I doubt anyone else is inspecting the
structure directly (also searched the archives and didn't find
anything).

This needs to be advertised very loudly in the docs -- I understand
why this was done but it's a pretty big change in the way the api
works.

merlin



Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-27 Thread Marko Kreen
On Fri, Jan 27, 2012 at 05:57:01PM +0900, Kyotaro HORIGUCHI wrote:
 Hello, This is a new version of the patch formerly known as
 'alternative storage for libpq'.
 
 - Changed the concept to 'Alternative Row Processor' from
   'Storage handler'. Symbol names are also changed.
 
 - The callback function is modified following the comment.
 
 - Due to time constraints, I did only minimal checking of this
   patch. The purpose of this patch is to show the new implementation.
 
 - Performance is not measured for this patch for the same
   reason. I will do that next Monday.
 
 - The meaning of PGresAttValue is changed. The field 'value' now
   contains a value withOUT a terminating zero. This change seems to
   have no effect on any other portion of the whole PostgreSQL source
   tree, from what I've seen.


I think we have the general structure in place.  Good.

Minor notes:

= rowhandler api =

* It returns bool, so void* is wrong.  Instead libpq style is to use int,
  with 1=OK, 0=Failure.  Seems that was also old pqAddTuple() convention.

* Drop PQgetRowProcessorParam(), instead give param as argument.

* PQsetRowProcessorErrMes() should strdup() the message.  That gets rid of
  allocator requirements in the API.  This also makes it safe to pass static
  strings there.  If strdup() fails, fall back to a generic no-mem message.

* Create a new struct to replace PGresAttValue for rowhandler usage.
  The RowHandler API is pretty unique and self-contained.  It should have
  its own struct.  The main reason is that it allows us to document it properly.
  Otherwise the minor details get lost as they are different from
  libpq-internal usage.  Also this allows the two structs to be
  improved separately.  (PGresRawValue?  A rough sketch follows these notes.)

* Stop storing null_value into ->value.  It's a libpq-internal detail.
  Instead, ->value should always point into the buffer where the value
  info is located, even for NULL.  This makes it safe to simply subtract
  pointers to get a row size estimate.  Seems pqAddTuple() already
  does the null_value logic, so there is no need to do it in the rowhandler api.
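
To make this concrete, roughly the shape I have in mind (PGresRawValue
and its field names are tentative, and how SQL NULLs are flagged is
left open here):

typedef struct
{
	char	   *value;		/* points into the network buffer, even for NULL */
	int			len;		/* length in bytes, no terminating zero */
} PGresRawValue;

/* param is passed directly; return 1 on success, or 0 after calling
 * PQsetRowProcessorErrMes(res, "message") on failure */
typedef int (*RowProcessor) (PGresult *res, const PGresRawValue *columns,
							 void *param);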

= libpq =

Currently it's confusing whether rowProcessor can be NULL, and what
should be done if so.  I think it's better to fix the usage so that
it is always set.

* PQregisterRowProcessor() should install the default handler if func==NULL
  (see the sketch after this list).
* Never set rowProcessor directly, always via PQregisterRowProcessor()
* Drop all if(rowProcessor) checks.
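
In other words, something along these lines (sketch only; pqAddTuple is
the built-in handler this patch already installs by default):

void
PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
{
	if (func == NULL)
	{
		func = pqAddTuple;		/* fall back to the regular storage handler */
		param = NULL;
	}
	conn->rowProcessor = func;
	conn->rowProcessorParam = param;
}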

= dblink =

* There are malloc failure checks missing in initStoreInfo() & storeHandler().


-- 
marko


PS.  You did not hear it from me, but most raw values are actually
nul-terminated in the protocol.  Think big-endian.  And those which
are not, you can make so, as the data is not touched anymore.
You cannot do it for the last value, as the next byte may not be allocated.
But you could memmove() it to a lower address so you can null-terminate.

I'm not suggesting it for the official patch, but it would be fun to know
whether such a hack is benchmarkable, and benchmarkable on a realistic load.




[HACKERS] Unreliable pg_ctl -w start again

2012-01-27 Thread MauMau

Hello,

Last year, I asked for your opinions about how to fix the bug of unreliable 
pg_ctl -w start, as in the thread:


http://archives.postgresql.org/pgsql-hackers/2011-05/msg01407.php

The phenomenon was that pg_ctl -w start did not return for 60 seconds when 
postgresql.conf contained a wrong parameter specification.


Recently, I've encountered another problem of pg_ctl -w start, which I 
cannot reliably avoid. I found the cause in pg_ctl.c. I'm willing to create 
a patch, but I'm concerned about the correctness of the fix. I would like this 
bug to be eliminated as soon as possible. I'd like to ask your opinions.



[Problem]
I use PostgreSQL 8.3.12 embedded in a packaged application. The problem 
occurred on RHEL5 when the operating system was starting up. The packaged 
application is started from /etc/init.d/myapp. That application internally 
executes pg_ctl -w start and checks its return value. The application does 
not start unless the return value is 0.


The problematic phenomenon is that pg_ctl -w start fails with return value 
1 in only two seconds without waiting until 60 seconds pass. That is, -w did 
not work. However, the database server started successfully.


The timeline was as follows:

18:09:45 the application executed pg_ctl -w start
18:09:47 pg_ctl -w start returned with 1

PostgreSQL's server log (dates are intentionally eliminated)
18:10:01 JST 22995  LOG:  database system was interrupted; last known up at 
2012-01-21 02:24:59 JST
18:10:32 JST 22995  LOG:  database system was not properly shut down; 
automatic recovery in progress

18:10:34 JST 22995  LOG:  record with zero length at 0/23E35D4
18:10:34 JST 22995  LOG:  redo is not required
18:11:38 JST 22893  LOG:  database system is ready to accept connections
18:11:38 JST 23478  LOG:  autovacuum launcher started

PostgreSQL took a long time to start. This is probably because the system 
load was high with many processes booting up concurrently during OS boot.



[Cause]
The following part in do_start() of pg_ctl.c contains a bug:

if (old_pid != 0)
{
 pg_usleep(1000000);
 pid = get_pgpid();
 if (pid == old_pid)
 {
  write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
  progname);
  exit(1);
 }
}

This part assumes that postmaster will overwrite postmaster.pid within a 
second. This assumption is not correct under heavy load like OS startup.


In PostgreSQL 9.1, the wait processing was largely rewritten. However, the 
same assumption seems to remain, though the duration is 5 seconds. Five 
seconds of waiting is probably insufficient in my case. I think no fixed 
duration is appropriate.



[Solution]
So, what is the reliable solution? The pipe-based one, which I proposed in 
the past thread, would be reliable. However, that is not simple enough to 
back-port to 8.3.


How about inserting postmaster_is_alive() as below? I know this is not 
perfect, but this will work in most cases. I need some solution that 
practically helps.


if (old_pid != 0)
{
 pg_usleep(1000000);
 pid = get_pgpid();
 if (pid == old_pid && postmaster_is_alive(pid))
 {
  write_stderr(_("%s: could not start server\n"
"Examine the log output.\n"),
  progname);
  exit(1);
 }
}


Regards
MauMau




Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-01-27 Thread Marko Kreen
On Fri, Jan 27, 2012 at 09:35:04AM -0600, Merlin Moncure wrote:
 On Fri, Jan 27, 2012 at 2:57 AM, Kyotaro HORIGUCHI
  - The meaning of PGresAttValue is changed. The field 'value' now
   contains a value withOUT a terminating zero. This change seems to
   have no effect on any other portion of the whole PostgreSQL source
   tree, from what I've seen.
 
 This is a minor point of concern.  This function was exposed to
 support libpqtypes (which your stuff complements very nicely by the
 way) and I quickly confirmed removal of the null terminator didn't
 cause any problems there.  I doubt anyone else is inspecting the
 structure directly (also searched the archives and didn't find
 anything).
 
 This needs to be advertised very loudly in the docs -- I understand
 why this was done but it's a pretty big change in the way the api
 works.

Note that the non-NUL-terminated PGresAttValue is only used for row
handler.  So no existing usage is affected.

But I agree that using the same struct in different situations is confusing,
hence the request for a separate struct for row-handler usage.

-- 
marko




Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Robert Haas
On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland j...@mcknight.de wrote:
 So this is the parallel pg_dump patch, generalizing the existing
 parallel restore and allowing parallel dumps for the directory archive
 format, the patch works on Windows and Unix.

This patch introduces a large amount of notational churn, as perhaps
well-illustrated by this hunk:

  static int
! dumpBlobs(Archive *AHX, void *arg)
  {
+   /*
+    * This is a data dumper routine, executed in a child for parallel backup,
+    * so it must not access the global g_conn but AH->connection instead.
+*/
+   ArchiveHandle *AH = (ArchiveHandle *) AHX;

It seems pretty grotty to have a static function that gets an argument
of the wrong type, and then just immediately turns around and casts it
to something else.  It's not exactly obvious that that's even safe,
especially if you don't know that ArchiveHandle is a struct whose
first element is an Archive.  But even if you do know that subclassing
is intended, that doesn't prove that the particular Archive object is
always going to be an ArchiveHandle under the hood.  If it is, why not
just pass it as an ArchiveHandle to begin with?  I think this needs to
be revised in some way.  At least in the few cases I checked, the only
point of getting at the ArchiveHandle this way was to find
AH-connection, which suggests to me either that AH-connection should
be in the public section, inside Archive rather than ArchiveHandle,
or else that we ought to just pass the connection object to this
function (and all of its friends who have similar issues) as a
separate argument.  Either way, I think that would make this patch
both cleaner and less-invasive.  In fact we might want to pull out
just that change and commit it separately to simplify review of the
remaining work.
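
To illustrate the second alternative, the dumper entry points could look
roughly like this (hypothetical signature and query, just for
illustration; it is not what the patch currently does):

static int
dumpBlobs(Archive *fout, PGconn *conn, void *arg)
{
	/* every query goes through the connection we were handed,
	 * never through the global g_conn */
	PGresult   *res = PQexec(conn, "SELECT oid FROM pg_largeobject_metadata");

	/* ... dump each large object as before ... */

	PQclear(res);
	return 1;
}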

It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
The way you have it, fmtQualifiedId() is now with fmtId(), but no
longer with fmtCopyColumnList(), the only other similarly named
function in that directory.  That may be more logical, or it may not,
but rearranging the code like this makes it a lot harder to review,
and I would prefer that we make such changes as separate commits if
we're going to do them, so that diff can do something sensible with
the changes that are integral to the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 10:57 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland j...@mcknight.de wrote:
 So this is the parallel pg_dump patch, generalizing the existing
 parallel restore and allowing parallel dumps for the directory archive
 format, the patch works on Windows and Unix.

 This patch introduces a large amount of notational churn, as perhaps
 well-illustrated by this hunk:

  static int
 ! dumpBlobs(Archive *AHX, void *arg)
  {
 +       /*
 +        * This is a data dumper routine, executed in a child for parallel backup,
 +        * so it must not access the global g_conn but AH->connection instead.
 +        */
 +       ArchiveHandle *AH = (ArchiveHandle *) AHX;

 It seems pretty grotty to have a static function that gets an argument
 of the wrong type, and then just immediately turns around and casts it
 to something else.  It's not exactly obvious that that's even safe,
 especially if you don't know that ArchiveHandle is a struct whose
 first element is an Archive.  But even if you do know that subclassing
 is intended, that doesn't prove that the particular Archive object is
 always going to be an ArchiveHandle under the hood.  If it is, why not
 just pass it as an ArchiveHandle to begin with?  I think this needs to
 be revised in some way.  At least in the few cases I checked, the only
 point of getting at the ArchiveHandle this way was to find
 AH->connection, which suggests to me either that AH->connection should
 be in the public section, inside Archive rather than ArchiveHandle,
 or else that we ought to just pass the connection object to this
 function (and all of its friends who have similar issues) as a
 separate argument.  Either way, I think that would make this patch
 both cleaner and less-invasive.  In fact we might want to pull out
 just that change and commit it separately to simplify review of the
 remaining work.

 It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
 The way you have it, fmtQualifiedId() is now with fmtId(), but no
 longer with fmtCopyColumnList(), the only other similarly named
 function in that directory.  That may be more logical, or it may not,
 but rearranging the code like this makes it a lot harder to review,
 and I would prefer that we make such changes as separate commits if
 we're going to do them, so that diff can do something sensible with
 the changes that are integral to the patch.

Woops, hit send a little too soon there.  I'll try to make some more
substantive comments after looking at this more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] TS: Limited cover density ranking

2012-01-27 Thread karavelov
Hello,

I have developed a variation of the cover density ranking functions that counts 
only covers that are shorter than a specified limit. It is useful for finding 
combinations of terms that appear near one another. Here is an example of 
usage:

-- normal cover density ranking : not changed
luben=> select ts_rank_cd(to_tsvector('a b c d e g h i j k'), 
to_tsquery('a & d'));
 ts_rank_cd 
------------
  0.033
(1 row)

-- limited to 2
luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k'), 
to_tsquery('a & d'));
 ts_rank_cd 
------------
  0
(1 row)

luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k a d'), 
to_tsquery('a & d'));
 ts_rank_cd 
------------
0.1
(1 row)

-- limited to 3
luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k'), 
to_tsquery('a & d'));
 ts_rank_cd 
------------
  0.033
(1 row)

 luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k a d'), 
to_tsquery('a & d'));
 ts_rank_cd 
------------
   0.13
(1 row)

Find attached a patch against 9.1.2 sources. I preferred to make a patch, not a 
separate extension, because it is only a one-statement change in the calc_rank_cd 
function. If I had to make an extension, a lot of code would be duplicated 
between backend/utils/adt/tsrank.c and the extension.

I have some questions:

1. Is it interesting to develop it further (documentation, cleanup, etc) for 
inclusion in one of the next versions? If this is the case, there are some 
further questions:

- should I overload ts_rank_cd (as in the examples above and the patch) or should I 
define a new set of functions, for example ts_rank_lcd?
- should I define these new SQL-level functions in core, or should I go 
only with this 2-line change in calc_rank_cd() and define the new functions as 
an extension? If we prefer the latter, could I overload core functions with 
functions defined in extensions?
- and finally there is always the possibility to duplicate the code and make an 
independent extension.

2. If I run the patched version on a cluster that was initialized with an unpatched 
server, is there a way to register the new functions in the system catalog 
without reinitializing the cluster?

Best regards
luben

--
Luben Karavelov

Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread karavelov
And here is the patch that I forgot to attach.
 Hello,
 
 I have developed a variation of the cover density ranking functions that counts 
 only covers that are shorter than a specified limit. It is useful for finding 
 combinations of terms that appear near one another. Here is an example of 
 usage:

...

 
 Find attached a patch against 9.1.2 sources. I preferred to make a patch, not a 
 separate extension, because it is only a one-statement change in the calc_rank_cd 
 function. If I had to make an extension, a lot of code would be duplicated 
 between backend/utils/adt/tsrank.c and the extension.
 
--
Luben Karavelov

diff -pur postgresql-9.1-9.1.2/src/backend/utils/adt/tsrank.c /usr/src/postgresql-9.1-9.1.2/src/backend/utils/adt/tsrank.c
--- postgresql-9.1-9.1.2/src/backend/utils/adt/tsrank.c	2011-12-01 23:47:20.0 +0200
+++ /usr/src/postgresql-9.1-9.1.2/src/backend/utils/adt/tsrank.c	2012-01-27 07:45:34.558028176 +0200
@@ -724,7 +724,7 @@ get_docrep(TSVector txt, QueryRepresenta
 }
 
 static float4
-calc_rank_cd(float4 *arrdata, TSVector txt, TSQuery query, int method)
+calc_rank_cd(int limit, float4 *arrdata, TSVector txt, TSQuery query, int method)
 {
 	DocRepresentation *doc;
 	int			len,
@@ -768,6 +768,9 @@ calc_rank_cd(float4 *arrdata, TSVector t
 		int			nNoise;
 		DocRepresentation *ptr = ext.begin;
 
+		if (limit > 0 && ext.end->pos - ext.begin->pos > limit) 
+			continue;
+
  		while (ptr <= ext.end)
  		{
  			InvSum += invws[ptr->wclass];
@@ -834,7 +837,7 @@ ts_rankcd_wttf(PG_FUNCTION_ARGS)
 	int			method = PG_GETARG_INT32(3);
 	float		res;
 
-	res = calc_rank_cd(getWeights(win), txt, query, method);
+	res = calc_rank_cd(0, getWeights(win), txt, query, method);
 
 	PG_FREE_IF_COPY(win, 0);
 	PG_FREE_IF_COPY(txt, 1);
@@ -850,7 +853,7 @@ ts_rankcd_wtt(PG_FUNCTION_ARGS)
 	TSQuery		query = PG_GETARG_TSQUERY(2);
 	float		res;
 
-	res = calc_rank_cd(getWeights(win), txt, query, DEF_NORM_METHOD);
+	res = calc_rank_cd(0, getWeights(win), txt, query, DEF_NORM_METHOD);
 
 	PG_FREE_IF_COPY(win, 0);
 	PG_FREE_IF_COPY(txt, 1);
@@ -866,7 +869,7 @@ ts_rankcd_ttf(PG_FUNCTION_ARGS)
 	int			method = PG_GETARG_INT32(2);
 	float		res;
 
-	res = calc_rank_cd(getWeights(NULL), txt, query, method);
+	res = calc_rank_cd(0, getWeights(NULL), txt, query, method);
 
 	PG_FREE_IF_COPY(txt, 0);
 	PG_FREE_IF_COPY(query, 1);
@@ -880,9 +883,75 @@ ts_rankcd_tt(PG_FUNCTION_ARGS)
 	TSQuery		query = PG_GETARG_TSQUERY(1);
 	float		res;
 
-	res = calc_rank_cd(getWeights(NULL), txt, query, DEF_NORM_METHOD);
+	res = calc_rank_cd(0, getWeights(NULL), txt, query, DEF_NORM_METHOD);
 
 	PG_FREE_IF_COPY(txt, 0);
 	PG_FREE_IF_COPY(query, 1);
 	PG_RETURN_FLOAT4(res);
 }
+
+Datum
+ts_rankcd_lwttf(PG_FUNCTION_ARGS)
+{
+int limit = PG_GETARG_INT32(0);
+	ArrayType  *win = (ArrayType *) PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
+	TSVector	txt = PG_GETARG_TSVECTOR(2);
+	TSQuery		query = PG_GETARG_TSQUERY(3);
+	int			method = PG_GETARG_INT32(4);
+	float		res;
+
+	res = calc_rank_cd(limit, getWeights(win), txt, query, method);
+
+	PG_FREE_IF_COPY(win, 1);
+	PG_FREE_IF_COPY(txt, 2);
+	PG_FREE_IF_COPY(query, 3);
+	PG_RETURN_FLOAT4(res);
+}
+
+Datum
+ts_rankcd_lwtt(PG_FUNCTION_ARGS)
+{
+int limit = PG_GETARG_INT32(0);
+	ArrayType  *win = (ArrayType *) PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
+	TSVector	txt = PG_GETARG_TSVECTOR(2);
+	TSQuery		query = PG_GETARG_TSQUERY(3);
+	float		res;
+
+	res = calc_rank_cd(limit, getWeights(win), txt, query, DEF_NORM_METHOD);
+
+	PG_FREE_IF_COPY(win, 1);
+	PG_FREE_IF_COPY(txt, 2);
+	PG_FREE_IF_COPY(query, 3);
+	PG_RETURN_FLOAT4(res);
+}
+
+Datum
+ts_rankcd_lttf(PG_FUNCTION_ARGS)
+{
+int limit = PG_GETARG_INT32(0);
+	TSVector	txt = PG_GETARG_TSVECTOR(1);
+	TSQuery		query = PG_GETARG_TSQUERY(2);
+	int			method = PG_GETARG_INT32(3);
+	float		res;
+
+	res = calc_rank_cd(limit, getWeights(NULL), txt, query, method);
+
+	PG_FREE_IF_COPY(txt, 1);
+	PG_FREE_IF_COPY(query, 2);
+	PG_RETURN_FLOAT4(res);
+}
+
+Datum
+ts_rankcd_ltt(PG_FUNCTION_ARGS)
+{
+int limit = PG_GETARG_INT32(0);
+	TSVector	txt = PG_GETARG_TSVECTOR(1);
+	TSQuery		query = PG_GETARG_TSQUERY(2);
+	float		res;
+
+	res = calc_rank_cd(limit, getWeights(NULL), txt, query, DEF_NORM_METHOD);
+
+	PG_FREE_IF_COPY(txt, 1);
+	PG_FREE_IF_COPY(query, 2);
+	PG_RETURN_FLOAT4(res);
+}
diff -pur postgresql-9.1-9.1.2/src/include/catalog/pg_proc.h /usr/src/postgresql-9.1-9.1.2/src/include/catalog/pg_proc.h
--- postgresql-9.1-9.1.2/src/include/catalog/pg_proc.h	2011-12-01 23:47:20.0 +0200
+++ /usr/src/postgresql-9.1-9.1.2/src/include/catalog/pg_proc.h	2012-01-27 05:45:53.944979678 +0200
@@ -4159,6 +4159,15 @@ DATA(insert OID = 3709 (  ts_rank_cd	PGN
 DESCR(relevance);
 DATA(insert OID = 3710 (  ts_rank_cd	PGNSP PGUID 12 1 0 0 f f f t f i 2 0 700 3614 3615 _null_ _null_ _null_ _null_ ts_rankcd_tt _null_ _null_ _null_ ));
 DESCR(relevance);
+DATA(insert OID = 3675 (  ts_rank_cd	PGNSP PGUID 12 1 0 0 f f f t f i 5 0 700 23 1021 3614 

Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 10:58 AM, Robert Haas robertmh...@gmail.com wrote:
 It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
 The way you have it, fmtQualifiedId() is now with fmtId(), but no
 longer with fmtCopyColumnList(), the only other similarly named
 function in that directory.  That may be more logical, or it may not,
 but rearranging the code like this makes it a lot harder to review,
 and I would prefer that we make such changes as separate commits if
 we're going to do them, so that diff can do something sensible with
 the changes that are integral to the patch.

 Woops, hit send a little too soon there.  I'll try to make some more
 substantive comments after looking at this more.

And maybe retract some of the bogus ones, like the above: I see why
you moved this, now - parallel.c needs it.

Still looking...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread Sushant Sinha
The rank counts 1/coversize. So bigger covers will not have much impact
anyway. What is the need of the patch?

-Sushant.

On Fri, 2012-01-27 at 18:06 +0200, karave...@mail.bg wrote:
 Hello, 
 
 I have developed a variation of cover density ranking functions that
 counts only covers that are lesser than a specified limit. It is
 useful for finding combinations of terms that appear nearby one
 another. Here is an example of usage: 
 
 -- normal cover density ranking : not changed 
 luben=> select ts_rank_cd(to_tsvector('a b c d e g h i j k'),
 to_tsquery('a & d')); 
 ts_rank_cd 
  
 0.033 
 (1 row) 
 
 -- limited to 2 
 luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k'),
 to_tsquery('a & d')); 
 ts_rank_cd 
  
 0 
 (1 row) 
 
 luben=> select ts_rank_cd(2, to_tsvector('a b c d e g h i j k a d'),
 to_tsquery('a & d')); 
 ts_rank_cd 
  
 0.1 
 (1 row) 
 
 -- limited to 3 
 luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k'),
 to_tsquery('a & d')); 
 ts_rank_cd 
  
 0.033 
 (1 row) 
 
 luben=> select ts_rank_cd(3, to_tsvector('a b c d e g h i j k a d'),
 to_tsquery('a & d')); 
 ts_rank_cd 
  
 0.13 
 (1 row) 
 
 Find attached a patch against 9.1.2 sources. I preferred to make a
 patch, not a separate extension, because it is only a one-statement change
 in the calc_rank_cd function. If I had to make an extension, a lot of code
 would be duplicated between backend/utils/adt/tsrank.c and the
 extension. 
 
 I have some questions: 
 
 1. Is it interesting to develop it further (documentation, cleanup,
 etc) for inclusion in one of the next versions? If this is the case,
 there are some further questions: 
 
 - should I overload ts_rank_cd (as in examples above and the patch) or
 should I define new set of functions, for example ts_rank_lcd ? 
 - should I define these new SQL-level functions in core or
 should I go only with this 2-line change in calc_rank_cd() and define
 the new functions as an extension? If we prefer the later, could I
 overload core functions with functions defined in extensions? 
 - and finally there is always the possibility to duplicate the code
 and make an independent extension. 
 
 2. If I run the patched version on cluster that was initialized with
 unpatched server, is there a way to register the new functions in the
 system catalog without reinitializing the cluster? 
 
 Best regards 
 luben 
 
 -- 
 Luben Karavelov





Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Robert Haas
On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland j...@mcknight.de wrote:
 So this is the parallel pg_dump patch, generalizing the existing
 parallel restore and allowing parallel dumps for the directory archive
 format, the patch works on Windows and Unix.

It seems a little unfortunate that we are using threads here on
Windows and processes on Linux.  Maybe it's too late to revisit that
decision, but it seems like a recipe for subtle bugs.

 In parallel restore, the master closes its own connection to the
 database before forking off worker processes, just as it does now. In
 parallel dump, however, we need to hold the master's connection open so
 that we can detect deadlocks. The issue is that somebody could have
 requested an exclusive lock after the master has initially requested a
 shared lock on all tables. Therefore, the worker process also requests
 a shared lock on the table with NOWAIT and if this fails, we know that
 there is a conflicting lock in between and that we need to abort the
 dump.

I think this is an acceptable limitation, but the window where it can
happen seems awfully wide right now.  As things stand, it seems like
we don't try to lock the table in the child until we're about to
access it, which means that, on a large database, we could dump out
99% of the database and then be forced to abort the dump because of a
conflicting lock on the very last table.  We could fix that by having
every child lock every table right at the beginning, so that all
possible failures of this type would happen before we do any work, but
that will eat up a lot of lock table space.  It would be nice if the
children could somehow piggyback on the parent's locks, but I don't
see any obvious way to make that work.  Maybe we just have to live
with it the way it is, but I worry that people whose dumps fail 10
hours into a 12 hour parallel dump are going to be grumpy.

 The connections of the parallel dump use the synchronized snapshot
 feature. However there's also an option --no-synchronized-snapshots
 which can be used to dump from an older PostgreSQL version.

Right now, you have it set up so that --no-synchronized-snapshots is
ignored even if synchronized snapshots are supported, which doesn't
make much sense to me.  I think you should allow
--no-synchronized-snapshots with any release, and error out if it's
not specified and the version is too old work without it.  It's
probably not a good idea to run with --no-synchronized-snapshots ever,
and doubly so if they're not available, but I'd rather leave that
decision to the user.  (Imagine, for example, that we discovered a bug
in our synchronized snapshot implementation.)

I am tempted to advocate calling the flag --unsynchronized-snapshots,
because to me that underscores the danger a little more clearly, but
perhaps that is too clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread karavelov
- Quote from Sushant Sinha (sushant...@gmail.com), on 27.01.2012 at 18:32 -

 The rank counts 1/coversize. So bigger covers will not have much impact
 anyway. What is the need of the patch?
 
 -Sushant.


If you want to find only combinations of words that are close to one another, 
with the patch you could use something like:

WITH a AS (SELECT to_tsvector('a b c d e g h i j k') AS vec, to_tsquery('a & d') 
AS query) 
SELECT * FROM a WHERE vec @@ query AND ts_rank_cd(3,vec,query)>0;

I could not find another way to make this type of query. If there is an 
alternative, I am open to suggestions.

Best regards
--
Luben Karavelov

Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Heikki Linnakangas

On 27.01.2012 18:46, Robert Haas wrote:

On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wielandj...@mcknight.de  wrote:

In parallel restore, the master closes its own connection to the
database before forking off worker processes, just as it does now. In
parallel dump, however, we need to hold the master's connection open so
that we can detect deadlocks. The issue is that somebody could have
requested an exclusive lock after the master has initially requested a
shared lock on all tables. Therefore, the worker process also requests
a shared lock on the table with NOWAIT and if this fails, we know that
there is a conflicting lock in between and that we need to abort the
dump.


I think this is an acceptable limitation, but the window where it can
happen seems awfully wide right now.  As things stand, it seems like
we don't try to lock the table in the child until we're about to
access it, which means that, on a large database, we could dump out
99% of the database and then be forced to abort the dump because of a
conflicting lock on the very last table.  We could fix that by having
every child lock every table right at the beginning, so that all
possible failures of this type would happen before we do any work, but
that will eat up a lot of lock table space.  It would be nice if the
children could somehow piggyback on the parent's locks, but I don't
see any obvious way to make that work.  Maybe we just have to live
with it the way it is, but I worry that people whose dumps fail 10
hours into a 12 hour parallel dump are going to be grumpy.


If the master process keeps the locks it acquires in the beginning, you 
could fall back to dumping those tables where the child lock fails using 
the master connection.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TS: Limited cover density ranking

2012-01-27 Thread karavelov
- Quote from karave...@mail.bg, on 27.01.2012 at 18:48 -

 - Quote from Sushant Sinha (sushant...@gmail.com), on 27.01.2012 at 18:32 -
 
 The rank counts 1/coversize. So bigger covers will not have much impact
 anyway. What is the need of the patch?
 
 -Sushant.

 
 If you want to find only combinations of words that are close to one another, 
 with the patch you could use something like:
 
 WITH a AS (SELECT to_tsvector('a b c d e g h i j k') AS vec, 
 to_tsquery('a&d') AS query) 
 SELECT * FROM a WHERE vec @@ query AND ts_rank_cd(3,vec,query) > 0;
 

Another example, if you want to match 'b c d' only, you could use:

WITH A AS (SELECT to_tsvector('a b c d e g h i j k') AS vec, 
to_tsquery('b&c&d') AS query) 
SELECT * FROM A WHERE vec @@ query AND ts_rank_cd(2,vec,query) > 0;

The catch is that it will also match 'b d c', 'd c b', 'd b c', 'c d b' and 
'c b d', so it is not a
replacement for exact phrase matching, but it is something I find useful

--
Luben Karavelov

Re: [HACKERS] pg_statistic, lack of documentation

2012-01-27 Thread Robert Haas
On Sat, Jan 14, 2012 at 7:34 AM, Sergey Konoplev gray...@gmail.com wrote:
 Hi,

 http://www.postgresql.org/docs/9.1/interactive/catalog-pg-statistic.html

 It specifies that entries are created by ANALYZE, but does not mention
 that if a table is empty the entry for it is not created.

The actual behavior, in somewhat more detail, is that the existing
statistics entries, if any, are retained.

I've added a note to that effect to the documentation for ANALYZE,
which seems like a more appropriate place than the pg_statistic
documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for parallel pg_dump

2012-01-27 Thread Robert Haas
On Fri, Jan 27, 2012 at 11:53 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 If the master process keeps the locks it acquires in the beginning, you
 could fall back to dumping those tables where the child lock fails using the
 master connection.

Hmm, that's a thought.

Another idea I just had is to allow a transaction that has imported a
snapshot to jump the queue when attempting to acquire a lock that the
backend from which the snapshot was imported already holds.  We don't
want to allow queue-jumping in general because there are numerous
places in the code where we rely on the current behavior to avoid
starving strong lockers, but it seems like it might be reasonable to
allow it in this special case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-01-27 Thread Robert Haas
On Wed, Jan 11, 2012 at 6:43 AM, Ashutosh Bapat
ashutosh.ba...@enterprisedb.com wrote:
 Hi,
 After running regression, I ran EXPLAIN on one of the queries in regression
 (test create_misc) and got following output
 regression=# explain verbose select * into table ramp from road where name ~
 '.*Ramp';
  QUERY
 PLAN
 
  Result  (cost=0.00..154.00 rows=841 width=67)
    Output: public.road.name, public.road.thepath
    ->  Append  (cost=0.00..154.00 rows=841 width=67)
  ->  Seq Scan on public.road  (cost=0.00..135.05 rows=418 width=67)
    Output: public.road.name, public.road.thepath
    Filter: (public.road.name ~ '.*Ramp'::text)
  ->  Seq Scan on public.ihighway road  (cost=0.00..14.99 rows=367
 width=67)
     ^
    Output: public.road.name, public.road.thepath
    ^^,   ^^
    Filter: (public.road.name ~ '.*Ramp'::text)
  ^^^
  ->  Seq Scan on public.shighway road  (cost=0.00..3.96 rows=56
 width=67)
    Output: public.road.name, public.road.thepath
    Filter: (public.road.name ~ '.*Ramp'::text)
 (12 rows)

 regression=# \d+ road
     Table public.road
  Column  | Type | Modifiers | Storage  | Stats target | Description
 -+--+---+--+--+-
  name    | text |   | extended |  |
  thepath | path |   | extended |  |
 Indexes:
     rix btree (name)
 Child tables: ihighway,
   shighway
 Has OIDs: no

 Table road has children ihighway and shighway as seen in the \d+
 output above. The EXPLAIN output of Seq Scan node on children has
 public.road as prefix for variables. public.road could imply the parent
 table road and thus can cause confusion, as to what's been referreed, the
 columns of parent table or child table. In the EXPLAIN output children
 tables have road as alias (as against public.road). The alias comes from
 RangeTblEntry->eref->aliasname. It might be better to have road as prefix
 in the variable names over public.road.

It's a feature, not a bug, that we schema-qualify names when VERBOSE
is specified.  That was done on purpose for the benefit of external
tools that might need this information to disambiguate which object is
being referenced.

Table *aliases*, of course, should not be schema-qualified, but I
don't think that's what we're doing.  You could make it more clear by
including an alias in the query, like this:

explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get_fn_expr_argtype() vs. internal calls

2012-01-27 Thread Robert Haas
On Thu, Dec 29, 2011 at 4:17 PM, Noah Misch n...@leadboat.com wrote:
 We document that a polymorphic C-language function may identify the concrete
 data type of each argument using calls to get_fn_expr_argtype().  That relies
 on FmgrInfo.fn_expr, which only the executor sets.  Calls of internal origin,
 by way of {Direct,,Oid}FunctionCall*(), don't cons up an fn_expr, so
 get_fn_expr_argtype() just returns InvalidOid every time.  (Indeed, we
 couldn't easily do better in many cases.)  To what extent is it safe to rely
 on this situation remaining as it is?
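
For illustration, a hypothetical polymorphic C-language function along these
lines sees the difference directly (a sketch, not code from the tree):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(poly_argtype_demo);

/*
 * Sketch only: report the concrete type of argument 0.  Called from SQL, the
 * executor has set fn_expr and we get a real type OID; called internally via
 * DirectFunctionCall*(), there is no fn_expr and we see InvalidOid.
 */
Datum
poly_argtype_demo(PG_FUNCTION_ARGS)
{
    Oid     argtype = get_fn_expr_argtype(fcinfo->flinfo, 0);

    if (!OidIsValid(argtype))
        elog(ERROR, "could not determine data type of argument");

    PG_RETURN_OID(argtype);
}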

 I ask on account of some second thoughts I had about CheckIndexCompatible().
 When writing it, I did not explicitly consider operator classes having
 polymorphic opcintype.  If get_fn_expr_argtype() were to work in a function
 called from the btree search code, CheckIndexCompatible() should impose
 stricter checks on indexes having opclasses of polymorphic opcintype.  If
 that's not too likely to happen, I might just add a comment instead.

[ found this while reviewing emails I hadn't read carefully enough
when they were first posted ]

I think the way you actually chose to tighten this up is probably
safer, and it seems to have minimal downside.

Thanks for pursuing this despite the lack of response to your initial query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unreliable pg_ctl -w start again

2012-01-27 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 In PostgreSQL 9.1, the wait processing is largely modified. However, the 
 same assumption seems to still remain, though the duration is 5 seconds. 5 
 seconds of wait is probably insufficient for my case. I think no fixed 
 duration is appropriate.

Well, feel free to increase that duration if you want.  The reason it's
there is to not wait for a long time if the postmaster falls over
instantly at startup, but in a non-interactive situation you might not
care.

 How about inserting postmaster_is_alive() as below?

Looks like complete nonsense to me, if the goal is to behave sanely when
postmaster.pid hasn't been created yet.  Where do you think get_pgpid
gets the PID from?

If you want to do something useful about this, the correct hint is
buried in start_postmaster():

/*
 * Since there might be quotes to handle here, it is easier simply to pass
 * everything to a shell to process them.
 *
 * XXX it would be better to fork and exec so that we would know the child
 * postmaster's PID directly; then test_postmaster_connection could use
 * the PID without having to rely on reading it back from the pidfile.
 */

If we had the postmaster's PID a priori, we could detect postmaster
death directly instead of having to make assumptions about how long
is reasonable to wait for the pidfile to appear.  The problem is that
we don't want to write a complete replacement for the shell's command
line parser and I/O redirection logic.  It doesn't look like a small
project.

(But maybe we could bypass that by doing a fork() and then having
the child exec() the shell, telling it to exec postmaster in turn?)
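
A rough sketch of that idea (error handling and the existing log-redirection
details omitted; cmd is the shell command string start_postmaster already
builds, assumed here to begin with "exec" so the shell is replaced by the
postmaster):

#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch only: fork, have the child exec a shell that in turn execs the
 * postmaster, and return the child's PID to the caller.  Because the command
 * begins with "exec", that PID is the postmaster's own.
 */
static pid_t
start_postmaster_sketch(const char *cmd)
{
    pid_t   pid = fork();

    if (pid == 0)                       /* child */
    {
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);                     /* exec of the shell failed */
    }
    return pid;                         /* parent: -1 on fork failure */
}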

And of course Windows as usual makes things twice as hard, since we
couldn't make such a change unless start_postmaster could return the
proper PID in that case too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Intermittent regression test failures from index-only plan changes

2012-01-27 Thread Robert Haas
On Sat, Jan 7, 2012 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I feel like this is a trick question, but I'll ask anyway: Can't we
 just ignore ANALYZE?

 AFAICS, no.  ANALYZE will run user-defined code: not only user-supplied
 stats collection functions, but user-defined index expressions.  We
 cannot assume that none of that ever requires a snapshot.

The question is: Why would it matter if we expunged tuples from table
A while ANALYZE was running on table B?   I guess the problem is that
the index on B might involve a user-defined function which (under the
covers) peeks at table A, possibly now seeing an inconsistent view of
the database.

It's pretty unfortunate to have to cater to that situation, though,
because most of the time an ANALYZE on table A is only going to look
at table A and the system catalogs.  In fact, it wouldn't even be
disastrous (in most cases) if we removed tuples from the table being
analyzed - we're engaged in an inherently statistical process anyway,
so who really cares if things change on us in medias res?

Could we easily detect the cases where user code is being run and
ignore ANALYZE when none is?

A probably crazy idea is to add an option to vacuum that would cause
it, upon discovering that it can't set PD_ALL_VISIBLE on a page
because the global xmin is too old, to wait for all of the virtual
transaction IDs who might not be able to see every tuple on the page.
This would allow us to get into a state where all the PD_ALL_VISIBLE
bits are known to be set.  But that seems a bit complex for something
that we probably don't care about much outside of the regression
tests.

If none of the above is feasible (and I suspect it isn't), we might
just want to tweak the queries to do something that will preclude
using an index-only scan, like including tableoid::regclass in the
target list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rewriteheap.c bug: toast rows don't get XIDs matching their parents

2012-01-27 Thread Robert Haas
On Thu, Jan 12, 2012 at 4:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 While working on bug #6393 I was reminded of the truth of $SUBJECT: any
 rows inserted into the new toast table will have the xmin of the CLUSTER
 or VACUUM FULL operation, and invalid xmax, whereas their parent heap
 rows will have xmin/xmax copied from the previous instance of the table.
 This does not matter much for ordinary live heap rows, but it's also
 necessary for CLUSTER/VACUUM FULL to copy recently-dead,
 insert-in-progress, and delete-in-progress rows.  In such cases, a later
 plain VACUUM might reap the parent heap rows and not the toast rows,
 leading to a storage leak that won't be recovered short of another
 CLUSTER/VACUUM FULL.

 I can't remember if we discussed this risk when the heap rewrite code
 was written.  I'm not sure it's worth fixing, but at the least it ought
 to be documented in the comments in rewriteheap.c.

People run CLUSTER and VACUUM FULL to recover wasted storage, so it's
a bit unfortunate if those operations can themselves introduce a
storage leak.  So I think it would be nice to fix this, but if that's
more than we can manage right now, then I agree we should at least add
a code comment so that it has a better chance of getting fixed later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_statistic, lack of documentation

2012-01-27 Thread Sergey Konoplev
On Fri, Jan 27, 2012 at 9:14 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 14, 2012 at 7:34 AM, Sergey Konoplev gray...@gmail.com wrote:
 I've added a note to that effect to the documentation for ANALYZE,
 which seems like a more appropriate place than the pg_statistic
 documentation.

Thank you.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multithread Query Planner

2012-01-27 Thread Pierre C



Not to mention palloc, another extremely fundamental and non-reentrant
subsystem.

Possibly we could work on making all that stuff re-entrant, but it would
be a huge amount of work for a distant and uncertain payoff.



Right.  I think it makes more sense to try to get parallelism working
first with the infrastructure we have.  Converting to use threading,
if we ever do it at all, should be something we view as a later
performance optimization.  But I suspect we won't want to do it
anyway; I think there will be easier ways to get where we want to be.


Multithreading got fashionable with the arrival of the Dual-core CPU a few  
years ago. However, multithreading as it is used currently has a huge  
problem : usually, threads share all of their memory. This opens the door  
to an infinite number of hard to find bugs, and more importantly, defeats  
the purpose.


Re-entrant palloc() is nonsense. Suppose you can make a reentrant  
palloc() which scales OK at 2 threads thanks to a cleverly placed atomic  
instruction. How is it going to scale on 64 cores ? On HP's new 1000-core  
ARM server with non-uniform memory access ? Probably it would suck very  
very badly... not to mention the horror of multithreaded exception-safe  
deallocation when 1 thread among many blows up on an error...


For the ultimate in parallelism, ask a FPGA guy. Is he using shared memory  
to wire together his 12000 DSP blocks ? Nope, he's using isolated  
Processes which share nothing and communicate through FIFOs and hardware  
message passing. Like shell pipes, basically. Or Erlang.


Good parallelism = reduce shared state and communicate through  
data/message channels.


Shared-everything multithreading is going to be in a lot of trouble on  
future many-core machines. Incidentally, Postgres, with its Processes,  
sharing only what is needed, has a good head start...


With more and more cores coming, you guys are going to have to fight to  
reduce the quantity of shared state between processes, not augment it by  
using shared memory threads !...


Say you want to parallelize sorting.
Sorting is a black-box with one input data pipe and one output data pipe.
Data pipes are good for parallelism, just like FIFOs. FPGA guys love black  
boxes with FIFOs between them.


Say you manage to send tuples through a FIFO like zeromq. Now you can even  
run the sort on another machine and allow it to use all the RAM if you  
like. Now split the black box in two black boxes (qsort and merge),  
instantiate as many qsort boxes as necessary, and connect them together  
with pipes. Run some boxes on some of this machine's cores, some other  
boxes on another machine, etc. That would be very flexible (and scalable).


Of course the black box has a small backdoor : some comparison functions  
can access shared state, which is basically *the* issue (not reentrant  
stuff, which you do not need).


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-27 Thread Dan Scales
The advantage of putting the checksum calculation in smgrwrite() (or mdwrite()) 
is that it catches a bunch of page writes that don't go through the buffer pool 
(see calls to smgrwrite() in nbtree.c, nbtsort.c, spginsert.c)

Also, I missed this before:  don't you want to add the checksum calculation 
(PageSetVerificationInfo) to mdextend() (or preferably smgrextend()) as well?  
Otherwise, you won't be checksumming a bunch of the new pages.

Dan

- Original Message -
From: Robert Haas robertmh...@gmail.com
To: Dan Scales sca...@vmware.com
Cc: Noah Misch n...@leadboat.com, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com, Andres Freund and...@anarazel.de, 
Kevin Grittner kevin.gritt...@wicourts.gov, da...@fetter.org, 
ai...@highrise.ca, st...@mit.edu, pgsql-hackers@postgresql.org, Simon Riggs 
si...@2ndquadrant.com
Sent: Friday, January 27, 2012 5:19:32 AM
Subject: Re: [HACKERS] 16-bit page checksums for 9.2

On Thu, Jan 26, 2012 at 7:01 PM, Dan Scales sca...@vmware.com wrote:
 I'm not sure why you moved the checksum calculation (PageSetVerificationInfo) 
 to mdwrite() rather than smgrwrite().  If there were ever another storage 
 backend, it would have to duplicate the checksum check, right?  Is there a 
 disadvantage to putting it in smgrwrite()?

The smgr and md layers don't currently know anything about the page
format, and I imagine we want to keep it that way.  It seems like the
right place for this is in some higher layer, like bufmgr.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simulating Clog Contention

2012-01-27 Thread Jeff Janes
On Thu, Jan 12, 2012 at 4:31 AM, Simon Riggs si...@2ndquadrant.com wrote:

 The following patch adds a pgbench option -I to load data using
 INSERTs, so that we can begin benchmark testing with rows that have
 large numbers of distinct un-hinted transaction ids. With a database
 pre-created using this we will be better able to simulate and thus
 more easily measure clog contention. Note that current clog has space
 for 1 million xids, so a scale factor of greater than 10 is required
 to really stress the clog.

Running with this patch with a non-default scale factor generates the
spurious notice:

Scale option ignored, using pgbench_branches table count = 10

In fact the scale option is not being ignored, because it was used to
initialize the pgbench_branches table count earlier in this same
invocation.

I think that even in normal (non-initialization) usage, this message
should be suppressed when the provided scale factor
is equal to the pgbench_branches table count.
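
Something along these lines would do it (a sketch; scale_opt and db_scale are
illustrative names, not pgbench's actual variables):

#include <stdio.h>

/*
 * Sketch only: emit the notice only when the -s value actually disagrees
 * with the branch count read back from pgbench_branches.
 */
static int
resolve_scale(int scale_opt, int db_scale)
{
    if (scale_opt > 0 && scale_opt != db_scale)
        fprintf(stderr,
                "Scale option ignored, using pgbench_branches table count = %d\n",
                db_scale);
    return db_scale;
}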

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-27 Thread Jeff Janes
On Sat, Jan 21, 2012 at 7:31 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Yes, it was. Sorry about that. New version attached, retesting while
 you read this.

In my hands I could never get this patch to do anything.  The new
cache was never used.

I think that that was because RecentXminPageno never budged from -1.

I think that that, in turn, is because the comparison below can never
return true, because the comparison is casting both sides to uint, and
-1 cast to uint is very large

/* When we commit advance ClogCtl's shared RecentXminPageno if needed */
if (ClogCtl->shared->RecentXminPageno < TransactionIdToPage(RecentXmin))
 ClogCtl->shared->RecentXminPageno =
TransactionIdToPage(RecentXmin);
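
A standalone demonstration of that signed/unsigned trap (a sketch whose types
merely mirror the description above, not the actual slru declarations):

#include <stdio.h>

int
main(void)
{
    int             recent_pageno = -1;     /* like RecentXminPageno at startup */
    unsigned int    current_pageno = 42;    /* stands in for TransactionIdToPage() */

    /* The usual arithmetic conversions turn -1 into UINT_MAX here, so this */
    /* branch is never taken while recent_pageno is still -1. */
    if (recent_pageno < current_pageno)
        printf("would advance RecentXminPageno\n");
    else
        printf("comparison is false: -1 became a huge unsigned value\n");

    return 0;
}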


Also, I think the general approach is wrong.  The only reason to have
these pages in shared memory is that we can control access to them to
prevent write/write and read/write corruption.  Since these pages are
never written, they don't need to be in shared memory.   Just read
each page into backend-local memory as it is needed, either
palloc/pfree each time or using a single reserved block for the
lifetime of the session.  Let the kernel worry about caching them so
that the above mentioned reads are cheap.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dry-run mode for pg_archivecleanup

2012-01-27 Thread Josh Kupershmidt
On Fri, Jan 27, 2012 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jan 15, 2012 at 5:05 PM, Josh Kupershmidt schmi...@gmail.com wrote:
 On Sun, Jan 15, 2012 at 3:02 PM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:

 My actual intention was to have the filename as output of the command, in
 order to easily pipe it to another script. Hence my first choice was to
 use the stdout channel, considering also that pg_archivecleanup in dry-run
 mode is harmless and does not touch the content of the directory.

 Oh, right - I should have re-read your initial email before diving
 into the patch. That all makes sense given your intended purpose. I
 guess your goal of constructing some simple way to pass the files
 which would be removed on to another script is a little different than
 what I initially thought the patch would be useful for, namely as a
 testing/debugging aid for an admin.

 Perhaps both goals could be met by making use of '--debug' together
 with '--dry-run'. If they are both on, then an additional message like
 pg_archivecleanup: would remove file ...  would be printed to
 stderr, along with just the filename printed to stdout you already
 have.

 This email thread seems to have trailed off without reaching a
 conclusion.  The patch is marked as Waiting on Author in the
 CommitFest application, but I'm not sure that's accurate.  Can we try
 to nail this down?

Perhaps my last email was a bit wordy. The only real change I am
suggesting for Gabriele's patch is that the message printed to stderr
when debug + dryrun are activated be changed to "would remove file
..." from "removing file", i.e. around line 124:

if (debug)
    fprintf(stderr, "%s: %s file \"%s\"\n",
            progname, (dryrun ? "would remove" : "removing"),
            WALFilePath);

Other than that little quibble, I thought the patch was fine.

Josh

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dry-run mode for pg_archivecleanup

2012-01-27 Thread Alvaro Herrera

Excerpts from Josh Kupershmidt's message of Fri Jan 27 19:43:51 -0300 2012:
 On Fri, Jan 27, 2012 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote:

  This email thread seems to have trailed off without reaching a
  conclusion.  The patch is marked as Waiting on Author in the
  CommitFest application, but I'm not sure that's accurate.  Can we try
  to nail this down?
 
 Perhaps my last email was a bit wordy. The only real change I am
 suggesting for Gabriele's patch is that the message printed to stderr
 when debug + dryrun are activated be changed to "would remove file
 ..." from "removing file", i.e. around line 124:
 
 if (debug)
     fprintf(stderr, "%s: %s file \"%s\"\n",
             progname, (dryrun ? "would remove" : "removing"),
             WALFilePath);
 
 Other than that little quibble, I thought the patch was fine.

If you do that, please make sure you use two complete separate strings
instead of building one from spare parts.  This bit is missing the _()
stuff though.
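
For example, the two-complete-strings form would look something like this
(sketch only, with the _() markers added):

if (debug)
{
    if (dryrun)
        fprintf(stderr, _("%s: file \"%s\" would be removed\n"),
                progname, WALFilePath);
    else
        fprintf(stderr, _("%s: removing file \"%s\"\n"),
                progname, WALFilePath);
}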

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-27 Thread Merlin Moncure
On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Also, I think the general approach is wrong.  The only reason to have
 these pages in shared memory is that we can control access to them to
 prevent write/write and read/write corruption.  Since these pages are
 never written, they don't need to be in shared memory.   Just read
 each page into backend-local memory as it is needed, either
 palloc/pfree each time or using a single reserved block for the
 lifetime of the session.  Let the kernel worry about caching them so
 that the above mentioned reads are cheap.

right -- exactly.  but why stop at one page?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Configuring Postgres to Add A New Source File

2012-01-27 Thread Tareq Aljabban
Indeed, I'm a beginner with Make, but I read a few tutorials and was able to
do what I wanted outside of PG using a simple makefile.
Now, when moving to PG, I found the Make structure much more complicated
and didn't know where to add my configuration.
I'm looking only for this file to run in PG (the required functionality is
done already). My code will be a part of the backend, so I want to keep it
there.
You're saying it's going to be a bit tricky to do what I want with make (as
I'm a novice in it), so can you please point out some references I can start
from?
Do you think reposting this on the novice mailing list would be a good
idea?
Thanks


On Thu, Jan 26, 2012 at 12:54 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jan 25, 2012 at 7:06 AM, Tareq Aljabban
 tareq.aljab...@gmail.com wrote:
  Hi,
  I'm doing some development on the storage manager module of Postgres.
 
  I have added few source files already to the smgr folder, and I was able
 to
  have the Make system includes them by adding their names to the OBJS
 list in
  the Makefile inside the smgr folder. (i.e. When I add A.c, I would add
 A.o
  to the OBJS list).
 
  That was working fine. Now I'm trying to add a new file hdfs_test.c to
 the
  project. The problem with this file is that it requires some extra
  directives in its compilation command (-I and -L directives).
 
  Using gcc the file is compiled with the command:
 
  gcc hdfs_test.c -I/HDFS_HOME/hdfs/src/c++/libhdfs
  -I/usr/lib/jvm/default-java/include -L/HDFS_HOME/hdfs/src/c++/libhdfs
  -L/HDFS_HOME/build/c++/Linux-i386-32/lib
  -L/usr/lib/jvm/default-java/jre/lib/i386/server -ljvm -lhdfs -o hdfs_test
 
  I was told that in order to add the extra directives, I need to modify
  $LDFLAGS in configure.in and $LIBS in MakeFile.
 
  I read about autoconf and checked configure.in of Postgres but still
 wasn't
  able to know exactly what I should be doing.

 This is really a general question about make that has nothing to do
 with PostgreSQL specifically; you might want to get a book on make.

 Makefiles in subdirectories of src/backend are only intended to be
 able to build the backend itself, so there's probably no particularly
 easy way to do what you want there.  You likely want to put this code
 someplace like src/test/hdfs_test and copy one of the other Makefiles
 into the new directory, perhaps src/test/isolation/Makefile.

 But even if you do that for starters, you're going to need to make
 some modifications, which may be a bit tricky if you have no
 experience at all using make.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



[HACKERS] initdb and fsync

2012-01-27 Thread Jeff Davis
It looks like initdb doesn't fsync all the files it creates, e.g. the
PG_VERSION file.

While it's unlikely that it would cause any real data loss, it can be
inconvenient in some testing scenarios involving VMs.

Thoughts? Would a patch to add a few fsync calls to initdb be accepted?
Is a platform-independent fsync available at initdb time?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention, part 2

2012-01-27 Thread Jeff Janes
On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Also, I think the general approach is wrong.  The only reason to have
 these pages in shared memory is that we can control access to them to
 prevent write/write and read/write corruption.  Since these pages are
 never written, they don't need to be in shared memory.   Just read
 each page into backend-local memory as it is needed, either
 palloc/pfree each time or using a single reserved block for the
 lifetime of the session.  Let the kernel worry about caching them so
 that the above mentioned reads are cheap.

 right -- exactly.  but why stop at one page?

If you have more than one, you need code to decide which one to evict
(just free) every time you need a new one.  And every process needs to
be running this code, while the kernel is still going to need to make its
own decisions for the entire system.  It seems simpler to just let the
kernel do the job for everyone.  Are you worried that a read syscall
is going to be slow even when the data is presumably cached in the OS?
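
Roughly, that per-backend read could look like this (a sketch; path
construction and error reporting are elided, and all names are illustrative):

#include <fcntl.h>
#include <unistd.h>

#define CLOG_PAGE_SIZE 8192             /* stand-in for BLCKSZ */

/*
 * Sketch only: read one old, read-only clog page into backend-local memory
 * and let the kernel page cache do the sharing across backends.
 */
static int
read_clog_page_local(const char *segment_path, int rpageno, char *dest)
{
    int         fd = open(segment_path, O_RDONLY);
    ssize_t     nread;

    if (fd < 0)
        return -1;
    nread = pread(fd, dest, CLOG_PAGE_SIZE, (off_t) rpageno * CLOG_PAGE_SIZE);
    close(fd);
    return (nread == CLOG_PAGE_SIZE) ? 0 : -1;
}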

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Temp file missing during large pgbench data set

2012-01-27 Thread Thom Brown
Hi,

I'm using latest git master (latest entry
0816fad6eebddb8f1f0e21635e46625815d690b9) and I'm getting an error
when trying to create a large data set with pgbench:

thom@swift:~/Development$ createdb pgbench
thom@swift:~/Development$ pgbench -i -s 100 pgbench
NOTICE:  table pgbench_branches does not exist, skipping
NOTICE:  table pgbench_tellers does not exist, skipping
NOTICE:  table pgbench_accounts does not exist, skipping
NOTICE:  table pgbench_history does not exist, skipping
creating tables...
10000 tuples done.
20000 tuples done.
30000 tuples done.
40000 tuples done.

snip

9970000 tuples done.
9980000 tuples done.
9990000 tuples done.
10000000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
pgbench_branches_pkey for table pgbench_branches
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
pgbench_tellers_pkey for table pgbench_tellers
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
pgbench_accounts_pkey for table pgbench_accounts
LOG:  could not stat file base/pgsql_tmp/pgsql_tmp8056.0: Success
STATEMENT:  alter table pgbench_accounts add primary key (aid)
vacuum...done.

I've tried this multiple times and the result is the same.  I'm not
sure what this message means, but the primary key is created
successfully. If I reduce the scale to 20 the error doesn't occur.

The following changes were made to my postgresql.conf file:

max_connections = 300
shared_buffers = 3900MB
temp_buffers = 16MB
work_mem = 16MB
maintenance_work_mem = 256MB
checkpoint_segments = 32
random_page_cost = 1.1
effective_cache_size = 12GB

All other values are at default.

Is this anything to worry about?

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unreliable pg_ctl -w start again

2012-01-27 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

Well, feel free to increase that duration if you want.  The reason it's
there is to not wait for a long time if the postmaster falls over
instantly at startup, but in a non-interactive situation you might not
care.


Yes, just lengthening the wait duration causes an unnecessarily long wait when we 
run pg_ctl interactively. Therefore, the current wait approach is not 
correct.




How about inserting postmaster_is_alive() as below?


Looks like complete nonsense to me, if the goal is to behave sanely when
postmaster.pid hasn't been created yet.  Where do you think get_pgpid
gets the PID from?


Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which 
may be the pid of the previous postmaster that did not stop cleanly.


I think my simple fix makes sense to solve the problem as follows. Could you 
point out what might not be good?


1.The previous postmaster was terminated abruptly due to OS shutdown, 
machine failure, etc. leaving postmaster.pid.

2.Run pg_ctl -w start to start new postmaster.
3.do_start() of pg_ctl reads the pid of previously running postmaster from 
postmaster.pid. Say, let it be pid-1 (old_pid in code) here.


 old_pid = get_pgpid();

4.Anyway, try to start postmaster by calling start_postmaster().
5.If postmaster.pid existed at step 3, it means either of:

(a) Previous postmaster did not stop cleanly and left postmaster.pid.
(b) Another postmaster is already running in the data directory (since 
before running pg_ctl -w start this time.)


But we can't distinguish between them. Then, we read postmaster.pid again to 
judge the situation. Let it be pid-2 (pid in code).


if (old_pid != 0)
{
 pg_usleep(1000000);
 pid = get_pgpid();

6.If pid-1 != pid-2, it means that the situation (a) applies and the newly 
started postmaster overwrote old postmaster.pid. Then, try to connect to 
postmaster.


If pid-1 == pid-2, it means either of:

(a') Previous postmaster did not stop cleanly and left postmaster.pid. Newly 
started postmaster will complete startup, but hasn't overwritten 
postmaster.pid yet.
(b) Another postmaster is already running in the data directory (since 
before running pg_ctl -w start this time.)


The current comparison logic cannot distinguish between them. In my problem 
situation, situation a' happened, and pg_ctl mistakenly exited.


 if (pid == old_pid)
 {
  write_stderr(_("%s: could not start server\n"
   "Examine the log output.\n"),
   progname);
  exit(1);
 }

7.To distinguish between a' and b, check if pid-1 is alive. If pid-1 is 
alive, it means situation b. Otherwise, that is situation a'.


 if (pid == old_pid && postmaster_is_alive(old_pid))

However, the pid of the newly started postmaster might happen to match that of 
the old postmaster. To deal with that situation, it may be better to also check 
the modification timestamp of postmaster.pid.
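
For instance (sketch only; "start_time" would record when pg_ctl launched the
new postmaster, and the names are illustrative):

#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Sketch only: a postmaster.pid older than our own start attempt must be a
 * leftover from the previous postmaster, even if the PID inside happens to
 * match the new one.
 */
static bool
pidfile_is_stale(const char *pid_file, time_t start_time)
{
    struct stat st;

    if (stat(pid_file, &st) != 0)
        return true;                    /* no file at all yet */
    return st.st_mtime < start_time;
}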


What do you think?



If we had the postmaster's PID a priori, we could detect postmaster
death directly instead of having to make assumptions about how long
is reasonable to wait for the pidfile to appear.  The problem is that
we don't want to write a complete replacement for the shell's command
line parser and I/O redirection logic.  It doesn't look like a small
project.


Yes, I understand this. I don't think we can replace shell's various work.



(But maybe we could bypass that by doing a fork() and then having
the child exec() the shell, telling it to exec postmaster in turn?)


Possibly. I hope this works. Then, we can pass unnamed pipe file descriptors 
to postmaster via environment variables from the pg_ctl's forked child.




And of course Windows as usual makes things twice as hard, since we
couldn't make such a change unless start_postmaster could return the
proper PID in that case too.


Well, we can make start_postmaster() return the pid of the newly created 
postmaster. CreateProcess() sets the process handle in the structure passed 
to it. We can pass the process handle to WaitForSingleObject() to know 
whether the postmaster is alive.
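
In outline (sketch only; pmProcess would be the process HANDLE that
CreateProcess() filled into PROCESS_INFORMATION):

#include <stdbool.h>
#include <windows.h>

/*
 * Sketch only: a zero-timeout wait on the process handle reports whether the
 * postmaster is still running, without blocking.
 */
static bool
postmaster_is_alive_win32(HANDLE pmProcess)
{
    return WaitForSingleObject(pmProcess, 0) == WAIT_TIMEOUT;
}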


Regards
MauMau


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-01-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It's a feature, not a bug, that we schema-qualify names when VERBOSE
 is specified.  That was done on purpose for the benefit of external
 tools that might need this information to disambiguate which object is
 being referenced.

 Table *aliases*, of course, should not be schema-qualified, but I
 don't think that's what we're doing.  You could make it more clear by
 including an alias in the query, like this:

 explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';

I think you are both focusing on the wrong thing.  There is a lot of
squishiness in what EXPLAIN prints out, since SQL notation is not always
well suited to what an execution plan actually does.  But this code has
a hard and fast requirement that it dump view definitions correctly,
else pg_dump doesn't work.  And after looking at this I think Ashutosh
has in fact found a bug.  Consider this example:

regression=# create schema s1;
CREATE SCHEMA
regression=# create schema s2;
CREATE SCHEMA
regression=# create table s1.t1 (f1 int);
CREATE TABLE
regression=# create table s2.t1 (f1 int);
CREATE TABLE
regression=# create view v1 as
regression-#   select * from s1.t1 where exists (
regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
regression(#   );
CREATE VIEW
regression=# \d+ v1
   View public.v1
 Column |  Type   | Modifiers | Storage | Description 
+-+---+-+-
 f1 | integer |   | plain   | 
View definition:
 SELECT t1.f1
   FROM s1.t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.t1
  WHERE t1.f1 = s1.t1.f1));

regression=# alter table s2.t1 rename to tx;
ALTER TABLE
regression=# \d+ v1
   View public.v1
 Column |  Type   | Modifiers | Storage | Description 
+-+---+-+-
 f1 | integer |   | plain   | 
View definition:
 SELECT t1.f1
   FROM s1.t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.tx t1
  WHERE t1.f1 = s1.t1.f1));

Both of the above displays of the view are formally correct, in that the
variables will be taken to refer to the correct upper or lower RTE.
But let's change that back and rename the other table:

regression=# alter table s2.tx rename to t1;
ALTER TABLE
regression=# alter table s1.t1 rename to tx;
ALTER TABLE
regression=# \d+ v1
   View public.v1
 Column |  Type   | Modifiers | Storage | Description 
+-+---+-+-
 f1 | integer |   | plain   | 
View definition:
 SELECT t1.f1
   FROM s1.tx t1
  WHERE (EXISTS ( SELECT 1
   FROM s2.t1
  WHERE t1.f1 = s1.t1.f1));

This is just plain wrong, as you'll see if you try to execute that
query:

regression=# SELECT t1.f1
regression-#FROM s1.tx t1
regression-#   WHERE (EXISTS ( SELECT 1
regression(#FROM s2.t1
regression(#   WHERE t1.f1 = s1.t1.f1));
ERROR:  invalid reference to FROM-clause entry for table t1
LINE 5:   WHERE t1.f1 = s1.t1.f1));
^
HINT:  There is an entry for table t1, but it cannot be referenced
from this part of the query.

(The HINT is a bit confused here, but the query is certainly invalid.)

So what we have here is a potential failure to dump and reload view
definitions, which is a lot more critical in my book than whether
EXPLAIN's output is confusing.

If we stick with the existing rule for attaching a fake alias to renamed
RTEs, I think that Ashutosh's patch or something like it is probably
appropriate, because the variable-printing code ought to be in step with
the RTE-printing code.  Unfortunately, I think the hack to attach a fake
alias to renamed RTEs creates some issues of its own.  Consider

select * from s1.t1
  where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);

If s1.t1 is now renamed to s1.tx, it is still possible to express
the same semantics:

select * from s1.tx
  where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);

But when we attach a fake alias, it's broken:

select * from s1.tx t1
  where exists (select 1 from s2.t2 t1 where t1.f1 = ?.f1);

There is no way to reference the outer RTE anymore from the subquery,
because the conflicting lower alias masks it.

We may be between a rock and a hard place though, because it's not that
hard to demonstrate cases where not adding a fake alias breaks it too:

select * from s1.t1 tx
  where exists (select 1 from s2.t1 where s2.t1.f1 = tx.f1);

If s2.t1 is renamed to s2.tx, there's no longer any way to reference the
upper alias tx, unless you alias the lower RTE to some different name.
I think that when we put in the fake-alias behavior, we made a value
judgment that this type of situation was more common than the other,
but I'm not really sure why.

Maybe what we need to do instead is create totally-made-up, unique
aliases when something like this 

Re: [HACKERS] Unreliable pg_ctl -w start again

2012-01-27 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 From: Tom Lane t...@sss.pgh.pa.us
 Looks like complete nonsense to me, if the goal is to behave sanely when
 postmaster.pid hasn't been created yet.  Where do you think get_pgpid
 gets the PID from?

 Yes, I understand that get_pgpid() gets the pid from postmaster.pid, which 
 may be the pid of the previous postmaster that did not stop cleanly.
 [ convoluted reasoning about what to do if that's the case ]

I don't see any point in worrying about that case when you can't handle
the basic case that the postmaster hasn't created postmaster.pid yet.
In any case, this does nothing at all to answer the question you posed,
which was how long is it reasonable to wait for the postmaster to
produce a new postmaster.pid file.  We really need to know the PID of
the process we started in order to make any real improvement here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Temp file missing during large pgbench data set

2012-01-27 Thread Tom Lane
Thom Brown t...@linux.com writes:
 I'm using latest git master (latest entry
 0816fad6eebddb8f1f0e21635e46625815d690b9) and I'm getting an error
 when trying to create a large data set with pgbench:

 LOG:  could not stat file base/pgsql_tmp/pgsql_tmp8056.0: Success
 STATEMENT:  alter table pgbench_accounts add primary key (aid)

I'll bet lunch this is due to some bug in commit
bc3347484a7bf9eddb98e4352d84599cae9a31c6

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cursors FOR UPDATE don't return most recent row

2012-01-27 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 I expected the FETCH to return one row, with the latest data, i.e.
 (1, 3), but instead it's returning empty.

This is the same thing I was complaining about in the bug #6123 thread,
http://archives.postgresql.org/message-id/9698.1327266...@sss.pgh.pa.us

It looks a bit ticklish to fix.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] initdb and fsync

2012-01-27 Thread Noah Misch
On Fri, Jan 27, 2012 at 04:19:41PM -0800, Jeff Davis wrote:
 It looks like initdb doesn't fsync all the files it creates, e.g. the
 PG_VERSION file.
 
 While it's unlikely that it would cause any real data loss, it can be
 inconvenient in some testing scenarios involving VMs.
 
 Thoughts? Would a patch to add a few fsync calls to initdb be accepted?

+1.  If I'm piloting strace -f right, initdb currently issues *no* syncs.

We'd probably, then, want a way to re-disable the fsyncs for hacker benefit.

 Is a platform-independent fsync available at initdb time?

Not sure.
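
On POSIX systems something this small would cover initdb's handful of files
(a sketch; Windows would presumably need a variant based on _commit()):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch only: open, fsync, and close one file.  initdb could call this for
 * PG_VERSION, the configuration files, and the freshly written relations.
 */
static void
fsync_one_file(const char *path)
{
    int     fd = open(path, O_RDONLY);

    if (fd < 0)
    {
        fprintf(stderr, "could not open \"%s\": %s\n", path, strerror(errno));
        return;
    }
    if (fsync(fd) != 0)
        fprintf(stderr, "could not fsync \"%s\": %s\n", path, strerror(errno));
    close(fd);
}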

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unreliable pg_ctl -w start again

2012-01-27 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

I don't see any point in worrying about that case when you can't handle
the basic case that the postmaster hasn't created postmaster.pid yet.
In any case, this does nothing at all to answer the question you posed,
which was how long is it reasonable to wait for the postmaster to
produce a new postmaster.pid file.  We really need to know the PID of
the process we started in order to make any real improvement here.



I'm sorry, but I may be missing the point of discussion. Let me clarify my 
point again.


1.To be honest, I wish a real solution that can eliminate the problem 
completely, for example, by using SIGCHLD approach Tom-san proposed before, 
or by using unnamed pipes I proposed last year. However, the real solution 
seems complicated to back-port to past versions.


2.As you said, we need any real improvement, if not a perfect solution, that 
can lower the possibility of the problem. Doing something which can somehow 
mitigate the problem is better than doing nothing and leaving the problem.


3.How long is it reasonable to wait for the postmaster to produce a new 
postmaster.pid file?

No duration is reasonable.

4.Then, what can we do to mitigate the problem in the past versions (8.3, 
8.4, 9.0)?
My very simple fix works well in the cases where old postmaster.pid file 
exists. But it does not make any improvement in the cases where old 
postmaster.pid is not left (i.e., running pg_ctl after clean shutdown; a 
basic case).

Am I missing anything? Does the fix result in degradation in some cases?


Otherwise, is Tom-san saying:

We will be able to solve the remaining problems completely if we can get 
the pid of postmaster started in pg_ctl. We thought that it was not easy to 
get the pid during 9.1 development. However, we might be able to get it more 
easily than we thought, such as by fork-execing and making the child process 
exec shell to make the new shell exec postmaster. So let's pursue the real 
solution here.


If so, I agree. But if we find the real solution difficult, I wish my 
non-perfect but somewhat effective solution will be accepted.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers