Re: [HACKERS] Use of rsync for data directory copying
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Sat, Jul 14, 2012 at 09:17:22PM -0400, Stephen Frost wrote:
> > So, can you explain which case you're specifically worried about?
>
> OK. The basic problem is that I previously was not clear about how
> reliant our use of rsync (without --checksum) was on the presence of WAL
> replay.

We should only be relying on WAL replay for hot backups which used
pg_start/pg_stop_backup.

> Here is an example from our documentation that doesn't have WAL replay:
>
> http://www.postgresql.org/docs/9.2/static/backup-file.html
>
> 	Another option is to use rsync to perform a file system backup. This is
> 	done by first running rsync while the database server is running, then
> 	shutting down the database server just long enough to do a second rsync.
> 	The second rsync will be much quicker than the first, because it has
> 	relatively little data to transfer, and the end result will be
> 	consistent because the server was down. This method allows a file system
> 	backup to be performed with minimal downtime.

To be honest, this looks like a recommendation that might have been made
all the way back to before we had hot backups. Technically speaking, it
should work fine to use the above method where the start/stop backup is
only done for the second rsync, if there's a reason to implement such a
system (perhaps the WALs grow too fast or too numerous for a full backup
with rsync between the start_backup and stop_backup?).

> Now, if a write happens in both the first and second half of a second,
> and only the first write is seen by the first rsync, I don't think the
> second rsync will see the write, and hence the backup will be
> inconsistent.

To be more specific, rsync relies on the combination of mtime and size
to tell if the file has been changed or not. In contrast, cp --update
looks like it might only depend on mtime (from reading the cp man page
on a Debian system).

It seems like there could be an issue where PG is writing to a file, an
rsync comes along and copies the file, and then PG writes to that same
file again, after the rsync is done, but within the same second. If the
file size isn't changed by that write, a later rsync might feel that it
isn't necessary to check if the file contents changed. I have to say
that I don't believe I've ever seen that happen though.

	Thanks,

		Stephen
Re: [HACKERS] Use of rsync for data directory copying
On Sat, Jul 14, 2012 at 09:17:22PM -0400, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian (br...@momjian.us) wrote:
> > If two writes happens in the middle of a file in the same second, it
> > seems one might be missed. Yes, I suppose the WAL does fix that during
> > replay, though if both servers were shut down cleanly, WAL would not be
> > replayed.
> >
> > If you using it for a hot backup, and WAL would clean that up.
>
> Right... If it's hot backup, then WAL will fix it; if it's done after a
> clean shut-down, nothing should be writing to those files (much less
> multiple writes in the same second), so checksum shouldn't be
> necessary...
>
> If you're doing rsync w/o doing pg_start_backup/pg_stop_backup, that's
> not likely to work even *with* --checksum..
>
> So, can you explain which case you're specifically worried about?

OK. The basic problem is that I previously was not clear about how
reliant our use of rsync (without --checksum) was on the presence of WAL
replay. Here is an example from our documentation that doesn't have WAL
replay:

	http://www.postgresql.org/docs/9.2/static/backup-file.html

	Another option is to use rsync to perform a file system backup. This is
	done by first running rsync while the database server is running, then
	shutting down the database server just long enough to do a second rsync.
	The second rsync will be much quicker than the first, because it has
	relatively little data to transfer, and the end result will be
	consistent because the server was down. This method allows a file system
	backup to be performed with minimal downtime.

Now, if a write happens in both the first and second half of a second,
and only the first write is seen by the first rsync, I don't think the
second rsync will see the write, and hence the backup will be
inconsistent.

--
  Bruce Momjian			http://momjian.us
  EnterpriseDB			http://enterprisedb.com

  + It's impossible for everything to be true.
+

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Schema version management
On Sat, Jul 14, 2012 at 12:34 PM, Joel Jacobson wrote:
> On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut wrote:
>> Well, of course everyone uses directories in moderation. But you might
>> want to take a look at the gcc source code. You'll love it. ;-)

[505][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type d | wc -l
   41895
[506][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type f | wc -l
  167183
[507][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$

Not that bad actually, only 4 files per directory on average.
Re: [HACKERS] Use of rsync for data directory copying
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> If two writes happens in the middle of a file in the same second, it
> seems one might be missed. Yes, I suppose the WAL does fix that during
> replay, though if both servers were shut down cleanly, WAL would not be
> replayed.
>
> If you using it for a hot backup, and WAL would clean that up.

Right... If it's hot backup, then WAL will fix it; if it's done after a
clean shut-down, nothing should be writing to those files (much less
multiple writes in the same second), so checksum shouldn't be
necessary...

If you're doing rsync w/o doing pg_start_backup/pg_stop_backup, that's
not likely to work even *with* --checksum..

So, can you explain which case you're specifically worried about?

	Thanks,

		Stephen
Re: [HACKERS] [PERFORM] DELETE vs TRUNCATE explanation
On Thu, Jul 12, 2012 at 9:55 PM, Jeff Janes wrote:
> I've moved this thread from performance to hackers.
>
> The topic was poor performance when truncating lots of small tables
> repeatedly on test environments with fsync=off.
>
> On Thu, Jul 12, 2012 at 6:00 PM, Jeff Janes wrote:
>
>> I think the problem is in the Fsync Absorption queue. Every truncate
>> adds a FORGET_RELATION_FSYNC to the queue, and processing each one of
>> those leads to sequential scanning the checkpointer's pending ops hash
>> table, which is quite large. It is almost entirely full of other
>> requests which have already been canceled, but it still has to dig
>> through them all. So this is essentially an N^2 operation.

...

>> I'm not sure why we don't just delete the entry instead of marking it
>> as cancelled. It looks like the only problem is that you can't delete
>> an entry other than the one just returned by hash_seq_search. Which
>> would be fine, as that is the entry that we would want to delete;
>> except that mdsync might have a different hash_seq_search open, and so
>> it wouldn't be safe to delete.

The attached patch addresses this problem by deleting the entry when it
is safe to do so, and flagging it as canceled otherwise.

I thought of using has_seq_scans to determine when it is safe, but
dynahash.c does not make that function public, and I was afraid it might
be too slow, anyway. So instead I used a static variable, plus the
knowledge that the only time there are two scans on the table is when
mdsync starts one and then calls RememberFsyncRequest indirectly. There
is one other place that does a seq scan, but there is no way for control
to pass from that loop to reach RememberFsyncRequest.

I've added code to disclaim the scan if mdsync errors out. I don't think
that this should be a problem because at that point the scan object is
never going to be used again, so if its internal state gets screwed up
it shouldn't matter.
However, I wonder if it should also call hash_seq_term, otherwise the
pending ops table will be permanently prevented from expanding (this is
a pre-existing condition, not to do with my patch). Since I don't know
what can make mdsync error out without being catastrophic, I don't know
how to test this out.

One concern is that if the ops table ever does become bloated, it can
never recover while under load. The bloated table will cause mdsync to
take a long time to run, and as long as mdsync is in the call stack the
antibloat feature is defeated--so we have crossed a tipping point and
cannot get back. I don't see that occurring in the current use case,
however. With my current benchmark, the anti-bloat is effective enough
that mdsync never takes very long to execute, so a virtuous circle
exists.

As an aside, the comments in dynahash.c seem to suggest that one can
always delete the entry returned by hash_seq_search, regardless of the
existence of other sequential searches. I'm pretty sure that this is not
true. Also, shouldn't this contract about when one is allowed to delete
entries be in the hsearch.h file, rather than the dynahash.c file?

Also, I still wonder if it is worth memorizing fsyncs (under fsync=off)
that may or may not ever take place. Is there any guarantee that we can
make by doing so, that couldn't be made otherwise?

Cheers,

Jeff

FsyncRequest_delete_v1.patch
Description: Binary data
Re: [HACKERS] Synchronous Standalone Master Redoux
So, here's the core issue with degraded mode. I'm not mentioning this to
block any patch anyone has, but rather out of a desire to see someone
address this core problem with some clever idea I've not thought of.

The problem in a nutshell is: indeterminacy. Assume someone implements
degraded mode. Then:

1. Master has one synchronous standby, Standby1, and two asynchronous,
Standby2 and Standby3.

2. Standby1 develops a NIC problem and is in and out of contact with
Master. As a result, it's flipping in and out of synchronous / degraded
mode.

3. Master fails catastrophically due to a RAID card meltdown. All data
lost.

At this point, the DBA is in kind of a pickle, because he doesn't know:

(a) Was Standby1 in synchronous or degraded mode when Master died? The
only log for that was on Master, which is now gone.

(b) Is Standby1 actually the most caught up standby, and thus the
appropriate new master for Standby2 and Standby3, or is it behind?

With the current functionality of Synchronous Replication, you don't
have either piece of indeterminacy, because some external management
process (hopefully located on another server) needs to disable
synchronous replication when Standby1 develops its problem. That is, if
the master is accepting synchronous transactions at all, you know that
Standby1 is up-to-date, and no data is lost.

While you can answer (b) by checking all servers, (a) is particularly
pernicious, because unless you have the application log all "operating
in degraded mode" messages, there is no way to ever determine the truth.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] elog/ereport noreturn decoration
Peter Eisentraut writes:
> A small sidetrack here. I've managed to set up the Solaris Studio
> compiler on Linux and tried this out. It turns out these "statement not
> reached" warnings have nothing to do with knowledge about library
> functions such as abort() or exit() not returning. The warnings come
> mostly from loops that never end (except by returning from the function)
> and some other more silly cases where the supposed fallback return
> statement is clearly unnecessary. I think these should be fixed,
> because the code is wrong and could mask real errors if someone ever
> wanted to rearrange those loops or something.

> Patch attached. I tried this out with old and new versions of gcc,
> clang, and the Solaris compiler, and everyone was happy about. I didn't
> touch the regex code. And there's some archeological knowledge about
> Perl in there.

Hm. I seem to recall that at least some of these lines were themselves
put in to suppress compiler warnings. So we may just be moving the
warnings from one set of non-mainstream compilers to another set.

Still, we might as well try it and see if there's a net reduction. (In
some places we might need to tweak the code a bit harder than this, to
make it clearer that the unreachable spot is unreachable.)

			regards, tom lane
Re: [HACKERS] elog/ereport noreturn decoration
On Sat, 2012-06-30 at 10:52 -0400, Tom Lane wrote:
> Peter Eisentraut writes:
> > But my point was, there aren't any unused code warnings. None of the
> > commonly used compilers issue any. I'd be interested to know if there
> > is any recent C compiler supported by PostgreSQL that issues some kind
> > of unused code warning under any circumstances, and see an example of
> > that. Then we can determine whether there is an issue here.
>
> Well, the Solaris Studio compiler produces "warning: statement not
> reached" messages, as seen for example on buildfarm member castoroides.
> I don't have one available to experiment with, so I'm not sure whether
> it knows that abort() doesn't return; but I think it rather foolish to
> assume that such a combination doesn't exist in the wild.

A small sidetrack here. I've managed to set up the Solaris Studio
compiler on Linux and tried this out. It turns out these "statement not
reached" warnings have nothing to do with knowledge about library
functions such as abort() or exit() not returning. The warnings come
mostly from loops that never end (except by returning from the function)
and some other more silly cases where the supposed fallback return
statement is clearly unnecessary. I think these should be fixed, because
the code is wrong and could mask real errors if someone ever wanted to
rearrange those loops or something.

Patch attached. I tried this out with old and new versions of gcc,
clang, and the Solaris compiler, and everyone was happy about it. I
didn't touch the regex code. And there's some archeological knowledge
about Perl in there.

The Solaris compiler does not, by the way, complain about the elog patch
I had proposed.
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c index aadf050..7bdac3d 100644 --- a/contrib/hstore/hstore_io.c +++ b/contrib/hstore/hstore_io.c @@ -163,8 +163,6 @@ state->ptr++; } - - return false; } #define WKEY 0 diff --git a/contrib/intarray/_int_bool.c b/contrib/intarray/_int_bool.c index dfb113a..d0572af 100644 --- a/contrib/intarray/_int_bool.c +++ b/contrib/intarray/_int_bool.c @@ -136,7 +136,6 @@ } (state->buf)++; } - return END; } /* @@ -301,7 +300,6 @@ else return execute(curitem - 1, checkval, calcnot, chkcond); } - return false; } /* @@ -404,7 +402,6 @@ else return false; } - return false; } bool diff --git a/contrib/intarray/_int_gist.c b/contrib/intarray/_int_gist.c index e429c8b..60de393 100644 --- a/contrib/intarray/_int_gist.c +++ b/contrib/intarray/_int_gist.c @@ -217,8 +217,6 @@ } else PG_RETURN_POINTER(entry); - - PG_RETURN_POINTER(entry); } Datum diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c index c2e532c..583ff2a 100644 --- a/contrib/ltree/ltxtquery_io.c +++ b/contrib/ltree/ltxtquery_io.c @@ -139,7 +139,6 @@ state->buf += charlen; } - return END; } /* diff --git a/contrib/ltree/ltxtquery_op.c b/contrib/ltree/ltxtquery_op.c index bedbe24..64f9d21 100644 --- a/contrib/ltree/ltxtquery_op.c +++ b/contrib/ltree/ltxtquery_op.c @@ -40,7 +40,6 @@ else return ltree_execute(curitem + 1, checkval, calcnot, chkcond); } - return false; } typedef struct diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c index 82ac53e..3efdedd 100644 --- a/src/backend/access/gin/ginbtree.c +++ b/src/backend/access/gin/ginbtree.c @@ -146,9 +146,6 @@ stack->predictNumber = 1; } } - - /* keep compiler happy */ - return NULL; } void diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c index 022bd27..5702259 100644 --- a/src/backend/access/gin/ginget.c +++ b/src/backend/access/gin/ginget.c @@ -354,8 +354,6 @@ */ stack->off++; } - - return true; } /* diff --git 
a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index c790ad6..2253e7c 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -535,8 +535,6 @@ } while (so->nPageData == 0); } } - - PG_RETURN_BOOL(false); /* keep compiler quiet */ } /* diff --git a/src/backend/executor/nodeGroup.c b/src/backend/executor/nodeGroup.c index 80e282b..a8a1fe6 100644 --- a/src/backend/executor/nodeGroup.c +++ b/src/backend/executor/nodeGroup.c @@ -184,9 +184,6 @@ else InstrCountFiltered1(node, 1); } - - /* NOTREACHED */ - return NULL; } /* - diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index e0ab599..0d66dab 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -201,9 +201,9 @@ { #ifdef USE_SSL return ssl_loaded_verify_locations; -#endif - +#else return false; +#endif } /* diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index c927747..d96b7a7 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -233,9 +233,6 @@ static void AddBufferToRing(BufferAc
[HACKERS] Getting rid of pre-assignment of index names in CREATE TABLE LIKE
In bug #6734 we have a complaint about a longstanding misfeature of CREATE TABLE LIKE. Ordinarily, this command doesn't select names for copied indexes, but leaves that to be done at runtime by DefineIndex. But if it's copying comments, and an index of the source table has a comment, it's forced to preassign a name to the new index so that it can build a CommentStmt that can apply the comment after the index is made. Apart from being something of a modularity violation, this isn't very safe because of the danger of name collision with earlier indexes for the new table. And that's exactly what's happening in bug #6734. I suggested that we could dodge the problem by allowing IndexStmt to carry a comment to be attached to the new index, and thereby avoid needing an explicit COMMENT command. Attached is a patch that fixes it that way. While I was at it, it seemed like DefineIndex's parameter list had grown well past any sane bound, so I refactored it to pass the IndexStmt struct as-is rather than passing all the fields individually. With or without that choice, though, this approach means a change in DefineIndex's API, as well as the contents of struct IndexStmt. That means it's probably unsafe to back-patch, since it seems plausible that there might be third-party code out there that creates indexes and would use these interfaces. I would like to sneak this fix into 9.2, though. Does anyone think it's already too late to be touching these APIs for 9.2? regards, tom lane diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y index 18f0add852e7832739e3877811e385abcb540fab..ec634f1660e0be883b451abbb380d6dc30e69b93 100644 *** a/src/backend/bootstrap/bootparse.y --- b/src/backend/bootstrap/bootparse.y *** Boot_InsertStmt: *** 279,296 Boot_DeclareIndexStmt: XDECLARE INDEX boot_ident oidspec ON boot_ident USING boot_ident LPAREN boot_index_params RPAREN { do_start(); ! DefineIndex(makeRangeVar(NULL, $6, -1), ! $3, $4, ! InvalidOid, ! $8, ! NULL, ! 
$10, ! NULL, NIL, NIL, ! false, false, false, false, false, ! false, false, true, false, false); do_end(); } ; --- 279,312 Boot_DeclareIndexStmt: XDECLARE INDEX boot_ident oidspec ON boot_ident USING boot_ident LPAREN boot_index_params RPAREN { + IndexStmt *stmt = makeNode(IndexStmt); + do_start(); ! stmt->idxname = $3; ! stmt->relation = makeRangeVar(NULL, $6, -1); ! stmt->accessMethod = $8; ! stmt->tableSpace = NULL; ! stmt->indexParams = $10; ! stmt->options = NIL; ! stmt->whereClause = NULL; ! stmt->excludeOpNames = NIL; ! stmt->idxcomment = NULL; ! stmt->indexOid = InvalidOid; ! stmt->oldNode = InvalidOid; ! stmt->unique = false; ! stmt->primary = false; ! stmt->isconstraint = false; ! stmt->deferrable = false; ! stmt->initdeferred = false; ! stmt->concurrent = false; ! ! DefineIndex(stmt, $4, ! false, ! false, ! true, /* skip_build */ ! false); do_end(); } ; *** Boot_DeclareIndexStmt: *** 298,315 Boot_DeclareUniqueIndexStmt: XDECLARE UNIQUE INDEX boot_ident oidspec ON boot_ident USING boot_ident LPAREN boot_index_params RPAREN { do_start(); ! DefineIndex(makeRangeVar(NULL, $7, -1), ! $4, $5, ! InvalidOid, ! $9, ! NULL, ! $11, ! NULL, NIL, NIL, ! true, false, false, false, false, ! false, false, true, false, false); do_end(); } ; --- 314,347 Boot_DeclareUniqueIndexStmt: XDECLARE UNIQUE INDEX boot_ident oidspec ON boot_ident USING boot_ident LPAREN boot_index_params RPAREN { + IndexStmt *stmt = makeNode(IndexStmt); + do_start(); ! stmt->idxname = $4; ! stmt->relation = makeRangeVar(NULL, $7, -1); ! stmt->accessMethod = $9; ! stmt->tableSpace = NULL; ! stmt->indexParams = $11; ! stmt->options = NIL; ! stmt->whereClause = NULL; ! stmt->excludeOpNames = NIL; ! stmt->idxcomment = NULL; ! stmt->indexOid = InvalidOid; ! stmt->oldNode = InvalidOid; ! stmt->unique = true; ! stmt->primary = false; ! stmt->isconstraint = false; ! stmt->deferrable = false; ! stmt->initdeferred = false; ! stmt->concurrent = false; ! ! DefineIndex(stmt, $5, ! false, ! false, ! 
true, /* skip_build */ ! false); do_end(); } ; diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index e5d1d8b699ad6d37cd3dd9e19b1d76ba97c01eaa..f315ab60154d6c6aa3c96f05babb8e79945f70b8 100644 *** a/src/backend/commands/indexcmds.c --- b/src/backend/commands/indexcmds.c *** *** 24,29 -
[HACKERS] Re: Reporting WAL file containing checkpoint's REDO record in pg_controldata's result
On 26.06.2012 18:06, Fujii Masao wrote:
> On Wed, Mar 28, 2012 at 10:08 AM, Fujii Masao wrote:
>> On Wed, Mar 28, 2012 at 12:30 AM, Alvaro Herrera wrote:
>>> Excerpts from Fujii Masao's message of mar mar 27 06:40:34 -0300 2012:
>>>> Anyway, should I add this patch into the next CF? Or is anyone
>>>> planning to commit the patch for 9.2?
>>>
>>> I think the correct thing to do here is add to next CF, and if some
>>> committer has enough interest in getting it quickly it can be grabbed
>>> from there and committed into 9.2.
>>
>> Yep! I've added it to next CF.
>
> Attached is the revised version of the patch.

Looks good to me, committed.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] sign mismatch in walreceiver.c
On 12.07.2012 23:57, Peter Eisentraut wrote:
> This looks suspicious
>
> 	static TimeLineID recvFileTLI = -1;
>
> because TimeLineID is uint32. The Solaris compiler complains about the
> sign mismatch. Maybe 0 would be a better initial value?

Agreed, fixed, thanks.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Support for XLogRecPtr in expand_fmt_string?
On Jul 13, 2012, at 2:34 PM, Peter Eisentraut wrote:
> I would rather get rid of this %X/%X notation. I know we have all grown
> to like it, but it's always been a workaround. We're now making the
> move to simplify this whole business by saying, the WAL location is an
> unsigned 64-bit number -- which everyone can understand -- but then why
> is it printed in some funny format?

We should take care that whatever format we pick can be easily matched
to a WAL file name. So a 64-bit number printed as 16 hex digits would
perhaps be OK, but a 64-bit number printed in base 10 would be a large
usability regression.

Personally, I'm not convinced we should change anything at all. It's not
that easy to visually parse a string of many digits; a little
punctuation in the middle is not a bad thing.

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 13/07/12 13:38, Jan Urbański wrote: On 12/07/12 11:08, Heikki Linnakangas wrote: On 07.07.2012 00:12, Jan Urbański wrote: So you're in favour of doing unicode -> bytes by encoding with UTF-8 and then using the server's encoding functions? Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 should be quite fast, and it would be good to be consistent in the way we do conversions in both directions. I'll implement that than (sorry for not following up on that eariler). Here's a patch that always encodes Python unicode objects using UTF-8 and then uses Postgres's internal functions to produce bytes in the server encoding. Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c new file mode 100644 index c375ac0..c2b3cb8 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** static char *get_source_line(const char *** 28,33 --- 28,41 /* + * Guard agains re-entrant calls to PLy_traceback, which can happen if + * traceback formatting functions raise Python errors. + */ + #define TRACEBACK_RECURSION_LIMIT 2 + static int recursion_depth = 0; + + + /* * Emit a PG error or notice, together with any available info about * the current Python error, previously set by PLy_exception_set(). * This should be used to propagate Python errors into PG. If fmt is *** PLy_traceback(char **xmsg, char **tbmsg, *** 147,166 StringInfoData xstr; StringInfoData tbstr; /* * get the current exception */ PyErr_Fetch(&e, &v, &tb); /* ! * oops, no exception, return */ ! if (e == NULL) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; return; } --- 155,177 StringInfoData xstr; StringInfoData tbstr; + recursion_depth++; + /* * get the current exception */ PyErr_Fetch(&e, &v, &tb); /* ! * oops, no exception or recursion depth exceeded, return */ ! 
if (e == NULL || recursion_depth > TRACEBACK_RECURSION_LIMIT) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; + recursion_depth--; return; } *** PLy_traceback(char **xmsg, char **tbmsg, *** 326,331 --- 337,344 (*tb_depth)++; } + recursion_depth--; + /* Return the traceback. */ *tbmsg = tbstr.data; diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c new file mode 100644 index 4aabafc..fe43312 *** a/src/pl/plpython/plpy_util.c --- b/src/pl/plpython/plpy_util.c *** PLy_free(void *ptr) *** 61,126 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *rv; ! const char *serverenc; ! /* ! * Map PostgreSQL encoding to a Python encoding name. ! */ ! switch (GetDatabaseEncoding()) { ! case PG_SQL_ASCII: ! /* ! * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's ! * 'ascii' means true 7-bit only ASCII, while PostgreSQL's ! * SQL_ASCII means that anything is allowed, and the system doesn't ! * try to interpret the bytes in any way. But not sure what else ! * to do, and we haven't heard any complaints... ! */ ! serverenc = "ascii"; ! break; ! case PG_WIN1250: ! serverenc = "cp1250"; ! break; ! case PG_WIN1251: ! serverenc = "cp1251"; ! break; ! case PG_WIN1252: ! serverenc = "cp1252"; ! break; ! case PG_WIN1253: ! serverenc = "cp1253"; ! break; ! case PG_WIN1254: ! serverenc = "cp1254"; ! break; ! case PG_WIN1255: ! serverenc = "cp1255"; ! break; ! case PG_WIN1256: ! serverenc = "cp1256"; ! break; ! case PG_WIN1257: ! serverenc = "cp1257"; ! break; ! case PG_WIN1258: ! serverenc = "cp1258"; ! break; ! case PG_WIN866: ! serverenc = "cp866"; ! break; ! case PG_WIN874: ! serverenc = "cp874"; ! break; ! default: ! /* Other encodings have the same name in Python. */ ! serverenc = GetDatabaseEncodingName(); ! break; } ! rv = PyUnicode_AsEncodedString(unicode, serverenc, "strict"); ! if (rv == NULL) ! 
PLy_elog(ERROR, "could not convert Python Unicode object to PostgreSQL server encoding"); return rv; } --- 61,96 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *bytes, *rv; ! char *utf8string, *encoded; ! /* first encode the Python unicode object with UTF-8 */ ! bytes = PyUnicode_AsUTF8String(unicode); ! if (bytes == NULL) ! PLy_elog(ERROR, "could not convert Python Unicode object to bytes"); ! ! utf8string = PyBytes_AsString(bytes); ! if (utf8string == NULL) { ! Py_DECREF(bytes); ! PLy_elog(ERROR, "could not extract bytes from encoded string"); ! } ! ! /* then convert to server encoding */ ! encoded = (char *) pg_do_encoding_conversion((unsigned char *) utf8string, ! strlen(utf8string), ! PG_UTF8, ! GetDatabaseEncoding()); ! ! /* finally, build a bytes object in the server encoding */ ! rv = PyBytes_Fr
Re: [HACKERS] Synchronous Standalone Master Redoux
On Sat, Jul 14, 2012 at 12:42 AM, Amit kapila wrote:
>> From: Jose Ildefonso Camargo Tolosa [ildefonso.cama...@gmail.com]
>> Sent: Saturday, July 14, 2012 9:36 AM
>> On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila wrote:
>> From: pgsql-hackers-ow...@postgresql.org
>> [pgsql-hackers-ow...@postgresql.org] on behalf of Jose Ildefonso Camargo
>> Tolosa [ildefonso.cama...@gmail.com]
>> Sent: Saturday, July 14, 2012 6:08 AM
>> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian wrote:
>>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>> So how about this for a Postgres TODO:
>>>
>>> 	Add configuration variable to allow Postgres to disable synchronous
>>> 	replication after a specified timeout, and add variable to alert
>>> 	administrators of the change.
>>
>> I agree we need a TODO for this, but... I think timeout-only is not the
>> best choice, there should be a maximum timeout (as a last resource: the
>> maximum time we are willing to wait for standby, this have to have the
>> option of "forever"), but certainly PostgreSQL have to detect the
>> *complete* disconnection of the standby (or all standbys on the
>> synchronous_standby_names), if it detects that no standbys are eligible
>> for sync standby AND the option to do fallback to async is enabled = it
>> will go into standalone mode (as if synchronous_standby_names were
>> empty), otherwise (if option is disabled) it will just continue to wait
>> for ever (the "last resource" timeout is ignored if the fallback option
>> is disabled)
>>
>> I would call this "soft_synchronous_standby", and
>> "soft_synchronous_standby_timeout" (in seconds, 0=forever, a sane value
>> would be ~5 seconds) or something like that (I'm quite bad at picking
>> names :( ).
>
> After it has gone to standalone mode, if the standby came back will it
> be able to return back to sync mode with it.
>> That's the idea, yes, after the standby comes back, the master would
>> act as if the sync standby connected for the first time: first going
>> through the "catchup" mode, and "once the lag between standby and
>> primary reaches zero "(...)" we move to real-time streaming state"
>> (from 9.1 docs), at that point: normal sync behavior is restored.
>
> Idea wise, it looks okay, but are you sure that in the current
> code/design, it can handle the way you are suggesting.
> I am not sure it can work because it might be the case that due to
> network instability, the master has gone in standalone mode and now
> after standy is able to communicate back, it might be expecting to get
> more data rather than go in cacthup mode.
> I believe some person who is expert of this code area can comment here
> to make it more concrete.

Well, I'd need to dive into the code, but as far as I know, it is the
master who decides to be in "catchup" mode, and the standby just takes
care of sending feedback to the master.

Also, it has to handle the situation, because currently, if the master
goes away because it crashed, or because of network issues, the standby
doesn't really know why, and will reconnect to the master and do
whatever it needs to do to get in sync with it again (be it: trying to
reconnect several times while the master is restarting, or just
reconnecting to a waiting master and requesting pending WAL segments).
There has to be code in place to handle those issues, because it is
already working.

I'm trying to get a solution that is as non-intrusive as possible, with
as little added code as possible, so that performance doesn't suffer, by
reusing current logic and actions, with small alterations.

> With Regards,
> Amit Kapila.

--
Ildefonso Camargo
Command Prompt, Inc.
- http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Allow breaking out of hung connection attempts
On 13.07.2012 14:35, Ryan Kelly wrote:
> On Mon, Jul 09, 2012 at 05:35:15PM +0900, Shigeru HANADA wrote:
>> On Mon, Jun 25, 2012 at 9:00 PM, Ryan Kelly wrote:
>> - Copying only the "select()" part of pqSocketPoll seems not enough;
>>   should we use poll(2) if it is supported?
>
> I did not think the additional complexity was worth it in this case.
> Unless you see some reason to use poll(2) that I do not.

> I checked where select() is used in PG, and noticed that poll is used
> in only a few places. Agreed to use only select() here.

poll() potentially performs better, but that hardly matters here, so
select() is ok.

However, on some platforms, signals don't interrupt select(). On such
platforms, hitting CTRL-C will still not interrupt the connection
attempt. There is also a race condition: if you hit CTRL-C just after
checking that the global variable is not set, but before entering
select(), the select() will again not be interrupted (until the user
gets frustrated and hits CTRL-C again). I think we need to sleep one
second at a time and check the global variable on every iteration, until
the timeout is reached. That's what we do in non-critical sleep loops
like this in the backend.

We've actually been trying to get rid of such polling in the backend
code lately, to reduce the number of unnecessary interrupts that consume
power on an otherwise idle system, but I think that technique would
still be OK for psql's connection code.

> Once the issues above are fixed, IMO this patch can be marked as
> "Ready for committer".

> I have also added additional documentation reflecting Heikki's
> suggestion that PQconnectTimeout be recommended for use by applications
> using the async API. Attached is v6 of the patch.

Thanks! With this patch, any connect_timeout parameter given in the
original connection info string is copied to the \connect command.
That sounds reasonable at first glance, but I don't think it's quite
right.

First of all, if you give a new connect_timeout string directly in the
\connect command, the old connect_timeout still overrides the new one:

$ ./psql postgres://localhost/postgres?connect_timeout=3
psql (9.3devel)
Type "help" for help.

postgres=# \connect postgres://192.168.1.123/postgres?connect_timeout=1000
timeout expired
Previous connection kept

Secondly, the docs say that the new connection only inherits a few
explicitly listed options from the old connection: dbname, username,
host and port. If you add connect_timeout to that list, it at least
requires a documentation update. But I don't think it should be
inherited in the first place. Or if it should, what about other options,
like application_name? Surely they should then be inherited too.

All in all, I think we should just leave out the changes to \connect and
not inherit connect_timeout from the old connection. If we want to
change the behavior of all options, that's a separate discussion and a
separate patch.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Schema version management
On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut wrote:
> Well, of course everyone uses directories in moderation. But you might
> want to take a look at the gcc source code. You'll love it. ;-)

Yes, but GCC was also created by someone who picks stuff from his bare
foot and eats it. ;-)
Re: [HACKERS] pg_prewarm
> c) isn't necessarily safe in production (I've crashed Linux with
> Fincore in the recent past).

fincore is a different piece of software. If you hit an issue with
pgfincore, please file a bug report, so that I can fix it and everyone
can benefit.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] Schema version management
On Sat, 2012-07-14 at 10:41 +0200, Joel Jacobson wrote:
> On Fri, Jul 13, 2012 at 9:41 PM, Peter Eisentraut wrote:
>> Personally, I hate this proposed nested directory structure. I would
>> like to have all objects in one directory.
>>
>> But there is a lot of "personally" in this thread, of course.
>
> Why do you hate it?
>
> It's a bit like saying,
> - I hate database normalization, better to keep all rows in one single
>   table.
> or even,
> - I hate directories.

To a certain extent, yes, I hate (excessive use of) directories.

> I have thousands of objects, it would be a total mess to keep them all
> in a single directory.

Thousands of objects could be a problem, in terms of how the typical
file system tools scale. But hundreds or a few thousand are not
necessarily. It's easy to browse, filter, and sort using common tools,
for example.

> Using a normalized directory structure makes sense for the SCM
> use-case,

If there is a theory of "normalization" for hierarchical databases, I
don't know it, but I would like to learn about it.

> I haven't seen any projects where all the files are kept in one
> directory.

Well, of course everyone uses directories in moderation. But you might
want to take a look at the gcc source code. You'll love it. ;-)
Re: [HACKERS] Schema version management
On Fri, Jul 13, 2012 at 9:41 PM, Peter Eisentraut wrote:
> Personally, I hate this proposed nested directory structure. I would
> like to have all objects in one directory.
>
> But there is a lot of "personally" in this thread, of course.

Why do you hate it?

It's a bit like saying,
- I hate database normalization, better to keep all rows in one single
  table.
or even,
- I hate directories.

I have thousands of objects; it would be a total mess to keep them all
in a single directory. Using a normalized directory structure makes
sense for the SCM use-case. I haven't seen any projects where all the
files are kept in one directory.
Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework
On 2012-07-14 00:36, Tom Lane wrote:
> Alvaro Herrera writes:
>> For what it's worth, I would appreciate it if you would post the lock
>> timeout patch for the upcoming commitfest.
>
> +1. I think it's reasonable to get the infrastructure patch in now, but
> we are running out of time in this commitfest (and there's still a lot
> of patches that haven't been reviewed at all).
>
> regards, tom lane

OK, I will do that. Thanks for the review.

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/