Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
On Thu, Jun 10, 2010 at 12:09 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> Ok, committed with some cosmetic changes.

Thanks!

> BTW, should there be doc changes for this? I didn't find anything
> explaining how restartpoints are triggered, we should add a paragraph
> somewhere.

+1

What about the attached patch?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

trigger_restartpoint_doc_v1.patch

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix use of posix_fadvise in xlog.c
On 10/06/10 06:47, Mark Wong wrote:
> I wanted to propose a fix to xlog.c regarding the use of posix_fadvise()
> for 9.1 (unless someone feels it's ok for 9.0). Currently posix_fadvise()
> is called right before a log file is closed, so it's effectively not
> doing anything. This patch moves the posix_fadvise() call into 3 other
> locations within XLogFileInit() where a file handle is returned. The
> first case is where an existing open file handle is returned. The next
> case is when a file is to be zeroed out. The third case is returning a
> file handle, which may be the file that was just zeroed out.

I don't think POSIX_FADV_DONTNEED does what you think it does. It tells
the kernel "you don't need to keep these pages in the cache anymore, I
won't be accessing them anymore". If you call it when you open the file,
before reading/writing, there is nothing in the cache and the call will
do nothing.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] InvalidXLogRecPtr in docs
On Thu, Jun 10, 2010 at 11:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Jun 9, 2010 at 9:46 PM, Takahiro Itagaki
>> itagaki.takah...@oss.ntt.co.jp wrote:
>>> I found the term InvalidXLogRecPtr in the 9.0 docs.
>>> http://developer.postgresql.org/pgdocs/postgres/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE
>>> | ... then the return value will be InvalidXLogRecPtr (0/0).
>>> Maybe we should be returning NULL instead of 0/0.
>
> +1 for using NULL instead of an artificially chosen value, for both of
> those functions.

Okay, the attached patch makes those functions return NULL in that case.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

recovery_funcs_return_null_v1.patch
Re: [HACKERS] InvalidXLogRecPtr in docs
On 10/06/10 05:56, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Jun 9, 2010 at 9:46 PM, Takahiro Itagaki
>> itagaki.takah...@oss.ntt.co.jp wrote:
>>> I found the term InvalidXLogRecPtr in the 9.0 docs.
>>> | ... then the return value will be InvalidXLogRecPtr (0/0).
>>> Maybe we should be returning NULL instead of 0/0.
>
> +1 for using NULL instead of an artificially chosen value, for both of
> those functions.

Agreed, committed a patch to do that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] InvalidXLogRecPtr in docs
On 10/06/10 09:42, Fujii Masao wrote:
> Okay, the attached patch makes those functions return NULL in that case.

Ah, I just committed a patch to do the same, before seeing your email.
Thanks anyway.

BTW, the docs claim about pg_last_xlog_location() that "While streaming
replication is in progress this will increase monotonically." That's a bit
misleading: when the replication connection is broken for some reason and
we restart it, we begin streaming from the beginning of the last WAL
segment. So at that moment, pg_last_xlog_location() moves backwards to the
beginning of the WAL segment. Should we:

1. Just document that,
2. Change pg_last_xlog_location() to not move backwards in that case, or
3. Change the behavior so that we start streaming at the exact byte
   location where we left off?

I believe that starting from the beginning of the WAL segment is just
paranoia, to avoid creating a WAL file that's missing some data from the
beginning. Right?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Command to prune archive at restartpoints
On 09/06/10 10:21, Simon Riggs wrote:
> On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote:
>> I prefer archive_cleanup_command. We should name things after their
>> principal function, not an implementation detail, IMNSHO.
>>
>> More importantly, we should include an example in the docs. I created
>> one the other day when this was actually bothering me a bit (see
>> http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html).
>> That seemed to work ok, but maybe it's too long, and maybe people would
>> prefer a shell script to perl.
>
> I submitted a patch to make the command "pg_standby -a %r". That's a more
> portable solution, ISTM. I'll commit that and fix the docs.

Huh, wait. There's no -a option in pg_standby, so I presume you're
planning to add that too. I don't like dragging pg_standby into this;
the docs are currently quite clear that if you want to use the built-in
standby mode, you can't use pg_standby, and this would muddy the waters.
Maybe we could add a new pg_cleanuparchive binary, but we'll need some
discussion...

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[HACKERS] variable TriggerFile can be declared as static
Hi,

Currently the variable TriggerFile is declared as extern, but it's not
used in any source file other than xlog.c. How about declaring it as
static? Here is the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

definition_triggerfile_v1.patch
Re: [HACKERS] Command to prune archive at restartpoints
On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote:
> Huh, wait. There's no -a option in pg_standby, so I presume you're
> planning to add that too. I don't like dragging pg_standby into this;
> the docs are currently quite clear that if you want to use the built-in
> standby mode, you can't use pg_standby, and this would muddy the waters.

It won't kill us to change that sentence. pg_standby is only used now
within the cleanup command etc. pg_standby already contains the exact
logic we need here. Having two sets of code for the same thing isn't how
we do things.

> Maybe we could add a new pg_cleanuparchive binary, but we'll need some
> discussion...

Which will go nowhere, as we both already know.

--
Simon Riggs           www.2ndQuadrant.com
Re: [HACKERS] InvalidXLogRecPtr in docs
On Thu, Jun 10, 2010 at 4:07 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> Ah, I just committed a patch to do the same, before seeing your email.
> Thanks anyway.

Yeah, thanks a lot!

> BTW, the docs claim about pg_last_xlog_location() that "While streaming
> replication is in progress this will increase monotonically." That's a
> bit misleading: when the replication connection is broken for some reason
> and we restart it, we begin streaming from the beginning of the last WAL
> segment. So at that moment, pg_last_xlog_location() moves backwards to
> the beginning of the WAL segment. Should we:
>
> 1. Just document that,
> 2. Change pg_last_xlog_location() to not move backwards in that case, or
> 3. Change the behavior so that we start streaming at the exact byte
>    location where we left off?

I'm for 2, as follows:

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 26aeca6..f0fd813 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -524,7 +524,8 @@ XLogWalRcvFlush(void)

 	/* Update shared-memory status */
 	SpinLockAcquire(&walrcv->mutex);
-	walrcv->receivedUpto = LogstreamResult.Flush;
+	if (XLByteLT(walrcv->receivedUpto, LogstreamResult.Flush))
+		walrcv->receivedUpto = LogstreamResult.Flush;
 	SpinLockRelease(&walrcv->mutex);

> I believe that starting from the beginning of the WAL segment is just
> paranoia, to avoid creating a WAL file that's missing some data from the
> beginning. Right?

Only when the recovery starting record (i.e., the record at the checkpoint
redo location) is not found do we need to start replication from the
beginning of the segment, I think. That is, the fetching_ckpt = true case
in the following code:

	if (PrimaryConnInfo)
	{
		RequestXLogStreaming(fetching_ckpt ? RedoStartLSN : *RecPtr,
							 PrimaryConnInfo);
		continue;
	}

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] variable TriggerFile can be declared as static
On 10/06/10 10:25, Fujii Masao wrote:
> Currently the variable TriggerFile is declared as extern, but it's not
> used in any source file other than xlog.c. How about declaring it as
> static? Here is the patch.

Thanks, applied.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] InvalidXLogRecPtr in docs
On 10/06/10 10:43, Fujii Masao wrote:
> I'm for 2, as follows:
>
> -	walrcv->receivedUpto = LogstreamResult.Flush;
> +	if (XLByteLT(walrcv->receivedUpto, LogstreamResult.Flush))
> +		walrcv->receivedUpto = LogstreamResult.Flush;

That's not enough, because we set receivedUpto in RequestXLogStreaming()
already.

> Only when the recovery starting record (i.e., the record at the
> checkpoint redo location) is not found do we need to start replication
> from the beginning of the segment, I think. That is, the
> fetching_ckpt = true case in the following code:
>
> 	if (PrimaryConnInfo)
> 	{
> 		RequestXLogStreaming(fetching_ckpt ? RedoStartLSN : *RecPtr,
> 							 PrimaryConnInfo);
> 		continue;
> 	}

Even then, we wouldn't need to start from the beginning of the WAL segment
AFAICS. The point is to start from the Redo pointer, not from the
checkpoint record, because as soon as we read the checkpoint record we'll
need to start applying WAL from the Redo pointer, which is earlier. The
WAL file boundaries don't come into play there.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] InvalidXLogRecPtr in docs
On Thu, Jun 10, 2010 at 5:04 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> That's not enough, because we set receivedUpto in RequestXLogStreaming()
> already.

Ah, you are right.

> Even then, we wouldn't need to start from the beginning of the WAL
> segment AFAICS. The point is to start from the Redo pointer, not from the
> checkpoint record, because as soon as we read the checkpoint record we'll
> need to start applying WAL from the Redo pointer, which is earlier. The
> WAL file boundaries don't come into play there.

You mean that the WAL file containing the Redo pointer is guaranteed to
exist if we could read the checkpoint record, so we don't need to start
from the beginning of the segment? This is probably true. But what if we
could not read the checkpoint record? In that case, the WAL file
containing the Redo pointer also might not exist.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] LLVM / clang
On ons, 2010-06-09 at 09:59 +0200, Florian Pflug wrote:
> The most heavily platform dependent part of the code is the spinlock
> implementation. You might want to check that it actually uses the version
> optimized for your platform, not the (much slower) generic implementation
> based on semaphores.

You only get the slow implementation if you configure explicitly with
--disable-spinlocks. A toolchain that didn't support spinlocks would fail
the build, and then the user could use that option to get past that
problem.
Re: [HACKERS] LLVM / clang
On tis, 2010-06-08 at 12:12 +0200, P. Caillaud wrote:
> I'd like to experiment with compiling postgres with LLVM (either llvm-gcc
> or clang) on Linux. Is it supported? Where should I start?

The way to choose a compiler is

    ./configure CC=your-cc ...other...options...

We support a fair number of non-GCC compilers, so supporting one or two
more should be possible. Quick testing shows that clang doesn't get
through the configure stage on this Debian system -- it looks like some
amount of better integration with glibc might be needed. Building with
llvm-gcc works fine, but I understand that using llvm-gcc with native code
generation isn't all that different from using gcc itself, so that's not a
surprising result. The only issue is that the float8 regression test
fails, so it is apparently not *exactly* the same.
Re: [HACKERS] InvalidXLogRecPtr in docs
On 10/06/10 11:37, Fujii Masao wrote:
> You mean that the WAL file containing the Redo pointer is guaranteed to
> exist if we could read the checkpoint record, so we don't need to start
> from the beginning of the segment? This is probably true. But what if we
> could not read the checkpoint record? In that case, the WAL file
> containing the Redo pointer also might not exist.

Oh, I think I understand the issue now: we need the header at the
beginning of the WAL segment to be valid, even if the first record we're
interested in is in the middle of the file. I missed that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
On 10/06/10 09:14, Fujii Masao wrote:
> On Thu, Jun 10, 2010 at 12:09 AM, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> BTW, should there be doc changes for this? I didn't find anything
>> explaining how restartpoints are triggered, we should add a paragraph
>> somewhere.
>
> +1 What about the attached patch?

(description of wal_keep_segments)

> *** 1902,1907 ****
> --- 1902,1908 ----
>          for standby purposes, and the number of old WAL segments available
>          for standbys is determined based only on the location of the
>          previous checkpoint and status of WAL archiving.
> +        This parameter has no effect on a restartpoint.
>          This parameter can only be set in the
>          <filename>postgresql.conf</> file or on the server command line.
>         </para>

Hmm, I wonder if wal_keep_segments should take effect during recovery too?
We don't support cascading slaves, but if you have two slaves connected to
one master (without an archive), and you perform failover to one of them,
without wal_keep_segments the 2nd slave might not find all the files it
needs in the new master. Then again, that won't work without an archive
anyway, because we error out at a TLI mismatch in replication. Seems like
this is 9.1 material..

> *** a/doc/src/sgml/wal.sgml
> --- b/doc/src/sgml/wal.sgml
> *** 424,429 ****
> --- 424,430 ----
>    <para>
>     There will always be at least one WAL segment file, and will normally
>     not be more than (2 + <varname>checkpoint_completion_target</varname>)
>     * <varname>checkpoint_segments</varname> + 1
> +   or <varname>checkpoint_segments</> +
> +   <xref linkend="guc-wal-keep-segments"> + 1
>     files. Each segment file is normally 16 MB (though this size can be
>     altered when building the server). You can use this to estimate space
>     requirements for <acronym>WAL</acronym>.
>    </para>

That's not true; wal_keep_segments is the minimum number of files
retained, independently of checkpoint_segments. The correct formula is
max((2 + checkpoint_completion_target) * checkpoint_segments,
wal_keep_segments).

>    <para>
> +   In archive recovery or standby mode, the server periodically performs
> +   <firstterm>restartpoints</><indexterm><primary>restartpoint</></>
> +   which are similar to checkpoints in normal operation: the server
> +   forces all its state to disk, updates the <filename>pg_control</>
> +   file to indicate that the already-processed WAL data need not be
> +   scanned again, and then recycles old log segment files if they are in
> +   the <filename>pg_xlog</> directory. Note that this recycling is not
> +   affected by <varname>wal_keep_segments</> at all. A restartpoint is
> +   triggered, if at least one checkpoint record has been replayed since
> +   the last restartpoint, every <varname>checkpoint_timeout</> seconds,
> +   or every <varname>checkpoint_segments</> log segments only in standby
> +   mode, whichever comes first.

That last sentence is a bit unclear. How about: "A restartpoint is
triggered if at least one checkpoint record has been replayed and
<varname>checkpoint_timeout</> seconds have passed since the last
restartpoint. In standby mode, a restartpoint is also triggered if
<varname>checkpoint_segments</> log segments have been replayed since the
last restartpoint and at least one checkpoint record has been replayed
since."

> +   ... In the log shipping case, the checkpoint interval
> +   on the standby is normally smaller than that on the master.
> +  </para>

What does that mean? Restartpoints can't be performed more frequently than
checkpoints in the master, because restartpoints can only be performed at
checkpoint records.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] failover vs. read only queries
On Thu, Jun 10, 2010 at 5:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Josh Berkus j...@agliodbs.com writes:
>> The fact that failover currently does *not* terminate existing queries
>> and transactions was regarded as a feature by the audience, rather than
>> a bug, when I did demos of HS/SR. Of course, they might not have been
>> thinking of the delay for writes. If there were an easy way to make the
>> trigger file cancel all running queries, apply remaining logs and come
>> up, then I'd vote for that for 9.0. I think it's the more desired
>> behavior by most users. However, I'm opposed to any complex solutions
>> which might delay the 9.0 release.
>
> My feeling about it is that if you want fast failover you should not have
> your failover target server configured as hot standby at all, let alone
> hot standby with a long max_standby_delay. Such a slave could be very far
> behind on applying WAL when the crunch comes, and no amount of query
> killing will save you from that. Put your long-running standby queries on
> a different slave instead. We should consider whether we can improve the
> situation in 9.1, but it is not a must-fix for 9.0; especially when the
> correct behavior isn't immediately obvious.

OK. Let's revisit this in 9.1.

I've attached the proposed patch for 9.1. The patch treats
max_standby_delay as zero (i.e., cancels all the conflicting queries
immediately) once the trigger file has been created. So we can bring
recovery to an end without waiting for any lock held by queries, and
minimize the failover time. OTOH, queries which don't conflict with
recovery survive the failover.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

prevent_lock_conflict_from_slowing_failover_v1.patch
Re: [HACKERS] failover vs. read only queries
On Thu, Jun 10, 2010 at 9:58 AM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:
> Fujii Masao masao.fu...@gmail.com wrote:
>> 1. Set max_standby_delay = 0 in postgresql.conf
>> 2. pg_ctl reload
>> 3. Create a trigger file
>>
>> As far as I read the HS code, SIGHUP is not checked while recovery is
>> waiting for queries :( So pg_ctl reload would have no effect on the
>> conflicting queries.

Independently of the problem I raised, I think that we should call
HandleStartupProcInterrupts() in that sleep loop.

> Hmmm, if reload doesn't work, can we write a query like the one below?
>
>     SELECT pg_terminate_backend(pid) FROM pg_locks
>     WHERE conflicted-with-recovery-process;

I'm not sure about that, but as you suggested, we can minimize the
failover time by using the following procedure even in 9.0:

1. Set max_standby_delay = 0 in postgresql.conf
2. pg_ctl reload
3. Cancel all the queries, or all the conflicting ones
4. Create a trigger file

For now, I'll use the above when building an HA system using 9.0 and a
clusterware.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby
On Thu, Jun 10, 2010 at 7:19 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> Hmm, I wonder if wal_keep_segments should take effect during recovery
> too? We don't support cascading slaves, but if you have two slaves
> connected to one master (without an archive), and you perform failover to
> one of them, without wal_keep_segments the 2nd slave might not find all
> the files it needs in the new master. Then again, that won't work without
> an archive anyway, because we error out at a TLI mismatch in replication.
> Seems like this is 9.1 material..

Yep. Since SR currently cannot get over a TLI gap, it's not worth having
wal_keep_segments take effect during recovery.

> That's not true; wal_keep_segments is the minimum number of files
> retained, independently of checkpoint_segments. The correct formula is
> max((2 + checkpoint_completion_target) * checkpoint_segments,
> wal_keep_segments).

You mean that the maximum number of WAL files is

    max { (2 + checkpoint_completion_target) * checkpoint_segments,
          wal_keep_segments }

? Just after a checkpoint removes old WAL files, there might be
wal_keep_segments WAL files. Additionally, checkpoint_segments WAL files
might be generated before the subsequent checkpoint removes old WAL files.
So I think that the maximum number is

    max { (2 + checkpoint_completion_target) * checkpoint_segments,
          wal_keep_segments + checkpoint_segments }

Am I missing something?

> That last sentence is a bit unclear. How about: "A restartpoint is
> triggered if at least one checkpoint record has been replayed and
> <varname>checkpoint_timeout</> seconds have passed since the last
> restartpoint. In standby mode, a restartpoint is also triggered if
> <varname>checkpoint_segments</> log segments have been replayed since the
> last restartpoint and at least one checkpoint record has been replayed
> since."

Thanks! Seems good.

> What does that mean? Restartpoints can't be performed more frequently
> than checkpoints in the master, because restartpoints can only be
> performed at checkpoint records.

Yes, that's what I meant.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] parser handling of large object OIDs
On Wed, Jun 9, 2010 at 10:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Wed, Jun 9, 2010 at 5:02 PM, Robert Haas robertmh...@gmail.com wrote:
>>> I believe that the comment code is probably right, because I think
>>> IConst can only handle values < 2^31, whereas OIDs can be as large as
>>> 2^32 - 1.
>>
>> I investigated this a little more and the above analysis turns out to be
>> correct. ALTER LARGE OBJECT ... OWNER and GRANT ... ON LARGE OBJECT
>> don't work for large objects outside the range of a signed integer.
>
> Yup.
>
>> Session demonstrating the problem and proposed patch attached.
>
> This patch seems extremely grotty, though. Surely that's not the way we
> were doing it in the comment code?

I pretty much just moved the existing code from CommentLargeObject() into
a new function oidparse(). I couldn't really figure out where to put the
oidparse() function, so I eventually decided on oid.c, and therefore also
ripped out the trip through the fmgr layer in favor of calling the
appropriate code directly. Other than that it's the same code. I'm open to
suggestions, but this is basically just a small bit of code rearrangement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] [PERFORM] No hash join across partitioned tables?
(moving to -hackers) On Wed, Jun 9, 2010 at 4:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: In going back through emails I had marked as possibly needing another look before 9.0 is released, I came across this issue again. As I understand it, analyze (or analyse) now collects statistics for both the parent individually, and for the parent and its children together. However, as I further understand it, autovacuum won't actually fire off an analyze unless there's enough activity on the parent table considered individually to warrant it. So if you have an empty parent and a bunch of children with data in them, your stats will still stink, unless you analyze by hand. Check. Assuming my understanding of the problem is correct, we could: (a) fix it, (b) document that you should consider periodic manual analyze commands in this situation, or (c) do nothing. Thoughts? The objections to (a) are that it might result in excessive ANALYZE work if not done intelligently, and that we haven't got a patch ready anyway. I would have liked to get to this for 9.0 but I feel it's a bit late now. I guess I can't really disagree with that. Should we try to document this in some way? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM / clang
Peter Eisentraut pete...@gmx.net writes: Quick testing shows that clang doesn't get through the configure stage on this Debian system -- it looks like some amount of better integration with glibc might be needed. Building with llvm-gcc works fine, but I understand that using llvm-gcc with native code generation isn't all that different from using gcc itself, so that's not a surprising result. The only issue is that the float8 regression test fails, so it is apparently not *exactly* the same. There's a buildfarm animal using llvm-gcc, and it passes just fine ... so the float8 failure sounds to me like another integration problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] warning message in standby
On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated forever, with no pause between attempts, in the standby. This consumes a great deal of CPU and can interfere with read-only queries. To fix this problem, should we add a sleep into emode_for_corrupt_record() or somewhere? Or should we stop walreceiver and retry reading WAL from pg_xlog or the archive? I ran into this problem at one point, too, but was in the middle of trying to investigate a different bug and didn't have time to track down what was causing it. I think the basic question here is - if there's an error in the WAL, how do we expect to EVER recover? Even if we can read from the archive or pg_xlog, presumably it's the same WAL - why should we be any more successful the second time? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] InvalidXLogRecPtr in docs
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Even then, we wouldn't need to start from the beginning of the WAL segment AFAICS. The point is to start from the Redo pointer, not from the checkpoint record, because as soon as we read the checkpoint record we'll need to start applying WAL from the Redo pointer, which is earlier. The WAL file boundaries don't come into play there. I don't believe it's a good idea to have SR not write full xlog segment files. Consider for example the following scenario: 1. SR writes some xlog file from the middle. 2. Filesystem says ah-hah, I know about sparse storage and doesn't allocate the first half of the file. 3. Failover: slave goes live. 4. xlog file gets recycled for re-use. 5. While reusing the file, we write into the first half ... or try to, but there's no disk space. 6. PANIC. There are probably some other good reasons not to allow incomplete copies of WAL files to exist on the slave system, anyway. I'm not sure if it's worth the trouble, or even a particularly smart idea, to force the output of the status function to be monotonic regardless of what happens underneath. I think removing that claim from the docs altogether is the easiest answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
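The failure mode Tom sketches hinges on the filesystem allocating no blocks for the unwritten prefix of a file written "from the middle". That behaviour is easy to observe directly (on filesystems that support holes; the 8MB figure below just stands in for a half-skipped 16MB WAL segment):

```python
import os
import tempfile

def sparse_file_sizes(hole_bytes=8 * 1024 * 1024, tail_bytes=4096):
    """Create a file with a leading hole -- written 'from the middle', as
    in step 1 of the scenario above -- and return (apparent_size,
    allocated_bytes). The hole claims no disk space until it is actually
    overwritten later, which is where step 5's out-of-space PANIC lurks."""
    fd, path = tempfile.mkstemp()
    try:
        os.lseek(fd, hole_bytes, os.SEEK_SET)  # skip the first half
        os.write(fd, b"x" * tail_bytes)        # write only the tail
        os.fsync(fd)
        st = os.fstat(fd)
        return st.st_size, st.st_blocks * 512
    finally:
        os.close(fd)
        os.unlink(path)

size, allocated = sparse_file_sizes()
print(size, allocated)  # allocated is far smaller than the apparent size
```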
Re: [HACKERS] Command to prune archive at restartpoints
On Thu, Jun 10, 2010 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-06-10 at 10:18 +0300, Heikki Linnakangas wrote: On 09/06/10 10:21, Simon Riggs wrote: On Tue, 2010-06-08 at 18:30 -0400, Andrew Dunstan wrote: I prefer archive_cleanup_command. We should name things after their principal function, not an implementation detail, IMNSHO. More importantly, we should include an example in the docs. I created one the other day when this was actually bothering me a bit (see http://people.planetpostgresql.org/andrew/index.php?/archives/85-Keeping-a-hot-standby-log-archive-clean.html). That seemed to work ok, but maybe it's too long, and maybe people would prefer a shell script to perl. I submitted a patch to make the command pg_standby -a %r That's a more portable solution, ISTM. I'll commit that and fix the docs. Huh, wait. There's no -a option in pg_standby, so I presume you're planning to add that too. I don't like confusing pg_standby into this, the docs are currently quite clear that if you want to use the built-in standby mode, you can't use pg_standby, and this would muddy the waters. It won't kill us to change that sentence. pg_standby is only used now within the cleanup command etc pg_standby already contains the exact logic we need here. Having two sets of code for the same thing isn't how we do things. Maybe we could add a new pg_cleanuparchive binary, but we'll need some discussion... Which will go nowhere, as we both already know. I have a feeling that I may be poking my nose into an incipient shouting match, but FWIW I agree with Heikki that it would be preferable to keep this separate from pg_standby. Considering that Andrew wrote this in 24 lines of Perl code (one-third of which are basically just there for logging purposes), I'm not that worried about code duplication, unless what we actually need is significantly more complicated. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] warning message in standby
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masao masao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would interfere with read-only queries. To fix this problem, we should add a sleep into emode_for_corrupt_record() or somewhere? Or we should stop walreceiver and retry to read WAL from pg_xlog or the archive? I ran into this problem at one point, too, but was in the middle of trying to investigate a different bug and didn't have time to track down what was causing it. I think the basic question here is - if there's an error in the WAL, how do we expect to EVER recover? Even if we can read from the archive or pg_xlog, presumably it's the same WAL - why should we be any more successful the second time? What warning message are we talking about? All the error cases I can think of in WAL-application are ERROR, or likely even PANIC. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command to prune archive at restartpoints
Robert Haas wrote: It won't kill us to change that sentence. pg_standby is only used now within the cleanup command etc pg_standby already contains the exact logic we need here. Having two sets of code for the same thing isn't how we do things. Well, we could factor out that part of the code so it could be used in two binaries. But ... Maybe we could add a new pg_cleanuparchive binary, but we'll need some discussion... Which will go nowhere, as we both already know. I have a feeling that I may be poking my nose into an incipient shouting match, but FWIW I agree with Heikki that it would be preferable to keep this separate from pg_standby. Considering that Andrew wrote this in 24 lines of Perl code (one-third of which are basically just there for logging purposes), I'm not that worried about code duplication, unless what we actually need is significantly more complicated. I think my logic needs a tiny piece of adjustment, to ignore the timeline segment of the file name. But that will hardly involve a great deal of extra code - just chop off the first 8 chars. It's not like the code for this in pg_standby.c is terribly complex. The virtue of a perl script is that it's very easily customizable, e.g. you might only delete files if they are older than a certain age. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
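For concreteness, the core of such a cleanup script really is small. This is a hypothetical sketch (not Andrew's actual Perl), following the pg_standby rule of comparing filenames with the timeline part (the first 8 hex digits) chopped off:

```python
import os
import re

# A WAL segment name is 24 upper-case hex digits: timeline + log + seg.
XLOG_FNAME_RE = re.compile(r"^[0-9A-F]{24}$")

def wal_files_to_remove(archive_dir, restart_wal_file):
    """Sketch of archive_cleanup_command logic: list WAL segments that
    sort before the %r file. As in pg_standby.c, the timeline part is
    ignored so segments of parent timelines aren't removed prematurely."""
    doomed = []
    for name in os.listdir(archive_dir):
        if XLOG_FNAME_RE.match(name) and name[8:] < restart_wal_file[8:]:
            doomed.append(name)
    return sorted(doomed)
```

An archive_cleanup_command wrapper would call this with %r as the second argument and unlink the returned names (adding age checks or logging as desired, which is the customizability Andrew mentions).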
[HACKERS] Error with GIT Repository
Greetings all, I have been trying to create/run a build farm as part of a project I am working on. However, I have noticed the primary git repository, git.postgresql.org/git, does not seem to be working. Namely, whenever I try to clone the directory, I receive this error: Error: Unable to find 5e4933c31d3cd2750ee1793efe6eca43055fb273e under http://git.postgresql.org/git/postgresql.git Cannot obtain needed blob 5e4933c31d3cd2750ee1793efe6eca4305fb273e while processing commit c5609c66ce2ee4fdb180be95721252b47f90499 Error: fetch failed. I thought it would be prudent to notify the list so someone could possibly check into this. Thanks! Scott Luxenberg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix use of posix_fadvise in xlog.c
On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 10/06/10 06:47, Mark Wong wrote: I wanted to propose a fix to xlog.c regarding the use of posix_fadvise() for 9.1 (unless someone feels it's ok for 9.0). Currently posix_fadvise() is used right before a log file is closed, so it's effectively not doing anything when it is called. This patch moves the posix_fadvise() call into 3 other locations within XLogFileInit() where a file handle is returned. The first case is where an existing open file handle is returned. The next case is when a file is to be zeroed out. The third case is returning a file handle, which may be the file that was just zeroed out. I don't think POSIX_FADV_DONTNEED does what you think it does. It tells the kernel that you don't need to keep these pages in the cache anymore, I won't be accessing them anymore. If you call it when you open the file, before reading/writing, there is nothing in the cache and the call will do nothing. Oops, my bad. I think I was confused by the short description in the man page. I didn't read the longer description. :( Then would it be worth making this call after the file is zeroed out? Regards, Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
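At the syscall level, Mark's follow-up question amounts to issuing the hint only once the zeroed data has actually been written. A minimal sketch (Linux; Python's os.posix_fadvise wraps the same call discussed here):

```python
import os
import tempfile

def write_and_drop_cache(path, data):
    """Write a file, flush it to disk, then hint the kernel that the
    cached pages won't be needed again (POSIX_FADV_DONTNEED). The hint is
    issued *after* the writes -- before any I/O there is nothing cached,
    so the hint would be a no-op, as Heikki points out above."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        # DONTNEED may be ignored for still-dirty pages, so sync first.
        os.fsync(fd)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

fd, path = tempfile.mkstemp()
os.close(fd)
write_and_drop_cache(path, b"\0" * 8192)  # zero-fill, then drop from cache
print(os.stat(path).st_size)  # -> 8192
os.unlink(path)
```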
Re: [HACKERS] Command to prune archive at restartpoints
On 10/06/10 17:38, Andrew Dunstan wrote: I think my logic needs a tiny piece of adjustment, to ignore the timeline segment of the file name. I'm not sure you should ignore it. Presumably anything in an older timeline is indeed not required anymore and can be removed, and anything in a newer timeline... how did it get there? Seems safer not to remove it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
Luxenberg, Scott I. wrote: Greetings all, I have been trying to create/run a build farm as part of a project I am working on. That seems an odd thing to do since we have one ... However, I have noticed the primary git repostitory, git.postgresql.org/git, does not seem to be working. Namely, whenever I try to clone the directory, I receive this error: Error: Unable to find 5e4933c31d3cd2750ee1793efe6eca43055fb273e under http://git.postgresql.org/git/postgresql.git Cannot obtain needed blob 5e4933c31d3cd2750ee1793efe6eca4305fb273e while processing commit c5609c66ce2ee4fdb180be95721252b47f90499 Error: fetch failed. I thought it would be prudent to notify the list so someone could possibly check into this. Why are you cloning over http? Here is the best way to clone, which seems to be working: [and...@sophia ]$ git clone --mirror git://git.postgresql.org/git/postgresql.git Initialized empty Git repository in /home/andrew/postgresql.git/ remote: Counting objects: 376865, done. remote: Compressing objects: 100% (87569/87569), done. remote: Total 376865 (delta 310187), reused 352950 (delta 287485) Receiving objects: 100% (376865/376865), 178.73 MiB | 251 KiB/s, done. Resolving deltas: 100% (310187/310187), done. [and...@sophia ]$ cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command to prune archive at restartpoints
Heikki Linnakangas wrote: On 10/06/10 17:38, Andrew Dunstan wrote: I think my logic needs a tiny piece of adjustment, to ignore the timeline segment of the file name. I'm not sure you should ignore it. Presumably anything in an older timeline is indeed not required anymore and can be removed, and anything in a newer timeline... how did it get there? Seems safer not to remove it. Well, I was just following the logic in pg_standby.c:

    /*
     * We ignore the timeline part of the XLOG segment identifiers
     * in deciding whether a segment is still needed.  This
     * ensures that we won't prematurely remove a segment from a
     * parent timeline.  We could probably be a little more
     * proactive about removing segments of non-parent timelines,
     * but that would be a whole lot more complicated.
     *
     * We use the alphanumeric sorting property of the filenames
     * to decide which ones are earlier than the
     * exclusiveCleanupFileName file.  Note that this means files
     * are not removed in the order they were originally written,
     * in case this worries you.
     */
    if (strlen(xlde->d_name) == XLOG_DATA_FNAME_LEN &&
        strspn(xlde->d_name, "0123456789ABCDEF") == XLOG_DATA_FNAME_LEN &&
        strcmp(xlde->d_name + 8, exclusiveCleanupFileName + 8) < 0)

cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix use of posix_fadvise in xlog.c
On 10/06/10 18:17, Mark Wong wrote: On Jun 9, 2010, at 11:25 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't think POSIX_FADV_DONTNEED does what you think it does. It tells the kernel that you don't need to keep these pages in the cache anymore, I won't be accessing them anymore. If you call it when you open the file, before reading/writing, there is nothing in the cache and the call will do nothing. Oops, my bad. I think I was confused by the short description in the man page. I didn't read the longer description. :( Then would it be worth making this call after the file is zeroed out? Not sure. If you're churning through WAL files at a reasonable speed, the zeroed-out file will soon be written to again. OTOH, we always write whole pages, so maybe the OS is smart enough not to read the page back into memory just to overwrite it. In a steady-state situation new WAL files are not created very often because we recycle old ones, so it probably doesn't make much difference. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix use of posix_fadvise in xlog.c
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: In a steady-state situation new WAL files are not created very often because we recycle old ones, so it probably doesn't make much difference. Yeah. We really don't worry too much about the performance of the new-WAL-file-creation code path because of this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about closing some Open Items?
On Wed, Jun 9, 2010 at 8:01 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Thanks for clearing the list. There are only 5 remaining items, which is kinda exciting, though Tom's assertion that HS is still bug-ridden is a bit off-putting. It's a big piece of subtle code and it's quite possible it contains bugs. But people know that and as a result it's received a lot of testing and careful thought already. The chances it has bugs of its own are probably lower than for other major projects in the past. On the other hand it's recovery-related and it shakes me that we have no regression tests for recovery, let alone standby databases. What's scarier are two other cases: 1) There are use cases of varying degrees of obscurity which haven't been explicitly covered where the behaviour is not what people would expect. We've already fixed a few such cases, such as shutdown semantics and setting up a standby based on a backup of initdb results before starting up the database. This is the kind of thing we need lots of users testing their real workloads with and doing test failovers and so on. 2) There are unrelated areas of the database which have collateral damage that nobody expected and thought to test for. Hopefully we have enough regression tests to detect this kind of thing, but again, as there are no regression tests for recovery we could have bugs in other systems that don't turn up until you use those systems on a standby database or after running the system as a standby database and then bringing it up. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] warning message in standby
On 10/06/10 17:38, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: On Mon, Jun 7, 2010 at 9:21 AM, Fujii Masaomasao.fu...@gmail.com wrote: When an error is found in the WAL streamed from the master, a warning message is repeated without interval forever in the standby. This consumes CPU load very much, and would interfere with read-only queries. To fix this problem, we should add a sleep into emode_for_corrupt_record() or somewhere? Or we should stop walreceiver and retry to read WAL from pg_xlog or the archive? I ran into this problem at one point, too, but was in the middle of trying to investigate a different bug and didn't have time to track down what was causing it. I think the basic question here is - if there's an error in the WAL, how do we expect to EVER recover? Even if we can read from the archive or pg_xlog, presumably it's the same WAL - why should we be any more successful the second time? What warning message are we talking about? All the error cases I can think of in WAL-application are ERROR, or likely even PANIC. We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means you've reached end of WAL. In standby mode, when streaming WAL from master, that shouldn't happen, and it's not clear what to do if it does. PANIC is not a good idea, at least if the server uses hot standby, because that only makes the situation worse from availability point of view. So we log the error as a WARNING, and keep retrying. It's unlikely that the problem will just go away, but we keep retrying anyway in the hope that it does. However, it seems that we're too aggressive with the retries. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
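The "too aggressive" retry could be tamed along the lines Fujii suggested: keep retrying, but sleep between attempts and suppress the duplicate warning while stuck on the same corrupt record. This is only an illustrative model of that policy (hypothetical class and names, not the xlog.c code):

```python
import time

class CorruptRecordLogger:
    """Sketch of a rate-limited retry: warn once per distinct record
    location, then back off between repeated attempts so the standby
    isn't pinned at 100% CPU emitting the same WARNING."""

    def __init__(self, retry_delay=5.0):
        self.retry_delay = retry_delay
        self.last_lsn = None

    def report(self, lsn, message, sleep=time.sleep):
        first_time = lsn != self.last_lsn
        if first_time:
            print("WARNING: %s at %s" % (message, lsn))
            self.last_lsn = lsn
        else:
            sleep(self.retry_delay)  # back off before retrying same record
        return first_time

log = CorruptRecordLogger()
log.report("0/4005330", "invalid record length")  # warns once
log.report("0/4005330", "invalid record length", sleep=lambda s: None)
```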
Re: [HACKERS] warning message in standby
On Thu, Jun 10, 2010 at 12:01 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means you've reached end of WAL. In standby mode, when streaming WAL from master, that shouldn't happen, and it's not clear what to do if it does. PANIC is not a good idea, at least if the server uses hot standby, because that only makes the situation worse from availability point of view. So we log the error as a WARNING, and keep retrying. It's unlikely that the problem will just go away, but we keep retrying anyway in the hope that it does. However, it seems that we're too aggressive with the retries. You can reproduce this problem by doing the following. 1. initdb 2. edit postgresql.conf, set wal_level=hot_standby, max_wal_senders=1; edit pg_hba.conf, trust local replication connections 3. pg_ctl start 4. make some changes to the database 5. take a hot backup to another directory (call it pgslave) 6. create pgslave/recovery.conf with standby_mode='on', primary_conninfo=whatever, edit pgslave/postgresql.conf change the port number, set hot_standby=on 7. pg_ctl start -D pgslave At this point you should have a working HS/SR setup. Now: 8. shut the slave down 9. move recovery.conf out of the way 10. restart the slave - it will do recovery and enter normal running 11. make some database changes 12. stop the slave 13. put recovery.conf back 14. restart the slave 15. make a bunch of changes on the master When the slave then tries to replay, you then get something like: WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 ...ad infinitum. Obviously there are other ways this could occur - the WAL could really be corrupted, for example - but the current handling is not too graceful. 
I'm actually thinking it might be better to trigger a shutdown if this happens. Probably something has gone haywire and manual intervention is required. Retrying when there's no hope of success isn't really that helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
Stephen Frost wrote: * Andrew Dunstan (and...@dunslane.net) wrote: Luxenberg, Scott I. wrote: I have been trying to create/run a build farm as part of a project I am working on. That seems an odd thing to do since we have one ... To clarify, he's setting up a build farm *member*. :) Aha. Amazing the difference one little word can make ... As a side-note, it works just fine from git-hub's http mirror and that's what we've been playing with, but I don't know if we want to recommend that for build-farm members.. I don't see why not. Buildfarm members are going to have to reset their repos when we finally cut over in a few months. Luckily, this is a fairly painless operation - blow away the repo and change the config file and the script will resync as if nothing had happened. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
* Andrew Dunstan (and...@dunslane.net) wrote: I don't see why not. Buildfarm members are going to have to reset their repos when we finally cut over in a few months. Luckily, this is a fairly painless operation - blow away the repo and change the config file and the script will resync as if nothing had happened. Should we stop bothering to offer http://git.postgresql.org then..? Or do we expect it to get fixed and work correctly once we cut over and rebuild? Also, perhaps we could list the git-hub option on the wiki (http://wiki.postgresql.org/wiki/Other_Git_Repositories)? (and, yea, it's the same me) Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] failover vs. read only queries
On 06/09/2010 07:36 PM, Mark Kirkwood wrote: On 10/06/10 14:07, Tatsuo Ishii wrote: The one of top 3 questions I got when we propose them our HA solution is, how long will it take to do failover when the master DB crashes? Same here +1 In that case, wouldn't they set max_standby_delay to 0? In which case the failover problem goes away, no? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] warning message in standby
On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas robertmh...@gmail.com wrote: At this point you should have a working HS/SR setup. Now: 8. shut the slave down 9. move recovery.conf out of the way 10. restart the slave - it will do recovery and enter normal running 11. make some database changes 12. stop the slave 13. put recovery.conf back 14. restart the slave 15. make a bunch of changes on the master When the slave then tries to replay, you then get something like: WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 Woah, why does this procedure lead to this situation? I would hope there's nothing a user could do which would cause it short of invoking dd to corrupt the WAL files. At precisely which step of the procedure did the user do something wrong? Is there any reason we can't detect that they've done it and throw a specific error message saying the configuration is invalid? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] warning message in standby
On Thu, Jun 10, 2010 at 12:49 PM, Greg Stark gsst...@mit.edu wrote: On Thu, Jun 10, 2010 at 5:13 PM, Robert Haas robertmh...@gmail.com wrote: At this point you should have a working HS/SR setup. Now: 8. shut the slave down 9. move recovery.conf out of the way 10. restart the slave - it will do recovery and enter normal running 11. make some database changes 12. stop the slave 13. put recovery.conf back 14. restart the slave 15. make a bunch of changes on the master When the slave then tries to replay, you then get something like: WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 WARNING: invalid record length at 0/4005330 Woah, why does this procedure lead to this situation? I would hope there's nothing a user could do which would cause it short of invoking dd to corrupt the WAL files. At precisely which step of the procedure did the user do something wrong? 13. Is there any reason we can't detect that they've done it and throw a specific error message saying the configuration is invalid? I'm not sure how we'd go about doing that, but I agree it would be nice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
On Thu, Jun 10, 2010 at 18:20, Stephen Frost sfr...@snowman.net wrote: * Andrew Dunstan (and...@dunslane.net) wrote: I don't see why not. Buildfarm members are going to have to reset their repos when we finally cut over in a few months. Luckily, this is a fairly painless operation - blow away the repo and change the config file and the script will resync as if nothing had happened. Should we stop bothering to offer http://git.postgresql.org then..? Or No, we should not. Especially if someone has a clue how to do it. The last time I fixed it by running repack, but that didn't work this time. I have no clue why it's asking for a file that doesn't exist. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Seems I underestimated the importance of forensic breadcrumbs. On 6/9/2010 12:09 PM, Tom Lane wrote: I do like the idea of using a status bit rather than FrozenXid to mark a frozen tuple, because that eliminates the conflict between wanting to freeze aggressively for performance reasons and wanting to preserve Xids for forensic reasons. But it doesn't seem to do much for Josh's original problem. Depends. Specifically on transaction profiles and how long the blocks linger around before being written. If you can set the all-visible bit by the time the page is written the first time, what other bit, including the is-frozen one, could not be set at that time too? Maybe some analysis on the typical behavior of such systems is in order. Especially the case Josh was mentioning seems to be a typical single-insert logging style application, with little else going on on that particular database. I can't reveal specifics about that particular case, but think of something like taking frequent sensor readings that need to be kept for years for forensics in case there is a product recall some day. And even if some cases still required another page write because those frozen bits cannot be set on first write, this seems to be a win-win. We would get rid of the FrozenXid completely and shift to a bit, so we can effectively have a min_freeze_age of zero while keeping the xids forever. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command to prune archive at restartpoints
On 10/06/10 22:24, Dimitri Fontaine wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: Maybe we could add a new pg_cleanuparchive binary, but we'll need some discussion... Would this binary ever be used manually, not invoked by PostgreSQL? As it depends on the %r option to be given and to be right, I don't think so. Hmm, actually it would be pretty handy. To make use of a base backup, you need all the WAL files following the one where pg_start_backup() was called. We create a .backup file in the archive to indicate that location, like: 0001002F.0020.backup So to clean up all WAL files older than those needed by that base backup, you would simply copy-paste that location and call pg_cleanuparchive: pg_cleanuparchive /walarchive/ 0001002F Of course, if there's a perl one-liner to do that, we can just put that in the docs and don't really need pg_cleanuparchive at all. Therefore my take on this problem is to provide internal commands here, that maybe wouldn't need to be explicitly passed any argument. If they're internal they certainly can access to the information they need? You want more flexibility in more advanced cases. Like if you have multiple standbys sharing the archive, you only want to remove old WAL files after they're not needed by *any* of the standbys anymore. Doing the cleanup directly in the archive_cleanup_command would cause the old WAL files to be removed prematurely, but you could put a shell script there to store the location to a file, and call pg_cleanuparchive with the max() of the locations reported by all standby servers. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On Thu, Jun 3, 2010 at 11:41 AM, Greg Stark gsst...@mit.edu wrote: I think to make it work you need to store a whole 64-bit reference transaction id consisting of both a cycle counter and a transaction id. The invariant for the page is that every xid on the page can be compared to that reference transaction id using normal transactionid semantics. Actually I think the easiest way to do that is to set it to the oldest xid on the page. The first thing to do before comparing any transaction id on the page with a real transaction id would be to figure out whether the reference xid is comparable to the live xid, which if it's the oldest xid on the page implies they'll all be comparable. The way to maintain that invariant would be that any xid insertion on the page must advance the reference xid if it's not comparable to the newly inserted xid. It has to be advanced to the oldest xid that's still comparable with the newly inserted xid. Any xids on the page that are older than the new reference xid have to be frozen or removed. I'm not sure how to do that without keeping clog forever though. So the more I think about this, the more I think it's unavoidable that we would need to retain clog forever. I think the goal here is to be able to load data into the database and then never write the data ever again. Even if you visit the page years later after the transaction ids have wrapped around several times. In that case there's no avoiding that you'll need to know whether that transaction committed or aborted. Now we could make a bet that most transactions commit, and therefore we could keep a list of aborted transactions only, which we might be able to keep forever in very little space if very few transactions abort. Presumably we would only use this form once the transaction was about to be truncated out of clog. I'm not too happy with the assumption that there aren't many aborts though. 
Someone could come along with a use case where they have lots of aborts and run into strange limitations and performance characteristics. Alternatively we could do something like keeping a list of tables touched by any transaction. Then vacuum could look for any non-committed transactions old enough to be in danger of aging out of clog and ensure those tables are frozen. But any tables which have never been touched by any such old transaction could be left alone. when we read in the page we'll be able to recognize the old transactions as committed if they're beyond the end of the clog horizon. I don't really like that idea either because it leaves performance really quite unpredictable. I could have a large table that goes unvacuumed for a long time -- then when I come along with some tiny query where I hit C-c and cause an abort I suddenly set a trap which causes a huge vacuum freeze to fire off. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
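The invariant Greg describes — every on-page xid comparable with a 64-bit reference xid that is the oldest xid on the page — makes widening a 32-bit on-page xid mechanical. A sketch under that framing (names and the plain 64-bit ordering are illustrative; this is not PostgreSQL's actual xid-comparison machinery, which uses modulo arithmetic):

```c
#include <stdint.h>

/* Widen a 32-bit on-page xid to 64 bits using a reference xid known to
 * be the oldest xid on the page.  Because every on-page xid is >= ref,
 * a candidate that appears older than ref must belong to the next
 * 2^32 epoch. */
uint64_t
widen_xid(uint32_t xid, uint64_t ref)
{
    uint64_t candidate = (ref & ~(uint64_t) 0xFFFFFFFF) | xid;

    if (candidate < ref)
        candidate += (uint64_t) 1 << 32;
    return candidate;
}
```

Insertion then maintains the invariant the other way around: if a new xid is not comparable with ref, ref is advanced and any on-page xids older than the new ref are frozen or removed first.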
Re: [HACKERS] Command to prune archive at restartpoints
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Maybe we could add a new pg_cleanuparchive binary, but we'll need some discussion... Would this binary ever be used manually, not invoked by PostgreSQL? As it depends on the %r option to be given and to be right, I don't think so. Therefore my take on this problem is to provide internal commands here that maybe wouldn't need to be explicitly passed any argument. If they're internal, they can certainly access the information they need? As a user, I'd find it so much better to trust PostgreSQL to propose sane defaults. As a developer, you will certainly find it easier to maintain, document and distribute. While at it, the other internal command we need is pg_archive_bypass for the archive_command, so that Windows users have the /usr/bin/true option too. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command to prune archive at restartpoints
On Thu, 2010-06-10 at 22:49 +0300, Heikki Linnakangas wrote: On 10/06/10 22:24, Dimitri Fontaine wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: Maybe we could add a new pg_cleanuparchive binary, but we'll need some discussion... Would this binary ever be used manually, not invoked by PostgreSQL? As it depends on the %r option to be given and to be right, I don't think so. Hmm, actually it would be pretty handy. To make use of a base backup, you need all the WAL files following the one where pg_start_backup() was called. We create a .backup file in the archive to indicate that location, like: 0001002F.0020.backup So to clean up all WAL files older than those needed by that base backup, you would simply copy-paste that location and call pg_cleanuparchive: pg_cleanuparchive /walarchive/ 0001002F OK, sounds like we're on the same thought train. Here's the code. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services pg_archivecleanup.tar Description: Unix tar archive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
Excerpts from Andrew Dunstan's message of jue jun 10 11:26:59 -0400 2010: Why are you cloning over http? Here is the best way to clone, which seems to be working: [and...@sophia ]$ git clone --mirror git://git.postgresql.org/git/postgresql.git Initialized empty Git repository in /home/andrew/postgresql.git/ In case you're a git-ignorant like me and are wondering why the above does not produce a usable checkout, the complete recipe is here: http://archives.postgresql.org/message-id/20090602162347.gf23...@yugib.highrise.ca (in short, you need a git clone --reference) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix use of posix_fadvise in xlog.c
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: In a steady-state situation new WAL files are not created very often because we recycle old ones, so it probably doesn't make much difference. Yeah. We really don't worry too much about the performance of the new-WAL-file-creation code path because of this. The only situation where the WAL zeroing path turns ugly is if you launch a bunch of activity against a fresh server that doesn't have any segments to recycle yet. The last time we talked about improving that, the best idea I thought came out was to be better about preallocating segments than the code already is, rather than trying to speed up how the kernel deals with the situation. See the links for Be more aggressive about creating WAL files at http://wiki.postgresql.org/wiki/Todo I'm also not very optimistic about adding more posix_fadvise calls really helping just because the implementations of those are so unpredictable across operating systems. I'm sure that Mark could figure out the right magic to speed up this specific case on Linux, but have my doubts that work would translate very well to many other operating systems. Whereas a more generic preallocation improvement would help everywhere. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
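Heikki's underlying point in this thread is that POSIX_FADV_DONTNEED only makes sense after the pages actually exist in the page cache, i.e. after writing (and flushing), never at open time. A sketch of the correct ordering, with a hypothetical helper name and assuming POSIX semantics:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/* Write a buffer, fsync it, then tell the kernel we won't re-read the
 * cached pages.  Calling POSIX_FADV_DONTNEED before any I/O would be a
 * no-op: there is nothing in the cache to drop yet. */
int
write_then_drop(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len ||
        fsync(fd) != 0 ||
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Even placed correctly, how much this helps varies by kernel, which is the portability concern raised above.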
Re: [HACKERS] Bug / shortcoming in has_*_privilege
On Thu, Jun 10, 2010 at 5:54 PM, Jim Nasby j...@nasby.net wrote: test...@workbook=# select has_table_privilege( 'public', 'test', 'SELECT' ); ERROR: role public does not exist test...@workbook=# So there's no way to see if a particular privilege has been granted to public. ISTM 'public' should be accepted, since you can't use it as a role name anyway... test...@workbook=# create role public; ERROR: role name public is reserved test...@workbook=# create role public; ERROR: role name public is reserved It's a bit sticky - you could make that work for has_table_privilege(name, oid, text) or has_table_privilege(name, text, text), but what would you do about the versions whose first argument is an oid? It would seem a bit awkward to have the behavior be asymmetrical, although I guess we could... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bug / shortcoming in has_*_privilege
test...@workbook=# select has_table_privilege( 'public', 'test', 'SELECT' ); ERROR: role public does not exist test...@workbook=# So there's no way to see if a particular privilege has been granted to public. ISTM 'public' should be accepted, since you can't use it as a role name anyway... test...@workbook=# create role public; ERROR: role name public is reserved test...@workbook=# create role public; ERROR: role name public is reserved -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/4/10, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark gsst...@mit.edu wrote: A function which takes a starting xid and a number of transactions to return seems very tied to one particular application. I could easily see other systems such as a multi-master system instead only wanting to compare two transactions to find out which committed first. Or non-replication applications where you have an LSN and want to know whether a given transaction had committed by that time. So one possible interface would be to do something like xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with an optional argument to limit the number or records returned. I'm imagining that the backend data storage for this would be a file containing, essentially, a struct for each commit repeated over and over again, packed tightly. It's easy to index into such a file using a sequence number (give me the 1000'th commit) but searching by LSN would require (a) storing the LSNs and (b) binary search. Maybe it's worth adding that complexity, but I'm not sure that it is. Keeping the size of this file small is important for ensuring that it has minimal performance impact (which is also why I'm not sold on trying to include the tuple counters that Jan proposed - I think we can solve the problem he's worried about there more cleanly in other ways). AIUI, you index the file by offset. I think we should be very careful about assuming that we understand replication and its needs better than someone who has spent many years developing one of the major PostgreSQL replication solutions. Well the flip side of that is that we want an interface that's useful for more than just one replication system. This is something basic enough that I think it will be useful for more than just replication if we design it generally enough. 
It should be useful for backup/restore processes and monitoring as well as various forms of replication including master-slave trigger based systems but also including PITR-based replication, log-parsing systems, multi-master trigger based systems, 2PC-based systems, etc. Making it general enough to serve multiple needs is good, but we've got to make sure that the extra complexity is buying us something. Jan seems pretty confident that this could be used by Londiste also, though it would be nice to have some confirmation from the Londiste developer(s) on that. I think it may also have applications for distributed transactions and multi-master replication, but I am not too sure it helps much for PITR-based replication or log-parsing systems. We want to design something that is good, but trying to solve too many problems may end up solving none of them well. The potential for single shared queue implementation, with the additional potential for merging async replication implementations sounds attractive. (Merging ~ having single one that satisfies broad range of needs.) Unless the functionality accepted into core will be limited to replication only and/or performs worse than current snapshot-based grouping. Then it is uninteresting, of course. Jan's proposal of storing small struct into segmented files sounds like it could work. Can't say anything more because I can't imagine it as well as Jan. Would need to play with working implementation to say more... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
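Robert's packed-struct file is trivial to index by sequence number, while searching by LSN would indeed require storing LSNs and a binary search. A sketch of the fixed-size record and the offset arithmetic (the record layout is hypothetical, not a proposed on-disk format):

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* One commit, packed tightly: 24 bytes on typical LP64 platforms. */
typedef struct CommitRecord
{
    uint32_t    xid;        /* committed transaction */
    uint32_t    pad;        /* keep the 8-byte alignment explicit */
    uint64_t    lsn;        /* commit WAL location */
    int64_t     timestamp;  /* commit time */
} CommitRecord;

/* "Give me the 1000th commit" is a pure offset computation... */
off_t
commit_record_offset(uint64_t seqno)
{
    return (off_t) (seqno * sizeof(CommitRecord));
}

/* ...followed by one pread; no search structure is needed. */
int
read_commit_record(int fd, uint64_t seqno, CommitRecord *out)
{
    return pread(fd, out, sizeof(*out),
                 commit_record_offset(seqno)) == (ssize_t) sizeof(*out) ? 0 : -1;
}
```

This is what keeps the file small and the sequence-number interface cheap; an LSN-keyed lookup would add the binary-search complexity discussed above.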
Re: [HACKERS] ps display waiting for max_standby_delay
Takahiro Itagaki wrote: Hi, We have code to change the ps display for the recovery process during hot standby. The current code always shows max_standby_delay in the message, but how about showing the actual waiting time instead? Since DBAs can always get the parameter from the postgresql.conf they wrote, the parameter value itself is not so useful. The actual waiting time might be more useful to determine which value to set for max_standby_delay, no? [backend/storage/ipc/standby.c] snprintf(new_status + len, 50, "waiting for max_standby_delay (%d ms)", MaxStandbyDelay); == GetCurrentTimestamp() - waitStart set_ps_display(new_status, false); I think a SQL-based activity view will be more useful than the ps display, but it's an item for 9.1. Sounds interesting, but how often would the ps status display be updated? I hope not too often. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
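The elapsed-wait computation Itagaki suggests (GetCurrentTimestamp() - waitStart) reduces to plain timestamp arithmetic; a sketch using bare scalars rather than PostgreSQL's TimestampTz, with a hypothetical function name:

```c
/* Milliseconds between two (seconds, microseconds) timestamps; this is
 * the value a "waiting (%ld ms)" ps display would show instead of the
 * static MaxStandbyDelay setting. */
long
elapsed_ms(long start_sec, long start_usec, long now_sec, long now_usec)
{
    return (now_sec - start_sec) * 1000L + (now_usec - start_usec) / 1000L;
}
```

Bruce's update-frequency concern applies here: each refresh of the number costs a set_ps_display() call, so the interval matters.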
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Jan Wieck janwi...@yahoo.com writes: Depends. Specifically on transaction profiles and how long the blocks linger around before being written. If you can set the all visible bit by the time, the page is written the first time, what bit including the is-frozen one cannot be set at that time too? All-visible and is-frozen would be the same bit ... And even if some cases still required another page write because those frozen bits cannot be set on first write, this seems to be a win-win. We would get rid of the FrozenXid completely and shift to a bit, so we can effectively have a min_ freeze_age of zero while keeping the xid's forever. Right. I don't see any downside, other than eating another status bit per tuple, which we can afford. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ps display waiting for max_standby_delay
Bruce Momjian br...@momjian.us wrote: how about showing actual waiting time instead? "waiting for max_standby_delay (%d ms)", MaxStandbyDelay) Sounds interesting, but how often would the ps status display be updated? I hope not too often. We can change the interval of updates to 500ms or so if we do it, but on reflection I think the ps display is not the best place for the information. I'd like to change the additional message "waiting for max_standby_delay" to just "waiting", because we don't use "waiting for statement_timeout" for normal queries. If we need additional information about conflicts in recovery, we would supply it with SQL views instead of the ps display in 9.1. Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] failover vs. read only queries
On Fri, Jun 11, 2010 at 1:48 AM, Josh Berkus j...@agliodbs.com wrote: On 06/09/2010 07:36 PM, Mark Kirkwood wrote: On 10/06/10 14:07, Tatsuo Ishii wrote: The one of top 3 questions I got when we propose them our HA solution is, how long will it take to do failover when the master DB crashes? Same here +1 In that case, wouldn't they set max_standby_delay to 0? In which case the failover problem goes away, no? Yes, but I guess they'd also like to run read only queries on the standby. Setting max_standby_delay to 0 would prevent them from doing that because the conflict with the replay of the VACUUM or HOT record would often happen. vacuum_defer_cleanup_age would be helpful for that case, but it seems to be hard to tune that. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] vacuum_defer_cleanup_age
Hi, vacuum_defer_cleanup_age is categorized as Statement Behavior parameter in the document. On the other hand, it's categorized as Hot Standby one in postgresql.conf. Why do we need to do so? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ps display waiting for max_standby_delay
On Fri, Jun 11, 2010 at 11:20 AM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: Bruce Momjian br...@momjian.us wrote: how about showing actual waiting time instead? "waiting for max_standby_delay (%d ms)", MaxStandbyDelay) Sounds interesting, but how often would the ps status display be updated? I hope not too often. We can change the interval of updates to 500ms or so if we do it, but on reflection I think the ps display is not the best place for the information. I'd like to change the additional message "waiting for max_standby_delay" to just "waiting", because we don't use "waiting for statement_timeout" for normal queries. +1 I don't think that it's useful to display the value of max_standby_delay. If we need additional information about conflicts in recovery, we would supply it with SQL views instead of the ps display in 9.1. +1 Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug / shortcoming in has_*_privilege
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 10, 2010 at 5:54 PM, Jim Nasby j...@nasby.net wrote: So there's no way to see if a particular privilege has been granted to public. ISTM 'public' should be accepted, since you can't use it as a role name anyway... It's a bit sticky - you could make that work for has_table_privilege(name, oid, text) or has_table_privilege(name, text, text), but what would you do about the versions whose first argument is an oid? Nothing. The only reason to use those forms is in a join against pg_authid, and the public group doesn't have an entry there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM / clang
On tor, 2010-06-10 at 11:55 +0300, Peter Eisentraut wrote: Quick testing shows that clang doesn't get through the configure stage on this Debian system -- it looks like some amount of better integration with glibc might be needed. Some details on this ... configure has two problems. The first is a "present but cannot be compiled" warning about wctype.h. This is described here: http://llvm.org/bugs/show_bug.cgi?id=6691. It looks like glibc 2.11 or some later version will fix this. (eglibc 2.11 doesn't have the fix yet.) But this doesn't cause a problem during the compile. The second problem is that the prototype check for accept() fails. This is because glibc defines the second argument to be a transparent union, apparently to make it look like a lot of things at once. clang apparently doesn't understand that. One could address this by checking for the typedef that glibc uses explicitly in the configure check, but that would appear to defeat the point of the *transparent* union. A workaround is to remove -D_GNU_SOURCE from src/template/linux. Predictably, this will make PL/Perl fail to build. Also, it will make src/backend/libpq/auth.c fail to build, because struct ucred is only defined when _GNU_SOURCE is used. This would actually fail to work on GCC as well, so I think we should add an explicit configure check for struct ucred. The rest of the build goes through and the regression tests pass. Some new warnings, however: xlog.c:7759:22: warning: self-comparison always results in a constant value max_locks_per_xact != max_locks_per_xact) ^ Looks like a bug. postmaster.c:3386:18: warning: more data arguments than '%' conversions [-Wformat-extra-args] remote_host, remote_port); ^ dt_common.c:818:75: warning: more data arguments than '%' conversions [-Wformat-extra-args] sprintf(str + strlen(str), (min != 0) ? "%+03d:%02d" : "%+03d", hour, min); ~~~^ [and a few more like that] These are instances where the format string is an expression, resulting in a variable number of format arguments. Not sure if that is actually legal in C. print.c:778:22: warning: field width should have type 'int', but argument has type 'unsigned int' [-Wformat] fprintf(fout, "%-*s%s\n", (width_total - width) / 2, "", ^ ~ [and a few more like that] Not sure about that. Also there are boatloads of warnings in the regex stuff about unused things, that we probably don't have to worry about. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
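The print.c warning arises because C requires the argument supplying a '*' field width to be an int, while the expression there is unsigned; an explicit cast resolves it. A minimal reproduction of the pattern (the names stand in for the ones in print.c):

```c
#include <stdio.h>

/* Centered padding as in print.c: the '*' width argument must be an
 * int, so the unsigned arithmetic gets an explicit cast. */
int
padded_print(FILE *out, const char *s, unsigned int width_total, unsigned int width)
{
    return fprintf(out, "%-*s%s\n", (int) ((width_total - width) / 2), "", s);
}
```

With the cast, both clang and gcc accept the call without a -Wformat complaint, and the behavior is unchanged as long as the computed width fits in an int.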
Re: [HACKERS] warning message in standby
On Fri, Jun 11, 2010 at 1:01 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: We're talking about a corrupt record (incorrect CRC, incorrect backlink etc.), not errors within redo functions. During crash recovery, a corrupt record means you've reached end of WAL. In standby mode, when streaming WAL from master, that shouldn't happen, and it's not clear what to do if it does. PANIC is not a good idea, at least if the server uses hot standby, because that only makes the situation worse from availability point of view. So we log the error as a WARNING, and keep retrying. It's unlikely that the problem will just go away, but we keep retrying anyway in the hope that it does. However, it seems that we're too aggressive with the retries. Right. The attached patch calms down the retries: if we found an invalid record while streaming WAL from master, we sleep for 5 seconds (needs to be reduced?) before retrying to replay the record which is in the same location where the invalid one was found. Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center calm_down_retries_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
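The pacing the patch introduces — retry the same record with a fixed sleep between attempts — follows a generic pattern; a sketch with a hypothetical attempt callback (the real patch sleeps inside the WAL replay loop, not in a helper like this):

```c
#include <unistd.h>

/* Call attempt() until it succeeds or max_tries is exhausted, sleeping
 * delay_us microseconds between attempts (the patch waits 5 seconds
 * between retries of an invalid record). */
int
retry_until(int (*attempt)(void *), void *arg, int max_tries, unsigned int delay_us)
{
    for (int i = 0; i < max_tries; i++)
    {
        if (attempt(arg))
            return 1;
        usleep(delay_us);
    }
    return 0;
}

/* Demo callback: fail twice, then succeed. */
static int demo_calls;
static int
succeed_on_third(void *arg)
{
    (void) arg;
    return ++demo_calls >= 3;
}
```

In the standby case max_tries is effectively unbounded, since the server keeps hoping the corrupt record will be re-streamed correctly; the point of the patch is only the delay between attempts.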
Re: [HACKERS] LLVM / clang
Peter Eisentraut pete...@gmx.net writes: [ assorted LLVM warnings ] dt_common.c:818:75: warning: more data arguments than '%' conversions [-Wformat-extra-args] sprintf(str + strlen(str), (min != 0) ? "%+03d:%02d" : "%+03d", hour, min); ~~~^ [and a few more like that] These are instances where the format string is an expression, resulting in a variable number of format arguments. Not sure if that is actually legal in C. I believe it's legal, but I'd be in favor of making a project policy against it, simply because you aren't going to get any static checking from gcc about whether the arguments match the format string. There isn't any good excuse not to code the above like if (min != 0) sprintf(str + strlen(str), "%+03d:%02d", hour, min); else sprintf(str + strlen(str), "%+03d", hour); which would produce warnings if you managed to mess up the format match. print.c:778:22: warning: field width should have type 'int', but argument has type 'unsigned int' [-Wformat] fprintf(fout, "%-*s%s\n", (width_total - width) / 2, "", Not sure about that. That one, on the other hand, is pretty silly ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM / clang
Peter Eisentraut pete...@gmx.net wrote: Some new warnings, however: xlog.c:7759:22: warning: self-comparison always results in a constant value max_locks_per_xact != max_locks_per_xact) ^ Looks like a bug. Ah, it should be compared with the field of the same name in ControlFile. Index: src/backend/access/transam/xlog.c === --- src/backend/access/transam/xlog.c (HEAD) +++ src/backend/access/transam/xlog.c (fixed) @@ -7756,7 +7756,7 @@ if (wal_level != ControlFile->wal_level || MaxConnections != ControlFile->MaxConnections || max_prepared_xacts != ControlFile->max_prepared_xacts || - max_locks_per_xact != max_locks_per_xact) + max_locks_per_xact != ControlFile->max_locks_per_xact) { /* * The change in number of backend slots doesn't need to be Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers