Re: [HACKERS] Synchronous replication - patch status inquiry
On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: On Thu, Sep 2, 2010 at 11:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I understand what you're after, the idea of being able to set synchronization level on a per-transaction basis is cool. But I haven't seen a satisfactory design for it. I don't understand how it would work in practice. Even though it's cool, having different kinds of standbys connected is a more common scenario, and the design needs to accommodate that too. I'm all ears if you can sketch a design that can do that. That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long.

No, it doesn't. There is no requirement for additional messages. It just adds 16 bytes onto the reply message, maybe 24. If there is a noticeable overhead from that, shoot me.

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
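Simon's size estimate is consistent with the periodic reply message simply carrying one LSN per synchronization level instead of a single LSN: no extra round trips, just a wider payload. A minimal sketch of what such a widened reply could look like (the struct and function names are hypothetical, not the actual walreceiver protocol; an XLogRecPtr in the code of this era is 8 bytes):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire payload: instead of one "latest LSN", the standby
 * reports all three positions in every reply it was already sending.
 * Two extra 8-byte LSNs = 16 more bytes than a single-LSN reply. */
typedef uint64_t XLogRecPtrN;	/* stand-in for XLogRecPtr (8 bytes) */

typedef struct StandbyReply
{
	XLogRecPtrN received;	/* WAL received and written on the standby */
	XLogRecPtrN fsynced;	/* WAL flushed (fsync'd) on the standby */
	XLogRecPtrN replayed;	/* WAL applied by recovery */
} StandbyReply;

/* The master completes a commit once the LSN matching the transaction's
 * requested level has caught up to the transaction's commit record. */
static int
commit_done(const StandbyReply *r, XLogRecPtrN commit_lsn, int level)
{
	switch (level)
	{
		case 0:  return 1;							/* async */
		case 1:  return r->received >= commit_lsn;	/* recv */
		case 2:  return r->fsynced >= commit_lsn;	/* fsync */
		default: return r->replayed >= commit_lsn;	/* replay */
	}
}
```

Whether the standby sends such a reply once per WAL chunk or once per state change is then a separate tuning question, which is where the rest of this thread disagrees.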
Re: [HACKERS] Synchronous replication - patch status inquiry
On Thu, Sep 2, 2010 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote: I propose a configuration file standbys.conf, in the master:

# STANDBY NAME     SYNCHRONOUS   TIMEOUT
importantreplica   yes           100ms
tempcopy           no            10s

Seems good. Or rather, instead of yes/no, should async/recv/fsync/replay be specified in the SYNCHRONOUS field? OTOH, something like a standby_name parameter should be introduced in recovery.conf. Should we allow multiple standbys with the same name? Probably yes. We might need to add a NUMBER field to standbys.conf in the future. Here is the proposed detailed design:

standbys.conf
=============
# This is not initialized by initdb, so users need to create it under $PGDATA.
  * The template is located in the PREFIX/share directory.
# This is read by the postmaster at startup, just as pg_hba.conf is.
  * In an EXEC_BACKEND environment, each walsender must read it at startup.
  * This is ignored when max_wal_senders is zero.
  * FATAL is emitted when standbys.conf doesn't exist even though max_wal_senders is positive.
# SIGHUP makes only the postmaster re-read standbys.conf.
  * New configuration doesn't affect the existing connections to the standbys, i.e., it's used only for subsequent connections.
  * XXX: Should the existing connections react to new configuration? What if the new standbys.conf doesn't have the standby_name of an existing connection?
# The connection from the standby is rejected if its standby_name is not listed in standbys.conf.
  * Multiple standbys with the same name are allowed.
# The valid values of the SYNCHRONOUS field are async, recv, fsync and replay.

standby_name
# This is a new string-typed parameter in recovery.conf.
  * XXX: Should standby_name and standby_mode be merged?
# Walreceiver sends this to the master when establishing the connection.

Comments? Is the above too complicated for a first step? If so, I'd propose instead to just introduce a new recovery.conf parameter like replication_mode specifying the synchronization level.
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
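For illustration, a standbys.conf using the four-level SYNCHRONOUS field might look like this (hypothetical contents following the proposed, never-committed syntax; the standby names are made up):

```
# STANDBY NAME     SYNCHRONOUS   TIMEOUT
importantreplica   replay        100ms
reportingcopy      recv          1s
tempcopy           async         10s
```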
Re: [HACKERS] Synchronous replication - patch status inquiry
On 03/09/10 09:36, Simon Riggs wrote: On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages.

Please explain how you do it then. When a commit record is sent to the standby, it needs to acknowledge it 1) when it has received it, 2) when it fsyncs it to disk and 3) when it's replayed. I don't see how you can get around that. Perhaps you can save a bit by combining multiple messages together, like in Nagle's algorithm, but then you introduce extra delays, which is exactly what you don't want.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Synchronous replication - patch status inquiry
On Fri, Sep 3, 2010 at 3:36 PM, Simon Riggs si...@2ndquadrant.com wrote: The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages. It just adds 16 bytes onto the reply message, maybe 24. If there is a noticeable overhead from that, shoot me.

The reply message would be sent at least three times per WAL chunk, i.e., when the standby has received, synced and replayed it. So ISTM that additional messages do happen. Though I'm not sure if this really harms performance...

You'd like to choose async/recv/fsync/replay on a per-transaction basis rather than async/sync? Even when async is chosen as the synchronization level in standbys.conf, can it be changed to another level within a transaction? If so, the standby has to send the reply even if async is chosen, and most replies might be ignored by the master.

Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Synchronous replication - patch status inquiry
On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote: On 03/09/10 09:36, Simon Riggs wrote: On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages. Please explain how you do it then. When a commit record is sent to the standby, it needs to acknowledge it 1) when it has received it, 2) when it fsyncs it to disk and c) when it's replayed. I don't see how you can get around that. Perhaps you can save a bit by combining multiple messages together, like in Nagle's algorithm, but then you introduce extra delays which is exactly what you don't want. From my perspective, you seem to be struggling to find reasons why this should not happen, rather than seeing the alternatives that would obviously present themselves if your attitude was a positive one. We won't make any progress with this style of discussion. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] thousand unrelated data files in pg_default tablespace
hello 2010/8/31 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: there is a dump from 8KB files Well, those certainly look like tables/indexes not temp files. So we can rule out one theory. You're *certain* these aren't referenced from pg_class.relfilenode of any of the databases in the server? I have confirmed that these files are not in pg_class.relfilenode. Moreover, these files are three months old, and in that time the server was restarted twice. Regards Pavel Stehule regards, tom lane
[HACKERS] regclass without error?
Hi, Is there any way to use regclass without getting an ERROR? pgpool-II needs to find the OID from a table name, and for that purpose it issues something like SELECT 'table_name'::regclass::oid. The problem is, if the table does not exist, an error occurs and the transaction aborts. Ideally, if the table does not exist, the SELECT would return 0 (InvalidOid). Any idea? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
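One workaround available at the time (to_regclass, which returns NULL instead of erroring, only appeared much later, in 9.4) is to look the name up in pg_class directly, so a missing table yields zero rows rather than an aborted transaction. A sketch:

```sql
-- Returns the table's OID, or no rows -- never an error -- when the
-- table doesn't exist. Unlike 'name'::regclass this doesn't accept
-- schema-qualified input; visibility follows the search_path.
SELECT c.oid
FROM pg_catalog.pg_class c
WHERE c.relname = 'table_name'
  AND pg_catalog.pg_table_is_visible(c.oid);
```

The client can then treat an empty result set as InvalidOid.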
Re: [HACKERS] Synchronous replication - patch status inquiry
On 03/09/10 10:45, Simon Riggs wrote: On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote: On 03/09/10 09:36, Simon Riggs wrote: On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages. Please explain how you do it then. When a commit record is sent to the standby, it needs to acknowledge it 1) when it has received it, 2) when it fsyncs it to disk and c) when it's replayed. I don't see how you can get around that. Perhaps you can save a bit by combining multiple messages together, like in Nagle's algorithm, but then you introduce extra delays which is exactly what you don't want. From my perspective, you seem to be struggling to find reasons why this should not happen, rather than seeing the alternatives that would obviously present themselves if your attitude was a positive one. We won't make any progress with this style of discussion. Huh? You made a very clear claim above that you don't need additional messages. I explained why I don't think that's true, and asked you to explain why you think it is true. Whether the claim is true or not does not depend on my attitude. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] plan time of MASSIVE partitioning ...
hello everybody, we came across an issue which turned out to be more serious than previously expected. Imagine a system with, say, 1000 partitions (heavily indexed) or so. The time taken by the planner is already fairly heavy in this case. I tried this one with 5000 unindexed tables (just one col):

test=# \timing
Timing is on.
test=# prepare x(int4) AS select * from t_data order by id desc;
PREPARE
Time: 361.552 ms

You will see similar or higher runtimes in case of 500 partitions and a handful of indexes. Does anybody see a potential way to do a shortcut through the planner? A prepared query is no solution here, as constraint exclusion would be dead in this case (making the entire thing an even bigger drama). Did anybody think of a solution to this problem? Or more precisely: can there be a solution to this problem? many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
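For anyone wanting to reproduce the measurement, a sketch of the setup (table and partition names are made up; the DO block needs 9.0 or later):

```sql
CREATE TABLE t_data (id int4);

-- create 5000 empty, unindexed partitions
DO $$
BEGIN
  FOR i IN 1..5000 LOOP
    EXECUTE 'CREATE TABLE t_data_' || i || ' () INHERITS (t_data)';
  END LOOP;
END $$;

-- then, in psql, time the planning step:
\timing
PREPARE x(int4) AS SELECT * FROM t_data ORDER BY id DESC;
```

The PREPARE measures pure plan time, since no rows are fetched.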
Re: [HACKERS] Synchronization levels in SR
Hi, Dimitri Fontaine wrote: Simon Riggs si...@2ndquadrant.com writes: Seems strange. If you have 2 standbys and you say you would like node1 to be the preferred candidate, then you load it so heavily that a remote server with by-definition much larger network delay responds first, then I say your preference was wrong. There's a communication mismatch here I think. The problem with the dynamic aspect of the system is that the admin wants to plan ahead and choose in advance the failover server. Other than that I much prefer the automatic and dynamic quorum idea. If you, Jan and Yeb wish to completely exclude standbys from being part of any quorum, then I guess we need to have per-standby settings to allow that to be defined. I'm in favour of giving people options. That needn't be a mandatory per-standby setting, just a non-default option, so that we can reduce the complexity of configuration for common cases. +1

Maximum Performance = quorum = 0
Maximum Availability = quorum = 1, timeout_action = commit
Maximum Protection = quorum = 1, timeout_action = shutdown

+1 Being able to say that a given server has not been granted to participate in the vote allowing to reach the global durability quorum will allow for choosing the failover candidates. Now you're able to have this reporting server and know for sure that your sync replicated transactions are not waiting for it. To summarize, the current per-transaction approach would be:

- transaction level replication synchronous behaviour

Sorry for answering such an old mail, but what is the purpose of transaction-level synchronous behaviour if async transactions can be held back by a sync transaction? In my patch, when the transactions were waiting for an ack from the standby, they had already released all their locks; the wait happened at the latest possible point in CommitTransaction(). In Fujii's patch (I am looking at synch_rep_0722.patch, is there a newer one?) the wait happens in RecordTransactionCommit(), so other transactions still see the sync transaction and, most importantly, the locks held by the sync transaction will make async transactions waiting for the same lock wait too.

- proxy/cascading in core
- quorum setup for deciding any commit is safe
- any server can be excluded from the sync quorum
- timeout can still raise an exception or ignore (commit)?

This last point seems to need some more discussion, or I didn't understand well the current positions and proposals. Regards, Best regards, Zoltán Böszörményi
Re: [HACKERS] Synchronous replication - patch status inquiry
On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote: On 03/09/10 10:45, Simon Riggs wrote: On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote: On 03/09/10 09:36, Simon Riggs wrote: On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages. Please explain how you do it then. When a commit record is sent to the standby, it needs to acknowledge it 1) when it has received it, 2) when it fsyncs it to disk and c) when it's replayed. I don't see how you can get around that. Perhaps you can save a bit by combining multiple messages together, like in Nagle's algorithm, but then you introduce extra delays which is exactly what you don't want. From my perspective, you seem to be struggling to find reasons why this should not happen, rather than seeing the alternatives that would obviously present themselves if your attitude was a positive one. We won't make any progress with this style of discussion. Huh? You made a very clear claim above that you don't need additional messages. I explained why I don't think that's true, and asked you to explain why you think it is true. Whether the claim is true or not does not depend on my attitude.

Why exactly would we need to send 3 messages when we could send 1? Replace your statements of "it needs to" with "why would it" instead.
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Synchronous replication - patch status inquiry
On 03/09/10 13:20, Simon Riggs wrote: On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote: On 03/09/10 10:45, Simon Riggs wrote: On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote: On 03/09/10 09:36, Simon Riggs wrote: On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote: That design would affect what the standby should reply. If we choose async/recv/fsync/replay on a per-transaction basis, the standby should send multiple LSNs and the master needs to decide when replication has been completed. OTOH, if we choose just sync/async, the standby has only to send one LSN. The former seems to be more useful, but triples the number of ACK from the standby. I'm not sure whether its overhead is ignorable, especially when the distance between the master and the standby is very long. No, it doesn't. There is no requirement for additional messages. Please explain how you do it then. When a commit record is sent to the standby, it needs to acknowledge it 1) when it has received it, 2) when it fsyncs it to disk and c) when it's replayed. I don't see how you can get around that. Perhaps you can save a bit by combining multiple messages together, like in Nagle's algorithm, but then you introduce extra delays which is exactly what you don't want. From my perspective, you seem to be struggling to find reasons why this should not happen, rather than seeing the alternatives that would obviously present themselves if your attitude was a positive one. We won't make any progress with this style of discussion. Huh? You made a very clear claim above that you don't need additional messages. I explained why I don't think that's true, and asked you to explain why you think it is true. Whether the claim is true or not does not depend on my attitude. Why exactly would we need to send 3 messages when we could send 1? Replace your statements of it needs to with why would it instead. (scratches head..) 
What's the point of differentiating received/fsynced/replayed, if the master receives the ack for all of them at the same time? Let's try this with an example: In the master, I do stuff and commit a transaction. I want to know when the transaction is fsynced in the standby. The WAL is sent to the standby, up to the commit record. Upthread you said that: The standby does *not* need to know the wishes of transactions on the master. So, when does the standby send the single message back to the master? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On 02/09/10 23:13, Tom Lane wrote: The WaitLatch ...timeout API could use a bit of refinement. I'd suggest defining negative timeout as meaning wait forever, so that timeout = 0 can be used for check but don't wait. Also, it seems like the function shouldn't just return void but should return a bool to show whether it saw the latch set or timed out. In case of WaitLatchOrSocket, the caller might want to know if a latch was set, the socket became readable, or it timed out. So we need three different return values. (Yeah, I realize the caller could look into the latch to find that out, but callers really ought to treat latches as opaque structs.) Hmm, maybe we need a TestLatch function to check if a latch is set. I don't think you have the select-failed logic right in WaitLatchOrSocket; on EINTR it will suppose that FD_ISSET is a valid test to make, which I think ain't the case. Just continue around the loop. Yep. I also realized that the timeout handling is a bit surprising with interrupts. After EINTR we call select() again with the same timeout, so a signal effectively restarts the timer. We seem to have similar behavior in a couple of other places, in pgstat.c and auth.c. So maybe that's OK and just needs to be documented, but I thought I'd bring it up. It seems like both implementations are #include'ing more than they ought to --- why replication/walsender.h, in particular? Windows implementation needs it for the max_wal_senders variable, to allocate enough shared Event objects in LatchShmemInit. In unix_latch.c it's not needed. Also, using sig_atomic_t for owner_pid is entirely not sane. On many platforms sig_atomic_t is only a byte, and besides which you have no need for that field to be settable by a signal handler. Hmm, true, it doesn't need to be set from signal handler, but is there an atomicity problem if one process calls ReleaseLatch while another process is in SetLatch? ReleaseLatch sets owner_pid to 0, while SetLatch reads it and calls kill() on it. 
Can we assume that pid_t is atomic, or do we need a spinlock to protect it? (Windows implementation has a similar issue with HANDLE instead of pid_t) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
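One way to meet the "three different return values" requirement above, without adding a separate TestLatch call for the socket case, is a bitmask result, so a single return can report any combination of wakeup reasons. A sketch (the flag names and logic are illustrative only, not a committed API):

```c
#include <assert.h>

/* Hypothetical wakeup-reason flags for WaitLatchOrSocket. */
#define WL_LATCH_SET		(1 << 0)
#define WL_SOCKET_READABLE	(1 << 1)
#define WL_TIMEOUT			(1 << 2)

/* Translate what was observed when select() returned into a result mask.
 * Timeout is reported only when nothing else woke us up: a latch set or
 * a readable socket arriving right at the deadline is still a wakeup. */
static int
wait_result(int latch_is_set, int socket_readable, int timed_out)
{
	int			events = 0;

	if (latch_is_set)
		events |= WL_LATCH_SET;
	if (socket_readable)
		events |= WL_SOCKET_READABLE;
	if (events == 0 && timed_out)
		events |= WL_TIMEOUT;
	return events;
}
```

With this shape, a plain WaitLatch can keep a boolean-like contract (result & WL_LATCH_SET), while WaitLatchOrSocket callers can distinguish all three outcomes from one return value.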
Re: [HACKERS] thousand unrelated data files in pg_default tablespace
On 03/09/10 11:16, Pavel Stehule wrote: 2010/8/31 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: there is a dump from 8KB files Well, those certainly look like tables/indexes not temp files. So we can rule out one theory. You're *certain* these aren't referenced from pg_class.relfilenode of any of the databases in the server? I have confirmed that these files are not in pg_class.relfilenode. Moreover, these files are three months old, and in that time the server was restarted twice.

Maybe they're tables that were created in a transaction, but the process crashed hard before committing? Like:

BEGIN;
CREATE TABLE foo (...);
COPY foo FROM ...;
kill -9 postgres

That will leave behind a file like that. Do you do something like that in the application? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
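To cross-check for such orphans, one can list the relfilenodes the database knows about and compare them with the file names on disk (a sketch; run it in each database of the cluster, and ignore fork suffixes such as _fsm/_vm and numbered segment extensions when comparing -- contrib's oid2name automates much of this):

```sql
-- Every data file this database accounts for; files in the tablespace
-- directory whose names are not in this list are orphan candidates.
SELECT relfilenode, relname
FROM pg_catalog.pg_class
WHERE relfilenode <> 0
ORDER BY relfilenode;
```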
[HACKERS] Streaming a base backup from master
It's been discussed before that it would be cool if you could stream a new base backup from the master server, via libpq. That way you would not need low-level filesystem access to initialize a new standby. Magnus mentioned today that he started hacking on that, and coincidentally I just started experimenting with it yesterday as well :-). So let's get this out on the mailing list. Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x.

TODO:
* We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism.
* The streamed backup archive should contain all the necessary WAL files too, so that you don't need to set up archiving to use this. You could just point the tiny client tool to the server, and get a backup archive containing everything that's necessary to restore correctly.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index 113dc3f..5bb4159 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/replication
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global

-OBJS = walsender.o walreceiverfuncs.o walreceiver.o
+OBJS = walsender.o walreceiverfuncs.o walreceiver.o basebackup.o

 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
new file mode 100644
index 000..9be9bdf
--- /dev/null
+++ b/src/backend/replication/basebackup.c
@@ -0,0 +1,276 @@
+/*-------------------------------------------------------------------------
+ *
+ * basebackup.c
+ *	  code for taking a base backup and streaming it to a standby
+ *
+ * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  $PostgreSQL$
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <time.h>
+
+#include "access/xlog_internal.h"	/* for pg_start/stop_backup */
+#include "utils/builtins.h"
+#include "lib/stringinfo.h"
+#include "libpq/libpq.h"
+#include "libpq/pqformat.h"
+#include "replication/basebackup.h"
+#include "storage/fd.h"
+
+static void sendDir(char *path);
+static void sendFile(char *path);
+static void _tarWriteHeader(char *filename, uint64 fileLen);
+
+void
+SendBaseBackup(void)
+{
+	StringInfoData buf;
+
+	DirectFunctionCall2(pg_start_backup, CStringGetTextDatum("basebackup"),
+						BoolGetDatum(true));
+
+	/* Send CopyOutResponse message */
+	pq_beginmessage(&buf, 'H');
+	pq_sendbyte(&buf, 0);		/* overall format */
+	pq_sendint(&buf, 0, 2);		/* natts */
+	pq_endmessage(&buf);
+
+	/* tar up the data directory */
+	sendDir(".");
+
+	/* Send CopyDone message */
+	pq_putemptymessage('c');
+
+	/* XXX: Is there no DirectFunctionCall0? */
+	DirectFunctionCall1(pg_stop_backup, (Datum) 0);
+}
+
+static void
+sendDir(char *path)
+{
+	DIR		   *dir;
+	struct dirent *de;
+	char		pathbuf[MAXPGPATH];
+	struct stat statbuf;
+
+	dir = AllocateDir(path);
+	while ((de = ReadDir(dir, path)) != NULL)
+	{
+		/* Skip special stuff */
+		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
+			continue;
+
+		snprintf(pathbuf, MAXPGPATH, "%s/%s", path, de->d_name);
+
+		/* Skip pg_xlog and postmaster.pid */
+		if (strcmp(pathbuf, "./pg_xlog") == 0)
+			continue;
+		if (strcmp(pathbuf, "./postmaster.pid") == 0)
+			continue;
+
+		if (lstat(pathbuf, &statbuf) != 0)
+		{
+			if (errno != ENOENT)
+			{
+				elog(WARNING, "could not stat file or directory \"%s\": %m",
+					 pathbuf);
+			}
+			continue;
+		}
+
+		if (S_ISDIR(statbuf.st_mode))
+		{
+			/* call ourselves recursively for a directory */
+			sendDir(pathbuf);
+		}
+		else if (S_ISREG(statbuf.st_mode))
+		{
+			sendFile(pathbuf);
+		}
+		else
+			elog(WARNING, "skipping special file \"%s\"", pathbuf);
+	}
+	FreeDir(dir);
+}
+
+
+/***** Functions for handling tar file format, copied from pg_dump *****/
+
+/*
+ * Utility routine to print possibly larger than 32 bit integers in a
+ * portable fashion.  Filled with zeros.
+ */
+static void
+print_val(char *s, uint64 val, unsigned int base, size_t len)
+{
+	int			i;
+
+	for (i = len; i > 0; i--)
+	{
+		int			digit = val % base;
+
+		s[i - 1] = '0' + digit;
+		val = val / base;
+	}
+}
+
+/*
+ * Maximum file size for a tar member: The limit inherent in the
+ * format is 2^33-1 bytes (nearly 8 GB). But we don't want to exceed
+ * what we can represent in pgoff_t.
+ */
+#define
Re: [HACKERS] Streaming a base backup from master
On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Would it be possible to not require pg_start/stop_backup() for this new feature? (yes, I'm probably missing something obvious here) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's been discussed before that it would be cool if you could stream a new base backup from the master server, via libpq. That way you would not need low-level filesystem access to initialize a new standby. Magnus mentioned today that he started hacking on that, and coincidentally I just started experimenting with it yesterday as well :-). So let's get this out on the mailing list. Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. * The streamed backup archive should contain all the necessary WAL files too, so that you don't need to set up archiving to use this. You could just point the tiny client tool to the server, and get a backup archive containing everything that's necessary to restore correctly. For this last point, this should of course be *optional*, but it would be very good to have that option (and probably on by default). Couple of quick comments that I saw directly differentiated from the code I have :-) We chatted some about it already, but it should be included for others... * It should be possible to pass the backup label through, not just hardcode it to basebackup * Needs support for tablespaces. We should either follow the symlinks and pick up the files, or throw an error if it's there. Silently delivering an incomplete backup is not a good thing :-) * Is there a point in adapting the chunk size to the size of the libpq buffers? 
FWIW, my implementation was as a user-defined function, which has the advantage it can run on 9.0. But most likely this code can be ripped out and provided as a separate backport project for 9.0 if necessary - no need to have separate codebases. Other than that, our code is remarkably similar. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Fri, Sep 3, 2010 at 6:43 PM, Boszormenyi Zoltan z...@cybertec.at wrote: In my patch, when the transactions were waiting for ack from the standby, they have already released all their locks, the wait happened at the latest possible point in CommitTransaction(). In Fujii's patch (I am looking at synch_rep_0722.patch, is there a newer one?) No ;) We'll have to create the patch based on the result of the recent discussion held on another thread. the wait happens in RecordTransactionCommit() so other transactions still see the sync transaction and most importantly, the locks held by the sync transaction will make the async transactions waiting for the same lock wait too. The transaction should be invisible to other transactions until its replication has been completed. So I put the wait before CommitTransaction() calls ProcArrayEndTransaction(). Is this unsafe? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Would it be possible to not require pg_start/stop_backup() for this new feature? (yes, I'm probably missing something obvious here) You don't need to run it *manually*, but the process needs to run it automatically in the background for you. Which it does already in the suggested patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Streaming a base backup from master
On 3 September 2010 12:30, Magnus Hagander mag...@hagander.net wrote: On Fri, Sep 3, 2010 at 13:25, Thom Brown t...@linux.com wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Would it be possible to not require pg_start/stop_backup() for this new feature? (yes, I'm probably missing something obvious here) You don't need to run it *manually*, but the process needs to run it automatically in the background for you. Which it does already in the suggested patch. Ah, clearly I didn't read the patch in any detail. Thanks :) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 14:25, Thom Brown wrote: On 3 September 2010 12:19, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: TODO: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Would it be possible to not require pg_start/stop_backup() for this new feature? (yes, I'm probably missing something obvious here)

Well, pg_start_backup() does several things:
1. It sets the forceFullPageWrites flag, so that we don't get partial pages in the restored database.
2. It performs a checkpoint.
3. It creates a backup label file.
We certainly need 1 and 2. We don't necessarily need to write the backup label file to the data directory when we're streaming the backup directly to the client; we can just include it in the streamed archive.

pg_stop_backup() also does several things:
1. It clears the forceFullPageWrites flag.
2. It writes an end-of-backup WAL record.
3. It switches to a new WAL segment, to get the final WAL segment archived.
4. It writes a backup history file.
5. It removes the backup label file.
6. It waits for all the required WAL files to be archived.
We need 1, but the rest we could do in a smarter way. When we have more control of the backup process, I don't think we need the end-of-backup WAL record or the backup label anymore. We can add the pg_control file as the last file in the archive, and set minRecoveryPoint in it to the last WAL record needed to recover.

So no, we don't really need pg_start/stop_backup() per se, but we'll need something similar... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 14:28, Dave Page wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. Ok. Another obvious thing that people will want is to gzip the tar file while sending it, to reduce network traffic. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 03/09/10 14:28, Dave Page wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. Ok. Another obvious thing that people will want is to gzip the tar file while sending it, to reduce network traffic. Not necessarily obvious, needs to be configurable. There are a lot of cases where you might not want it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Cost estimates for parameterized paths
On Thu, Sep 2, 2010 at 5:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: Awhile back I ranted about replacing the planner's concept of inner indexscans with a more generalized notion of parameterized paths: http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php The executor fixes for that are done, and now I'm grappling with getting the planner to do something useful with it. The biggest problem I've run into is that a parameterized path can't really be assigned a fixed cost in the same way that a normal path can. The current implementation of cost_index() depends on knowing the size of the outer relation --- that is, the expected number of execution loops for the indexscan --- in order to account for cache effects sanely while estimating the average cost of any one inner indexscan. We know that that is an important thing to do because the cost estimates seem to be a lot closer to reality now that we do that than what we were getting before; so dropping the consideration is entirely out of the question. The planner is already cheating on this to a considerable extent, because it estimates the cost of an inner indexscan only once, using the first outer rel we try to join to. That cost is cached and reused with other potential outer-rel join partners, even though very different numbers of outer rows might be involved. This problem will get a lot worse with the types of plans that I hope the planner will be able to come up with after this fix goes in, because the indexscan might be at the bottom of a join nest. So we need a real fix not another hack. The best idea I can come up with at the moment is to compute best case and worst case costs for a parameterized path, corresponding to the largest and smallest numbers of loops we might expect it to be repeated for. The largest number of loops could be estimated via the cartesian product of the sizes of the other tables in the query, for example. The worst case cost is its cost if only repeated once. 
Then, during path pruning in add_path, we only discard a parameterized path if its best-case cost is worse than the worst-case cost of some otherwise comparable path. Whenever we join the parameterized path with the required outer relation, we redo the cost calculation using that rel's actual rowcount estimate in order to form a final cost estimate for the no-longer-parameterized join path.

Interestingly, I previously proposed almost exactly this approach to handle a couple of other problems: http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php (index only scans) http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php (per-query work mem) I'm not entirely sure whether we can use this approach for more than one kind of problem at a time; if we can't, it's probably not a good idea to do it at all. I also fear that any venture into this area will involve slowing down add_path(), which is a hotspot when planning large join nests. That might not be a win, if this is the only case it allows us to improve.

*thinks* It strikes me that the outer tuple count is essentially being used to derate the cost of the index probes. So another way to look at this is that the best-case cost of an index probe is just the CPU cost, and the worst-case cost of an index probe is the CPU cost plus the disk cost. So your fear that not many parameterized paths will be discarded is probably valid. But then, maybe that's OK. It seems to me that the critical point is to make sure that we don't form them in the first place in cases where we could get the same benefit by commuting the joins. Once we've gotten to the point of considering a plan of this type, the chances that it's actually the best plan seem pretty high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
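The pruning rule proposed above — keep a parameterized path unless even its best case loses to some rival's worst case — can be sketched like this (illustrative only; the real add_path() also compares startup cost, pathkeys, and row counts, and the `Path` fields here are invented names):

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    best_cost: float   # cost assuming the largest plausible loop count
    worst_cost: float  # cost if the scan is repeated only once

def add_path(paths, new):
    # Discard `new` only if its best case is worse than some existing
    # path's worst case; symmetrically drop paths that `new` dominates.
    if any(new.best_cost > q.worst_cost for q in paths):
        return paths
    kept = [q for q in paths if q.best_cost <= new.worst_cost]
    kept.append(new)
    return kept
```

Because the [best, worst] intervals of most paths will overlap, few paths get pruned at this stage — which matches the observation below that not many parameterized paths will be discarded until the outer rowcount is actually known.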
Re: [HACKERS] plan time of MASSIVE partitioning ...
* PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote: did anybody think of a solution to this problem. or more precisely: can there be a solution to this problem? Please post to the correct list (-performance) and provide information like PG version, postgresql.conf, the actual table definition, the resulting query plan, etc, etc... Thanks, Stephen
Re: [HACKERS] plan time of MASSIVE partitioning ...
On Sep 3, 2010, at 2:04 PM, Stephen Frost wrote: * PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote: did anybody think of a solution to this problem. or more precisely: can there be a solution to this problem? Please post to the correct list (-performance) and provide information like PG version, postgresql.conf, the actual table definition, the resulting query plan, etc, etc... Thanks, Stephen hello stephen, this seems like more a developer question to me than a -performance one. it is not related to the table structure at all - it is basically an issue with incredibly large inheritance lists. it applies to postgres 9 and most likely to everything before. postgresql.conf is not relevant at all at this point. the plan is pretty fine. the question is rather: does anybody see a chance to handle such lists more efficiently inside postgres? also, it is not the point if my data structure is sane or not. it is really more generic - namely a shortcut for this case inside the planning process. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well there's no particular reason we couldn't support having multiple pg_start_backup() pending either. It's just not usually something people have needed so far. -- greg
Re: [HACKERS] plan time of MASSIVE partitioning ...
* PostgreSQL - Hans-Jürgen Schönig (postg...@cybertec.at) wrote: this seems like more a developer question to me than a -performance one. it is not related to the table structure at all - it is basically an issue with incredibly large inheritance lists. it applies to postgres 9 and most likely to everything before. postgresql.conf is not relevant at all at this point. Really? What's constraint_exclusion set to? Is GEQO getting used? What are the GEQO parameters set to? Do you have any CHECK constraints on the tables? If you want someone else to build a test case and start looking into it, it's best to not make them have to guess at what you've done. the plan is pretty fine. the question is rather: does anybody see a chance to handle such lists more efficiently inside postgres? also, it is not the point if my data structure is sane or not. it is really more generic - namely a shortcut for this case inside the planning process. Coming up with cases where PG doesn't perform well in a completely contrived unrealistic environment isn't likely to impress anyone to do anything. A small (but complete..) test case which mimics a real world environment and real world problem would go a lot farther. I can certainly believe that people out there have partitions set up with lots of tables and that it will likely grow - but they probably also have CHECK constraints, have tweaked what constraint_exclusion is set to, have adjusted their work_mem and related settings, maybe tweaked some planner GUCs, etc, etc. This is an area I'm actually interested in and curious about, I'd rather work together on it than be told that the questions I'm asking aren't relevant. Thanks, Stephen
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well there's no particular reason we couldn't support having multiple pg_start_backup() pending either. It's just not usually something people have needed so far. The backup label file makes that hard. There can be only one at a time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] thousand unrelated data files in pg_default tablespace
2010/9/3 Heikki Linnakangas heikki.linnakan...@enterprisedb.com: On 03/09/10 11:16, Pavel Stehule wrote: 2010/8/31 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: there is a dump from 8KB files Well, those certainly look like tables/indexes not temp files. So we can rule out one theory. You're *certain* these aren't referenced from pg_class.relfilenode of any of the databases in the server? I have info that these files are not in pg_class.relfilenode. More: these files are three months old, and in that time the server was restarted twice. Maybe they're tables that were created in a transaction, but the process crashed hard before committing? Like: BEGIN; CREATE TABLE foo (...); COPY foo FROM ...; kill -9 postgres yes, it's possible - but there is no record of a server crash - sometimes a client crashes. Regards Pavel That will leave behind a file like that. Do you do something like that in the application? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Synchronization levels in SR
Fujii Masao wrote: On Fri, Sep 3, 2010 at 6:43 PM, Boszormenyi Zoltan z...@cybertec.at wrote: In my patch, when the transactions were waiting for ack from the standby, they have already released all their locks, the wait happened at the latest possible point in CommitTransaction(). In Fujii's patch (I am looking at synch_rep_0722.patch, is there a newer one?) No ;) We'll have to create the patch based on the result of the recent discussion held on another thread. the wait happens in RecordTransactionCommit() so other transactions still see the sync transaction and most importantly, the locks held by the sync transaction will make the async transactions waiting for the same lock wait too. The transaction should be invisible to other transactions until its replication has been completed. Invisible? How can it be invisible? You are in RecordTransactionCommit(), even before calling ProcArrayEndTransaction(MyProc, latestXid) and releasing the locks the transaction holds. So I put the wait before CommitTransaction() calls ProcArrayEndTransaction(). Is this unsafe? I don't know whether it's unsafe. In my patch, I only registered the Xid at the point where you do WaitXLogSend(); this was the safe point to set up the waiting for sync ack. Otherwise, when the Xid registration for the sync ack was done in CommitTransaction() later than RecordTransactionCommit(), there was a race between the primary and the standby. The scenario was that the standby received and processed the COMMIT of certain Xids even before the backend on the primary properly registered its Xid, so the backend set up the waiting for sync ack after this Xid was acked by the standby. The result was stuck backends. My idea to split up the registration for wait and the waiting itself would allow for transaction-level synchronous setting, i.e. in my patch the transaction released the locks and did all the post-commit cleanups *then* it waited for sync ack if needed. 
In the meantime, because locks were already released, other transactions could progress with their job, allowing e.g. async transactions to progress and theoretically finish faster than the sync transaction that was waiting for the ack. The solution in my patch was not racy: registration of the Xid was done before XLogInsert() in RecordTransactionCommit(). If the standby acked the Xid to the primary before reaching the end of CommitTransaction(), then this backend didn't even need to wait, because the Xid was found in its PGPROC structure and the waiting for sync ack was torn down. But with the LSNs, you are waiting for XactLastRecEnd, which is set by XLogInsert(). I don't know if it's safe to WaitXLogSend() after XLogFlush() in RecordTransactionCommit(). I remember that in previous instances of my patch, even if I put the waiting for sync ack directly after latestXid = RecordTransactionCommit(); in CommitTransaction(), there were cases when I got stuck backends after a pgbench run. I had the primary and standbys on the same machine on different ports, so the ack was almost instant, which wouldn't be the case with a real network. But the race condition was still there; it just doesn't show up with networks being slower than memory. In your patch, the waiting happens almost at the end of RecordTransactionCommit(), so theoretically it has the same race condition. Am I missing something? Best regards, Zoltán Böszörményi
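The ordering constraint described here — register the waiter before the commit record can reach the standby, so an ack can never be lost even if it races ahead — can be sketched as follows (the class and method names are invented for illustration; this is not the patch's code):

```python
import threading

class SyncRepQueue:
    """Register-before-send ordering that avoids the lost-ack race:
    an ack that arrives before (or after) registration is never lost."""

    def __init__(self):
        self.lock = threading.Lock()
        self.acked = set()
        self.waiters = {}            # xid -> Event

    def register(self, xid):
        # Must be called before the commit record is sent to the standby.
        with self.lock:
            ev = threading.Event()
            self.waiters[xid] = ev
            if xid in self.acked:    # the ack raced ahead of registration
                ev.set()
            return ev

    def ack(self, xid):
        # Called when the standby reports the xid as replicated.
        with self.lock:
            self.acked.add(xid)
            if xid in self.waiters:
                self.waiters[xid].set()
```

If registration instead happened only after the commit record was flushed and sent, an ack processed in between would find no waiter, and the backend would block forever — the stuck-backend symptom described above.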
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. It seems like the elephant in the room here is updating an existing backup without recopying the entire data directory. Perhaps that's phase two, but worth keeping in mind... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 15:24, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. It seems like the elephant in the room here is updating an existing backup without recopying the entire data directory. Perhaps that's phase two, but worth keeping in mind... I'd say that's a very different use-case, but still a very useful one of course. It's probably going to be a lot more complex (it would require bi-directional traffic, I think)... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 7:28 AM, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's a WIP patch. It adds a new TAKE_BACKUP command to the replication command set. Upon receiving that command, the master starts a COPY, and streams a tarred copy of the data directory to the client. The patch includes a simple command-line tool, pg_streambackup, to connect to a server and request a backup that you can then redirect to a .tar file or pipe to tar x. Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. It seems like the elephant in the room here is updating an existing backup without recopying the entire data directory. Perhaps that's phase two, but worth keeping in mind... rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] plan time of MASSIVE partitioning ...
2010/9/3 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: i tried this one with 5000 unindexed tables (just one col):

test=# \timing
Timing is on.
test=# prepare x(int4) AS select * from t_data order by id desc;
PREPARE
Time: 361.552 ms

you will see similar or higher runtimes in case of 500 partitions and a handful of indexes. I'd like to see (1) a script to reproduce your test environment (as Stephen also requested) and (2) gprof or oprofile results. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote: rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. I'm not sure we want to depend on an external utility like that, particularly one that users may not have installed. And I'm not sure if that can be made to work over a libpq channel, either. But certainly something with that functionality would be nice to have, whether it ends up sharing code or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Sep 3, 2010 at 9:26 AM, Dave Page dp...@pgadmin.org wrote: rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. I'm not sure we want to depend on an external utility like that, particularly one that users may not have installed. And I'm not sure if that can be made to work over a libpq channel, either. But certainly something with that functionality would be nice to have, whether it ends up sharing code or not. No, I agree we don't want an external dependency (I was just bleating about needing tar on Windows). I was assuming/hoping there's a librsync somewhere... -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] regclass without error?
On Fri, Sep 3, 2010 at 4:28 AM, Tatsuo Ishii is...@postgresql.org wrote: Is there any way to use regclass without having an ERROR? pgpool-II needs to find the oid from a table name, and for that purpose it issues something like SELECT 'table_name'::regclass::oid. Problem is, if the table does not exist, an error occurs and the transaction aborts. Ideally, if the table does not exist, the SELECT would return 0 (InvalidOid). Any idea? You can write a query against the system catalog tables. Or you could install a function that wraps the regclass cast in an exception handler. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Path question
On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: On Sep 1, 2010, at 10:21 AM, Greg Stark gsst...@mit.edu wrote: For what it's worth I disagree with Tom. I think this is a situation where we need *both* types of solution. Ideally we will be able to use a plain Append node for cases where we know the relative ordering of the data in different partitions, but there will always be cases where the structured partition data doesn't actually match up with the ordering requested and we'll need to fall back to a merge-append node. I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. ...Robert

we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier today and it seems most important cases are working nicely. here are some test cases:

test=# \d t_data
     Table "public.t_data"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 tstamp | date    |

test=# \d t_data_1
    Table "public.t_data_1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 tstamp | date    |
Indexes:
    "idx_1" btree (id)
Check constraints:
    "t_data_1_id_check" CHECK (id >= 1 AND id <= 10000)
Inherits: t_data

test=# \d t_data_2
    Table "public.t_data_2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 tstamp | date    |
Indexes:
    "idx_2" btree (id)
Check constraints:
    "t_data_2_id_check" CHECK (id >= 10001 AND id <= 20000)
Inherits: t_data

test=# \d t_data_3
    Table "public.t_data_3"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 tstamp | date    |
Indexes:
    "idx_3" btree (id)
Check constraints:
    "t_data_3_id_check" CHECK (id >= 20001 AND id <= 30000)
Inherits: t_data

simple windowing ... 
test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 WindowAgg  (cost=149.99..2154.43 rows=32140 width=8)
   ->  Result  (cost=149.99..1672.33 rows=32140 width=8)
         ->  Append  (cost=149.99..1672.33 rows=32140 width=8)
               ->  Sort  (cost=149.78..155.13 rows=2140 width=8)
                     Sort Key: public.t_data.id
                     ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 width=8)
               ->  Index Scan using idx_1 on t_data_1 t_data  (cost=0.00..318.25 rows=10000 width=8)
               ->  Index Scan using idx_2 on t_data_2 t_data  (cost=0.00..318.25 rows=10000 width=8)
               ->  Index Scan using idx_3 on t_data_3 t_data  (cost=0.00..318.25 rows=10000 width=8)
(9 rows)

it does a nice index scan; merges the stuff and puts it up into the high level doing the windowing.

test=# select *, max(id) OVER ( ORDER BY id) from t_data LIMIT 10;
 id |   tstamp   | max
----+------------+-----
  1 | 2010-01-01 |   1
  2 | 2010-01-01 |   2
  3 | 2010-01-01 |   3
  4 | 2010-01-01 |   4
  5 | 2010-01-01 |   5
  6 | 2010-01-01 |   6
  7 | 2010-01-01 |   7
  8 | 2010-01-01 |   8
  9 | 2010-01-01 |   9
 10 | 2010-01-01 |  10
(10 rows)

the cost model does what it should as well:

test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 WindowAgg  (cost=2872.41..3434.86 rows=32140 width=8)
   ->  Sort  (cost=2872.41..2952.76 rows=32140 width=8)
         Sort Key: public.t_data.id
         ->  Result  (cost=0.00..466.40 rows=32140 width=8)
               ->  Append  (cost=0.00..466.40 rows=32140 width=8)
                     ->  Seq Scan on t_data  (cost=0.00..31.40 rows=2140 width=8)
                     ->  Seq Scan on t_data_1 t_data  (cost=0.00..145.00 rows=10000 width=8)
                     ->  Seq Scan on t_data_2 t_data  (cost=0.00..145.00 rows=10000 width=8)
                     ->  Seq Scan on t_data_3 t_data  (cost=0.00..145.00 rows=10000 width=8)
(9 rows)

it has proven to be really valuable in my first tests. maybe this is helpful for some people out there. 
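What the ordered Append over sorted child scans does is a classic k-way merge: each child (partition scan) already yields rows in index order, and the node repeatedly emits the globally smallest head. A sketch using Python's heapq.merge:

```python
import heapq

def merge_append(*sorted_children):
    # Each child is an iterator yielding rows in sort order; pull the
    # smallest next row across all children, so no top-level sort of
    # the whole appended result is needed.
    return heapq.merge(*sorted_children)
```

This is the source of the cost difference between the two plans: the merge replaces a full Sort of all 32140 rows with cheap per-partition ordered scans.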
many thanks, hans merge-append-91-v1.diff Description: Binary data -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
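For anyone wanting to reproduce the test setup above, here is a sketch of the DDL implied by the \d output (the data-loading statements and the 2010-01-01 timestamps are assumptions based on the row counts and values shown in the plans, not taken from the patch):

```sql
-- Inheritance-based partitioning, matching the \d output in the post above.
CREATE TABLE t_data (id integer, tstamp date);

CREATE TABLE t_data_1 (CHECK (id >= 1     AND id <= 10000)) INHERITS (t_data);
CREATE TABLE t_data_2 (CHECK (id >= 10001 AND id <= 20000)) INHERITS (t_data);
CREATE TABLE t_data_3 (CHECK (id >= 20001 AND id <= 30000)) INHERITS (t_data);

CREATE INDEX idx_1 ON t_data_1 (id);
CREATE INDEX idx_2 ON t_data_2 (id);
CREATE INDEX idx_3 ON t_data_3 (id);

-- Assumed data load: 10000 rows per child, as the plans suggest.
INSERT INTO t_data_1 SELECT g, DATE '2010-01-01' FROM generate_series(1, 10000) g;
INSERT INTO t_data_2 SELECT g, DATE '2010-01-01' FROM generate_series(10001, 20000) g;
INSERT INTO t_data_3 SELECT g, DATE '2010-01-01' FROM generate_series(20001, 30000) g;
ANALYZE;
```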
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 9:32 AM, Dave Page dp...@pgadmin.org wrote: No, I agree we don't want an external dependency (I was just bleating about needing tar on Windows). I was assuming/hoping there's a librsync somewhere... The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. See: http://www.samba.org/rsync/how-rsync-works.html ... especially the section on "The Rsync Protocol". I Googled librsync and got a hit, but that code is a rewrite of the source base and seems to have had little or no activity since 2004. http://librsync.sourceforge.net/ That page says: librsync is not wire-compatible with rsync 2.x, and is not likely to be in the future. The current version of rsync is 3.0.7. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] english parser in text search: support for multiple words in the same position
On Wed, Sep 1, 2010 at 2:42 AM, Sushant Sinha sushant...@gmail.com wrote: I have attached a patch that emits parts of a host token, a url token, an email token and a file token. Further, it makes sure that a host/url/email/file token and the first part-token are at the same position in tsvector. You should probably add this patch here: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
You can write a query against the system catalog tables. That was harder than I thought (schema search path etc.). Or you could install a function that wraps the regclass cast in an exception handler. That requires users to install the function. Annoying for users. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
* Robert Haas (robertmh...@gmail.com) wrote: The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. Yeah, I looked into this when discussing this same concept at PGCon with folks. There doesn't appear to be a good librsync and, even if there was, there's a heck of a lot of complexity there that we *don't* need. rsync is a great tool, don't get me wrong, but let's not try to go over our heads here. We don't need permissions handling, as an example. I also don't think we need the binary diff/partial file transfer capability; we already break relations into 1GB chunks (when/if they reach that size), so you won't necessarily be copying the entire relation if you're just doing mtime-based or per-file-checksum-based detection. We don't need device node handling, we don't need auto-ignoring files or pattern exclusion/inclusion, and we don't really need a progress bar (though it'd be nice.. :), etc, etc, etc. Thanks, Stephen signature.asc Description: Digital signature
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Fujii Masao masao.fu...@gmail.com writes: On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: elog(FATAL) is *certainly* not a better idea. I think there's really nothing that can be done, you just have to silently ignore the error. Hmm.. some functions called by a signal handler use elog(FATAL), e.g., RecoveryConflictInterrupt() do that when unknown conflict mode is given as an argument. Are these calls unsafe, too? [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) before I argue with him further. Meanwhile, I'm not going to accept anything unsafe in a core facility like this patch is going to be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
On 2010-09-03 4:46 PM, Tatsuo Ishii wrote: You can write a query against the system catalog tables. That was harder than I thought (schema search path etc.). Or you could install a function that wraps the regclass cast in an exception handler. That requires users to install the function. Annoying for users. How about using a SAVEPOINT before the cast? Wouldn't fail your transaction.. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
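The SAVEPOINT approach suggested above, sketched as a client-side pattern (the savepoint name and the error-handling flow are illustrative; the client, not the server, decides when to issue the ROLLBACK):

```sql
BEGIN;
-- ... earlier work in the transaction ...
SAVEPOINT before_cast;
SELECT 'table_name'::regclass::oid;        -- may raise ERROR if missing
-- on error, the client issues:
ROLLBACK TO SAVEPOINT before_cast;         -- transaction stays usable
-- on success, the savepoint can simply be released:
RELEASE SAVEPOINT before_cast;
-- ... rest of the transaction ...
COMMIT;
```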
Re: [HACKERS] regclass without error?
On Fri, Sep 03, 2010 at 05:10:44PM +0300, Marko Tiikkaja wrote: On 2010-09-03 4:46 PM, Tatsuo Ishii wrote: You can write a query against the system catalog tables. That was harder than I thought (schema search path etc.). Or you could install a function that wraps the regclass cast in an exception handler. That requires users to install the function. Annoying for users. How about using a SAVEPOINT before the cast? Wouldn't fail your transaction.. For unattended operation, there are some issues: * Generating appropriate SAVEPOINT names * Keeping track of same * Detecting errors * Issuing ROLLBACKs to the aforementioned SAVEPOINT None of this works super well for a bulk load. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Tatsuo Ishii is...@postgresql.org writes: Is there any way to use regclass without having ERROR? pgpool-II needs to find the oid from a table name, and for that purpose it issues something like SELECT 'table_name'::regclass::oid. Problem is, if the table does not exist, an error occurs and the transaction aborts. Ideally, if the table does not exist, the SELECT would return 0 (InvalidOid). I don't think the cast should act that way, but I could see providing a separate conversion function that returns 0 ... or perhaps better NULL ... if no match. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
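Until such a built-in exists, the wrapper-function approach mentioned earlier in the thread can approximate the NULL-returning conversion function Tom describes (the function name is hypothetical; the error conditions are the documented codes a failed regclass cast can raise):

```sql
-- Sketch: a user-installed stand-in for a NULL-returning regclass lookup.
CREATE FUNCTION regclass_or_null(relname text) RETURNS oid AS $$
BEGIN
    RETURN relname::regclass::oid;
EXCEPTION
    WHEN undefined_table OR invalid_schema_name THEN
        RETURN NULL;   -- no match: NULL instead of an aborted transaction
END;
$$ LANGUAGE plpgsql STABLE;

-- SELECT regclass_or_null('no_such_table');  -- NULL rather than ERROR
```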
Re: [HACKERS] regclass without error?
Hello, guys. You wrote: TI Hi, TI Is there any way to use regclass without having ERROR? TI pgpool-II needs to find the oid from a table name and for that purpose it TI issues something like SELECT 'table_name'::regclass::oid. Problem is, TI if the table does not exist, an error occurs and the transaction TI aborts. Ideally if the table does not exist, the SELECT returns 0 TI (InvalidOid). Agreed with Tatsuo about having InvalidOid. TI Any idea? TI -- TI Tatsuo Ishii TI SRA OSS, Inc. Japan TI English: http://www.sraoss.co.jp/index_en.php TI Japanese: http://www.sraoss.co.jp -- With best wishes, Pavel mailto:pa...@gf.microolap.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at writes: imagine a system with, say, 1000 partitions (heavily indexed) or so. the time taken by the planner is already fairly heavy in this case. As the fine manual points out, the current scheme for managing partitioned tables isn't intended to scale past a few dozen partitions. I think we'll be able to do better when we have an explicit representation of partitioning, since then the planner won't have to expend large amounts of effort reverse-engineering knowledge about how an inheritance tree is partitioned. Before that happens, it's not really worth the trouble to worry about such cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
On Sep 3, 2010, at 4:40 PM, Tom Lane wrote: PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at writes: imagine a system with, say, 1000 partitions (heavily indexed) or so. the time taken by the planner is already fairly heavy in this case. As the fine manual points out, the current scheme for managing partitioned tables isn't intended to scale past a few dozen partitions. I think we'll be able to do better when we have an explicit representation of partitioning, since then the planner won't have to expend large amounts of effort reverse-engineering knowledge about how an inheritance tree is partitioned. Before that happens, it's not really worth the trouble to worry about such cases. regards, tom lane thank you ... - the manual is clear here but we wanted to see if there is some reasonably low hanging fruit to get around this. it is no solution but at least a clear statement ... many thanks, hans -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 02/09/10 23:13, Tom Lane wrote: (Yeah, I realize the caller could look into the latch to find that out, but callers really ought to treat latches as opaque structs.) Hmm, maybe we need a TestLatch function to check if a latch is set. +1. It could be a macro for now. I wish that we could declare Latch as an opaque struct, but we probably need to let callers embed it in a larger struct, so we'll just have to rely on callers to code cleanly. I also realized that the timeout handling is a bit surprising with interrupts. After EINTR we call select() again with the same timeout, so a signal effectively restarts the timer. Actually it's platform-dependent. On some machines (I think BSD-derived) the value of the timeout struct gets reduced by the elapsed time before returning, so that if you just loop around you don't get more than the originally specified delay time in total. We seem to have similar behavior in a couple of other places, in pgstat.c and auth.c. So maybe that's OK and just needs to be documented, but I thought I'd bring it up. Yeah. I am hoping that this facility will greatly reduce the need for callers to care about the exact timeout delay, so I think that what we should do for now is just document that the timeout might be several times as long as specified, and see how it goes from there. We could fix the problem if we had to, by adding some gettimeofday() calls and computing the remaining delay time each time through the loop. But let's avoid doing that until it's clear we need it. Also, using sig_atomic_t for owner_pid is entirely not sane. Hmm, true, it doesn't need to be set from signal handler, but is there an atomicity problem if one process calls ReleaseLatch while another process is in SetLatch? 
If there is *any* possibility of that happening then you have far worse problems than whether the field is atomically readable or not: the behavior will be unpredictable at just slightly larger timescales. This is the reason why I think it'd be better if ReleaseLatch simply didn't exist. That'd discourage people from designing dynamic latch structures, which are fundamentally going to be subject to race conditions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well there's no particular reason we couldn't support having multiple pg_start_backup() pending either. It's just not usually something people have need so far. The backup label file makes that hard. There can be only one at a time. I don't actually see a use-case for streaming multiple concurrent backups. How many people are going to be able to afford that kind of load on the master's I/O bandwidth? Certainly for version 1, it would be sufficient to throw an error if someone tries to start a backup while another one is in progress. *Maybe*, down the road, we'd want to relax it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Stephen Frost sfr...@snowman.net wrote: there's a heck of alot of complexity there that we *don't* need. rsync is a great tool, don't get me wrong, but let's not try to go over our heads here. Right -- among other things, it checks for portions of a new file which match the old file at a different location. For example, if you have a very large text file, and insert a line or two at the start, it will wind up only sending the new lines. (Well, that and all the checksums which help it determine that the rest of the file matches at a shifted location.) I would think that PostgreSQL could just check whether *corresponding* portions of a file matched, which is much simpler. we already break relations into 1G chunks (when/if they reach that size), so you won't necessairly be copying the entire relation if you're just doing mtime based or per-file-checksum based detection. While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB to 1MB range would probably be optimal for us, although the sweet spot will depend on how the database is used. A configurable or self-adjusting size would be cool. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
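Kevin's corresponding-portions idea is much simpler than rsync's rolling search: checksum fixed-size chunks of each version of a file and transfer only the chunks whose checksums differ. A minimal sketch (function names and chunk size are illustrative, not from any patch):

```python
import zlib

def chunk_checksums(data, chunk_size):
    """CRC32 of each fixed-size chunk (last chunk may be short)."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def find_changed_chunks(old, new, chunk_size=8192):
    """Indices of chunks that differ between two versions of a file;
    chunks present in only one version also count as changed."""
    old_sums = chunk_checksums(old, chunk_size)
    new_sums = chunk_checksums(new, chunk_size)
    changed = []
    for i in range(max(len(old_sums), len(new_sums))):
        a = old_sums[i] if i < len(old_sums) else None
        b = new_sums[i] if i < len(new_sums) else None
        if a != b:
            changed.append(i)
    return changed

old = b"x" * 100_000
new = old[:50_000] + b"y" + old[50_001:]      # one byte modified in place
print(find_changed_chunks(old, new, chunk_size=8192))  # -> [6]
```

Only chunk 6 (bytes 49152..57343) would need to be resent; unlike rsync, an insertion that shifts the rest of the file would invalidate every later chunk, which is the trade-off for the simplicity.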
[HACKERS] Windows Tools
Hello, At work, I've been tasked with providing some Windows connection libraries, etc. Are people still using MinGW, or should I just recommend using a proprietary toolkit to do the builds? If so, which one(s)? Cheers, David (Windows n00b) -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows Tools
On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote: Hello, At work, I've been tasked with providing some Windows connection libraries, etc. Are people still using MinGW, or should I just recommend using a proprietary toolkit to do the builds? If so, which one(s)? Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free Express versions, but you'll need the Windows SDK as well with those (also free). Setup info is in the docs - note that it can be a little painful to get the SDK fully registered with 2005, but it seems to be completely automatic with 2008. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Kevin, * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB to 1MB range would probably be optimal for us, although the sweet spot will depend on how the database is used. A configurable or self-adjusting size would be cool. We have something much better, called WAL. If people want to keep their backup current, they should use that after getting the base backup up and working. We don't need to support this for the base backup, imv. In any case, it's certainly not something required for an initial implementation.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Streaming a base backup from master
On 3 September 2010 16:01, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well there's no particular reason we couldn't support having multiple pg_start_backup() pending either. It's just not usually something people have need so far. The backup label file makes that hard. There can be only one at a time. I don't actually see a use-case for streaming multiple concurrent backups. How many people are going to be able to afford that kind of load on the master's I/O bandwidth? To make it affordable, could functionality be added to allow slaves to become chainable? (i.e. master streams to standby 1, which streams to standby 2 etc) This would help reduce bandwidth for normal streaming replication too, which would be useful on particularly busy databases. Obviously in synchronous replication this would be horribly slow so not feasible for that. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Stephen Frost sfr...@snowman.net wrote: We have something much better, called WAL. If people want to keep their backup current, they should use that after getting the base backup up and working. Unless you want to provide support for Point In Time Recovery without excessive recovery times. We don't need to support this for the base backup, imv. We found that making a hard-link copy of the previous base backup and using rsync to bring it up to date used 1% the WAN bandwidth as sending a complete, compressed base backup. Just sending modified files in their entirety would have bought the first order of magnitude; recognizing the unchanged portions buys the second order of magnitude. In any case, it's certainly not something required for an initial implementation.. No disagreement there; but sometimes it pays to know where you might want to go, so you don't do something to make further development in that direction unnecessarily difficult. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 18:01, Tom Lane wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: On 03/09/10 15:16, Greg Stark wrote: On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: * We need a smarter way to do pg_start/stop_backup() with this. At the moment, you can only have one backup running at a time, but we shouldn't have that limitation with this built-in mechanism. Well there's no particular reason we couldn't support having multiple pg_start_backup() pending either. It's just not usually something people have need so far. The backup label file makes that hard. There can be only one at a time. I don't actually see a use-case for streaming multiple concurrent backups. How many people are going to be able to afford that kind of load on the master's I/O bandwidth? It's more a matter of convenience when you're setting up test environments with small databases or something like that. I don't see many people regularly using the streaming backup for anything larger than a few hundred gigabytes anyway. At that point you'll most likely want to use something more efficient. Certainly for version 1, it would be sufficient to throw an error if someone tries to start a backup while another one is in progress. *Maybe*, down the road, we'd want to relax it. Yeah, it's OK for 1st version. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 11:20 AM, Stephen Frost sfr...@snowman.net wrote: Kevin, * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB to 1MB range would probably be optimal for us, although the sweet spot will depend on how the database is used. A configurable or self-adjusting size would be cool. We have something much better, called WAL. If people want to keep their backup current, they should use that after getting the base backup up and working. We don't need to support this for the base backup, imv. In any case, it's certainly not something required for an initial implementation.. While I'm certainly not knocking WAL, it's not difficult to think of cases where being able to incrementally update a backup saves you an awful lot of bandwidth. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Fri, Sep 3, 2010 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: elog(FATAL) is *certainly* not a better idea. I think there's really nothing that can be done, you just have to silently ignore the error. Hmm.. some functions called by a signal handler use elog(FATAL), e.g., RecoveryConflictInterrupt() do that when unknown conflict mode is given as an argument. Are these calls unsafe, too? [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) before I argue with him further. Meanwhile, I'm not going to accept anything unsafe in a core facility like this patch is going to be. Oh. I thought you had ignored his objections and fixed it. Why are we releasing 9.0 with this problem again? Surely this is nuts. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Kevin Grittner kevin.gritt...@wicourts.gov writes: Stephen Frost sfr...@snowman.net wrote: In any case, it's certainly not something required for an initial implementation.. No disagreement there; but sometimes it pays to know where you might want to go, so you don't do something to make further development in that direction unnecessarily difficult. I think that setting out to reimplement rsync, or to go down a design path where we're likely to do a lot of that eventually, is the height of folly. We should be standing on the shoulders of other projects, not rolling our own because of misguided ideas about people not having those projects installed. IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Stephen Frost sfr...@snowman.net wrote: In any case, it's certainly not something required for an initial implementation.. No disagreement there; but sometimes it pays to know where you might want to go, so you don't do something to make further development in that direction unnecessarily difficult. I think that setting out to reimplement rsync, or to go down a design path where we're likely to do a lot of that eventually, is the height of folly. We should be standing on the shoulders of other projects, not rolling our own because of misguided ideas about people not having those projects installed. IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. We used to use cp to create databases. Should we go back to that system? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. Personally, I would love to see protocol-level compression added. (Yes, going over a compressed SSH tunnel works well, but in general isn't user-friendly.) Josh: we talked on IRC awhile back and you mentioned that CMD had added this in Mammoth? Would you be interested in having someone get that integrated back into the community? David Blewett -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
On Sep 3, 2010, at 7:31 AM, Tom Lane wrote: I don't think the cast should act that way, but I could see providing a separate conversion function that returns 0 ... or perhaps better NULL ... if no match. +1 I could use this in pgTAP. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
* Tom Lane (t...@sss.pgh.pa.us) wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. I'd much rather use an existing library to implement it than call out to some external utility. That said, I'm about as thrilled with libtar as librsync after a bit of googling around. :/ Thanks, Stephen signature.asc Description: Digital signature
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Robert Haas robertmh...@gmail.com writes: On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) before I argue with him further. Meanwhile, I'm not going to accept anything unsafe in a core facility like this patch is going to be. Oh. I thought you had ignored his objections and fixed it. Why are we releasing 9.0 with this problem again? Surely this is nuts. My original review of hot standby found about half a dozen things I thought were broken: http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php After a *very* long-drawn-out fight I fixed one of them (max_standby_delay), largely still over Simon's objections. I don't have the energy to repeat that another half-dozen times, so I'm going to wait for the suspected problems to be proven by field experience. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
Tom Lane t...@sss.pgh.pa.us wrote: what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. The complexities of dealing with properly invoking rsync externally could well require more code and be considerably more fragile than passing the data through the existing SR connection; particularly since to get the full benefits of rsync you need to be dealing with a daemon which has the appropriate modules configured -- the location of which you wouldn't easily know. If we were talking about re-implementing rsync, or doing more than a rough approximation, kinda, of 5% of what rsync does, I'd be with you. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 19:09, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. I'd much rather use an existing library to implement it than call out to some external utility. That said, I'm about as thrilled with libtar as librsync after a bit of googling around. :/ The code to build a tar archive is about 200 lines of code. The amount of code for untar is about the same. That's about the amount of effort this warrants. We could add zlib compression since we already link with that, but that's about it. I'm not interested in adding more infrastructure for more tools. For more complicated scenarios, you can still use pg_start/stop_backup() as usual, there's nothing wrong with that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On 03/09/10 18:53, David Blewett wrote: On Fri, Sep 3, 2010 at 11:47 AM, Tom Lanet...@sss.pgh.pa.us wrote: IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. Personally, I would love to see protocol-level compression added. (Yes, going over a compressed SSH tunnel works well, but in general isn't user-friendly.) Josh: we talked on IRC awhile back and you mentioned that CMD had added this in Mammoth? Would you be interested in having someone get that integrated back into the community? There's a recent thread on pgsql-general about just that: http://archives.postgresql.org/pgsql-general/2010-08/msg3.php I agree with Tom's comments there, I'd like to have something to enable/disable SSL compression rather than implement our own. There was some discussion that it might not be available on JDBC SSL implementations, but if it's done in our protocol, you'll need changes to the client to make it work anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows Tools
On Fri, Sep 3, 2010 at 17:18, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote: Hello, At work, I've been tasked with providing some Windows connection libraries, etc. Are people still using MinGW, or should I just recommend using a proprietary toolkit to do the builds? If so, which one(s)? Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free Express versions, but you'll need the Windows SDK as well with those (also free). Setup info is in the docs - note that it can be a little painful to get the SDK fully registered with 2005, but it seems to be completely automatic with 2008. If you download the latest SDK, you don't even need Visual C++ Express - you can build directly from the SDK. Assuming you don't *want* the IDE, of course... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On Fri, Sep 3, 2010 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) before I argue with him further. Meanwhile, I'm not going to accept anything unsafe in a core facility like this patch is going to be. Oh. I thought you had ignored his objections and fixed it. Why are we releasing 9.0 with this problem again? Surely this is nuts. My original review of hot standby found about half a dozen things I thought were broken: http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php After a *very* long-drawn-out fight I fixed one of them (max_standby_delay), largely still over Simon's objections. I don't have the energy to repeat that another half-dozen times, so I'm going to wait for the suspected problems to be proven by field experience. Bummer. Allow me to cast a vote for doing something about the fact that handle_standby_sig_alarm() thinks it can safely acquire an LWLock in a signal handler. I think we should be making our decisions on what to change in the code based on what is technically sound, rather than based on how much the author complains about changing it. Of course there may be cases where there is a legitimate difference of opinion concerning the best way forward, but I don't believe this is one of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Excerpts from David Fetter's message of vie sep 03 10:21:00 -0400 2010: How about using a SAVEPOINT before the cast? Wouldn't fail your transaction. For unattended operation, there are some issues: * Generating appropriate SAVEPOINT names * Keeping track of same * Detecting errors * Issuing ROLLBACKs to the aforementioned SAVEPOINT None of this works super well for a bulk load. Why do you need any of this? The logic is pretty simple:

SAVEPOINT my_savepoint;
select ...::regclass;
-- if it works
RELEASE my_savepoint;
-- if it fails
ROLLBACK TO my_savepoint;
RELEASE my_savepoint;

Granted, it's not super-performant ... -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] regclass without error?
On Fri, Sep 03, 2010 at 01:28:15PM -0400, Alvaro Herrera wrote: Excerpts from David Fetter's message of vie sep 03 10:21:00 -0400 2010: How about using a SAVEPOINT before the cast? Wouldn't fail your transaction.. For unattended operation, there are some issues: * Generating appropriate SAVEPOINT names * Keeping track of same * Detecting errors * Issuing ROLLBACKs to the aforementioned SAVEPOINT None of this works super well for a bulk load. Why do you need any of this? The logic is pretty simple: SAVEPOINT my_savepoint; select ...::regclass; -- if it works RELEASE my_savepoint; -- if it fails ROLLBACK TO my_savepoint; RELEASE my_savepoint; Granted, it's not super-performant ... We have no mechanism to do this, or any other check, during a bulk load. It'd be a great feature to have :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regclass without error?
Excerpts from David Fetter's message of vie sep 03 13:31:00 -0400 2010: We have no mechanism to do this, or any other check, during a bulk load. It'd be a great feature to have :) I'm not sure what kind of bulk load you are talking about, nor what it has to do with pgpool-II. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On 03/09/10 17:51, Tom Lane wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: On 02/09/10 23:13, Tom Lane wrote: Also, using sig_atomic_t for owner_pid is entirely not sane. Hmm, true, it doesn't need to be set from signal handler, but is there an atomicity problem if one process calls ReleaseLatch while another process is in SetLatch? If there is *any* possibility of that happening then you have far worse problems than whether the field is atomically readable or not: the behavior will be unpredictable at just slightly larger timescales. This is the reason why I think it'd be better if ReleaseLatch simply didn't exist. That'd discourage people from designing dynamic latch structures, which are fundamentally going to be subject to race conditions. Each Walsender needs a latch, and walsenders come and go. The version I first experimented with had no ReleaseLatch function; instead, any process could call WaitLatch on any shared latch, as long as only one process waits on a given latch at a time. But it had exactly the same problem: WaitLatch had to set the pid on the Latch struct to allow other processes to send the signal, so another process could call SetLatch and read the pid field while WaitLatch is just setting it. I think we'll have to put a spinlock there, if we can't assume that assignment of pid_t is atomic. It's not the end of the world. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Cost estimates for parameterized paths
Robert Haas robertmh...@gmail.com writes: On Thu, Sep 2, 2010 at 5:31 PM, Tom Lane t...@sss.pgh.pa.us wrote: The best idea I can come up with at the moment is to compute best case and worst case costs for a parameterized path, Interestingly, I previously proposed almost exactly this approach to handle a couple of other problems: I thought it seemed familiar ;-) I'm not entirely sure whether we can use this approach for more than one kind of problem at a time; if we can't, it's probably not a good idea to do it at all. I don't see why we couldn't do it in principle. The issue of course is how many paths survive, and can we tolerate that from a planner performance standpoint? On reflection I think that for parameterized paths the problem won't be too bad, because (a) we'll ignore parameterized paths except when considering a join to the right outer rel, so their presence in the rel's pathlist won't cost much otherwise, and (b) we will only generate them when there's a suitable joinclause, so the number of potential paths will be limited. I am not sure those statements can be made for the other applications you suggested, though. There's probably not much we can do at this point except code up the idea and see how well it works. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 17:51, Tom Lane wrote: If there is *any* possibility of that happening then you have far worse problems than whether the field is atomically readable or not: the behavior will be unpredictable at just slightly larger timescales. Each Walsender needs a latch, and walsenders come and go. Well, then we need to think extremely hard about the circumstances in which we need to send a cross-process latch signal to walsenders and what the behavior needs to be in the race conditions. WaitLatch had to set the pid on the Latch struct to allow other processes to send the signal. Another process could call SetLatch and read the pid field, while WaitLatch is just setting it. I think we'll have to put a spinlock there, if we can't assume that assignment of pid_t is atomic. It's not the end of the world.. Yes it is. Signal handlers can't take spinlocks (what if they interrupt while the mainline is holding the lock?). It's probably not too unreasonable to assume that pid_t assignment is atomic. But I'm still thinking that we have bigger problems than that if there are really cases where SetLatch can execute at approximately the same time as a latch owner is coming or going. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ps buffer is incorrectly padded on the (latest) OS X
Hi, I always wondered why ps ax|grep postgres shows several extra blank lines after the process name, i.e.

972 ?? Ss 0:00.69 postgres: writer process
973 ?? Ss 0:00.51 postgres: wal writer process

(I put newlines instead of spaces there). By looking into the code I've found this part of set_ps_display:

#ifdef PS_USE_CLOBBER_ARGV
	/* pad unused memory; need only clobber remainder of old status string */
	if (last_status_len > ps_buffer_cur_len)
		MemSet(ps_buffer + ps_buffer_cur_len, PS_PADDING,
			   last_status_len - ps_buffer_cur_len);
	last_status_len = ps_buffer_cur_len;
#endif   /* PS_USE_CLOBBER_ARGV */

PS_PADDING on __darwin__ is set to ' '. Apparently this doesn't work correctly with OS X 10.6. After I changed the define to use '\0' on darwin, the extra blank lines (actually consisting of hundreds of spaces without a line break) disappeared. The one-liner change follows:

===
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index f27a52f..c2ddf33 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -76,7 +76,7 @@ bool update_process_title = true;
 
 /* Different systems want the buffer padded differently */
-#if defined(_AIX) || defined(__linux__) || defined(__svr4__)
+#if defined(_AIX) || defined(__linux__) || defined(__svr4__) || defined(__darwin__)
 #define PS_PADDING '\0'
 #else
 #define PS_PADDING ' '
===

I don't have different OS X versions to test, so I'm not sure whether 10.5 or below are also affected. Also, the patch should specifically check for 10.6, though I don't know how to distinguish between different OS X versions in postgres sources (any suggestions?). Regards, -- Alexey Klyukin http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc
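To see why the padding byte matters, here is a toy model of what set_ps_display() does when a new status is shorter than the old one (set_status() is a hypothetical stand-in, not the real API): padding with ' ' leaves the clobbered tail as visible spaces in the ps output, while padding with '\0' terminates the string right after the status.

```c
#include <string.h>

/* Toy model of set_ps_display()'s clobbering of the old status string.
 * set_status() is a hypothetical stand-in, not PostgreSQL's real API. */
static void
set_status(char *ps_buffer, size_t buflen, const char *status, char padding)
{
	size_t		newlen = strlen(status);

	memcpy(ps_buffer, status, newlen);
	/* clobber the remainder of any previous, longer status */
	memset(ps_buffer + newlen, padding, buflen - newlen - 1);
	ps_buffer[buflen - 1] = '\0';
}
```

With ' ' padding, strlen() of the buffer runs to the end of the argv area — exactly the hundreds of trailing spaces observed; with '\0' it stops at the status text.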
Re: [HACKERS] Windows Tools
On Fri, Sep 03, 2010 at 06:24:06PM +0200, Magnus Hagander wrote: On Fri, Sep 3, 2010 at 17:18, Dave Page dp...@pgadmin.org wrote: On Fri, Sep 3, 2010 at 4:11 PM, David Fetter da...@fetter.org wrote: Hello, At work, I've been tasked with providing some Windows connection libraries, etc. Are people still using MinGW, or should I just recommend using a proprietary toolkit to do the builds? If so, which one(s)? Visual C++ 2005 for 8.3/8.4 or 2008 for 9.0. You can use the free Express versions, but you'll need the Windows SDK as well with those (also free). Setup info is in the docs - note that it can be a little painful to get the SDK fully registered with 2005, but it seems to be completely automatic with 2008. If you download the latest SDK, you don't even need Visual C++ Express - you can build directly from the SDK. Assuming you don't *want* the IDE, of course... Here are a few of the things I'd rather have than that IDE: * Root canal therapy without anaesthesia * Migraine * Windows for my desktop Fortunately, it looks like I'll get none of the above for this project :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On 03/09/10 21:16, Tom Lane wrote: Heikki Linnakangasheikki.linnakan...@enterprisedb.com writes: WaitLatch had to set the pid on the Latch struct to allow other processes to send the signal. Another process could call SetLatch and read the pid field, while WaitLatch is just setting it. I think we'll have to put a spinlock there, if we can't assume that assignment of pid_t is atomic. It's not the end of the world.. Yes it is. Signal handlers can't take spinlocks (what if they interrupt while the mainline is holding the lock?). Ok, I see. It's probably not too unreasonable to assume that pid_t assignment is atomic. But I'm still thinking that we have bigger problems than that if there are really cases where SetLatch can execute at approximately the same time as a latch owner is coming or going. I don't see how to avoid it. A walsender, or any process really, can exit at any time. It can make the latch inaccessible to others before it exits to minimize the window, but it's always going to be possible that another process is just about to call SetLatch when you exit. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/09/10 21:16, Tom Lane wrote: It's probably not too unreasonable to assume that pid_t assignment is atomic. But I'm still thinking that we have bigger problems than that if there are really cases where SetLatch can execute at approximately the same time as a latch owner is coming or going. I don't see how to avoid it. A walsender, or any process really, can exit at any time. It can make the latch inaccessible to others before it exits to minimize the window, but it's always going to be possible that another process is just about to call SetLatch when you exit. Well, in that case what we need to do is presume that the latch object has a continuing existence but the owner/receiver can come and go. I would suggest that InitLatch needs to initialize the object into a valid but unowned state; there is *no* deinitialize operation; and there are AcquireLatch and ReleaseLatch operations to become owner or stop being owner. We also need to define the semantics of SetLatch on an unowned latch --- does this set a signal condition that will be available to the next owner? This amount of complexity might be overkill for local latches, but I think we need it for shared ones. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
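A minimal single-process sketch of the lifecycle Tom describes, with the latch as a permanently valid shared object and ownership coming and going (the function names beyond InitLatch/SetLatch are illustrative, not a committed API). The key semantic is the last one he raises: a SetLatch against an unowned latch is remembered, so the next owner sees the signal condition.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative sketch of the proposed latch lifecycle -- not the actual
 * latch patch.  The latch lives permanently (e.g. in shared memory);
 * owners come and go via Acquire/Release. */
typedef struct
{
	volatile bool is_set;	/* pending signal; survives ownership changes */
	volatile pid_t owner;	/* 0 = valid but unowned */
} Latch;

static void
InitLatch(Latch *latch)
{
	latch->is_set = false;
	latch->owner = 0;			/* valid but unowned */
}

static void
AcquireLatch(Latch *latch, pid_t pid)
{
	latch->owner = pid;			/* real code would worry about barriers here */
}

static void
ReleaseLatch(Latch *latch)
{
	latch->owner = 0;			/* note: is_set is deliberately NOT cleared */
}

static void
SetLatch(Latch *latch)
{
	latch->is_set = true;		/* remembered even if no one owns the latch */
	if (latch->owner != 0)
	{
		/* kill(latch->owner, SIGUSR1) in the real thing */
	}
}

/* Returns true if the latch was set while we owned it -- or before. */
static bool
TestAndClearLatch(Latch *latch)
{
	bool		was_set = latch->is_set;

	latch->is_set = false;
	return was_set;
}
```

Because InitLatch leaves the object valid but unowned and Release never destroys it, SetLatch racing against an owner coming or going can at worst deliver a spurious wakeup, never touch freed state.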
Re: [HACKERS] ps buffer is incorrectly padded on the (latest) OS X
Alexey Klyukin al...@commandprompt.com writes: I always wondered why ps ax|grep postgres shows several extra blank lines after the process name, i.e. AFAIR it's always done that on OSX. I thought we'd tried the '\0' padding way back when and it didn't work nicely, but maybe Apple fixed that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) [...] [...]Why are we releasing 9.0 with this problem again? Surely this is nuts. Will the docs give enough info so that release note readers will know when they're giving well-informed consent to volunteer to produce such field evidence? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] returning multiple result sets from a stored procedure
On Thu, Sep 2, 2010 at 7:17 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from John Adams's message of jue sep 02 18:25:45 -0400 2010: I noticed in postgres you cannot return multiple result sets from a stored procedure (surprisingly as it looks like a very good dbms). If you're really intent about doing this, you can emulate it by returning a set of refcursors. Also arrays can work pretty well, depending on how much data there is and where it's going. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 03, 2010 at 09:56:12AM -0400, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. Yeah, I looked into this when discussing this same concept at PGCon with folks. There doesn't appear to be a good librsync and, even if there was, there's a heck of a lot of complexity there that we *don't* need. rsync is a great tool, don't get me wrong, but let's not try to go over our heads here. rsync is not rocket science. All you need is for the receiving end to send a checksum for each block it has. The server side computes the same checksums and, for each block, sends back either "same" or the new data. The client and the server don't need to synchronise at all. If the client sends nothing, the server sends everything. The tricky part of rsync (finding blocks that have moved) is not needed here. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ Patriotism is when love of your own people comes first; nationalism, when hate for people other than your own comes first. - Charles de Gaulle
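The simplified scheme Martijn describes fits in a page of C. This sketch uses a toy checksum and an in-memory "wire" (a real implementation would use a strong hash and stream the per-block decisions over the SR connection): the client sends one checksum per block it already has, and the server returns data only for blocks that differ or that the client lacks.

```c
#include <string.h>
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 8			/* tiny, for illustration; think 8 kB really */

/* Toy per-block checksum (FNV-1a) standing in for a real hash like CRC
 * or MD5 -- an assumption for this sketch, not a recommendation. */
static uint32_t
block_checksum(const unsigned char *buf, int len)
{
	uint32_t	sum = 2166136261u;
	int			i;

	for (i = 0; i < len; i++)
		sum = (sum ^ buf[i]) * 16777619u;
	return sum;
}

/* Server side: given the client's per-block checksums, copy only the
 * blocks that differ (or that the client doesn't have) into 'out' and
 * record which ones were sent.  In the real protocol the marks and the
 * data would go over the wire instead. */
static int
sync_blocks(const unsigned char *server, int server_len,
			const uint32_t *client_sums, int client_nblocks,
			unsigned char *out, bool *sent)
{
	int			nblocks = (server_len + BLOCK_SIZE - 1) / BLOCK_SIZE;
	int			nsent = 0;
	int			b;

	for (b = 0; b < nblocks; b++)
	{
		int			off = b * BLOCK_SIZE;
		int			len = (server_len - off < BLOCK_SIZE) ?
						  server_len - off : BLOCK_SIZE;
		uint32_t	sum = block_checksum(server + off, len);

		sent[b] = (b >= client_nblocks || client_sums[b] != sum);
		if (sent[b])
		{
			memcpy(out + off, server + off, len);
			nsent++;
		}
	}
	return nsent;
}
```

Note the degenerate case Martijn points out falls out for free: a client that sends zero checksums simply gets every block, i.e. a full base backup.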
[HACKERS] The other major HS TODO: standby promotion
All, As far as I'm concerned, the other big missing feature for HS is the ability to promote standbys to become the new master. If we had that feature, then HS can be the backbone of a large-scale PostgreSQL cloud; if we don't have it, then HS does not contribute very much to scalability beyond a couple of servers. It also seems like it ought to be relatively easy to do, if I understand the issues correctly. Please advise me if I understand the three obstacles for this:

a) for a standby to become a master to another standby, the promoted standby must be equal to or ahead of the non-promoted standby in the replication stream.

b) when we bring a standby up, it comes up on a new timeline. Since the other standbys don't have this new timeline, they are incompatible with it.

c) when we promote a standby, it would also need to save all of its transaction logs until the other standbys connect.

(a) seems easily enough solved in two steps: giving the DBA a way to check where in the replication stream each standby is (I think we already have this), and having the re-mastering mechanism check for regressions in timestamps or the XID sequence.

I can see two ways to tackle (b). One would be NOT to start a new timeline (as an option) when we promote the standby. That method probably has complications we don't want to get into. The second method would be giving standbys a way to subscribe to a new timeline. This seems like the better approach, as it would logically be part of the re-mastering command. What changes would be required to do this?

(c) can already be dealt with by setting an archive_command on each standby. Beyond that, I don't think that we really need to do anything; DBAs can have a choice between archiving logs to allow for re-mastering of all standbys, or saving space and bandwidth and forcing some standbys to be re-cloned if they run out of time.
It would be nice, eventually, to have a way to tell PostgreSQL to retain more or less WAL segments without restarting the server, but I don't see this as critical. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] returning multiple result sets from a stored procedure
I noticed in postgres you cannot return multiple result sets from a stored procedure (surprisingly as it looks like a very good dbms). That feature has been on the TODO list for years. However, nobody has stepped forward to either write it, or to fund working on it. If your company has programmers or money to build this feature, it could probably get done fairly quickly (as in, next version). -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On Fri, Sep 3, 2010 at 3:11 PM, Ron Mayer rm...@cheapcomplexdevices.com wrote: Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Sep 3, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) [...] [...]Why are we releasing 9.0 with this problem again? Surely this is nuts. Will the docs give enough info so that release note readers will know when they're giving well-informed consent to volunteer to produce such field evidence? Yeah, exactly. Good news: you can now run queries on the standby. Bad news: we've abandoned our policy of not releasing with known bugs. Have fun and enjoy using PostgreSQL, the world's most advanced open source databSegmentation fault -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] returning multiple result sets from a stored procedure
Josh Berkus j...@agliodbs.com writes: I noticed in postgres you cannot return multiple result sets from a stored procedure (surprisingly as it looks like a very good dbms). That feature has been on the TODO list for years. However, nobody has stepped forward to either write it, or to fund working on it. If your company has programmers or money to build this feature, it could probably get done fairly quickly (as in, next version). Part of the reason it's sat on TODO is lack of consensus about how such a feature ought to look/work; particularly since most of the discussion about it has considered that it'd go along with stored procedures executing outside of transactions. It's not just a matter of needing to find some programming manpower. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On 03/09/10 19:38, Robert Haas wrote: On Fri, Sep 3, 2010 at 12:10 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haasrobertmh...@gmail.com writes: On Fri, Sep 3, 2010 at 10:07 AM, Tom Lanet...@sss.pgh.pa.us wrote: [ shrug... ] I stated before that the Hot Standby patch is doing utterly unsafe things in signal handlers. Simon rejected that. I am waiting for irrefutable evidence to emerge from the field (and am very confident that it will be forthcoming...) before I argue with him further. Meanwhile, I'm not going to accept anything unsafe in a core facility like this patch is going to be. Oh. I thought you had ignored his objections and fixed it. Why are we releasing 9.0 with this problem again? Surely this is nuts. My original review of hot standby found about half a dozen things I thought were broken: http://archives.postgresql.org/pgsql-hackers/2010-05/msg00178.php After a *very* long-drawn-out fight I fixed one of them (max_standby_delay), largely still over Simon's objections. I don't have the energy to repeat that another half-dozen times, so I'm going to wait for the suspected problems to be proven by field experience. Bummer. Allow me to cast a vote for doing something about the fact that handle_standby_sig_alarm() thinks it can safely acquire an LWLock in a signal handler. I think we should be making our decisions on what to change in the code based on what is technically sound, rather than based on how much the author complains about changing it. Of course there may be cases where there is a legitimate difference of opinion concerning the best way forward, but I don't believe this is one of them. Hmm, just to make the risk more concrete, here's one scenario that could happen:

1. Startup process tries to acquire a cleanup lock on a page. It's pinned, so it has to wait, and calls ResolveRecoveryConflictWithBufferPin().

2. ResolveRecoveryConflictWithBufferPin() enables the standby SIGALRM handler by calling enable_standby_sig_alarm(), and calls ProcWaitForSignal().

3. ProcWaitForSignal() calls semop() (assuming SysV semaphores here) to wait for the process semaphore.

4. Max standby delay is reached and SIGALRM fires. CheckStandbyTimeout() is called in the signal handler. CheckStandbyTimeout() calls SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends().

5. CancelDBBackends() tries to acquire ProcArrayLock in exclusive mode. It's being held by another process, so we have to sleep.

6. To sleep, LWLockAcquire() calls PGSemaphoreLock(), which calls semop() to wait on the process semaphore.

So we now have the same process nested twice inside a semop() call. Looking at the Linux signal(7) man page, it is undefined what happens if semop() is re-entered in a signal handler while another semop() call is in progress in the main line of execution. Assuming it somehow works, which semop() call is going to return when the semaphore is incremented? Maybe that's OK; if I'm reading the deadlock checker code correctly, it also calls semop() to increment another process' semaphore, and the deadlock checker can be invoked from a signal handler while in semop() waiting on our process' semaphore. BTW, sem_post(), which is the POSIX function for incrementing a semaphore, is listed as a safe function to call in a signal handler. But it's certainly fishy. A safer approach would be to call just PGSemaphoreUnlock() in the signal handler, and do all the other processing outside it. You'd still call semop() within semop(), but at least it would be closer to the semop()-within-semop() we already do in the deadlock checker. And there would be less randomness from timing and lock contention involved, making it easier to test the behavior on various platforms.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
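For comparison, the textbook-safe shape of such a timeout handler defers all the work to the main line and keeps only async-signal-safe operations in the handler — setting a sig_atomic_t flag here; the real code could additionally PGSemaphoreUnlock(), since POSIX lists sem_post() as async-signal-safe. This is a hedged sketch of the pattern, not the actual hot standby code:

```c
#include <signal.h>

/* Pattern sketch: the handler records that the standby timeout fired;
 * no LWLockAcquire(), no semop(), nothing non-reentrant.  The names
 * mirror the discussion but this is not the actual PostgreSQL code. */
static volatile sig_atomic_t standby_timeout_fired = 0;

static void
handle_standby_sig_alarm(int signum)
{
	(void) signum;
	/* async-signal-safe: just set a flag */
	standby_timeout_fired = 1;
}

static void
main_loop_iteration(void)
{
	if (standby_timeout_fired)
	{
		standby_timeout_fired = 0;
		/* Safe context: here we could take ProcArrayLock and cancel
		 * conflicting backends, outside any signal handler. */
	}
}
```

The cost of this pattern is latency (the main line must wake up and notice the flag), which is presumably why the patch took the shortcut of doing the work in the handler itself.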
Re: [HACKERS] Cost estimates for parameterized paths
On Fri, Sep 3, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: On reflection I think that for parameterized paths the problem won't be too bad, because (a) we'll ignore parameterized paths except when considering a join to the right outer rel, so their presence in the rel's pathlist won't cost much otherwise, Hmm. Maybe they should go into a separate path list, and perhaps we could do the min/max calculations only with that pathlist (at least for now), thus avoiding taking a generalized penalty to handle this specific case. IIUC, a parameterized path should never cause an unparamaterized path to be thrown out, even if the unparameterized path costs more than the worst-case cost for the parameterized path, because the parameterized path constrains the possible join strategies higher up, so what looked like a great idea at first blush might turn out to suck when the chips are down. Then, too, I'm not sure we can even guarantee it will always be possible to form a valid plan around a given parameterized path, which is another good reason not to discard any unparameterized alternatives that may exist. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
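To make the pruning rule being discussed concrete, here is a toy version of the min/max comparison (PathRange and can_discard are invented names for illustration, not planner code): a new path is discarded only when even its best-case cost loses to an existing path's worst case, and an unparameterized path is never discarded in favor of a parameterized one, since the parameterized path constrains join strategies above and may not admit a valid complete plan.

```c
#include <stdbool.h>

/* Illustrative sketch of add-path pruning with cost ranges -- invented
 * names, not the PostgreSQL planner's actual data structures. */
typedef struct
{
	double		best_cost;	/* cost under the most favorable parameter values */
	double		worst_cost; /* cost under the least favorable ones */
	bool		parameterized;
} PathRange;

/* May 'newp' be thrown away because 'oldp' is already kept? */
static bool
can_discard(const PathRange *newp, const PathRange *oldp)
{
	/* Never throw out an unparameterized path because of a parameterized
	 * one: the latter constrains join order higher up and might not even
	 * lead to a valid complete plan. */
	if (!newp->parameterized && oldp->parameterized)
		return false;
	/* Otherwise discard only when dominated even in the best case. */
	return newp->best_cost >= oldp->worst_cost;
}
```

Keeping parameterized paths in their own list, as suggested above, would mean this range comparison only runs when considering a join to the relevant outer rel, limiting the planner-performance cost.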
Re: [HACKERS] Streaming a base backup from master
On Fri, Sep 3, 2010 at 12:23 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> On 03/09/10 18:53, David Blewett wrote:
>> On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf.
>>
>> Personally, I would love to see protocol-level compression added. (Yes, going over a compressed SSH tunnel works well, but in general isn't user-friendly.) Josh: we talked on IRC awhile back and you mentioned that CMD had added this in Mammoth? Would you be interested in having someone get that integrated back into the community?
>
> There's a recent thread on pgsql-general about just that: http://archives.postgresql.org/pgsql-general/2010-08/msg3.php I agree with Tom's comments there; I'd like to have something to enable/disable SSL compression rather than implement our own. There was some discussion that it might not be available on JDBC SSL implementations, but if it's done in our protocol, you'll need changes to the client to make it work anyway.

While I agree that combining SSL with compression is a great win, I'm not sold on Tom's argument that compression is only needed in WAN situations. I've seen great benefit to using an SSH tunnel with compression over LAN connections (100 and 1000 Mbps). At work, we do have a private WAN on which it would be nice to be able to use compression with no encryption. I think it's a general-use thing. While I know it's not the best argument, MySQL does provide compression at the connection level.

David Blewett

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
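For reference, the compressed-SSH-tunnel workaround mentioned above looks roughly like this; host names, ports, and user names are placeholders:

```shell
# On the client: forward local port 6543 to the server's PostgreSQL port,
# with -C enabling compression on the tunnel and -N skipping a remote shell.
ssh -C -N -L 6543:localhost:5432 user@db.example.com

# Then point the client at the local end of the tunnel:
psql -h localhost -p 6543 -U myuser mydb
```

This is the setup David calls effective but not user-friendly: it works over both LAN and WAN links, but requires shell access to the server host and an extra long-lived process on the client side.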
Re: OT: OFF TOPIC: [HACKERS] returning multiple result sets from a stored procedure
> John, I honestly do not mean any offence, just out of curiosity. If you guys care about money and time why would you spend the best years of your life basically copying commercial products for free?

We don't do it to copy commercial products. We do it to build something better than them.

> I do not have experience with open source and I kind of thought open source guys do not need or care about money and time.

It's a common misapprehension that open source software is somehow produced for free. The press has contributed to this myth a great deal by calling open source "socialism" and "altruism". What's actually true about open source is that the organization which releases the product (the open source project) is not necessarily the same organization which pays the developers. However, if you look at any mature, large open source project, you will find that at least 1/4 of its code contributors are paid to work on the project by *someone*, and that those paid developers account for 70% to 95% of the code. PostgreSQL is no exception to this rule.

The three differences between an open source project and proprietary software in terms of adding new features are: a) it's pay or play, which means that you have the option of writing the new feature yourself instead of funding it in cash; b) the cost of developing new features, if you choose to fund them, is much cheaper (generally a couple of orders of magnitude cheaper) than for proprietary software, because of the open market for developers and the greater efficiency of OSS development; and c) it's *much* easier for multiple companies to contribute to the same project if that project is open source than if it's proprietary.

Ultimately, however, if a feature is going to be added to any OSS project, that feature is going to be paid for by someone, either in money, time, or both. It does help us to get feedback like the feedback you gave earlier, even if you can't contribute to the project, because it helps us prioritize new features.
But you should recognize that if you're not contributing money or time to the project, you may have a long wait for the feature *you* want.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: OT: OFF TOPIC: [HACKERS] returning multiple result sets from a stored procedure
On 9/3/10 2:20 PM, Josh Berkus wrote:
> However, if you look at any mature, large open source project you will find that at least 1/4 of its code contributors are paid to work on the project by *someone*, and that those paid developers account for 70% to 95% of the code.

Relevant link for this: http://apcmag.com/linux-now-75-corporate.htm

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
On Fri, Sep 3, 2010 at 4:20 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> Maybe that's OK; if I'm reading the deadlock checker code correctly, it also calls semop() to increment another process's semaphore, and the deadlock checker can be invoked from a signal handler while we're in semop() waiting on our own process's semaphore. BTW, sem_post(), which is the POSIX function for incrementing a semaphore, is listed as a safe function to call in a signal handler. But it's certainly fishy.

Color me confused; I may need to backpedal rapidly here. I had thought that what Tom was complaining about was the fact that the signal handler was taking LWLocks, which I would have thought to be totally unsafe. But it seems the deadlock detector does the same thing, more or less because the signal handlers are set up so that they don't do anything unless we're within a limited section of code where nothing too interesting can happen. I'm not too sure why we think that it's safe to invoke the deadlock detector that way, but it's also not too clear to me why this case is any worse. It furthermore appears that Simon's reply to Tom's complaint about this function was:

> This was modelled very closely on handle_sig_alarm() and was reviewed by other hackers. I'm not great on that, as you know, so if you can explain what it is I can't do, and how that differs from handle_sig_alarm running the deadlock detector in the same way, then I'll work on it some more.

I guess I need the same explanation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
> A safer approach would be to just call PGSemaphoreUnlock() in the signal handler, and do all the other processing outside it.

I don't see any particularly good reason to assume that PGSemaphoreUnlock() is safe either: you're still talking about nested semop operations. The pre-existing SIGALRM handler uses a self-signal (kill(MyProcPid, SIGINT)) to kick the process off any wait it might be doing. I'd rather do something like that. Or maybe the work you're doing on latches would help ...

regards, tom lane
Re: [HACKERS] Cost estimates for parameterized paths
Robert Haas robertmh...@gmail.com writes:
> On Fri, Sep 3, 2010 at 2:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> On reflection I think that for parameterized paths the problem won't be too bad, because (a) we'll ignore parameterized paths except when considering a join to the right outer rel, so their presence in the rel's pathlist won't cost much otherwise,
>
> Hmm. Maybe they should go into a separate path list, and perhaps we could do the min/max calculations only with that pathlist (at least for now), thus avoiding taking a generalized penalty to handle this specific case. IIUC, a parameterized path should never cause an unparameterized path to be thrown out,

Yeah, but the converse isn't true. I had considered the idea of keeping parameterized paths in a separate list, but you'd still have to examine the main list to look for unparameterized paths that might dominate them.

regards, tom lane
Re: Interruptible sleeps (was Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!)
Robert Haas robertmh...@gmail.com writes:
> Color me confused; I may need to backpedal rapidly here. I had thought that what Tom was complaining about was the fact that the signal handler was taking LWLocks, which I would have thought to be totally unsafe.

Well, it's unsafe if the signal could interrupt mainline code that is trying to take an LWLock or already holds the same LWLock --- or, if you consider deadlock risks against other processes, already holding other LWLocks might be problematic too. And trying to use facilities like elog or palloc in a handler is also pretty unsafe if the mainline might be using them too.

The reason the deadlock checker is okay is that there is only a *very* narrow range of mainline code that it could possibly be interrupting, basically nothing but the PGSemaphoreLock() call in ProcSleep. (There's a lot of other code inside the loop there, but notice that it's protected by tests that ensure that it won't run unless the deadlock checker already did.) One salient thing about ProcSleep is that you shouldn't call it while holding any LWLocks, and another is that if you're sleeping while holding regular heavyweight locks, the deadlock checker is exactly what will get you out of trouble if there's a deadlock.

Now the HS case likewise appears to be set up so that the signal can only directly interrupt ProcWaitForSignal, so I think the core issue is whether any deadlock situations are possible. Given that this gets called from a low-level place like LockBufferForCleanup, I don't feel too comfortable about that. I certainly haven't seen any analysis or documentation of what locks can safely be held at that point. The deadlock checker only tries to take the LockMgr LWLocks, so extrapolating from whether it is safe to whether touching ProcArrayLock is safe seems entirely unfounded.
It might be worth pointing out here that LockBufferForCleanup is already known to be a risk factor for undetected deadlocks, even without HS in the picture, because of the possibility of deadlocks involving a chain of both heavyweight locks and LWLocks. Whether HS makes it materially worse may be something that we need field experience to determine.

regards, tom lane