Re: [HACKERS] Group commit, revised
On 31.01.2012 01:35, Simon Riggs wrote: The plan here is to allow WAL flush and clog updates to occur concurrently. Which allows the clog contention and update time to be completely hidden behind the wait for the WAL flush. That is only possible if we have the WALwriter involved since we need two processes to be actively involved. ... The theory behind this is clear, but needs some explanation. There are 5 actions that need to occur at commit 1) insert WAL record 2) optionally flush WAL record 3) mark the clog AND set LSN from (1) if we skipped (2) 4) optionally wait for sync rep 5) remove the proc from the procarray > ... Notice that step (2) and step (3) are actually independent of each other. So an improved design for commit is to 2) request flush up to LSN, but don't wait 3) mark the clog and set LSN 4) wait for LSN once, either for walwriter or walsender to release us That seems like a pretty marginal gain. If you're bound by the speed of fsyncs, this will reduce the latency by the time it takes to mark the clog, which is tiny in comparison to all the other stuff that needs to happen, like flushing the WAL. And that's ignoring any additional overhead caused by the signaling between processes. If you're bound by CPU capacity, this doesn't help at all because it just moves the work around. Anyway, this is quite different from the original goal and patch for group commit, so can we please leave this for 9.3, and move on with the review of pending 9.2 patches? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Tue, Jan 31, 2012 at 4:59 AM, Kyotaro HORIGUCHI wrote: > This is a fixed version of dblink.c for the row processor. > >> i'll re-send the properly fixed patch for dblink.c later. > > - malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error) > > - storeHandler() now returns FALSE on malloc failure. Garbage > cleanup is done in dblink_fetch() or dblink_record_internal(). > The behavior that this dblink displays this error as 'unknown > error/could not execute query' on the user session is as it did > before. Another thing: if realloc() fails, the old pointer stays valid. So it needs to be assigned to a temp variable first, before overwriting the old pointer. And it seems malloc() is preferable to palloc(), to avoid exceptions inside the row processor. Although the latter could be made to work, it might be unnecessary complexity. (store the current pqresult into remoteConn?) -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
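The realloc() idiom being described, as a minimal standalone C sketch (illustrative code only, not the dblink patch itself):

#include <stdlib.h>

/* Grow *buf to newsize; returns 0 on success, -1 on failure with the
 * original allocation left intact and still owned by the caller. */
static int
grow_buffer(char **buf, size_t newsize)
{
    /* Assign realloc()'s result to a temporary first: on failure it
     * returns NULL but the old pointer stays valid, so assigning
     * directly to *buf would leak the old block and lose its contents. */
    char *tmp = realloc(*buf, newsize);

    if (tmp == NULL)
        return -1;          /* *buf is untouched; caller may free or retry */
    *buf = tmp;
    return 0;
}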
Re: [HACKERS] patch for parallel pg_dump
On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas wrote: > But the immediate problem is that pg_dump.c is heavily reliant on > global variables, which isn't going to fly if we want this code to use > threads on Windows (or anywhere else). It's also bad style. Technically, since most of pg_dump.c dumps the catalog and since this isn't done in parallel but only in the master process, most functions need not be changed for the parallel dump. Only those that are called from the worker threads need to be changed; this has been done in e.g. dumpBlobs(), the change that you quoted upthread. > But it > seems possible that we might someday want to dump from one database > and restore into another database at the same time, so maybe we ought > to play it safe and use different variables for those things. Actually I've tried that, but in the end concluded that it's best to have at most one database connection in an ArchiveHandle if you don't want to do a lot more refactoring. Besides the normal connection parameters like host, port, ... there are also std_strings, encoding, savedPassword, currUser/currSchema, lo_buf, remoteVersion ... for which it wouldn't be obvious where they belong. Speaking about refactoring, I'm happy to also throw in the idea of making the dump and restore more symmetrical than they are now. I kinda disliked RestoreOptions* being a member of the ArchiveHandle without having something similar for the dump. Ideally I'd say there should be a DumpOptions and a RestoreOptions field (with a "struct Connection" being part of them, containing all the different connection parameters). They could be a union if you wanted to allow only one connection, or not if you want more than one. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
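A rough sketch of the layout being proposed (hypothetical names and fields, not the actual pg_dump structures):

#include <stdbool.h>

typedef struct Connection
{
    char       *host;
    char       *port;
    char       *username;
    int         encoding;
    bool        std_strings;
    int         remoteVersion;
} Connection;

typedef struct DumpOptions
{
    Connection  conn;           /* the database being dumped */
    bool        schemaOnly;
    bool        dataOnly;
} DumpOptions;

typedef struct RestoreOptions
{
    Connection  conn;           /* the database being restored into */
    bool        createDB;
    bool        singleTxn;
} RestoreOptions;

/* The archive would then carry both sides symmetrically; either
 * pointer may be NULL when only one direction is in use. */
typedef struct ArchiveHandle
{
    DumpOptions    *dopt;
    RestoreOptions *ropt;
} ArchiveHandle;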
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
Josh Berkus writes: > On 1/30/12 5:41 PM, Tom Lane wrote: >> Well, the current marking is clearly incorrect. What to do about that >> is a bit less clear --- should we downgrade the marking, or change the >> function's behavior so that it really is immutable? > AFAIK, the only case which is NOT immutable is extract(epoch FROM > timestamp without time zone), no? That's the only one we currently know is not immutable. But before we make any decisions, I think it'd be a good idea to scrutinize all the other cases too, because obviously this area has gotten some careless hacking (*) done on it in the past. regards, tom lane (*) I have a nasty feeling that the carelessness was mine. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!
[ Note: please follow-up to pgsql-hackers not pgsql-general; I think this discussion needs to move there ] hubert depesz lubaczewski writes: > On Mon, Jan 30, 2012 at 11:30:51AM -0500, Tom Lane wrote: >> That is way too vague for my taste, as you have not shown the pg_dump >> options you're using, for example. > i tried to explain that the options don't matter, but here we go. full > example: > [ example showing pg_dump's odd behavior for extension config tables ] [ traces through that with gdb... ] As I suspected, the behavioral change from 9.1 to HEAD is not intentional. It is an artifact of commit 7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not quite, entirely broken. I won't enumerate its shortcomings here, because they're not really relevant, but it does seem appropriate to discuss exactly what we think *should* happen for tables created inside extensions. The original design intention for non-config tables was, per the manual: Ordinarily, if a table is part of an extension, neither the table's definition nor its content will be dumped by pg_dump. the assumption being that both the definition and the content would be re-loaded by executing the extension's SQL script. The purpose of pg_extension_config_dump() is to allow part or all of the data in the table to be treated as user data and thus dumped; this is assumed to be data not supplied by the extension script but by subsequent user insertions. I don't recall that we thought very hard about what should happen when pg_dump switches are used to produce a selective dump, but ISTM reasonable that if it's "user data" then it should be dumped only if data in a regular user table would be. So I agree it's pretty broken that "pg_dump -t foo" will dump data belonging to a config table not selected by the -t switch. I think this should be changed in both HEAD and 9.1 (note that HEAD will presumably return to the 9.1 behavior once that --exclude-table-data patch gets fixed). What's not apparent to me is whether there's an argument for doing more than that. It strikes me that the current design is not very friendly towards the idea of an extension that creates a table that's meant solely to hold user data --- you'd have to mark it as "config" which seems a bit unfortunate terminology for that case. Is it important to do something about that, and if so what? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables
> I don't believe that the problem has anything to do with the names of > other tables that might happen to exist in the database. It's a matter > of what RTE names/aliases are exposed for variable references in > different parts of the query. Names of other tables come into the picture when we schema-qualify the fake aliases in the generated query. See the examples in the first post. > regards, tom lane -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Enterprise Postgres Company
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
On 1/30/12 5:41 PM, Tom Lane wrote: > hubert depesz lubaczewski writes: >> On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote: >>> We can't have functions which are immutable or not depending on their >>> inputs. That way lies madness. > >> but this is exactly what's happening now. > > Well, the current marking is clearly incorrect. What to do about that > is a bit less clear --- should we downgrade the marking, or change the > function's behavior so that it really is immutable? AFAIK, the only case which is NOT immutable is extract(epoch FROM timestamp without time zone), no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
This is a fixed version of dblink.c for the row processor. > i'll re-send the properly fixed patch for dblink.c later. - malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error) - storeHandler() now returns FALSE on malloc failure. Garbage cleanup is done in dblink_fetch() or dblink_record_internal(). The behavior that this dblink displays this error as 'unknown error/could not execute query' on the user session is as it did before. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 36a8e3e..7a82ea1 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -63,11 +63,23 @@ typedef struct remoteConn bool newXactForCursor; /* Opened a transaction for a cursor */ } remoteConn; +typedef struct storeInfo +{ + Tuplestorestate *tuplestore; + int nattrs; + MemoryContext oldcontext; + AttInMetadata *attinmeta; + char** valbuf; + int *valbuflen; + bool error_occurred; + bool nummismatch; + ErrorData *edata; +} storeInfo; + /* * Internal declarations */ static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async); -static void materializeResult(FunctionCallInfo fcinfo, PGresult *res); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); @@ -90,6 +102,10 @@ static char *escape_param_str(const char *from); static void validate_pkattnums(Relation rel, int2vector *pkattnums_arg, int32 pknumatts_arg, int **pkattnums, int *pknumatts); +static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo); +static void finishStoreInfo(storeInfo *sinfo); +static int storeHandler(PGresult *res, void *param, PGrowValue *columns); + /* Global */ static remoteConn *pconn = NULL; @@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS) char *curname = NULL; int howmany = 0; bool fail = true; /* default to backward compatible */ + storeInfo storeinfo; DBLINK_INIT; @@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS) appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname); /* + * Result is stored into storeinfo.tuplestore instead of + * res->result returned by PQexec below + */ + initStoreInfo(&storeinfo, fcinfo); + PQregisterRowProcessor(conn, storeHandler, &storeinfo); + + /* * Try to execute the query. Note that since libpq uses malloc, the * PGresult will be long-lived even though we are still in a short-lived * memory context. */ res = PQexec(conn, buf.data); + finishStoreInfo(&storeinfo); + if (!res || (PQresultStatus(res) != PGRES_COMMAND_OK && PQresultStatus(res) != PGRES_TUPLES_OK)) { + /* finishStoreInfo saves the fields referred to below. */ + if (storeinfo.nummismatch) + { + /* This is only for backward compatibility */ + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("remote query result rowtype does not match " + "the specified FROM clause rowtype"))); + } + else if (storeinfo.edata) + ReThrowError(storeinfo.edata); + dblink_res_error(conname, res, "could not fetch from cursor", fail); return (Datum) 0; } @@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_CURSOR_NAME), errmsg("cursor \"%s\" does not exist", curname))); } + PQclear(res); - materializeResult(fcinfo, res); return (Datum) 0; } @@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async) remoteConn *rconn = NULL; bool fail = true; /* default to backward compatible */ bool freeconn = false; + storeInfo storeinfo; /* check to see if caller supports us returning a tuplestore */ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo)) @@ -715,164 +754,217 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async) rsinfo->setResult = NULL; rsinfo->setDesc = NULL; + + /* + * Result is stored into storeinfo.tuplestore instead of + * res->result returned by PQexec/PQgetResult below + */ + initStoreInfo(&storeinfo, fcinfo); + PQregisterRowProcessor(conn, storeHandler, &storeinfo); + /* synchronous query, or async result retrieval */ if (!is_async) res = PQexec(conn, sql); else - { res = PQgetResult(conn); - /* NULL means we're all done with the async results */ - if (!res) - return (Datum) 0; - } - /* if needed, close the connection to the database and cleanup */ - if (freeconn) - PQfinish(conn); + finishStoreInfo(&storeinfo); - if (!res || - (PQresultStatus(res) != PGRES_COMMAND_OK && - PQresultStatus(res) != PGRES_TUPLES_OK)) + /* NULL res from async get means we're all done with the results */ + if (res || !is_async) { - dblink_res_error(conname, res, "could not execute query", fail); - return (Datum) 0; + if (freeconn) + PQfinish(conn); + + if (!res || + (PQresultStatus(res) != PGRES_COMMAND_OK && + PQresultStatus(res) != PGRES_TUPLES_OK)) + { + /* finishStoreI
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
hubert depesz lubaczewski writes: > On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote: >> We can't have functions which are immutable or not depending on their >> inputs. That way lies madness. > but this is exactly what's happening now. Well, the current marking is clearly incorrect. What to do about that is a bit less clear --- should we downgrade the marking, or change the function's behavior so that it really is immutable? I haven't formed an opinion on that myself, other than to think that it's something that requires more than a moment's thought. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
Robert Haas writes: > But the immediate problem is that pg_dump.c is heavily reliant on > global variables, which isn't going to fly if we want this code to use > threads on Windows (or anywhere else). It's also bad style. So I > suggest that we refactor pg_dump.c to get rid of g_conn and g_fout. I've looked at that a bit in the past and decided that the notational overhead would be too much. OTOH, if we're going to be forced into it in order to support parallel pg_dump, we might as well do it first in a separate patch. > ... But it > seems possible that we might someday want to dump from one database > and restore into another database at the same time, so maybe we ought > to play it safe and use different variables for those things. So I'm > inclined to think we could just add a "PGconn *remote_connection" > argument to the Archive structure (to go with all of the similarly > named "remote" fields, all of which describe the DB to be dumped) and > then that would be available everywhere that the Archive itself is. I always thought that the "remote" terminology was singularly unhelpful. It implies there's a "local" connection floating around somewhere, which of course there is not, and it does nothing to remind you of whether the connection leads to a database being dumped or a database being restored into. If we are going to have two fields, could we name them something less opaque, perhaps "src_connection" and "dest_connection"? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks, 2nd attempt
On Tue, Jan 24, 2012 at 03:47:16PM -0300, Alvaro Herrera wrote: > The biggest item remaining is the point you raised about multixactid > wraparound. This is closely related to multixact truncation and the way > checkpoints are to be handled. If we think that MultiXactId wraparound > is possible, and we need to involve autovacuum to keep it at bay, then I To prove it possible, we need to prove there exists some sequence of operations consuming N xids and M > N multixactids. Have N transactions key-lock N-1 rows apiece, then have each of them key-lock one of the rows locked by each other transaction. This consumes N xids and N(N-1) multixactids. (For N = 3, that's 3 xids but 6 multixactids: each of the six singly locked rows gains a second locker and therefore needs its own multixactid.) I believe you could construct a workload with N! multixact usage, too. Existence proofs are one thing, real workloads another. My unsubstantiated guess is that multixactid use will not overtake xid use in bulk on a workload not specifically tailored to do so. So, I think it's enough to notice it, refuse to assign a new multixactid, and tell the user to clear the condition with a VACUUM FREEZE of all databases. Other opinions would indeed be useful. > think the only way to make that work is to add another column to > pg_class so that each table's oldest multixact is tracked, same as we do > with relfrozenxid for Xids. If we do that, I think we can do away with > most of the MultiXactTruncate junk I added -- things would become a lot > simpler. The cost would be bloating pg_class a bit more. Are we okay > with paying that cost? I asked this question some months ago and I > decided that I would not add the column, but I am starting to lean the > other way. I would like some opinions on this. That's not the only way; autovacuum could just accelerate normal freezing to advance the multixactid horizon indirectly. Your proposal could make it far more efficient, though. > You asked two questions about WAL-logging locks: one was about the level > of detail we log for each lock we grab; the other was about > heap_xlog_update logging the right info. AFAICS, the main thing that > makes detailed WAL logging necessary is hot standbys. That is, the > standby must have all the locking info so that concurrent transactions > are similarly locked as in the master ... or am I wrong in that? (ISTM > this means I need to fix heap_xlog_update so that it faithfully > registers the lock info we're storing, not just the current Xid). Standby servers do not need accurate tuple locks, because it's not possible to wait on a tuple lock during recovery. (By the way, the question about log detail was just from my own curiosity. We don't especially need an answer to move forward with this patch, unless you want one.) > * Columns that are part of the key > Noah thinks the set of columns should only consider those actually > referenced > by keys, not those that *could* be referenced. Well, do you disagree? To me it's low-hanging fruit, because it isolates the UPDATE-time overhead of this patch to FK-referenced tables rather than all tables having a PK or PK-like index (often just "all tables"). > Also, in a table without columns, are all columns part of the key, or is the > key the empty set? I changed HeapSatisfiesHOTUpdate but that seems > arbitrary. It does seem arbitrary. What led you to switch in a later version? Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On Mon, Jan 30, 2012 at 8:04 PM, Heikki Linnakangas wrote: > So, what's the approach you're working on? I've had a few days' leave at the end of last week, so no time to fully discuss the next steps with the patch. That's why you were requested not to commit anything. You've suggested there was no reason to have the WALwriter be involved, which isn't the case, and made other comments about latches that weren't correct either. The plan here is to allow WAL flush and clog updates to occur concurrently. Which allows the clog contention and update time to be completely hidden behind the wait for the WAL flush. That is only possible if we have the WALwriter involved since we need two processes to be actively involved. It's a relatively minor change and uses code that is already committed and working, not some just-invented low-level stuff that might not work right. You might then ask, why the delay? Simply because my absence has prevented moving forward. We'll have a patch tomorrow. The theory behind this is clear, but needs some explanation. There are 5 actions that need to occur at commit: 1) insert WAL record 2) optionally flush WAL record 3) mark the clog AND set LSN from (1) if we skipped (2) 4) optionally wait for sync rep 5) remove the proc from the procarray Dependencies between those actions are these: Step (3) must always happen before (5), otherwise we get race conditions in visibility checking. Step (4) must always happen before (5), otherwise we also get race conditions in disaster cases. Step (1) must always happen before (2), if it happens. Step (1) must always happen before (3), if we skipped (2). Notice that step (2) and step (3) are actually independent of each other. So an improved design for commit is to 2) request flush up to LSN, but don't wait 3) mark the clog and set LSN 4) wait for LSN once, either for walwriter or walsender to release us This is free of race conditions as long as step (3) marks each clog page with a valid LSN, just as we would do for asynchronous commit. Marking the clog with an LSN ensures that we issue XLogFlush(LSN) on the clog page before it is written, so we always get WAL flushed to the desired LSN before the clog mark appears on disk. Does this cause any other behaviour change? No, because the LSN marked on the clog is always flushed by the time we hit step (5), so there is no delay in any hint bit setting, or any other effect. So step (2) requests the flush, which is then performed by the WALwriter. The backend then performs (3) while the flush takes place and then waits at step (4) to be woken. We only wait once, in step (4), rather than waiting for the flush at step (2) and then waiting again at step (4). So we use the existing code path for TransactionIdAsyncCommitTree(), yet we wait at step (4) using the SyncRep code. Step (5) happens last, as always. There are two benefits to this approach: * The clog update happens "for free" since it is hidden behind a longer-running task * The implementation uses already tested and robust code for SyncRep and AsyncCommit -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
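To make the ordering concrete, here is a toy pthread simulation of that flow (plain portable C, emphatically not PostgreSQL code): each "backend" publishes its LSN, does the clog work while the "walwriter" is busy with the flush, and then waits exactly once for the flushed LSN to pass its own. Build with cc -pthread.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flushed_cv = PTHREAD_COND_INITIALIZER;
static long insertedLSN = 0;    /* highest WAL record written */
static long flushedLSN = 0;     /* highest LSN known durable */

/* stand-in for the walwriter: periodically "fsyncs" everything inserted */
static void *walwriter(void *arg)
{
    (void) arg;
    for (;;)
    {
        usleep(1000);                           /* pretend fsync latency */
        pthread_mutex_lock(&lk);
        flushedLSN = insertedLSN;
        pthread_cond_broadcast(&flushed_cv);    /* wake all waiters at once */
        pthread_mutex_unlock(&lk);
    }
    return NULL;
}

static void *backend(void *arg)
{
    long myLSN;
    (void) arg;

    pthread_mutex_lock(&lk);
    myLSN = ++insertedLSN;          /* (1) insert WAL record; (2) the flush
                                     * is requested but NOT waited for */
    pthread_mutex_unlock(&lk);

    usleep(100);                    /* (3) mark the clog and set its LSN,
                                     * hidden behind the fsync in progress */

    pthread_mutex_lock(&lk);        /* (4) wait once for the LSN */
    while (flushedLSN < myLSN)
        pthread_cond_wait(&flushed_cv, &lk);
    pthread_mutex_unlock(&lk);

    printf("commit %ld durable\n", myLSN);  /* (5) leave the procarray */
    return NULL;
}

int main(void)
{
    pthread_t w, b[4];
    int i;

    pthread_create(&w, NULL, walwriter, NULL);
    for (i = 0; i < 4; i++)
        pthread_create(&b[i], NULL, backend, NULL);
    for (i = 0; i < 4; i++)
        pthread_join(b[i], NULL);
    return 0;       /* exiting main also ends the walwriter thread */
}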
Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.
On 30.01.2012 23:06, Heikki Linnakangas wrote: On 30.01.2012 22:50, Heikki Linnakangas wrote: On 30.01.2012 20:27, Robert Haas wrote: Either this patch, or something else committed this morning, is causing "make check" to hang or run extremely slowly for me. I think it's this patch, because I attached to a backend and stopped it a few times, and all the backtraces look like this: Yeah, sure looks like it's the group commit commit. It works for me, and staring at the code, I have no idea what could be causing it. The buildfarm seems happy too, so this is pretty mysterious. And just after sending that, I succeeded to reproduce this. I had to lower wal_buffers to a small value to make it happen. I'm debugging this now.. It was a bug in the LWLockRelease code, after all. Fixed. Unfortunately this added a couple more instructions to that critical codepath, but I think it should still go without notice. Let me know if this doesn't fix the hang on your laptop. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch pg_is_in_backup()
On 30-01-2012 15:33, Gilles Darold wrote: > Yesterday I was facing a little issue with backup software (NetBackup) > that does not report errors when a post-backup script is run. The problem > is that this script executes pg_stop_backup(), and if there's any failure > PostgreSQL keeps running in on-line backup mode. So the backup is not > completed, and the next one fails too because the call to pg_start_backup() > will fail. > I use something similar to your pl/PgSQL version and was about to propose something along these lines to core. +1 to include it. Please add your patch to the next CF. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.
On 30.01.2012 22:50, Heikki Linnakangas wrote: On 30.01.2012 20:27, Robert Haas wrote: Either this patch, or something else committed this morning, is causing "make check" to hang or run extremely slowly for me. I think it's this patch, because I attached to a backend and stopped it a few times, and all the backtraces look like this: Yeah, sure looks like it's the group commit commit. It works for me, and staring at the code, I have no idea what could be causing it. The buildfarm seems happy too, so this is pretty mysterious. And just after sending that, I succeeded to reproduce this. I had to lower wal_buffers to a small value to make it happen. I'm debugging this now.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.
On 30.01.2012 20:27, Robert Haas wrote: Either this patch, or something else committed this morning, is causing "make check" to hang or run extremely slowly for me. I think it's this patch, because I attached to a backend and stopped it a few times, and all the backtraces look like this: Yeah, sure looks like it's the group commit commit. It works for me, and staring at the code, I have no idea what could be causing it. The buildfarm seems happy too, so this is pretty mysterious. I did find one bug, see attached, but AFAICS it should only cause unnecessary wakeups in some corner cases, which is harmless. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com commit 73b479198d9e7e1cbdb2da705a0cd9226b0d8265 Author: Heikki Linnakangas Date: Mon Jan 30 20:56:35 2012 +0200 Fix thinko in the group commit patch. When releasing a lwlock, if the first waiter in the queue waited for a shared lock, and all the rest were of the new "wait-until-free" kind, the releaseOK flag would not be cleared. That's harmless AFAICS, but it was not intended. diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index bee35b8..eb9ff95 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -768,9 +768,9 @@ LWLockRelease(LWLockId lockid) while (proc->lwWaitLink != NULL && proc->lwWaitLink->lwWaitMode != LW_EXCLUSIVE) { - proc = proc->lwWaitLink; if (proc->lwWaitMode != LW_WAIT_UNTIL_FREE) releaseOK = false; + proc = proc->lwWaitLink; } } /* proc is now the last PGPROC to be released */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLOG contention, part 2
On Fri, Jan 27, 2012 at 8:21 PM, Jeff Janes wrote: > On Fri, Jan 27, 2012 at 3:16 PM, Merlin Moncure wrote: >> On Fri, Jan 27, 2012 at 4:05 PM, Jeff Janes wrote: >>> Also, I think the general approach is wrong. The only reason to have >>> these pages in shared memory is that we can control access to them to >>> prevent write/write and read/write corruption. Since these pages are >>> never written, they don't need to be in shared memory. Just read >>> each page into backend-local memory as it is needed, either >>> palloc/pfree each time or using a single reserved block for the >>> lifetime of the session. Let the kernel worry about caching them so >>> that the above-mentioned reads are cheap. >> >> right -- exactly. but why stop at one page? > > If you have more than one, you need code to decide which one to evict > (just free) every time you need a new one. And every process needs to > be running this code, while the kernel is still going to need to make its > own decisions for the entire system. It seems simpler to just let the > kernel do the job for everyone. Are you worried that a read syscall > is going to be slow even when the data is presumably cached in the OS? I think that would be a very legitimate worry. You're talking about copying 8kB of data because you need two bits. Even if the user/kernel mode context switch is lightning-fast, that's a lot of extra data copying. In a previous commit, 33aaa139e6302e81b4fbf2570be20188bb974c4f, we increased the number of CLOG buffers from 8 to 32 (except in very low-memory configurations). The main reason that shows a win on Nate Boley's 32-core test machine appears to be that it avoids the scenario where there are, say, 12 people simultaneously wanting to read 12 different CLOG buffers, and so 4 of them have to wait for a buffer to become available before they can even think about starting a read. The really bad latency spikes were happening not because the I/O took a long time, but because it couldn't be started immediately. However, these spikes are now gone as a result of the above commit. Probably you can get them back with enough cores, but you'll probably hit a lot of other, more serious problems first. I assume that if there's any purpose to further optimization here, it's either because the overall miss rate of the cache is too large, or because the remaining locking costs are too high. Unfortunately I haven't yet had time to look at this patch and understand what it does, or machine cycles available to benchmark it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
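For concreteness, the "two bits" arithmetic looks like this (a standalone sketch mirroring the constants in clog.c, not the actual SLRU code):

#include <stdio.h>
#include <stdint.h>

#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2               /* commit status is 2 bits */
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

int main(void)
{
    uint32_t xid = 123456789;
    uint32_t page  = xid / CLOG_XACTS_PER_PAGE;
    uint32_t byte  = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
    uint32_t shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    printf("xid %u -> page %u, byte %u, bit shift %u (%d xacts per 8kB page)\n",
           xid, page, byte, shift, CLOG_XACTS_PER_PAGE);
    /* status = (pagebuf[byte] >> shift) & 0x03;  -- the two bits needed */
    return 0;
}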
Re: [HACKERS] Group commit, revised
On 30.01.2012 21:55, Simon Riggs wrote: On Mon, Jan 30, 2012 at 3:32 PM, Heikki Linnakangas wrote: On 30.01.2012 17:18, Simon Riggs wrote: Peter and I have been working on a new version that seems likely to improve performance over your suggestions. We should be showing something soon. Please post those ideas, and let's discuss them. If it's something simple, maybe we can still sneak them into this release. Otherwise, let's focus on the existing patches that are pending review or commit. If you really did want to discuss it, it would have taken you 5 minutes to check whether there was consensus on the patch before committing it. Your actions betray the opposite of a willingness to discuss anything. Yes, I'd like to discuss ideas, not just ram home a half-discussed and half-finished patch that happens to do things the way you personally prefer, overriding all inputs. Especially when you know we're working on another version. Sorry. So, what's the approach you're working on? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On Mon, Jan 30, 2012 at 3:32 PM, Heikki Linnakangas wrote: > On 30.01.2012 17:18, Simon Riggs wrote: >> I asked clearly and specifically for you to hold back committing >> anything. Not sure why you would ignore that and commit without >> actually asking myself or Peter. On a point of principle alone, I >> think you should revert. Working together is difficult if >> communication channels are openly ignored and disregarded. > > > You must be referring to this: > > http://archives.postgresql.org/pgsql-hackers/2012-01/msg01406.php > > What I committed in the end was quite different from the version that was in > reply to, too. If you have a specific objection to the patch as committed, > please let me know. I said "There is much yet to discuss so please don't think about committing anything yet." There's not really any way you could misinterpret them. >> Peter and I have been working on a new version that seems likely to >> improve performance over your suggestions. We should be showing >> something soon. > > > Please post those ideas, and let's discuss them. If it's something simple, > maybe we can still sneak them into this release. Otherwise, let's focus on > the existing patches that are pending review or commit. If you really did want to discuss it, it would have taken you 5 minutes to check whether there was consensus on the patch before committing it. Your actions betray the opposite of a willingness to discuss anything. Yes, I'd like to discuss ideas, not just ram home a half-discussed and half-finished patch that happens to do things the way you personally prefer, overriding all inputs. Especially when you know we're working on another version. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
On Mon, Jan 30, 2012 at 10:35:21AM -0800, Josh Berkus wrote: > > > preferably I would see extract( epoch from timestamp ) to be really > > immutable, i.e. (in my opinion) it should treat incoming data as UTC > > - for epoch calculation. > > Alternatively - perhaps epoch extraction should be moved to specialized > > function, which would have swapped mutability: > > We can't have functions which are immutable or not depending on their > inputs. That way lies madness. but this is exactly what's happening now. extract( ... from timestamp) is marked as immutable, while in some cases (namely when you want epoch) it should be stable, because the function's return value changes. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
> preferably I would see extract( epoch from timestamp ) to be really > immutable, i.e. (in my opinion) it should treat incoming data as UTC > - for epoch calculation. > Alternatively - perhaps epoch extraction should be moved to specialized > function, which would have swapped mutability: We can't have functions which are immutable or not depending on their inputs. That way lies madness. > get_epoch(timestamptz) would be immutable > while > get_epoch(timestamp) would be stable Well, to_epoch, in order to be consistent with other conversion functions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch pg_is_in_backup()
Hello, Yesterday I was facing a little issue with backup software (NetBackup) that does not report errors when a post-backup script is run. The problem is that this script executes pg_stop_backup(), and if there's any failure PostgreSQL keeps running in on-line backup mode. So the backup is not completed, and the next one fails too because the call to pg_start_backup() will fail. After some time searching for a Pg system administration function like pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one. The minor patch attached here adds this administrative function that can be used with the other backup control functions. This is a very small patch outside of documentation, because the function is only a wrapper around the internal BackupInProgress() function, just like pg_is_in_recovery() is calling RecoveryInProgress(). I hope it can be included, as it could save time for sysadmins who want to monitor the on-line backup process and do not have SQL knowledge. That said, it's always possible to check whether the backup_label file exists under PGDATA, but this needs to be checked on the host with postgres privileges. The other way is to create a plpgsql function with SECURITY DEFINER that has the same effect as the patch above, except that it needs some SQL knowledge. I give it here for web searches; thanks to Guillaume and Marc. CREATE OR REPLACE FUNCTION pg_is_in_backup ( ) RETURNS BOOLEAN AS $$ DECLARE is_exists BOOLEAN; BEGIN -- Set a secure search_path: trusted schemas, then 'pg_temp'. PERFORM pg_catalog.set_config('search_path', 'pg_temp', true); SELECT ((pg_stat_file('backup_label')).modification IS NOT NULL) INTO is_exists; RETURN is_exists; EXCEPTION WHEN undefined_file THEN RETURN false; END $$ LANGUAGE plpgsql SECURITY DEFINER; Regards, -- Gilles Darold http://dalibo.com - http://dalibo.org diff -ru postgresql/doc/src/sgml/func.sgml postgresql-is_in_backup-patch/doc/src/sgml/func.sgml --- postgresql/doc/src/sgml/func.sgml 2012-01-26 23:01:31.956613398 +0100 +++ postgresql-is_in_backup-patch/doc/src/sgml/func.sgml 2012-01-26 23:14:29.468613019 +0100 @@ -14371,6 +14371,9 @@ pg_current_xlog_location +pg_is_in_backup + + pg_start_backup @@ -14424,6 +14427,14 @@ +pg_is_in_backup() + + bool + True if an on-line backup is still in progress. + + + + pg_start_backup(label text , fast boolean ) text diff -ru postgresql/src/backend/access/transam/xlogfuncs.c postgresql-is_in_backup-patch/src/backend/access/transam/xlogfuncs.c --- postgresql/src/backend/access/transam/xlogfuncs.c 2012-01-26 23:01:32.032613398 +0100 +++ postgresql-is_in_backup-patch/src/backend/access/transam/xlogfuncs.c 2012-01-26 23:16:04.100612972 +0100 @@ -465,3 +465,12 @@ { PG_RETURN_BOOL(RecoveryInProgress()); } + +/* + * Returns bool with current on-line backup mode, a global state. + */ +Datum +pg_is_in_backup(PG_FUNCTION_ARGS) +{ + PG_RETURN_BOOL(BackupInProgress()); +} diff -ru postgresql/src/include/access/xlog_internal.h postgresql-is_in_backup-patch/src/include/access/xlog_internal.h --- postgresql/src/include/access/xlog_internal.h 2012-01-26 23:01:32.332613398 +0100 +++ postgresql-is_in_backup-patch/src/include/access/xlog_internal.h 2012-01-26 23:22:51.492612774 +0100 @@ -281,5 +281,6 @@ extern Datum pg_xlog_replay_pause(PG_FUNCTION_ARGS); extern Datum pg_xlog_replay_resume(PG_FUNCTION_ARGS); extern Datum pg_is_xlog_replay_paused(PG_FUNCTION_ARGS); +extern Datum pg_is_in_backup(PG_FUNCTION_ARGS); #endif /* XLOG_INTERNAL_H */ diff -ru postgresql/src/include/catalog/pg_proc.h postgresql-is_in_backup-patch/src/include/catalog/pg_proc.h --- postgresql/src/include/catalog/pg_proc.h 2012-01-26 23:01:32.340613398 +0100 +++ postgresql-is_in_backup-patch/src/include/catalog/pg_proc.h 2012-01-27 09:26:09.918736657 +0100 @@ -2899,6 +2899,8 @@ DESCR("prepare for taking an online backup"); DATA(insert OID = 2173 ( pg_stop_backup PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 25 "" _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ )); DESCR("finish taking an online backup"); +DATA(insert OID = 3882 ( pg_is_in_backup PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 16 "" _null_ _null_ _null_ _null_ pg_is_in_backup _null_ _null_ _null_ )); +DESCR("true if server is in on-line backup"); DATA(insert OID = 2848 ( pg_switch_xlog PGNSP PGUID 12 1 0 0 0 f f f t f v 0 0 25 "" _null_ _null_ _null_ _null_ pg_switch_xlog _null_ _null_ _null_ )); DESCR("switch to new xlog file"); DATA(insert OID = 3098 ( pg_create_restore_point PGNSP PGUID 12 1 0 0 0 f f f t f v 1 0 25 "25" _null_ _null_ _null_ _null_ pg_create_restore_point _null_ _null_ _null_ )); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Make group commit more effective.
On Mon, Jan 30, 2012 at 9:55 AM, Heikki Linnakangas wrote: > Make group commit more effective. > > When a backend needs to flush the WAL, and someone else is already flushing > the WAL, wait until it releases the WALInsertLock and check if we still need > to do the flush or if the other backend already did the work for us, before > acquiring WALInsertLock. This helps group commit, because when the WAL flush > finishes, all the backends that were waiting for it can be woken up in one > go, and they can all concurrently observe that they're done, rather than > waking them up one by one in a cascading fashion. > > This is based on a new LWLock function, LWLockWaitUntilFree(), which has > peculiar semantics. If the lock is immediately free, it grabs the lock and > returns true. If it's not free, it waits until it is released, but then > returns false without grabbing the lock. This is used in XLogFlush(), so > that when the lock is acquired, the backend flushes the WAL, but if it's > not, the backend first checks the current flush location before retrying. > > Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although > this patch as committed ended up being very different from that. Either this patch, or something else committed this morning, is causing "make check" to hang or run extremely slowly for me. I think it's this patch, because I attached to a backend and stopped it a few times, and all the backtraces look like this: #0 0x7fff8a545b22 in semop () #1 0x0001001ff8df in PGSemaphoreLock (sema=0x103d7de70, interruptOK=0 '\0') at pg_sema.c:418 #2 0x00010024d7dd in LWLockWaitUntilFree (lockid=, mode=) at lwlock.c:666 #3 0x00010005d3b3 in XLogFlush (record=) at xlog.c:2148 #4 0x0001000506bb in CommitTransaction () at xact.c:1113 #5 0x000100050b35 in CommitTransactionCommand () at xact.c:2613 #6 0x00010025a403 in finish_xact_command () at postgres.c:2388 #7 0x00010025d525 in exec_simple_query (query_string=0x101055638 "CREATE INDEX wowidx ON test_tsvector USING gin (a);") at postgres.c:1052 #8 0x00010025dfc1 in PostgresMain (argc=2, argv=, username=) at postgres.c:3881 #9 0x00010020c258 in ServerLoop () at postmaster.c:3587 #10 0x00010020d167 in PostmasterMain (argc=6, argv=0x100d08f40) at postmaster.c:1110 #11 0x00010019e745 in main (argc=6, argv=0x100d08f40) at main.c:199 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
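A pthread sketch of those wait-until-free semantics (an illustration of the primitive only, not the lwlock.c implementation):

#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_mutex_t mtx;
    pthread_cond_t  released;
    bool            held;
} WaitFreeLock;

#define WAITFREELOCK_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

static bool
lock_wait_until_free(WaitFreeLock *l)
{
    pthread_mutex_lock(&l->mtx);
    if (!l->held)
    {
        l->held = true;             /* free right now: take it, return true */
        pthread_mutex_unlock(&l->mtx);
        return true;
    }
    while (l->held)                 /* busy: sleep until the release... */
        pthread_cond_wait(&l->released, &l->mtx);
    pthread_mutex_unlock(&l->mtx);
    return false;                   /* ...but return WITHOUT taking it */
}

static void
lock_release(WaitFreeLock *l)
{
    pthread_mutex_lock(&l->mtx);
    l->held = false;
    pthread_cond_broadcast(&l->released);   /* wake every waiter in one go */
    pthread_mutex_unlock(&l->mtx);
}

/* A caller in the style the commit message describes would loop:
 *
 *     while (flush_lsn_still_behind(my_lsn))
 *     {
 *         if (lock_wait_until_free(&wal_write_lock))
 *         {
 *             do_the_flush();              // we got the lock
 *             lock_release(&wal_write_lock);
 *             break;
 *         }
 *         // returned false: someone else flushed; recheck my_lsn
 *     }
 */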
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On Sun, Jan 29, 2012 at 7:27 AM, Kohei KaiGai wrote: > 2012/1/28 Kohei KaiGai : >> 2012/1/26 Robert Haas : >>> On Thu, Jan 26, 2012 at 2:07 PM, Kohei KaiGai wrote: 2012/1/26 Robert Haas : > I'm wondering if a function would be a better fit than a GUC. I don't > think you can really restrict the ability to revert a GUC change - > i.e. if someone does a SET and then a RESET, you pretty much have to > allow that. I think. But if you expose a function then it can work > however you like. > One benefit of using a GUC is that we can utilize the existing mechanism to revert a value set within a transaction block on error. If we implement the same functionality with functions, would XactCallback allow sepgsql to get control at the appropriate timing? >>> >>> Not sure, but I thought the use case was to set this at connection >>> startup time and then hand the connection off to a client. What keeps >>> the client from just issuing RESET? >>> >> In the use case of Joshua, the original security label does not have privileges >> to reference any tables, and it cannot transition to any domains without >> the credential within IC-cards. Thus, RESET is harmless. >> >> However, I also assume one other possible use-case; the originator >> has privileges to transition to 10 different domains, but is disallowed >> to revert it. In this case, RESET without permission checks is harmful. >> (This scenario will be suitable with the multi-category model.) >> >> Apart from this issue, I found a problem in the implementation using GUC >> options. It makes the backend crash if it tries to reset sepgsql.client_label >> without permission within an error handler, because of double errors. >> > I found an idea to avoid this scenario. > > The problematic case is resetting the GUC variable because of transaction > rollback and permission violation; however, we don't intend to apply > permission checks there, since it is not committed yet. > Thus, I considered checking the state of the transaction using > IsTransactionState() in the assign_hook of the GUC variable. > > Unlike a function-based implementation, this can utilize the existing > infrastructure to make the client_label transaction-aware. > > It shall be implemented as: > > void > sepgsql_assign_client_label(const char *newval, void *extra) > { > if (!IsTransactionState()) > return; > > /* check whether valid security context */ > > /* check permission to switch the current context */ > } > > How about such an implementation? Well, I think the idea that GUC changes can be reverted at any time is fairly deeply ingrained in the GUC machinery. So I think that's a hack: it might happen to work today (or at least in the cases we can think of to test), but I wouldn't be a bit surprised if there are other cases where it fails, either now or in the future. I don't see any good reason to assume that unrevertable changes are OK even inside a transaction context. For example, if someone applies a context to a function with ALTER FUNCTION, they're going to expect that the altered context remains in effect while the function is running and gets reverted on exit. If you throw an error when the system tries to revert the change at function-exit time, it might not crash the server, but it's certainly going to violate the principle of least astonishment. Of course, we already have a trusted procedure mechanism which is intended to support temporary changes to the effective security label, so you might say, hey, people shouldn't do that. But they might. And I wouldn't like to bet that's the only case that could be problematic either. What about a setting in postgresql.conf? You could end up being asked to set the security label to some other value before you've initialized it. What about SET LOCAL? It's not OK to fail to revert that at transaction exit. Hence my suggestion of a function: if you use functions, you can implement whatever semantics you want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
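For illustration, the function-based shape Robert suggests might look roughly like this (hypothetical names throughout, and the checks are elided; a sketch of the approach, not the eventual sepgsql code):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(sepgsql_set_client_label);

Datum
sepgsql_set_client_label(PG_FUNCTION_ARGS)
{
    const char *new_label = text_to_cstring(PG_GETARG_TEXT_PP(0));

    /*
     * 1. Verify new_label is a well-formed security context.
     * 2. Check permission to transition from the current context.
     * 3. Decide explicitly whether the old label may ever be restored:
     *    unlike a GUC, nothing here is implicitly undone by RESET,
     *    SET LOCAL semantics, or transaction abort.
     */
    (void) new_label;           /* checks elided in this sketch */

    PG_RETURN_BOOL(true);
}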
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Mon, Jan 30, 2012 at 06:06:57PM +0900, Kyotaro HORIGUCHI wrote: > The performance gain is more than expected. The measurement script now > runs the query via dblink ten times for stability, so the figures > are about ten times larger than the previous ones. > > sec / % to Original > Original : 31.5 100.0% > RowProcessor patch : 31.3 99.4% > dblink patch : 24.6 78.1% > > The RowProcessor patch alone gives no loss or a very little gain, and > the full patch gives us a 22% gain on the benchmark(*1). Excellent! > - The callers of RowProcessor() no longer set null_field to > PGrowValue.value. Plus, the PGrowValue[] which RowProcessor() > receives has nfields + 1 elements to be able to make a rough > estimate by cols->value[nfields].value - cols->value[0].value - > something. The something here is 4 * nfields for protocol3 and > 4 * (non-null fields) for protocol2. I fear that this applies > only for textual transfer usage... Exact estimate is not important here. And the (nfields + 1) element feels a bit too much like magic, considering that most users probably do not need it. Without it, the logic would be: total = last.value - first.value + ((last.len > 0) ? last.len : 0) which isn't too complex. So I think we can remove it. = Problems = * Remove the dubious memcpy() in pqAddRow() * I think the dynamic arrays in getAnotherTuple() are not portable enough, please do proper allocation for the array. I guess in PQsetResultAttrs()? = Minor notes = These can be argued either way; if you don't like some suggestion, you can drop it. * Move PQregisterRowProcessor() into fe-exec.c, then we can make pqAddRow static. * Should PQclear() free the RowProcessor error msg? It seems it should not get outside of getAnotherTuple(), but that's not certain. Perhaps it would be clearer to free it here too. * Remove the part of the comment in getAnotherTuple(): * Buffer content may be shifted on reloading additional * data. So we must set all pointers on every scan. It's confusing why it needs to clarify that, as there is nobody expecting otherwise. * PGrowValue documentation should mention that the ->value pointer is always valid. * dblink: Perhaps some of those mallocs() could be replaced with pallocs() or even StringInfo, which already does the realloc dance? I'm not familiar with dblink and the various struct lifetimes there, so I don't know whether that actually makes sense or not. It seems this patch is getting ReadyForCommitter soon... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby off of hot standby?
On 1/29/12 8:36 PM, Igor Schtein wrote: > Hi, > > Is it possible to use a standby instance as a master/primary for another > standby in Postgres 9.0? In other words, does PG 9.0 supports cascading > standby configuration? No, that's a 9.2 feature in development. If you can build PostgreSQL from source and apply patches, we could use your help testing it! -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Why extract( ... from timestamp ) is not immutable?
On Wed, Jan 25, 2012 at 11:30:49AM -0500, Tom Lane wrote: > hubert depesz lubaczewski writes: > > anyway - the point is that in \df date_part(, timestamp) says it's > > immutable, while it is not. > > Hmm, you're right. I thought we'd fixed that way back when, but > obviously not. Or maybe the current behavior of the epoch case > postdates that. is there a chance something will happen with/about it? preferably I would see extract( epoch from timestamp ) to be really immutable, i.e. (in my opinion) it should treat incoming data as UTC - for epoch calculation. Alternatively - perhaps epoch extraction should be moved to specialized function, which would have swapped mutability: get_epoch(timestamptz) would be immutable while get_epoch(timestamp) would be stable Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
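The mutability question can be seen in miniature with plain C time functions (timegm() is a glibc/BSD extension; this is only an analogy, not PostgreSQL code):

#define _GNU_SOURCE             /* for timegm() on glibc */
#include <stdio.h>
#include <time.h>

int main(void)
{
    /* 2012-01-30 12:00:00 with no zone attached: the situation of a
     * timestamp WITHOUT time zone */
    struct tm tm_local = {0}, tm_utc;

    tm_local.tm_year = 2012 - 1900;
    tm_local.tm_mon  = 0;
    tm_local.tm_mday = 30;
    tm_local.tm_hour = 12;
    tm_local.tm_isdst = -1;
    tm_utc = tm_local;
    tm_utc.tm_isdst = 0;

    /* mktime() binds the fields to the current timezone setting, so the
     * answer changes with the session timezone: "stable" behavior */
    printf("epoch via local zone: %ld\n", (long) mktime(&tm_local));

    /* timegm() pins the interpretation to UTC, so equal input always
     * yields an equal result: the "immutable" variant proposed above */
    printf("epoch as UTC:         %ld\n", (long) timegm(&tm_utc));
    return 0;
}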
Re: [HACKERS] Simulating Clog Contention
On Mon, Jan 30, 2012 at 10:48 AM, Jeff Janes wrote: > On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas wrote: >> On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes wrote: I think that even in normal (non-initialization) usage, this message should be suppressed when the provided scale factor is equal to the pgbench_branches table count. >>> >>> The attached patch does just that. There is probably no reason to >>> warn people that we are doing what they told us to, but not for the >>> reason they think. >> >> In my opinion, a more sensible approach than anything we're doing >> right now would be to outright *reject* options that will only be >> ignored. If -s isn't supported except with -i, then trying to specify >> -s without -i should just error out at the options-parsing stage, >> before we even try to connect to the database. It's not very helpful >> to accept options and then ignore them, and we have many instances of >> that right now: initialization-only switches are accepted and ignored >> when not initializing, and run-only switches are accepted and ignored >> when initializing. > > I like the ability to say, effectively, "I think I previously did > the initialization with -s 40; if I actually didn't then scream at me, > and if I did then go ahead and do the pgbench I just asked for". > But, since it does unconditionally report the scale actually used and > I just have to have the discipline to go look at the result, I can see > where this is perhaps overkill. In my own (non-PG-related) code, > when I have tasks that have to be run in multiple phases that can get > out of sync if I am not careful, I like to be able to specify the > flags even in the "unused" invocation, so that the code can verify I > am being consistent. Code is better at that than I am. I guess my point is - if we're gonna have it do that, then shouldn't we actually *error out* if the scales don't match, not just print a warning and then charge ahead? > I'm not sure I know what all would be incompatible with what. I could > start drawing that matrix up once the API stabilizes, but I think you > are still planning on whacking this -I option around a bit. It's mostly that everything in the "initialization options" section of pgbench --help is incompatible with everything in the "benchmarking options" section, and the other way around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
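A minimal sketch of erroring out at the options-parsing stage (generic getopt code with a made-up option set, not pgbench's actual option handling):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int c, is_init = 0, scale = 0;

    while ((c = getopt(argc, argv, "is:")) != -1)
    {
        switch (c)
        {
            case 'i': is_init = 1; break;
            case 's': scale = atoi(optarg); break;
            default:  exit(1);
        }
    }

    /* Reject, rather than silently ignore, a switch that only has an
     * effect in the other mode. */
    if (scale > 0 && !is_init)
    {
        fprintf(stderr, "error: -s is only valid with -i\n");
        exit(1);
    }
    printf("%s, scale %d\n", is_init ? "initializing" : "benchmarking",
           scale > 0 ? scale : 1);
    return 0;
}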
Re: [HACKERS] Simulating Clog Contention
On Mon, Jan 30, 2012 at 10:53 AM, Jeff Janes wrote: > On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen wrote: >> At 2012-01-12 12:31:20 +, si...@2ndquadrant.com wrote: >>> >>> The following patch adds a pgbench option -I to load data using >>> INSERTs >> >> This is just to confirm that the patch applies and builds and works >> fine (though of course it does take a long time… pity there doesn't >> seem to be any easy way to add progress indication like -i has). > > I was thinking the opposite. That -i should only print progress > indication when -d is given. Or at least knock an order of magnitude > or two off of how often it does so. I'd be all in favor of having -i emit progress reports 10x less often; even on a laptop, the current reports are very chatty. But I think 100x less often might be taking it too far. Either way, if we're going to have an option for inserts, they should produce the same progress reports that COPY does - though possibly more often, since I'm guessing it's likely to be way slower. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
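The frequency change under discussion is a one-line matter in the loading loop; schematically (illustrative values and structure, not pgbench's code):

#include <stdio.h>

int main(void)
{
    const int naccounts = 100000, scale = 10;   /* illustrative values */
    int j;

    for (j = 0; j < naccounts * scale; j++)
    {
        /* ... generate and send one row here ... */

        /* reporting every 100000 rows instead of every 10000 makes the
         * output 10x less chatty */
        if ((j + 1) % 100000 == 0)
            fprintf(stderr, "%d of %d tuples done.\n",
                    j + 1, naccounts * scale);
    }
    return 0;
}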
Re: [HACKERS] patch for parallel pg_dump
On Sun, Jan 29, 2012 at 12:00 PM, Tom Lane wrote: > Joachim Wieland writes: >> I know that you took back some of your comments, but I'm with you >> here. Archive is allocated as an ArchiveHandle and then casted back to >> Archive*, so you always know that an Archive is an ArchiveHandle. I'm >> all for getting rid of Archive and just using ArchiveHandle throughout >> pg_dump which would get rid of these useless casts. > > I'd like to see a more thoroughgoing look at the basic structure of > pg_dump. Everybody who's ever looked at that code has found it > confusing, with the possible exception of the original author who is > long gone from the project anyway. I don't know exactly what would make > it better, but the useless distinction between Archive and ArchiveHandle > seems like a minor annoyance, not the core disease. > > Not that there'd be anything wrong with starting with that. After some study, I'm reluctant to completely abandon the Archive/ArchiveHandle distinction because it seems to me that it does serve some purpose: right now, nothing in pg_dump.c - where all the code to actually dump stuff lives - knows anything about what's inside the ArchiveHandle, just the Archive. So having two data structures really does serve a purpose: if it's part of the Archive, you need it in order to query the system catalogs and generate SQL. If it isn't, you don't. Considering how much more crap there is inside an ArchiveHandle than an Archive, I don't think we should lightly abandon that distinction. Now, that having been said, there are probably better ways of making that distinction than what we have here; Archive and ArchiveHandle could be better named, perhaps, and we could have pointers from one structure to the other instead of magically embedding them inside each other. All the function pointers that are part of the ArchiveHandle could be separated out into a static, constant structure with a name like ArchiveMethod, and we could keep a pointer to the appropriate ArchiveMethod inside each ArchiveHandle instead of copying all the pointers into it. I think that'd be a lot less confusing. But the immediate problem is that pg_dump.c is heavily reliant on global variables, which isn't going to fly if we want this code to use threads on Windows (or anywhere else). It's also bad style. So I suggest that we refactor pg_dump.c to get rid of g_conn and g_fout. Getting rid of g_fout should be fairly easy: in many places, we're already passing Archive *fout as a parameter. If we pass it as a parameter to the functions that need it but aren't yet getting it as a parameter, then it can cease to exist as a global variable and become local to main(). Getting rid of g_conn is a little harder. Joachim's patch takes the view that we can safely overload the existing ArchiveHandle.connection member. Currently, that member is the connection to which we are doing a direct-to-database restore; what he's proposing to do is also use it to store the connection from which we are doing the dump. But it seems possible that we might someday want to dump from one database and restore into another database at the same time, so maybe we ought to play it safe and use different variables for those things. So I'm inclined to think we could just add a "PGconn *remote_connection" argument to the Archive structure (to go with all of the similarly named "remote" fields, all of which describe the DB to be dumped) and then that would be available everywhere that the Archive itself is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables
Ashutosh Bapat writes: > So, as I understand we have two problems here > 1. Prefixing schemaname to the fake aliases if there is another RTE with > same name. There may not be a relation with that name (fake alias name > given) in the schema chosen as prefix. > 2. Fake aliases themselves can be conflicting. Well, the issue is more that a fake alias might unintentionally collide with a regular alias elsewhere in the query. There's no guard against that in the current behavior, and ISTM there needs to be. The one possibly-simplifying thing about this whole issue is that we needn't cater for references that couldn't have been made in the original query. For instance, if the inner and outer queries both have explicit aliases "tx", it's impossible for the inner query to have referred to any columns of the outer "tx" --- so we don't have to try to make it possible in the dumped form. > If I understand correctly, if we solve the second problem, first problem > will not occur. Is that correct? I don't believe that the problem has anything to do with the names of other tables that might happen to exist in the database. It's a matter of what RTE names/aliases are exposed for variable references in different parts of the query. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On Sun, Jan 29, 2012 at 1:20 PM, Greg Smith wrote: > On 01/28/2012 07:48 PM, Jeff Janes wrote: >> > >> I haven't inspected that deep fall off at 30 clients for the patch. >> By way of reference, if I turn off synchronous commit, I get >> tps=1245.8 which is 100% CPU limited. This sets a theoretical upper >> bound on what could be achieved by the best possible group committing >> method. > > This sort of thing is why I suspect that to completely isolate some results, > we're going to need a moderately high end server--with lots of > cores--combined with an intentionally mismatched slow drive. It's too easy > to get pgbench and/or PostgreSQL to choke on something other than I/O when > using smaller core counts. I don't think I have anything where the floor is > 24 TPS per client though. Hmmm...I think I can connect an IDE drive to my > MythTV box and format it with ext4. Thanks for the test idea. > > One thing you could try on this system is using the -N "Do not update > pgbench_tellers and pgbench_branches". That eliminates a lot of the > contention that might be pulling down your higher core count tests, while > still giving a completely valid test of whether the group commit mechanism > works. Not sure whether that will push up the top-end usefully for you, > worth a try if you have time to test again. Adding the -N did eliminate the fall-off at 30 clients for the group_commit patch. But, I still want to explore why the fall off occurs when I get a chance. I know why the curve would stop going up without using -N (with -s of 40 and -c of 30, many connections will be waiting on row locks for updates to pgbench_branches) but that should cause a leveling off, not a collapse. Other than the lack of drop off at 30 clients, -N didn't meaningfully change anything. Everyone got slightly faster except at -c1. >> If the group_commit patch goes in, would we then rip out commit_delay >> and commit_siblings? > > The main reason those are still hanging around at all is to allow pushing > on the latency vs. throughput trade-off on really busy systems. The use > case is that you expect, say, 10 clients to constantly be committing at a > high rate. So if there's just one committing so far, assume it's the > leading edge of a wave and pause a bit for the rest to come in. I don't > think the cases where this is useful behavior--people both want it and the > current mechanism provides it--are very common in the real world. The tests I did are exactly the environment where commit_delay might be expected to help. And it did help, but just not all that much. One of the problems is that while it does wait for those others to come in, and then flushes them in one fsync, often the others never get woken up successfully to realize that they have already been flushed. They continue to block. The group_commit patch, on the other hand, accomplishes exactly what commit_delay was intended to accomplish but does a poor job of. With the -N option, I also used commit_delay on top of group_commit, and the difference between the two looked to be within the margin of error. So commit_delay did not obviously cause further improvement. > It can be > useful for throughput-oriented benchmarks though, which is why I'd say it > hasn't been killed off yet. > > We'll have to see whether the final form this makes sense in will usefully > replace that sort of thing.
I'd certainly be in favor of nuking > commit_delay and commit_siblings with a better option; it would be nice if > we don't eliminate this tuning option in the process though. But I'm pretty sure that group_commit has stolen that thunder. Obviously a few benchmarks on one system isn't enough to prove that, though. The only use case I see left for commit_delay is where it is set on a per-connection basis rather than system-wide. Once you start a fsync, everyone who missed the bus is locked out until the next one. So low-priority connections can set commit_delay so as not to trigger the bus to leave before the high priority process gets on. But that seems like a pretty tenuous use case with better ways to do it. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 26, 2012 at 6:18 AM, Abhijit Menon-Sen wrote: > At 2012-01-12 12:31:20 +, si...@2ndquadrant.com wrote: >> >> The following patch adds a pgbench option -I to load data using >> INSERTs > > This is just to confirm that the patch applies and builds and works > fine (though of course it does take a long time… pity there doesn't > seem to be any easy way to add progress indication like -i has). I was thinking the opposite. That -i should only print progress indication when -d is given. Or at least knock an order of magnitude or two off of how often it does so. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Mon, Jan 30, 2012 at 7:24 AM, Robert Haas wrote: > On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes wrote: >>> I think that even in normal (non-initialization) usage, this message >>> should be suppressed when the provided scale factor >>> is equal to the pgbench_branches table count. >> >> The attached patch does just that. There is probably no reason to >> warn people that we are doing what they told us to, but not for the >> reason they think. > > In my opinion, a more sensible approach than anything we're doing > right now would be to outright *reject* options that will only be > ignored. If -s isn't supported except with -i, then trying to specify > -s without -i should just error out at the options-parsing stage, > before we even try to connect to the database. It's not very helpful > to accept options and then ignore them, and we have many instances of > that right now: initialization-only switches are accepted and ignored > when not initializing, and run-only switches are accepted and ignored > when initializing. I like the ability to say, effectively, "I think I previously did the initialization with -s 40; if I actually didn't then scream at me, and if I did then go ahead and do the pgbench I just asked for". But, since it does unconditionally report the scale actually used and I just have to have the discipline to go look at the result, I can see where this is perhaps overkill. In my own (non-PG-related) code, when I have tasks that have to be run in multiple phases that can get out of sync if I am not careful, I like to be able to specify the flags even in the "unused" invocation, so that the code can verify I am being consistent. Code is better at that than I am. I'm not sure I know what all would be incompatible with what. I could start drawing that matrix up once the API stabilizes, but I think you are still planning on whacking this -I option around a bit. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote: > At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote: >> I've started reviewing this patch, but it'll take me a bit longer to go >> through json.c properly. > OK, I finished reading json.c. I don't have an answer to the detoasting question in the XXX comment, but the code looks fine. Looking at somewhat analogous code in xml.c, it doesn't seem to be done there. So maybe we don't need to worry about it. > Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. I started with this, again by analogy with query_to_xml(). But I agree it's a bit ugly. If we're not going to do it, then we definitely need to look at caching the output funcs in the function info. A closer approximation is actually: SELECT array_to_json(array_agg(q)) FROM ( your query here ) q; But then I'd want the ability to break that up a bit with line feeds, so we'd need to adjust the interface slightly. (Hint: don't try the above with "select * from pg_class".) I'll wait on further comments, but I can probably turn these changes around very quickly once we're agreed. Cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On 30.01.2012 17:18, Simon Riggs wrote: > On Mon, Jan 30, 2012 at 2:57 PM, Heikki Linnakangas wrote: >> I committed this. I ran a pgbench test on an 8-core box and didn't see any >> slowdown. It would still be good if you get a chance to rerun the bigger >> test, but I feel confident that there's no measurable slowdown. > I asked clearly and specifically for you to hold back committing anything. Not sure why you would ignore that and commit without actually asking me or Peter. On a point of principle alone, I think you should revert. Working together is difficult if communication channels are openly ignored and disregarded. You must be referring to this: http://archives.postgresql.org/pgsql-hackers/2012-01/msg01406.php What I committed in the end was quite different from the version that mail was in reply to, too. If you have a specific objection to the patch as committed, please let me know. > Peter and I have been working on a new version that seems likely to improve performance over your suggestions. We should be showing something soon. Please post those ideas, and let's discuss them. If it's something simple, maybe we can still sneak them into this release. Otherwise, let's focus on the existing patches that are pending review or commit. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On Mon, Jan 30, 2012 at 2:57 PM, Heikki Linnakangas wrote: > I committed this. I ran a pgbench test on an 8-core box and didn't see any > slowdown. It would still be good if you get a chance to rerun the bigger > test, but I feel confident that there's no measurable slowdown. I asked clearly and specifically for you to hold back committing anything. Not sure why you would ignore that and commit without actually asking me or Peter. On a point of principle alone, I think you should revert. Working together is difficult if communication channels are openly ignored and disregarded. Peter and I have been working on a new version that seems likely to improve performance over your suggestions. We should be showing something soon. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Fri, Jan 27, 2012 at 4:07 PM, Dan Scales wrote: > The advantage of putting the checksum calculation in smgrwrite() (or > mdwrite()) is that it catches a bunch of page writes that don't go through > the buffer pool (see calls to smgrwrite() in nbtree.c, nbtsort.c, spginsert.c) Maybe we should have some sort of wrapper function, then. I really dislike the idea that the smgr layer knows anything about the page format, and if md has to know, that's even worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
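A minimal sketch of what such a wrapper might look like; the pd_checksum field and compute_page_checksum() helper are assumptions for illustration, not the patch's actual API. The point is that the checksum is computed at a layer that understands the page layout, while smgrwrite()/mdwrite() keep treating the buffer as an opaque block of bytes:

    #include "storage/bufpage.h"
    #include "storage/smgr.h"

    extern uint16 compute_page_checksum(char *page, BlockNumber blkno); /* assumed helper */

    /* Hypothetical wrapper: checksum the page here, where the page
     * format is known, then delegate the raw write to smgr, which
     * never looks inside the buffer. */
    static void
    checksummed_smgrwrite(SMgrRelation reln, ForkNumber forknum,
                          BlockNumber blocknum, char *buffer, bool skipFsync)
    {
        ((PageHeader) buffer)->pd_checksum =
            compute_page_checksum(buffer, blocknum);
        smgrwrite(reln, forknum, blocknum, buffer, skipFsync);
    }

Callers that currently write pages directly (nbtree.c, nbtsort.c, spginsert.c, and the buffer pool) would then go through the wrapper instead of calling smgrwrite() themselves.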
Re: [HACKERS] Simulating Clog Contention
On Sat, Jan 28, 2012 at 3:32 PM, Jeff Janes wrote: >> I think that even in normal (non-initialization) usage, this message >> should be suppressed when the provided scale factor >> is equal to the pgbench_branches table count. > > The attached patch does just that. There is probably no reason to > warn people that we are doing what they told us to, but not for the > reason they think. In my opinion, a more sensible approach than anything we're doing right now would be to outright *reject* options that will only be ignored. If -s isn't supported except with -i, then trying to specify -s without -i should just error out at the options-parsing stage, before we even try to connect to the database. It's not very helpful to accept options and then ignore them, and we have many instances of that right now: initialization-only switches are accepted and ignored when not initializing, and run-only switches are accepted and ignored when initializing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
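A sketch of the shape such an options-parsing check could take, right after pgbench's getopt loop; the variable names are illustrative, not necessarily pgbench's actual ones:

    /* Illustrative cross-check after option parsing: reject switches
     * that would otherwise be silently ignored in this mode. */
    if (!is_init_mode && scale_given)
    {
        fprintf(stderr, "pgbench: -s is only meaningful together with -i\n");
        exit(1);
    }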
Re: [HACKERS] Configuring Postgres to Add A New Source File
On Fri, Jan 27, 2012 at 6:37 PM, Tareq Aljabban wrote: > Indeed, I'm a beginner in "Make", but I read a few tutorials and was able to > do what I wanted outside of PG using a simple make file. > Now, when moving to PG, I found the Make structure much more complicated and > didn't know where to add my configuration. > I'm looking only for this file to run in PG (the required functionality is > done already). My code will be a part of the backend, so I want to keep it > there. Uh, no it won't. The compile command you showed had it creating a standalone executable, hdfs_test. You're likely to find that linking a JVM into the backend significantly decreases the overall robustness of the system. For example, ^C isn't going to be able to abort out of processing a query that's off doing something inside the JVM. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithread Query Planner
On Fri, Jan 27, 2012 at 2:56 PM, Pierre C wrote: >> Right. I think it makes more sense to try to get parallelism working >> first with the infrastructure we have. Converting to use threading, >> if we ever do it at all, should be something we view as a later >> performance optimization. But I suspect we won't want to do it >> anyway; I think there will be easier ways to get where we want to be. > > Multithreading got fashionable with the arrival of the dual-core CPU a few > years ago. However, multithreading as it is used currently has a huge > problem: usually, threads share all of their memory. This opens the door to > an infinite number of hard-to-find bugs, and more importantly, defeats the > purpose. > > "Re-entrant palloc()" is nonsense. Suppose you can make a reentrant palloc() > which scales OK at 2 threads thanks to a cleverly placed atomic instruction. > How is it going to scale on 64 cores? On HP's new 1000-core ARM server with > non-uniform memory access? Probably it would suck very very badly... not to > mention the horror of multithreaded exception-safe deallocation when 1 > thread among many blows up on an error... There are academic papers out there on how to build a thread-safe, highly concurrent memory allocator. You seem to be assuming that everyone doing allocations would need to compete for access to a single freelist, or something like that, which is simply not true. A lot of effort and study have been put into figuring out how to get past bottlenecks in this area, because there is a lot of multi-threaded code out there that needs to surmount these problems. I don't believe that the problem is that it can't be done, but rather that we haven't done it. > For the ultimate in parallelism, ask an FPGA guy. Is he using shared memory > to wire together his 12000 DSP blocks? Nope, he's using isolated Processes > which share nothing and communicate through FIFOs and hardware message > passing. Like shell pipes, basically. Or Erlang. I'm not sure we can conclude much from this example. The programming style of people using FPGAs is probably governed by the nature of the interface and the type of computation they are doing rather than anything else. > Good parallelism = reduce shared state and communicate through data/message > channels. > > Shared-everything multithreading is going to be in a lot of trouble on > future many-core machines. Incidentally, Postgres, with its Processes, > sharing only what is needed, has a good head start... > > With more and more cores coming, you guys are going to have to fight to > reduce the quantity of shared state between processes, not augment it by > using shared-memory threads!... I do agree that it's important to reduce shared state. We've seen some optimizations this release cycle that work precisely because they cut down on the rate at which cache lines must be passed between cores, and it's pretty clear that we need to go farther in that direction. On the other hand, I think it's a mistake to confuse the programming model with the amount of shared state. In a multi-threaded programming model there is likely to be a lot more memory that is technically "shared" in the sense that any thread could technically access it. But if the application is coded in such a way that actual sharing is minimal, then it's not necessarily any worse than a process model as far as concurrency is concerned.
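To make the freelist point concrete, here is a toy sketch (not PostgreSQL code) of the standard trick concurrent allocators use: give each thread its own cache so the hot path touches no shared state and takes no locks, falling back to a locked global pool only when the local cache is empty. Real allocators such as jemalloc or tcmalloc refine this idea heavily:

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct FreeNode { struct FreeNode *next; } FreeNode;

    #define CHUNK_SIZE 64                       /* >= sizeof(FreeNode) */

    static __thread FreeNode *local_free = NULL;    /* per-thread cache */
    static FreeNode *global_free = NULL;            /* shared fallback */
    static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *
    chunk_alloc(void)
    {
        if (local_free)                 /* fast path: no locking at all */
        {
            FreeNode *n = local_free;
            local_free = n->next;
            return n;
        }
        pthread_mutex_lock(&global_lock);   /* slow path: shared pool */
        FreeNode *n = global_free;
        if (n)
            global_free = n->next;
        pthread_mutex_unlock(&global_lock);
        return n ? (void *) n : malloc(CHUNK_SIZE);
    }

    static void
    chunk_free(void *p)             /* toy: always recycle locally */
    {
        FreeNode *n = (FreeNode *) p;
        n->next = local_free;
        local_free = n;
    }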
Threading provides a couple of key advantages which, with our process model, we can't get: it avoids the cost of a copy-on-write operation every time a child is forked, and it allows arbitrary amounts of memory rather than being limited to a single shared memory segment that must be sized in advance. The major disadvantage is really with robustness, not performance, I think: in a threaded environment, with a shared address space, the consequences of a random memory stomp will be less predictable. > Say you want to parallelize sorting. > Sorting is a black-box with one input data pipe and one output data pipe. > Data pipes are good for parallelism, just like FIFOs. FPGA guys love black > boxes with FIFOs between them. > > Say you manage to send tuples through a FIFO like zeromq. Now you can even > run the sort on another machine and allow it to use all the RAM if you like. > Now split the black box into two black boxes (qsort and merge), instantiate as > many qsort boxes as necessary, and connect them together with pipes. Run > some boxes on some of this machine's cores, some other boxes on another > machine, etc. That would be very flexible (and scalable). > > Of course the black box has a small backdoor: some comparison functions can > access shared state, which is basically *the* issue (not reentrant stuff, > which you do not need). Well, you do need reentrant stuff, if the comparator does anything non-trivial. It's easy to imagine that comparing strings or dates or whatever is a trivial operation that's done without allocating any memory or throwing an error, but in general it isn't that simple.
Re: [HACKERS] Group commit, revised
On 27.01.2012 15:38, Robert Haas wrote: > On Fri, Jan 27, 2012 at 8:35 AM, Heikki Linnakangas wrote: >> Yeah, we have to be careful with any overhead in there, it can be a hot spot. I wouldn't expect any measurable difference from the above, though. Could I ask you to rerun the pgbench tests you did recently with this patch? Or can you think of a better test for this? > I can't do so immediately, because I'm waiting for Nate Boley to tell me I can go ahead and start testing on that machine again. But I will do it once I get the word. I committed this. I ran a pgbench test on an 8-core box and didn't see any slowdown. It would still be good if you get a chance to rerun the bigger test, but I feel confident that there's no measurable slowdown. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
At 2012-01-27 09:47:05 +0530, a...@toroid.org wrote: > > I've started reviewing this patch, but it'll take me a bit longer to go > through json.c properly. OK, I finished reading json.c. I don't have an answer to the detoasting question in the XXX comment, but the code looks fine. Aside: is query_to_json really necessary? It seems rather ugly and easily avoidable using row_to_json. -- ams -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hot standby off of hot standby?
Hi, Is it possible to use a standby instance as a master/primary for another standby in Postgres 9.0? In other words, does PG 9.0 support a cascading standby configuration? Thanks, Igor
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
I'm sorry. > Thank you for comments, this is revised version of the patch. The malloc error handling in dblink.c of the patch is broken. It leaks memory and breaks control flow. I'll re-send the properly fixed patch for dblink.c later. # This severe back pain should have made me stupid :-p regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
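For reference, the usual shape of such a fix, as a rough sketch with hypothetical names rather than the actual dblink code: the callback reports failure through its return value instead of throwing, and the realloc result goes into a temporary so the old buffer survives a failure for the caller to free:

    /* Rough sketch (hypothetical names): grow a buffer inside a
     * callback that must not throw.  The realloc result goes into a
     * temporary so that on failure the old pointer is preserved and
     * the caller can still free it; the callback signals the failure
     * through its return value. */
    static int
    store_grow(StoreInfo *sinfo, size_t needed)
    {
        char *tmp = realloc(sinfo->buf, needed);

        if (tmp == NULL)
            return 0;       /* caller frees sinfo->buf and reports OOM */
        sinfo->buf = tmp;
        sinfo->buflen = needed;
        return 1;
    }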
Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables
Thanks Tom for giving a stronger case. I found the problem while looking at inherited tables, and didn't think beyond inherited tables. Since inherited tables are expanded when the subquery planner is invoked, I thought the problem would occur only in EXPLAIN output, as we won't generate queries that can be used elsewhere after/during planning. So, as I understand it, we have two problems here:

1. Prefixing the schema name to the fake aliases if there is another RTE with the same name. There may not be a relation with that name (the fake alias name given) in the schema chosen as prefix.
2. Fake aliases themselves can be conflicting.

If I understand correctly, if we solve the second problem, the first problem will not occur. Is that correct?

On Sat, Jan 28, 2012 at 8:08 AM, Tom Lane wrote:
> Robert Haas writes:
> > It's a feature, not a bug, that we schema-qualify names when VERBOSE
> > is specified. That was done on purpose for the benefit of external
> > tools that might need this information to disambiguate which object is
> > being referenced.
>
> > Table *aliases*, of course, should not be schema-qualified, but I
> > don't think that's what we're doing. You could make it more clear by
> > including an alias in the query, like this:
>
> > explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';
>
> I think you are both focusing on the wrong thing. There is a lot of
> squishiness in what EXPLAIN prints out, since SQL notation is not always
> well suited to what an execution plan actually does. But this code has
> a hard and fast requirement that it dump view definitions correctly,
> else pg_dump doesn't work. And after looking at this I think Ashutosh
> has in fact found a bug. Consider this example:
>
> regression=# create schema s1;
> CREATE SCHEMA
> regression=# create schema s2;
> CREATE SCHEMA
> regression=# create table s1.t1 (f1 int);
> CREATE TABLE
> regression=# create table s2.t1 (f1 int);
> CREATE TABLE
> regression=# create view v1 as
> regression-# select * from s1.t1 where exists (
> regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
> regression(# );
> CREATE VIEW
> regression=# \d+ v1
>                 View "public.v1"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  f1     | integer |           | plain   |
> View definition:
>  SELECT t1.f1
>    FROM s1.t1
>   WHERE (EXISTS ( SELECT 1
>            FROM s2.t1
>           WHERE t1.f1 = s1.t1.f1));
>
> regression=# alter table s2.t1 rename to tx;
> ALTER TABLE
> regression=# \d+ v1
>                 View "public.v1"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  f1     | integer |           | plain   |
> View definition:
>  SELECT t1.f1
>    FROM s1.t1
>   WHERE (EXISTS ( SELECT 1
>            FROM s2.tx t1
>           WHERE t1.f1 = s1.t1.f1));
>
> Both of the above displays of the view are formally correct, in that the
> variables will be taken to refer to the correct upper or lower RTE.
> But let's change that back and rename the other table:
>
> regression=# alter table s2.tx rename to t1;
> ALTER TABLE
> regression=# alter table s1.t1 rename to tx;
> ALTER TABLE
> regression=# \d+ v1
>                 View "public.v1"
>  Column |  Type   | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
>  f1     | integer |           | plain   |
> View definition:
>  SELECT t1.f1
>    FROM s1.tx t1
>   WHERE (EXISTS ( SELECT 1
>            FROM s2.t1
>           WHERE t1.f1 = s1.t1.f1));
>
> This is just plain wrong, as you'll see if you try to execute that
> query:
>
> regression=# SELECT t1.f1
> regression-#    FROM s1.tx t1
> regression-#   WHERE (EXISTS ( SELECT 1
> regression(#            FROM s2.t1
> regression(#           WHERE t1.f1 = s1.t1.f1));
> ERROR:  invalid reference to FROM-clause entry for table "t1"
> LINE 5:           WHERE t1.f1 = s1.t1.f1));
>                                 ^
> HINT:  There is an entry for table "t1", but it cannot be referenced
> from this part of the query.
>
> (The HINT is a bit confused here, but the query is certainly invalid.)
>
> So what we have here is a potential failure to dump and reload view
> definitions, which is a lot more critical in my book than whether
> EXPLAIN's output is confusing.
>
> If we stick with the existing rule for attaching a fake alias to renamed
> RTEs, I think that Ashutosh's patch or something like it is probably
> appropriate, because the variable-printing code ought to be in step with
> the RTE-printing code. Unfortunately, I think the hack to attach a fake
> alias to renamed RTEs creates some issues of its own. Consider
>
>     select * from s1.t1
>       where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);
>
> If s1.t1 is now renamed to s1.tx, it is still possible to express
> the same semantics:
>
>     select * from s1.tx
>       where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);
Re: [HACKERS] Remembering bug #6123
On Thu, Jan 12, 2012 at 4:30 PM, Tom Lane wrote: > "Kevin Grittner" writes: >> Tom Lane wrote: >>> Also, what's the point of testing update_ctid? I don't see that >>> it matters whether the outdate was a delete or an update. > >> The update_ctid code was a carry-over from my old, slightly >> different approach, which I failed to change as I should have. I'll >> fix that along with the other. > > Actually, on reflection there might be a reason for checking > update_ctid, with a view to allowing "harmless" cases. I see > these cases: > > * UPDATE finds a trigger already updated the row: must throw error > since we can't apply the update. > > * UPDATE finds a trigger already deleted the row: arguably, we could > let the deletion stand and ignore the update action. > > * DELETE finds a trigger already updated the row: must throw error > since we can't apply the delete. > > * DELETE finds a trigger already deleted the row: arguably, there's > no reason to complain. > > Don't know if that was your reasoning as well. But if it is, then again > the comment needs to cover that. Just been reading this thread a little. ISTM that seeing a SelfUpdated row on a cursor when we didn't use WHERE CURRENT OF really ought to raise some kind of exception condition like a WARNING or a NOTICE. That situation seems most likely to be a bug or at least an unintended consequence. As Tom points out, the multiple flavours of weirdness that can result even if we do fix this are not things we should do silently. I think we should do the best job we can without throwing an error, but we must make some attempt to let the developer know we did that for them so they can investigate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby off of hot standby?
On Mon, Jan 30, 2012 at 4:36 AM, Igor Schtein wrote: > Is it possible to use a standby instance as a master/primary for another > standby in Postgres 9.0? In other words, does PG 9.0 supports cascading > standby configuration? No, but 9.2 will support that feature, known as cascading replication. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Thank you for the comments; this is the revised version of the patch. The performance gain is better than expected. The measurement script now runs the query via dblink ten times for more stable figures, so the numbers are about ten times larger than the previous ones.

                         sec   % to Original
    Original           : 31.5   100.0%
    RowProcessor patch : 31.3    99.4%
    dblink patch       : 24.6    78.1%

The RowProcessor patch alone causes no loss, or a very small gain, and the full patch gives us a 22% gain on this benchmark (*1). The modifications are listed below.

- No more use of PGresAttValue for this mechanism; PGrowValue has been added instead. PGresAttValue has been put back into libpq-int.h.

- pqAddTuple() is restored to its original form, and the new function pqAddRow() is used as the RowProcessor. (The previous pqAddTuple implementation had buggily mixed the two usages of PGresAttValue.)

- PQgetRowProcessorParam has been dropped. The contextual parameter is passed as one of the parameters of RowProcessor().

- RowProcessor() returns int (as a bool; is that the libpq convention?) instead of void *. (Actually, void * had already become useless as of the previous patch.)

- PQsetRowProcessorErrMes() is changed to do the strdup internally.

- The callers of RowProcessor() no longer set null_field to PGrowValue.value. Plus, the PGrowValue[] that RowProcessor() receives has nfields + 1 elements, so that a rough size estimate can be made as cols->value[nfields].value - cols->value[0].value - something. The something here is 4 * nfields for protocol 3 and 4 * (non-null fields) for protocol 2. I fear that this applies only to textual transfer usage...

- PQregisterRowProcessor() sets the default handler when given NULL. (pg_conn|pg_result).rowProcessor cannot be NULL for its lifetime.

- initStoreInfo() and storeHandler() have been provided with malloc error handling.

And more...

- getAnotherTuple() @ fe-protocol2.c is not tested at all.

- The uniformity of column sizes in the test data prevents realloc from ever being exercised in dblink... More testing should be done.
regards,

=====
(*1) The benchmark is done as follows,

== test.sql
select dblink_connect('c', 'host=localhost dbname=test');
select * from dblink('c', 'select a,c from foo limit 200') as (a text, b bytea) limit 1;
...(repeat 9 times more)
select dblink_disconnect('c');
==

$ for i in $(seq 1 10); do time psql test -f t.sql; done

The environment is CentOS 6.2 on VirtualBox on Core i7 965 3.2GHz
  # of processor  1
  Allocated mem   2GB

Test DB schema is

 Column | Type  | Modifiers
--------+-------+-----------
 a      | text  |
 b      | text  |
 c      | bytea |
Indexes:
    "foo_a_bt" btree (a)
    "foo_c_bt" btree (c)

test=# select count(*), min(length(a)) as a_min, max(length(a)) as a_max, min(length(c)) as c_min, max(length(c)) as c_max from foo;
 count | a_min | a_max | c_min | c_max
-------+-------+-------+-------+-------
   200 |    29 |    29 |    29 |    29
(1 row)

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..5ed083c 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,5 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterRowProcessor    161
+PQsetRowProcessorErrMes   162
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..4fe2f41 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,8 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->rowProcessor = pqAddRow;
+	conn->rowProcessorParam = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5078,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..82914fd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int	PQsendDescribe(PGconn *conn, char desc_type, const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 
 /*
@@ -160,6 +161,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessor = pqAddRow;
+	result->rowProcessorParam = NULL;
+	result->rowProcessorErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +198,10 @@ PQmakeEmptyP
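To make the intended usage concrete, here is a rough sketch of a caller-side row processor. The RowProcessor callback signature is an assumption pieced together from the description and diff above, not something to rely on from the patch as posted:

    /* Hypothetical usage sketch: the callback sees each row as it
     * arrives, instead of letting libpq accumulate a full PGresult,
     * and returns 1 on success, 0 on failure (e.g. out of memory). */
    static int
    count_rows(PGresult *res, PGrowValue *columns, void *param)
    {
        long *nrows = (long *) param;

        (void) res;
        (void) columns;         /* a real processor would copy the values */
        (*nrows)++;
        return 1;
    }

    /* ... somewhere in the caller ... */
    long nrows = 0;

    PQregisterRowProcessor(conn, count_rows, &nrows);
    /* run the query with PQexec()/PQgetResult() as usual; rows are
     * routed through count_rows() rather than stored in the PGresult */
    PQregisterRowProcessor(conn, NULL, NULL);   /* restore default pqAddRow */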
Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name
On Sun, Jan 29, 2012 at 1:08 AM, Matthew Draper wrote:
> On 25/01/12 18:37, Hitoshi Harada wrote:
>>> I'm still not sure whether to just revise (almost) all the SQL function
>>> examples to use parameter names, and declare them the "right" choice; as
>>> it's currently written, named parameters still seem rather second-class.
>>
>> Agreed.
>
> I'll try a more comprehensive revision of the examples.
>
>> The patch seems ok, except an example I've just found.
>>
>> db1=# create function t(a int, t t) returns int as $$ select t.a $$
>> language sql;
>> CREATE FUNCTION
>> db1=# select t(0, row(1, 2)::t);
>>  t
>> ---
>>  1
>> (1 row)
>>
>> Should we throw an error in such ambiguity? Or did you make it happen
>> intentionally? If the latter, we should also mention the rule in the
>> manual.
>
> I did consider it, and felt it was the most consistent:
>
> # select t.x, t, z from (select 1) t(x), (select 2) z(t);
>  x | t |  z
> ---+---+-----
>  1 | 2 | (2)
> (1 row)
>
> I haven't yet managed to find the above behaviour described in the
> documentation either, though. To me, it feels like an obscure corner
> case, whose description would leave the rules seeming more complicated
> than they generally are.

Makes sense to me. I marked this as Ready for committer. Thanks, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers