Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier michael.paqu...@gmail.com wrote:
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao masao.fu...@gmail.com wrote:
In this case, the patch seems to make the restartpoint recycle even WAL files which have .ready files and will have to be archived later. Thought?

The real problem currently is that it is possible to have a segment file not marked as .done during recovery when the stream connection is abruptly cut while this segment is being switched, marking it as .ready in archive_status and simply leaving this segment in pg_xlog, because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve the code such that when a segment file is switched and the connection is cut, we guarantee that this file is completed and marked as .done.

I have spent more time on that, with a bit more groundwork... First, the problem can be reproduced most of the time by running this simple command:

psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate

This will enforce a segment file switch and restart the master in crash recovery. This has the effect of immediately cutting the WAL stream on the slave, manifesting as a FATAL in libpqrcv_receive where rawlen == 0. For example, let's imagine that the stream fails when switching from 00010003 to the next segment; then the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for example 0/310, and walrcv->latestWalEnd is 0/400. When the stream restarts it will begin once again from 0/400, ignoring that 00010003 should be marked as .done, ultimately marking it in .ready state when old segment files are recycled or removed. There is nothing that can really be done to enforce the creation of a .done file before the FATAL of libpqrcv_receive, because we cannot predict the stream failure.
Now, we can do better than what we have now by looking at the WAL start position used when starting streaming in the WAL receiver, and enforcing .done if the start position is the last one of the previous segment. Hence, in the case of start position 0/400 found previously, the file that will be enforced to .done is 00010003. I have written the attached patch that implements this idea and fixes the problem. Now let's see if you guys see any flaws in this simple logic, which uses a sniper rifle instead of a bazooka as in the previous patches sent.

Regards,
--
Michael

From ce7e1eec97dbe7b1648b4f56a1f9825eeba7ebed Mon Sep 17 00:00:00 2001
From: Michael Paquier mpaqu...@vmware.com
Date: Mon, 20 Oct 2014 14:40:37 +0900
Subject: [PATCH] Mark segment file .done for last WAL position at stream start

When the stream connection between a standby and its root node is unstable, WAL stream errors make the standby node fail with FATAL errors in the WAL receiver, forcing it to restart in crash-recovery mode. Now, when the crash occurs exactly when a switch to a new segment file is done, it is possible that the WAL receiver restarts from a position located on the next segment file, while the previous file is left as is, ultimately marked in .ready state once old WAL files are recycled or removed. Note that this file should have been marked as .done by the WAL receiver itself. With this patch, the WAL receiver checks for the presence of the previous segment file of the start position if this WAL position is the last one of the previous segment file, and enforces it to .done, preventing .ready files from accumulating on standbys.
This failure can be easily reproduced with the following command:

psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
---
 src/backend/access/transam/xlogarchive.c | 26 ++
 src/backend/replication/walreceiver.c    | 12 
 src/include/access/xlog_internal.h       |  1 +
 3 files changed, 39 insertions(+)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 047efa2..25a153f 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -553,6 +553,32 @@ XLogArchiveNotifySeg(XLogSegNo segno)
 	XLogArchiveNotify(xlog);
 }
 
+
+/*
+ * XLogArchiveForceDoneIfPresent
+ *
+ * Wrapper of XLogArchiveForceDone, used conditionally based on the presence
+ * of given XLOG segment file.
+ */
+void
+XLogArchiveForceDoneIfPresent(TimeLineID tli, XLogSegNo segno)
+{
+	struct stat stat_buf;
+	char		xlogfname[MAXFNAMELEN];
+	char		xlogfpath[MAXPGPATH];
+
+	XLogFilePath(xlogfpath, tli, segno);
+
+	/* File is missing, nothing to do */
+	if (stat(xlogfpath, &stat_buf) != 0)
+		return;
+
+	/* All is fine, process... */
+	XLogFileName(xlogfname, tli, segno);
+	XLogArchiveForceDone(xlogfname);
+}
+
+
 /*
  * XLogArchiveForceDone
  *
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index
Re: [HACKERS] Typos in comments
On Mon, Oct 20, 2014 at 6:54 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
I ran into a typo, 'a a', in a comment in snapmgr.c, and found by grep that there are four other places with the same typo. And in one of those places, I found yet another typo. Please find attached a patch.

Just while on this topic, I had a quick look at the results from the regex \b(\w+)\s+\1\b which finds duplicate words. After sifting through the false positives I found a few more. Patch attached.

Regards

David Rowley

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 384c9b6..385d18b 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -574,7 +574,7 @@ StartReplication(StartReplicationCmd *cmd)
 	 * to find the requested WAL segment in pg_xlog.
 	 *
 	 * XXX: we could be more strict here and only allow a startpoint
-	 * that's older than the switchpoint, if it it's still in the same
+	 * that's older than the switchpoint, if it's still in the same
 	 * WAL segment.
 	 */
 	if (!XLogRecPtrIsInvalid(switchpoint) &&
diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 89b22f2..c6c90fb 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -753,7 +753,7 @@ ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos, uint32 *timeline)
 * initiating streaming with the START_STREAMING command.
 *
 * If the COPY ends (not necessarily successfully) due a message from the
- * server, returns a PGresult and sets sets *stoppos to the last byte written.
+ * server, returns a PGresult and sets *stoppos to the last byte written.
 * On any other sort of error, returns NULL.
 */
static PGresult *
diff --git a/src/interfaces/ecpg/pgtypeslib/timestamp.c b/src/interfaces/ecpg/pgtypeslib/timestamp.c
index b0f9bf1..cf1fed2 100644
--- a/src/interfaces/ecpg/pgtypeslib/timestamp.c
+++ b/src/interfaces/ecpg/pgtypeslib/timestamp.c
@@ -419,7 +419,7 @@ dttofmtasc_replace(timestamp * ts, date dDate, int dow, struct tm * tm,
 				replace_val.str_val = months[tm->tm_mon];
 				replace_type = PGTYPES_TYPE_STRING_CONSTANT;
 				break;
-				/* the full name name of the month */
+				/* the full name of the month */
 				/* XXX should be locale aware */
 			case 'B':
 				replace_val.str_val = pgtypes_date_months[tm->tm_mon];

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
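The duplicate-word regex quoted above can be run directly from the command line; a quick sketch (GNU grep's -P flag for Perl-compatible backreferences is assumed):

```shell
# \1 backreferences the first captured word, so this flags doubled words
# such as "it it" or "sets sets"; -n prints the line number.
printf 'if it it is still\nall good here\n' | grep -nP '\b(\w+)\s+\1\b'
```

In a real source tree one would run grep -rnP over src/ and then sift the false positives, as David describes.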
[HACKERS] agent_init concurrent running question
Hello, I am reading the pool manager code and have some questions about poolmgr.c, function agent_init():

if (agent->pool)
    agent_release_connections(agent, false);
..
#ifdef XCP
/* find database */
agent->pool = find_database_pool(database, user_name);
/* create if not found */
if (agent->pool == NULL)
    agent->pool = create_database_pool(database, user_name);

With concurrent execution of this function for the same dbname and username, create_database_pool(database, user_name) may be run twice, so there may be more than one DatabasePool in the databasePools list. But find_database_pool(database, user_name) will always return the first match, so the others may never be used — is there a memory leak?

Also, why does this function call agent_release_connections and then use find_database_pool to get a new pool? With concurrent execution, is it possible for one agent to release connections that another agent is still using?

thanks!

Jov
blog: http://amutu.com/blog
Re: [HACKERS] alter user/role CURRENT_USER
I went through the patch and here is my review:

.) patch applied on master branch with the patch -p1 command (git apply failed)
.) regression make check runs fine
.) test case coverage is missing in the patch
.) overall the coding/patch looks good

Few comments:

1) Any particular reason for not also making SESSION_USER/USER usable for this command?

2) I think RoleId_or_curruser can be used for the following rule:

/* ALTER TABLE <name> OWNER TO RoleId */
| OWNER TO RoleId

3) In the documentation patch, CURRENT_USER was added but CURRENT_ROLE is missing.

On Fri, Oct 10, 2014 at 1:57 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:
Hello, on the way to considering ALTER ROLE SET .., I found that ALTER ROLE/USER cannot take CURRENT_USER as the role name. In addition to that, the documents of ALTER ROLE / USER are inconsistent with each other, in spite of ALTER USER being said to be an alias for ALTER ROLE. Plus, ALTER USER cannot take ALL as user name although ALTER ROLE can. This patch does the following things:

- ALTER USER/ROLE now takes CURRENT_USER as user name.
- Rewrite the synopsis of the documents for ALTER USER and ALTER ROLE so that they have exactly the same syntax.
- Modify the syntax of ALTER USER so as to be an alias of ALTER ROLE.
- Added CURRENT_USER/CURRENT_ROLE syntax to both.
- Added ALL syntax as user name to ALTER USER.
- Added IN DATABASE syntax to ALTER USER.

x Integrating the ALTER ROLE/USER syntax could not be done because of a shift/reduce error in bison.
x This patch contains no additional regression tests. Are they needed?

SESSION_USER/USER could also be made usable for this command, but this patch doesn't do so (yet).

regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Rushabh Lathia
[HACKERS] Refactor status detection for XLOG segment file in xlogarchive.c (Was: *FF WALs under 9.2)
Hi all,

Unless I am missing something, is there any reason why the sequence used in XLogArchiveCheckDone and XLogArchiveIsBusy to check if an XLOG segment has already been archived is duplicated? I guess that doing a little bit of refactoring here would make sense for simplicity; patch is attached.

Regards,
--
Michael

From 9b96e5825f6f6d2cbbc5ddf9c45c6609e6c01fb7 Mon Sep 17 00:00:00 2001
From: Michael Paquier mpaqu...@vmware.com
Date: Mon, 20 Oct 2014 16:33:26 +0900
Subject: [PATCH] Refactor archive status analysis into a single function

The same code block was used by XLogArchiveIsBusy and XLogArchiveCheckDone to check the archive status of a given segment...
---
 src/backend/access/transam/xlogarchive.c | 50 +---
 src/include/access/xlog_internal.h       |  1 +
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 047efa2..047e8f8 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -607,12 +607,7 @@ XLogArchiveForceDone(const char *xlog)
 }
 
 /*
- * XLogArchiveCheckDone
- *
- * This is called when we are ready to delete or recycle an old XLOG segment
- * file or backup history file.  If it is okay to delete it then return true.
- * If it is not time to delete it, make sure a .ready file exists, and return
- * false.
+ * XLogArchiveIsDone
 *
 * If <XLOG>.done exists, then return true; else if <XLOG>.ready exists,
 * then return false; else create <XLOG>.ready and return false.
@@ -621,15 +616,11 @@ XLogArchiveForceDone(const char *xlog)
 * create <XLOG>.ready fails, we'll retry during subsequent checkpoints.
 */
bool
-XLogArchiveCheckDone(const char *xlog)
+XLogArchiveIsDone(const char *xlog)
{
	char		archiveStatusPath[MAXPGPATH];
	struct stat stat_buf;
 
-	/* Always deletable if archiving is off */
-	if (!XLogArchivingActive())
-		return true;
-
	/* First check for .done --- this means archiver is done with it */
	StatusFilePath(archiveStatusPath, xlog, ".done");
	if (stat(archiveStatusPath, &stat_buf) == 0)
@@ -645,6 +636,28 @@
	if (stat(archiveStatusPath, &stat_buf) == 0)
		return true;
 
+	return false;
+}
+
+/*
+ * XLogArchiveCheckDone
+ *
+ * This is called when we are ready to delete or recycle an old XLOG segment
+ * file or backup history file.  If it is okay to delete it then return true.
+ * If it is not time to delete it, make sure a .ready file exists, and return
+ * false.
+ */
+bool
+XLogArchiveCheckDone(const char *xlog)
+{
+	/* Always deletable if archiving is off */
+	if (!XLogArchivingActive())
+		return true;
+
+	/* Check if segment is marked as .done */
+	if (XLogArchiveIsDone(xlog))
+		return true;
+
	/* Retry creation of the .ready file */
	XLogArchiveNotify(xlog);
	return false;
@@ -666,19 +679,8 @@ XLogArchiveIsBusy(const char *xlog)
	char		archiveStatusPath[MAXPGPATH];
	struct stat stat_buf;
 
-	/* First check for .done --- this means archiver is done with it */
-	StatusFilePath(archiveStatusPath, xlog, ".done");
-	if (stat(archiveStatusPath, &stat_buf) == 0)
-		return false;
-
-	/* check for .ready --- this means archiver is still busy with it */
-	StatusFilePath(archiveStatusPath, xlog, ".ready");
-	if (stat(archiveStatusPath, &stat_buf) == 0)
-		return true;
-
-	/* Race condition --- maybe archiver just finished, so recheck */
-	StatusFilePath(archiveStatusPath, xlog, ".done");
-	if (stat(archiveStatusPath, &stat_buf) == 0)
+	/* Check that segment is not yet marked as .done */
+	if (XLogArchiveIsDone(xlog))
		return false;
 
	/*
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 27b9899..6d1a66c 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -287,6 +287,7 @@ extern void KeepFileRestoredFromArchive(char *path, char *xlogfname);
 extern void XLogArchiveNotify(const char *xlog);
 extern void XLogArchiveNotifySeg(XLogSegNo segno);
 extern void XLogArchiveForceDone(const char *xlog);
+extern bool XLogArchiveIsDone(const char *xlog);
 extern bool XLogArchiveCheckDone(const char *xlog);
 extern bool XLogArchiveIsBusy(const char *xlog);
 extern void XLogArchiveCleanup(const char *xlog);
--
2.1.2
Re: [HACKERS] agent_init concurrent running question
On Mon, Oct 20, 2014 at 4:29 PM, Jov am...@amutu.com wrote: in poolmgr.c function agent_init(): This code is part of Postgres-XC (or Postgres-XL) and not PostgreSQL itself, you should send your questions there: https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers Regards, -- Michael
Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
On Fri, Oct 17, 2014 at 10:12 PM, Fujii Masao masao.fu...@gmail.com wrote:
On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao masao.fu...@gmail.com wrote:
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier michael.paqu...@gmail.com wrote:
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier michael.paqu...@gmail.com wrote:
The additional process at promotion sounds like a good idea, I'll try to get a patch done tomorrow. This would result as well in removing the XLogArchiveForceDone stuff.

Either way, now that I have been able to reproduce the problem manually, things can be clearly solved. Please find attached two patches aimed at fixing this issue and improving the situation:
- 0001 prevents the appearance of those phantom WAL segment files by ensuring that when a node is in recovery it will remove them whatever their status in archive_status. This patch is the real fix, and should be applied down to 9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment files present in pg_xlog to have their status set to .done, marking them for removal. When looking at the code, I finally concluded that Fujii-san's point, about marking as .done in all cases segment files that have been fully streamed, actually makes more sense and need not be backpatched. This patch would not be mandatory for back-patching, but it makes the process more robust IMO.

Thanks for the patches! While reviewing the patch, I found another bug related to WAL files in recovery mode. The problem is that exitArchiveRecovery() always creates a .ready file for the last WAL file of the old timeline, even when it's restored from the archive and has a .done file. So this causes the already-archived WAL file to be archived again. Attached patch fixes this bug.

That's a good catch! The patch looks good. I think that you should change xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery.
This is only for correctness, so that's a master-only remark, because this variable is used only to calculate a segment file name, and not a path. Renaming the variable from xlogpath to xlogname would make sense as well. Regards, -- Michael
Re: [HACKERS] Performance degradation 9.3 (vs 9.2) for FreeBSD
Hello,

How did this testing turn out?

Palle

On 3 Jul 2014, at 12:15, Tatsuo Ishii is...@postgresql.org wrote:

Hi,

Attached you can find a short (compile-tested only) patch implementing a 'shared_memory_type' GUC, akin to 'dynamic_shared_memory_type'. Will only apply to 9.4, not 9.3, but it should be easy to convert for it.

Greetings,
Andres Freund

I have rebased Andres's patch against the 9.3-STABLE tree. Please take a look at the attached patches and let me know if you find anything strange. I am going to test the patch on a huge HP machine: DL980G7/64 cores/2TB mem. With the patch I would be able to report back if using SysV shared mem fixes the 9.3 performance problem.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index f746c81..e82054a 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -72,6 +72,7 @@ static void IpcMemoryDetach(int status, Datum shmaddr);
 static void IpcMemoryDelete(int status, Datum shmId);
 static PGShmemHeader *PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid);
+static void *CreateAnonymousSegment(Size *size);
 
 /*
@@ -389,49 +390,19 @@ PGSharedMemoryCreate(Size size, bool makePrivate, int port)
 	 * developer use, this shouldn't be a big problem.
 	 */
 #ifndef EXEC_BACKEND
+	if (shared_memory_type == SHMEM_TYPE_MMAP)
 	{
-		long		pagesize = sysconf(_SC_PAGE_SIZE);
-
-		/*
-		 * Ensure request size is a multiple of pagesize.
-		 *
-		 * pagesize will, for practical purposes, always be a power of two.
-		 * But just in case it isn't, we do it this way instead of using
-		 * TYPEALIGN().
-		 */
-		if (pagesize > 0 && size % pagesize != 0)
-			size += pagesize - (size % pagesize);
-
-		/*
-		 * We assume that no one will attempt to run PostgreSQL 9.3 or later
-		 * on systems that are ancient enough that anonymous shared memory is
-		 * not supported, such as pre-2.4 versions of Linux.  If that turns
-		 * out to be false, we might need to add a run-time test here and do
-		 * this only if the running kernel supports it.
-		 */
-		AnonymousShmem = mmap(NULL, size, PROT_READ | PROT_WRITE, PG_MMAP_FLAGS,
-							  -1, 0);
-		if (AnonymousShmem == MAP_FAILED)
-		{
-			int			saved_errno = errno;
-
-			ereport(FATAL,
-					(errmsg("could not map anonymous shared memory: %m"),
-					 (saved_errno == ENOMEM) ?
-					 errhint("This error usually means that PostgreSQL's request "
-							 "for a shared memory segment exceeded available memory "
-							 "or swap space. To reduce the request size (currently "
-							 "%lu bytes), reduce PostgreSQL's shared memory usage, "
-							 "perhaps by reducing shared_buffers or "
-							 "max_connections.",
-							 (unsigned long) size) : 0));
-		}
+		AnonymousShmem = CreateAnonymousSegment(&size);
 		AnonymousShmemSize = size;
 
-		/* Now we need only allocate a minimal-sized SysV shmem block. */
 		sysvsize = sizeof(PGShmemHeader);
 	}
+	else
 #endif
+	{
+		Assert(shared_memory_type == SHMEM_TYPE_SYSV);
+		sysvsize = size;
+	}
 
 	/* Make sure PGSharedMemoryAttach doesn't fail without need */
 	UsedShmemSegAddr = NULL;
@@ -631,3 +602,47 @@ PGSharedMemoryAttach(IpcMemoryKey key, IpcMemoryId *shmid)
 
 	return hdr;
 }
+
+/*
+ * Creates an anonymous mmap()ed shared memory segment.
+ *
+ * Pass the requested size in *size.  This function will modify *size to the
+ * actual size of the allocation, if it ends up allocating a segment that is
+ * larger than requested.
+ */
+#ifndef EXEC_BACKEND
+static void *
+CreateAnonymousSegment(Size *size)
+{
+	Size		allocsize = *size;
+	void	   *ptr = MAP_FAILED;
+	int			mmap_errno = 0;
+
+	/*
+	 * use the original size, not the rounded up value, when falling back
+	 * to non-huge pages.
+	 */
+	allocsize = *size;
+	ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
+			   PG_MMAP_FLAGS, -1, 0);
+	mmap_errno = errno;
+
+	if (ptr ==
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Mon, Oct 20, 2014 at 9:41 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
On Sun, Oct 19, 2014 at 10:37 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
On Sun, Oct 19, 2014 at 1:02 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
On Fri, Oct 17, 2014 at 4:32 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
On Wed, Oct 15, 2014 at 11:41 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
On Mon, Oct 13, 2014 at 11:16 PM, Robert Haas robertmh...@gmail.com wrote:
On Sun, Oct 12, 2014 at 1:27 PM, Stephen Frost sfr...@snowman.net wrote:
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
Sawada Masahiko wrote:

The attached WIP patch adds new syntax REINDEX SCHEMA, which does reindexing of all tables in a specified schema. There is syntax for reindexing a specified index, per table and per database, but we cannot do reindexing per schema for now.

It seems doubtful that there really is much use for this feature, but if there is, I think a better syntax precedent is the new ALTER TABLE ALL IN TABLESPACE thingy, rather than your proposed REINDEX SCHEMA. Something like REINDEX TABLE ALL IN SCHEMA perhaps.

Yeah, I tend to agree that we should be looking at the 'ALL IN TABLESPACE' and 'ALL IN SCHEMA' type of commands to keep things consistent. This might be an alternative for the vacuum / analyze / reindex database commands also..

Urgh. I don't have a problem with that syntax in general, but it clashes pretty awfully with what we're already doing for REINDEX otherwise.

Attached patches are the latest version. Ok. I changed the syntax to REINDEX ALL IN SCHEMA, but I felt a little discomfort, as Robert mentioned.

I understood, but the real problem will come in the near future when the features are pushed... :-) They are separate features, but maybe we can join these features in one future commit... it's just an idea. Anyway, you can apply these patches in numerical order, and use the REINDEX ALL IN SCHEMA feature and the -S/--schema option in reindexdb.
000_reindex_all_in_schema_v2.patch: contains the REINDEX ALL IN SCHEMA feature

1) Compiles without warnings
2) IMHO you can add more test cases for better code coverage:
   * reindex a schema that doesn't exist
   * try to run REINDEX ALL IN SCHEMA inside a transaction block
3) Isn't this enough?
   bool do_database = (kind == OBJECT_DATABASE);
   ... instead of ...
   + bool do_database = (kind == OBJECT_DATABASE) ? true : false;
4) IMHO you can add another Assert to check valid kinds, like:
   Assert(kind == OBJECT_DATABASE || kind == OBJECT_SCHEMA);
5) I think this is more legible:
   /* Get OID of object for result */
   if (do_database)
       objectOid = MyDatabaseId;
   else
       objectOid = get_namespace_oid(objectName, false);
   ... instead of ...
   + /* Get OID of object for result */
   + objectOid = (do_database) ? MyDatabaseId : get_namespace_oid(objectName, false);

001_Add_schema_option_to_reindexdb_v1.patch: contains the reindexdb -S/--schema support
The code itself is good for me, but IMHO you can add test cases to src/bin/scripts/t/090_reindexdb.pl

Thank you for reviewing.

You're welcome! I agree with 2) - 5). :-)

The attached patch is the latest version, modified as above.

All is fine for me now... everything works as expected with no compiler warnings. There is just one little fix to do in src/bin/scripts/t/090_reindexdb.pl:

-use Test::More tests => 7;
+use Test::More tests => 8;

Because you added a new test case to the test suite, you need to increase the test count at the beginning of the file. Patch attached. Now the regression tests run without errors.

Thank you for reviewing and revising! I also ran it successfully. It looks good to me :)

Regards,

---
Sawada Masahiko
Re: [HACKERS] Proposal : REINDEX SCHEMA
On Mon, Oct 20, 2014 at 11:24 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
[...]
Thank you for reviewing and revising!

You're welcome!

I also ran it successfully. It looks good to me :)

Changed status to Ready for Committer.
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Typos in comments
On Mon, Oct 20, 2014 at 3:04 AM, David Rowley dgrowle...@gmail.com wrote:
On Mon, Oct 20, 2014 at 6:54 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
I ran into a typo, 'a a', in a comment in snapmgr.c, and found by grep that there are four other places with the same typo. And in one of those places, I found yet another typo. Please find attached a patch.

Just while on this topic, I had a quick look at the results from the regex \b(\w+)\s+\1\b which finds duplicate words. After sifting through the false positives I found a few more. Patch attached.

Thanks. I have committed both patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Would you help to review our modifications
Dear All,

I've got a question for you. If I try to add an implicit cast from integer to boolean, PostgreSQL reports that the cast already exists:

1. CREATE CAST (integer AS bool) WITH INOUT AS IMPLICIT;
ERROR: cast from type integer to type boolean already exists

If I try to drop the cast, I am told that the cast from integer to boolean may not be dropped:

2. DROP CAST (integer AS bool);
ERROR: cannot drop cast from integer to boolean because it is required by the database system

Since the cast already exists, why does executing the following fail?

3. insert into testbool(p1,p2) values(20, 0::integer);
ERROR: column "p2" is of type boolean but expression is of type integer

So how should this kind of situation be handled if I want an implicit conversion?

Best Regards,

rohtodeveloper
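For what it's worth, the built-in integer-to-boolean cast is registered as explicit-only, which is why statement 3 fails while a spelled-out cast works. A possible workaround, assuming the testbool(p1, p2) table from the question:

```sql
-- The cast exists but is not applied implicitly in assignment contexts,
-- so spell it out:
INSERT INTO testbool (p1, p2) VALUES (20, 0::integer::boolean);
INSERT INTO testbool (p1, p2) VALUES (20, CAST(1 AS boolean));
```

Making the cast truly implicit would mean changing the castcontext of the built-in pg_cast entry, which is generally discouraged because it can silently change the resolution of existing queries.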
Re: [HACKERS] Wrong filename in comment
Marko Tiikkaja ma...@joh.to writes:
Commit 32984d8fc3dbb90a3fafb69fece0134f1ea790f9 forgot to change the filename in the comment in contrib/pgcrypto/pgcrypto--1.2.sql. Trivial patch attached.

Pushed, thanks.

regards, tom lane
Re: [HACKERS] narwhal and PGDLLIMPORT
Noah Misch n...@leadboat.com writes:

I reproduced narwhal's problem using its toolchain on another 32-bit Windows Server 2003 system. The crash happens at the SHGetFolderPath() call in pqGetHomeDirectory(). A program can acquire that function via shfolder.dll or via shell32.dll; we've used the former method since commit 889f038, for better compatibility[1] with Windows NT 4.0. On this system, shfolder.dll's version loads and unloads shell32.dll. In PostgreSQL built using this older compiler, shfolder.dll:SHGetFolderPath() unloads libpq in addition to unloading shell32! That started with commit 846e91e.

Thanks for doing the detective work on this!

I don't expect to understand the mechanism behind it, but I recommend we switch back to linking libpq with shell32.dll. The MSVC build already does that in all supported branches, and it feels right for the MinGW build to follow suit in 9.4+. Windows versions that lack the symbol in shell32.dll are now ancient history.

This is basically reverting 889f038, right? It looks to me like we made that change only to support NT4, which was obsolete even in 2005, so no objection from me. Magnus might have a different idea though.

I happened to try the same contrib/dblink test suite on PostgreSQL built with modern MinGW-w64 (i686-4.9.1-release-win32-dwarf-rt_v3-rev1). That, too, gave a crash-like symptom starting with commit 846e91e.

Ick.

Passing -static-libgcc to the link restores the libgcc situation as it stood before commit 846e91e. The main beneficiary of shared libgcc is C++/Java exception handling, so PostgreSQL doesn't care. No doubt there's some deeper bug in libgcc or in PostgreSQL; loading a module that links with shared libgcc should not disrupt exit(). I'm content with this workaround.

Works for me too, until such time as somebody feels like digging deeper.
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Oct 20, 2014 at 11:06:54AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: I don't expect to understand the mechanism behind it, but I recommend we switch back to linking libpq with shell32.dll. The MSVC build already does that in all supported branches, and it feels right for the MinGW build to follow suit in 9.4+. Windows versions that lack the symbol in shell32.dll are now ancient history. This is basically reverting 889f038, right? Mostly so. Keeping the SHGetSpecialFolderPath() -> SHGetFolderPath() part, but reverting the shell32.dll -> shfolder.dll part. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user/role CURRENT_USER
Kyotaro, Food for thought. Couldn't you reduce the following block:

+ if (strcmp(stmt->role, current_user) == 0)
+ {
+     roleid = GetUserId();
+     tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
+     if (!HeapTupleIsValid(tuple))
+         ereport(ERROR,
+                 (errcode(ERRCODE_UNDEFINED_OBJECT),
+                  errmsg("roleid %d does not exist", roleid)));
+ }
+ else
+ {
+     tuple = SearchSysCache1(AUTHNAME, PointerGetDatum(stmt->role));
+     if (!HeapTupleIsValid(tuple))
+         ereport(ERROR,
+                 (errcode(ERRCODE_UNDEFINED_OBJECT),
+                  errmsg("role \"%s\" does not exist", stmt->role)));
+ }

To:

if (strcmp(stmt->role, current_user) == 0)
    roleid = GetUserId();
else
    roleid = get_role_oid(stmt->role, false);

tuple = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid));
if (!HeapTupleIsValid(tuple))
    ereport(ERROR,
            (errcode(ERRCODE_UNDEFINED_OBJECT),
             errmsg("roleid %d does not exist", roleid)));

I think this makes it a bit cleaner. It also reuses existing code, as 'get_role_oid()' already does a valid role name check and will raise the proper error. -Adam On Mon, Oct 20, 2014 at 3:40 AM, Rushabh Lathia rushabh.lat...@gmail.com wrote: I went through the patch and here is my review: .) The patch applied on the master branch with the patch -p1 command (git apply failed). .) regression make check runs fine. .) Testcase coverage is missing in the patch. .) Overall the coding/patch looks good. Few comments: 1) Any particular reason for not also making SESSION_USER/USER usable for this command? 2) I think RoleId_or_curruser can be used for the following rule: /* ALTER TABLE name OWNER TO RoleId */ | OWNER TO RoleId 3) In the documentation patch, CURRENT_USER was added but CURRENT_ROLE is missing. On Fri, Oct 10, 2014 at 1:57 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, on the way to considering ALTER ROLE SET ..., I found that ALTER ROLE/USER cannot take CURRENT_USER as the role name.
In addition to that, the documents for ALTER ROLE / ALTER USER are inconsistent with each other, even though ALTER USER is said to be an alias for ALTER ROLE. Plus, ALTER USER cannot take ALL as a user name although ALTER ROLE can. This patch does the following things:

- ALTER USER/ROLE now takes CURRENT_USER as a user name.
- Rewrite the synopsis of the documents for ALTER USER and ALTER ROLE so that they have exactly the same syntax.
- Modify the syntax of ALTER USER so as to be an alias of ALTER ROLE.
- Added CURRENT_USER/CURRENT_ROLE syntax to both.
- Added ALL syntax as a user name to ALTER USER.
- Added IN DATABASE syntax to ALTER USER.

x Integrating the ALTER ROLE/USER syntax could not be done because of a bison shift/reduce error.
x This patch contains no additional regression tests. Are they needed?

SESSION_USER/USER could also be made usable for this command, but this patch doesn't do so (yet). regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Rushabh Lathia -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby jim.na...@bluetreble.com wrote: Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on. My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages) then elog(WARNING) instead. Is there some specific failure you've run into where a page was stuck in a pinned state and never got vacuumed? I would like to see a more systematic way of going about this. What LSN or timestamp is associated with the oldest unvacuumed page? How many times have we tried to visit it? What do those numbers look like overall -- i.e. what's the median number of times it takes to vacuum a page and what does the distribution look like of the unvacuumed ages? With that data it should be possible to determine if the behaviour is actually working well and where to draw the line to determine outliers that might represent bugs. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
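The LOG/WARNING escalation Jim proposes is simple to state precisely. As a sketch (pure illustration: the function name is invented here, and only the 1%-of-visited-pages threshold comes from the proposal, nothing from an actual patch):

```python
def skip_log_level(skipped_pages, visited_pages):
    """Sketch of the proposed policy: elog(LOG) when any pages were
    skipped because the cleanup lock could not be taken, escalating
    to elog(WARNING) when more than 1% of the visited pages were skipped."""
    if skipped_pages == 0:
        return None  # nothing to report
    if skipped_pages > 0.01 * visited_pages:
        return "WARNING"
    return "LOG"
```

Note the denominator is pages actually visited by this vacuum, not relpages, exactly as proposed above.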
Re: [HACKERS] Trailing comma support in SELECT statements
On Oct 18, 2014, at 7:06 PM, Jim Nasby jim.na...@bluetreble.com wrote: Yes. The only case I can think of where we wouldn't want this is COPY. BTW, this should also apply to delimiters other than commas; for example, some geometry types use ; as a delimiter between points. I don’t think it should apply to the internals of types, necessarily. JSON, for example, always dies on a trailing comma, so should probably stay that way. Well, maybe allow it on JSONB input, but not JSON. Though we perhaps don’t want their behaviors to diverge. D
Re: [HACKERS] Would you help to review our modifications
rohtodeveloper wrote: So how to deal with this kind of situation if I want an implicit conversion? As of the out-of-support 8.3 release many of the implicit casts previously defined have been changed to explicit casts. It is a catalog change - obviously, since you can still define implicit casts - so if you absolutely must have the pre-existing cast be implicit you can modify the catalog directly. You may wish to describe why you think this is the solution you need - with implicit casting there are generally more downsides than upsides. Dave -- View this message in context: http://postgresql.1045698.n5.nabble.com/Would-you-help-to-review-our-modifications-tp5823666p5823679.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Trailing comma support in SELECT statements
On 10/20/2014 11:59 AM, David E. Wheeler wrote: On Oct 18, 2014, at 7:06 PM, Jim Nasby jim.na...@bluetreble.com wrote: Yes. The only case I can think of where we wouldn't want this is COPY. BTW, this should also apply to delimiters other than commas; for example, some geometry types use ; as a delimiter between points. I don’t think it should apply to the internals of types, necessarily. JSON, for example, always dies on an trailing comma, so should probably stay that way. Well, maybe allow it on JSONB input, but not JSON. Though we perhaps don’t want their behaviors to diverge. The JSON spec is quite clear on this. Leading and trailing commas are not allowed. I would fight tooth and nail not to allow it for json (and by implication jsonb, since they use literally the same parser - in fact we do that precisely so their input grammars can't diverge). cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
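For reference, any RFC-conformant JSON parser does reject trailing commas; Python's json module, used here purely as a convenient strict parser, illustrates the behavior the spec mandates:

```python
import json

def is_valid_json(text):
    """Return True if text parses as spec-conformant JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False

print(is_valid_json('[1, 2, 3]'))   # well-formed array
print(is_valid_json('[1, 2, 3,]'))  # trailing comma in array: rejected
print(is_valid_json('{"a": 1,}'))   # trailing comma in object: rejected
```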
Re: [HACKERS] Superuser connect during smart shutdown
On Sun, Oct 19, 2014 at 12:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: But TBH I suspect 95% of the problems here would vanish if smart shutdown weren't the default ... But for your repeated objections, we would have changed the default to fast years ago. AFAICT everyone else is in favor of that. I've certainly objected to it in the past, but I don't believe I was the only one objecting. What's your feeling now? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_basebackup fails with long tablespace paths
My Salesforce colleague Thomas Fanghaenel observed that the TAP tests for pg_basebackup fail when run in a sufficiently deeply-nested directory tree. The cause appears to be that we rely on standard tar format to represent the symlink for a tablespace, and POSIX tar format has a hard-wired restriction of 99 bytes in a symlink's expansion. What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. One thing we could possibly do without reinventing tar is to avoid using absolute path names if a PGDATA-relative one would do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
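The restriction is easy to demonstrate without PostgreSQL at all: a plain POSIX ustar header has a fixed-width linkname field (100 bytes in CPython's tarfile implementation; the 99 quoted above presumably leaves room for a terminating NUL). This sketch, with a hypothetical tablespace entry name, shows the encoder refusing an over-long symlink target:

```python
import tarfile

def symlink_fits_ustar(target):
    """Check whether a symlink target fits the fixed-width linkname
    field of a plain POSIX ustar header (no GNU/pax extensions)."""
    info = tarfile.TarInfo(name="pg_tblspc/16385")  # hypothetical entry name
    info.type = tarfile.SYMTYPE
    info.linkname = target
    try:
        info.tobuf(format=tarfile.USTAR_FORMAT)
        return True
    except ValueError:  # raised as "linkname is too long"
        return False
```

GNU and pax tar formats have no such limit, which is why switching formats (or, as suggested above, emitting PGDATA-relative targets) sidesteps the problem.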
[HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
Hi. One of our production issues is that the system generates lots of wal-files, lots being like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk usage; this shrinks the files to roughly 30-50% of their original size. That being said, along comes the backup, scheduled once a day, and tries to read off these wal-files, which to the backup looks like an awful lot of small files. Our backup utilizes a single thread to read off those files and levels off at reading 30-40MB/s from a 21-drive RAID50 of rotating drives, which is quite bad. That causes a daily incremental run to take on the order of 24h. Differential runs, picking up larger deltas, and full runs are even worse. One could of course question a lot of the other things here, but currently the only limiting factor is actually the backup being able to keep up transporting the wal-log away from the system, due to the small wal size. A short test like:

find . -type f -ctime -1 | tail -n 50 | xargs cat | pipebench > /dev/null

confirms the backup speed to be roughly the same as seen by the backup software. Another test from the same volume doing:

find . -type f -ctime -1 | tail -n 50 | xargs cat > largefile

and then, after waiting for the fs to no longer cache the file,

cat largefile | pipebench > /dev/null

confirms that the disk subsystem can do way better (150-200MB/s) on larger files. Thread here around the same topic: http://postgresql.1045698.n5.nabble.com/How-do-you-change-the-size-of-the-WAL-files-td3425516.html But not a warm welcoming workaround. Suggestions are welcome. An archive-command/restore-command pair that could combine/split wal-segments might be the easiest workaround, but how about crash-safeness? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
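On the crash-safeness question: the usual trick for a compressing or combining archive_command is to write to a temporary name, fsync, then rename into place, so a crash can never leave a truncated file under the final name. A hypothetical helper along those lines (the function and paths are illustrative, not from any PostgreSQL tool; an archive_command must only report success once the file is durably archived):

```python
import gzip
import os
import shutil

def archive_wal_segment(src_path, archive_dir):
    """Gzip one WAL segment into archive_dir crash-safely:
    write to a .tmp name, fsync the result, then atomically rename."""
    os.makedirs(archive_dir, exist_ok=True)
    final = os.path.join(archive_dir, os.path.basename(src_path) + ".gz")
    tmp = final + ".tmp"
    with open(src_path, "rb") as fin, gzip.open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    with open(tmp, "rb") as f:
        os.fsync(f.fileno())        # data durable before it gets the final name
    os.rename(tmp, final)           # atomic on POSIX filesystems
    return final
```

A crash mid-way leaves only a .tmp file, which a restartable archiver can simply discard and redo; fsyncing the archive directory as well would make the rename itself durable.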
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
jes...@krogh.cc writes: Thread here around the same topic: http://postgresql.1045698.n5.nabble.com/How-do-you-change-the-size-of-the-WAL-files-td3425516.html But not a warm welcoming workaround. configure --with-wal-segsize=something ? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Superuser connect during smart shutdown
Robert Haas robertmh...@gmail.com writes: On Sun, Oct 19, 2014 at 12:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: I've certainly objected to it in the past, but I don't believe I was the only one objecting. What's your feeling now? I'm prepared to yield on the point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On Mon, Oct 20, 2014 at 09:03:59PM +0200, jes...@krogh.cc wrote: Hi. One of our production issues is that the system generates lots of wal-files, lots is like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk-usage, this makes the files roughly 30-50% in size. ... Suggestions are welcome. An archive-command/restore command that could combine/split wal-segments might be the easiest workaround, but how about crash-safeness? Hi, Have you considered using something like tar/star in the archive command to pack them into much larger tar archives? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
jes...@krogh.cc writes: Thread here around the same topic: http://postgresql.1045698.n5.nabble.com/How-do-you-change-the-size-of-the-WAL-files-td3425516.html But not a warm welcoming workaround. configure --with-wal-segsize=something ? Yes, but there are good reasons not to go down that route: 1) It looks like I'm going to be the only one; beware of the dragons. 2) It apparently requires a re-initdb, thus a dump/restore of 4.5TB of data. The latter can absolutely be done by scheduling downtime, but the former would compromise any level of good practice around our DB-operations. Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
Hi, On 2014-10-20 21:03:59 +0200, jes...@krogh.cc wrote: One of our production issues is that the system generates lots of wal-files, lots is like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk-usage, this makes the files roughly 30-50% in size. Have you analyzed what the source of that volume is? Which version of postgres are you using? What's your checkpoint_timeout/segments settings? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
jes...@krogh.cc writes: configure --with-wal-segsize=something ? Yes, but there are good reasons not to go down that route. One is that: 1) It looks like I'am going to be the only one, beware of the dragons. 2) It requires apparently a re-initdb, thus dump/restore of 4.5TB of data I think a clean shutdown and pg_resetxlog would be sufficient. I agree you'd need to do some testing though ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On 2014-10-20 21:03:59 +0200, jes...@krogh.cc wrote: One of our production issues is that the system generates lots of wal-files, lots is like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk-usage, this makes the files roughly 30-50% in size. Have you analyzed what the source of that volume is? Which version of postgres are you using? What's your checkpoint_timeout/segments settings? Suggestions are surely welcome. I do suspect the majority is from 30 concurrent processes updating a 506GB GIN index, but it would be nice to confirm that. There is also a message-queue in the DB with a fairly high turnaround. Currently PG 9.2, moving to 9.3 hopefully before end-of-year; checkpoint_timeout = 30min, checkpoint_segments = 4096. According to the logs, checkpoints are roughly 15 minutes apart. Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On 2014-10-20 21:41:26 +0200, jes...@krogh.cc wrote: On 2014-10-20 21:03:59 +0200, jes...@krogh.cc wrote: One of our production issues is that the system generates lots of wal-files, lots is like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk-usage, this makes the files roughly 30-50% in size. I'm a bit doubtful that 16MB vs., say, 64MB files really changes anything substantial for you. If it indeed is a problem, it's simple enough to join the files temporarily? Have you analyzed what the source of that volume is? Which version of postgres are you using? What's your checkpoint_timeout/segments settings? Suggestions are surely welcome. Once you're on 9.3 I'd suggest using pg_xlogdump --stats on it. There's a backport of the facility for 9.3 (looking somewhat different than what is now in 9.5) at http://archives.postgresql.org/message-id/CABRT9RAzGowqLFcEE8aF6VdPoFEy%2BP9gmu7ktGRzw0dgRwVr9Q%40mail.gmail.com That'd tell you a fair bit more. It's noticeably harder to backport to 9.3. I do suspect the majority is from 30 concurrent processes updating an 506GB GIN index, but it would be nice to confirm that. There is also a message-queue in the DB with a fairly high turnaround. A 506GB GIN index? Uh, interesting :). What's it used for? Trigrams? I'd suspect that the message queue isn't the primary culprit, but it's hard to say for sure. Currently PG 9.2 moving to 9.3 hopefully before end-of-year, checkpoint_timeout = 30min, checkpoint_segments = 4096. Generally a high checkpoint_timeout can significantly reduce the WAL volume because of fewer full page writes. I've seen cases where spacing checkpoint further apart by a factor of two reduced the overall WAL volume by more than two. According to logs checkpoints are roughly 15 minutes apart. Can you show log_checkpoints output? 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal-size limited to 16MB - Performance issue for subsequent backup
On 2014-10-20 21:41:26 +0200, jes...@krogh.cc wrote: On 2014-10-20 21:03:59 +0200, jes...@krogh.cc wrote: One of our production issues is that the system generates lots of wal-files, lots is like 151952 files over the last 24h, which is about 2.4TB worth of WAL files. I wouldn't say that isn't an issue by itself, but the system does indeed work fine. We do subsequently gzip the files to limit actual disk-usage, this makes the files roughly 30-50% in size. I'm a bit doubtful that 16MB vs., say, 64MB files really changes anything substantial for you. If it indeed is a problem, it's simple enough to join the files temporarily? I am trying to get my head around a good way to do that. 64MB probably isn't a silver bullet, but it would definitely benefit the backup in terms of single-thread access to data on rotating drives. Have you analyzed what the source of that volume is? Which version of postgres are you using? What's your checkpoint_timeout/segments settings? Suggestions are surely welcome. Once you're on 9.3 I'd suggest using pg_xlogdump --stats on it. There's a backport of the facility for 9.3 (looking somewhat different than what is now in 9.5) at http://archives.postgresql.org/message-id/CABRT9RAzGowqLFcEE8aF6VdPoFEy%2BP9gmu7ktGRzw0dgRwVr9Q%40mail.gmail.com That'd tell you a fair bit more. It's noticeably harder to backport to 9.3. I'll bookmark that one. I do suspect the majority is from 30 concurrent processes updating a 506GB GIN index, but it would be nice to confirm that. There is also a message-queue in the DB with a fairly high turnaround. A 506GB GIN index? Uh, interesting :). What's it used for? Trigrams? It is for full-text search, but it is being updated entirely regularly, ~100M records. A dump/restore cycle typically reduces the size to 30-40% of its current size. Currently PG 9.2 moving to 9.3 hopefully before end-of-year, checkpoint_timeout = 30min, checkpoint_segments = 4096.
Generally a high checkpoint_timeout can significantly reduce the WAL volume because of fewer full-page writes. I've seen cases where spacing checkpoints further apart by a factor of two reduced the overall WAL volume by more than two. I'll work with that; I was just uncomfortable bumping checkpoint_segments up much higher. Any field experience in that corner? According to the logs, checkpoints are roughly 15 minutes apart. Can you show log_checkpoints output?

2014-10-20 18:10:22 CEST LOG: checkpoint starting: time
2014-10-20 18:15:44 CEST LOG: checkpoint complete: wrote 76851 buffers (7.3%); 0 transaction log file(s) added, 0 removed, 3238 recycled; write=295.834 s, sync=23.903 s, total=322.011 s; sync files=2115, longest=0.278 s, average=0.011 s
2014-10-20 18:40:22 CEST LOG: checkpoint starting: time
2014-10-20 18:44:30 CEST LOG: checkpoint complete: wrote 60550 buffers (5.8%); 0 transaction log file(s) added, 0 removed, 3460 recycled; write=224.678 s, sync=21.795 s, total=248.340 s; sync files=2090, longest=0.963 s, average=0.010 s
2014-10-20 19:10:22 CEST LOG: checkpoint starting: time
2014-10-20 19:14:11 CEST LOG: checkpoint complete: wrote 42720 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 3598 recycled; write=206.259 s, sync=21.185 s, total=229.254 s; sync files=2065, longest=0.945 s, average=0.010 s
2014-10-20 19:40:22 CEST LOG: checkpoint starting: time
2014-10-20 19:43:31 CEST LOG: checkpoint complete: wrote 32897 buffers (3.1%); 0 transaction log file(s) added, 0 removed, 3626 recycled; write=161.801 s, sync=26.936 s, total=189.635 s; sync files=2115, longest=0.458 s, average=0.012 s
2014-10-20 20:10:22 CEST LOG: checkpoint starting: time
2014-10-20 20:14:04 CEST LOG: checkpoint complete: wrote 37557 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 3285 recycled; write=205.011 s, sync=16.550 s, total=222.442 s; sync files=2113, longest=0.935 s, average=0.007 s
2014-10-20 20:40:22 CEST LOG: checkpoint starting: time
2014-10-20 20:45:18 CEST LOG: checkpoint complete: wrote 58012 buffers (5.5%); 0 transaction log file(s) added, 0 removed, 3678 recycled; write=252.750 s, sync=39.178 s, total=295.107 s; sync files=2075, longest=0.990 s, average=0.018 s
2014-10-20 21:10:22 CEST LOG: checkpoint starting: time
2014-10-20 21:13:31 CEST LOG: checkpoint complete: wrote 40530 buffers (3.9%); 0 transaction log file(s) added, 0 removed, 3652 recycled; write=167.925 s, sync=19.719 s, total=189.057 s; sync files=2077, longest=0.470 s, average=0.009 s
2014-10-20 21:40:22 CEST LOG: checkpoint starting: time
2014-10-20 21:44:20 CEST LOG: checkpoint complete: wrote 45158 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 3449 recycled; write=202.986 s, sync=32.564 s, total=237.441 s; sync files=2100, longest=0.445 s, average=0.015 s

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
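For what it's worth, the recycled counts in those checkpoint-complete lines already give a rough per-checkpoint WAL volume, since each recycled segment is one 16MB file reused for new WAL. A throwaway parser (the regex is shaped only for the log format shown above):

```python
import re

RECYCLED_RE = re.compile(r"(\d+) recycled")

def recycled_wal_mb(line, segment_mb=16):
    """Rough estimate of MB of WAL consumed between checkpoints,
    taken from the recycled-segment count of a 'checkpoint complete' line."""
    m = RECYCLED_RE.search(line)
    return int(m.group(1)) * segment_mb if m else None

line = ("checkpoint complete: wrote 76851 buffers (7.3%); 0 transaction "
        "log file(s) added, 0 removed, 3238 recycled; write=295.834 s")
print(recycled_wal_mb(line))  # 3238 * 16 = 51808 MB per ~30-minute cycle
```

Roughly 50GB per half hour is consistent with the ~2.4TB/day reported at the top of the thread.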
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 2014-10-19 20:43:29 -0500, Jim Nasby wrote: On 10/19/14, 11:41 AM, Andres Freund wrote: On 2014-10-18 21:36:48 -0500, Jim Nasby wrote: The weird part is that if it's not doing a freeze it will just punt on a page if it can't get the cleanup lock. I don't think that's particularly weird. Otherwise vacuum can get stuck behind a single very hot page - leading to much, much more bloat. I have to believe that could seriously screw up autovacuum scheduling. Why? I'm worried there could be some pathological cases where we'd skip a large number of pages, perhaps if a vacuum scan and a seqscan ended up running alongside each other. I've seen little evidence of that. The reverse, a stuck autovacuum, is imo much more likely. For this to be an actual problem you'd need to encounter many pages that are not locked, but are pinned. That state doesn't exist for very long. Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't have any logging for how many pages we skipped because we couldn't lock them. But so what? If we skip individual pages it won't be too bad - and very likely waiting very long is going to be more painful. The page won't be marked 'all visible' so the next vacuum will come around to it again. And it'll also get cleaned up by opportunistic hot pruning. Also, if this really is that big a deal for heap pages, how come we don't get screwed by it on Btree index pages, where we mandate that we acquire a cleanup lock? Because we never hold pins for btree pages for very long. Whereas we do that for heap pages. If you e.g. run a cursor forward you can hold a pin for essentially unbounded time. Now that we have forks, I'm wondering if it would be best to come up with a per-page system that could be used to determine when a table needs background work to be done. The visibility map could serve a lot of this purpose, but I'm not sure if it would work for getting hint bits set in the background. It would.
By definition, all tuples that are 'all visible' need to be fully hint bitted. I think it would also be a win if we had a way to advance relfrozenxid and relminmxid. Perhaps something that simply remembered the last XID that touched each page... Not sure what you're getting at here? That ultimately, our current method for determining when and what to vacuum is rather crude, and likely results in wasted effort during scans as well as not firing autovac often enough. Keep in mind that autovac started as a user-space utility and the best it could possibly do was to keep a table of stats counters. I agree that we should trigger autovacuum more often. It's *intentionally* not triggered *at all* for insert-only workloads (if you discount anti-wraparound vacuums). I think it's time to change that. For that we'd need to make vacuums that don't delete any tuples cheaper. We already rescan only the changed parts of the heaps - but we always scan indexes fully... The visibility map obviously helps cut down on extra work during a scan, but it only goes so far in that regard. Aha. Instead of relying on the crude methods, if we reliably tracked certain txids on a per-block basis in a fork, we could cheaply scan the fork and make an extremely informed decision on how much a vacuum would gain us, and exactly what blocks it should hit. Let me use freezing as an example. If we had a reliable list of the lowest txid for each block of a relation, that would allow us to do a freeze scan by hitting only blocks with a minimum txid within our freeze range. The same could be done for multixacts. It'd also become a prime contention point because you'd need to constantly update it. In contrast to a simple 'is frozen' bit (akin to is_visible) which only changes infrequently, and only in one direction. If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in each fork block.
So in a database with 680G of heap data, we could fully determine every *block* (not table) we needed to vacuum by scanning just 1GB of data. That would allow for far better autovacuum scheduling than what we do today. It's not that simple. Wraparounds and locking complicate it significantly. I think the big missing piece, unless something like Heikki's xid-lsn-ranges thing gets finished, is a freeze map. The problem with a simple freeze map is: when do you actually set the bit? There's precisely one place where you can set it for normal operation: during vacuum's scan. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
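Jim's sizing arithmetic checks out: with 4-byte transaction IDs, three txids per heap block is 12 bytes of fork data per 8kB heap block, so one fork block covers ~682 heap blocks, and the fork for 680GB of heap comes to roughly 1GB. A sketch of the arithmetic (constants assume the default 8kB BLCKSZ; the function is illustrative, not from any patch):

```python
BLCKSZ = 8192   # default PostgreSQL block size, in bytes
XID_SIZE = 4    # bytes per transaction ID

def fork_size_bytes(heap_bytes, txids_per_block=3):
    """Size of a hypothetical per-block txid fork for a heap of the
    given size, tracking txids_per_block XIDs per heap block."""
    heap_blocks = heap_bytes // BLCKSZ
    entries_per_fork_block = BLCKSZ // (txids_per_block * XID_SIZE)  # ~682
    fork_blocks = -(-heap_blocks // entries_per_fork_block)  # ceiling division
    return fork_blocks * BLCKSZ

gb = 1024 ** 3
print(fork_size_bytes(680 * gb) / gb)  # just under 1 GB of fork data for 680 GB of heap
```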
Re: [HACKERS] narwhal and PGDLLIMPORT
On 2014-10-20 01:03:31 -0400, Noah Misch wrote: On Wed, Oct 15, 2014 at 12:53:03AM -0400, Noah Misch wrote: On Tue, Oct 14, 2014 at 07:07:17PM -0400, Tom Lane wrote: Dave Page dp...@pgadmin.org writes: On Tue, Oct 14, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think we're hoping that somebody will step up and investigate how narwhal's problem might be fixed. I have planned to look at reproducing narwhal's problem once the dust settles on orangutan, but I wouldn't mind if narwhal went away instead. No argument here. I would kind of like to have more than zero understanding of *why* it's failing, just in case there's more to it than "oh, probably a bug in this old toolchain". But finding that out might well take significant time, and in the end not tell us anything very useful. Agreed on all those points. I reproduced narwhal's problem using its toolchain on another 32-bit Windows Server 2003 system. The crash happens at the SHGetFolderPath() call in pqGetHomeDirectory(). A program can acquire that function via shfolder.dll or via shell32.dll; we've used the former method since commit 889f038, for better compatibility[1] with Windows NT 4.0. On this system, shfolder.dll's version loads and unloads shell32.dll. In PostgreSQL built using this older compiler, shfolder.dll:SHGetFolderPath() unloads libpq in addition to unloading shell32! That started with commit 846e91e. I don't expect to understand the mechanism behind it, but I recommend we switch back to linking libpq with shell32.dll. The MSVC build already does that in all supported branches, and it feels right for the MinGW build to follow suit in 9.4+. Windows versions that lack the symbol in shell32.dll are now ancient history. Ick. Nice detective work on an ugly situation. I happened to try the same contrib/dblink test suite on PostgreSQL built with modern MinGW-w64 (i686-4.9.1-release-win32-dwarf-rt_v3-rev1). That, too, gave a crash-like symptom starting with commit 846e91e.
Specifically, a backend that LOADed any module linked to libpq (libpqwalreceiver, dblink, postgres_fdw) would suffer this after calling exit(0):

===
3056 2014-10-20 00:40:15.163 GMT LOG: disconnection: session time: 0:00:00.515 user=cyg_server database=template1 host=127.0.0.1 port=3936
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
9300 2014-10-20 00:40:15.163 GMT LOG: server process (PID 3056) exited with exit code 3
===

The mechanism turned out to be disjoint from the mechanism behind the ancient-compiler crash. Based on the functions called from exit(), my best guess is that exit() encountered recursion and used something like an abort() to escape. Hm. (I can send the gdb transcript if anyone is curious to see the gory details.) That would be interesting. The proximate cause was commit 846e91e allowing modules to use shared libgcc. A 32-bit libpq acquires 64-bit integer division from libgcc. Passing -static-libgcc to the link restores the libgcc situation as it stood before commit 846e91e. The main beneficiary of shared libgcc is C++/Java exception handling, so PostgreSQL doesn't care. No doubt there's some deeper bug in libgcc or in PostgreSQL; loading a module that links with shared libgcc should not disrupt exit(). I'm content with this workaround. I'm unconvinced by this reasoning. Popular postgres extensions like postgis do use C++. It's imo not hard to imagine situations where switching to a statically linked libgcc could cause problems.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Add regression tests for autocommit-off mode for psql and fix some omissions
Jim Nasby wrote: On 10/7/14, 2:11 AM, Feike Steenbergen wrote: On 7 October 2014 01:41, Jim Nasby <Jim.Nasby@> wrote: The options I see... 1) If there's a definitive way to tell from backend source code what commands disallow transactions then we can just use that information to generate the list of commands psql shouldn't do that with. 2) Always run the regression test with auto-commit turned off. 3) Run the regression in both modes (presumably only on the build farm due to how long it would take). 1) I don't know about a definitive way. I used grep to find all statements calling PreventTransactionChain. Perhaps it wouldn't be too horrific to create some perl code that would figure out what all of those commands are, and we could then use that to generate the appropriate list for psql. 2) - I expect most people use autocommit-on; so only running it in autocommit-off would not test the majority of users. - autocommit-off also obliges you to explicitly rollback transactions after errors occur; this would probably mean a rewrite of some tests? Well, that is at least doable, but probably rather ugly. It would probably be less ugly if our test framework had a way to test for errors (ala pgTap). Where I was going with this is a full-on brute-force test: execute every possible command with autocommit turned off. We don't need to check that each command does what it's supposed to do, only that it can execute. Of course, the huge problem with that is knowing how to actually successfully run each command. :( Theoretically the tests could be structured in such a way that there's a subset of tests that just see if the command even executes, but creating that is obviously a lot of work and with our current test framework probably a real pain to maintain.
From the comments here the effort needed to prevent this particular oversight seems excessive compared to the error it is trying to prevent - an error that is fairly easily remedied in a minor release and which has an easy work around. That said can we just do: 1) I don't know about a definitive way. I used grep to find all statements calling PreventTransactionChain. and save the results to an .out file with a comment somewhere that if there is any change to the content of this file the corresponding command should be manually tested in psql with autocommit=on. This seems to be what you are saying but the psql check does not have to be automated... David J.
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 2:59 PM, Tom Lane wrote: What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. Isn't it the backend that should error out before sending truncated file names? src/port/tar.c: /* Name 100 */ sprintf(&h[0], "%.99s", filename); And then do we need to prevent the creation of tablespaces that can't be backed up? One thing we could possibly do without reinventing tar is to avoid using absolute path names if a PGDATA-relative one would do. Maybe we could hack up the tar format to store the symlink target as the file body, like cpio does. Of course then we'd lose the property of this actually being tar.
[HACKERS] Patch: Add launchd Support
Hackers, In Mac OS X 10.10 “Yosemite,” Apple removed SystemStarter, upon which our OS X start script has relied since 2007. So here is a patch that adds support for its replacement, launchd. It includes 7 day log rotation like the old script did. The install script still prefers the SystemStarter approach for older versions of the OS, for the sake of easier backward compatibility. We could change that if we wanted, since launchd has been part of the OS for around a decade. launchd.patch Description: Binary data smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] pg_dump/pg_restore seem broken on hamerkop
Buildfarm member hamerkop has been failing in the pg_upgrade regression test for the last several days. The problem looks like this: command: C:/buildfarm/build_root/HEAD/pgsql.build/contrib/pg_upgrade/tmp_check/install/bin/pg_restore --port 50432 --username Administrator --exit-on-error --verbose --dbname postgres pg_upgrade_dump_12145.custom >> pg_upgrade_dump_12145.log 2>&1 pg_restore: connecting to database for restore pg_restore: [archiver (db)] Error while INITIALIZING: pg_restore: [archiver (db)] could not execute query: ERROR: invalid byte sequence for encoding UTF8: 0x93 I can't help noticing that this started immediately after commit 0eea804 pg_dump: Reduce use of global variables. No idea why the issue is only showing up on this one animal. I guess one of the possibilities is there's garbage in memory which is related to the restore process. If so, coverity may be able to find something. The last coverity analysis was Oct 12. The commit 0eea804 was made on Oct 14. So let's see what new coverity scan reports... Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Patch: Add launchd Support
David E. Wheeler da...@justatheory.com writes: In Mac OS X 10.10 “Yosemite,” Apple removed SystemStarter, upon which our OS X start script has relied since 2007. So here is a patch that adds support for its replacement, launchd. It includes 7 day log rotation like the old script did. The install script still prefers the SystemStarter approach for older versions of the OS, for the sake of easier backward compatibility. We could change that if we wanted, since launchd has been part of the OS for around a decade. (1) I'd vote for just removing the SystemStarter stuff: it complicates understanding what's happening, to no very good end. We can easily check that the launchd way works back to whatever we think our oldest supported OS X release is. (10.4.x according to the buildfarm, at least; and I think SystemStarter was deprecated even then ...) (2) AFAICS, this .plist file doesn't do anything about launchd's habit of not waiting for the network to come up. See my comments in today's thread in -general: http://www.postgresql.org/message-id/1239.1413823...@sss.pgh.pa.us (3) I don't think you want Disabled = true. (4) I'm suspicious of all the -c arguments in the .plist file. In general I'm not a fan of specifying GUCs on the postmaster command line; that makes it impossible to override their values via normal methods like postgresql.conf or ALTER SYSTEM. (5) According to the launchd.plist man page, there are options for redirecting stdout and stderr to someplace useful. It might be worth exercising those ... regards, tom lane
[HACKERS] Questions on domain on composite / casts ignoring domains
I'm trying to create what amounts to a new type. This would be rather easy if I could perform a CHECK on a composite type, which I could do if I could create a domain on top of a composite. Is there any reason in particular that hasn't been done? As an alternative, I tried accomplishing this with a straight domain. That would work, except for this: WARNING: cast will be ignored because the source data type is a domain Why do we ignore casts from domains to other data types? I'm guessing because it's simply not what domains were meant for? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Patch: Add launchd Support
On 10/20/14, 5:59 PM, David E. Wheeler wrote: In Mac OS X 10.10 “Yosemite,” Apple removed SystemStarter, upon which our OS X start script has relied since 2007. So here is a patch that adds support for its replacement, launchd. It includes 7 day log rotation like the old script did. The install script still prefers the SystemStarter approach for older versions of the OS, for the sake of easier backward compatibility. We could change that if we wanted, since launchd has been part of the OS for around a decade. You're enabling POSTGRESQL in /etc/hostconfig before any of the files are copied over... what happens if we puke before the files get copied? Would it be better to enable after the scripts are in place? BTW, Mavericks has a comment that /etc/hostconfig is going away, but google isn't telling me what's replacing it... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Add launchd Support
On Oct 20, 2014, at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: (1) I'd vote for just removing the SystemStarter stuff: it complicates understanding what's happening, to no very good end. We can easily check that the launchd way works back to whatever we think our oldest supported OS X release is. (10.4.x according to the buildfarm, at least; and I think SystemStarter was deprecated even then ...) Okay. Might have to use OnDemand instead of KeepAlive on 10.4. The former was deprecated in 10.5, but I’m not sure when the latter was added. (2) AFAICS, this .plist file doesn't do anything about launchd's habit of not waiting for the network to come up. See my comments in today's thread in -general: http://www.postgresql.org/message-id/1239.1413823...@sss.pgh.pa.us Ha! How funny you posted a call for a patch today. I didn’t see that, just needed to get it working today myself. Anyway, I knew there was a reason I didn’t bother with this years ago: launchd does not support dependencies. From the launchd.plist(5) man page: DEPENDENCIES Unlike many bootstrapping daemons, launchd has no explicit dependency model. Interdependencies are expected to be solved through the use of IPC. It is therefore in the best interest of a job developer who expects dependents to define all of the sockets in the configuration file. This has the added benefit of making it possible to start the job based on demand instead of immediately. launchd will continue to place as many restrictions on jobs that do not conform to this model as possible. This is another reason not to use KeepAlive, I guess. OnDemand is supposed to fire up a job only when it’s needed. No idea what that means. We might be able to put something in LaunchEvents that gets it to fire when the network launches, but documentation is hella thin (and may only be supported on Yosemite, where there are a bunch of poorly-documented launchd changes). (3) I don't think you want Disabled = true. It’s the default.
When you run `launchctl load -w` it overrides it to false in its database. I’m fine to have it be less opaque, though. (4) I'm suspicious of all the -c arguments in the .plist file. In general I'm not a fan of specifying GUCs on the postmaster command line; that makes it impossible to override their values via normal methods like postgresql.conf or ALTER SYSTEM. Yeah, I am okay with removing those; they weren’t in the SystemStarter script. That was the only way to replicate the log rotation stuff, but probably best not to do that in the start script, anyway. (5) According to the launchd.plist man page, there are options for redirecting stdout and stderr to someplace useful. It might be worth exercising those ... Suggestions? Best, David smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Patch: Add launchd Support
On Oct 20, 2014, at 4:58 PM, Jim Nasby jim.na...@bluetreble.com wrote: You're enabling POSTGRESQL in /etc/hostconfig before any of the files are copied over... what happens if we puke before the files get copied? Would it be better to enable after the scripts are in place? That code was there; I just indented it in an if/then block. BTW, Mavericks has a comment that /etc/hostconfig is going away, but google isn't telling me what's replacing it... launchd. Best, David smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Re: Add regression tests for autocommit-off mode for psql and fix some omissions
On 10/20/14, 3:49 PM, David G Johnston wrote: Well, that is at least doable, but probably rather ugly. It would probably be less ugly if our test framework had a way to test for errors (ala pgTap). Where I was going with this is a full-on brute-force test: execute every possible command with autocommit turned off. We don't need to check that each command does what it's supposed to do, only that it can execute. Of course, the huge problem with that is knowing how to actually successfully run each command.:( Theoretically the tests could be structured in such a way that there's a subset of tests that just see if the command even executes, but creating that is obviously a lot of work and with our current test framework probably a real pain to maintain. From the comments here the effort needed to prevent this particular oversight seems excessive compared to the error it is trying to prevent - an error that is fairly easily remedied in a minor release and which has an easy work around. That said can we just do: 1) I don't know about a definitive way. I used grep to find all statements calling PreventTransactionChain. and save the results to an .out file with a comment somewhere that if there is any change to the content of this file the corresponding command should be manually tested in psql with autocommit=on. This seems to be what you are saying but the psql check does not have to be automated... Are you thinking we'd commit the expected output of the perl script and have the regression suite call that script to verify it? That seems like a good way to fix this. The only better option I can come up with is if the perl script generated an actual test that we know would fail if a new command showed up. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] get_actual_variable_range vs idx_scan/idx_tup_fetch
On 10/18/14, 8:58 AM, Bruce Momjian wrote: On Fri, Oct 17, 2014 at 11:03:04PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Fri, Oct 17, 2014 at 06:15:37PM -0400, Tom Lane wrote: Those stats were perfectly valid: what the planner is looking for is accurate minimum and maximum values for the index's leading column, and that's what it got. You're correct that a narrower index could have given the same results with a smaller disk footprint, but the planner got the results it needed from the index you provided for it to work with. Uh, why is the optimizer looking at the index on a,b,c and not just the stats on column a, for example? I am missing something here. Because it needs up-to-date min/max values in order to avoid being seriously misled about selectivities of values near the endpoints. See commit 40608e7f949fb7e4025c0ddd5be01939adc79eec. Oh, I had forgotten we did that. It is confusing that there is no way via EXPLAIN to see the access, making the method of consulting pg_stat_* and using EXPLAIN unreliable. Should we document this somewhere? I think we should. The common (mis)conception is that pg_stats shows *user-driven* access, not access because of stuff the system is doing. This is actually a huge problem for anyone who's trying to figure out how useful indexes are; they see usage and think they have queries that are using the index when in reality they don't. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Patch: Add launchd Support
David E. Wheeler da...@justatheory.com writes: On Oct 20, 2014, at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: (1) I'd vote for just removing the SystemStarter stuff: it complicates understanding what's happening, to no very good end. We can easily check that the launchd way works back to whatever we think our oldest supported OS X release is. (10.4.x according to the buildfarm, at least; and I think SystemStarter was deprecated even then ...) Okay. Might have to use OnDemand instead of KeepAlive on 10.4. The former was deprecated in 10.5, but I’m not sure when the latter was added. [ looks ... ] Yeah, there's no mention of KeepAlive in 10.4's launchd.plist man page. It does have a convenient example saying that OnDemand = false does what we want: The following XML Property List simply keeps exampled running continuously:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.exampled</string>
    <key>ProgramArguments</key>
    <array>
        <string>exampled</string>
    </array>
    <key>OnDemand</key>
    <false/>
</dict>
</plist>

(5) According to the launchd.plist man page, there are options for redirecting stdout and stderr to someplace useful. It might be worth exercising those ... Suggestions? I'd just drop them into files in the data directory; we're still going to recommend that people use the logging_collector, so this is just a stopgap to collect startup errors. regards, tom lane
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 10/20/14, 10:29 AM, Greg Stark wrote: On Mon, Oct 20, 2014 at 2:57 AM, Jim Nasby jim.na...@bluetreble.com wrote: Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on. My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages) then elog(WARNING) instead. Is there some specific failure you've run into where a page was stuck in a pinned state and never got vacuumed? Not that I know of... but how would I actually know? Having that info available is the point of my proposal. :) I would like to see a more systematic way of going about this. What LSN or timestamp is associated with the oldest unvacuumed page? How many times have we tried to visit it? What do those numbers look like overall -- i.e. what's the median number of times it takes to vacuum a page and what does the distribution look like of the unvacuumed ages? With that data it should be possible to determine if the behaviour is actually working well and where to draw the line to determine outliers that might represent bugs. I agree we could use better data about/for vacuum (see http://www.postgresql.org/message-id/544468c1.6050...@bluetreble.com). In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 2014-10-20 19:18:31 -0500, Jim Nasby wrote: In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :) It does happen, and not infrequently. Just not enough pages to normally cause significant bloat. The most likely place where it happens is very small tables that all connections hit with a high frequency. Starting to issue high volume log spew for a nonexistent problem isn't helping. If you're super convinced this is urgent then add it as a *single* datapoint inside the existing messages. But I think there's loads of stuff in vacuum logging that are more important than this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch: Add launchd Support
On Oct 20, 2014, at 5:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: [ looks ... ] Yeah, there's no mention of KeepAlive in 10.4's launchd.plist man page. It does have a convenient example saying that OnDemand = false does what we want: Yeah, let’s see if we can cover both. I'd just drop them into files in the data directory; we're still going to recommend that people use the logging_collector, so this is just a stopgap to collect startup errors. How about this?

<plist version="1.0">
<dict>
    <key>Disabled</key>
    <false/>
    <key>Label</key>
    <string>org.postgresql.postgresql</string>
    <key>UserName</key>
    <string>postgres</string>
    <key>GroupName</key>
    <string>postgres</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/pgsql/bin/postgres</string>
        <string>-D</string>
        <string>/usr/local/pgsql/data</string>
    </array>
    <key>StandardOutPath</key>
    <string>/usr/local/pgsql/data/launchd.log</string>
    <key>StandardErrorPath</key>
    <string>/usr/local/pgsql/data/launchd.log</string>
    <key>OnDemand</key> <!-- OS X 10.4 -->
    <false/>
    <key>KeepAlive</key> <!-- OS X 10.5+ -->
    <true/>
</dict>
</plist>

No fix for the networking issue, of course. Best, David smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 10/20/14, 3:11 PM, Andres Freund wrote: On 2014-10-19 20:43:29 -0500, Jim Nasby wrote: On 10/19/14, 11:41 AM, Andres Freund wrote: On 2014-10-18 21:36:48 -0500, Jim Nasby wrote: The weird part is that if it's not doing a freeze it will just punt on a page if it can't get the cleanup lock. I don't think that's particularly wierd. Otherwise vacuum can get stuck behind a single very hot page - leading to much, much more bloat. I have to believe that could seriously screw up autovacuum scheduling. Why? I'm worried there could be some pathological cases where we'd skip a large number of pages, perhaps if a vacuum scan and a seqscan ended up running alongside each other. I've seen little evidence of that. The reverse, a stuck autovacuum, is imo much more likely. For this to be an actual problem you'd need to encounter many pages that are not locked, but are pinned. That state doesn't exist for very long. How would you actually get evidence of this... we don't log it. :) (See my proposal at http://www.postgresql.org/message-id/54446c10.2080...@bluetreble.com) Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't have any logging for how many pages we skipped because we couldn't lock them. But so what? If we skip individual pages it won't be too bad - and very likely waiting very long is going to be more painful. The page won't be marked 'all visible' so the next vacuum will come around to it again. And it'll also get cleaned up by opportunistic hot pruning. Probably true. Hopefully we can start logging it and then we'll know for sure. That ultimately, our current method for determining when and what to vacuum is rather crude, and likely results in wasted effort during scans as well as not firing autovac often enough. Keep in mind that autovac started as a user-space utility and the best it could possibly do was to keep a table of stats counters. I agree that we should trigger autovacuum more often. 
It's *intentionally* not triggered *at all* for insert only workloads (if you discount anti wraparound vacuums). I think it's time to change that. For that we'd need to make vacuums that don't delete any tuples cheaper. We already rescan only the changed parts of the heaps - but we always scan indexes fully... Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff, but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have* to do anything other than set hint bits and possibly freeze xmin. Instead of relying on the crude methods, if we reliably tracked certain txids on a per-block basis in a fork, we could cheaply scan the fork and make an extremely informed decision on how much a vacuum would gain us, and exactly what blocks it should hit. Let me use freezing as an example. If we had a reliable list of the lowest txid for each block of a relation that would allow us to do a freeze scan by hitting only blocks with minimum txid within our freeze range. The same could be done for multixacts. It'd also become a prime contention point because you'd need to constantly update it. In contrast to a simple 'is frozen' bit (akin to is_visible) which only changes infrequently, and only in one direction. Actually, the contention on freeze would very possibly be minimal, because it probably doesn't change very often. Even if it did, it's OK if the value isn't 100% accurate, so long as the recorded XID is guaranteed older than what's actually on the page. If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in each fork block. So in a database with 680G of heap data, we could fully determine every *block* (not table) we needed to vacuum by scanning just 1GB of data. That would allow for far better autovacuum scheduling than what we do today. It's not that simple. Wraparounds and locking complicate it significantly. 
I realize what I'm talking about isn't trivial (though, I'm confused by your comment about wraparound since presumably TransactionIdPrecedes() and its ilk solve that problem...) My ultimate point here is that we're using what are (today) very crude methods to control what gets vacuumed when, and I think that now that we have resource forks we could do *much* better without a tremendous amount of work. But to make a big advancement here we'll need to take a step back and rethink some things (like vacuum being the only way to handle these problems). Let me put some thought into this. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 10/20/2014 05:39 PM, Jim Nasby wrote: Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff, but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have* to do anything other than set hint bits and possibly freeze xmin. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 10/20/14, 7:31 PM, Andres Freund wrote: On 2014-10-20 19:18:31 -0500, Jim Nasby wrote: In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :) It does happen, and not infrequently. Just not enough pages to normally cause significant bloat. The most likely place where it happens is very small tables that all connections hit with a high frequency. Starting to issue high volume log spew for a nonexistent problem isn't helping. How'd you determine that? Is there some way to measure this? I'm not doubting you; I just don't want to work on a problem that's already solved. If you're super convinced this is urgent then add it as a *single* datapoint inside the existing messages. But I think there's loads of stuff in vacuum logging that are more important than this. See my original proposal; at its most intrusive this would issue one warning per (auto)vacuum if it was over a certain threshold. I would think that a DBA would really like to know when this happens, but if we think that's too much spew we can limit it to normal vacuum logging. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
On 2014-10-20 17:43:26 -0700, Josh Berkus wrote: On 10/20/2014 05:39 PM, Jim Nasby wrote: Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff, but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have* to do anything other than set hint bits and possibly freeze xmin. +1 A page read is a page read. What's the point of having another process do it? Vacuum doesn't dirty pages if they don't have to be dirtied. Especially stuff like freezing cannot really be dealt with outside of vacuum unless you make already complex stuff more complex for a marginal benefit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 2014-10-20 19:43:38 -0500, Jim Nasby wrote: On 10/20/14, 7:31 PM, Andres Freund wrote: On 2014-10-20 19:18:31 -0500, Jim Nasby wrote: In the meantime, I think it's worth adding this logging. If in fact this basically never happens (the current assumption), it doesn't hurt anything. If it turns out our assumption is wrong, then we'll actually be able to find that out. :) It does happen, and not infrequently. Just not enough pages to normally cause significant bloat. The most likely place where it happens is very small tables that all connections hit with a high frequency. Starting to issue high volume log spew for a nonexistent problem isn't helping. How'd you determine that? Is there some way to measure this? You'd see individual pages with too-old dead rows in them. If you're super convinced this is urgent then add it as a *single* datapoint inside the existing messages. But I think there's loads of stuff in vacuum logging that are more important than this. See my original proposal; at its most intrusive this would issue one warning per (auto)vacuum if it was over a certain threshold. Which would vastly increase the log output for setups with small tables and a nonzero log_autovacuum_min_duration. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch: Add launchd Support
On Oct 20, 2014, at 5:03 PM, David E. Wheeler da...@justatheory.com wrote: This is another reason not to use KeepAlive, I guess. OnDemand is supposed to fire up a job only when it’s needed. No idea what that means. I think the idea of OnDemand is for launchd items to act a bit like inetd does: launchd creates the listening socket (or mach port or file-change notification) on the port specified in the plist, and only starts the process when someone tries to connect to it. This might not be a terrible thing to support, but it would require more changes throughout postgres (postmaster would have to check in with launchd at start time to get the listen socket; ports and socket paths would no longer be specifiable in postgres’ config, etc) and I’m skeptical that it’d be worth the work without a more concrete motivation. Apple has published their changes to Postgres (since they ship it in recent versions of OSX) here, fwiw, including the launchd plist they use: http://www.opensource.apple.com/source/PostgreSQL/ One thing I noticed is that Apple also used the label “org.postgres.postgres” for their launchd job. I don’t know if that will collide in some way with a second job with the same label. Launchctl load/unload takes a pathname, not a job label, so I don’t think it’d be a problem unless you actually do want to run both copies of postgres at the same time. MacPorts also has a launchd job for their postgresql port, which invokes daemondo, which invokes a wrapper script, which invokes postgres. I’m not sure why they did it that way. 2) AFAICS, this .plist file doesn't do anything about launchd's habit of not waiting for the network to come up. 
Have you experimented with this setting?: <key>KeepAlive</key> <dict><key>NetworkState</key><true/></dict> The launchd.plist man page claims that if you set that key in the sub-dictionary: If true, the job will be kept alive as long as the network is up, where up is defined as at least one non-loopback interface being up and having IPv4 or IPv6 addresses assigned to them. If false, the job will be kept alive in the inverse condition. On the other hand, it’s not unreasonable to have postgres running on a machine with only a loopback interface, depending on the use. We might be able to put something in LaunchEvents that gets it to fire when the network launches, but documentation is hella thin (and may only be supported on Yosemite, where there are a bunch of poorly-documented launchd changes). If one were desperate enough... it’s possible to dig through the launchd sources to make up for the gaps in the documentation (also on opensource.apple.com; there used to be a community-ish site for it at macosforge.org as well). It’s rough going, though, IIRC. (3) I don't think you want Disabled = true. It’s the default. When you run `launchctl load -w` it overrides it to false in its database. I’m fine to have it be less opaque, though. Yes, AFAICT it’s conventional to specify Disabled=true in a launchd plist and use launchctl to enable the item. BTW, Mavericks has a comment that /etc/hostconfig is going away, but google isn't telling me what's replacing it… I think that’s been “going away” for a decade now.
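For reference, a minimal sketch of how that KeepAlive sub-dictionary sits inside a full launchd plist. The Label and surrounding keys here are illustrative, not taken from the proposed patch:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.postgresql.postgres</string>
    <!-- Keep the job alive only while at least one non-loopback
         interface is up, per the launchd.plist man page. -->
    <key>KeepAlive</key>
    <dict>
        <key>NetworkState</key>
        <true/>
    </dict>
</dict>
</plist>
```

As noted above, this would still not help a postgres instance that is intentionally listening only on loopback, since launchd's definition of "network up" excludes loopback interfaces.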
Re: [HACKERS] Trailing comma support in SELECT statements
On 10/20/14, 11:16 AM, Andrew Dunstan wrote: On 10/20/2014 11:59 AM, David E. Wheeler wrote: On Oct 18, 2014, at 7:06 PM, Jim Nasby jim.na...@bluetreble.com wrote: Yes. The only case I can think of where we wouldn't want this is COPY. BTW, this should also apply to delimiters other than commas; for example, some geometry types use ; as a delimiter between points. I don’t think it should apply to the internals of types, necessarily. JSON, for example, always dies on a trailing comma, so should probably stay that way. Well, maybe allow it on JSONB input, but not JSON. Though we perhaps don’t want their behaviors to diverge. The JSON spec is quite clear on this. Leading and trailing commas are not allowed. I would fight tooth and nail not to allow it for json (and by implication jsonb, since they use literally the same parser - in fact we do that precisely so their input grammars can't diverge). +1. Data types that implement specs should follow the spec. I was more concerned about things like polygon, but the real point (ha!) is that we need to think about the data types too. (I will say I don't think things that mandate an exact number of elements (like point, box, etc) should support extra delimiters). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Questions on domain on composite / casts ignoring domains
Jim Nasby-5 wrote I'm trying to create what amounts to a new type. This would be rather easy if I could perform a CHECK on a composite type, which I could do if I could create a domain on top of a composite. Is there any reason in particular that hasn't been done? As an alternative, I tried accomplishing this with a straight domain. That would work, except for this: WARNING: cast will be ignored because the source data type is a domain Why do we ignore casts from domains to other data types? I'm guessing because it's simply not what domains were meant for? A domain is a base type with a constraint. When you cast you already know the existing value is valid and the system simply uses the cast available for the base type instead. i.e., You cannot have a domain with a different cast rule than the base type over which it is defined. Likely the lack of capability is simply a matter of complexity in the face of somewhat uncommon usage and limited resources. David J.
Re: [HACKERS] Inconsistencies in documentation of row-level locking
On 10/10/14, 8:31 AM, Michael Paquier wrote: Hi all, Currently all the row-level lock modes are described in the page for SELECT query: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS However, after browsing the documentation, I noticed in the page describing all the explicit locks of the system that there is a portion dedicated to row-level locks, and this section is not mentioning at all FOR KEY SHARE and FOR NO KEY UPDATE. It seems that this is something rather misleading for the user: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS Attached is a patch that refactors the whole section and improves the documentation: - Addition of a table showing the conflicts between each lock - Moved description of each row-level lock mode to the explicit locking page - Addition of a link in SELECT portion to redirect the user to the explicit locking page Did this get committed? Should probably add it to the commitfest if not...
Re: [HACKERS] narwhal and PGDLLIMPORT
On Mon, Oct 20, 2014 at 10:24:47PM +0200, Andres Freund wrote: On 2014-10-20 01:03:31 -0400, Noah Misch wrote: On Wed, Oct 15, 2014 at 12:53:03AM -0400, Noah Misch wrote: I happened to try the same contrib/dblink test suite on PostgreSQL built with modern MinGW-w64 (i686-4.9.1-release-win32-dwarf-rt_v3-rev1). That, too, gave a crash-like symptom starting with commit 846e91e. Specifically, a backend that LOADed any module linked to libpq (libpqwalreceiver, dblink, postgres_fdw) would suffer this after calling exit(0): === 3056 2014-10-20 00:40:15.163 GMT LOG: disconnection: session time: 0:00:00.515 user=cyg_server database=template1 host=127.0.0.1 port=3936 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. 9300 2014-10-20 00:40:15.163 GMT LOG: server process (PID 3056) exited with exit code 3 === The mechanism turned out to be disjoint from the mechanism behind the ancient-compiler crash. Based on the functions called from exit(), my best guess is that exit() encountered recursion and used something like an abort() to escape. Hm. (I can send the gdb transcript if anyone is curious to see the gory details.) That would be interesting. Attached. (rep 100 s calls a macro equivalent to issuing s 100 times.) The proximate cause was commit 846e91e allowing modules to use shared libgcc. A 32-bit libpq acquires 64-bit integer division from libgcc. Passing -static-libgcc to the link restores the libgcc situation as it stood before commit 846e91e. The main beneficiary of shared libgcc is C++/Java exception handling, so PostgreSQL doesn't care. No doubt there's some deeper bug in libgcc or in PostgreSQL; loading a module that links with shared libgcc should not disrupt exit(). I'm content with this workaround. 
I'm unconvinced by this reasoning. Popular postgres extensions like postgis do use C++. It's imo not hard to imagine situations where switching to a statically linked libgcc could cause problems. True; I was wrong to say that PostgreSQL doesn't care. MinGW builds of every released PostgreSQL version use static libgcc. That changed as an unintended consequence of a patch designed to remove our reliance on dlltool. Given the lack of complaints about our historic use of static libgcc, I think it's fair to revert to 9.3's use thereof. Supporting shared libgcc would be better still, should someone make that effort. Breakpoint 1, 0x77bcaf46 in msvcrt!exit () from C:\WINDOWS\system32\msvcrt.dll #0 0x77bcaf46 in msvcrt!exit () from C:\WINDOWS\system32\msvcrt.dll #1 0x0065637b in proc_exit (code=code@entry=0) at ipc.c:143 #2 0x006798bf in PostgresMain (argc=1, argv=argv@entry=0x225830, dbname=0x224498 template1, username=0x224460 cyg_server) at postgres.c:4232 #3 0x0061b26e in BackendRun (port=0x205f520) at postmaster.c:4113 #4 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x222e88) at postmaster.c:4617 #5 0x007af2a1 in main (argc=3, argv=0x222e88) at main.c:196 (gdb) s Single stepping until exit from function msvcrt!exit, which has no line number information. 0x77bcae7a in msvcrt!_initterm () from C:\WINDOWS\system32\msvcrt.dll (gdb) Single stepping until exit from function msvcrt!_initterm, which has no line number information. 0x77bc84c4 in strerror () from C:\WINDOWS\system32\msvcrt.dll (gdb) Single stepping until exit from function strerror, which has no line number information. 0x77bcae86 in msvcrt!_initterm () from C:\WINDOWS\system32\msvcrt.dll (gdb) Single stepping until exit from function msvcrt!_initterm, which has no line number information. atexit_callback () at ipc.c:283 warning: Source file is more recent than executable. 
283 proc_exit_prepare(-1); (gdb) proc_exit_prepare (code=-1) at ipc.c:153 153 { (gdb) fin Run till exit from #0 proc_exit_prepare (code=-1) at ipc.c:153 0x77bcaed6 in msvcrt!_initterm () from C:\WINDOWS\system32\msvcrt.dll (gdb) s Single stepping until exit from function msvcrt!_initterm, which has no line number information. Breakpoint 1, 0x77bcaf61 in msvcrt!_exit () from C:\WINDOWS\system32\msvcrt.dll (gdb) rep 100 s Single stepping until exit from function msvcrt!_exit, which has no line number information. 0x77bcae7a in msvcrt!_initterm () from C:\WINDOWS\system32\msvcrt.dll Single stepping until exit from function msvcrt!_initterm, which has no line number information. 0x77bc84c4 in strerror () from C:\WINDOWS\system32\msvcrt.dll Single stepping until exit from function strerror, which has no line number information. 0x77bcae86 in msvcrt!_initterm () from C:\WINDOWS\system32\msvcrt.dll Single stepping until exit from function msvcrt!_initterm, which has no line
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Oct 21, 2014 at 12:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: My Salesforce colleague Thomas Fanghaenel observed that the TAP tests for pg_basebackup fail when run in a sufficiently deeply-nested directory tree. The cause appears to be that we rely on standard tar format to represent the symlink for a tablespace, and POSIX tar format has a hard-wired restriction of 99 bytes in a symlink's expansion. What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. One way to cope with such a situation could be that during backup we create a backup file which contains a listing of the symlinks, and then have archive recovery recreate them. Basically this is the solution (patch) I have proposed for Windows [1]. [1] - https://commitfest.postgresql.org/action/patch_view?id=1512 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver
Julien, The following is an initial review: * Applies cleanly to master (f330a6d). * Regression tests updated and pass, including 'check-world'. * Documentation updated and builds successfully. * Might want to consider replacing the following magic number with a constant or perhaps a calculated value. + int basenamelen = (int) strlen(rlde->d_name) - 6; * Wouldn't it be easier, or perhaps more reliable, to use strrchr() with the following instead? + strcmp(rlde->d_name + basenamelen, ".ready") == 0) char *extension = strrchr(rlde->d_name, '.'); ... strcmp(extension, ".ready") == 0) I think this approach might also help to resolve the magic number above. For example: char *extension = strrchr(rlde->d_name, '.'); int basenamelen = (int) strlen(rlde->d_name) - strlen(extension); -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com