Re: [HACKERS] REVIEW Single pass vacuum - take 2
On Wed, Sep 7, 2011 at 8:28 AM, Andy Colson a...@squeakycode.net wrote:
> On 08/22/2011 01:22 AM, Pavan Deolasee wrote:
> Hi Pavan, I tried to apply your patch to git master (as of just now) and it failed. I assume that's what I should be checking out, right?

Yeah, seems like it bit-rotted. Please try the attached patch. I also fixed a typo and added some more comments, as per a suggestion by Jim.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index fa50655..2c1ab2c 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -150,6 +150,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 		 * many other ways, but at least we won't crash.
 		 */
 		if (ItemIdHasStorage(id) &&
+			!ItemIdIsDead(id) &&
 			lp_len >= sizeof(HeapTupleHeader) &&
 			lp_offset == MAXALIGN(lp_offset) &&
 			lp_offset + lp_len <= raw_page_size)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 06db65d..cf65c05 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3984,7 +3984,8 @@ log_heap_clean(Relation reln, Buffer buffer,
 			   OffsetNumber *redirected, int nredirected,
 			   OffsetNumber *nowdead, int ndead,
 			   OffsetNumber *nowunused, int nunused,
-			   TransactionId latestRemovedXid)
+			   TransactionId latestRemovedXid,
+			   uint32 vacgen)
 {
 	xl_heap_clean xlrec;
 	uint8		info;
@@ -3999,6 +4000,7 @@ log_heap_clean(Relation reln, Buffer buffer,
 	xlrec.latestRemovedXid = latestRemovedXid;
 	xlrec.nredirected = nredirected;
 	xlrec.ndead = ndead;
+	xlrec.vacgen = vacgen;
 
 	rdata[0].data = (char *) &xlrec;
 	rdata[0].len = SizeOfHeapClean;
@@ -4300,6 +4302,7 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
 	int			ndead;
 	int			nunused;
 	Size		freespace;
+	uint32		vacgen;
 
 	/*
 	 * We're about to remove tuples. In Hot Standby mode, ensure that there's
@@ -4332,6 +4335,7 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
 	nredirected = xlrec->nredirected;
 	ndead = xlrec->ndead;
+	vacgen = xlrec->vacgen;
 	end = (OffsetNumber *) ((char *) xlrec + record->xl_len);
 	redirected = (OffsetNumber *) ((char *) xlrec + SizeOfHeapClean);
 	nowdead = redirected + (nredirected * 2);
@@ -4343,7 +4347,8 @@ heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
 	heap_page_prune_execute(buffer,
 							redirected, nredirected,
 							nowdead, ndead,
-							nowunused, nunused);
+							nowunused, nunused,
+							vacgen);
 
 	freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 61f2ce4..ee64758 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -29,9 +29,12 @@ typedef struct
 	TransactionId new_prune_xid;	/* new prune hint value for page */
 	TransactionId latestRemovedXid;		/* latest xid to be removed by this
 										 * prune */
+	int			already_dead;	/* number of already dead line pointers */
+
 	int			nredirected;	/* numbers of entries in arrays below */
 	int			ndead;
 	int			nunused;
+
 	/* arrays that accumulate indexes of items to be changed */
 	OffsetNumber redirected[MaxHeapTuplesPerPage * 2];
 	OffsetNumber nowdead[MaxHeapTuplesPerPage];
@@ -123,8 +126,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
 			TransactionId ignore = InvalidTransactionId;	/* return value not
 															 * needed */
 
-			/* OK to prune */
-			(void) heap_page_prune(relation, buffer, OldestXmin, true, &ignore);
+			/* OK to prune - pass invalid vacuum generation number */
+			(void) heap_page_prune(relation, buffer, OldestXmin, true, &ignore, 0);
 		}
 
 		/* And release buffer lock */
@@ -151,13 +154,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer, TransactionId OldestXmin)
  */
 int
 heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
-				bool report_stats, TransactionId *latestRemovedXid)
+				bool report_stats, TransactionId *latestRemovedXid,
+				uint32 current_vacgen)
 {
 	int			ndeleted = 0;
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber offnum,
 				maxoff;
 	PruneState	prstate;
+	uint32		last_finished_vacgen = RelationGetLastVacGen(relation);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -173,6 +178,7 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 	prstate.new_prune_xid = InvalidTransactionId;
 	prstate.latestRemovedXid = InvalidTransactionId;
 	prstate.nredirected = prstate.ndead = prstate.nunused = 0;
+	prstate.already_dead = 0;
 	memset(prstate.marked, 0, sizeof(prstate.marked));
 
 	/* Scan the page */
@@ -189,8 +195,26 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
 		/* Nothing to do if slot is empty or already dead */
 		itemid = PageGetItemId(page, offnum);
-		if
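For readers following the design, the core of the vacuum generation idea can be illustrated in isolation. The sketch below is not from the patch: the function name and layout are invented, and the actual bookkeeping lives in heap_page_prune() and the new RelationGetLastVacGen() shown above. It only captures the comparison a single-pass vacuum would make: a dead line pointer stamped with some generation becomes reclaimable once a vacuum of that generation (or later) has finished its index pass.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only (names invented, not PostgreSQL code).
 *
 * item_vacgen: generation of the vacuum that marked the line pointer dead;
 *              0 means "not marked by any single-pass vacuum yet".
 * last_finished_vacgen: latest vacuum generation whose index pass completed
 *              for this relation.
 */
static int
dead_item_reclaimable(uint32_t item_vacgen, uint32_t last_finished_vacgen)
{
	if (item_vacgen == 0)
		return 0;				/* never stamped: index entries may remain */
	/* safe to reuse once the stamping vacuum's index pass has finished */
	return item_vacgen <= last_finished_vacgen;
}
```

This is why heap_page_prune() above fetches last_finished_vacgen up front: pruning can then distinguish "dead, but indexes not yet cleaned" from "dead and reclaimable" without a second heap pass.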
Re: [HACKERS] cheaper snapshots redux
I wanted to clarify my understanding and have some doubts.

> What I'm thinking about instead is using a ring buffer with three pointers: a start pointer, a stop pointer, and a write pointer. When a transaction ends, we advance the write pointer, write the XIDs or a whole new snapshot into the buffer, and then advance the stop pointer. If we wrote a whole new snapshot, we advance the start pointer to the beginning of the data we just wrote. Someone who wants to take a snapshot must read the data between the start and stop pointers, and must then check that the write pointer hasn't advanced so far in the meantime that the data they read might have been overwritten before they finished reading it. Obviously, that's a little risky, since we'll have to do the whole thing over if a wraparound occurs, but if the ring buffer is large enough it shouldn't happen very often.

Clarification --
1. With the above, you want to reduce or remove the concurrency issue between GetSnapshotData() [used at the beginning of SQL command execution] and ProcArrayEndTransaction() [used at end of transaction]. The concurrency issue is mainly ProcArrayLock, which is taken by GetSnapshotData() in shared mode and by ProcArrayEndTransaction() in exclusive mode. There may be other instances of a similar thing, but this is the main one you want to resolve.
2. You want to resolve it by using a ring buffer such that readers don't need to take any lock.

Is my above understanding correct?

Doubts --
1. Two writers: won't two different sessions that try to commit at the same time get the same write pointer? I assume it will be protected, as indicated in one of your replies, as I understood it.
2. One reader, one writer: it might be the case that somebody has written a new snapshot and advanced the stop pointer, and at that point in time a reader came and read between the start pointer and stop pointer.
Now the reader will see the following: snapshot, a few XIDs, snapshot. So will it handle this situation such that it only reads the latest snapshot?
3. How will you detect an overwrite?
4. Won't it have an effect if we don't update xmin every time and just note the committed XIDs? The reason I am asking is that xmin is used in the tuple visibility check, so with the new idea, in some cases instead of just returning at the beginning by checking xmin, it has to go through the committed XID list. I understand that there may be few such cases, or that the improvement from your idea can supersede this minimal effect. However, some cases can be defeated.

-- 
With Regards,
Amit Kapila.

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
Sent: Sunday, August 28, 2011 7:17 AM
To: Gokulakannan Somasundaram
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] cheaper snapshots redux

On Sat, Aug 27, 2011 at 1:38 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> First, I respectfully disagree with you on the point of 80MB. I would say that it's very rare that a small system (with 1 GB RAM) might have a long-running transaction sitting idle while 10 million transactions are sitting idle. Should an optimization be left out, for the sake of a very small system, to achieve high enterprise workloads?

With the design where you track commit-visibility sequence numbers instead of snapshots, you wouldn't need 10 million transactions that were all still running.
You would just need a snapshot that had been sitting around while 10 million transactions completed in the meantime.

That having been said, I don't necessarily think that design is doomed. I just think it's going to be trickier to get working than the design I'm now hacking on, and a bigger change from what we do now. If this doesn't pan out, I might try that one, or something else.

> Second, if we make use of memory-mapped files, why should we think that all 80MB of data will always reside in memory? Won't it get paged out by the operating system when it is in need of memory? Or do you have some specific OS in mind?

No, I don't think it will all be in memory - but that's part of the performance calculation. If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's going to be many orders of magnitude slower than anything we do with a snapshot now. Now, if you gain enough elsewhere, it could still
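The ring-buffer scheme and the overwrite check asked about in doubt 3 can be sketched concretely. Everything below is invented for illustration and deliberately simplified: it is a single-threaded toy with no atomics or memory barriers, which a real implementation would certainly need. The pointers are monotonically increasing 64-bit counters taken modulo the ring size, so a reader detects an overwrite by checking whether the write pointer has advanced more than one ring's worth past the start position it read from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64			/* slots; power of two keeps wrap math cheap */

typedef struct SnapRing
{
	uint64_t	start;			/* first slot of the newest full snapshot */
	uint64_t	stop;			/* end of valid data (exclusive) */
	uint64_t	write;			/* next slot a writer will claim */
	uint32_t	slots[RING_SIZE];	/* XIDs / snapshot payload */
} SnapRing;

/* Writer: claim space, copy the XIDs, then publish by advancing stop. */
static void
snap_write(SnapRing *ring, const uint32_t *xids, int n, int whole_snapshot)
{
	uint64_t	base = ring->write;
	int			i;

	ring->write += n;			/* claim space first (atomic in real life) */
	for (i = 0; i < n; i++)
		ring->slots[(base + i) % RING_SIZE] = xids[i];
	ring->stop = base + n;		/* publish the new data */
	if (whole_snapshot)
		ring->start = base;		/* older entries are no longer needed */
}

/*
 * Reader: copy [start, stop), then re-check the write pointer.  If the
 * writer has lapped our start position, our copy may contain overwritten
 * slots, so retry from scratch.
 */
static int
snap_read(SnapRing *ring, uint32_t *out, int *nout)
{
	for (;;)
	{
		uint64_t	start = ring->start;
		uint64_t	stop = ring->stop;
		int			n = (int) (stop - start);
		int			i;

		for (i = 0; i < n; i++)
			out[i] = ring->slots[(start + i) % RING_SIZE];

		/* overwrite check: writer must not have reused our slots */
		if (ring->write - start <= RING_SIZE)
		{
			*nout = n;
			return 0;
		}
		/* else: wraparound happened mid-read; loop and retry */
	}
}
```

The important property is that the check happens after the copy: a reader never blocks a writer, it just occasionally repeats its work, which is exactly the trade-off described in the quoted text.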
Re: [HACKERS] [GENERAL] pg_upgrade problem
On Tue, Sep 06, 2011 at 09:21:02PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
>> hubert depesz lubaczewski dep...@depesz.com writes:
>>> Worked a bit to get the ltree problem down to the smallest possible, repeatable situation.
>>
>> I looked at this again and verified that indeed, commit 8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible change into the on-disk format of ltree columns: it widened ltree_level.len, which is one component of an ltree on disk. So the crash is hardly surprising. I think that the only thing pg_upgrade could do about it is refuse to upgrade when ltree columns are present in an 8.3 database. I'm not sure though how you'd identify contrib/ltree versus some random user-defined type named ltree.
>
> It is actually easy to do using the attached patch. I check for the functions that support the data type and check if they are from an 'ltree' shared object. I don't check actual user table type names in this case.

While it will prevent failures in the future, it doesn't solve my problem now :( Will try to do it via:
- drop indexes on ltree
- convert ltree to text
- upgrade
- convert text to ltree
- create indexes on ltree

Best regards,

depesz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Tue, Aug 23, 2011 at 12:15:23PM -0400, Robert Haas wrote:
> On Mon, Aug 22, 2011 at 3:31 AM, daveg da...@sonic.net wrote:
>> So far I've got:
>> - affects system tables
>> - happens very soon after process startup
>> - in 8.4.7 and 9.0.4
>> - not likely to be hardware or OS related
>> - happens in clusters for periods of a few seconds to many minutes
>>
>> I'll work on printing the LOCK and LOCALLOCK when it happens, but it's hard to get downtime to pick up new builds. Any other ideas on getting to the bottom of this?
>
> I've been thinking this one over, and doing a little testing. I'm still stumped, but I have a few thoughts. What that error message is really saying is that the LOCALLOCK bookkeeping doesn't match the PROCLOCK bookkeeping; it doesn't tell us which one is to blame. ...
>
> My second thought is that perhaps a process is occasionally managing to exit without fully cleaning up the associated PROCLOCK entry. At first glance, it appears that this would explain the observed symptoms. A new backend gets the PGPROC belonging to the guy who didn't clean up after himself, hits the error, and disconnects, sticking himself right back on to the head of the SHM_QUEUE where the next connection will inherit the same PGPROC and hit the same problem. But it's not clear to me what could cause the system to get into this state in the first place, or how it would eventually right itself.
>
> It might be worth kludging up your system to add a test to InitProcess() to verify that all of the myProcLocks SHM_QUEUEs are either NULL or empty, along the lines of the attached patch (which assumes that assertions are enabled; otherwise, put in an elog() of some sort). Actually, I wonder if we shouldn't move all the SHMQueueInit() calls for myProcLocks to InitProcGlobal() rather than doing it over again every time someone calls InitProcess(). Besides being a waste of cycles, it's probably less robust this way.
> If there somehow are leftovers in one of those queues, the next successful call to LockReleaseAll() ought to clean up the mess, but of course there's no chance of that working if we've nuked the queue pointers.

I did this in the elog flavor as we don't build production images with asserts. It has been running on all hosts for a few days. Today it hit the extra checks in initproc:

00:02:32.782 8626 [unknown] [unknown] LOG: connection received: host=bk0 port=42700
00:02:32.783 8627 [unknown] [unknown] LOG: connection received: host=op2 port=45876
00:02:32.783 8627 d61 apps FATAL: Initprocess myProclocks[4] not empty: queue 0x2ae6b4b895f8 (prev 0x2ae6b4a2b558, next 0x2ae6b4a2b558)
00:02:32.783 8626 d35 postgres LOG: connection authorized: user=postgres database=c35
00:02:32.783 21535 LOG: server process (PID 8627) exited with exit code 1
00:02:32.783 21535 LOG: terminating any other active server processes
00:02:32.783 8626 c35 postgres WARNING: terminating connection because of crash of another server process

The patch that produced this is attached. If you can think of anything I can add to this to help, I'd be happy to do so. Also, can I clean this up and continue somehow? Maybe clear the queue instead of having to have a restart? Or is there a way to just pause this proc here, maybe mark it not to be used and exit, or just sleep forever so I can debug later?

Thanks

-dg

-- 
David Gould da...@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

--- postgresql-9.0.4/src/backend/storage/lmgr/proc.c	2011-04-14 20:15:53.000000000 -0700
+++ postgresql-9.0.4.dg/src/backend/storage/lmgr/proc.c	2011-08-23 17:30:03.505176019 -0700
@@ -323,7 +323,15 @@
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
+	{
+		SHM_QUEUE  *queue = &(MyProc->myProcLocks[i]);
+
+		if (! (!queue->prev || queue->prev == queue ||
+			   !queue->next || queue->next == queue)
+			)
+			elog(FATAL, "Initprocess myProclocks[%d] not empty: queue %p (prev %p, next %p)",
+				 i, queue, queue->prev, queue->next);
 		SHMQueueInit(&(MyProc->myProcLocks[i]));
+	}
 
 	MyProc->recoveryConflictPending = false;
 
 	/*
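For readers unfamiliar with the data structure being tested: SHM_QUEUE is a circular doubly-linked list embedded in shared memory, and "empty" means a node's prev and next point back at the node itself, which is exactly what the patch's condition checks for (alongside NULL for "never initialized"). A toy model with invented names shows the invariant:

```c
#include <assert.h>
#include <stddef.h>

/* Toy circular doubly-linked queue, modeled on PostgreSQL's SHM_QUEUE.
 * Names here are invented for illustration; this is not the shmem code. */
typedef struct ToyQueue
{
	struct ToyQueue *prev;
	struct ToyQueue *next;
} ToyQueue;

static void
queue_init(ToyQueue *q)
{
	q->prev = q->next = q;		/* an empty queue points at itself */
}

static void
queue_insert_after(ToyQueue *head, ToyQueue *elem)
{
	elem->prev = head;
	elem->next = head->next;
	head->next->prev = elem;
	head->next = elem;
}

/* Mirrors the patch's condition: "NULL or empty" is the acceptable state
 * for a fresh PGPROC's myProcLocks partition. */
static int
queue_is_null_or_empty(const ToyQueue *q)
{
	return !q->prev || q->prev == q ||
		   !q->next || q->next == q;
}
```

In the FATAL message above, prev and next are equal but differ from the queue's own address, i.e. exactly one leftover PROCLOCK was still linked in when the new backend inherited the PGPROC.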
Re: [HACKERS] Cascaded standby message
On Wed, Sep 7, 2011 at 03:44, Fujii Masao masao.fu...@gmail.com wrote:
> On Wed, Sep 7, 2011 at 5:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> Fujii - the original goal here was to avoid having an unexplained disconnection in the logs. The only way to do this cleanly is to have a disconnection reason passed to the walsender, rather than just a blind signal. Looks like we need to multiplex, or use some other mechanism.
>
> That's an idea. But what about the patch that I proposed before?
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00816.php

Seems like a good solution to me - I hadn't noticed that patch before posting my complaint. I don't think it's a problem if it logs it multiple times when it happens - I think it's a much bigger problem that it logs it when it didn't actually do anything.

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Cascaded standby message
On Wed, Sep 7, 2011 at 2:44 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Wed, Sep 7, 2011 at 5:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> Fujii - the original goal here was to avoid having an unexplained disconnection in the logs. The only way to do this cleanly is to have a disconnection reason passed to the walsender, rather than just a blind signal. Looks like we need to multiplex, or use some other mechanism.
>
> That's an idea. But what about the patch that I proposed before?
> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00816.php

Thanks for that. Committed.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] savepoint commit performance
On Tue, Sep 6, 2011 at 9:18 PM, Simon Riggs si...@2ndquadrant.com wrote:
> I'm back now and will act as advertised.

I've revoked the performance aspect. The difference between release and commit has been maintained since it makes the code easier to understand.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Cascaded standby message
On 7 September 2011 11:56, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, Sep 7, 2011 at 2:44 AM, Fujii Masao masao.fu...@gmail.com wrote:
>>> Fujii - the original goal here was to avoid having a unexplained disconnection in the logs. The only way to do this cleanly is to have a disconnection reason passed to the walsender, rather than just a blind signal. Looks like we need to multiplex or other mechanism.
>>
>> That's an idea. But what about the patch that I proposed before?
>> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00816.php
>
> Thanks for that. Committed.

This appears to be the patch submitted to the commitfest:
https://commitfest.postgresql.org/action/patch_view?id=630

Can this now be marked as committed?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Cascaded standby message
On Wed, Sep 7, 2011 at 9:10 PM, Thom Brown t...@linux.com wrote:
> On 7 September 2011 11:56, Simon Riggs si...@2ndquadrant.com wrote:
>>> That's an idea. But what about the patch that I proposed before?
>>> http://archives.postgresql.org/pgsql-hackers/2011-08/msg00816.php
>>
>> Thanks for that. Committed.

Thanks!

> This appears to be the patch submitted to the commitfest:
> https://commitfest.postgresql.org/action/patch_view?id=630
>
> Can this now be marked as committed?

Yes. I did that. Thanks!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote: The (2) is new stuff from the revision in commit-fest 1st. It enables to supply NOLEAKY option on CREATE FUNCTION statement, then the function is allowed to distribute across security barrier. Only superuser can set this option. NOLEAKY doesn't really sound appropriate as it sounds like pidgin English. Also, it could be read as Don't allow leaks in this function. Could we instead use something like TRUSTED or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] problem of win32.mak
Hi Magnus-san and Bruce-san.

I am sorry for the very late reaction... Is it enough for the 9.1 release? libpq of bcc32 and win32 has a problem.

== error of nmake ==
   Creating library .\Release\libpqdll.lib and object .\Release\libpqdll.exp
libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _inet_net_ntop referenced in function _connectFailureMessage
libpq.lib(getaddrinfo.obj) : error LNK2001: unresolved external symbol _inet_net_ntop
libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _pg_get_encoding_from_locale referenced in function _PQsetClientEncoding
.\Release\libpq.dll : fatal error LNK1120: 2 unresolved externals
== end ==

Please take this into consideration. Thanks.

Regards,
Hiroshi Saito

*** src/interfaces/libpq/bcc32.mak.old	Fri Aug 19 06:23:13 2011
--- src/interfaces/libpq/bcc32.mak	Wed Sep 7 21:50:33 2011
***************
*** 80,85 ****
--- 80,87 ----
  	-@erase $(INTDIR)\inet_aton.obj
  	-@erase $(INTDIR)\crypt.obj
  	-@erase $(INTDIR)\noblock.obj
+ 	-@erase $(INTDIR)\chklocale.obj
+ 	-@erase $(INTDIR)\inet_net_ntop.obj
  	-@erase $(INTDIR)\md5.obj
  	-@erase $(INTDIR)\ip.obj
  	-@erase $(INTDIR)\fe-auth.obj
***************
*** 123,128 ****
--- 125,132 ----
  	$(INTDIR)\inet_aton.obj \
  	$(INTDIR)\crypt.obj \
  	$(INTDIR)\noblock.obj \
+ 	$(INTDIR)\chklocale.obj \
+ 	$(INTDIR)\inet_net_ntop.obj \
  	$(INTDIR)\md5.obj \
  	$(INTDIR)\ip.obj \
  	$(INTDIR)\fe-auth.obj \
***************
*** 220,225 ****
--- 224,239 ----
  $(INTDIR)\noblock.obj : ..\..\port\noblock.c
  	$(CPP) @<<
  	$(CPP_PROJ) ..\..\port\noblock.c
  <<
+ 
+ $(INTDIR)\chklocale.obj : ..\..\port\chklocale.c
+ 	$(CPP) @<<
+ 	$(CPP_PROJ) ..\..\port\chklocale.c
+ <<
+ 
+ $(INTDIR)\inet_net_ntop.obj : ..\..\port\inet_net_ntop.c
+ 	$(CPP) @<<
+ 	$(CPP_PROJ) ..\..\port\inet_net_ntop.c
+ <<
  
  $(INTDIR)\md5.obj : ..\..\backend\libpq\md5.c
*** src/interfaces/libpq/win32.mak.old	Fri Aug 19 06:23:13 2011
--- src/interfaces/libpq/win32.mak	Wed Sep 7 21:45:22 2011
***************
*** 87,92 ****
--- 87,94 ----
  	-@erase $(INTDIR)\inet_aton.obj
  	-@erase $(INTDIR)\crypt.obj
  	-@erase $(INTDIR)\noblock.obj
+ 	-@erase $(INTDIR)\chklocale.obj
+ 	-@erase $(INTDIR)\inet_net_ntop.obj
  	-@erase $(INTDIR)\md5.obj
  	-@erase $(INTDIR)\ip.obj
  	-@erase $(INTDIR)\fe-auth.obj
***************
*** 132,137 ****
--- 134,141 ----
  	$(INTDIR)\inet_aton.obj \
  	$(INTDIR)\crypt.obj \
  	$(INTDIR)\noblock.obj \
+ 	$(INTDIR)\chklocale.obj \
+ 	$(INTDIR)\inet_net_ntop.obj \
  	$(INTDIR)\md5.obj \
  	$(INTDIR)\ip.obj \
  	$(INTDIR)\fe-auth.obj \
***************
*** 258,263 ****
--- 262,277 ----
  $(INTDIR)\noblock.obj : ..\..\port\noblock.c
  	$(CPP) @<<
  	$(CPP_PROJ) ..\..\port\noblock.c
  <<
+ 
+ $(INTDIR)\chklocale.obj : ..\..\port\chklocale.c
+ 	$(CPP) @<<
+ 	$(CPP_PROJ) ..\..\port\chklocale.c
+ <<
+ 
+ $(INTDIR)\inet_net_ntop.obj : ..\..\port\inet_net_ntop.c
+ 	$(CPP) @<<
+ 	$(CPP_PROJ) ..\..\port\inet_net_ntop.c
+ <<
  
  $(INTDIR)\md5.obj : ..\..\backend\libpq\md5.c
Re: [HACKERS] [v9.2] Fix Leaky View Problem
2011/9/7 Thom Brown t...@linux.com:
> On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote:
>> The (2) is new stuff from the revision in commit-fest 1st. It enables to supply NOLEAKY option on CREATE FUNCTION statement, then the function is allowed to distribute across security barrier. Only superuser can set this option.
>
> NOLEAKY doesn't really sound appropriate as it sounds like pidgin English. Also, it could be read as "Don't allow leaks in this function". Could we instead use something like TRUSTED or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do.

Thanks for your comment. I'm not a native English speaker, so it is helpful.

TRUSTED sounds meaningful to me; however, it is confusable with the concept of a "trusted procedure" in label-based MAC. It is not only SELinux: Oracle's label-based security also uses this term to mean a procedure that switches the user's credentials during its execution.
http://download.oracle.com/docs/cd/B28359_01/network.111/b28529/storproc.htm

So, how about CREDIBLE, instead of TRUSTED?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On 7 September 2011 14:34, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2011/9/7 Thom Brown t...@linux.com: On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote: The (2) is new stuff from the revision in commit-fest 1st. It enables to supply NOLEAKY option on CREATE FUNCTION statement, then the function is allowed to distribute across security barrier. Only superuser can set this option. NOLEAKY doesn't really sound appropriate as it sounds like pidgin English. Also, it could be read as Don't allow leaks in this function. Could we instead use something like TRUSTED or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do. Thanks for your comment. I'm not a native English specker, so it is helpful. TRUSTED sounds meaningful for me, however, it is confusable with a concept of trusted procedure in label-based MAC. It is not only SELinux, Oracle's label based security also uses this term to mean a procedure that switches user's credential during its execution. http://download.oracle.com/docs/cd/B28359_01/network.111/b28529/storproc.htm So, how about CREDIBLE, instead of TRUSTED? I can't say I'm keen on that alternative, but I'm probably not the one to participate in bike-shedding here, so I'll leave comment to you hackers. :) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 7, 2011 at 5:16 AM, daveg da...@sonic.net wrote: On Tue, Aug 23, 2011 at 12:15:23PM -0400, Robert Haas wrote: On Mon, Aug 22, 2011 at 3:31 AM, daveg da...@sonic.net wrote: So far I've got: - affects system tables - happens very soon after process startup - in 8.4.7 and 9.0.4 - not likely to be hardware or OS related - happens in clusters for period of a few second to many minutes I'll work on printing the LOCK and LOCALLOCK when it happens, but it's hard to get downtime to pick up new builds. Any other ideas on getting to the bottom of this? I've been thinking this one over, and doing a little testing. I'm still stumped, but I have a few thoughts. What that error message is really saying is that the LOCALLOCK bookkeeping doesn't match the PROCLOCK bookkeeping; it doesn't tell us which one is to blame. ... My second thought is that perhaps a process is occasionally managing to exit without fully cleaning up the associated PROCLOCK entry. At first glance, it appears that this would explain the observed symptoms. A new backend gets the PGPROC belonging to the guy who didn't clean up after himself, hits the error, and disconnects, sticking himself right back on to the head of the SHM_QUEUE where the next connection will inherit the same PGPROC and hit the same problem. But it's not clear to me what could cause the system to get into this state in the first place, or how it would eventually right itself. It might be worth kludging up your system to add a test to InitProcess() to verify that all of the myProcLocks SHM_QUEUEs are either NULL or empty, along the lines of the attached patch (which assumes that assertions are enabled; otherwise, put in an elog() of some sort). Actually, I wonder if we shouldn't move all the SHMQueueInit() calls for myProcLocks to InitProcGlobal() rather than doing it over again every time someone calls InitProcess(). Besides being a waste of cycles, it's probably less robust this way. 
If there somehow are leftovers in one of those queues, the next successful call to LockReleaseAll() ought to clean up the mess, but of course there's no chance of that working if we've nuked the queue pointers. I did this in the elog flavor as we don't build production images with asserts. It has been running on all hosts for a few days. Today it hit the extra checks in initproc. 00:02:32.782 8626 [unknown] [unknown] LOG: connection received: host=bk0 port=42700 00:02:32.783 8627 [unknown] [unknown] LOG: connection received: host=op2 port=45876 00:02:32.783 8627 d61 apps FATAL: Initprocess myProclocks[4] not empty: queue 0x2ae6b4b895f8 (prev 0x2ae6b4a2b558, next 0x2ae6b4a2b558) 00:02:32.783 8626 d35 postgres LOG: connection authorized: user=postgres database=c35 00:02:32.783 21535 LOG: server process (PID 8627) exited with exit code 1 00:02:32.783 21535 LOG: terminating any other active server processes 00:02:32.783 8626 c35 postgres WARNING: terminating connection because of crash of another server process The patch that produced this is attached. If you can think of anything I can add to this to help I'd be happy to do so. Also, can I clean this up and continue somehow? Maybe clear the queue instead having to have a restart? Or is there a way to just pause this proc here, maybe mark it not to be used and exit, or just to sleep forever so I can debug later? I think what we really need to know here is: when the debugging code you injected here fires, what happened to the previous owner of that PGPROC that caused it to not clean up its state properly? The PGPROC that 8627 inherited in the above example is obviously messed up - but what did the last guy do that messed it up? It would be nice to log the address of the PGPROC in every log message here - then you could go back and look to see whether the last guy terminated in some unusual way. 
In the meantime, could you look back a couple of minutes from the time of the above occurrence and see if there are any FATAL errors, or unusual ERRORs?

After spending some time staring at the code, I do have one idea as to what might be going on here. When a backend is terminated, ShutdownPostgres() calls AbortOutOfAnyTransaction() and then LockReleaseAll(USER_LOCKMETHOD, true). The second call releases all user locks, and the first one (or so we suppose) releases all normal locks as part of aborting the transaction. But what if there's no transaction in progress? In that case, AbortOutOfAnyTransaction() won't do anything - which is fine as far as transaction-level locks go, because we shouldn't be holding any of those anyway if there's no transaction in progress. However, if we hold a session-level lock at that point, then we'd orphan it.

We don't make much use of session locks. As far as I can see, they are used by (1) VACUUM, (2) CREATE INDEX CONCURRENTLY, (3) ALTER DATABASE .. SET TABLESPACE, and (4) on standby
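The failure mode hypothesized above reduces to a simple invariant: transaction abort releases only transaction-scoped locks, so a session-scoped lock held while no transaction is open can survive the shutdown path and be inherited by the next owner of the PGPROC. A toy model (invented names; this is not the real lock manager) makes the orphaning scenario concrete:

```c
#include <assert.h>

/* Toy lock table for illustration only; names are invented. */
typedef enum LockScope
{
	SCOPE_TXN,					/* released when the transaction ends */
	SCOPE_SESSION				/* survives transaction end */
} LockScope;

typedef struct ToyLock
{
	int			held;
	LockScope	scope;
} ToyLock;

/*
 * Models AbortOutOfAnyTransaction() as described above: a no-op when no
 * transaction is open, and even inside a transaction it only drops
 * transaction-scoped locks.
 */
static void
toy_abort_out_of_any_transaction(ToyLock *locks, int n, int in_transaction)
{
	int			i;

	if (!in_transaction)
		return;					/* nothing to abort, nothing released */
	for (i = 0; i < n; i++)
		if (locks[i].scope == SCOPE_TXN)
			locks[i].held = 0;	/* session locks deliberately untouched */
}
```

Running the no-transaction case with a session lock held leaves the lock marked held after "shutdown", which is precisely the orphan that a later InitProcess() would then stumble over.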
Re: [HACKERS] Rectifying wrong Date outputs
Applied, with a function rename. The only odd case we have left is:

test=> select to_date('079', 'YYY');
  to_date
------------
 1979-01-01
(1 row)

(Note the zero is ignored.) I can't see an easy way to fix this while keeping the behavior easily documentable.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Bruce Momjian wrote:
>> Piyush Newe wrote:
>>> Hi, I was randomly testing some date related stuff on PG and observed that the outputs were wrong. e.g.
>>>
>>> postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
>>>   to_date
>>> ------------
>>>  3910-01-01    <- Look at this
>>> (1 row)
>>>
>>> postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
>>>   to_date
>>> ------------
>>>  2010-01-01
>>> (1 row)
>>
>> I have done some work on this problem, and have developed the attached patch. It generates the output in the final column of this table:
>>
>>                                           Oracle        PostgreSQL    With PG Patch
>>  1  TO_DATE('01-jan-1',    'DD-MON-Y')    01-JAN-2011   01-JAN-2001   01-JAN-2001+
>>  2  TO_DATE('01-jan-1',    'DD-MON-YY')   01-JAN-2001   01-JAN-2001   01-JAN-2001
>>  3  TO_DATE('01-jan-1',    'DD-MON-YYY')  01-JAN-2001   01-JAN-2001   01-JAN-2001
>>  4  TO_DATE('01-jan-1',    'DD-MON-YYYY') 01-JAN-0001   01-JAN-0001   01-JAN-0001
>>  5  TO_DATE('01-jan-10',   'DD-MON-Y')    Error         01-JAN-2010   01-JAN-2010
>>  6  TO_DATE('01-jan-10',   'DD-MON-YY')   01-JAN-2010   01-JAN-2010   01-JAN-2010
>>  7  TO_DATE('01-jan-10',   'DD-MON-YYY')  01-JAN-2010   01-JAN-2010   01-JAN-2010
>>  8  TO_DATE('01-jan-10',   'DD-MON-YYYY') 01-JAN-0010   01-JAN-0010   01-JAN-0010
>>  9  TO_DATE('01-jan-067',  'DD-MON-Y')    Error         01-JAN-2067   01-JAN-2067
>> 10  TO_DATE('01-jan-111',  'DD-MON-YY')   01-JAN-0111   01-JAN-2011   01-JAN-2111*+
>> 11  TO_DATE('01-jan-678',  'DD-MON-YYY')  01-JAN-2678   01-JAN-1678   01-JAN-1678+
>> 12  TO_DATE('01-jan-001',  'DD-MON-YYYY') 01-JAN-0001   01-JAN-0001   01-JAN-0001
>> 13  TO_DATE('01-jan-2010', 'DD-MON-Y')    Error         01-JAN-4010   01-JAN-2010*
>> 14  TO_DATE('01-jan-2010', 'DD-MON-YY')   01-JAN-2010   01-JAN-3910   01-JAN-2010*
>> 15  TO_DATE('01-jan-2010', 'DD-MON-YYY')  Error         01-JAN-3010   01-JAN-2010*
>> 16  TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010   01-JAN-2010   01-JAN-2010
>
> In an attempt to make the to_date/to_timestamp behavior documentable, I have modified the patch to have dates adjust
toward the year 2020, and added code so if four digits are supplied, we don't do any adjustment. Here is the current odd behavior, which is fixed by the patch: test=> select to_date('222', 'YYY'); to_date -01-01 (1 row) test=> select to_date('0222', 'YYY'); to_date -01-01 (1 row) If they supply a full 4-digit year, it seems we should honor that, even for YYY. YYYY still does no adjustment, and I doubt we want to change that:

	test=> select to_date('222', 'YYYY');
	  to_date
	------------
	 0222-01-01
	(1 row)

	test=> select to_date('0222', 'YYYY');
	  to_date
	------------
	 0222-01-01
	(1 row)

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + [ text/x-diff is unsupported, treating like TEXT/PLAIN ] diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index c03dd6c..282bb0d *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1 *** 5550, --- 5550,5564

	    <listitem>
	     <para>
	+     If the year format specification is less than four digits, e.g.
	+     <literal>YYY</>, and the supplied year is less than four digits,
	+     the year will be adjusted to be nearest to year 2020, e.g.
	+     <literal>95</> becomes 1995.
	+    </para>
	+   </listitem>
	+
	+   <listitem>
	+    <para>
	      The <literal>YYYY</literal> conversion from string to <type>timestamp</type>
	      or <type>date</type> has a restriction when processing years with more
	      than 4 digits. You must use some non-digit character or template after
	      <literal>YYYY</literal>,

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c new file mode 100644 index 726a1f4..1a3ec1c *** a/src/backend/utils/adt/formatting.c --- b/src/backend/utils/adt/formatting.c *** static void dump_node(FormatNode *node, *** 964,969 --- 964,970 static char *get_th(char *num, int type); static char *str_numth(char *dest,
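The "adjust toward 2020" rule in Bruce's table can be sketched as a standalone function. This is my reading of the behavior table above, not the actual formatting.c patch, and the function name is mine: a year supplied with fewer than four digits keeps its digits as the low-order part, and the missing high-order digits are chosen so the result lands nearest 2020 (a single supplied digit behaves like two, per row 1: '1' becomes 2001).

```c
#include <stdlib.h>

/*
 * Hypothetical sketch of the "adjust toward 2020" rule inferred from the
 * behavior table above (NOT the actual formatting.c patch).  'year' is the
 * numeric value the user supplied, 'ndigits' how many digits they wrote.
 */
int
adjust_partial_year_to_2020(int year, int ndigits)
{
	int		step = 1;
	int		best;

	if (ndigits >= 4)
		return year;			/* four digits supplied: honor them as-is */
	if (ndigits < 2)
		ndigits = 2;			/* '1' behaves like '01' in the table */
	for (int i = 0; i < ndigits; i++)
		step *= 10;

	best = year;				/* candidate with high-order digits zero */
	for (int base = step; base <= 3000; base += step)
	{
		int		candidate = base + year;

		/* keep whichever candidate lies closest to the pivot year 2020 */
		if (abs(candidate - 2020) < abs(best - 2020))
			best = candidate;
	}
	return best;
}
```

With this reading, '95' becomes 1995, '10' becomes 2010, '678' becomes 1678, and '111' becomes 2111, matching the "With PG Patch" column.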
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On Wed, Sep 7, 2011 at 9:39 AM, Thom Brown t...@linux.com wrote: On 7 September 2011 14:34, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2011/9/7 Thom Brown t...@linux.com: On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote: The (2) is new stuff from the revision in commit-fest 1st. It enables supplying the NOLEAKY option on the CREATE FUNCTION statement; the function is then allowed to be distributed across a security barrier. Only a superuser can set this option. NOLEAKY doesn't really sound appropriate, as it sounds like pidgin English. Also, it could be read as "Don't allow leaks in this function". Could we instead use something like TRUSTED, or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do. Thanks for your comment. I'm not a native English speaker, so it is helpful. TRUSTED sounds meaningful to me; however, it is confusable with the concept of a trusted procedure in label-based MAC. It is not only SELinux; Oracle's label-based security also uses this term to mean a procedure that switches the user's credentials during its execution. http://download.oracle.com/docs/cd/B28359_01/network.111/b28529/storproc.htm So, how about CREDIBLE instead of TRUSTED? I can't say I'm keen on that alternative, but I'm probably not the one to participate in bike-shedding here, so I'll leave comment to you hackers. :) I think TRUSTED actually does a reasonably good job capturing what we're after here, although I do share a bit of KaiGai's nervousness about terminological confusion. Still, I'd be inclined to go that way if we can't come up with anything better. CREDIBLE is definitely the wrong idea: that means believable, which sounds more like a statement about the function's results than about its side-effects. 
I thought about TACITURN, since we need the error messages to not be excessively informative, but that doesn't do a good job characterizing the hazard created by side-effects, or the potential for abuse due to - for example - deliberate division by zero. I also thought about PURE, which is a term that's sometimes used to describe code that throws no errors and has no side effects, and comes pretty close to our actual requirement here, but doesn't necessarily convey that a security concern is involved. Yet another idea would be to use a variant of TRUSTED, such as TRUSTWORTHY, just to avoid confusion with the idea of a trusted procedure, but I'm not that excited about that idea despite having no real specific gripe with it other than length. So at the moment I am leaning toward TRUSTED. Anyone else want to bikeshed? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW proposal: a validator for configuration files
Andy Colson a...@squeakycode.net writes: Where did the other warnings go? It's right, though; line 570 is bad. It also seems to have killed the server. I have not gotten through the history of messages regarding this patch, but is it supposed to kill the server if there is a syntax error in the config file? The historical behavior is that a configuration-file error detected during postmaster startup should prevent the server from starting, but an error detected during reload should only result in a LOG message and the reload not occurring. I don't believe anyone will accept a patch that causes the server to quit on a failed reload. There has, however, been some debate about the exact extent of ignoring bad values during reload --- currently the theory is to ignore the whole file if anything is wrong, but there's some support for applying all non-bad values as long as the overall file syntax is okay. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
Robert Haas robertmh...@gmail.com wrote: Anyone else want to bikeshed? I'm not sure they beat TRUSTED, but: SECURE OPAQUE -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
Robert Haas robertmh...@gmail.com writes: After spending some time staring at the code, I do have one idea as to what might be going on here. When a backend is terminated, ShutdownPostgres() calls AbortOutOfAnyTransaction() and then LockReleaseAll(USER_LOCKMETHOD, true). The second call releases all user locks, and the first one (or so we suppose) releases all normal locks as part of aborting the transaction. But what if there's no transaction in progress? In that case, AbortOutOfAnyTransaction() won't do anything - which is fine as far as transaction-level locks go, because we shouldn't be holding any of those anyway if there's no transaction in progress. However, if we hold a session-level lock at that point, then we'd orphan it. We don't make much use of session locks. As far as I can see, they are used by (1) VACUUM, (2) CREATE INDEX CONCURRENTLY, (3) ALTER DATABASE .. SET TABLESPACE, and (4) on standby servers, redo of DROP DATABASE actions. Any chance one of those died or was killed off around the time this happened? I don't believe this theory at all, because if that were the issue, we'd have heard about it long since. The correct theory has to involve a very-little-used triggering condition. At the moment I'm wondering about advisory (userspace) locks ... Dave, do your apps use any of those? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
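For context, the advisory (userspace) locks Tom asks about are taken explicitly by applications, and the session-level flavor survives transaction end, which is exactly the kind of lock the shutdown path Robert describes would have to clean up. A generic illustration, not taken from Dave's application:

```sql
-- Session-level advisory lock: held until explicitly released or the
-- session disconnects; COMMIT/ROLLBACK does not release it.
SELECT pg_advisory_lock(42);      -- blocks if another session holds key 42
COMMIT;                           -- lock is still held here
SELECT pg_advisory_unlock(42);    -- explicit release
```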
Re: [HACKERS] [v9.2] Fix Leaky View Problem
2011/9/7 Robert Haas robertmh...@gmail.com: On Wed, Sep 7, 2011 at 9:39 AM, Thom Brown t...@linux.com wrote: On 7 September 2011 14:34, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2011/9/7 Thom Brown t...@linux.com: On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote: The (2) is new stuff from the revision in commit-fest 1st. It enables supplying the NOLEAKY option on the CREATE FUNCTION statement; the function is then allowed to be distributed across a security barrier. Only a superuser can set this option. NOLEAKY doesn't really sound appropriate, as it sounds like pidgin English. Also, it could be read as "Don't allow leaks in this function". Could we instead use something like TRUSTED, or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do. Thanks for your comment. I'm not a native English speaker, so it is helpful. TRUSTED sounds meaningful to me; however, it is confusable with the concept of a trusted procedure in label-based MAC. It is not only SELinux; Oracle's label-based security also uses this term to mean a procedure that switches the user's credentials during its execution. http://download.oracle.com/docs/cd/B28359_01/network.111/b28529/storproc.htm So, how about CREDIBLE instead of TRUSTED? I can't say I'm keen on that alternative, but I'm probably not the one to participate in bike-shedding here, so I'll leave comment to you hackers. :) I think TRUSTED actually does a reasonably good job capturing what we're after here, although I do share a bit of KaiGai's nervousness about terminological confusion. Still, I'd be inclined to go that way if we can't come up with anything better. CREDIBLE is definitely the wrong idea: that means believable, which sounds more like a statement about the function's results than about its side-effects. 
I thought about TACITURN, since we need the error messages to not be excessively informative, but that doesn't do a good job characterizing the hazard created by side-effects, or the potential for abuse due to - for example - deliberate division by zero. I also thought about PURE, which is a term that's sometimes used to describe code that throws no errors and has no side effects, and comes pretty close to our actual requirement here, but doesn't necessarily convey that a security concern is involved. Yet another idea would be to use a variant of TRUSTED, such as TRUSTWORTHY, just to avoid confusion with the idea of a trusted procedure, but I'm not that excited about that idea despite having no real specific gripe with it other than length. So at the moment I am leaning toward TRUSTED. Anyone else want to bikeshed? I am also leaning toward TRUSTED. Although we still have a bit of risk of terminology confusion, I assume it is quite a rare case for a DBA to set this option, and we can expect DBAs who try this option to have correct knowledge of the background of the leaky-view problem. I'll submit the patch again. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.1rc1 regression: EXPLAIN on information_schema.key_column_usage
Hi list, It seems I have found a regression in PostgreSQL 9.1rc1 (from 9.0). In many cases, running the following query fails: db=# EXPLAIN select * from information_schema.key_column_usage; ERROR: record type has not been registered However, this is not always reproducible. It seems to occur more likely on an empty database. At first I suspected uninitialized memory access somewhere, but valgrind does not highlight anything obvious. Trying to isolate the part of the view that causes the error also didn't yield any results. Similarly, information_schema.triggered_update_columns also occasionally returns this error, but less reliably. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On 2011-09-07 16:02, Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: Anyone else want to bikeshed? I'm not sure they beat TRUSTED, but: SECURE OPAQUE SAVE -- Yeb -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] problem of win32.mak
Hiroshi Saito wrote: Hi Magnus-san and Bruce-san. I am sorry for the very late reaction... Is it in time for the 9.1 release? libpq of bcc32 and win32 has a problem.

	== error of nmake ==
	   Creating library .\Release\libpqdll.lib and object .\Release\libpqdll.exp
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _inet_net_ntop referenced in function _connectFailureMessage
	libpq.lib(getaddrinfo.obj) : error LNK2001: unresolved external symbol _inet_net_ntop
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _pg_get_encoding_from_locale referenced in function _PQsetClientEncoding
	.\Release\libpq.dll : fatal error LNK1120: 2 unresolved externals
	== end ==

Yes, this is a problem. I will apply your patch to 9.1 and head unless I hear otherwise in a few hours. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] problem of win32.mak
On Wed, Sep 7, 2011 at 16:43, Bruce Momjian br...@momjian.us wrote: Hiroshi Saito wrote: Hi Magnus-san and Bruce-san. I am sorry for the very late reaction... Is it in time for the 9.1 release? libpq of bcc32 and win32 has a problem.

	== error of nmake ==
	   Creating library .\Release\libpqdll.lib and object .\Release\libpqdll.exp
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _inet_net_ntop referenced in function _connectFailureMessage
	libpq.lib(getaddrinfo.obj) : error LNK2001: unresolved external symbol _inet_net_ntop
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _pg_get_encoding_from_locale referenced in function _PQsetClientEncoding
	.\Release\libpq.dll : fatal error LNK1120: 2 unresolved externals
	== end ==

Yes, this is a problem. I will apply your patch to 9.1 and head unless I hear otherwise in a few hours. Looks fine to me, go ahead. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] error building head on OS X 10.7.1
Get the following error:

	configure:3274: ccache gcc -V >&5
	llvm-gcc-4.2: argument to `-V' is missing

should be ccache gcc -v >&5 Dave Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql.conf archive_command example
On Tue, Sep 6, 2011 at 10:11 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Sep 3, 2011 at 5:10 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (2) It should copy, not move, with protection against overwriting an existing file. I agree that basically archive_command should not overwrite an existing file. But if the size of the existing file is less than 16MB, it should do that. Otherwise, that WAL file would be lost forever. I think best practice in this case is that if you ever find an existing file with the same name already in place, you should error and investigate. We don't ship around partially completed WAL files, and finding an existing one probably means something went wrong. (Of course, we use rsync instead of copy/move, so we have some better guarantees about this.) I have another feature request; (5) Maybe not in the initial version, but eventually it might be nice to support calling posix_fadvise(POSIX_FADV_DONTNEED) after copying a WAL file. Can you go into more detail on how you envision this working? I'm mostly curious because I think rsync might already support this, which would make it easy to incorporate. On a side note, seeing this thread hasn't died, I'd encourage everyone to take another look at OmniPITR, https://github.com/omniti-labs/omnipitr. It's PostgreSQL-licensed, solves a lot of the problems listed here, and I think makes for a good example for people who want to accomplish more advanced WAL management goals. So far the biggest criticism we've gotten is that it wasn't written in python; for some of you that might be a plus though ;-) Robert Treat play: xzilla.net work: omniti.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
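Robert's "error and investigate" advice translates into an archive_command that refuses to overwrite, similar in spirit to the stock `test ! -f .../%f && cp %p .../%f` example in the PostgreSQL documentation. A minimal sketch as a shell function (the function name is mine; in a real setup %p/%f are the archive_command placeholders for the WAL file's path and name):

```shell
# Copy-don't-overwrite archiving: exit non-zero if the target already
# exists, so the server keeps retrying and the DBA can investigate,
# instead of silently clobbering an archived WAL segment.
archive_wal() {  # usage: archive_wal <source-path> <dest-dir>
    src="$1"; dir="$2"
    f=$(basename "$src")
    test ! -f "$dir/$f" && cp "$src" "$dir/$f"
}
```

In postgresql.conf the same idea is a one-liner: archive_command = 'test ! -f /archivedir/%f && cp %p /archivedir/%f'.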
Re: [HACKERS] problem of win32.mak
Hi. We are looking forward to the great release. Thanks again!! Regards, Hiroshi Saito (2011/09/07 23:46), Magnus Hagander wrote: On Wed, Sep 7, 2011 at 16:43, Bruce Momjian br...@momjian.us wrote: Hiroshi Saito wrote: Hi Magnus-san and Bruce-san. I am sorry for the very late reaction... Is it in time for the 9.1 release? libpq of bcc32 and win32 has a problem.

	== error of nmake ==
	   Creating library .\Release\libpqdll.lib and object .\Release\libpqdll.exp
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _inet_net_ntop referenced in function _connectFailureMessage
	libpq.lib(getaddrinfo.obj) : error LNK2001: unresolved external symbol _inet_net_ntop
	libpq.lib(fe-connect.obj) : error LNK2019: unresolved external symbol _pg_get_encoding_from_locale referenced in function _PQsetClientEncoding
	.\Release\libpq.dll : fatal error LNK1120: 2 unresolved externals
	== end ==

Yes, this is a problem. I will apply your patch to 9.1 and head unless I hear otherwise in a few hours. Looks fine to me, go ahead. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1rc1 regression: EXPLAIN on information_schema.key_column_usage
Marti Raudsepp ma...@juffo.org writes: It seems I have found a regression in PostgreSQL 9.1rc1 (from 9.0). In many cases, running the following query fails: db=# EXPLAIN select * from information_schema.key_column_usage; ERROR: record type has not been registered Looks like I overlooked a case in get_name_for_var_field. Thanks, will fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] OPERATOR FAMILY and pg_dump
Hi, If a basic operator family is created, e.g., create operator family of1 using btree; shouldn't pg_dump include this in its output? If not, why? Joe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error building head on OS X 10.7.1
Dave Cramer p...@fastcrypt.com writes: Get the following error: configure:3274: ccache gcc -V >&5 llvm-gcc-4.2: argument to `-V' is missing should be ccache gcc -v >&5 That's not an error, that's normal behavior. Mind you, I have no idea why autoconf chooses to do this when it's already tried -v, but this is not the source of whatever problem you're having. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error building head on OS X 10.7.1
On Wed, Sep 7, 2011 at 11:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dave Cramer p...@fastcrypt.com writes: Get the following error: configure:3274: ccache gcc -V >&5 llvm-gcc-4.2: argument to `-V' is missing should be ccache gcc -v >&5 That's not an error, that's normal behavior. Mind you, I have no idea why autoconf chooses to do this when it's already tried -v, but this is not the source of whatever problem you're having. regards, tom lane Well, the problem is that the buildfarm can't build HEAD on OS X 10.7.1. Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error building head on OS X 10.7.1
Dave Cramer p...@fastcrypt.com writes: Well the problem is that buildfarm can't build HEAD on OS X 10.7.1 HEAD builds fine on my 10.7.1 laptop. If you're referring to orangutan, it's not failing on that, it's failing here:

	configure:3301: checking for C compiler default output file name
	configure:3323: ccache gcc /opt/local/include conftest.c >&5
	ld: in /opt/local/include, can't map file, errno=22 for architecture x86_64
	collect2: ld returned 1 exit status

which is probably because you have a malformed value of CFLAGS:

	'config_env' => {
	    'CFLAGS' => '/opt/local/include',

regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On Wed, Sep 07, 2011 at 02:09:15PM +0100, Thom Brown wrote: On 24 August 2011 13:38, Kohei Kaigai kohei.kai...@emea.nec.com wrote: The (2) is new stuff from the revision in commit-fest 1st. It enables to supply NOLEAKY option on CREATE FUNCTION statement, then the function is allowed to distribute across security barrier. Only superuser can set this option. NOLEAKY doesn't really sound appropriate as it sounds like pidgin English. Also, it could be read as Don't allow leaks in this function. Could we instead use something like TRUSTED or something akin to it being allowed to do more than safer functions? It then describes its level of behaviour rather than what it promises not to do. I liked NOLEAKY for its semantics, though I probably would have spelled it LEAKPROOF. PostgreSQL will trust the function to implement a specific, relatively-unintuitive security policy. We want the function implementers to read that policy closely and not rely on any intuition they have about the trusted term of art. Our use of TRUSTED in CREATE LANGUAGE is more conventional, I think, as is the trusted nature of SECURITY DEFINER. In that vein, folks who actually need SECURITY DEFINER might first look at TRUSTED; NOLEAKY would not attract the same unwarranted attention. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
Noah Misch n...@leadboat.com writes: I liked NOLEAKY for its semantics, though I probably would have spelled it LEAKPROOF. PostgreSQL will trust the function to implement a specific, relatively-unintuitive security policy. We want the function implementers to read that policy closely and not rely on any intuition they have about the trusted term of art. Our use of TRUSTED in CREATE LANGUAGE is more conventional, I think, as is the trusted nature of SECURITY DEFINER. In that vein, folks who actually need SECURITY DEFINER might first look at TRUSTED; NOLEAKY would not attract the same unwarranted attention. I agree that TRUSTED is a pretty bad choice here because of the high probability that people will think it means something else than what it really means. LEAKPROOF isn't too bad. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OPERATOR FAMILY and pg_dump
Joe Abbate j...@freedomcircle.com writes: If a basic operator family is created, e.g., create operator family of1 using btree; shouldn't pg_dump include this in its output? If not, why? Quoting from the pg_dump source code: * We want to dump the opfamily only if (1) it contains loose operators * or functions, or (2) it contains an opclass with a different name or * owner. Otherwise it's sufficient to let it be created during creation * of the contained opclass, and not dumping it improves portability of * the dump. I guess if it contains no opclasses and no operators either, this code won't dump it, but isn't it rather useless in such a case? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error building head on OS X 10.7.1
On Wed, Sep 7, 2011 at 11:32 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dave Cramer p...@fastcrypt.com writes: Well the problem is that buildfarm can't build HEAD on OS X 10.7.1 HEAD builds fine on my 10.7.1 laptop. If you're referring to orangutan, it's not failing on that, it's failing here: configure:3301: checking for C compiler default output file name configure:3323: ccache gcc /opt/local/include conftest.c 5 ld: in /opt/local/include, can't map file, errno=22 for architecture x86_64 collect2: ld returned 1 exit status which is probably because you have a malformed value of CFLAGS: 'config_env' = { 'CFLAGS' = '/opt/local/include', regards, tom lane Thanks Tom, that was it. Dave Cramer dave.cramer(at)credativ(dot)ca http://www.credativ.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
2011/9/7 Tom Lane t...@sss.pgh.pa.us: Noah Misch n...@leadboat.com writes: I liked NOLEAKY for its semantics, though I probably would have spelled it LEAKPROOF. PostgreSQL will trust the function to implement a specific, relatively-unintuitive security policy. We want the function implementers to read that policy closely and not rely on any intuition they have about the trusted term of art. Our use of TRUSTED in CREATE LANGUAGE is more conventional, I think, as is the trusted nature of SECURITY DEFINER. In that vein, folks who actually need SECURITY DEFINER might first look at TRUSTED; NOLEAKY would not attract the same unwarranted attention. I agree that TRUSTED is a pretty bad choice here because of the high probability that people will think it means something else than what it really means. LEAKPROOF isn't too bad. It seems to me LEAKPROOF is not confusable for anyone, and does not conflict with any other concept, although it was not in my vocabulary. If there are no better ideas, I'll submit the patch again with the LEAKPROOF keyword. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On 09/07/2011 12:05 PM, Tom Lane wrote: I agree that TRUSTED is a pretty bad choice here because of the high probability that people will think it means something else than what it really means. Agreed. LEAKPROOF isn't too bad. It's fairly opaque unless you know the context, although that might be said of some of our other terms too. Someone coming across it is going to think What would it leak? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] OPERATOR FAMILY and pg_dump
On 09/07/2011 12:10 PM, Tom Lane wrote: I guess if it contains no opclasses and no operators either, this code won't dump it, but isn't it rather useless in such a case? Yes, I think it's useless, like a book cover without the contents, but ISTM it should still be dumped (perhaps someone started defining a family and forgot about it--oh, the puns that come to mind). Joe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
2011/9/7 Andrew Dunstan and...@dunslane.net: LEAKPROOF isn't too bad. It's fairly opaque unless you know the context, although that might be said of some of our other terms too. Someone coming across it is going to think What would it leak? It will be explained in the documentation; when introducing this keyword, I'll add a note to this chapter: http://developer.postgresql.org/pgdocs/postgres/rules-privileges.html Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] typo
Hi, While updating the translation I noticed a typo in src/backend/commands/collationcmds.c circa line 126. parameter \lc_collate\ parameter must be specified -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] First bug introduced by pgrminclude
I have fixed a bug introduced by pgrminclude. It turns out that CppAsString2() will expand any symbol, even one that is undefined, so pgrminclude thought that catalog/catversion.h was not needed. This is illustrated in the attached C file that outputs 1 and y. I have bumped the catalog version to force users to reload their tablespaces (or use pg_upgrade) because the tablespace directory names will not be expanded to the catalog version. I have modified pgrminclude to skip files that use CppAsString2(). -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/src/include/catalog/catalog.h b/src/include/catalog/catalog.h new file mode 100644 index 1e1e12d..e472e05 *** a/src/include/catalog/catalog.h --- b/src/include/catalog/catalog.h *** *** 14,19 --- 14,24

	  #ifndef CATALOG_H
	  #define CATALOG_H
	+ /*
	+  * 'pgrminclude ignore' needed here because CppAsString2() does not throw
	+  * an error if the symbol is not defined.
	+  */
	+ #include "catalog/catversion.h" /* pgrminclude ignore */
	  #include "catalog/pg_class.h"
	  #include "storage/relfilenode.h"
	  #include "utils/relcache.h"

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h new file mode 100644 index f5c9797..f3c8bb4 *** a/src/include/catalog/catversion.h --- b/src/include/catalog/catversion.h *** *** 53,58

	   */
	  /* yyyymmddN */
	! #define CATALOG_VERSION_NO 201108051
	  #endif

--- 53,58

	   */
	  /* yyyymmddN */
	! #define CATALOG_VERSION_NO 201109071
	  #endif

	#include <stdio.h>
	#include <stdlib.h>
	#include "/pg/include/c.h"

	#define x 1
	#define x1 CppAsString2(x)
	#define x2 CppAsString2(y)

	int main(int argc, char **argv)
	{
		puts(x1);
		puts(x2);
		return 0;
	}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
Andrew Dunstan and...@dunslane.net writes: On 09/07/2011 12:05 PM, Tom Lane wrote: LEAKPROOF isn't too bad. It's fairly opaque unless you know the context, although that might be said of some of our other terms too. Someone coming across it is going to think What would it leak? Well, the whole point is that we want people to RTFM instead of assuming they know what it means ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] custom variables and PGC_SUSET issue
Hello Andy Colson found a bug in the GUC implementation. When we have a custom SUSET variable, like plpgsql.variable_conflict, setting this variable before plpgsql initialization appears to succeed, but an exception is raised later, when a plpgsql function is created. Try executing the attached script: the SET statement is OK, but CREATE FUNCTION fails. Repeatedly setting this GUC raises a strange message:

	postgres=# \i script.sql
	SET
	before create function
	psql:script.sql:13: ERROR:  42501: permission denied to set parameter "plpgsql.variable_conflict"
	LOCATION:  set_config_option, guc.c:5208
	after function

	postgres=# \i script.sql
	SET
	before create function
	psql:script.sql:13: ERROR:  XX000: attempt to redefine parameter "plpgsql.variable_conflict"
	LOCATION:  define_custom_variable, guc.c:6333
	after function

Regards Pavel Stehule

	--load 'plpgsql';
	\set VERBOSITY verbose
	set plpgsql.variable_conflict = use_variable;
	\echo before create function
	create or replace function test1(a integer) returns integer as $$
	begin
	  return a+1;
	end;
	$$ language plpgsql;
	\echo after function

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] typo
Euler Taveira de Oliveira eu...@timbira.com writes: While updating the translation I noticed a typo in src/backend/commands/collationcmds.c circa line 126. parameter \"lc_collate\" parameter must be specified Will fix, thanks for spotting it. regards, tom lane
Re: [HACKERS] custom variables and PGC_SUSET issue
Excerpts from Pavel Stehule's message of mié sep 07 14:23:43 -0300 2011: Hello Andy Colson found a bug in GUC implementation. When we have a custom SUSET variable, like plpgsql.variable_conflikt, then setting this variable before plpgsql initalisation raises a exception, but it raise a exception when some plpgsql function is created. Try to execute a attached script - a set statement is ok, but CREATE FUNCTION fails. Another thing we detected some days ago is that a custom variable in a module not loaded by postmaster, does not seem to get reported appropriately when an invalid value is set on postgresql.conf and reloaded: backends report problems with DEBUG3, only postmaster uses LOG, but since postmaster isn't loading the module, nothing happens. (Maybe this is our bug but it doesn't seem like it.) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] timezone GUC
On Tue, Sep 6, 2011 at 23:52, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 6, 2011 at 5:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Robert Haas robertmh...@gmail.com writes: I am (and, I think, Alvaro is also) of the opinion that the behavior here is still not really right. I don't see a practical way to do better unless we can find a less horridly inefficient way of implementing identify_system_timezone(). Although there's always more than one way to skin a cat. Consider this idea: 1. The hard-wired default for timezone is always UTC (or something else not dependent on environment). 2. We put the identify_system_timezone work into initdb, and have it inject a non-default entry into postgresql.conf in the usual way if it can identify what the system zone is. 3. Run-time dependency on TZ environment disappears altogether. This basically means that instead of incurring that search on every postmaster start, we do it once at initdb. If you change the postmaster's timezone environment, well, you gotta go change postgresql.conf. IMO this would be less DBA-friendly in practice, but only very marginally so; and getting rid of the special initialization behavior of the timezone GUC might well be considered sufficient recompense. Seems reasonable to me... +1. I'm not sure I agree it's less DBA-friendly, really - given that it makes it more consistent.. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] [PATCH] Don't truncate integer part in to_char for 'FM99.'
Hi, This patch fixes an edge case bug in the numeric to_char() function. When the numeric to_char format used fillmode (FM), didn't contain 0s and had a trailing dot, the integer part of the number was truncated in error. to_char(10, 'FM99.') used to return '1', after this patch it will return '10' This is known to affect the format() function in the mysqlcompat pgFoundry project. Regards, Marti From 848076a119c39782ef854ac940f14f5975430b4e Mon Sep 17 00:00:00 2001 From: Marti Raudsepp ma...@juffo.org Date: Wed, 7 Sep 2011 20:12:58 +0300 Subject: [PATCH] Don't truncate integer part in to_char for 'FM99.' When the numeric to_char format used fillmode (FM), didn't contain 0s and had a trailing dot, the integer part of the number was truncated in error. to_char(10, 'FM99.') used to return '1', now it will return '10' --- src/backend/utils/adt/formatting.c | 6 +- src/test/regress/expected/numeric.out | 6 ++ src/test/regress/sql/numeric.sql | 2 ++ 3 files changed, 13 insertions(+), 1 deletions(-) diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c index 7efd988..37a819e 100644 --- a/src/backend/utils/adt/formatting.c +++ b/src/backend/utils/adt/formatting.c @@ -4460,7 +4460,11 @@ NUM_processor(FormatNode *node, NUMDesc *Num, char *inout, char *number, if (IS_FILLMODE(Np->Num)) { - if (IS_DECIMAL(Np->Num)) + /* + * Only call get_last_relevant_decnum if the number has fractional + * digits, otherwise the result is bogus. + */ + if (IS_DECIMAL(Np->Num) && Np->Num->post > 0) Np->last_relevant = get_last_relevant_decnum( Np->number + ((Np->Num->zero_end - Np->num_pre > 0) ?
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out index d9927b7..e12ab5b 100644 --- a/src/test/regress/expected/numeric.out +++ b/src/test/regress/expected/numeric.out @@ -1154,6 +1154,12 @@ SELECT '' AS to_char_23, to_char(val, '9.999')FROM num_data; | -2.493e+07 (10 rows) +SELECT '' AS to_char_24, to_char('100'::numeric, 'FM999.'); + to_char_24 | to_char ++- +| 100 +(1 row) + -- TO_NUMBER() -- SELECT '' AS to_number_1, to_number('-34,338,492', '99G999G999'); diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql index a1435ec..d552526 100644 --- a/src/test/regress/sql/numeric.sql +++ b/src/test/regress/sql/numeric.sql @@ -764,6 +764,8 @@ SELECT '' AS to_char_21, to_char(val, '99SG99') FROM num_data; SELECT '' AS to_char_22, to_char(val, 'FM.999') FROM num_data; SELECT '' AS to_char_23, to_char(val, '9.999')FROM num_data; +SELECT '' AS to_char_24, to_char('100'::numeric, 'FM999.'); + -- TO_NUMBER() -- SELECT '' AS to_number_1, to_number('-34,338,492', '99G999G999'); -- 1.7.6.1
Re: [HACKERS] Large C files
On Tue, Sep 6, 2011 at 9:14 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 7 September 2011 01:18, Bruce Momjian br...@momjian.us wrote: I am confused how moving a function from one C file to another could cause breakage? I'm really concerned about silent breakage, however unlikely - that is also Simon and Robert's concern, and rightly so. If it's possible in principle to guard against a certain type of problem, we should do so. While this is a mechanical process, it isn't quite that mechanical a process - I would not expect this work to be strictly limited to simply spreading some functions around different files. Certainly, if there are any other factors at all that could disrupt things, a problem that does not cause a compiler warning or error is vastly more troublesome than one that does, and the most plausible such error is if a symbol is somehow resolved differently when the function is moved. I suppose that the simplest way that this could happen is probably by somehow having a variable of the same name and type appear twice, once as a static, the other time as a global. IMHO, when manipulating code at this sort of macro scale, it's good to be paranoid and exhaustive, particularly when it doesn't cost you anything anyway. Unlike when writing or fixing a bit of code at a time, which we're all used to, if the compiler doesn't tell you about it, your chances of catching the problem before a bug manifests itself are close to zero. I was less concerned about the breakage that might be caused by variables acquiring unintended referents - which should be unlikely anyway given reasonable variable naming conventions - and more concerned that the associated refactoring would break recovery. We have no recovery regression tests; that's not a good thing. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Large C files
Robert Haas wrote: IMHO, when manipulating code at this sort of macro scale, it's good to be paranoid and exhaustive, particularly when it doesn't cost you anything anyway. Unlike when writing or fixing a bit of code at a time, which we're all used to, if the compiler doesn't tell you about it, your chances of catching the problem before a bug manifests itself are close to zero. I was less concerned about the breakage that might be caused by variables acquiring unintended referents - which should be unlikely anyway given reasonable variable naming conventions - and more concerned that the associated refactoring would break recovery. We have no recovery regression tests; that's not a good thing. So we are talking about more than moving functions between files? Yes, it would be risky to restructure functions, but for someone who understands that code, it might be safe. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] custom variables and PGC_SUSET issue
Alvaro Herrera alvhe...@commandprompt.com writes: Another thing we detected some days ago is that a custom variable in a module not loaded by postmaster, does not seem to get reported appropriately when an invalid value is set on postgresql.conf and reloaded: backends report problems with DEBUG3, only postmaster uses LOG, but since postmaster isn't loading the module, nothing happens. This is just an instance of the general problem that the current design assumes all processes will have the same opinion about the validity of settings read from postgresql.conf. We discussed that back in July: http://archives.postgresql.org/pgsql-hackers/2011-07/msg00850.php but it wasn't clear to me that any consensus had been reached about how to change the behavior. I proposed that we should let processes adopt individual settings even if they thought other ones were invalid; that gets rid of some of the issues, but it doesn't really address how we should report problems when only some of the processes think there's a problem. I don't think we can just s/DEBUG3/LOG/, because of the log clutter that will result when they all think there's a problem. regards, tom lane
Re: [HACKERS] custom variables and PGC_SUSET issue
Pavel Stehule pavel.steh...@gmail.com writes: When we have a custom SUSET variable, like plpgsql.variable_conflikt, then setting this variable before plpgsql initalisation raises a exception, but it raise a exception when some plpgsql function is created. Try to execute a attached script - a set statement is ok, but CREATE FUNCTION fails. You can't set a custom SUSET variable in advance of loading the module, because the system has no way to know it should enforce superuser restrictions on setting it. For the most part, this is a good reason to avoid custom SUSET variables. regards, tom lane
Re: [HACKERS] custom variables and PGC_SUSET issue
On Wed, Sep 7, 2011 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Pavel Stehule pavel.steh...@gmail.com writes: When we have a custom SUSET variable, like plpgsql.variable_conflikt, then setting this variable before plpgsql initalisation raises a exception, but it raise a exception when some plpgsql function is created. Try to execute a attached script - a set statement is ok, but CREATE FUNCTION fails. You can't set a custom SUSET variable in advance of loading the module, because the system has no way to know it should enforce superuser restrictions on setting it. For the most part, this is a good reason to avoid custom SUSET variables. Apparently we haven't taken that advice ourselves? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] custom variables and PGC_SUSET issue
Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: You can't set a custom SUSET variable in advance of loading the module, because the system has no way to know it should enforce superuser restrictions on setting it. For the most part, this is a good reason to avoid custom SUSET variables. Apparently we haven't taken that advice ourselves? The ones in auto_explain and pg_stat_statements aren't too problematic because those modules are designed to be preloaded by the postmaster. We've avoided adding such variables in modules that aren't intended to be used that way. regards, tom lane
Re: [HACKERS] custom variables and PGC_SUSET issue
On Wed, Sep 7, 2011 at 2:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 1:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: You can't set a custom SUSET variable in advance of loading the module, because the system has no way to know it should enforce superuser restrictions on setting it. For the most part, this is a good reason to avoid custom SUSET variables. Apparently we haven't taken that advice ourselves? The ones in auto_explain and pg_stat_statements aren't too problematic because those modules are designed to be preloaded by the postmaster. We've avoided adding such variables in modules that aren't intended to be used that way. What about plpgsql.variable_conflict? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Large C files
Bruce Momjian br...@momjian.us writes: Robert Haas wrote: I was less concerned about the breakage that might be caused by variables acquiring unintended referents - which should be unlikely anyway given reasonable variable naming conventions - and more concerned that the associated refactoring would break recovery. We have no recovery regression tests; that's not a good thing. So we are talking about more than moving functions between files? Yes, it would be risky to restructure functions, but for someone who understands that code, it might be safe. The pgrminclude-induced bug you just fixed shows a concrete way in which moving code from one file to another might silently break it, ie, it still compiles despite lack of definition of some symbol it's intended to see. Having said that, I tend to agree that xlog.c is getting so large and messy that it needs to be broken up. But I'm not in favor of breaking up files just because they're large, eg, ruleutils.c is not in need of such treatment. The problem with xlog.c is that it seems to be dealing with many more considerations than it originally did. regards, tom lane
Re: [HACKERS] custom variables and PGC_SUSET issue
Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 2:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: The ones in auto_explain and pg_stat_statements aren't too problematic because those modules are designed to be preloaded by the postmaster. We've avoided adding such variables in modules that aren't intended to be used that way. What about plpgsql.variable_conflict? That one's not really meant to be changed on the fly either; we have #variable_conflict instead. The reason why this is hard, and not just a bug to be fixed, is that it's not clear what to do. Part of the issue is that we don't remember whether the current setting was applied by a superuser or not, but even if we did remember that, what happens at LOAD time if it wasn't? Silently replacing the value isn't appealing, and neither are the other options. It's better to not have such a variable in the first place. regards, tom lane
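Tom's point — that nothing records who performed the pre-LOAD SET — can be sketched with a toy model. This is plain C with hypothetical names, not guc.c source; it only models the information that is (and is not) available at define time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy model of the placeholder problem. A SET on a not-yet-defined custom
 * variable can only record the value; there is no field remembering whether
 * a superuser issued it.
 */
typedef struct
{
	char		value[64];
	bool		has_value;
	/* note: nothing here records the privileges of whoever did the SET */
} Placeholder;

static void
placeholder_set(Placeholder *p, const char *val)
{
	/* accepted unconditionally: the variable's real context is unknown */
	snprintf(p->value, sizeof(p->value), "%s", val);
	p->has_value = true;
}

/*
 * When the module later defines the variable as SUSET, a pre-existing value
 * cannot be proven to come from a superuser, so adopting it silently,
 * discarding it silently, and erroring out are all unattractive -- which is
 * why the advice is to avoid custom SUSET variables entirely.
 */
static bool
suset_define_can_adopt(const Placeholder *p)
{
	return !p->has_value;
}
```

The model makes the dilemma concrete: by the time the defining module loads, the only honest answers are "no prior value" (safe) or "prior value of unknown provenance" (no good option).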
Re: [HACKERS] [PATCH] Don't truncate integer part in to_char for 'FM99.'
Marti Raudsepp ma...@juffo.org writes: This patch fixes an edge case bug in the numeric to_char() function. When the numeric to_char format used fillmode (FM), didn't contain 0s and had a trailing dot, the integer part of the number was truncated in error. to_char(10, 'FM99.') used to return '1', after this patch it will return '10' Hmm. I agree that this is a bug, but the proposed fix seems like a bit of a kluge. Wouldn't it be better to make get_last_relevant_decnum honor its contract, that is not delete any relevant digits? I'm thinking instead of this if (!p) p = num; when there is no decimal point it should do something like if (!p) return num + strlen(num) - 1; regards, tom lane
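Tom's suggestion can be sketched as a standalone version of get_last_relevant_decnum(). This is adapted from formatting.c for illustration only; the early return is the proposed change.

```c
#include <assert.h>
#include <string.h>

/*
 * Standalone sketch of formatting.c's get_last_relevant_decnum() with the
 * suggested fix: when there is no decimal point, every digit of the integer
 * part is relevant, so return a pointer to the last character instead of
 * scanning from the start (the old behavior let trailing zeros of the
 * integer part be treated as removable, turning '10' into '1').
 */
static char *
get_last_relevant_decnum(char *num)
{
	char	   *result,
			   *p = strchr(num, '.');

	if (!p)
		return num + strlen(num) - 1;	/* no fraction: keep all digits */

	result = p;
	while (*(++p))
	{
		if (*p != '0')
			result = p;			/* last non-zero fractional digit */
	}
	return result;
}
```

For "10" this now returns a pointer to the final '0', and for "10.500" it still points at the '5', so only trailing fractional zeros are dropped.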
Re: [HACKERS] [GENERAL] pg_upgrade problem
Bruce Momjian wrote: Tom Lane wrote: hubert depesz lubaczewski dep...@depesz.com writes: Worked a bit to get the ltree problem down to smallest possible, repeatable, situation. I looked at this again and verified that indeed, commit 8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible change into the on-disk format of ltree columns: it widened ltree_level.len, which is one component of an ltree on disk. So the crash is hardly surprising. I think that the only thing pg_upgrade could do about it is refuse to upgrade when ltree columns are present in an 8.3 database. I'm not sure though how you'd identify contrib/ltree versus some random user-defined type named ltree. It is actually easy to do using the attached patch. I check for the functions that support the data type and check if they come from an 'ltree' shared object. I don't check actual user table type names in this case. Attached patch applied to 9.0, 9.1, and HEAD. Doc changes included. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true.
+ diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index 37c38c1..720f130 *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *** check_old_cluster(migratorContext *ctx, *** 72,77 --- 72,78 { old_8_3_check_for_name_data_type_usage(ctx, CLUSTER_OLD); old_8_3_check_for_tsquery_usage(ctx, CLUSTER_OLD); + old_8_3_check_ltree_usage(ctx, CLUSTER_OLD); if (ctx->check) { old_8_3_rebuild_tsvector_tables(ctx, true, CLUSTER_OLD); diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h new file mode 100644 index 41c4b11..7a02fa1 *** a/contrib/pg_upgrade/pg_upgrade.h --- b/contrib/pg_upgrade/pg_upgrade.h *** void old_8_3_check_for_name_data_type_us *** 394,399 --- 394,401 Cluster whichCluster); void old_8_3_check_for_tsquery_usage(migratorContext *ctx, Cluster whichCluster); + void old_8_3_check_ltree_usage(migratorContext *ctx, + Cluster whichCluster); void old_8_3_rebuild_tsvector_tables(migratorContext *ctx, bool check_mode, Cluster whichCluster); void old_8_3_invalidate_hash_gin_indexes(migratorContext *ctx, diff --git a/contrib/pg_upgrade/version_old_8_3.c b/contrib/pg_upgrade/version_old_8_3.c new file mode 100644 index 6fcd61b..7e3a7aa *** a/contrib/pg_upgrade/version_old_8_3.c --- b/contrib/pg_upgrade/version_old_8_3.c *** old_8_3_check_for_tsquery_usage(migrator *** 204,209 --- 204,289 /* + * old_8_3_check_ltree_usage() + * 8.3 -> 8.4 + * The internal ltree structure was changed in 8.4 so upgrading is impossible. + */ + void + old_8_3_check_ltree_usage(migratorContext *ctx, Cluster whichCluster) + { + ClusterInfo *active_cluster = (whichCluster == CLUSTER_OLD) ? + ctx->old : ctx->new; + int dbnum; + FILE *script = NULL; + bool found = false; + char output_path[MAXPGPATH]; + + prep_status(ctx, "Checking for /contrib/ltree"); + + snprintf(output_path, sizeof(output_path), "%s/contrib_ltree.txt", + ctx->cwd); + + for (dbnum = 0; dbnum < active_cluster->dbarr.ndbs; dbnum++) + { + PGresult *res; + bool db_used = false; + int ntups; + int rowno; + int i_nspname, + i_proname; + DbInfo *active_db = active_cluster->dbarr.dbs[dbnum]; + PGconn *conn = connectToServer(ctx, active_db->db_name, whichCluster); + + /* Find any functions coming from contrib/ltree */ + res = executeQueryOrDie(ctx, conn, + "SELECT n.nspname, p.proname " + "FROM pg_catalog.pg_proc p, " + "pg_catalog.pg_namespace n " + "WHERE p.pronamespace = n.oid AND " + "p.probin = '$libdir/ltree'"); + + ntups = PQntuples(res); + i_nspname = PQfnumber(res, "nspname"); + i_proname = PQfnumber(res, "proname"); + for (rowno = 0; rowno < ntups; rowno++) + { + found = true; + if (script == NULL && (script = fopen(output_path, "w")) == NULL) + pg_log(ctx, PG_FATAL, "Could not create necessary file: %s\n", output_path); + if (!db_used) + { + fprintf(script, "Database: %s\n", active_db->db_name); + db_used = true; + } + fprintf(script, "%s.%s\n", + PQgetvalue(res, rowno, i_nspname), + PQgetvalue(res, rowno, i_proname)); + } + + PQclear(res); + + PQfinish(conn); + } + + if (found) + { + fclose(script); + pg_log(ctx, PG_REPORT, "fatal\n"); + pg_log(ctx, PG_FATAL, + "| Your installation contains the \"ltree\" data type. This data type\n" + "| changed its internal storage format between your old and new clusters so this\n" + "| cluster cannot currently be upgraded. You can manually upgrade databases\n" + "| that use \"contrib/ltree\" facilities and remove \"contrib/ltree\" from the old\n" + "| cluster and restart the
Re: [HACKERS] custom variables and PGC_SUSET issue
Excerpts from Tom Lane's message of mié sep 07 14:49:45 -0300 2011: Alvaro Herrera alvhe...@commandprompt.com writes: Another thing we detected some days ago is that a custom variable in a module not loaded by postmaster, does not seem to get reported appropriately when an invalid value is set on postgresql.conf and reloaded: backends report problems with DEBUG3, only postmaster uses LOG, but since postmaster isn't loading the module, nothing happens. This is just an instance of the general problem that the current design assumes all processes will have the same opinion about the validity of settings read from postgresql.conf. We discussed that back in July: http://archives.postgresql.org/pgsql-hackers/2011-07/msg00850.php but it wasn't clear to me that any consensus had been reached about how to change the behavior. I proposed that we should let processes adopt individual settings even if they thought other ones were invalid; that gets rid of some of the issues, but it doesn't really address how we should report problems when only some of the processes think there's a problem. Yeah; in this particular case, the value is just plain invalid for everybody. I think it just happens that postmaster didn't see it because it was valid when it was started (i.e. the file got edited and a SIGHUP sent, introducing the invalid value but not adopted anywhere). I don't think we can just s/DEBUG3/LOG/, because of the log clutter that will result when they all think there's a problem. I was thinking that the solution would be to promote DEBUG3 to LOG in the case of a custom variable. This would be noisy as you say, but I don't see any other option; at least it would just be the custom variables. This didn't work for reasons that we haven't investigated yet (we discovered this late last week and I've been busy with 9.1 translations). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] custom variables and PGC_SUSET issue
On Wed, Sep 7, 2011 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 2:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: The ones in auto_explain and pg_stat_statements aren't too problematic because those modules are designed to be preloaded by the postmaster. We've avoided adding such variables in modules that aren't intended to be used that way. What about plpgsql.variable_conflict? That one's not really meant to be changed on the fly either; we have #variable_conflict instead. The reason why this is hard, and not just a bug to be fixed, is that it's not clear what to do. Part of the issue is that we don't remember whether the current setting was applied by a superuser or not, but even if we did remember that, what happens at LOAD time if it wasn't? Silently replacing the value isn't appealing, and neither are the other options. It's better to not have such a variable in the first place. Yeah, I guess we don't have many good short-term options. I continue to think that this whole facility is mis-designed, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] About the performance of startup after dropping many tables
Excerpts from Gan Jiadong's message of vie feb 18 03:42:02 -0300 2011: Hi, Thanks for your reply. Of course, we will think about whether dropping 4 relations is reasonable. In fact, this happens in a very special scenario. But when we analyzed this issue, we found the PG code could be rewritten to achieve better performance; put another way, the algorithm in this part is not good enough. For example, by doing the refactoring as we did, the startup time was reduced from 3 minutes to 8 seconds. That is quite a great improvement, especially for systems with a low TTR (time to recovery) requirement. Is there any problem or risk in changing this part of the code as we suggested? The only way to know would be to show the changes. If you were to submit the patch, and assuming we agree on the design and implementation, we could even consider including it (or, more likely, some derivative of it). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] custom variables and PGC_SUSET issue
On 09/07/2011 03:18 PM, Robert Haas wrote: Yeah, I guess we don't have many good short-term options. I continue to think that this whole facility is mis-designed, though. I agree. I have tripped over it several times. cheers andrew
Re: [HACKERS] custom variables and PGC_SUSET issue
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mié sep 07 14:49:45 -0300 2011: I don't think we can just s/DEBUG3/LOG/, because of the log clutter that will result when they all think there's a problem. I was thinking that the solution would be to promote DEBUG3 to LOG in the case of a custom variable. This would be noisy as you say, but I don't see any other option; at least it would just be the custom variables. That's not much of a fix because (a) the behavior is still pretty undesirable, and (b) it supposes that this sort of failure can only occur for custom variables. The previous discussion arose from a different case entirely --- IIRC, it was from trying to specify client_encoding in postgresql.conf, which the postmaster was happy with but some backends were not because they had a database_encoding for which there was no conversion. There are probably a bunch of other possibilities out there that we haven't hit yet, and if not today, there certainly will be more in the future. So I'm not very excited by a proposed fix that makes assumptions about the exact reason why a process rejects a particular GUC value. regards, tom lane
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
Excerpts from Robert Haas's message of mar sep 06 19:57:07 -0300 2011: On Tue, Sep 6, 2011 at 6:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Sep 6, 2011 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: And I doubt that the goal is worth taking risks for. I am unable to count the number of times that I have had a customer come to me and say well, the backend crashed. And I go look at their logs and I have no idea what happened. gdb and print debug_query_string? Surely you're kidding. These are customer systems which I frequently don't even have access to. They don't always have gdb installed (sometimes they are Windows systems) and if they do the customer isn't likely to know how to use it, and even if they do they don't think the better of us for needing such a tool to troubleshoot a crash. I'm with Robert on this ... been there, got that look. TBH, I'm very unclear what could cause the postmaster to go belly-up copying a bounded amount of data out of shared memory for logging purposes only. It's surely possible to make the code safe against any sequence of bytes that might be found there. A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
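A bounded, byte-safe copy of the kind Alvaro alludes to could look like the sketch below. This is a hypothetical helper, not the patch's actual code; the masking step is one way to sidestep encoding conversion entirely, which is the concern raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy at most maxlen-1 bytes of a possibly
 * unterminated, possibly garbage activity string out of shared memory.
 * The caller passes the fixed size of the shared-memory field (srclen),
 * the destination is always NUL-terminated, and any byte outside
 * printable ASCII is masked with '?', so the log line never depends on
 * an encoding conversion of untrusted bytes.
 */
static void
copy_activity_for_log(char *dst, const char *src, size_t srclen, size_t maxlen)
{
	size_t		i;

	for (i = 0; i < srclen && i + 1 < maxlen && src[i] != '\0'; i++)
	{
		unsigned char c = (unsigned char) src[i];

		dst[i] = (c >= 32 && c < 127) ? (char) c : '?';
	}
	dst[i] = '\0';			/* terminate even if src never did */
}
```

Because both the source length and the destination length bound the loop, a crashed backend that scribbled over its activity buffer can at worst produce a line of question marks, never a postmaster crash.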
Re: [HACKERS] Large C files
On Wed, Sep 7, 2011 at 7:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: Robert Haas wrote: I was less concerned about the breakage that might be caused by variables acquiring unintended referents - which should be unlikely anyway given reasonable variable naming conventions - and more concerned that the associated refactoring would break recovery. We have no recovery regression tests; that's not a good thing. So we are talking about more than moving functions between files? Yes, it would be risky to restructure functions, but for someone who understands that code, it might be safe. The pgrminclude-induced bug you just fixed shows a concrete way in which moving code from one file to another might silently break it, ie, it still compiles despite lack of definition of some symbol it's intended to see. Having said that, I tend to agree that xlog.c is getting so large and messy that it needs to be broken up. But I'm not in favor of breaking up files just because they're large, eg, ruleutils.c is not in need of such treatment. The problem with xlog.c is that it seems to be dealing with many more considerations than it originally did. I agree as well, though we've spawned many new files and directories in the last 7 years. Almost nothing has gone in there that didn't need to. As long as we accept it's not a priority, I'll do some work to slide things away and we can do it over time. Please let's not waste effort on refactoring in mid dev cycle. Having this done by someone without good experience is just going to waste all of our time and sneak bugs into something that does actually work rather well. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] problem of win32.mak
Hiroshi Saito wrote: Hi Magnus-san and Bruce-san. I am sorry for the very late reaction... Is it enough for a 9.1 release? libpq of bcc32 and win32 has a problem.
== error of nmake ==
? .\Release\libpqdll.lib ??? .\Release\libpqdll.exp
libpq.lib(fe-connect.obj) : error LNK2019: ?? _inet_net_ntop ??? _connectFailureMessage ?
libpq.lib(getaddrinfo.obj) : error LNK2001: ?? _inet_net_ntop ???
libpq.lib(fe-connect.obj) : error LNK2019: ?? _pg_get_encoding_from_locale ??? _PQsetClientEncoding ?
.\Release\libpq.dll : fatal error LNK1120: 2 ???
== end ==
Please take this into consideration. Applied and backpatched to 9.1. Thanks. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of mar sep 06 19:57:07 -0300 2011: TBH, I'm very unclear what could cause the postmaster to go belly-up copying a bounded amount of data out of shared memory for logging purposes only. It's surely possible to make the code safe against any sequence of bytes that might be found there. A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). That, and the question of exactly what makes the amount bounded, and probably six other things that could go wrong. But I'm sure Andrew won't be pleased with a proposal to inject unknown-encoding data into the logs. I remain of the opinion that this needs to be kept out of the postmaster. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
On Wed, Sep 7, 2011 at 3:42 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: TBH, I'm very unclear what could cause the postmaster to go belly-up copying a bounded amount of data out of shared memory for logging purposes only. It's surely possible to make the code safe against any sequence of bytes that might be found there. A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). It's not really a problem for the postmaster if something just plain old fails. Where we get into trouble is if it manages to (a) crash, (b) take an excessive amount of time to complete, or (c) screw up the postmaster state in some way we can't recover from. But if any of those are an issue then, yeah, just shut it off. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Large C files
Simon Riggs si...@2ndquadrant.com writes: Please let's not waste effort on refactoring efforts in mid dev cycle. Say what? When else would you have us do it? regards, tom lane
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
On Wed, Sep 7, 2011 at 4:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Robert Haas's message of mar sep 06 19:57:07 -0300 2011: TBH, I'm very unclear what could cause the postmaster to go belly-up copying a bounded amount of data out of shared memory for logging purposes only. It's surely possible to make the code safe against any sequence of bytes that might be found there. A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). That, and the question of exactly what makes the amount bounded, and The fact that it's a fixed-size chunk of shmem. We won't copy more bytes than the size of the buffer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
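The bounded copy Robert describes can be sketched as follows. This is illustrative only, not code from the patch: the slot name and size are hypothetical. The key point is that the copy never reads past the fixed-size shared-memory slot and always NUL-terminates, since a crashed backend may have left the slot without a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a possibly-garbage activity string out of a fixed-size shared
 * memory slot.  Never reads more than slotsize bytes from the slot and
 * always NUL-terminates the destination (dst must be at least slotsize
 * bytes).  Names and sizes here are hypothetical, not from the patch.
 */
void
copy_activity_string(char *dst, const char *shmem_slot, size_t slotsize)
{
    size_t      i;

    for (i = 0; i < slotsize - 1; i++)
    {
        if (shmem_slot[i] == '\0')
            break;
        dst[i] = shmem_slot[i];
    }
    dst[i] = '\0';              /* forced terminator, even if the slot had none */
}
```

Whatever bytes the crashed backend left behind, the postmaster-side copy is bounded by the slot size, which is what makes the amount "bounded" in the sense discussed above.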
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 3:42 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). It's not really a problem for the postmaster if something just plain old fails. Where we get into trouble is if it manages to (a) crash, (b) take an excessive amount of time to complete, or (c) screw up the postmaster state in some way we can't recover from. But if any of those are an issue then, yeah, just shut it off. Keep in mind that in the postmaster, elog(ERROR) *is* a crash. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 07, 2011 at 10:20:24AM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: After spending some time staring at the code, I do have one idea as to what might be going on here. When a backend is terminated, ShutdownPostgres() calls AbortOutOfAnyTransaction() and then LockReleaseAll(USER_LOCKMETHOD, true). The second call releases all user locks, and the first one (or so we suppose) releases all normal locks as part of aborting the transaction. But what if there's no transaction in progress? In that case, AbortOutOfAnyTransaction() won't do anything - which is fine as far as transaction-level locks go, because we shouldn't be holding any of those anyway if there's no transaction in progress. However, if we hold a session-level lock at that point, then we'd orphan it. We don't make much use of session locks. As far as I can see, they are used by (1) VACUUM, (2) CREATE INDEX CONCURRENTLY, (3) ALTER DATABASE .. SET TABLESPACE, and (4) on standby servers, redo of DROP DATABASE actions. Any chance one of those died or was killed off around the time this happened? I don't believe this theory at all, because if that were the issue, we'd have heard about it long since. The correct theory has to involve a very-little-used triggering condition. At the moment I'm wondering about advisory (userspace) locks ... Dave, do your apps use any of those? Yes, we make extensive use of advisory locks. That was my thought too when Robert mentioned session level locks. I'm happy to add any additional instrumentation, but my client would be happier to actually run it if there was a way to recover from this without an unplanned outage. Is there something I can do when the patch detects the problem to be able to continue without a restart? Is it safe to just reset the proclock queue? I don't think they would mind leaking locks, for instance, and a later planned restart to clear it up as much as they mind unscheduled downtime.
Thanks, -dg -- David Gould da...@sonic.net 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 7, 2011 at 4:22 PM, daveg da...@sonic.net wrote: Yes, we make extensive use of advisory locks. That was my thought too when Robert mentioned session level locks. I'm happy to add any additional instrumentation, but my client would be happier to actually run it if there was a way to recover from this without an unplanned outage. Is there something I can do when the patch detects the problem to be able to continue without a restart? Well, basically, you want to do the same thing that LockRelease() would do - remove the PROCLOCK from the SHM_QUEUE and delete it from the hash table, adjusting the counts in the LOCK object as appropriate. If you just ignore the failure, you'll get the "blah blah blah is already held" messages you were having before. Tom's right to be skeptical of my theory, because it would require a CHECK_FOR_INTERRUPTS() outside of a transaction block in one of the pathways that use session-level locks, and I can't find one. OTOH, I'm skeptical of the theory that this involves userlocks, because this whole thread started because of a complaint about lock 0/1260/0 already being held. That ain't no userlock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
On Wed, Sep 7, 2011 at 4:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 3:42 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). It's not really a problem for the postmaster if something just plain old fails. Where we get into trouble is if it manages to (a) crash, (b) take an excessive amount of time to complete, or (c) screw up the postmaster state in some way we can't recover from. But if any of those are an issue then, yeah, just shut it off. Keep in mind that in the postmaster, elog(ERROR) *is* a crash. Right... but a function that returns -1 to indicate that something didn't work should be OK, as long as whatever it does is otherwise extremely boring. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Don't truncate integer part in to_char for 'FM99.'
On Wed, Sep 7, 2011 at 21:37, Tom Lane t...@sss.pgh.pa.us wrote: Hmm. I agree that this is a bug, but the proposed fix seems like a bit of a kluge. Wouldn't it be better to make get_last_relevant_decnum honor its contract, that is not delete any relevant digits? You're right, it was a kludge. Here's an improved version. I need to take a NUMProc* argument to do that right, because that's how it knows how many '0's to keep from the format. What do you think? Regards, Marti

From d0264d8fe8179716bfd58e12201e456387ca5469 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Wed, 7 Sep 2011 20:12:58 +0300
Subject: [PATCH] Don't truncate integer part in to_char for 'FM99.'

When the numeric to_char format used fillmode (FM), didn't contain 0s
and had a trailing dot, the integer part of the number was truncated in
error.

to_char(10, 'FM99.') used to return '1', now it will return '10'
---
 src/backend/utils/adt/formatting.c    | 29 -
 src/test/regress/expected/numeric.out |  6 ++
 src/test/regress/sql/numeric.sql      |  2 ++
 3 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 7efd988..eada4a8 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -979,7 +979,7 @@ static char *fill_str(char *str, int c, int max);
 static FormatNode *NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree);
 static char *int_to_roman(int number);
 static void NUM_prepare_locale(NUMProc *Np);
-static char *get_last_relevant_decnum(char *num);
+static char *get_last_relevant_decnum(NUMProc *Np);
 static void NUM_numpart_from_char(NUMProc *Np, int id, int plen);
 static void NUM_numpart_to_char(NUMProc *Np, int id);
 static char *NUM_processor(FormatNode *node, NUMDesc *Num, char *inout, char *number,
@@ -3911,17 +3911,26 @@ NUM_prepare_locale(NUMProc *Np)
  * --
  */
 static char *
-get_last_relevant_decnum(char *num)
+get_last_relevant_decnum(NUMProc *Np)
 {
 	char	   *result,
-			   *p = strchr(num, '.');
+			   *p;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "get_last_relevant_decnum()");
 #endif
 
-	if (!p)
-		p = num;
+	if (Np->Num->zero_end > Np->num_pre)
+		/* Keep digits according to '0's in the format */
+		p = Np->number + Np->Num->zero_end - Np->num_pre;
+	else
+	{
+		p = strchr(Np->number, '.');
+
+		if (!p)
+			return NULL;
+	}
+
 	result = p;
 
 	while (*(++p))
@@ -4458,14 +4467,8 @@ NUM_processor(FormatNode *node, NUMDesc *Num, char *inout, char *number,
 	{
 		Np->num_pre = plen;
 
-		if (IS_FILLMODE(Np->Num))
-		{
-			if (IS_DECIMAL(Np->Num))
-				Np->last_relevant = get_last_relevant_decnum(
-										Np->number +
-										((Np->Num->zero_end - Np->num_pre > 0) ?
-										 Np->Num->zero_end - Np->num_pre : 0));
-		}
+		if (IS_FILLMODE(Np->Num) && IS_DECIMAL(Np->Num))
+			Np->last_relevant = get_last_relevant_decnum(Np);
 
 		if (Np->sign_wrote == FALSE && Np->num_pre == 0)
 			++Np->num_count;
diff --git a/src/test/regress/expected/numeric.out b/src/test/regress/expected/numeric.out
index d9927b7..e12ab5b 100644
--- a/src/test/regress/expected/numeric.out
+++ b/src/test/regress/expected/numeric.out
@@ -1154,6 +1154,12 @@ SELECT '' AS to_char_23, to_char(val, '9.999')	FROM num_data;
  | -2.493e+07
 (10 rows)
 
+SELECT '' AS to_char_24, to_char('100'::numeric, 'FM999.');
+ to_char_24 | to_char
+------------+---------
+            | 100
+(1 row)
+
 -- TO_NUMBER()
 --
 SELECT '' AS to_number_1, to_number('-34,338,492', '99G999G999');
diff --git a/src/test/regress/sql/numeric.sql b/src/test/regress/sql/numeric.sql
index a1435ec..d552526 100644
--- a/src/test/regress/sql/numeric.sql
+++ b/src/test/regress/sql/numeric.sql
@@ -764,6 +764,8 @@ SELECT '' AS to_char_21, to_char(val, '99SG99') FROM num_data;
 SELECT '' AS to_char_22, to_char(val, 'FM.999') FROM num_data;
 SELECT '' AS to_char_23, to_char(val, '9.999')	FROM num_data;
 
+SELECT '' AS to_char_24, to_char('100'::numeric, 'FM999.');
+
 -- TO_NUMBER()
 --
 SELECT '' AS to_number_1, to_number('-34,338,492', '99G999G999');
-- 
1.7.6.1
Re: [HACKERS] [PATCH] Don't truncate integer part in to_char for 'FM99.'
Marti Raudsepp ma...@juffo.org writes: On Wed, Sep 7, 2011 at 21:37, Tom Lane t...@sss.pgh.pa.us wrote: Hmm. I agree that this is a bug, but the proposed fix seems like a bit of a kluge. Wouldn't it be better to make get_last_relevant_decnum honor its contract, that is not delete any relevant digits? You're right, it was a kludge. Here's an improved version. I need to take a NUMProc* argument to do that right, because that's how it knows how many '0's to keep from the format. Yeah, after fooling with it myself I saw that it was the interconnection of the don't-suppress-'0'-format-positions logic with the find-the-last-nonzero-digit logic that was confusing matters. (formatting.c may not be the most spaghetti-ish code I've ever worked with, but it's up there.) However, I don't think that inserting knowledge of the other consideration into get_last_relevant_decnum is really the way to make things cleaner. Also, the way yours is set up, I'm dubious that it does the right thing when the last '0' specifier is to the left of the decimal point. I'm currently testing this patch:

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 7efd988362889346af86c642f6ee660a4ae1b3d2..a7000250b0363165bee5697ad72036aad28b830e 100644
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
*************** NUM_prepare_locale(NUMProc *Np)
*** 3908,3913 ****
--- 3908,3916 ----
  /* --
   * Return pointer of last relevant number after decimal point
   *	12.0500 --> last relevant is '5'
+  *	12. --> last relevant is '.'
+  * If there is no decimal point, return NULL (which will result in same
+  * behavior as if FM hadn't been specified).
   * --
   */
  static char *
*************** get_last_relevant_decnum(char *num)
*** 3921,3927 ****
  #endif
  
  	if (!p)
! 		p = num;
  
  	result = p;
  
  	while (*(++p))
--- 3924,3931 ----
  #endif
  
  	if (!p)
! 		return NULL;
! 
  	result = p;
  
  	while (*(++p))
*************** NUM_processor(FormatNode *node, NUMDesc
*** 4458,4470 ****
  	{
  		Np->num_pre = plen;
  
! 		if (IS_FILLMODE(Np->Num))
  		{
! 			if (IS_DECIMAL(Np->Num))
! 				Np->last_relevant = get_last_relevant_decnum(
! 										Np->number +
! 										((Np->Num->zero_end - Np->num_pre > 0) ?
! 										 Np->Num->zero_end - Np->num_pre : 0));
  		}
  
  		if (Np->sign_wrote == FALSE && Np->num_pre == 0)
--- 4462,4483 ----
  	{
  		Np->num_pre = plen;
  
! 		if (IS_FILLMODE(Np->Num) && IS_DECIMAL(Np->Num))
  		{
! 			Np->last_relevant = get_last_relevant_decnum(Np->number);
! 
! 			/*
! 			 * If any '0' specifiers are present, make sure we don't strip
! 			 * those digits.
! 			 */
! 			if (Np->last_relevant && Np->Num->zero_end > Np->num_pre)
! 			{
! 				char	   *last_zero;
! 
! 				last_zero = Np->number + (Np->Num->zero_end - Np->num_pre);
! 				if (Np->last_relevant < last_zero)
! 					Np->last_relevant = last_zero;
! 			}
  		}
  
  		if (Np->sign_wrote == FALSE && Np->num_pre == 0)

regards, tom lane
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
Robert Haas robertmh...@gmail.com writes: Tom's right to be skeptical of my theory, because it would require a CHECK_FOR_INTERRUPTS() outside of a transaction block in one of the pathways that use session-level locks, and I can't find one. More to the point: session-level locks are released on error. The only way to get out of a transaction while still holding one is for the VACUUM-or-whichever-command code to deliberately commit and exit while still holding it. An error exit path would release the lock. OTOH, I'm skeptical of the theory that this involves userlocks, because this whole thread started because of a complaint about lock 0/1260/0 already being held. That ain't no userlock. Yeah, and for that matter it seems to let VACUUM off the hook too. If we assume that the reported object ID is non-corrupt (and if it's always the same, that seems like the way to bet) then this is a lock on pg_authid. Hmmm ... could the pathway involve an error exit from client authentication? We're still finding bugs in the 9.0 rewrite of auth-time database access. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
On Wed, Sep 7, 2011 at 22:42, Alvaro Herrera alvhe...@commandprompt.com wrote: A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). Are you referring to log file encoding or something else? The log file is already potentially mixed-encoding, as different databases may have different encodings and backends just dump bytes to it in their current encoding. pg_verify_mbstr() could potentially be used with noError=true if we can figure out the backend's current encoding, but that indeed sounds like something to avoid. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] timezone GUC
Magnus Hagander mag...@hagander.net writes: On Tue, Sep 6, 2011 at 23:52, Robert Haas robertmh...@gmail.com wrote: On Tue, Sep 6, 2011 at 5:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Although there's always more than one way to skin a cat. Consider this idea: 1. The hard-wired default for timezone is always UTC (or something else not dependent on environment). 2. We put the identify_system_timezone work into initdb, and have it inject a non-default entry into postgresql.conf in the usual way if it can identify what the system zone is. 3. Run-time dependency on TZ environment disappears altogether. This basically means that instead of incurring that search on every postmaster start, we do it once at initdb. If you change the postmaster's timezone environment, well, you gotta go change postgresql.conf. Seems reasonable to me... +1. I spent a bit of time on this idea last night. The most painful part actually seems to be translating identify_system_timezone to run in a non-backend environment (no elog, etc). The one thing I've run into that doesn't seem straightforward is to decide where to look for the timezone files. If we have --with-system-tzdata then of course it's a constant, but should we honor initdb's -L switch otherwise? And if so, how should we pass that into the pg_TZDIR code? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
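Concretely, under Tom's step 2 a successful identify_system_timezone run at initdb time would leave an ordinary entry in the generated postgresql.conf, along these lines (the zone name is illustrative):

```
# injected by initdb when it can identify the system time zone;
# otherwise the hard-wired default (UTC) applies
timezone = 'America/New_York'
```

With that in place, the postmaster never needs to consult the TZ environment at startup; changing the server's zone means editing postgresql.conf, as with any other GUC.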
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
Robert Haas robertmh...@gmail.com writes: On Wed, Sep 7, 2011 at 4:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Keep in mind that in the postmaster, elog(ERROR) *is* a crash. Right... but a function that returns -1 to indicate that something didn't work should be OK, as long as whatever it does is otherwise extremely boring. The more functionality you put in the postmaster, the more likely it is to trip over an elog(ERROR) somewhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
Excerpts from Marti Raudsepp's message of mié sep 07 18:09:32 -0300 2011: On Wed, Sep 7, 2011 at 22:42, Alvaro Herrera alvhe...@commandprompt.com wrote: A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). Are you referring to log file encoding or something else? The log file is already potentially mixed-encoding, as different databases may have different encodings and backends just dump bytes to it in their current encoding. I am referring to the fact that whatever the backend puts in shared memory is going to be in its encoding setting, which may not necessarily match the postmaster's. And if it doesn't, the postmaster might try to convert it using settings not valid for the string, possibly leading to crashes. I remember we had bugs whereby an encoding conversion would fail, leading to elog trying to report this problem, but this attempt would also incur a conversion step, failing recursively until elog's stack got full. I'm not saying this is impossible to solve, just something to keep in mind. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 7, 2011 at 4:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, and for that matter it seems to let VACUUM off the hook too. If we assume that the reported object ID is non-corrupt (and if it's always the same, that seems like the way to bet) then this is a lock on pg_authid. Hmmm ... could the pathway involve an error exit from client authentication? We're still finding bugs in the 9.0 rewrite of auth-time database access. Well, according to Dave's report upthread, it's not only this one relation: DG The recent errors are: DG lock AccessShareLock on object 16403/2615/0 is already held DG which is for pg_namespace in database c23. I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The whole mechanism feels a bit rickety to me - we don't actually release locks; we just abort the transaction and *assume* that will cause locks to get released. And it should. But there's a bit more action-at-a-distance there than I'd ideally like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 07, 2011 at 04:55:24PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: Tom's right to be skeptical of my theory, because it would require a CHECK_FOR_INTERRUPTS() outside of a transaction block in one of the pathways that use session-level locks, and I can't find one. More to the point: session-level locks are released on error. The only way to get out of a transaction while still holding one is for the VACUUM-or-whichever-command code to deliberately commit and exit while still holding it. An error exit path would release the lock. OTOH, I'm skeptical of the theory that this involves userlocks, because this whole thread started because of a complaint about lock 0/1260/0 already being held. That ain't no userlock. Yeah, and for that matter it seems to let VACUUM off the hook too. If we assume that the reported object ID is non-corrupt (and if it's always the same, that seems like the way to bet) then this is a lock on pg_authid. Hmmm ... could the pathway involve an error exit from client authentication? We're still finding bugs in the 9.0 rewrite of auth-time database access. It does not seem restricted to pg_authid: 2011-08-24 18:35:57.445 24987 c23 apps ERROR: lock AccessShareLock on object 16403/2615/0 And I think I've seen it on other tables too. -dg -- David Gould da...@sonic.net 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The whole mechanism feels a bit rickety to me - we don't actually release locks; we just abort the transaction and *assume* that will cause locks to get released. Well, transaction abort will call LockReleaseAll, which is carefully coded to clean up the proclock lists regardless of what is in the locallocks table, so I'm not sure why you find that any more rickety than anything else. But maybe it'd be interesting for Dave to stick a LockReleaseAll call into ProcKill() and see if that makes things better. (Dave: test that before you put it in production, I'm not totally sure it's safe.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
daveg da...@sonic.net writes: It does not seem restricted to pg_authid: 2011-08-24 18:35:57.445 24987 c23 apps ERROR: lock AccessShareLock on object 16403/2615/0 And I think I've seen it on other tables too. Hmm. 2615 = pg_namespace, which most likely is the first catalog accessed by just about any SQL command that's going to access tables at all, so I suspect that this is mostly just a "first access failed" thing and not something peculiar to pg_namespace. But we still don't have a clue why the locks are not getting released by the previous owner of the PGPROC slot. Have you trawled your logs to see if there's any sign of any distress at all, shortly before the problem starts to happen? regards, tom lane
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 07, 2011 at 06:35:08PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: It does not seem restricted to pg_authid: 2011-08-24 18:35:57.445 24987 c23 apps ERROR: lock AccessShareLock on object 16403/2615/0 And I think I've seen it on other tables too. Hmm. 2615 = pg_namespace, which most likely is the first catalog accessed by just about any SQL command that's going to access tables at all, so I suspect that this is mostly just a "first access failed" thing and not something peculiar to pg_namespace. But we still don't have a clue why the locks are not getting released by the previous owner of the PGPROC slot. Have you trawled your logs to see if there's any sign of any distress at all, shortly before the problem starts to happen? Will do, but it's a pretty big haystack. Sure wish I knew what the needle looked like. ;-) -dg -- David Gould da...@sonic.net 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 07, 2011 at 06:25:23PM -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The whole mechanism feels a bit rickety to me - we don't actually release locks; we just abort the transaction and *assume* that will cause locks to get released. Well, transaction abort will call LockReleaseAll, which is carefully coded to clean up the proclock lists regardless of what is in the locallocks table, so I'm not sure why you find that any more rickety than anything else. But maybe it'd be interesting for Dave to stick a LockReleaseAll call into ProcKill() and see if that makes things better. (Dave: test that before you put it in production, I'm not totally sure it's safe.) Re safety, what is the worst case here? Also, this is very intermittent, we have seen it only in recent months on both 8.4.7 and 9.0.4 after years of no problems. Lately we see it what feels like a few times a month. Possibly some new application behaviour is provoking it, but I have no guesses as to what. -dg -- David Gould da...@sonic.net 510 536 1443 / 510 282 0869 If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 06:25:23PM -0400, Tom Lane wrote: ... But maybe it'd be interesting for Dave to stick a LockReleaseAll call into ProcKill() and see if that makes things better. (Dave: test that before you put it in production, I'm not totally sure it's safe.) Re safety, what is the worst case here? I think a failure would be pretty obvious --- if it gets through regression tests it's probably fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
daveg da...@sonic.net writes: Also, this is very intermittant, we have seen it only in recent months on both 8.4.7 and 9.0.4 after years of no problems. Lately we see it what feels like a few times a month. Possibly some new application behaviour is provoking it, but I have no guesses as to what. BTW ... what were the last versions you were running on which you had *not* seen the problem? (Just wondering about the possibility that we back-patched some fix that broke things. It would be good to have a version range before trawling the commit logs.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Log crashed backend's query (activity string)
On Wed, Sep 7, 2011 at 5:24 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Marti Raudsepp's message of Wed Sep 07 18:09:32 -0300 2011: On Wed, Sep 7, 2011 at 22:42, Alvaro Herrera alvhe...@commandprompt.com wrote: A mishandled encoding conversion could be problematic, so that needs to be carefully considered (perhaps just shut off unconditionally). Are you referring to log file encoding or something else? The log file is already potentially mixed-encoding, as different databases may have different encodings and backends just dump bytes to it in their current encoding. I am referring to the fact that whatever the backend puts in shared memory is going to be in its encoding setting, which may not necessarily match the postmaster's. And if it doesn't, the postmaster might try to convert it using settings not valid for the string, possibly leading to crashes. I remember we had bugs whereby an encoding conversion would fail, leading to elog trying to report this problem, but this attempt would also incur a conversion step, failing recursively until elog's stack got full. I'm not saying this is impossible to solve, just something to keep in mind. Can we do something like: pass through ASCII characters unchanged, and output anything with the high bit set as \x<hexdigit><hexdigit>? That might be garbled in some cases, but the goal here is not perfection. We're just trying to give the admin (or PostgreSQL-guru-for-hire) a clue where to start looking for the problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Moving core timestamp typedefs/macros somewhere else
In connection with doing this: http://archives.postgresql.org/message-id/22214.1315343...@sss.pgh.pa.us I've run into the problem that tz_acceptable(), which needs to be available to frontend-ish code if initdb is to use it, depends on these symbols: #define UNIX_EPOCH_JDATE 2440588 /* == date2j(1970, 1, 1) */ #define POSTGRES_EPOCH_JDATE 2451545 /* == date2j(2000, 1, 1) */ #define SECS_PER_DAY 86400 which are defined in backend-only include files such as utils/timestamp.h. This immediately brought to mind the pgrminclude fiasco of a couple days ago, which was at least in part due to the fact that utils/timestamp.h got included into some very low-level header files so that they could use typedef TimestampTz. So I think it's time to do something about that. I propose moving the Timestamp/Interval typedefs, as well as some basic associated macros such as the ones mentioned above (basically, lines 25-100 of utils/timestamp.h, plus the DT_NOBEGIN/DT_NOEND stuff, plus the Julian-date macros in datetime.h), into a separate header file that contains no backend-only declarations (eg, not fmgr.h stuff; right offhand I don't think it would depend on anything except c.h). If you believe the idea I suggested a few days ago that we ought to try to push basic typedefs into a separate set of headers, then this could be the first instance of that, which would lead to naming it something like datatype/timestamp.h. If that seems premature, then I guess it ought to go into utils/, but then we need some other name because utils/timestamp.h is taken. Thoughts? regards, tom lane
Re: [HACKERS] Moving core timestamp typedefs/macros somewhere else
On 8 September 2011 10:22, Tom Lane t...@sss.pgh.pa.us wrote: If you believe the idea I suggested a few days ago that we ought to try to push basic typedefs into a separate set of headers, then this could be the first instance of that, which would lead to naming it something like datatype/timestamp.h. If that seems premature, then I guess it ought to go into utils/, but then we need some other name because utils/timestamp.h is taken. The separate headers for basic typedefs make perfect sense to me. Cheers, BJ
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 07, 2011 at 07:39:15PM -0400, Tom Lane wrote: daveg da...@sonic.net writes: Also, this is very intermittent; we have seen it only in recent months on both 8.4.7 and 9.0.4, after years of no problems. Lately we see it what feels like a few times a month. Possibly some new application behaviour is provoking it, but I have no guesses as to what. BTW ... what were the last versions you were running on which you had *not* seen the problem? (Just wondering about the possibility that we back-patched some fix that broke things. It would be good to have a version range before trawling the commit logs.) The first version we saw it on was 8.4.7. -dg -- David Gould da...@sonic.net 510 536 1443  510 282 0869 If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
daveg da...@sonic.net writes: On Wed, Sep 07, 2011 at 07:39:15PM -0400, Tom Lane wrote: BTW ... what were the last versions you were running on which you had *not* seen the problem? (Just wondering about the possibility that we back-patched some fix that broke things. It would be good to have a version range before trawling the commit logs.) The first version we saw it on was 8.4.7. Yeah, you said that. I was wondering what you'd last run before 8.4.7. regards, tom lane
Re: [HACKERS] FATAL: lock AccessShareLock on object 0/1260/0 is already held
On Wed, Sep 7, 2011 at 6:25 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I thought about an error exit from client authentication, and that's a somewhat appealing explanation, but I can't quite see why we wouldn't clean up there the same as anywhere else. The whole mechanism feels a bit rickety to me - we don't actually release locks; we just abort the transaction and *assume* that will cause locks to get released. Well, transaction abort will call LockReleaseAll, which is carefully coded to clean up the proclock lists regardless of what is in the locallocks table, so I'm not sure why you find that any more rickety than anything else. Well, it's very clear that you CAN orphan locks if a backend holding a session lock ever does CHECK_FOR_INTERRUPTS() outside of a transaction. Try the attached patch.

rhaas=# vacuum full foo;
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
The connection to the server was lost. Attempting reset: Succeeded.
rhaas=# vacuum full foo;
ERROR:  lock AccessExclusiveLock on object 16384/1431013/0 is already held

Now, I don't see any evidence of a live bug here (and on further thought it can't be Dave's bug because he is orphaning AccessShareLocks, not AccessExclusiveLocks), but I find this pretty convincing as a demonstration of ricketiness. It is certainly not obvious on its face that a misplaced CHECK_FOR_INTERRUPTS() can result in a backend exiting without cleaning up its locks, and I'd argue it's a bad idea to leave it that way even if there's no user-visible problem there today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

[Attachment: break-vacuum.patch (binary data)]
Re: [HACKERS] force_not_null option support for file_fdw
(2011/09/05 22:05), Kohei Kaigai wrote: In my usual environment that test passed, but finally I've reproduced the failure by setting $LC_COLLATE to es_ES.UTF-8. Have you set any $LC_COLLATE in your test environment? It is not set in my environment. I checked the behavior of ORDER BY when we set a collation on a regular relation, not a foreign table. Are we hitting some other unexpected bug in collation here?

postgres=# CREATE TABLE t1 (word1 text);
CREATE TABLE
postgres=# INSERT INTO t1 VALUES ('ABC'),('abc'),('123'),('NULL');
INSERT 0 4
postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "ja_JP.utf8";
ALTER TABLE
postgres=# SELECT * FROM t1 ORDER BY word1;
 word1
-------
 123
 ABC
 NULL
 abc
(4 rows)

postgres=# ALTER TABLE t1 ALTER word1 TYPE text COLLATE "en_US.utf8";
ALTER TABLE
postgres=# SELECT * FROM t1 ORDER BY word1;
 word1
-------
 123
 abc
 ABC
 NULL
(4 rows)

Thanks for checking. FYI, I mainly use a Fedora 15 box with a Japanese environment for my development. ISTM that your results are reasonable for each collation setting. The former ordering is the same as the C locale, and in the latter case alphabetical order takes priority over case distinctions. Do you mean that the ordering used in file_fdw is affected by something unexpected, without any collation or locale setting? BTW, I found a thread which is related to this issue. http://archives.postgresql.org/pgsql-hackers/2011-09/msg00130.php I changed the test data so that it uses only upper-case letters, because case distinction is not important for that test. I also removed digits to avoid test failures in some locales which sort letters before digits. Regards, -- Shigeru Hanada

diff --git a/contrib/file_fdw/data/text.csv b/contrib/file_fdw/data/text.csv
index ...09827e7 .
*** a/contrib/file_fdw/data/text.csv
--- b/contrib/file_fdw/data/text.csv
***************
*** 0 ****
--- 1,4 ----
+ AAA,AAA
+ XYZ,XYZ
+ NULL,NULL
+ ABC,ABC
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 224e74f..548dcd2 100644
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***************
*** 23,30 ****
--- 23,32 ----
  #include "foreign/fdwapi.h"
  #include "foreign/foreign.h"
  #include "miscadmin.h"
+ #include "nodes/makefuncs.h"
  #include "optimizer/cost.h"
  #include "utils/rel.h"
+ #include "utils/syscache.h"

  PG_MODULE_MAGIC;
*************** static struct FileFdwOption valid_option
*** 57,72 ****
  	{"escape", ForeignTableRelationId},
  	{"null", ForeignTableRelationId},
  	{"encoding", ForeignTableRelationId},
  	/*
  	 * force_quote is not supported by file_fdw because it's for COPY TO.
  	 */
- 	/*
- 	 * force_not_null is not supported by file_fdw. It would need a parser
- 	 * for list of columns, not to mention a way to check the column list
- 	 * against the table.
- 	 */
  	/* Sentinel */
  	{NULL, InvalidOid}
--- 59,70 ----
  	{"escape", ForeignTableRelationId},
  	{"null", ForeignTableRelationId},
  	{"encoding", ForeignTableRelationId},
+ 	{"force_not_null", AttributeRelationId},	/* specified as boolean value */
  	/*
  	 * force_quote is not supported by file_fdw because it's for COPY TO.
  	 */
  	/* Sentinel */
  	{NULL, InvalidOid}
*************** static void fileGetOptions(Oid foreignta
*** 112,117 ****
--- 110,116 ----
  static void estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  			   const char *filename,
  			   Cost *startup_cost, Cost *total_cost);
+ static List * get_force_not_null(Oid relid);

  /*
*************** file_fdw_validator(PG_FUNCTION_ARGS)
*** 145,150 ****
--- 144,150 ----
  	List	   *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
  	Oid			catalog = PG_GETARG_OID(1);
  	char	   *filename = NULL;
+ 	char	   *force_not_null = NULL;
  	List	   *other_options = NIL;
  	ListCell   *cell;
*************** file_fdw_validator(PG_FUNCTION_ARGS)
*** 198,204 ****
  				 buf.data)));
  	}

! 	/* Separate out filename, since ProcessCopyOptions won't allow it */
  	if (strcmp(def->defname, "filename") == 0)
  	{
  		if (filename)
--- 198,207 ----
  				 buf.data)));
  	}

! 	/*
! 	 * Separate out filename and force_not_null, since ProcessCopyOptions
! 	 * won't allow them.
! 	 */
  	if (strcmp(def->defname, "filename") == 0)
  	{
  		if (filename)
*************** file_fdw_validator(PG_FUNCTION_ARGS)
*** 207,212 ****
--- 210,229 ----
  						 errmsg("conflicting or redundant options")));