Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux
On Wed, Jun 27, 2012 at 12:58:47AM +0200, Nils Goroll wrote: So it looks like using pthread_mutexes could at least be an option on Linux. Using futexes directly could be even cheaper. Note that below this you only have the futex(2) system call. Futexes require all counter manipulation to happen in userspace, just like now, so all the per-architecture stuff remains. On Linux, pthread mutexes are really just a thin wrapper on top of this. The futex(2) system call merely provides an interface for handling the blocking and waking of other processes and releasing locks on process exit (so everything can still work after a kill -9). So it's more a replacement for the SysV semaphores than anything else. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] Schema version management
Robert, thank you for keeping this thread alive. Hopefully some more will join the discussion. I'm still hopeful the community can manage to agree upon acceptable tradeoffs and work-arounds to make this possible. I think the benefits clearly outweigh the minor issues of filenames, dumping order, etc. On Tue, Jun 26, 2012 at 6:04 PM, Robert Haas robertmh...@gmail.com wrote: I don't think either of these problems ought to be a complete show-stopper. It seems to me that the trade-off is that when object names are long, contain special characters, or are overloaded, we'll have to munge the names in some way to prevent collisions. That could mean that the names are not 100% stable, which would possibly produce some annoyance if you're using a VCS to track changes, but maybe that's an acceptable trade-off, because it shouldn't happen very often. If we could guarantee that identifiers less than 64 characters which are not overloaded and contain no special characters requiring quoting end up in an eponymous file, I think that would be good enough to make most of our users pretty happy. In other cases, I think the point would just be to make it work (with a funny name) rather than fail. I agree. It's not a problem if the filename is not identical to the name of the object, as long as the same name generates the same filename on all architectures. URL escaping would work, but converting all non-ascii characters to ascii would be nicer, and dropping any problematic characters, or replacing them with _ or any other suitable character. For the small fraction of users who have managed to find a good reason to name a function this/is\a/good.name/of/a\function.. the filename of such a function would be this_is_a_good_name_of_a_function. As long as the objects are dumped in the same order, there will be no merge problems when two developers commit changes of the same file. I think pg_dump does a reasonable job already making sure the order is always the same. 
How big is the problem, really? It would of course be a little easier to keep track of changes and do merging if all overloaded functions would be kept in separate files, but I see that as a minor feature request. As long as all objects with the same name are kept in separate files, that's good enough for my needs, and I have _a lot_ of functions, of which quite a few are overloaded. \i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql \i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql It would be better to use \ir here rather than hard-code path names, I think. Then you'd only need to require that all the files be in the same directory, rather than requiring them to be at a certain hard-coded place in the filesystem. I fully agree! I didn't know about the \ir feature. Best regards, Joel Jacobson
Re: [HACKERS] new --maintenance-db options
Tom Lane t...@sss.pgh.pa.us writes: Amit Kapila amit.kap...@huawei.com writes: [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane The implementation I've wanted to see for some time is that you can start a standalone backend, but it speaks FE/BE protocol to its caller (preferably over pipes, so that there is no issue whatsoever of where you can securely put a socket or anything like that). Can't it be done by following the FE/BE protocol, but calling the server APIs directly at the required places? That wouldn't be easier, nor cleaner, and it would open us up to client-induced database corruption (from failure to follow APIs, crashes in the midst of an operation, memory stomps, etc). We decided long ago that we would never support truly embedded operation in the sense of PG executing in the client's process/address space. Okay. I like the design suggested above because it has many of the good properties of an embedded database (in particular, no need to manage or contact a server) but still keeps the client code at arm's length. In such a case, will that standalone backend manage other processes (wal writer, checkpoint, ...), or will there be no background processes, as in the current --single mode? Can there be any performance advantage in such a mode compared to the current case where client and server are on the same machine and use a Unix domain socket? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] plpython issue with Win64 (PG 9.2)
Hi, On Windows 7 64bit, plpython is causing a server crash with the following test case:

CREATE PROCEDURAL LANGUAGE 'plpython3u';
CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
if a > b:
    return a
return b
$$ LANGUAGE plpython3u;
SELECT pymax(1, 2);

The server exits with the following exception:

Unhandled exception at 0x777a3483 in postgres.exe: 0xC00000FD: Stack overflow.
plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int * tb_depth) Line 174 C
plpython3.dll!PLy_elog(int elevel, const char * fmt, ...) Line 67 C
plpython3.dll!PLyUnicode_AsString(_object * unicode) Line 96 C
plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int * tb_depth) Line 176 + 0x8 bytes C
plpython3.dll!PLy_elog(int elevel, const char * fmt, ...) Line 67 C
plpython3.dll!PLyUnicode_AsString(_object * unicode) Line 96 C
...
...
plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int * tb_depth) Line 176 + 0x8 bytes C
plpython3.dll!PLy_elog(int elevel, const char * fmt, ...) Line 67 C
plpython3.dll!PLyUnicode_AsString(_object * unicode) Line 96 C
plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int * tb_depth) Line 176 + 0x8 bytes C

The server gets stuck in the following call loop:

... PLy_elog() -> PLy_traceback() -> PLyUnicode_AsString() -> PLyUnicode_Bytes() -> PLy_elog() ...

I think the primary reason that triggers this issue is that when PLyUnicode_Bytes() calls PyUnicode_AsEncodedString( ,WIN1252 /*Server encoding*/, ), it fails with NULL. I built the latest PG 9.2 source code with Python 3.2.2.3 using Visual Studio 2010. Thanks. Best Regards, Muhammad Asif Naeem
Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux
Using futexes directly could be even cheaper. Note that below this you only have the futex(2) system call. I was only referring to the fact that we could save one function and one library call, which could make a difference for the uncontended case.
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Tuesday, June 26, 2012 04:06:08 PM Andres Freund wrote: On Tuesday, June 26, 2012 04:01:26 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 12:35 PM, Andres Freund and...@2ndquadrant.com wrote: Can you elaborate on that a bit? What scenarios did you play around with, and what does win mean in this context? I had two machines connected locally and set up HS and my prototype between them (not at once, obviously). The patch reduced the average latency between both nodes (measured by 'ticker' rows arriving in a table on the standby), the jitter in latency, and the amount of load I had to put on the master before the standby couldn't keep up anymore. I played with different loads: * multiple concurrent ~50MB COPYs * multiple concurrent ~50MB COPYs, pgbench * pgbench All three had a ticker running concurrently with synchronous_commit=off (because it shouldn't cause any difference in the replication pattern itself). The difference in average lag and cutoff was smallest with just pgbench running alone and biggest with COPY running alone. High jitter was most visible with just pgbench running alone, but that's likely just because the average lag was smaller. OK, that sounds pretty promising. I'd like to run a few performance tests on this just to convince myself that it doesn't lead to a significant regression in other scenarios. Assuming that doesn't turn up anything major, I'll go ahead and commit this. Independent testing would be great; it's definitely possible that I overlooked something, although I obviously don't think so ;). Can you provide a rebased version? It seems that one of the hunks in xlog.c no longer applies. Will do so. Not sure if I can finish it today though, I am in the midst of redoing the ilist and xlogreader patches. I guess tomorrow will suffice otherwise... Ok, attached are two patches: The first is the rebased version of the original patch with WalSndWakeupProcess renamed to WalSndWakeupProcessRequests (seems clearer). 
The second changes WalSndWakeupRequest and WalSndWakeupProcessRequests into macros as you requested before. I am not sure if it's a good idea or not. Anything else? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From 164276afc60fa2451f28de996fcc54567e6168e2 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 29 May 2012 20:00:16 +0200
Subject: [PATCH 1/2] Overhaul walsender wakeup handling

The previous coding could miss xlog writeouts at several places. E.g. when wal was written out by the background writer or even after a commit if synchronous_commit=off. This could lead to delays in sending data to the standby of up to 7 seconds. To fix this, move the responsibility of notification to the layer where the necessary information is actually present. We take some care not to do the notification while we hold contended locks like WALInsertLock or WalWriteLock. Document the preexisting fact that we rely on SetLatch to be safe from within signal handlers and critical sections. 
This removes the temporary bandaid from 2c8a4e9be2730342cbca85150a2a9d876aa77ff6
---
 src/backend/access/transam/twophase.c | 21 -
 src/backend/access/transam/xact.c     |  7 --
 src/backend/access/transam/xlog.c     | 25 ++--
 src/backend/port/unix_latch.c         |  3 +++
 src/backend/port/win32_latch.c        |  4
 src/backend/replication/walsender.c   | 41 -
 src/include/replication/walsender.h   |  2 ++
 7 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e8fb78b..7f198c2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1042,13 +1042,6 @@ EndPrepare(GlobalTransaction gxact)
 	/* If we crash now, we have prepared: WAL replay will fix things */
-	/*
-	 * Wake up all walsenders to send WAL up to the PREPARE record immediately
-	 * if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* write correct CRC and close file */
 	if ((write(fd, &statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2045,13 +2038,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
-	/*
-	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
@@ -2133,13 +2119,6 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
-	/*
-	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
-	 * immediately if replication is enabled
-	 */
-
Re: [HACKERS] [v9.3] Row-Level Security
On Jun27, 2012, at 07:18 , Kohei KaiGai wrote: The problem is the way to implement it. If we would have permission checks on planner stage, it cannot handle a case when user-id would be switched prior to executor stage, thus it needs something remedy to handle the scenario correctly. Instead of a unique plan per query, it might be a solution to generate multiple plans depending on user-id, and choose a proper one in executor stage. Which type of implementation is what everybody is asking for? I think you need to a) Determine the user-id at planning time, and insert the matching RLS clause b1) Either re-plan the query if the user-id changes between planning and execution time, which means making the user-id a part of the plan-cache key. b2) Or decree that for RLS purposes, it's the user-id at planning time, not execution time, that counts. best regards, Florian Pflug
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: So, here's a patch. Instead of using POSIX shmem, I just took the expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS memory. The sysv shm is still allocated, but it's just a copy of PGShmemHeader; the real shared memory is the anonymous block. This won't work if EXEC_BACKEND is defined so it just falls back on straight sysv shm in that case. Um. I hadn't thought about the EXEC_BACKEND interaction, but that seems like a bit of a showstopper. I would not like to give up the ability to debug EXEC_BACKEND mode on Unixen. Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to make it continue to use a full-sized sysv shm. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proof concept - access to session variables on client side
2012/6/27 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: Another thing I don't care for is the unannounced protocol extension. yes, it needs a protocol extension and a version increase too. But I'm not afraid of desynchronisation - the server doesn't send the 'v' message when the client doesn't support it. And you would know that how, exactly? A minor protocol version could be used: http://archives.postgresql.org/pgsql-hackers/2011-12/msg00025.php I don't know if this topic is done; I only remember this thread. Regards Pavel Stehule regards, tom lane
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: A.M. age...@themactionfaction.com writes: On 06/26/2012 07:30 PM, Tom Lane wrote: I solved this via fcntl locking. No, you didn't, because fcntl locks aren't inherited by child processes. Too bad, because they'd be a great solution otherwise. You claimed this last time and I replied: http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php I address this race condition by ensuring that a lock-holding violator is the postmaster or a postmaster child. If such a condition is detected, the child exits immediately without touching the shared memory. POSIX shmem is inherited via file descriptors. This is possible because the locking API allows one to request which PID violates the lock. The child expects the lock to be held and checks that the PID is the parent. If the lock is not held, that means that the postmaster is dead, so the child exits immediately. OK, I went back and re-read the original patch, and I now agree that something like this is possible --- but I don't like the way you did it. The dependence on particular PIDs seems both unnecessary and risky. The key concept here seems to be that the postmaster first stakes a claim on the data directory by exclusive-locking a lock file. If successful, it reduces that lock to shared mode (which can be done atomically, according to the SUS fcntl specification), and then holds the shared lock until it exits. Spawned children will not initially have a lock, but what they can do is attempt to acquire shared lock on the lock file. If that fails, exit. If successful, *check to see that the parent postmaster is still alive* (ie, getppid() != 1). If so, the parent must have been continuously holding the lock, and the child has successfully joined the pool of shared lock holders. Otherwise bail out without having changed anything. It is the "parent is still alive" check, not any test on individual PIDs, that makes this work. 
There are two concrete reasons why I don't care for the GetPIDHoldingLock() way. Firstly, the fact that you can get a blocking PID from F_GETLK isn't an essential part of the concept of file locking IMO --- it's just an incidental part of this particular API. May I remind you that the reason we're stuck on SysV shmem in the first place is that we decided to depend on an incidental part of that API, namely nattch? I would like to not require file locking to have any semantics more specific than a process can hold an exclusive or a shared lock on a file, which is auto-released at process exit. Secondly, in an NFS world I don't believe that the returned l_pid value can be trusted for anything. If it's a PID from a different machine then it might accidentally conflict with one on our machine, or not. Reflecting on this further, it seems to me that the main remaining failure modes are (1) file locking doesn't work, or (2) idiot DBA manually removes the lock file. Both of these could be ameliorated with some refinements to the basic idea. For (1), I suggest that we tweak the startup process (only) to attempt to acquire exclusive lock on the lock file. If it succeeds, we know that file locking is broken, and we can complain. (This wouldn't help for cases where cross-machine locking is broken, but I see no practical way to detect that.) For (2), the problem really is that the proposed patch conflates the PID file with the lock file, but people are conditioned to think that PID files are removable. I suggest that we create a separate, permanently present file that serves only as the lock file and doesn't ever get modified (it need have no content other than the string "Don't remove this!"). It'd be created by initdb, not by individual postmaster runs; indeed the postmaster should fail if it doesn't find the lock file already present. 
The postmaster PID file should still exist with its current contents, but it would serve mostly as documentation and as server-contact information for pg_ctl; it would not be part of the data directory locking mechanism. I wonder whether this design can be adapted to Windows? IIRC we do not have a bulletproof data directory lock scheme for Windows. It seems like this makes few enough demands on the lock mechanism that there ought to be suitable primitives available there too. I assume you're saying we need to make changes in the internal API, right? Because we already have a Windows-native implementation of shared memory that AFAIK works, so if the new Unix stuff can be done with the same internal APIs, it shouldn't need to be changed. (Sorry, haven't followed the thread in detail) If so - can we define exactly what properties it is we *need*? (A native API worth looking at is e.g. http://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx - but there are probably others as well if that one doesn't do) -- Magnus Hagander Me:
Re: [HACKERS] plpython issue with Win64 (PG 9.2)
On 27/06/12 11:51, Asif Naeem wrote: Hi, On Windows 7 64bit, plpython is causing a server crash with the following test case:

CREATE PROCEDURAL LANGUAGE 'plpython3u';
CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
RETURNS integer
AS $$
if a > b:
    return a
return b
$$ LANGUAGE plpython3u;
SELECT pymax(1, 2);

I think the primary reason that triggers this issue is that when PLyUnicode_Bytes() calls PyUnicode_AsEncodedString( ,WIN1252 /*Server encoding*/, ), it fails with NULL. I built the latest PG 9.2 source code with Python 3.2.2.3 using Visual Studio 2010. Thanks. I'll try to reproduce this on Linux, which should be possible given the results of your investigation. Cheers, Jan
Re: [HACKERS] pg_terminate_backend for same-role
On Tue, Jun 26, 2012 at 10:58 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote: On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote: Parallel to pg_cancel_backend, it'd be nice to allow the user to just outright kill a backend that they own (politely, with a SIGTERM), aborting any transactions in progress, including the idle transaction, and closing the socket. +1 Here's a patch implementing the simple version, with no more guards against signal racing than have been seen previously. The more elaborate variants to close those races is being discussed in a parallel thread, but I thought I'd get this simple version out there. Review: After reading through the threads, it looks like there was no real objection to this approach -- pid recycling is not something we're concerned about. I think you're missing a doc update though, in func.sgml: For the less restrictive <function>pg_cancel_backend</function>, the role of an active backend can be found from the <structfield>usename</structfield> column of the <structname>pg_stat_activity</structname> view. Also, high-availability.sgml makes reference to pg_cancel_backend(), and it might be worthwhile to add an ...and pg_terminate_backend() there. Other than that, it looks good to me. Good comments. Patch attached to address the doc issues. The only iffy thing is the paragraph For the less restrictive..., which I have opted to remove in its entirety. I think the documents are already pretty clear about the same-user rule, and I'm not sure if this is the right place for a crash-course on attributes in pg_stat_activity (but maybe it is). ...and pg_terminate_backend seems exactly right. 
And I think now that the system post-patch doesn't have such a strange contrast between the ability to send SIGINT vs. SIGTERM, such a contrast may not be necessary. I'm also not sure what the policy is about filling paragraphs in the manual. I filled one, which increases the sgml churn a bit. git (show|diff) --word-diff helps clean it up. I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [v9.3] Row-Level Security
2012/6/27 Florian Pflug f...@phlo.org: On Jun27, 2012, at 07:18 , Kohei KaiGai wrote: The problem is the way to implement it. If we would have permission checks on planner stage, it cannot handle a case when user-id would be switched prior to executor stage, thus it needs something remedy to handle the scenario correctly. Instead of a unique plan per query, it might be a solution to generate multiple plans depending on user-id, and choose a proper one in executor stage. Which type of implementation is what everybody is asking for? I think you need to a) Determine the user-id at planning time, and insert the matching RLS clause b1) Either re-plan the query if the user-id changes between planning and execution time, which means making the user-id a part of the plan-cache key. b2) Or decree that for RLS purposes, it's the user-id at planning time, not execution time, that counts. My preference is b1, because the b2 approach introduces user-visible changes in the concept of permission checks. Probably, the plan-cache should also be invalidated when the user's properties are modified or a grant/revoke is issued, in addition to changes to the table itself. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
Sorry about the delay in answering. I have been swamped with non-PG related things lately. On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: I'm confused by this remark, because surely the query planner does it this way only if there's no LIMIT. When there is a LIMIT, we choose based on the startup cost plus the estimated fraction of the total cost we expect to pay based on dividing the LIMIT by the overall row count estimate. Or is this not what you're talking about? My reasoning was that query_planner returns the cheapest-total path and the cheapest fractional presorted path (by the aggregation pathkeys). When evaluating hash-aggregates with this patch these two are indeed compared considering the estimated fraction of the total cost, but this might miss the cheapest fractional unordered path for lazy hashaggregates. Reviewing the code now, I discovered this path could be picked out from the pathlist, just like it is done by get_cheapest_fractional_path_for_pathkeys when pathkeys is nil. This would need to be returned in addition to the other two paths. To minimize overhead, this should only be done when we possibly want to consider lazy hash-aggregation (there is a group clause with no aggregates and grouping is hashable). But this is starting to get pretty crufty considering that there don't seem to be any really compelling use cases for this. Ants, do you intend to update this patch for this CommitFest? Or at all? It seems nobody's too excited about this, so I'm not sure whether it makes sense for you to put more work on it. But please advise as to your plans. If anyone thinks that this patch might be worth considering, then I'm prepared to do minor cleanup this CF (I saw some possibly unnecessary cruft in agg_fill_hash_and_retrieve). 
On the other hand, if you think the use case is too marginal to consider for inclusion then I won't shed a tear if this gets rejected. For me this was mostly a learning experience for poking around in the planner. Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Re: [HACKERS] [v9.3] Row-Level Security
On Wed, Jun 27, 2012 at 7:21 AM, Florian Pflug f...@phlo.org wrote: On Jun27, 2012, at 07:18 , Kohei KaiGai wrote: The problem is the way to implement it. If we would have permission checks on planner stage, it cannot handle a case when user-id would be switched prior to executor stage, thus it needs something remedy to handle the scenario correctly. Instead of a unique plan per query, it might be a solution to generate multiple plans depending on user-id, and choose a proper one in executor stage. Which type of implementation is what everybody is asking for? I think you need to a) Determine the user-id at planning time, and insert the matching RLS clause b1) Either re-plan the query if the user-id changes between planning and execution time, which means making the user-id a part of the plan-cache key. b2) Or decree that for RLS purposes, it's the user-id at planning time, not execution time, that counts. Or b3, flag plans that depend on the user ID inside the plan-cache, and just flush all of those (but only those) when the user ID changes. In the common case where RLS is not used, that might ease the sting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_terminate_backend for same-role
On Wed, Jun 27, 2012 at 8:13 AM, Magnus Hagander mag...@hagander.net wrote: I went ahead and committed this. I kinda think we should back-patch this into 9.2. It doesn't involve a catalog change, and would make the behavior consistent between the two releases, instead of changing in 9.1 and then changing again in 9.2. Thoughts? +1. Three votes with no objections is good enough for me, so, done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Reporting hba lines
When debugging strange and complex pg_hba lines, it can often be quite useful to know which line is matching a particular connection that failed for some reason. Because more often than not, it's actually not using the line in pg_hba.conf that's expected. The easiest way to do this is to emit an errdetail for the login failure, per this patch. Question is - is that leaking information to the client that we shouldn't be leaking? And if it is, what would be the preferred way to deal with it? We could put that as a detail to basically every single error message coming out of the auth system, but that seems like a bad idea. Or we could make a separate ereport(LOG) before sending it to the client, perhaps? Thoughts? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] [v9.3] Row-Level Security
2012/6/27 Robert Haas robertmh...@gmail.com: On Wed, Jun 27, 2012 at 7:21 AM, Florian Pflug f...@phlo.org wrote: On Jun27, 2012, at 07:18 , Kohei KaiGai wrote: The problem is the way to implement it. If we would have permission checks on planner stage, it cannot handle a case when user-id would be switched prior to executor stage, thus it needs something remedy to handle the scenario correctly. Instead of a unique plan per query, it might be a solution to generate multiple plans depending on user-id, and choose a proper one in executor stage. Which type of implementation is what everybody is asking for? I think you need to a) Determine the user-id at planning time, and insert the matching RLS clause b1) Either re-plan the query if the user-id changes between planning and execution time, which means making the user-id a part of the plan-cache key. b2) Or decree that for RLS purposes, it's the user-id at planning time, not execution time, that counts. Or b3, flag plans that depend on the user ID inside the plan-cache, and just flush all of those (but only those) when the user ID changes. In the common case where RLS is not used, that might ease the sting. Probably, PlannedStmt->invalItems allows handling invalidation of the plan-cache without big code changes. I'll try to add a user-id flag to track query plans that assume RLS, or InvalidOid if no RLS was applied in the plan. I'll investigate the implementation in more detail. Do we have any other scenario that runs a query plan under a different user privilege than the one at planner stage? Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Reporting hba lines
On Wednesday, June 27, 2012 14:54:15, Magnus Hagander wrote: When debugging strange and complex pg_hba lines, it can often be quite useful to know which line is matching a particular connection that failed for some reason. Because more often than not, it's actually not using the line in pg_hba.conf that's expected. The easiest way to do this is to emit an errdetail for the login failure, per this patch. Question is - is that leaking information to the client that we shouldn't be leaking? I fear that is exactly the problem. Too bad, because the feature is interesting. And if it is, what would be the preferred way to deal with it? We could put that as a detail to basically every single error message coming out of the auth system, but that seems like a bad idea. Or we could make a separate ereport(LOG) before sending it to the client, perhaps? The problem is also that 1/ you're not sure how exactly you connected and 2/ you're not connecting the way you wanted to (my case with IPv4 vs IPv6 just below). Providing more information on what we receive from the client, and telling it to the client, sounds good, no? Thoughts? I just spent some time fixing an IPv6 vs IPv4 login failure; the matched line from pg_hba.conf would have been very useful. Maybe it is enough to report what happens on the connection: from which host, TCP/socket (and which one, given that we should have multiple options here soon), dbname, user. Inspecting pg_hba.conf will remain difficult if a map or some other +group option is used. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] Posix Shared Mem patch
Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether this design can be adapted to Windows? IIRC we do not have a bulletproof data directory lock scheme for Windows. It seems like this makes few enough demands on the lock mechanism that there ought to be suitable primitives available there too. I assume you're saying we need to make changes in the internal API, right? Because we already have a Windows-native implementation of shared memory that AFAIK works, Right, but does it provide honest protection against starting two postmasters in the same data directory? Or more to the point, does it prevent starting a new postmaster when the old postmaster crashed but there are still orphaned backends making changes? AFAIR we basically punted on those problems for the Windows port, for lack of an equivalent to nattch. regards, tom lane
Re: [HACKERS] Visual Studio 2012 RC
On 10.06.2012 11:41, Magnus Hagander wrote: On Sun, Jun 10, 2012 at 3:16 AM, Brar Pieningb...@gmx.de wrote: Magnus Hagander wrote: I don't have too much hope for them actually changing it - they seem hell-bent on forcing everybody into metro, and this seems to be yet another way to do that. But we can always hope... Looks like they've learnt their lesson... http://blogs.msdn.com/b/visualstudio/archive/2012/06/08/visual-studio-express-2012-for-windows-desktop.aspx Yeah. Didn't expect that to happen, but happy to see that it did! :-) I don't think we can realistically support VS2012 until Microsoft releases the gratis Visual Studio Express 2012 for Windows Desktop. We'll hardly find anyone willing to run a buildfarm machine with VS2012 until that, and I don't think any of the committers have access to VS2012 to test with. So, I'm bumping this to the next commitfest. By the time that begins, let's see if Microsoft has released it yet. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. I see. Those are pretty good reasons ... With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to make it continue to use a full-sized sysv shm. Well, if the ultimate objective is to get out from under the SysV APIs entirely, we're not going to get there if we still have to have all that code for the EXEC_BACKEND case. Maybe it's time to decide that we don't need to support EXEC_BACKEND on Unix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
All, * Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. I see. Those are pretty good reasons ... After talking to Magnus a bit this morning regarding this, it sounds like what we're doing on Windows is closer to Anonymous shm, except that they use an intentionally specific name, which also allows them to detect if any children are still alive by using a create-if-not-exists approach on the shm segment and failing if it still exists. There were some corner cases around restarts due to it taking a few seconds for the Windows kernel to pick up on the fact that all the children are dead and that the shm segment should go away, but they were able to work around that, and failure to start is surely much better than possible corruption. What this all boils down to is- can you have a shm segment that goes away when no one is still attached to it, but actually give it a name and then detect if it already exists atomically on startup on Linux/Unixes? If so, perhaps we could use the same mechanism on both.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Posix Shared Mem patch
* Tom Lane (t...@sss.pgh.pa.us) wrote: Right, but does it provide honest protection against starting two postmasters in the same data directory? Or more to the point, does it prevent starting a new postmaster when the old postmaster crashed but there are still orphaned backends making changes? AFAIR we basically punted on those problems for the Windows port, for lack of an equivalent to nattch. See my other mail, but, after talking to Magnus, it's my understanding that we had that problem initially, but it was later solved by using a named shared memory segment which the kernel will clean up when all children are gone. That, combined with a 'create-if-exists' call, allows detection of lost children to be done. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Re: [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical
On 13.06.2012 14:28, Andres Freund wrote:

@@ -2584,6 +2610,73 @@ l1:
 	rdata[1].buffer_std = true;
 	rdata[1].next = NULL;
+	/*
+	 * XXX: We could decide not to log changes when the origin is not the
+	 * local node, that should reduce redundant logging.
+	 */
+	if (need_tuple) {
+		xl_heap_header xlhdr;
...
+		relationFindPrimaryKey(relation, &indexoid, &pknratts, pkattnum, pktypoid, pkopclass);
+
+		if (!indexoid) {
+			elog(WARNING, "Could not find primary key for table with oid %u",
+				 relation->rd_id);
+			goto no_index_found;
+		}
+
+		index_rel = index_open(indexoid, AccessShareLock);
+
+		indexdesc = RelationGetDescr(index_rel);
+
+		for (natt = 0; natt < indexdesc->natts; natt++) {
+			idxvals[natt] =
+				fastgetattr(tp, pkattnum[natt], desc, &idxisnull[natt]);
+			Assert(!idxisnull[natt]);
+		}
+
+		idxtuple = heap_form_tuple(indexdesc, idxvals, idxisnull);
+
+		xlhdr.t_infomask2 = idxtuple->t_data->t_infomask2;
+		xlhdr.t_infomask = idxtuple->t_data->t_infomask;
+		xlhdr.t_hoff = idxtuple->t_data->t_hoff;
+
+		rdata[1].next = &(rdata[2]);
+		rdata[2].data = (char *) &xlhdr;
+		rdata[2].len = SizeOfHeapHeader;
+		rdata[2].buffer = InvalidBuffer;
+		rdata[2].next = NULL;
+
+		rdata[2].next = &(rdata[3]);
+		rdata[3].data = (char *) idxtuple->t_data + offsetof(HeapTupleHeaderData, t_bits);
+		rdata[3].len = idxtuple->t_len - offsetof(HeapTupleHeaderData, t_bits);
+		rdata[3].buffer = InvalidBuffer;
+		rdata[3].next = NULL;
+
+		heap_close(index_rel, NoLock);
+	no_index_found:
+		;
+	}
+
 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);
 	PageSetLSN(page, recptr);

It's not cool to do all that primary key lookup stuff within the critical section, while holding a lock on the page. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Reporting hba lines
Magnus Hagander mag...@hagander.net writes: When debugging strange and complex pg_hba lines, it can often be quite useful to know which line is matching a particular connection that failed for some reason. Because more often than not, it's actually not using the line in pg_hba.conf that's expected. The easiest way to do this is to emit an errdetail for the login failure, per this patch. Question is - is that leaking information to the client that we shouldn't be leaking? Yes. And if it is, what would be the preferred way to deal with it? Report to the postmaster log only. errdetail_log should do. BTW, are you sure that auth_failed is only called in cases where an hba line has already been identified? Even if true today, it seems fairly risky to assume that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting hba lines
On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: When debugging strange and complex pg_hba lines, it can often be quite useful to know which line is matching a particular connection that failed for some reason. Because more often than not, it's actually not using the line in pg_hba.conf that's expected. The easiest way to do this is to emit an errdetail for the login failure, per this patch. Question is - is that leaking information to the client that we shouldn't be leaking? Yes. And if it is, what would be the preferred way to deal with it? Report to the postmaster log only. errdetail_log should do. Oh, I wasn't aware we had that :) You learn something new every day. BTW, are you sure that auth_failed is only called in cases where an hba line has already been identified? Even if true today, it seems fairly risky to assume that. It is true today, but yes, it might be safe to guard against it with something like this? I also fixed the error message to follow the guidelines better - I think :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ hba_line.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder whether this design can be adapted to Windows? IIRC we do not have a bulletproof data directory lock scheme for Windows. It seems like this makes few enough demands on the lock mechanism that there ought to be suitable primitives available there too. I assume you're saying we need to make changes in the internal API, right? Because we alreayd have a windows native implementation of shared memory that AFAIK works, Right, but does it provide honest protection against starting two postmasters in the same data directory? Or more to the point, does it prevent starting a new postmaster when the old postmaster crashed but there are still orphaned backends making changes? AFAIR we basically punted on those problems for the Windows port, for lack of an equivalent to nattch. No, we spent a lot of time trying to *fix* it, and IIRC we did. We create a shared memory segment with a fixed name based on the data directory. This shared memory segment is inherited by all children. It will automatically go away only when all processes that have an open handle to it go away (in fact, it can even take a second or two more, if they go away by crash and not by cleanup - we have a workaround in the code for that). But as long as there is an orphaned backend around, the shared memory segment stays around. We don't have nattch. But we do have nattch==0. Or something like that. You can work around it if you find two different paths to the same data directory (e.g. using junctions), but you are really actively trying to break the system if you do that... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] warning handling in Perl scripts
On 06/24/2012 04:05 PM, Robert Haas wrote: On Sun, Jun 24, 2012 at 2:40 PM, Peter Eisentrautpete...@gmx.net wrote: Every time I make a change to the structure of the catalog files, genbki.pl produces a bunch of warnings (like Use of uninitialized value in string eq at genbki.pl line ...), and produces corrupted output files, that are then (possibly) detected later by the compiler. Also, getting out of that is difficult because due to the complicated dependency relationship between the involved files, you need to remove a bunch of files manually, or clean everything. So error handling could be better. It seems that adding diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index ebc4825..7d66da9 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -19,6 +19,8 @@ use strict; use warnings; +local $SIG{__WARN__} = sub { die $_[0] }; + my @input_files; our @include_path; my $output_path = ''; would address that. Could that cause any other problems? Should it be added to all Perl scripts? This seems like a band-aid. How about if we instead add whatever error-handling the script is missing, so that it produces an appropriate, human-readable error message? I realise I'm late to this party, but I'm with Robert. The root cause of the errors should be fixed. That's not to say that making warnings fatal might not also be a good idea as a general defense mechanism. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting hba lines
Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, are you sure that auth_failed is only called in cases where an hba line has already been identified? Even if true today, it seems fairly risky to assume that. It is true today, but yes, it might be safe to guard against it with something like this? FWIW, the usual approach for conditionally emitting bits of an ereport is more like

ereport(FATAL,
		(errcode(errcode_return),
		 errmsg(errstr, port->user_name),
		 port->hba ? errdetail_log("Connection matched pg_hba.conf line %d",
								   port->hba->linenumber) : 0));

but that's just a nitpick. A bigger issue is that I'm not convinced that a line number will be tremendously helpful: it's easy to miscount lines, and a line number will certainly not be helpful in the frequent cases where people are modifying the wrong hba file. Can we show the source text of the hba line? regards, tom lane
[HACKERS] Regarding WAL Format Changes
While reading patch-1 ( http://archives.postgresql.org/pgsql-hackers/2012-06/txtFskHiYakjO.txt 1-use-uint64-got-segno.patch) of the WAL Format Changes (http://archives.postgresql.org/message-id/4FDA5136.6080206@enterprisedb.com), I had a few observations, summarized below:

1. Function headers for the following functions still contain references to log, seg: a. InstallXLogFileSegment() b. RemoveOldXlogFiles() c. XLogFileCopy() d. XLogGetLastRemoved() e. UpdateLastRemovedPtr() f. RemoveOldXlogFiles()

2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 	LWLockRelease(ControlFileLock);
 	ereport(LOG,
 			(errcode_for_file_access(),
-			 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
-					tmppath, path, *log, *seg)));
+			 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
+					tmppath, path)));

If the changed error message could contain the log file and segment number, it would be clearer. That should be easily deducible from the segment number.

3. -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
...
@@ -4016,8 +3953,9 @@ retry:
 	if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
 	{
 		ereport(emode_for_corrupt_record(emode, *RecPtr),
-				(errmsg("there is no contrecord flag in log file %u, segment %u, offset %u",
-						readId, readSeg, readOff)));
+				(errmsg("there is no contrecord flag in log segment %s, offset %u",
+						XLogFileNameP(curFileTLI, readSegNo),
+						readOff)));
 		goto next_record_is_invalid;
 	}
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
@@ -4025,10 +3963,13 @@ retry:
 	if (contrecord->xl_rem_len == 0 || total_len != (contrecord->xl_rem_len + gotlen))
 	{
+		char		fname[MAXFNAMELEN];
+
+		XLogFileName(fname, curFileTLI, readSegNo);
 		ereport(emode_for_corrupt_record(emode, *RecPtr),
-				(errmsg("invalid contrecord length %u in log file %u, segment %u, offset %u",
+				(errmsg("invalid contrecord length %u in log segment %s, offset %u",
 						contrecord->xl_rem_len,
-						readId, readSeg, readOff)));
+						XLogFileNameP(curFileTLI, readSegNo),
+						readOff)));
 		goto next_record_is_invalid;
 	}

For the above two changed error messages, 'log segment' is used for the filename. However, other similar changes use 'log file' for the filename. Some places use 'log segment' and others 'log file' — is there any particular reason for that?

4. @@ -533,33 +533,17 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
-	/*
-	 * Sanity check
-	 */
-	if (loc1.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc1.xrecoff, XLogFileSize)));
-	if (loc2.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc2.xrecoff, XLogFileSize)));
+	bytes1 = (((uint64) loc1.xlogid) << 32L) + loc1.xrecoff;
+	bytes2 = (((uint64) loc2.xlogid) << 32L) + loc2.xrecoff;

Is there no chance that it can be out of valid range after the new changes, just a doubt?

5. --- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -69,11 +69,12 @@ walrcv_disconnect_type walrcv_disconnect = NULL;
 /*
  * These variables are used similarly to openLogFile/Id/Seg/Off,
- * but for walreceiver to write the XLOG.
+ * but for walreceiver to write the XLOG. recvFileTLI is the TimeLineID

In the above comments, there is still a reference to Id/Seg/Off.

With Regards, Amit Kapila.
Re: [HACKERS] Posix Shared Mem patch
Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: AFAIR we basically punted on those problems for the Windows port, for lack of an equivalent to nattch. No, we spent a lot of time trying to *fix* it, and IIRC we did. OK, in that case this isn't as interesting as I thought. If we do go over to a file-locking-based solution on Unix, it might be worthwhile changing to something similar on Windows. But it would be more about reducing coding differences between the platforms than plugging any real holes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting hba lines
On Wed, Jun 27, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, are you sure that auth_failed is only called in cases where an hba line has already been identified? Even if true today, it seems fairly risky to assume that. It is true today, but yes, it might be safe to guard against it with something like this? FWIW, the usual approach for conditionally emitting bits of an ereport is more like ereport(FATAL, (errcode(errcode_return), errmsg(errstr, port-user_name), port-hba ? errdetail_log(Connection matched pg_hba.conf line %d, port-hba-linenumber) : 0)); Hmm. Ok. So it treats a 0/NULL there as a way to ignore it. I tried something with the NULL inside the errdetail, which obviously failed. but that's just a nitpick. A bigger issue is that I'm not convinced that a line number will be tremendously helpful: it's easy to miscount lines, and a line number will certainly not be helpful in the frequent Editors will help you count the lines, no? :-) cases where people are modifying the wrong hba file. Can we show the source text of the hba line? We don't currently keep the full source text around - but we certainly could do that if we wanted to. I'm not sure how much it helps - usually, you're going to end up on a line that's completely irrelevant if you get the wrong hba file (e.g. a comment or a line that's not even in the file at all due to size). Maybe we should just include the *name* of the HBA file in the error message? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. I see. Those are pretty good reasons ... With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to make it continue to use a full-sized sysv shm. Well, if the ultimate objective is to get out from under the SysV APIs entirely, we're not going to get there if we still have to have all that code for the EXEC_BACKEND case. Maybe it's time to decide that we don't need to support EXEC_BACKEND on Unix. I don't personally see a need to do anything that drastic at this point. Admittedly, I rarely compile with EXEC_BACKEND, but I don't think it's bad to have the option available. Adjusting shared memory limits isn't really a big problem for PostgreSQL developers; what we're trying to avoid is the need for PostgreSQL *users* to concern themselves with it. And surely anyone who is using EXEC_BACKEND on Unix is a developer, not a user. If and when we come up with a substitute for the nattch interlock, then this might be worth thinking a bit harder about. At that point, if we still want to support EXEC_BACKEND on Unix, then we'd need the EXEC_BACKEND case at least to use POSIX shm rather than anonymous shared mmap. Personally I think that would be not that hard and probably worth doing, but there doesn't seem to be any point in writing that code now, because for the simple case of just reducing the amount of shm that we allocate, an anonymous mapping seems better all around. We shouldn't overthink this. 
Our shared memory code has allocated a bunch of crufty hacks over the years to work around various platform-specific issues, but it's still not a lot of code, so I don't see any reason to worry unduly about making a surgical fix without having a master plan. Nothing we want to do down the road will require moving the earth. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting hba lines
Magnus Hagander mag...@hagander.net writes: On Wed, Jun 27, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: cases where people are modifying the wrong hba file. Can we show the source text of the hba line? We don't currently keep the full source text around - but we certainly could do that if we wanted to. If we're concerned about providing a message like this, I think it'd be well worthwhile. We presumably would only store the active lines not comment lines, so the extra memory space would be negligible in just about any real use-case. I'm not sure how much it helps - usually, you're going to end up on a line that's completely irrelevant if you get the wrong hba file (e.g. a comment or a line that's not even in the file at all due to size). Only if you count accurately, and aren't fooled into miscounting by the expectation that you must arrive on a non-comment line. I don't buy the the editor can do it for you argument either, at least not since noticing that recent versions of emacs don't count lines the way I do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 9:52 AM, Stephen Frost sfr...@snowman.net wrote: What this all boils down to is- can you have a shm segment that goes away when no one is still attached to it, but actually give it a name and then detect if it already exists atomically on startup on Linux/Unixes? If so, perhaps we could use the same mechanism on both.. As I understand it, no. You can either have anonymous shared mappings, which go away when no longer in use but do not have a name. Or you can have POSIX or sysv shm, which have a name but do not automatically go away when no longer in use. There seems to be no method for setting up a segment that both has a name and goes away automatically. POSIX shm in particular tries to look like a file, whereas anonymous memory tries to look more like malloc (except that you can share the mapping with child processes). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regarding WAL Format Changes
On 27.06.2012 17:14, Amit Kapila wrote:

1. Function headers for the following functions still contain references to log, seg: a. InstallXLogFileSegment() b. RemoveOldXlogFiles() c. XLogFileCopy() d. XLogGetLastRemoved() e. UpdateLastRemovedPtr() f. RemoveOldXlogFiles()

Thanks, fixed.

2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 	LWLockRelease(ControlFileLock);
 	ereport(LOG,
 			(errcode_for_file_access(),
-			 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
-					tmppath, path, *log, *seg)));
+			 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
+					tmppath, path)));

If the changed error message could contain the log file and segment number, it would be clearer. That should be easily deducible from the segment number.

That seems redundant. The target file name is calculated from the segment number, and we're now using the file name instead of log+seg in other messages too.

3. -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
...
@@ -4016,8 +3953,9 @@ retry:
 	if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
 	{
 		ereport(emode_for_corrupt_record(emode, *RecPtr),
-				(errmsg("there is no contrecord flag in log file %u, segment %u, offset %u",
-						readId, readSeg, readOff)));
+				(errmsg("there is no contrecord flag in log segment %s, offset %u",
+						XLogFileNameP(curFileTLI, readSegNo),
+						readOff)));
 		goto next_record_is_invalid;
 	}
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
@@ -4025,10 +3963,13 @@ retry:
 	if (contrecord->xl_rem_len == 0 || total_len != (contrecord->xl_rem_len + gotlen))
 	{
+		char		fname[MAXFNAMELEN];
+
+		XLogFileName(fname, curFileTLI, readSegNo);
 		ereport(emode_for_corrupt_record(emode, *RecPtr),
-				(errmsg("invalid contrecord length %u in log file %u, segment %u, offset %u",
+				(errmsg("invalid contrecord length %u in log segment %s, offset %u",
 						contrecord->xl_rem_len,
-						readId, readSeg, readOff)));
+						XLogFileNameP(curFileTLI, readSegNo),
+						readOff)));
 		goto next_record_is_invalid;
 	}

For the above two changed error messages, 'log segment' is used for the filename. However, other similar changes use 'log file' for the filename. Is there any particular reason for it?

Not really. There are several messages that use "log file %s", and also several places that use "log segment %s". Should we make it consistent and use either "log segment" or "log file" everywhere?

4. @@ -533,33 +533,17 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
-	/*
-	 * Sanity check
-	 */
-	if (loc1.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc1.xrecoff, XLogFileSize)));
-	if (loc2.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X", loc2.xrecoff, XLogFileSize)));
+	bytes1 = (((uint64) loc1.xlogid) << 32L) + loc1.xrecoff;
+	bytes2 = (((uint64) loc2.xlogid) << 32L) + loc2.xrecoff;

Is there no chance that it can be out of valid range after the new changes, just a doubt?

No. Not in the sense it used to be, anyway; the XLogFileSize check is no longer relevant. Perhaps we should check for InvalidXLogRecPtr, or that the pointer doesn't point e.g. into the middle of a page header. But then again, this calculation works fine with both of those cases, so I see no reason to make it stricter.

5. --- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -69,11 +69,12 @@
[HACKERS] Re: [COMMITTERS] pgsql: Move WAL continuation record information to WAL page header.
On 27.06.2012 17:53, Andres Freund wrote: I had noticed one thing when reviewing the patches before:

@@ -717,6 +719,15 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 	bool		doPageWrites;
 	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
 	uint8		info_orig = info;
+	static XLogRecord *rechdr;
+
+	if (rechdr == NULL)
+	{
+		rechdr = malloc(SizeOfXLogRecord);
+		if (rechdr == NULL)
+			elog(ERROR, "out of memory");
+		MemSet(rechdr, 0, SizeOfXLogRecord);
+	}

 	/* cross-check on whether we should be here or not */
 	if (!XLogInsertAllowed())

Why do you allocate this dynamically? XLogRecord is 32 bytes, there doesn't seem to be much point in this?

On 64-bit architectures, the struct needs padding at the end to make the size MAXALIGNed to 32 bytes; a simple "XLogRecord rechdr" local variable would not include that. You could do something like:

union
{
	XLogRecord	rechdr;
	char		bytes[SizeOfXLogRecord];
}

but that's quite ugly too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] warning handling in Perl scripts
On Jun 27, 2012, at 4:07 PM, Andrew Dunstan wrote: I realise I'm late to this party, but I'm with Robert. The root cause of the errors should be fixed. That's not to say that making warnings fatal might not also be a good idea as a general defense mechanism. ISTM that if they are fatal, one will be more motivated to fix them. I think that was the point. David
Re: [HACKERS] foreign key locks
Excerpts from Kevin Grittner's message of mié jun 27 08:40:58 -0400 2012: Alvaro Herrera wrote: here's fklocks v14, which also merges to new master as there were several other conflicts. It passes make installcheck-world now. Recent commits broke it again, so here's a rebased v15. (No changes other than attempting to merge your work with the pg_controldata and pg_resetxlog changes and wrapping that FIXME in a comment.) Oooh, so sorry -- I was supposed to have git-stashed that before producing the patch. This change cannot have caused the pg_dumpall problem below though, because it only touched the multixact cache code. Using that patch, pg_upgrade completes, but pg_dumpall fails: Hmm, something changed enough that requires more than just a code merge. I'll look into it. Thanks for the merge. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Regarding WAL Format Changes
Excerpts from Heikki Linnakangas's message of mié jun 27 10:56:13 -0400 2012: On 27.06.2012 17:14, Amit Kapila wrote: For the above 2 changed error messages, 'log segment' is used for the filename. However, all similar changes have 'log file' for the filename. There are some places where 'log segment' is used and others where it is 'log file'. So is there any particular reason for it? Not really. There are several messages that use log file %s, and also several places that use log segment %s. Should we make it consistent and use either log segment or log file everywhere? I think it would be better to use log segment for WAL segments. That way we don't cause confusion with the regular text/csv log output files. Heck, maybe even WAL segment instead of log. As a translator, I can't have a single, clear explanation of what log file is because there are multiple meanings. It would be better not to depend on context. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Regarding WAL Format Changes
On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 27.06.2012 17:14, Amit Kapila wrote: 1. Function header for following functions still contains reference to log, seg a. InstallXLogFileSegment() b. RemoveOldXlogFiles() c. XLogFileCopy() d. XLogGetLastRemoved() e. UpdateLastRemovedPtr() f. RemoveOldXlogFiles() Thanks, fixed. There is still a reference to log/seg in the comment of InstallXLogFileSegment(). The attached patch should be applied? Regards, -- Fujii Masao logseg2segno_v1.patch Description: Binary data
Re: [HACKERS] [v9.3] Row-Level Security
On Jun 27, 2012, at 15:07, Kohei KaiGai wrote: Probably, PlannedStmt->invalItems allows to handle invalidation of plan-cache without big code changes. I'll try to put a flag of user-id to track the query plan with RLS assumed, or InvalidOid if no RLS was applied in this plan. I'll investigate the implementation for more details. Do we have any other scenario that runs a query plan under a different user privilege than at planner stage? Hm, what happens if a SECURITY DEFINER function returns a refcursor? Actually, I wonder how we handle that today. If the executor is responsible for permission checks, wouldn't we apply the calling function's privilege level in that case, at least if the cursor isn't fetched from in the SECURITY DEFINER function? If I find some time, I'll check... best regards, Florian Pflug
Re: [HACKERS] [COMMITTERS] pgsql: Move WAL continuation record information to WAL page header.
On Wednesday, June 27, 2012 05:09:46 PM Heikki Linnakangas wrote: On 27.06.2012 17:53, Andres Freund wrote: I had noticed one thing when reviewing the patches before:

@@ -717,6 +719,15 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 	bool		doPageWrites;
 	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
 	uint8		info_orig = info;
+	static XLogRecord *rechdr;
+
+	if (rechdr == NULL)
+	{
+		rechdr = malloc(SizeOfXLogRecord);
+		if (rechdr == NULL)
+			elog(ERROR, "out of memory");
+		MemSet(rechdr, 0, SizeOfXLogRecord);
+	}

 	/* cross-check on whether we should be here or not */
 	if (!XLogInsertAllowed())

Why do you allocate this dynamically? XLogRecord is 32 bytes, there doesn't seem to be much point in this? On 64-bit architectures, the struct needs padding at the end to make the size MAXALIGNed to 32 bytes; a simple "XLogRecord rechdr" local variable would not include that. You could do something like:

union
{
	XLogRecord	rechdr;
	char		bytes[SizeOfXLogRecord];
}

but that's quite ugly too. Ah, yes. Makes sense. Btw, the XLogRecord struct is directly laid out with 32 bytes here (64-bit, Linux) because xl_prev needs to be 8-byte aligned... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Regarding WAL Format Changes
On 27.06.2012 18:55, Fujii Masao wrote: On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 27.06.2012 17:14, Amit Kapila wrote: 1. Function header for following functions still contains reference to log, seg a. InstallXLogFileSegment() b. RemoveOldXlogFiles() c. XLogFileCopy() d. XLogGetLastRemoved() e. UpdateLastRemovedPtr() f. RemoveOldXlogFiles() Thanks, fixed. There is still a reference to log/seg in the comment of InstallXLogFileSegment(). The attached patch should be applied? Thanks, applied. Sorry for being so sloppy with this.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Regarding WAL Format Changes
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: So is there any particular reason for it? Not really. There are several messages that use log file %s, and also several places that use log segment %s Should we make it consistent and use either log segment or log file everywhere? +1 for uniformity. I think I'd vote for using file and eliminating the segment terminology altogether, but the other direction would be okay too, and might require fewer changes. IIRC, in the original coding segment meant 16MB worth of WAL while file was sometimes used to denote 4GB worth (ie, the boundaries where we had to increment the high half of the LSN struct). Now that 4GB boundaries are not special, there's no reason to retain the file concept or terminology. regards, tom lane
Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance
On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net wrote: You agreed to add something like NOSYNC option into START_REPLICATION command? I'm on the fence. I was hoping somebody else would chime in with an opinion as well. +1 Nobody else with any opinion on this? :( I don't think we really need a NOSYNC flag at this point. Just not setting the flush location in clients that make a point of flushing in a timely fashion seems fine. Okay, I'm in the minority, so I'm writing the patch that way. WIP patch attached. In the patch, pg_basebackup background process and pg_receivexlog always return invalid location as flush one, and will never become sync standby even if their name is in synchronous_standby_names. The timing of their sending the reply depends on the standby_message_timeout specified in -s option. So the write position may lag behind the true position. pg_receivexlog accepts new option -S (better option character?). If this option is specified, pg_receivexlog returns true flush position, and can become sync standby. It sends back the reply to the master each time the write position changes or the timeout passes. If synchronous_commit is set to remote_write, synchronous replication to pg_receivexlog would work well. The patch needs more documentation. But I think that it's worth reviewing the code in advance, so I attached the WIP patch. Comments? Objections? The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied, we need to write the backport version of the patch for 9.2. Regards, -- Fujii Masao pg_receivexlog_syncstandby_v1.patch Description: Binary data
Re: [HACKERS] Regarding WAL Format Changes
On Thu, Jun 28, 2012 at 1:13 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 27.06.2012 18:55, Fujii Masao wrote: On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 27.06.2012 17:14, Amit Kapila wrote: 1. Function header for following functions still contains reference to log, seg a. InstallXLogFileSegment() b. RemoveOldXlogFiles() c. XLogFileCopy() d. XLogGetLastRemoved() e. UpdateLastRemovedPtr() f. RemoveOldXlogFiles() Thanks, fixed. There is still a reference to log/seg in the comment of InstallXLogFileSegment(). The attached patch should be applied? Thanks, applied. Sorry for being so sloppy with this.. No problem. The WAL format change you did is really nice! Regards, -- Fujii Masao
Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance
On 27 June 2012 18:24, Fujii Masao masao.fu...@gmail.com wrote: will never become sync standby even if their name is in synchronous_standby_names. I don't understand why you'd want that. What is wrong with removing the name from synchronous_standby_names if you don't like it? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Posix Shared Mem patch
On Jun 27, 2012, at 7:34 AM, Robert Haas wrote: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: So, here's a patch. Instead of using POSIX shmem, I just took the expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS memory. The sysv shm is still allocated, but it's just a copy of PGShmemHeader; the real shared memory is the anonymous block. This won't work if EXEC_BACKEND is defined so it just falls back on straight sysv shm in that case. Um. I hadn't thought about the EXEC_BACKEND interaction, but that seems like a bit of a showstopper. I would not like to give up the ability to debug EXEC_BACKEND mode on Unixen. Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to make it continue to use a full-sized sysv shm. I solved this by unlinking the posix shared memory segment immediately after creation. The file descriptor to the shared memory is inherited, so, by definition, only the postmaster children can access the memory. This ensures that shared memory cleanup is immediate after the postmaster and all children close, as well. The fcntl locking is not required to protect the posix shared memory; it can protect itself. Cheers, M
Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte
On Thu, May 24, 2012 at 12:04 AM, Alexander Korotkov aekorot...@gmail.com wrote: Thanks. I rewrote inverse conversion from pg_wchar to mule. New version of patch is attached. Review: It looks to me like pg_wchar2utf_with_len will not work, because unicode_to_utf8 returns its second argument unmodified - not, as your code seems to assume, the byte following what was already written. MULE also looks problematic. The code that you've written isn't symmetric with the opposite conversion, unlike what you did in all other cases, and I don't understand why. I'm also somewhat baffled by the reverse conversion: it treats a multi-byte sequence beginning with a byte for which IS_LCPRV1(x) returns true as invalid if there are less than 3 bytes available, but it only reads two; similarly, for IS_LCPRV2(x), it demands 4 bytes but converts only 3. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On Sat, Jun 2, 2012 at 10:44 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Tom Lane wrote: Simon Riggs writes: On 31 May 2012 15:00, Tom Lane wrote: If we want to finish the beta cycle in a reasonable time period and get back to actual development, we have to refrain from adding more possibly-destabilizing development work to 9.2. And that is what this is. In what way is it possibly destabilising? I'm prepared to believe that it only affects performance, but it could be destabilizing to that. It needs proper review and testing, and the next CF is the right environment for that to happen. +1 This is not a bug fix or even a fix for a performance regression. The train has left the station; the next one will be along shortly. And here it is. There are a couple of outstanding issues here upon which it would be helpful to get input from a few more people: 1. Are there any call sites from which this should be disabled, either because the optimization won't help, or because the caller is holding a lock that we don't want them sitting on for a moment longer than necessary? I think the worst case is that we're doing something like splitting the root page of a btree, and now nobody can walk the btree until the flush finishes, and here we are doing an unnecessary sleep. I am not sure where I can construct a workload where this is a real as opposed to a theoretical problem, though. So maybe we should just not worry about it. It suffices for this to be an improvement over the status quo; it doesn't have to be perfect. Thoughts? 2. Should we rename the GUCs, since this patch will cause them to control WAL flush in general, as opposed to commit specifically? Peter Geoghegan and Simon were arguing that we should retitle it to group_commit_delay rather than just commit_delay, but that doesn't seem to be much of an improvement in describing what the new behavior will actually be, and I am doubtful that it is worth creating a naming incompatibility with previous releases for a cosmetic change. I suggested wal_flush_delay, but there's no consensus on that. Opinions? Also, I think I see a straightforward, if minor, improvement. Right now, the patch has this:

/*
 * Sleep before flush! By adding a delay here, we may give further
 * backends the opportunity to join the backlog of group commit
 * followers; this can significantly improve transaction throughput, at
 * the risk of increasing transaction latency.
 *
 * We do not sleep if enableFsync is not turned on, nor if there are
 * fewer than CommitSiblings other backends with active transactions.
 */
if (CommitDelay > 0 && enableFsync &&
	MinimumActiveBackends(CommitSiblings))
	pg_usleep(CommitDelay);

/* Got the lock */
LogwrtResult = XLogCtl->LogwrtResult;
if (!XLByteLE(record, LogwrtResult.Flush))
{
	/* try to write/flush later additions to XLOG as well */
	if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))

The XLByteLE test four lines from the bottom should happen before we consider whether to sleep, because it's possible that we'll discover that our flush request has already been satisfied and exit without doing anything, in which case the sleep is a waste. The attached version adjusts the logic accordingly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company move_delay_2012_06_27.patch Description: Binary data
Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)
On 27 June 2012 23:01, Robert Haas robertmh...@gmail.com wrote: 1. Are there any call sites from which this should be disabled, either because the optimization won't help, or because the caller is holding a lock that we don't want them sitting on for a moment longer than necessary? I think the worst case is that we're doing something like splitting the root page of a btree, and now nobody can walk the btree until the flush finishes, and here we are doing an unnecessary sleep. I am not sure where I can construct a workload where this is a real as opposed to a theoretical problem, though. So maybe we should just not worry about it. It suffices for this to be an improvement over the status quo; it doesn't have to be perfect. Thoughts? I also find it hard to believe this is a concern. As I've said before though, it hasn't been adequately explained just how you're supposed to skip the delay if you're in this situation. It's highly unlikely that you'll be the one doing the delaying anyway. Skipping the delay iff you're the leader when this might possibly be a problem will only *conceivably* be effective in a tiny minority of cases, so this seems to make little sense to me. Look at the original benchmark - there are latency numbers there too. The patch actually performs slightly better than baseline in terms of latency (worst and average cases) as well as throughput. To get back to bus analogies again, if people start treating buses like taxis, you get slightly better minimum latency figures (not included in any benchmark performed), because one or two people immediately hijacked a bus and left on an hour-long journey rather than waiting another 15 minutes for more passengers to come along. That doesn't seem like it should be something to concern ourselves with.
More to the point, I think that there are laws-of-physics reasons why these backends that might be particularly adversely affected by a delay (due to some eventuality like the one you described) cannot *really* avoid a delay. Of course this isn't an absolute, and certainly won't apply when there are relatively few clients, when the delay really is useless, but then we trust the dba to set commit_siblings (and indeed commit_delay) correctly, and it's hardly that big of a deal, since it's typically only a fraction of a single fsync() longer at worst - certainly not a total wait that is longer than the total wait the backend could reasonably expect to have. Incidentally, I'm guessing someone is going to come up with a rule-of-thumb for setting commit_delay that sounds something like "half the latency of your wal_sync_method as reported by pg_test_fsync()". 2. Should we rename the GUCs, since this patch will cause them to control WAL flush in general, as opposed to commit specifically? Peter Geoghegan and Simon were arguing that we should retitle it to group_commit_delay rather than just commit_delay, but that doesn't seem to be much of an improvement in describing what the new behavior will actually be, and I am doubtful that it is worth creating a naming incompatibility with previous releases for a cosmetic change. I suggested wal_flush_delay, but there's no consensus on that. Opinions? My main problem with the existing name is that every single article on commit_delay since it was introduced over ten years ago has been on balance very negative. Greg Smith's book, the docs, my talk at PgCon, and any other material in existence about commit_delay that I'm aware of say the same thing. commit_delay in master is essentially the same now as it was in 2000; it's just that commit_siblings is a bit smarter, in that it is now based on the number of active transactions specifically.
Now, based on the fact that the doc changes concerning the practical details of setting commit_delay survived your editorialisations, I take it that you accept my view that this is going to fundamentally alter the advice that surrounds commit_delay, from "just don't touch it" to something like "this is something that you should probably consider; it may be very compelling in certain situations". Let's not shoot ourselves in the foot by keeping the old, disreputable name. You felt that wal_flush_delay more accurately reflected what the new setting actually does. You were right, and I'd be perfectly happy to go along with that. The XLByteLE test four lines from the bottom should happen before we consider whether to sleep, because it's possible that we'll discover that our flush request has already been satisfied and exit without doing anything, in which case the sleep is a waste. The attached version adjusts the logic accordingly. Seems like a minor point, unlikely to make any appreciable difference, but there's no reason to think that it will have any negative effect compared to what I've proposed either, so that's fine with me. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] pg_signal_backend() asymmetry
Hi all, I have one nitpick related to the recent changes for pg_cancel_backend() and pg_terminate_backend(). If you use these functions as an unprivileged user, and try to signal a nonexistent PID, you get: ERROR: must be superuser or have the same role to cancel queries running in other server processes Whereas if you do the same thing as a superuser, you get: WARNING: PID 123 is not a PostgreSQL server process pg_cancel_backend --- f (1 row) The comment above the WARNING generated for the latter case says:

/*
 * This is just a warning so a loop-through-resultset will not abort
 * if one backend terminated on its own during the run
 */

which is nice, but wouldn't unprivileged users want the same behavior? Not to mention, the ERROR above is misleading, as it claims the nonexistent PID really belongs to another user. A simple fix is attached. The existing code called both BackendPidGetProc(pid) and IsBackendPid(pid) while checking a non-superuser's permissions, which really means two separate calls to BackendPidGetProc(pid). I simplified that down to a single BackendPidGetProc(pid) call. Josh pg_signal_backend_asymmetry.diff Description: Binary data
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Thu, Jun 14, 2012 at 10:55 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote: To achieve the same in dblink, we need to parse the passed connection string and check if it contains fallback_application_name; if yes then it's okay, otherwise we need to append fallback_application_name to the connection string. That seems undesirable. I don't think this is important enough to be worth reparsing the connection string for. I'd just forget about doing it for dblink if there's no cheaper way. Indeed, reparsing the connection string is not cheap, but dblink already does it in dblink_connstr_check to check the password requirement when the local user is not a superuser. So Amit's idea doesn't seem unreasonable to me, if we can avoid an extra PQconninfoParse call. Just an idea, but how about pushing fallback_application_name handling into dblink_connstr_check? We reparse the connection string unless the local user is a superuser, so it would not be serious overhead in most cases. Although it might require changes in the DBLINK_GET_CONN macro... Regards, -- Shigeru Hanada
[HACKERS] We probably need autovacuum_max_wraparound_workers
Folks, Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. The basic issue is that autovacuum_max_workers is set by most users based on autovac's fairly lightweight action most of the time: analyze, vacuuming pages not on the visibility list, etc. However, when XID wraparound kicks in, then autovac starts reading entire tables from disk ... and those tables may be very large. This becomes a downtime issue if you've set autovacuum_max_workers to, say, 5 and several large tables hit the wraparound threshold at the same time (as they tend to do if you're using the default settings). Then you have 5 autovacuum processes concurrently doing heavy IO and getting in each others' way. I've seen this at two sites now, and my conclusion is that a single autovacuum_max_workers isn't sufficient to cover the case of wraparound vacuum. Nor can we just single-thread the wraparound vacuum (i.e. just one worker) since that would hurt users who have thousands of small tables. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Wed, Jun 27, 2012 at 9:57 PM, Shigeru HANADA shigeru.han...@gmail.com wrote: On Thu, Jun 14, 2012 at 10:55 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote: To achieve the same in dblink, we need to parse the passed connection string and check if it contains fallback_application_name, if yes then its okay, otherwise we need to append fallback_application_name in connection string. That seems undesirable. I don't think this is important enough to be worth reparsing the connection string for. I'd just forget about doing it for dblink if there's no cheaper way. Indeed reparsing connection string is not cheap, but dblink does it for checking password requirement for non-in dblink_connstr_check when the local user was not a superuser. So Amit's idea doesn't seem unreasonable to me, if we can avoid extra PQconninfoParse call. Just an idea, but how about pushing fallback_application_name handling into dblink_connstr_check? We reparse connection string unless local user was a superuser, so it would not be serious overhead in most cases. Although it might require changes in DBLINK_GET_CONN macro... *shrug* If it can be done without costing anything meaningful, I don't object, but I would humbly suggest that this is not hugely important one way or the other. application_name is primarily a monitoring convenience, so it's not hugely important to have it set anyway, and fallback_application_name is only going to apply in cases where the user doesn't care enough to bother setting application_name. Let's not knock ourselves out to solve a problem that may not be that important to begin with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Josh Berkus j...@agliodbs.com writes: Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. The basic issue is that autovacuum_max_workers is set by most users based on autovac's fairly lightweight action most of the time: analyze, vacuuming pages not on the visibility list, etc. However, when XID wraparound kicks in, then autovac starts reading entire tables from disk ... and those tables may be very large. It doesn't seem to me that this has much of anything to do with wraparound; that just happens to be one possible trigger condition for a lot of vacuuming activity to be happening. (Others are bulk data loads or bulk updates, for instance.) Nor am I convinced that changing the max_workers setting is an appropriate fix anyway. I think what you've really got here is inappropriate autovacuum cost delay settings, and/or the logic in autovacuum.c to try to divvy up the available I/O capacity by tweaking workers' delay settings isn't working very well. It's hard to propose improvements without a lot more detail than you've provided, though. regards, tom lane
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Jun 27, 2012, at 22:00, Josh Berkus j...@agliodbs.com wrote: Folks, Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. The basic issue is that autovacuum_max_workers is set by most users based on autovac's fairly lightweight action most of the time: analyze, vacuuming pages not on the visibility list, etc. However, when XID wraparound kicks in, then autovac starts reading entire tables from disk ... and those tables may be very large. This becomes a downtime issue if you've set autovacuum_max_workers to, say, 5 and several large tables hit the wraparound threshold at the same time (as they tend to do if you're using the default settings). Then you have 5 autovacuum processes concurrently doing heavy IO and getting in each others' way. I've seen this at two sites now, and my conclusion is that a single autovacuum_max_workers isn't sufficient if to cover the case of wraparound vacuum. Nor can we just single-thread the wraparound vacuum (i.e. just one worker) since that would hurt users who have thousands of small tables. Would there be enough benefit to setting up separate small/medium?/large thresholds with user-changeable default table size boundaries so that you can configure 6 workers where 3 handle the small tables, 2 handle the medium tables, and 1 handles the large tables. Or alternatively a small worker consumes 1, medium 2, and large 3 'units' from whatever size pool has been defined. So you could have 6 small tables or two large tables in-progress simultaneously. David J.
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
I think what you've really got here is inappropriate autovacuum cost delay settings, and/or the logic in autovacuum.c to try to divvy up the available I/O capacity by tweaking workers' delay settings isn't working very well. It's hard to propose improvements without a lot more detail than you've provided, though. Wait, we *have* that logic? If so, that's the problem ... it's not working very well. What detail do you want? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Josh, all, * Josh Berkus (j...@agliodbs.com) wrote: Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. As I suggested on IRC, my thought would be to have a goal-based system for autovacuum which is similar to our goal-based commit system. We don't need autovacuum sucking up all the I/O in the box, nor should we ask the users to manage that. Instead, let's decide when the autovacuum on a given table needs to finish and then plan to keep on working at a rate that'll allow us to get done well in advance of that deadline. Just my 2c. Thanks, Stephen
Re: [HACKERS] Server crash while trying to fetch EXPLAIN query results with a cursor
Rushabh Lathia rushabh.lat...@gmail.com writes: The above testcase ends up with a server crash. The crash is due to the following assert in ScanQueryForLocks(): Assert(parsetree->commandType != CMD_UTILITY); Meh, yeah, more fallout from the CREATE TABLE AS representation change. PFA patch for the same. Applied with some editorialization. Thanks! regards, tom lane
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Josh Berkus j...@agliodbs.com writes: I think what you've really got here is inappropriate autovacuum cost delay settings, and/or the logic in autovacuum.c to try to divvy up the available I/O capacity by tweaking workers' delay settings isn't working very well. It's hard to propose improvements without a lot more detail than you've provided, though. Wait, we *have* that logic? If so, that's the problem ... it's not working very well. What detail do you want? What's it doing? What do you think it should do instead? regards, tom lane
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Stephen Frost sfr...@snowman.net writes: * Josh Berkus (j...@agliodbs.com) wrote: Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. As I suggested on IRC, my thought would be to have a goal-based system for autovacuum which is similar to our goal-based commit system. We don't need autovacuum sucking up all the I/O in the box, nor should we ask the users to manage that. Instead, let's decide when the autovacuum on a given table needs to finish and then plan to keep on working at a rate that'll allow us to get done well in advance of that deadline. If we allow individual vacuum operations to stretch out just because they don't need to be completed right away, we will need more concurrent vacuum workers (so that we can respond to vacuum requirements for other tables). So I submit that this would only move the problem around: the number of active workers would increase to the point where things are just as painful, plus or minus a bit. The intent of the autovacuum cost delay features is to ensure that autovacuum doesn't suck an untenable fraction of the machine's I/O capacity, even when it's running flat out. So I think Josh's complaint indicates that we have a problem with cost-delay tuning; hard to tell what exactly without more info. It might only be that the defaults are bad for these particular users, or it could be more involved. regards, tom lane
Re: [HACKERS] Regarding WAL Format Changes
From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com] Sent: Wednesday, June 27, 2012 8:26 PM On 27.06.2012 17:14, Amit Kapila wrote: 2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath, LWLockRelease(ControlFileLock); ereport(LOG, (errcode_for_file_access(), - errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m", -tmppath, path, *log, *seg))); + errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m", +tmppath, path))); If the changed error message can contain the log file and segment number, it would be more clear. That should be easily deducible from the segment number. That seems redundant. The target file name is calculated from the segment number, and we're now using the file name instead of log+seg in other messages too. errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m", +tmppath, path))); In this, if we try to get the meaning of the second part of the message (initialization of log file), it was much better previously, as this message refers to 2 files and previously it was clear initialization of which log file failed. So we can mention the file name in the second part of the message (initialization of log file) as well. 3. -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr) For the above 2 changed error messages, 'log segment' is used for the filename. However, all similar changes have 'log file' for the filename. There are some places where 'log segment' is used and other places it is 'log file'. So is there any particular reason for it? Not really. There are several messages that use "log file %s", and also several places that use "log segment %s". Should we make it consistent and use either log segment or log file everywhere? 'file' seems to be the better option, as some users may not even be aware of segments; they would be using default values of segments and they can relate to 'file' easily.
Also, using 'WAL' instead of 'log', as suggested by Alvaro, is good if others also think the same. With Regards, Amit Kapila.
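For readers following along: the file name Heikki refers to is derived from the timeline and segment numbers alone, roughly as below. This is a sketch modeled on the backend's pre-9.3 XLogFileName macro (xlog_internal.h); the function name here is illustrative:

```c
#include <stdio.h>

/* Sketch of how a WAL segment file name is computed from the
 * timeline ID, log ID, and segment number: three 8-hex-digit
 * fields concatenated, as in the backend's XLogFileName macro. */
static void wal_file_name(char *buf, size_t buflen,
                          unsigned tli, unsigned log, unsigned seg)
{
    snprintf(buf, buflen, "%08X%08X%08X", tli, log, seg);
}
```

So timeline 1, log 2, segment 3 yields a 24-character name, which is why the message can print the file name instead of separate log/seg numbers.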
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Wed, Jun 27, 2012 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Josh Berkus (j...@agliodbs.com) wrote: Yeah, I can't believe I'm calling for *yet another* configuration variable either. Suggested workaround fixes very welcome. As I suggested on IRC, my thought would be to have a goal-based system for autovacuum which is similar to our goal-based commit system. We don't need autovacuum sucking up all the I/O in the box, nor should we ask the users to manage that. Instead, let's decide when the autovacuum on a given table needs to finish and then plan to keep on working at a rate that'll allow us to get done well in advance of that deadline. If we allow individual vacuum operations to stretch out just because they don't need to be completed right away, we will need more concurrent vacuum workers (so that we can respond to vacuum requirements for other tables). So I submit that this would only move the problem around: the number of active workers would increase to the point where things are just as painful, plus or minus a bit. The intent of the autovacuum cost delay features is to ensure that autovacuum doesn't suck an untenable fraction of the machine's I/O capacity, even when it's running flat out. So I think Josh's complaint indicates that we have a problem with cost-delay tuning; hard to tell what exactly without more info. It might only be that the defaults are bad for these particular users, or it could be more involved. I've certainly come across many reports of the cost delay settings being difficult to tune, both on pgsql-hackers/performance and in various private EnterpriseDB correspondence. I think Stephen's got it exactly right: the system needs to figure out the rate at which vacuum needs to happen, not rely on the user to provide that information. 
For checkpoints, we estimate the percentage of the checkpoint that ought to be completed and the percentage that actually is completed; if the latter is less than the former, we speed things up until we're back on track. For autovacuum, the trick is to speed things up when the rate at which tables are coming due for autovacuum exceeds the rate at which we are vacuuming them; or, when we anticipate that a whole bunch of wraparound vacuums are going to come due simultaneously, to start doing them sooner so that they are more spread out. For example, suppose that 26 tables, each of which is 4GB in size, are going to simultaneously come due for an anti-wraparound vacuum in 26 hours. For the sake of simplicity suppose that each will take 1 hour to vacuum. What we currently do is wait for 26 hours and then start vacuuming them all at top speed, thrashing the I/O system. What we ought to do is start vacuuming them much sooner and do them consecutively. Of course, the trick is to design a mechanism that does something intelligent if we think we're on track and then all of a sudden the rate of XID consumption changes dramatically, and now we've got to vacuum faster or with more workers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
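The checkpoint-style pacing described above could be sketched roughly like this. All names and the exact pacing rule are hypothetical, not from autovacuum.c:

```c
/* Illustrative pacing rule in the spirit of the checkpoint code:
 * if the fraction of pages vacuumed lags the fraction of the
 * deadline that has elapsed, shrink the cost delay to catch up.
 * Everything here is a hypothetical sketch. */
static int paced_cost_delay_ms(double pages_done, double pages_total,
                               double elapsed_s, double deadline_s,
                               int base_delay_ms)
{
    double work_frac = pages_done / pages_total;
    double time_frac = elapsed_s / deadline_s;

    if (work_frac >= time_frac)
        return base_delay_ms;              /* on schedule: keep throttling */

    /* behind schedule: reduce the delay proportionally, toward zero */
    double ratio = work_frac / time_frac;  /* < 1.0 when behind */
    return (int) (base_delay_ms * ratio);
}
```

A worker that has done half the work at half the deadline keeps the base 20 ms nap; one that has done only a quarter of the work by then naps 10 ms, doubling its throughput until it is back on track.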
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
Robert Haas robertmh...@gmail.com writes: For example, suppose that 26 tables each of which is 4GB in size are going to simultaneously come due for an anti-wraparound vacuum in 26 hours. For the sake of simplicity suppose that each will take 1 hour to vacuum. What we currently do is wait for 26 hours and then start vacuuming them all at top speed, thrashing the I/O system. This is a nice description of a problem that has nothing to do with reality. In the first place, we don't vacuum them all at once; we can only vacuum max_workers of them at a time. In the second place, the cost-delay features ought to be keeping autovacuum from thrashing the I/O, entirely independently of what the reason was for starting the vacuums. Clearly, since people are complaining, there's something that needs work there. But not the work you're proposing. regards, tom lane
Re: [HACKERS] Posix Shared Mem patch
On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: Would Posix shmem help with that at all? Why did you choose not to use the Posix API, anyway? It seemed more complicated. If we use the POSIX API, we've got to have code to find a non-colliding name for the shm, and we've got to arrange to clean it up at process exit. Anonymous shm doesn't require a name and goes away automatically when it's no longer in use. I see. Those are pretty good reasons ... So, should we do it this way? I did a little research and discovered that Linux 2.3.51 (released 3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS. That combination is documented to work beginning in Linux 2.4.0. How worried should we be about people trying to run PostgreSQL 9.3 on pre-2.4 kernels? If we want to worry about it, we could try mapping a one-page shared MAP_SHARED|MAP_ANONYMOUS segment first. If that works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS facility and try to allocate the whole segment plus a minimal sysv shm. If the single page allocation fails with EINVAL, we could fall back to allocating the entire segment as sysv shm. A related question is - if we do this - should we enable it only on ports where we've verified that it works, or should we just turn it on everywhere and fix breakage if/when it's reported? I lean toward the latter. If we find that there are platforms where (a) mmap is not supported or (b) MAP_SHARED|MAP_ANON works but has the wrong semantics, we could either shut off this optimization on those platforms by fiat, or we could test not only that the call succeeds, but that it works properly: create a one-page mapping and fork a child process; in the child, write to the mapping and exit; in the parent, wait for the child to exit and then test that we can read back the correct contents. 
This would protect against a hypothetical system where the flags are accepted but fail to produce the correct behavior. I'm inclined to think this is over-engineering in the absence of evidence that there are platforms that work this way. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello, thank you for your suggestion. I agree; that is the fundamental question. I coded it just for fun, but I don't see much significance in doing that. We might omit the test for this, which is non-critical and covers corner cases. We need these tests to work on Windows too, so fancy gmake tricks are probably not the way to deal with varying results. Hmm. I understand you are suggesting that we should do this in the normal regression test. Ok. Since there turned out to be only two patterns in the regression test, the fancy thing is no longer needed. I will unfold them and make sure they work on the MinGW build environment. And for one more environment, the one with VC++, I'll need a bit more time to make out what vcregress.pl does. On the other points, I will decide as follows and send to a committer as soon as the above is finished. - The main patch, which fixes the sql-ascii handling itself, should be ported into 9.2 and 9.1. Someone should work on this. (me?) - The remainder of the patch, which fixes the easily fixable leaks of palloc'ed memory, won't be ported into 9.1. This is only for 9.3dev. - The patch for 9.3dev will be provided with the new regression test. It could be easily ported into 9.1 and 9.2, and there seems to be no problem technically, but I'm a bit unsure from other points of view... regards, -- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
Re: [HACKERS] We probably need autovacuum_max_wraparound_workers
On Thu, Jun 28, 2012 at 12:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: For example, suppose that 26 tables each of which is 4GB in size are going to simultaneously come due for an anti-wraparound vacuum in 26 hours. For the sake of simplicity suppose that each will take 1 hour to vacuum. What we currently do is wait for 26 hours and then start vacuuming them all at top speed, thrashing the I/O system. This is a nice description of a problem that has nothing to do with reality. In the first place, we don't vacuum them all at once; we can only vacuum max_workers of them at a time. In the second place, the cost-delay features ought to be keeping autovacuum from thrashing the I/O, entirely independently of what the reason was for starting the vacuums. I don't think it works that way. The point is that the workload imposed by autovac is intermittent and spikey. If you configure the cost limit too low, or the delay too high, or the number of autovac workers is too low, then autovac can't keep up, which causes all of your tables to bloat and is a total disaster. You have to make sure that isn't going to happen, so you naturally configure the settings aggressively enough that you're sure autovac will be able to stay ahead of your bloat problem. But then autovac is more resource-intensive ALL the time, not just when there's a real need for it. This is like giving a kid a $20 bill to buy lunch and having them walk around until they find a restaurant sufficiently expensive that lunch there costs $20. The point of handing over $20 was that you were willing to spend that much *if needed*, not that the money was burning a hole in your pocket. To make that more concrete, suppose that a table has an update rate such that it hits the autovac threshold every 10 minutes. 
If you set the autovac settings such that an autovacuum of that table takes 9 minutes to complete, you are hosed: there will eventually be some 10-minute period where the update rate is ten times the typical amount, and the table will gradually become horribly bloated. But if you set the autovac settings such that an autovacuum of the table can finish in 1 minute, so that you can cope with a spike, then whenever there isn't a spike you are processing the table ten times faster than necessary, and now one minute out of every ten carries a heavier I/O load than the other 9, leading to uneven response times. It's just ridiculous to assert that it doesn't matter if all the anti-wraparound vacuums start simultaneously. It does matter. For one thing, once every single autovacuum worker is pinned down doing an anti-wraparound vacuum of some table, then a table that needs an ordinary vacuum may have to wait quite some time before a worker is available. Depending on the order in which workers iterate through the tables, you could end up finishing all of the anti-wraparound vacuums before doing any of the regular vacuums. If the wraparound vacuums had been properly spread out, then there would at all times have been workers available for regular vacuums as needed. For another thing, you can't possibly think that three or five workers running simultaneously, each reading a different table, is just as efficient as having one worker grind through them consecutively. Parallelism is not free, ever, and particularly not here, where it has the potential to yank the disk head around between five different files, seeking like crazy, instead of a nice sequential I/O pattern on each file in turn. Josh wouldn't keep complaining about this if it didn't suck. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
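For context on the knobs being debated in this thread: with the defaults of that era (vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20 ms, vacuum_cost_page_miss = 10), a worker reading only cold pages is throttled to roughly 1000 pages/s, about 8 MB/s with 8 kB pages. A back-of-envelope sketch of that arithmetic:

```c
/* Back-of-envelope throughput implied by the cost-delay settings:
 * a worker accrues cost per page until it hits the limit, then
 * naps for the delay.  Worst case assumes every page is a miss.
 * Numbers below match the defaults discussed upthread. */
static double vacuum_pages_per_second(int cost_limit, int delay_ms,
                                      int page_cost)
{
    double pages_per_round = (double) cost_limit / page_cost;  /* pages per nap */
    double rounds_per_second = 1000.0 / delay_ms;
    return pages_per_round * rounds_per_second;
}
```

Plugging in the defaults gives vacuum_pages_per_second(200, 20, 10) = 1000 pages/s, which is the "resource-intensive all the time" floor Robert objects to: the rate is fixed by the settings, not by how much vacuuming actually needs to happen.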