Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On 2012-11-12 19:21:28 +, Simon Riggs wrote: On 10 September 2012 17:50, Tom Lane t...@sss.pgh.pa.us wrote: The point of the proposal that I am making is to have a simple, low-maintenance solution for people who need a single-application database. A compromise somewhere in the middle isn't likely to be an improvement for anybody. For instance, if you want to have additional connections, you open up a whole collection of communication and authentication issues, which potential users of a single-application database don't want to cope with. So the proposal is to implement a database that can't ever have 2 or more connections. ... It's almost impossible to purchase a CPU these days that doesn't have multiple cores, so the whole single-process architecture is just dead. Yes, we want Postgres installed everywhere, but this isn't the way to achieve that. I agree we should allow a PostgreSQL installation to work for a single user, but I don't see that requires other changes. This idea will cause endless bugs, thinkos and severely waste our time. So without a much better justification, I don't think we should do this. I personally think that a usable scriptable --single mode is justification enough, even if you don't agree with the other goals. Having to wait for hours just to enter one more command because --single doesn't support any scripts sucks. Especially in recovery situations. I also don't think a single-backend without further child processes is all that helpful - but I think this might be a very useful stepping stone. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
On 12.11.2012 22:53, Tom Lane wrote: Here's an updated patch that fixes the GIST replay functions as well as the other minor issues that were mentioned. Barring objections, I'll set about back-patching this as far as 9.0. Ok. It won't help all that much on 9.0, though. One thing that could use verification is my fix for gistRedoPageSplitRecord. AFAICS, the first page listed in the WAL record is always the original page, and the ones following it are pages that were split off from it, and can (as yet) only be reached by following right-links from the original page. As such, it should be okay to release locks on the non-first pages as soon as we've written them. We have to hold lock on the original page though to avoid letting readers follow dangling right-links. Also, the update of NSN/FOLLOW_RIGHT on the child page (if any) has to be done atomically with all this, so that has to be done before releasing the original-page lock as well. Does that sound right? Yep. - Heikki
Re: [HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Hi CK, On 2012-10-30 21:16:07 +0100, Christian Kruse wrote: index b4fcbaf..66ed10f 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml I think a short introduction or at least a reference on how to configure hugepages would be a good thing. <varlistentry id="guc-temp-buffers" xreflabel="temp_buffers"> <term><varname>temp_buffers</varname> (<type>integer</type>)</term> <indexterm> diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index df06312..f9de239 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -27,10 +27,14 @@ #ifdef HAVE_SYS_SHM_H #include <sys/shm.h> #endif +#ifdef MAP_HUGETLB +#include <dirent.h> +#endif I think a central #define for the MAP_HUGETLB capability would be a good idea, akin to HAVE_SYS_SHM_H. E.g. this: --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -22,6 +22,7 @@ #include <limits.h> #include <unistd.h> #include <sys/stat.h> +#include <sys/mman.h> #ifdef HAVE_SYSLOG #include <syslog.h> #endif is unlikely to fly on windows. +/* + * static long InternalGetHugepageSize() + * + * Attempt to get a valid hugepage size from /sys/kernel/mm/hugepages/ by + * reading directory contents + * Will fail (return -1) if the directory could not be opened or no valid + * page sizes are available. Will return the biggest hugepage size on + * success. + * + */ The "biggest" remark is out of date. +static long +InternalGetHugepageSize() +{ ... + if ((smallest_size == -1 || size < smallest_size) && + InternalGetFreeHugepagesCount(ent->d_name) > 0) + { + smallest_size = size; + } ... + + if (smallest_size == -1) + { + ereport(huge_tlb_pages == HUGE_TLB_TRY ? 
DEBUG1 : WARNING, + (errmsg("Could not find a valid hugepage size"), + errhint("This error usually means that either CONFIG_HUGETLB_PAGE + is not in kernel or that your architecture does not + support hugepages or you did not configure hugepages"))); + } I think differentiating the error message between no hugepages found and InternalGetFreeHugepagesCount(ent->d_name) always being zero would be a good idea. Failing this way if InternalGetFreeHugepagesCount(ent->d_name) > 0 seems fine. Greetings, Andres Freund
Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory
Oh, one more thing... On 2012-10-30 21:16:07 +0100, Christian Kruse wrote: ok, I think I implemented all of the changes you requested. All but the ia64-dependent one; I have to do more research for that. I vote for simply not caring about ia64. This is: +#ifdef MAP_HUGETLB +# ifdef __ia64__ +#define PG_HUGETLB_BASE_ADDR (void *)(0x8000UL) +#define PG_MAP_HUGETLB (MAP_HUGETLB|MAP_FIXED) +# else +#define PG_HUGETLB_BASE_ADDR (void *)(0x0UL) +#define PG_MAP_HUGETLB MAP_HUGETLB +# endif +#else +# define PG_MAP_HUGETLB 0 +#endif too much underdocumented craziness for a very minor platform. Should somebody with the appropriate hardware want to submit an additional patch, fine. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enabling Checksums
On 11/13/2012 01:22 AM, Greg Smith wrote: Once you accept that eventually there need to be online conversion tools, there needs to be some easy way to distinguish which pages have been processed for several potential implementations. Agreed. What I'm saying is that this identification doesn't need to be as fine grained as a per-page bit. A single horizon or border is enough, given an ordering of relations (for example by OID) and an ordering of pages in the relations (obvious). All of the table-based checksum enabling ideas ... This is not really one - it doesn't allow per-table switching. It's just meant to be a more compact way of representing which pages have been checksummed and which not. I'm thinking of this in some ways like the way creation of a new (but not yet valid) foreign key works. Once that's active, new activity is immediately protected moving forward. And eventually there's this cleanup step needed, one that you can inch forward over a few days. I understand that. However, I question if users really care. If a corruption is detected, the clever DBA tells his trainee to immediately check the file and disk subsystems - no matter whether the corruption was on old or new data. You have a point in that pages with newer data are often more likely to be re-read and thus get checked, whereas the checksums written to pages with old data might not be re-read any time soon. Starting to write checksums from the end of the relation could mitigate this to some extent, though. Also keep in mind the "quietly corrupted after checked once, but still in the middle of checking a relation" case. Thus a single bit doesn't really give us the guarantee you ask for. Sure, we can add more than one bit. And yeah, if done properly, adding more bits exponentially reduces the likelihood of a corruption inadvertently turning off checksumming for a page. All that said, I'm not opposed to using a few bits of the page header. 
I wanted to outline an alternative that I think is viable and less intrusive. This is why I think any good solution to this problem needs to incorporate restartable conversion. I fully agree with that. Regards Markus Wanner
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On 13 November 2012 06:14, Amit kapila amit.kap...@huawei.com wrote: I get the installability thang, very very much, I just don't see the single process thing as the only solution. At very least an open minded analysis of the actual problem and ways of solving it is called for, not just reach for a close to hand solution. Some other use case where I have seen it required is in telecom billing apps. Telecom applications where this solution works need other maintenance connections as well. Some of the reasons for its use are performance and less maintenance overhead; their data requirements are also not very high. So even if this solution doesn't meet all requirements of a single process solution (and I don't think it is written to address all of them), can't we think of it as a first version and then, based on requirements, extend it to have other capabilities: a. to have a mechanism for other background processes (autovacuum, checkpoint, ..). b. more needs to be thought of.. Why would we spend time trying to put back something that is already there? Why not simply avoid removing it in the first place? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Simon Riggs escribió: So even if this solution doesn't meet all requirements of a single process solution (and I don't think it is written to address all of them), can't we think of it as a first version and then, based on requirements, extend it to have other capabilities: a. to have a mechanism for other background processes (autovacuum, checkpoint, ..). b. more needs to be thought of.. Why would we spend time trying to put back something that is already there? Why not simply avoid removing it in the first place? Actually, the whole point of this solution originally was just to serve pg_upgrade needs, so that it doesn't have to start a complete postmaster environment just to have to turn off most of what postmaster does, and with enough protections to disallow everyone else from connecting. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On 13 November 2012 13:05, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Simon Riggs escribió: So even if this solution doesn't meet all requirements of a single process solution (and I don't think it is written to address all of them), can't we think of it as a first version and then, based on requirements, extend it to have other capabilities: a. to have a mechanism for other background processes (autovacuum, checkpoint, ..). b. more needs to be thought of.. Why would we spend time trying to put back something that is already there? Why not simply avoid removing it in the first place? Actually, the whole point of this solution originally was just to serve pg_upgrade needs, so that it doesn't have to start a complete postmaster environment just to have to turn off most of what postmaster does, and with enough protections to disallow everyone else from connecting. I don't see anything that pg_upgrade is doing that causes the need to support a special mode. From other people's comments it's clear that single user mode is desirable to many and *will* be widely deployed if we allow it. I support the wish to allow a database server to be limited by configuration to a single user. However, supporting a specifically targeted mode that presents single user as an architectural design/limitation is a regressive step that I am strongly opposed to. The most popular relational database in the world is Microsoft Access, not MySQL. Access appears desirable because it allows a single user to create and use a database (which is very good). But all business databases have a requirement for at least one of: high availability, multi-user access or downstream processing in other parts of the business. Businesses worldwide curse the difficulties caused by having critical business data in desktop databases. And worldwide, there are also many that don't understand the problems that disconnected data causes because they can't see past the initial benefit. 
The lessons from that are that it's OK to start with a database used by a single person, but that database soon needs to allow access from multiple users or automated agents. Many database systems support embedded or single user mode as an architectural option. All of those systems cause headaches in all of the businesses where they are used. They also cause problems on small detached devices such as phones, because even on very small systems there is a requirement for multiple concurrently active processes each of which may need database access. PostgreSQL was designed from the ground up as a multi-user database. This is the very fact that puts us in a good position to become pervasive. A single database system that works the same on all devices, with useful replication to connect data together. The embedded or single mode concept has long been on the "do not want" list. I believe that is a completely rational and strongly desirable thing. Supporting multiple architectures is extra work, and the restrictive architecture bites people in the long term. The fact that it's an easy patch is not a great argument for changing that position, and in fact, it's not easy, since it comes with a request to make it work on Windows (= extra work). The easy bit is not proven since people are already starting to ask about bgwriter and autovacuum. In this release there is much work happening around providing additional autonomous agents (bgworker) and other work around flexible replication (BDR), all of which would be nullified by the introduction and eventual wide usage of a restrictive new architecture. Single user configuration option, yes. Architecturally limited special version of PostgreSQL, no. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
Heikki Linnakangas hlinnakan...@vmware.com writes: On 12.11.2012 22:53, Tom Lane wrote: Here's an updated patch that fixes the GIST replay functions as well as the other minor issues that were mentioned. Barring objections, I'll set about back-patching this as far as 9.0. Ok. It won't help all that much on 9.0, though. Well, it won't help GIST much, but the actually-reported-from-the-field case is in btree, and it does fix that. It occurs to me that if we're sufficiently scared of this case, we could probably hack the planner (in 9.0 only) to refuse to use GIST indexes in hot-standby queries. That cure might be worse than the disease though. regards, tom lane
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
On Tue, Nov 13, 2012 at 9:03 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ok. It won't help all that much on 9.0, though. Well, it won't help GIST much, but the actually-reported-from-the-field case is in btree, and it does fix that. It occurs to me that if we're sufficiently scared of this case, we could probably hack the planner (in 9.0 only) to refuse to use GIST indexes in hot-standby queries. That cure might be worse than the disease though. if anything, it should be documented. if you do this kind of thing people will stop installing bugfix releases. merlin
Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On Tue, Nov 13, 2012 at 1:06 PM, Amit kapila amit.kap...@huawei.com wrote: On Monday, November 12, 2012 8:23 PM Fujii Masao wrote: On Fri, Nov 9, 2012 at 3:03 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 10:42 PM Fujii Masao wrote: On Thu, Nov 8, 2012 at 5:53 PM, Amit Kapila amit.kap...@huawei.com wrote: On Thursday, November 08, 2012 2:04 PM Heikki Linnakangas wrote: On 19.10.2012 14:42, Amit kapila wrote: On Thursday, October 18, 2012 8:49 PM Fujii Masao wrote: Before implementing the timeout parameter, I think that it's better to change both pg_basebackup background process and pg_receivexlog so that BTW, IIRC the walsender has no timeout mechanism during sending backup data to pg_basebackup. So it's also useful to implement the timeout mechanism for the walsender during backup. Yes, it's useful, but for walsender the main problem is that it uses a blocking send call to send the data. I have tried using tcp_keepalive settings, but the send call doesn't come out in case of a network break. The only way I could get it out is: change in the corresponding file /proc/sys/net/ipv4/tcp_retries2 by using the command echo 8 > /proc/sys/net/ipv4/tcp_retries2 As per recommendation, its value should be at least 8 (equivalent to 100 sec) Do you have any idea how it can be achieved? What about using pq_putmessage_noblock()? I will try this, but do you know why, in the first place, the code uses blocking mode to send files? I am asking as I am a little scared that it should not break any design which was initially thought of while making the send of files blocking. I'm afraid I don't know why. I guess that using non-blocking mode complicates the code, so in the first version of pg_basebackup the blocking mode was adopted. Regards, -- Fujii Masao
Re: [HACKERS] Memory leaks in record_out and record_send
Tom Lane t...@sss.pgh.pa.us writes: OTOH I can't see trying to back-patch a solution like that. If we want to fix this in the back branches (and note the complaint linked above is against 8.3), I think we have to do it as attached. Thoughts? I've been using textin(record_out(NEW)) in generic partitioning triggers, and you can find examples of that trick in the wiki, so I think we have users of that in the field. Please indeed do consider backpatching! I don't have an opinion on the opportunity to use a shorter memory context; I feel that would need some more involved analysis than my brainpower of the moment allows me to consider. Thanks, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Memory leaks in record_out and record_send
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Thoughts? I've been using textin(record_out(NEW)) in generic partitioning triggers, and you can find examples of that trick in the wiki, so I think we have users of that in the field. I think explicit calls like that actually wouldn't be a problem, since they'd be run in a per-tuple context anyway. The cases that are problematic are hard-coded I/O function calls. I'm worried about the ones like, say, plpgsql's built-in conversion operations. We could probably fix printtup's usage with some confidence, but there are a lot of other ones. regards, tom lane
Re: [HACKERS] Index only scans wiki page
On Mon, Nov 12, 2012 at 8:25 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 13 November 2012 01:03, Jeff Janes jeff.ja...@gmail.com wrote: https://wiki.postgresql.org/wiki/Index-only_scans This page is seriously out of date. It suggests they are not yet implemented, but only being talked about. Attached is a piece I wrote on the feature. That might form the basis of a new wiki page. Feel free to incorporate this material as you see fit. I found this an interesting read. As one of the people who worked on the feature, I'm sort of curious whether people have any experience yet with how this actually shakes out in the field. Are you (or is anyone) aware of positive/negative field experiences with this feature? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Memory leaks in record_out and record_send
Tom Lane t...@sss.pgh.pa.us writes: I think explicit calls like that actually wouldn't be a problem, since they'd be run in a per-tuple context anyway. The cases that are problematic are hard-coded I/O function calls. I'm worried about the ones like, say, plpgsql's built-in conversion operations. We could probably fix printtup's usage with some confidence, but there are a lot of other ones. That's a good reason to get them into a shorter memory context, but which? per transaction maybe? shorter? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Patch to compute Max LSN of Data Pages
On Tue, Nov 13, 2012 at 1:23 PM, Amit kapila amit.kap...@huawei.com wrote: On Monday, November 12, 2012 9:56 PM Alvaro Herrera wrote: Robert Haas escribió: On Tue, Jul 31, 2012 at 8:09 AM, Amit kapila amit.kap...@huawei.com wrote: I think I can see all of those things being potentially useful. There are a couple of pending patches that will revise the WAL format slightly; not sure how much those are likely to interfere with any development you might do on (2) in the meantime. Based on above conclusion, I have prepared a patch which implements Option-1 I wonder if we shouldn't make this a separate utility, rather than something that is part of pg_resetxlog. Anyone have a thought on that topic? That thought did cross my mind too. One of the reasons for keeping it with pg_resetxlog is that this was proposed as a solution for scenarios where a user's db has become corrupt and now he wants to start it. To do so, he can find the max LSN and set the same using pg_resetxlog; it will avoid further corruption of the database after it gets started. If we keep it a separate utility, then the user needs to first run this utility to find the max LSN and then use pg_resetxlog to achieve the same. I don't see a big problem in that, but maybe it would have been better if there were other use cases for it. We might be able to use this utility to decide whether we need to take a fresh backup from the master onto the standby, to start the old master as a new standby after failover. When starting a new standby after failover, any data page in the standby must not precede the master. Otherwise, the standby cannot catch up with the master consistently. But the master might write the data page corresponding to the WAL which has not been replicated to the standby yet. So, if failover happens before that WAL has been replicated, the data page in the old master would precede the new master (i.e., old standby), and in this case the backup is required. 
OTOH, if the maximum LSN of any data page in the standby is less than the master's, the backup is not required. Without this utility, it's difficult to calculate the maximum LSN of data pages, so basically we needed to take a backup when starting the standby. In the future, thanks to this utility, we can calculate the maximum LSN, and can skip a backup if that LSN is less than the master's (i.e., last applied LSN, IOW, timeline switch LSN). Regards, -- Fujii Masao
Re: [HACKERS] Memory leaks in record_out and record_send
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: I think explicit calls like that actually wouldn't be a problem, since they'd be run in a per-tuple context anyway. The cases that are problematic are hard-coded I/O function calls. I'm worried about the ones like, say, plpgsql's built-in conversion operations. We could probably fix printtup's usage with some confidence, but there are a lot of other ones. That's a good reason to get them into a shorter memory context, but which? per transaction maybe? shorter? It would have to be per-tuple to do any good. The existing behavior is per-query and causes problems if lots of rows are output. In plpgsql it would be a function-call-lifespan leak. regards, tom lane
Re: [HACKERS] [BUGS] BUG #7656: PL/Perl SPI_freetuptable() segfault
pgm...@joh.to writes: I have a reproducible segmentation fault in PL/Perl. I have yet to narrow down the test case to something sensible, but I do have a backtrace: 219 while (context->firstchild != NULL) (gdb) bt #0 0x000104e90782 in MemoryContextDeleteChildren (context=0x102bd) at mcxt.c:219 #1 0x000104e906a8 in MemoryContextDelete (context=0x102bd) at mcxt.c:174 #2 0x000104bbefb5 in SPI_freetuptable (tuptable=0x7f9ae4289230) at spi.c:1003 #3 0x00011ec9928b in plperl_spi_execute_fetch_result (tuptable=0x7f9ae4289230, processed=1, status=-6) at plperl.c:2900 #4 0x00011ec98f27 in plperl_spi_exec (query=0x7f9ae4155f80 0x7f9ae3e3fe50, limit=-439796840) at plperl.c:2821 #5 0x00011ec9b5f7 in XS__spi_exec_query (my_perl=0x7f9ae40cce00, cv=0x7f9ae4148e90) at SPI.c:69 While trying to narrow down the test case I noticed what the problem was: I was calling spi_execute_query() instead of spi_execute_prepared(). Hm. It looks like SPI_execute failed as expected (note the status passed to plperl_spi_execute_fetch_result is -6 which is SPI_ERROR_ARGUMENT), but it did not reset SPI_tuptable, which led to plperl_spi_execute_fetch_result trying to call SPI_freetuptable on what was probably an already-deleted tuple table. One theory we could adopt on this is that this is plperl_spi_execute_fetch_result's fault and it shouldn't be trying to free a tuple table unless status > 0. Another theory we could adopt is that SPI functions that are capable of setting SPI_tuptable ought to clear it at start, to ensure that they return it as null on failure. The latter seems like a nicer fix but I'm afraid it might have unexpected side-effects. It would certainly be a lot more invasive. Thoughts? regards, tom lane
Re: [HACKERS] [BUGS] BUG #7656: PL/Perl SPI_freetuptable() segfault
On 11/13/2012 12:17 PM, Tom Lane wrote: pgm...@joh.to writes: I have a reproducible segmentation fault in PL/Perl. I have yet to narrow down the test case to something sensible, but I do have a backtrace: 219 while (context->firstchild != NULL) (gdb) bt #0 0x000104e90782 in MemoryContextDeleteChildren (context=0x102bd) at mcxt.c:219 #1 0x000104e906a8 in MemoryContextDelete (context=0x102bd) at mcxt.c:174 #2 0x000104bbefb5 in SPI_freetuptable (tuptable=0x7f9ae4289230) at spi.c:1003 #3 0x00011ec9928b in plperl_spi_execute_fetch_result (tuptable=0x7f9ae4289230, processed=1, status=-6) at plperl.c:2900 #4 0x00011ec98f27 in plperl_spi_exec (query=0x7f9ae4155f80 0x7f9ae3e3fe50, limit=-439796840) at plperl.c:2821 #5 0x00011ec9b5f7 in XS__spi_exec_query (my_perl=0x7f9ae40cce00, cv=0x7f9ae4148e90) at SPI.c:69 While trying to narrow down the test case I noticed what the problem was: I was calling spi_execute_query() instead of spi_execute_prepared(). Hm. It looks like SPI_execute failed as expected (note the status passed to plperl_spi_execute_fetch_result is -6 which is SPI_ERROR_ARGUMENT), but it did not reset SPI_tuptable, which led to plperl_spi_execute_fetch_result trying to call SPI_freetuptable on what was probably an already-deleted tuple table. One theory we could adopt on this is that this is plperl_spi_execute_fetch_result's fault and it shouldn't be trying to free a tuple table unless status > 0. Another theory we could adopt is that SPI functions that are capable of setting SPI_tuptable ought to clear it at start, to ensure that they return it as null on failure. The latter seems like a nicer fix but I'm afraid it might have unexpected side-effects. It would certainly be a lot more invasive. These aren't mutually exclusive, though, are they? It seems reasonable to do the minimal fix for the stable branches (looks like it's just a matter of moving the call up a couple of lines in plperl.c) and make the nicer fix just for the development branch. 
cheers andrew
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Simon Riggs si...@2ndquadrant.com writes: The most popular relational database in the world is Microsoft Access, not MySQL. Access appears desirable because it allows a single user to create and use a database (which is very good). But all business databases have a requirement for at least one of: high availability, multi-user access or downstream processing in other parts of the business. That's a mighty sweeping claim, which you haven't offered adequate evidence for. The fact of the matter is that there is *lots* of demand for simple single-user databases, and what I'm proposing is at least a first step towards getting there. The main disadvantage of approaching this via the existing single-user mode is that you won't have any autovacuum, bgwriter, etc, support. But the flip side is that that lack of infrastructure is a positive advantage for certain admittedly narrow use-cases, such as disaster recovery and pg_upgrade. So while I agree that this isn't the only form of single-user mode that we'd like to support, I think it is *a* form we'd like to support, and I don't see why you appear to be against having it at all. A more reasonable objection would be that we need to make sure that this isn't foreclosing the option of having a multi-process environment with a single user connection. I don't see that it is, but it might be wise to sketch exactly how that case would work before accepting this. regards, tom lane
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On 11/12/12 7:59 PM, Amit kapila wrote: On Monday, November 12, 2012 12:07 PM Greg Smith wrote: On 11/9/12 11:59 PM, Amit kapila wrote: Please let me know if there are any objections or problems in above method of implementation, else I can go ahead to prepare the patch for the coming CF. It may be the case that the locking scheme Robert described is the best approach here. It seems kind of heavy to me though. I suspect that some more thinking about it might come up with something better. So, here's the problem I'm seeing with having a single .auto file: when we write settings to a file, are we writing a *single* setting or *all of a user's current settings*? I was imagining writing single, specific settings, which inevitably leads to one-setting-per-file, e.g.: SET PERSISTENT work_mem = 256MB; What Amit seems to be talking about is more EXPORT SETTINGS, where you dump all current settings in the session to a file. This seems likely to produce accidental changes when the user writes out settings they've forgotten they changed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
Josh Berkus j...@agliodbs.com writes: I was imagining writing single, specific settings, which inevitably leads to one-setting-per-file, e.g.: SET PERSISTENT work_mem = 256MB; What Amit seems to be talking about is more EXPORT SETTINGS, where you dump all current settings in the session to a file. This seems likely to produce accidental changes when the user writes out settings they've forgotten they changed. Yeah. It also seems to be unnecessarily different from the existing model of SET. I'd go with one-setting-per-command. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
On 14/11/12 04:32, Merlin Moncure wrote: On Tue, Nov 13, 2012 at 9:03 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ok. It won't help all that much on 9.0, though. Well, it won't help GIST much, but the actually-reported-from-the-field case is in btree, and it does fix that. It occurs to me that if we're sufficiently scared of this case, we could probably hack the planner (in 9.0 only) to refuse to use GIST indexes in hot-standby queries. That cure might be worse than the disease though. if anything, it should be documented. if you do this kind of thing people will stop installing bugfix releases. merlin How about displaying a warning, when people try to use the 'feature', as well as document it? Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inadequate thought about buffer locking during hot standby replay
On Tue, Nov 13, 2012 at 10:32 AM, Merlin Moncure mmonc...@gmail.com wrote: On Tue, Nov 13, 2012 at 9:03 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ok. It won't help all that much on 9.0, though. Well, it won't help GIST much, but the actually-reported-from-the-field case is in btree, and it does fix that. It occurs to me that if we're sufficiently scared of this case, we could probably hack the planner (in 9.0 only) to refuse to use GIST indexes in hot-standby queries. That cure might be worse than the disease though. if anything, it should be documented. if you do this kind of thing people will stop installing bugfix releases. Agreed. I think doing that in a back-branch release would be extremely user-hostile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Tue, Nov 13, 2012 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: I was imagining writing single, specific settings, which inevitably leads to one-setting-per-file, e.g.: SET PERSISTENT work_mem = 256MB; What Amit seems to be talking about is more EXPORT SETTINGS, where you dump all current settings in the session to a file. This seems likely to produce accidental changes when the user writes out settings they've forgotten they changed. Yeah. It also seems to be unnecessarily different from the existing model of SET. I'd go with one-setting-per-command. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Mon, Nov 12, 2012 at 10:59 PM, Amit kapila amit.kap...@huawei.com wrote: Is the above opinion about only locking or even on a way to write the changed things in a file as mentioned in point-1 in mail chain upthread. (Point-1: 1. While writing .auto file, it will always assume that .auto file contain all config parameters. Now as this .auto file is of fixed format and fixed record size, it can directly write a given record to its particular position.) What my thinking was that if we can decide that the format and size of each configuration is fixed, it can be directly written without doing anything for it in memory.

Uh, no, I don't think that's a good idea. IMHO, what we should do is:

1. Read postgresql.conf.auto and remember all the settings we saw. If we see something funky like an include directive, barf.
2. Forget the value we remembered for the particular setting being changed. Instead, remember the user-supplied new value for that parameter.
3. Write a new postgresql.conf.auto based on the information remembered in steps 1 and 2.

Of course, if we go with one-file-per-setting, then this becomes even simpler: just clobber the file for the single setting being updated - creating it if it doesn't exist - and ignore all the rest. I don't personally favor that approach because I think it's clunky to manage, but YMMV.

With either approach, it's worth noting that a RESET variant of this could be useful - which would either remove the chosen setting from postgresql.conf.auto, or remove the file containing the automatically-set value for that setting. I think my personal favorite syntax is:

ALTER SYSTEM .. SET wunk = 'thunk';
ALTER SYSTEM .. RESET wunk;

But I'm OK with something else if there's consensus. I don't particularly like SET PERSISTENT because I think this is more like ALTER DATABASE .. SET than it is like SET LOCAL, but IJWH.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
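Robert's three steps can be sketched on an in-memory image of the file. This is purely illustrative — the function and constants are invented, and a real implementation would also handle quoting, reject include directives, and write via a temp file plus atomic rename:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINES 64
#define LINE_LEN  128

/*
 * Apply the three steps to an in-memory image of postgresql.conf.auto:
 *   1. keep every remembered "name = 'value'" line,
 *   2. forget the line for the setting being changed,
 *   3. append the user-supplied new value.
 * Returns the new line count.
 */
static int
set_auto_setting(char lines[][LINE_LEN], int nlines,
                 const char *name, const char *value)
{
    int     keep = 0;
    size_t  keylen = strlen(name);

    /* Steps 1 and 2: remember everything except the setting being changed */
    for (int i = 0; i < nlines; i++)
    {
        if (strncmp(lines[i], name, keylen) == 0 && lines[i][keylen] == ' ')
            continue;           /* forget the old value for this setting */
        if (keep != i)
            memmove(lines[keep], lines[i], LINE_LEN);
        keep++;
    }

    /* Step 3: write out the new value for the changed setting */
    snprintf(lines[keep], LINE_LEN, "%s = '%s'", name, value);
    return keep + 1;
}
```

Note that the one-file-per-setting alternative degenerates to just step 3: clobber a single per-setting file and never parse anything.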
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Tue, Nov 13, 2012 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The most popular relational database in the world is Microsoft Access, not MySQL. Access appears desirable because it allows a single user to create and use a database (which is very good). But all business databases have a requirement for at least one of: high availability, multi-user access or downstream processing in other parts of the business. That's a mighty sweeping claim, which you haven't offered adequate evidence for. The fact of the matter is that there is *lots* of demand for simple single-user databases, and what I'm proposing is at least a first step towards getting there. The main disadvantage of approaching this via the existing single-user mode is that you won't have any autovacuum, bgwriter, etc, support. But the flip side is that that lack of infrastructure is a positive advantage for certain admittedly narrow use-cases, such as disaster recovery and pg_upgrade. So while I agree that this isn't the only form of single-user mode that we'd like to support, I think it is *a* form we'd like to support, and I don't see why you appear to be against having it at all. A more reasonable objection would be that we need to make sure that this isn't foreclosing the option of having a multi-process environment with a single user connection. I don't see that it is, but it might be wise to sketch exactly how that case would work before accepting this. I'm not particularly excited about providing more single-user mode options, but I think it's worth having this particular thing because it makes pg_upgrade more robust. Whether we do anything else is something we can litigate when the time comes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On 13 November 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The most popular relational database in the world is Microsoft Access, not MySQL. Access appears desirable because it allows a single user to create and use a database (which is very good). But all business databases have a requirement for at least one of: high availability, multi-user access or downstream processing in other parts of the business. That's a mighty sweeping claim, which you haven't offered adequate evidence for. The fact of the matter is that there is *lots* of demand for simple single-user databases, and what I'm proposing is at least a first step towards getting there. I agree there is lots of demand for simple single-user databases and I wish that too. What I don't agree with is something that casts that requirement in stone by architecturally/permanently disallowing secondary connections. Evidence for claims: * The whole Business Intelligence industry relies on being able to re-purpose existing data, forming integrated webs of interconnecting databases. All of that happens after the initial developers write the first version of the database application. * Everybody wants a remote backup, whether its for your mobile phone contact list or your enterprise datastore. People are migrating away from embedded databases in droves for these very reasons. The main disadvantage of approaching this via the existing single-user mode is that you won't have any autovacuum, bgwriter, etc, support. But the flip side is that that lack of infrastructure is a positive advantage for certain admittedly narrow use-cases, such as disaster recovery and pg_upgrade. So while I agree that this isn't the only form of single-user mode that we'd like to support, I think it is *a* form we'd like to support, and I don't see why you appear to be against having it at all. 
I have no problem with people turning things off; I reject the idea that we should encourage people to never be able to turn them back on.

A more reasonable objection would be that we need to make sure that this isn't foreclosing the option of having a multi-process environment with a single user connection. I don't see that it is, but it might be wise to sketch exactly how that case would work before accepting this.

Whatever we provide will become the norm. I don't have a problem with you providing BOTH the proposed single user mode AND the multi-process single user connection mode in this release. But if you provide just one of them and it's the wrong one, we will be severely hampered in the future. Yes, I am very much against this project producing a new DBMS architecture that works on top of PostgreSQL data files, yet prevents maintenance, backup, replication and multi-user modes. I see this decision as a critical point for this project, so please consider this objection and where it comes from. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Variable length array element encoding…
[ Not subscribed, please keep me in the CC list ] Is there a standard idiom for encoding small variable-length data in an array? I wrote the varint extension[1] that encodes data using a variable-width encoding scheme[2] for signed and unsigned integers[3]. Right now the extension is mostly of use in skinny tables that have at least 4-5 columns, all of which are of INT or INT8. If you have only 5 columns of INT8, you can save ~50% of your table space. But, to get larger savings, it's necessary to bypass the tuple overhead by aggregating data into an array (i.e. aggregate all time-series data for a 5min window of time into a single varuint[]). The problem with that being, each varint takes 8 bytes in an array because of padding and alignment. Is there a way to prevent that, or, more realistically, are there standard ways of encoding this data into a BYTEA and then manually scanning and unpacking the data? Random access into the array isn't a concern. I was thinking about adding a BYTEA to varint[] cast, but am fishing for a better idea. Any hints or thoughts? Thanks in advance.
-sc

[1] https://github.com/sean-/postgresql-varint

[2] SELECT varint64, pg_column_size(varint64) FROM varint64_table ORDER BY varint64 ASC;
       varint64       | pg_column_size
----------------------+----------------
 -4611686018427387905 |             11
 -4611686018427387904 |             10
   -36028797018963969 |             10
   -36028797018963968 |              9
     -281474976710657 |              9
     -281474976710656 |              8
       -2199023255553 |              8
       -2199023255552 |              7
         -17179869185 |              7
         -17179869184 |              6
           -134217729 |              6
           -134217728 |              5
             -1048577 |              5
             -1048576 |              4
                -8193 |              4
                -8192 |              3
                  -65 |              3
                  -64 |              2
                   -1 |              2
                    0 |              2
                    1 |              2
                   63 |              2
                   64 |              3
                 8191 |              3
                 8192 |              4
              1048575 |              4
              1048576 |              5
            134217727 |              5
            134217728 |              6
          17179869183 |              6
          17179869184 |              7
        2199023255551 |              7
        2199023255552 |              8
      281474976710655 |              8
      281474976710656 |              9
    36028797018963967 |              9
    36028797018963968 |             10
  4611686018427387903 |             10
  4611686018427387904 |             11
(39 rows)

SELECT varuint64, pg_column_size(varint64) FROM varuint64_table ORDER BY varint64 ASC;
      varuint64      | pg_column_size
---------------------+----------------
                   0 |              2
                 127 |              2
                 128 |              3
               16383 |              3
               16384 |              4
             2097151 |              4
             2097152 |              5
           268435455 |              5
           268435456 |              6
         34359738367 |              6
         34359738368 |              7
       4398046511103 |              7
       4398046511104 |              8
     562949953421311 |              8
     562949953421312 |              9
   72057594037927935 |              9
   72057594037927936 |             10
 9223372036854775807 |             10

[3] I know the unsigned int only goes up to 2^63 atm, it will go to 2^64 once I get around to setting up a test methodology. Using INT8 internally was too convenient at the time. -- Sean Chittenden s...@chittenden.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
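For the BYTEA-packing question, the usual idiom is an LEB128-style encoding — 7 payload bits per byte with the high bit as a continuation flag — plus a ZigZag mapping so small-magnitude signed values stay small. This is a generic sketch, not the postgresql-varint extension's exact wire format (note that the pg_column_size figures above include a one-byte varlena header on top of the payload bytes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ZigZag-map a signed value so small magnitudes become small unsigned ones:
 * 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
static uint64_t
zigzag(int64_t v)
{
    return ((uint64_t) v << 1) ^ (uint64_t) (v >> 63);
}

/* LEB128-style encode: 7 payload bits per byte, high bit = "more follows".
 * Returns the number of bytes written (1..10). */
static size_t
varuint_encode(uint64_t v, uint8_t *out)
{
    size_t n = 0;

    do {
        uint8_t b = v & 0x7F;

        v >>= 7;
        out[n++] = b | (v ? 0x80 : 0);
    } while (v);
    return n;
}

/* Decode; optionally reports how many bytes were consumed. */
static uint64_t
varuint_decode(const uint8_t *in, size_t *len)
{
    uint64_t v = 0;
    int      shift = 0;
    size_t   n = 0;
    uint8_t  b;

    do {
        b = in[n++];
        v |= (uint64_t) (b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    if (len)
        *len = n;
    return v;
}
```

A column of such byte strings concatenated into one BYTEA has no per-element padding or alignment, which is exactly what defeats the 8-byte array-element overhead mentioned above; the price is sequential decoding, which is fine since random access isn't a concern here.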
Re: [HACKERS] Memory leaks in record_out and record_send
On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder though if we ought to think about running output functions in a short-lived memory context instead of the executor's main context. We've considered that before, I think, and it's always been the path of least resistance to fix the output functions instead --- but there will always be another leak I'm afraid. Such is the lot of people who code in C. I worry that the number of memory contexts we're kicking around already is imposing a significant distributed overhead on the system that is hard to measure but nevertheless real, and that this will add to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Simon Riggs si...@2ndquadrant.com writes: On 13 November 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote: ... The fact of the matter is that there is *lots* of demand for simple single-user databases, and what I'm proposing is at least a first step towards getting there. I agree there is lots of demand for simple single-user databases and I wish that too. What I don't agree with is something that casts that requirement in stone by architecturally/permanently disallowing secondary connections. If you want secondary connections, then I think you want a postmaster. We already have umpteen ways to limit who can connect (for example, putting the socket in a directory with limited access rights), and in that sort of situation I don't see why you'd really want a database that is only accessible when the main client is running. The case that this patch is meant to address is one where there is only one client application, period, and you'd rather that the database starts and stops automatically with that application instead of needing any management complexity. Now we can debate whether we want only one process or multiple processes underneath the client application, but I think the restriction to one client connection is a key *feature* not a bug, precisely because it removes a whole bunch of user-visible complexity that we cannot escape otherwise. People are migrating away from embedded databases in droves for these very reasons. [ shrug... ] If they don't want an embedded database, they won't want this either, but there are still plenty of people left who do want an embedded database. We've never had an adequate offering for those people before. If we ratchet up the management complexity of single user mode then it still won't be an adequate offering for them. I see this decision as a critical point for this project, so please consider this objection and where it comes from. I think this is nonsense. 
It's not critical; it's a very small patch that provides a feature of interest to a limited audience. And I don't believe it's foreclosing providing other operating modes later, unless maybe people feel this is almost good enough and lose motivation to work on those other operating modes. But if that happens, then I'd say the demand for the other modes isn't as high as you think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leaks in record_out and record_send
Robert Haas robertmh...@gmail.com writes: On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder though if we ought to think about running output functions in a short-lived memory context instead of the executor's main context. We've considered that before, I think, and it's always been the path of least resistance to fix the output functions instead --- but there will always be another leak I'm afraid. Such is the lot of people who code in C. I worry that the number of memory contexts we're kicking around already is imposing a significant distributed overhead on the system that is hard to measure but nevertheless real, and that this will add to it. Yeah, perhaps. I'd like to think that a MemoryContextReset is cheaper than a bunch of retail pfree's, but it's hard to prove anything without actually coding and testing it --- and on modern machines, effects like cache locality could swamp pure instruction-count gains anyway. Anyway, I committed the narrow fix for the moment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
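The intuition that a context reset beats a bunch of retail pfree's comes from how region-style allocators work; a toy bump allocator makes the point. This is a drastic simplification of PostgreSQL's memory contexts, with invented names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy bump-allocator "memory context": allocation is a pointer bump,
 * individual free is impossible, but resetting the whole context is O(1)
 * regardless of how many allocations it holds.
 */
typedef struct
{
    char    buf[4096];
    size_t  used;
} Context;

static void *
ctx_alloc(Context *cxt, size_t size)
{
    void   *p;

    size = (size + 7) & ~(size_t) 7;    /* keep 8-byte alignment */
    if (cxt->used + size > sizeof(cxt->buf))
        return NULL;                    /* out of space in this context */
    p = cxt->buf + cxt->used;
    cxt->used += size;
    return p;
}

static void
ctx_reset(Context *cxt)
{
    cxt->used = 0;                      /* "frees" everything at once */
}
```

The flip side, as noted above, is that each context carries fixed setup/teardown cost, so sprinkling short-lived contexts everywhere is not automatically a win.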
Re: [HACKERS] Enabling Checksums
On Sun, Nov 11, 2012 at 5:52 PM, Jeff Davis pg...@j-davis.com wrote: Per-database does sound easier than per-table. I'd have to think about how that would affect shared catalogs though. For now, I'm leaning toward an offline utility to turn checksums on or off, called pg_checksums. It could do so lazily (just flip a switch to enabling in pg_control), or it could do so eagerly and turn it into a fully-protected instance. For the first patch, it might just be an initdb-time option for simplicity. It'd be pretty easy to write a pg_checksums utility to turn checksums on/off on a database that is shut down, since the hard part of all of this is to change the state while the database is running. But I think even that doesn't need to be part of the first patch. A small patch that gets committed is better than a big one that doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Preface: I think there's some great commentary here, and find myself agreeing pretty whole-heartedly. On Tue, Nov 13, 2012 at 2:45 PM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2012 17:38, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The most popular relational database in the world is Microsoft Access, not MySQL. Access appears desirable because it allows a single user to create and use a database (which is very good). But all business databases have a requirement for at least one of: high availability, multi-user access or downstream processing in other parts of the business. That's a mighty sweeping claim, which you haven't offered adequate evidence for. The fact of the matter is that there is *lots* of demand for simple single-user databases, and what I'm proposing is at least a first step towards getting there. I agree there is lots of demand for simple single-user databases and I wish that too. What I don't agree with is something that casts that requirement in stone by architecturally/permanently disallowing secondary connections. Evidence for claims: * The whole Business Intelligence industry relies on being able to re-purpose existing data, forming integrated webs of interconnecting databases. All of that happens after the initial developers write the first version of the database application. * Everybody wants a remote backup, whether it's for your mobile phone contact list or your enterprise datastore. People are migrating away from embedded databases in droves for these very reasons.

There seems to be a continuum of different sorts of scenarios of more-to-less concurrency that are desirable for some different reasons. From most-to-least, I can see:

1 - Obviously, there's the case that Postgres is eminently good at, of supporting many users concurrently using a database. We love that, let's not break it :-).

2 - We have found it useful to have some extra work processes that do some useful internal things, such as vacuuming, forcing background writes, collecting statistics. And an online backup requires having a second process.

3 - People doing embedded systems find it attractive to attach all the data to the singular user running the system. Witness the *heavy* deployment of SQLite on Android and iOS. People make an assumption that this is a single-process thing, but I am inclined to be a bit skeptical. What they *do* know is that it's convenient to not spawn extra processes and do IPC. That's not quite the same thing as it being a certainty that they definitely want not to have more than one process.

4 - There are times when there *is* certainty about not wanting there to be more than one process. When running pg_upgrade, or, at certain times, when doing streaming replication node status switches, one might have that certainty. Or when reindexing system tables, which needs single user mode.

For us to conflate the 3rd and 4th items seems like a mistake to me. The main disadvantage of approaching this via the existing single-user mode is that you won't have any autovacuum, bgwriter, etc, support. But the flip side is that that lack of infrastructure is a positive advantage for certain admittedly narrow use-cases, such as disaster recovery and pg_upgrade. So while I agree that this isn't the only form of single-user mode that we'd like to support, I think it is *a* form we'd like to support, and I don't see why you appear to be against having it at all. I have no problem with people turning things off, I reject the idea that we should encourage people to never be able to turn them back on. Yep. That seems like conflating #2 with #4. It's mighty attractive to have a forcible single process mode to add safety to certain activities. I think we need a sharper knife, though, so we don't ablate off stuff like #2, just because someone imagined that Must Have Single Process!!!
was the right doctrine. A more reasonable objection would be that we need to make sure that this isn't foreclosing the option of having a multi-process environment with a single user connection. I don't see that it is, but it might be wise to sketch exactly how that case would work before accepting this. Whatever we provide will become the norm. I don't have a problem with you providing BOTH the proposed single user mode AND the multi-process single user connection mode in this release. But if you provide just one of them and its the wrong one, we will be severely hampered in the future. Yes, I am very much against this project producing a new DBMS architecture that works on top of PostgreSQL data files, yet prevents maintenance, backup, replication and multi-user modes. I see this decision as a critical point for this project, so please consider this objection and where it comes from. I don't think we're necessarily *hugely* hampered by doing one of
Re: [HACKERS] Enabling Checksums
On Mon, Nov 12, 2012 at 4:44 AM, Craig Ringer cr...@2ndquadrant.com wrote: That'll make it hard for VACUUM, hint-bit setting, etc to opportunistically checksum pages whenever they're doing a page write anyway. Is it absurd to suggest using another bitmap, like the FSM or visibility map, to store information on page checksumming while checksumming is enabled but incomplete? As a much smaller file the bitmap could its self be very quickly generated in one pass when checksumming is enabled, with its starting state showing no pages having checksums. Hmm... what if we took this a step further and actually stored the checksums in a separate relation fork? That would make it pretty simple to support enabling/disabling checksums for particular relations. It would also allow us to have a wider checksum, like 32 or 64 bits rather than 16. I'm not scoffing at a 16-bit checksum, because even that's enough to catch a very high percentage of errors, but it wouldn't be terrible to be able to support a wider one, either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
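For scale, a 16-bit page checksum can be as cheap as Fletcher-16. This is only an illustration of the kind of narrow-but-effective checksum under discussion — it is not what PostgreSQL uses, and a real page checksum would also need to mix in the block number to catch transposed pages:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fletcher-16 over a buffer: two running sums mod 255, combined into a
 * 16-bit word. Catches all single-bit errors and most burst errors.
 */
static uint16_t
fletcher16(const uint8_t *data, size_t len)
{
    uint32_t    sum1 = 0;
    uint32_t    sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

Even this 16-bit checksum misses only ~1 in 65536 random corruptions; a 32- or 64-bit checksum stored in a separate fork, as suggested above, mainly buys a smaller false-negative rate.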
Re: [HACKERS] Memory leaks in record_out and record_send
On Tue, Nov 13, 2012 at 3:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wonder though if we ought to think about running output functions in a short-lived memory context instead of the executor's main context. We've considered that before, I think, and it's always been the path of least resistance to fix the output functions instead --- but there will always be another leak I'm afraid. Such is the lot of people who code in C. I worry that the number of memory contexts we're kicking around already is imposing a significant distributed overhead on the system that is hard to measure but nevertheless real, and that this will add to it. Yeah, perhaps. I'd like to think that a MemoryContextReset is cheaper than a bunch of retail pfree's, but it's hard to prove anything without actually coding and testing it --- and on modern machines, effects like cache locality could swamp pure instruction-count gains anyway. Yeah. The thing that concerns me is that I think we have a pretty decent number of memory contexts where the expected number of allocations is very small ... and we have the context *just in case* we do more than that in certain instances. I've seen profiles where the setup/teardown costs of memory contexts are significant ... which doesn't mean that those examples would perform better with fewer memory contexts, but it's enough to make me pause for thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Patch to compute Max LSN of Data Pages
On Tue, Nov 13, 2012 at 11:46 AM, Fujii Masao masao.fu...@gmail.com wrote: Without this utility, it's difficult to calculate the maximum LSN of data page, so basically we needed to take a backup when starting the standby. In the future, thanks to this utility, we can calculate the maximum LSN, and can skip a backup if that LSN is less than the master (i.e., last applied LSN, IOW, timeline switch LSN). Doesn't the minimum recovery point give us that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
Robert Haas robertmh...@gmail.com writes: Hmm... what if we took this a step further and actually stored the checksums in a separate relation fork? That would make it pretty simple to support enabling/disabling checksums for particular relations. It would also allow us to have a wider checksum, like 32 or 64 bits rather than 16. I'm not scoffing at a 16-bit checksum, because even that's enough to catch a very high percentage of errors, but it wouldn't be terrible to be able to support a wider one, either. What happens when you get an I/O failure on the checksum fork? Assuming you're using 8K pages there, that would mean you can no longer verify the integrity of between one and four thousand pages of data. Not to mention the race condition problems associated with trying to be sure the checksum updates hit the disk at the same time as the data-page updates. I think you really have to store the checksums *with* the data they're supposedly protecting. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] foreign key locks
Noah Misch wrote: On Wed, Oct 31, 2012 at 05:22:10PM -0300, Alvaro Herrera wrote: Not really sure about the proposed syntax, but yes clearly we need some other syntax to mean FOR NON KEY UPDATE. I would rather keep FOR UPDATE to mean what I currently call FOR KEY UPDATE. More proposals for the other (weaker) lock level welcome (but if you love FOR NON KEY UPDATE, please chime in too) Agree on having FOR UPDATE without any FOR KEY UPDATE synonym. For the weaker lock, I mildly preferred the proposal of FOR NO KEY UPDATE. NON KEY captures the idea better in English, but NO is close enough and already part of the SQL lexicon.

This is the proposal I like best; however there is an asymmetry, because the locking options now are:

FOR KEY SHARE
FOR SHARE
FOR NO KEY UPDATE
FOR UPDATE

I used to have comments such as

/* Currently, SELECT ... FOR [KEY] UPDATE/SHARE requires UPDATE privileges */
#define ACL_SELECT_FOR_UPDATE   ACL_UPDATE

but now they are slightly incorrect because the NO is not illustrated. I guess I could use SELECT ... FOR [NO KEY] UPDATE/SHARE but this leaves out the FOR KEY SHARE case (and can be thought to introduce a FOR NO KEY SHARE case). And getting much more verbose than that is probably not warranted. In some places I would like to use a phrase like "the locking clause", but I'm not sure that it's clear enough. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Process waiting for ExclusiveLock on INSERT
Hello, I had a surprising question today from a customer about exclusive locks on INSERT. His log file shows something like:

==
LOG: process 1881 still waiting for ExclusiveLock on extension of relation 168318652 of database 58429185 after 1000.065 ms
STATEMENT: INSERT INTO (...cols...) VALUES (...values...)
==

While asking on #postgresql and investigating in the code, I arrived at a theory. When the system is I/O bound and backends are doing shared buffer cleanups, at some point they have to extend a relation with new pages, which requires an exclusive lock on the relation to stop anyone else from extending it at the same time. At that point, if multiple backends try to extend the relation, one wins and all the others wait for the lock, leading to messages in the log file when log_lock_waits is enabled. This lock comes from src/backend/access/heap/hio.c:432:

==
/*
 * Have to extend the relation.
 *
 * We have to use a lock to ensure no one else is extending the rel at the
 * same time, else we will both try to initialize the same new page.  We
 * can skip locking for new or temp relations, however, since no one else
 * could be accessing them.
 */
needLock = !RELATION_IS_LOCAL(relation);

if (needLock)
    LockRelationForExtension(relation, ExclusiveLock);
==

Is this theory correct, or should this issue be discussed a bit more on this list?
PFA a small bash script with a small PostgreSQL configuration in comments that reproduces this behavior very often on my laptop, eg:

==
2012-11-13 23:15:51 CET [23137]: [1-1] user=postgres,db=test LOG: process 23137 still waiting for ExclusiveLock on extension of relation 28118 of database 28115 after 100.086 ms
2012-11-13 23:15:51 CET [23137]: [2-1] user=postgres,db=test STATEMENT: insert into test select i, md5(i::text) from generate_series(1,1000) AS i
==

Regards, -- Jehan-Guillaume de Rorthais http://www.dalibo.com

test_exclusivelock_on_insert.sh Description: application/shellscript
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Tom Lane t...@sss.pgh.pa.us writes: I agree there is lots of demand for simple single-user databases and I wish that too. What I don't agree with is something that casts that requirement in stone by architecturally/permanently disallowing secondary connections. If you want secondary connections, then I think you want a postmaster. I would agree. I think you're both talking past each other, and that what Simon is worried about (but I haven't asked him about that before sending that email) is how to change the application setup to switch from single user mode to multi user mode. IIRC the way to implement single user mode in your application is quite low-level with this patch, so switching to multi-user mode is not about just changing the connection string, or is it? The case that this patch is meant to address is one where there is only one client application, period, and you'd rather that the database starts and stops automatically with that application instead of needing any management complexity. Now we can debate whether we want only one process or multiple processes underneath the client application, but I think the restriction to one client connection is a key *feature* not a bug, precisely because it removes a whole bunch of user-visible complexity that we cannot escape otherwise. Well, I think your patch would be easier to accept as-is if it were documented only as a psql-friendly single-user mode. I would really welcome that. embedded database. We've never had an adequate offering for those people before. If we ratchet up the management complexity of single user mode then it still won't be an adequate offering for them. Now, if we're talking about single user mode as in embedded database, I really do think this patch should include a solution to run online maintenance, logical and physical backups, replication, archiving and all the production grade features you expect from PostgreSQL.
And then I understand Simon's POV about code complexity and bgworkers for example, which will *need* to be taken care of in that solution. I think this is nonsense. It's not critical; it's a very small patch that provides a feature of interest to a limited audience. And I don't Yes, it's providing full psql capabilities where we only had that bizarre postgres --single interface. Maybe it will make initdb and debugging it easier too. believe it's foreclosing providing other operating modes later, unless maybe people feel this is almost good enough and lose motivation to work on those other operating modes. But if that happens, then I'd say the demand for the other modes isn't as high as you think. Again, my concern on that point after reading Simon's comments is only about the production procedure you have to follow to switch your application from single user mode to multi user mode. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Index only scans wiki page
On 13 November 2012 16:37, Robert Haas robertmh...@gmail.com wrote: I found this an interesting read. As one of the people who worked on the feature, I'm sort of curious whether people have any experience yet with how this actually shakes out in the field. Are you (or is anyone) aware of positive/negative field experiences with this feature? Unfortunately, I don't think that I have any original insight about the problems with index-only scans in the field right now. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Memory leaks in record_out and record_send
* Robert Haas (robertmh...@gmail.com) wrote: Yeah. The thing that concerns me is that I think we have a pretty decent number of memory contexts where the expected number of allocations is very small ... and we have the context *just in case* we do more than that in certain instances. I've seen profiles where the setup/teardown costs of memory contexts are significant ... which doesn't mean that those examples would perform better with fewer memory contexts, but it's enough to make me pause for thought. So, for my 2c, I'm on the other side of this, personally. We have memory contexts for more-or-less exactly this issue. It's one of the great things about PG - it's resilient and very unlikely to have large or bad memory leaks in general, much of which can, imv, be attributed to our use of memory contexts. Thanks, Stephen
[HACKERS] Add contrib module functions to docs' function index
Hi all Andreas Heiduk on -bugs suggested that we add the functions provided by contrib modules to the function index in the docs, so it's easier to go from, say, "what the heck is idx(...)?" to finding it in the intarray contrib module. This seems like a good idea and I'd like to pop it in the TODO until I get time to check it out, flagged as a minor/newbie-friendly problem. Any objections? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add contrib module functions to docs' function index
-Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer Sent: Tuesday, November 13, 2012 6:15 PM To: PostgreSQL Hackers Subject: [HACKERS] Add contrib module functions to docs' function index Hi all Andreas Heiduk on -bugs suggested that we add the functions provided by contrib modules to the function index in the docs, so it's easier to go from, say, what the heck is idx(...) to finding it in the intarray contrib module. This seems like a good idea and I'd like to pop it in the TODO until I get time to check it out, flagged as a minor/newbie-friendly problem. Any objections? For clarity, does this proposal refer to Chapter 9 of the documentation, the Index, or both? If modifying Chapter 9, the function and operator tables should be extended to include a source column with values of "base" or "contrib: module name" or something similar. As to the desirability of such a change, I concur that it would be a nice usability enhancement to consider beyond just updating the actual Index. David J.
Re: [HACKERS] Add contrib module functions to docs' function index
On 11/14/2012 07:56 AM, David Johnston wrote: For clarity does this proposal refer to Chapter 9 of the documentation, the Index, or both. If modifying Chapter 9 the function and operator tables should be extended to include a source column with values of base or contrib: module name or something similar. As to the desirability of such a change I concur that it would be a nice usability enhancement to consider beyond just updating the actual Index. Sorry, I was unclear. I'm talking about making sure that contrib module functions (and settings) appear in the documentation index (http://www.postgresql.org/docs/current/static/bookindex.html, http://www.postgresql.org/docs/9.2/static/bookindex.html) so it's easy to find a function by name whether it's in core or contrib. This is what I want to add to TODO. Separately, it might also be nice to add the contrib functions to the section 9 tables with an extra column showing their origin, but that's less clearly a good thing. Even if there's a column saying intarray for intarray functions in the array functions list, people will still try to use them without loading the extension and get confused when they're not found. It'll also bloat the listings of core functions. Rather than do that, I'd probably prefer to add a note to relevant sections. For example, in array functions I'd want to add: Additional functions that operate only on arrays of integers are available in the <a href="...">intarray extension</a>. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Further pg_upgrade analysis for many tables
On Tue, Nov 13, 2012 at 07:03:51PM -0500, Bruce Momjian wrote: I am attaching an updated pg_upgrade patch, which I believe is ready for application for 9.3. Correction, here is the proper patch. The previously posted version had pending merges from the master branch. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
new file mode 100644
index a5d92c6..d8cd8f5
*** a/contrib/pg_upgrade/file.c
--- b/contrib/pg_upgrade/file.c
*** copy_file(const char *srcfile, const cha
*** 221,281 ****
  #endif
  
- /*
-  * load_directory()
-  *
-  * Read all the file names in the specified directory, and return them as
-  * an array of char * pointers.  The array address is returned in
-  * *namelist, and the function result is the count of file names.
-  *
-  * To free the result data, free each (char *) array member, then free the
-  * namelist array itself.
-  */
- int
- load_directory(const char *dirname, char ***namelist)
- {
- 	DIR		   *dirdesc;
- 	struct dirent *direntry;
- 	int			count = 0;
- 	int			allocsize = 64;		/* initial array size */
- 
- 	*namelist = (char **) pg_malloc(allocsize * sizeof(char *));
- 
- 	if ((dirdesc = opendir(dirname)) == NULL)
- 		pg_log(PG_FATAL, "could not open directory \"%s\": %s\n",
- 			   dirname, getErrorText(errno));
- 
- 	while (errno = 0, (direntry = readdir(dirdesc)) != NULL)
- 	{
- 		if (count >= allocsize)
- 		{
- 			allocsize *= 2;
- 			*namelist = (char **)
- 				pg_realloc(*namelist, allocsize * sizeof(char *));
- 		}
- 
- 		(*namelist)[count++] = pg_strdup(direntry->d_name);
- 	}
- 
- #ifdef WIN32
- 	/*
- 	 * This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
- 	 * released version
- 	 */
- 	if (GetLastError() == ERROR_NO_MORE_FILES)
- 		errno = 0;
- #endif
- 
- 	if (errno)
- 		pg_log(PG_FATAL, "could not read directory \"%s\": %s\n",
- 			   dirname, getErrorText(errno));
- 
- 	closedir(dirdesc);
- 
- 	return count;
- }
- 
  void
  check_hard_link(void)
  {
---
221,226 ----
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 3058343..f35ce75
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
***************
*** 7,13 ****
  #include <unistd.h>
  #include <assert.h>
- #include <dirent.h>
  #include <sys/stat.h>
  #include <sys/time.h>
--- 7,12 ----
*** const char *setupPageConverter(pageCnvCt
*** 366,372 ****
  typedef void *pageCnvCtx;
  #endif
  
- int			load_directory(const char *dirname, char ***namelist);
  const char *copyAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
  				  const char *dst, bool force);
  const char *linkAndUpdateFile(pageCnvCtx *pageConverter, const char *src,
--- 365,370 ----
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 33a867f..d763ba7
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
***************
*** 17,25 ****
  
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  					   FileNameMap *maps, int size);
! static void transfer_relfile(pageCnvCtx *pageConverter,
! 				 const char *fromfile, const char *tofile,
! 				 const char *nspname, const char *relname);
  
  /*
--- 17,24 ----
  
  static void transfer_single_new_db(pageCnvCtx *pageConverter,
  					   FileNameMap *maps, int size);
! static void transfer_relfile(pageCnvCtx *pageConverter, FileNameMap *map,
! 				 const char *suffix);
  
  /*
*** static void
*** 131,185 ****
  transfer_single_new_db(pageCnvCtx *pageConverter,
  					   FileNameMap *maps, int size)
  {
- 	char		old_dir[MAXPGPATH];
- 	char		file_pattern[MAXPGPATH];
- 	char	  **namelist = NULL;
- 	int			numFiles = 0;
  	int			mapnum;
! 	int			fileno;
! 	bool		vm_crashsafe_change = false;
! 
! 	old_dir[0] = '\0';
! 
! 	/* Do not copy non-crashsafe vm files for binaries that assume crashsafety */
  	if (old_cluster.controldata.cat_ver < VISIBILITY_MAP_CRASHSAFE_CAT_VER &&
  		new_cluster.controldata.cat_ver >= VISIBILITY_MAP_CRASHSAFE_CAT_VER)
! 		vm_crashsafe_change = true;
  
  	for (mapnum = 0; mapnum < size; mapnum++)
  	{
! 		char		old_file[MAXPGPATH];
! 		char		new_file[MAXPGPATH];
!
! 		/* Changed tablespaces?  Need a new directory scan? */
! 		if (strcmp(maps[mapnum].old_dir, old_dir) != 0)
! 		{
! 			if (numFiles > 0)
! 			{
! 				for (fileno = 0; fileno < numFiles; fileno++)
! 					pg_free(namelist[fileno]);
! 				pg_free(namelist);
! 			}
! 
! 			snprintf(old_dir, sizeof(old_dir), "%s", maps[mapnum].old_dir);
! 			numFiles = load_directory(old_dir, &namelist);
! 		}
! 
! 		/* Copying files might take some time, so give feedback. */
! 
! 		snprintf(old_file, sizeof(old_file), "%s/%u", maps[mapnum].old_dir,
! 				 maps[mapnum].old_relfilenode);
! 		snprintf(new_file,
Re: [HACKERS] Doc patch, index search_path where it's used to secure functions
On Fri, 2012-09-28 at 12:17 -0500, Karl O. Pinc wrote: On 09/28/2012 11:28:39 AM, Karl O. Pinc wrote: Doc patch, index search_path where it's used to secure functions. search_path-securing.patch Second version. Should be indexing the concept, not the run-time setting. Well, I'm not sure. We currently have three index entries on the topic:

search path
search_path
search_path configuration parameter

I think I'd put them all under search_path.
Re: [HACKERS] Doc patch, further describe and-mask nature of the permission system
On Sat, 2012-09-29 at 01:16 -0500, Karl O. Pinc wrote: This patch makes some sweeping statements. Unfortunately, they are wrong. What you term the "additive nature" is really only a special case in the relationship between table and column privileges. Schema and database privileges are completely separate things.
Re: [HACKERS] Further pg_upgrade analysis for many tables
On Wed, Nov 14, 2012 at 2:03 AM, Bruce Momjian br...@momjian.us wrote: At 64k I see pg_upgrade taking 12% of the duration time, if I subtract out the dump/restore times. My percentage numbers only included CPU time and I used SSD storage. For the most part there was no I/O wait to speak of, but it's completely expected that thousands of link calls are not free. Postgres time itself breaks down with 10% for shutdown checkpoint and 90% for regular running, consisting of 16% parsing, 13% analyze, 20% plan, 30% execute, 11% commit (AtEOXact_RelationCache) and 6% network. That SVG graph was quite impressive. I used perf and Gprof2Dot for this. I will probably do a blog post on how to generate these graphs. It's much more useful for me than a plain flat profile, as I don't know by heart which functions are called by which. It looks to me that the most benefit could be had from introducing more parallelism. Are there any large roadblocks to pipelining the dump and restore to have them happen in parallel? I talked to Andrew Dunstan about parallelization in pg_restore. First, we currently use pg_dumpall, which isn't in the custom format required for parallel restore, but if we changed to custom format, create table isn't done in parallel, only create index/check constraints, trigger creation, etc. Not sure if it is worth pursuing this just for pg_upgrade. I agree that parallel restore for schemas is a hard problem. But I didn't mean parallelism within the restore; I meant that we could start both postmasters and pipe the output from dump directly to restore. This way the times for dumping and restoring can overlap. Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
[HACKERS] recursive view syntax
I noticed we don't implement the recursive view syntax, even though it's part of the standard SQL feature set for recursive queries. Here is a patch to add that. It basically converts

CREATE RECURSIVE VIEW name (columns) AS SELECT ...;

to

CREATE VIEW name AS WITH RECURSIVE name (columns) AS (SELECT ...) SELECT columns FROM name;

diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml
index 838bf48..c13f3ec 100644
--- a/doc/src/sgml/ref/create_view.sgml
+++ b/doc/src/sgml/ref/create_view.sgml
@@ -21,7 +21,7 @@
 </refsynopsisdiv>
 
 <synopsis>
-CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] VIEW <replaceable class="PARAMETER">name</replaceable> [ ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ]
+CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] [ RECURSIVE ] VIEW <replaceable class="PARAMETER">name</replaceable> [ ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ]
     [ WITH ( <replaceable class="PARAMETER">view_option_name</replaceable> [= <replaceable class="PARAMETER">view_option_value</replaceable>] [, ... ] ) ]
     AS <replaceable class="PARAMETER">query</replaceable>
 </synopsis>
@@ -81,6 +81,23 @@ <title>Parameters</title>
   </varlistentry>
 
   <varlistentry>
+   <term><literal>RECURSIVE</></term>
+   <listitem>
+    <para>
+     Creates a recursive view.  The syntax
+<synopsis>
+CREATE RECURSIVE VIEW <replaceable>name</> (<replaceable>columns</>) AS SELECT <replaceable>...</>;
+</synopsis>
+     is equivalent to
+<synopsis>
+CREATE VIEW <replaceable>name</> AS WITH RECURSIVE <replaceable>name</> (<replaceable>columns</>) AS (SELECT <replaceable>...</>) SELECT <replaceable>columns</> FROM <replaceable>name</>;
+</synopsis>
+     A view column list must be specified for a recursive view.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
    <term><replaceable class="parameter">name</replaceable></term>
    <listitem>
     <para>
@@ -191,6 +208,16 @@ <title>Examples</title>
    <literal>*</literal> was used to create the view, columns added later to
    the table will not be part of the view.
  </para>
+
+  <para>
+   Create a recursive view consisting of the numbers from 1 to 100:
+<programlisting>
+CREATE RECURSIVE VIEW nums_1_100 (n) AS
+VALUES (1)
+UNION ALL
+SELECT n+1 FROM nums_1_100 WHERE n &lt; 100;
+</programlisting>
+  </para>
 </refsect1>
 
 <refsect1>
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e4ff76e..159096a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -164,6 +164,7 @@ static void SplitColQualList(List *qualList,
 static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
+static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %}
 
@@ -7834,6 +7835,30 @@ ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list opt_reloptions
 					n->options = $8;
 					$$ = (Node *) n;
 				}
+		| CREATE OptTemp RECURSIVE VIEW qualified_name '(' columnList ')' opt_reloptions
+				AS SelectStmt
+				{
+					ViewStmt   *n = makeNode(ViewStmt);
+					n->view = $5;
+					n->view->relpersistence = $2;
+					n->aliases = $7;
+					n->query = makeRecursiveViewSelect(n->view->relname, n->aliases, $11);
+					n->replace = false;
+					n->options = $9;
+					$$ = (Node *) n;
+				}
+		| CREATE OR REPLACE OptTemp RECURSIVE VIEW qualified_name '(' columnList ')' opt_reloptions
+				AS SelectStmt
+				{
+					ViewStmt   *n = makeNode(ViewStmt);
+					n->view = $7;
+					n->view->relpersistence = $4;
+					n->aliases = $9;
+					n->query = makeRecursiveViewSelect(n->view->relname, n->aliases, $13);
+					n->replace = true;
+					n->options = $11;
+					$$ = (Node *) n;
+				}
 		;
 
 opt_check_option:
@@ -13541,6 +13566,60 @@ processCASbits(int cas_bits, int location, const char *constrType,
 	}
 }
 
+/*--
+ * Recursive view transformation
+ *
+ * Convert
+ *
+ *     CREATE RECURSIVE VIEW relname (aliases) AS query
+ *
+ * to
+ *
+ *     CREATE VIEW relname (aliases) AS
+ *         WITH RECURSIVE relname (aliases) AS (query)
+ *         SELECT aliases FROM relname
+ *
+ * Actually, just the WITH ...
part, which is then inserted into the original
+ * view definition as the query.
+ * --
+ */
+static Node *
+makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
+{
+	SelectStmt *s = makeNode(SelectStmt);
+	WithClause *w = makeNode(WithClause);
+	CommonTableExpr *cte = makeNode(CommonTableExpr);
+	List	   *tl = NIL;
+	ListCell   *lc;
+
+	cte->ctename = relname;
+	cte->aliascolnames = aliases;
+	cte->ctequery = query;
+	cte->location = -1;
+
+	w->recursive = true;
+	w->ctes = list_make1(cte);
+	w->location = -1;
+
+	foreach (lc, aliases)
+	{
+		ResTarget  *rt = makeNode(ResTarget);
+
+		rt->name = NULL;
+		rt->indirection = NIL;
+		rt->val = makeColumnRef(strVal(lfirst(lc)), NIL, -1, 0);
+		rt->location = -1;
+
+		tl = lappend(tl, rt);
+	}
+
+	s->targetList = tl;
+	s->fromClause = list_make1(makeRangeVar(NULL, relname, -1));
+	s->withClause = w;
+
+	return (Node *)
Re: [HACKERS] recursive view syntax
Peter Eisentraut pete...@gmx.net writes: I noticed we don't implement the recursive view syntax, even though it's part of the standard SQL feature set for recursive queries. Here is a patch to add that. Can't you simplify that by using SELECT * FROM name? regards, tom lane
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Wednesday, November 14, 2012 12:25 AM Robert Haas wrote: On Mon, Nov 12, 2012 at 10:59 PM, Amit kapila amit.kap...@huawei.com wrote: Is the above opinion only about locking, or also about the way to write the changed things to a file, as mentioned in point 1 in the mail chain upthread? (Point 1: While writing the .auto file, it will always assume that the .auto file contains all config parameters. Now as this .auto file is of fixed format and fixed record size, it can directly write a given record to its particular position.) My thinking was that if we can decide that the format and size of each configuration entry is fixed, it can be directly written without doing anything for it in memory. Uh, no, I don't think that's a good idea. IMHO, what we should do is:

1. Read postgresql.conf.auto and remember all the settings we saw. If we see something funky like an include directive, barf.
2. Forget the value we remembered for the particular setting being changed. Instead, remember the user-supplied new value for that parameter.
3. Write a new postgresql.conf.auto based on the information remembered in steps 1 and 2.

I am okay with implementing it the above way, because as per my understanding this is very similar to what I mentioned in my initial proposal (Point 5 in the algorithm of Alter System Set ...). http://archives.postgresql.org/pgsql-hackers/2012-10/msg01509.php However, since Greg has now suggested exploring the GUC concept as well, I would like to check the feasibility of that method. The only reason I mentioned the fixed-format, fixed-record-size concept is that during previous discussions of writing the file with GUC, the question came up of whether it is possible to write the file without reading it in the current session. (-- It seems to me that we ought to be able to rewrite a machine-generated configuration file without loading those values into the current session.)
Now on second thought, it seems to me that maybe what you meant by the above comment was without loading into session-specific GUCs. Of course, if we go with one-file-per-setting, then this becomes even simpler: just clobber the file for the single setting being updated - creating it if it doesn't exist - and ignore all the rest. I don't personally favor that approach because I think it's clunky to manage, but YMMV. With either approach, it's worth noting that a RESET variant of this could be useful - which would either remove the chosen setting from postgresql.conf.auto, or remove the file containing the automatically-set value for that setting. I think my personal favorite syntax is:

ALTER SYSTEM .. SET wunk = 'thunk';
ALTER SYSTEM .. RESET wunk;

But I'm OK with something else if there's consensus. I don't particularly like SET PERSISTENT because I think this is more like ALTER DATABASE .. SET than it is like SET LOCAL, but IJWH. I think for this there are multiple ways: one is ALTER SYSTEM ..., the other is to provide this through a built-in function. For the first version maybe I will go with the built-in function approach; then, if there is consensus to provide it through ALTER SYSTEM, we can change it. One advantage I see in your suggestion above is that a method to provide RESET will be better with ALTER SYSTEM than with a built-in function. To achieve the same through a built-in, I think one way is to provide a separate function. With Regards, Amit Kapila.
Re: [HACKERS] Doc patch, index search_path where it's used to secure functions
On 11/13/2012 08:46:19 PM, Peter Eisentraut wrote: Well, I'm not sure. We currently have three index entries on the topic: search path search_path search_path configuration parameter I think I'd put them all under search_path. Ok. I think you are right that they need to all be under one index entry. It might make sense to have a "search path" entry that says see "search_path", but since the two entries would be right next to each other in the index this seems overkill. I'm going to send this in as a single patch that fixes all the search path related index entries: search_path-index.patch (replaces search_path-normalize.patch and search_path-securing_v2.patch) This new patch also changes the search path index in doc/src/sgml/func.sgml. Perhaps I just don't understand the logic in the indexing in that file, but the search path index there seems to point to the entire file/nowhere useful. Since there are 2 places in the file that are concerned with search path, I've removed the global index into the file and added new index entries. This also moved the "schema, current" index target. (Since that too involves search path.) I've also added a seealso entry for "schema, current" to the top-level search_path index entry. If you want this broken up into smaller patches let me know. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A.
Heinlein

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 88cea3d..3bad24c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4769,7 +4769,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      <varlistentry id="guc-search-path" xreflabel="search_path">
       <term><varname>search_path</varname> (<type>string</type>)</term>
       <indexterm>
-       <primary><varname>search_path</> configuration parameter</primary>
+       <primary><varname>search_path</varname></primary>
+       <secondary>configuration parameter</secondary>
       </indexterm>
       <indexterm><primary>path</><secondary>for schemas</></>
       <listitem>
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d6e5d64..99ebd10 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1698,7 +1698,8 @@ CREATE TABLE public.products ( ... );
    <title>The Schema Search Path</title>
 
    <indexterm>
-    <primary>search path</primary>
+    <primary><varname>search_path</varname></primary>
+    <seealso>schema, current</seealso>
    </indexterm>
 
    <indexterm>
@@ -1735,7 +1736,8 @@ CREATE TABLE public.products ( ... );
    </para>
 
    <indexterm>
-    <primary>search_path</primary>
+    <primary><varname>search_path</varname></primary>
+    <secondary>runtime variable</secondary>
    </indexterm>
 
    <para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f8f63d8..dc6ca1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12944,16 +12944,6 @@ postgres=# SELECT * FROM unnest2(ARRAY[[1,2],[3,4]]);
    </indexterm>
 
    <indexterm>
-    <primary>schema</primary>
-    <secondary>current</secondary>
-   </indexterm>
-
-   <indexterm>
-    <primary>search path</primary>
-    <secondary>current</secondary>
-   </indexterm>
-
-   <indexterm>
     <primary>session_user</primary>
    </indexterm>
 
@@ -12980,6 +12970,11 @@ postgres=# SELECT * FROM unnest2(ARRAY[[1,2],[3,4]]);
     the current user is the <quote>effective user</quote>.
   </para>
 
+   <indexterm>
+    <primary>schema</primary>
+    <secondary>current</secondary>
+   </indexterm>
+
    <para>
     <function>current_schema</function> returns the name of the schema that is
     first in the search path (or a null value if the search path is
@@ -13493,6 +13488,13 @@ SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute');
    are immediately available without doing <command>SET ROLE</>.
   </para>
 
+  <indexterm>
+   <primary><varname>search_path</varname></primary>
+   <secondary>
+    object visibility given an unqualified name
+   </secondary>
+  </indexterm>
+
   <para>
    <xref linkend="functions-info-schema-table"> shows functions that
    determine whether a certain object is <firstterm>visible</> in the
diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
index 4336e4b..cb5ee91 100644
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -673,6 +673,11 @@ SELECT * FROM dup(42);
 <refsect1 id="sql-createfunction-security">
  <title>Writing <literal>SECURITY DEFINER</literal> Functions Safely</title>
 
+ <indexterm>
+  <primary><varname>search_path</varname></primary>
+  <secondary>use in securing functions</secondary>
+ </indexterm>
+
  <para>
   Because a <literal>SECURITY DEFINER</literal> function is executed
   with the privileges of the user that created it, care is needed to
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL
On Tuesday, November 13, 2012 11:43 PM Josh Berkus wrote:

On 11/12/12 7:59 PM, Amit kapila wrote: On Monday, November 12, 2012 12:07 PM Greg Smith wrote: On 11/9/12 11:59 PM, Amit kapila wrote: Please let me know if there are any objections or problems with the above method of implementation; otherwise I can go ahead and prepare the patch for the coming CF.

It may be the case that the locking scheme Robert described is the best approach here. It seems kind of heavy to me though. I suspect that some more thinking about it might come up with something better.

So, here's the problem I'm seeing with having a single .auto file: when we write settings to a file, are we writing a *single* setting or *all of a user's current settings*?

Single setting.

I was imagining writing single, specific settings, which inevitably leads to one-setting-per-file, e.g.: SET PERSISTENT work_mem = 256MB;

Yes, from the beginning what I was discussing was the setting of a single config parameter, as in your example. However, it can also be done with one file for all variables. I have already mentioned two ways of doing it: one is a fixed-format, fixed-size file; the other is similar to what Robert detailed in his mail (http://archives.postgresql.org/pgsql-hackers/2012-11/msg00572.php).

What Amit seems to be talking about is more EXPORT SETTINGS, where you dump all current settings in the session to a file. This seems likely to produce accidental changes when the user writes out settings they've forgotten they changed.

Maybe I was not clear enough in my previous mails, but what I am proposing is definitely not dumping all current settings in the session to a file. In fact, both my ideas (the fixed-format file and the initial proposal) do not touch or check the current session parameters at all. There is only one approach, where we write the file from GUC, that involves writing multiple parameters.

With Regards, Amit Kapila.
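The one-setting-per-file scheme Josh describes can be sketched in a few lines. This is an illustrative model only, not PostgreSQL's implementation: the directory name, file layout, and function name are all hypothetical. The point it shows is why per-setting files sidestep the shared-file locking problem: each SET PERSISTENT touches its own file, and the write-temp-then-rename dance keeps each individual file atomic.

```python
import os
import tempfile

# Hypothetical auto-config directory, one file per persistent setting.
AUTO_CONF_DIR = "postgresql.auto.conf.d"

def set_persistent(name: str, value: str) -> str:
    """Durably record one configuration setting, atomically.

    Concurrent writers of *different* settings never contend, because
    each setting lives in its own file.
    """
    os.makedirs(AUTO_CONF_DIR, exist_ok=True)
    target = os.path.join(AUTO_CONF_DIR, name + ".conf")
    # Write to a temp file in the same directory, then rename over the
    # target: readers see either the old value or the new one, never a
    # half-written file.
    fd, tmp = tempfile.mkstemp(dir=AUTO_CONF_DIR)
    with os.fdopen(fd, "w") as f:
        f.write("%s = '%s'\n" % (name, value))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, target)
    return target

path = set_persistent("work_mem", "256MB")
print(open(path).read().strip())  # work_mem = '256MB'
```

The trade-off the thread circles around is visible here too: a single .auto file needs a lock (or Robert's scheme) to serialize rewrites of the whole file, while per-setting files push the atomicity down to the filesystem rename.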
Re: [HACKERS] Memory leaks in record_out and record_send
On Tue, Nov 13, 2012 at 05:50:08PM -0500, Stephen Frost wrote:

* Robert Haas (robertmh...@gmail.com) wrote: Yeah. The thing that concerns me is that I think we have a pretty decent number of memory contexts where the expected number of allocations is very small ... and we have the context *just in case* we do more than that in certain instances. I've seen profiles where the setup/teardown costs of memory contexts are significant ... which doesn't mean that those examples would perform better with fewer memory contexts, but it's enough to make me pause for thought.

So, for my 2c, I'm on the other side of this, personally. We have memory contexts for more-or-less exactly this issue. It's one of the great things about PG - it's resilient and very unlikely to have large or bad memory leaks in general, much of which can, imv, be attributed to our use of memory contexts.

If the problem is that we create memory context overhead which is not necessary in many cases, perhaps we can reduce the overhead somehow. IIRC we have a separate function for resetting a context and freeing it entirely. If there were a quick test we could do such that resetting a context did nothing unless at least (say) 16k had been allocated, that might reduce the cost for many very small allocations.

Of course, unless someone comes up with a way to measure the cost, this is all handwaving, but it might be a nice project for someone interested in learning to hack postgres.

Have a nice day,
-- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
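Martijn's threshold idea can be modeled in miniature. The toy context below is nothing like PostgreSQL's actual aset.c allocator; the 16k cutoff and every name are illustrative. It just tracks how many bytes a context has handed out and lets reset() skip the expensive bookkeeping (here, a stand-in counter) when the context only ever served tiny allocations.

```python
RESET_THRESHOLD = 16 * 1024  # hypothetical cutoff from the discussion

class ToyMemoryContext:
    """Toy region allocator with a cheap-reset fast path.

    Illustrates the idea that resetting a context which stayed small can
    bypass the full free-block walk; it is not PostgreSQL's design.
    """
    def __init__(self):
        self.blocks = []          # stand-in for malloc'd blocks
        self.total_allocated = 0
        self.expensive_resets = 0 # counts slow-path resets, for the demo

    def alloc(self, size):
        buf = bytearray(size)
        self.blocks.append(buf)
        self.total_allocated += size
        return buf

    def reset(self):
        if self.total_allocated < RESET_THRESHOLD:
            # Fast path: context stayed tiny, just drop the blocks
            # without touching any free-list bookkeeping.
            self.blocks.clear()
            self.total_allocated = 0
            return
        # Slow path: stands in for walking and freeing block lists.
        self.expensive_resets += 1
        self.blocks.clear()
        self.total_allocated = 0

ctx = ToyMemoryContext()
ctx.alloc(64)
ctx.reset()                  # small context: fast path taken
assert ctx.expensive_resets == 0
ctx.alloc(32 * 1024)
ctx.reset()                  # large context: pays the full cost
assert ctx.expensive_resets == 1
```

Whether such a check would pay off in the real allocator is exactly the measurement question Martijn raises; the sketch only shows where the branch would sit.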