Re: [HACKERS] Avoiding timeline generation
On 25.03.2011 03:00, Daniel Farina wrote:
> Here is the mechanism: I want to author a recovery.conf to perform some
> amount of restore_command or streaming-replication-based recovery, but I
> do *not* want to generate a new timeline. Rather, I want to stay in hot
> standby mode to allow read-only connections.

That's exactly what the standby mode is for. Add standby_mode=on to recovery.conf, and the server will do exactly that. Perhaps the documentation is not clear on this. Any suggestions on how to improve it?

> InArchiveRecovery gets set to true as soon as readRecoveryCommandFile
> completes basically normally, and it looks as though that will ensure we
> will get a new timeline. If one tries a bizarre hack, like ensuring the
> restore_command does not terminate, one never finishes recovery

That's what pg_standby does. It was the only option before standby_mode was introduced in version 9.0, although we didn't have hot standby until 9.0 either.

> -- as one may expect -- and one cannot connect to the server -- which
> one may not expect is necessarily the case, presuming hot standby, if
> the server was terminated cleanly.

That's not true. As long as you enable hot standby, the server will accept connections while the restore command is running.

> The things I want to do with the ability to suppress a new timeline:
>
> * Offline WAL application -- I want to be able to bring up a second
>   server, perform some amount of point-in-time recovery, and then stop
>   and archive. It would be nice to support read-only queries in this
>   case to test the recovered database. The goal of this is to reduce
>   recovery time in a disaster scenario without tying up resources on a
>   live server.

Yep, that can be done with standby_mode=on.

> * The ability to quiesce a system by bringing it into a read-only state
>   that generates no new WAL, while still being able to ship old WAL.

In theory it should be possible to stop a server, put it into hot standby mode by creating a recovery.conf file, and restart, but it won't try to ship the old WAL after that. When you stop a server, it will try to archive all existing WAL before exiting, though.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
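For concreteness, the standby setup described above amounts to a recovery.conf along these lines (the archive path and connection string are placeholders, not values from this thread):

```
# recovery.conf -- stay in standby mode; no new timeline until promoted
standby_mode = 'on'
restore_command = 'cp /path/to/archive/%f "%p"'
# and/or pull WAL over streaming replication:
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```

With hot_standby = on in postgresql.conf, the server accepts read-only connections while it continues to replay WAL.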
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On 24.03.2011 23:54, Stephen Frost wrote:
> * Heikki Linnakangas (heikki.linnakan...@enterprisedb.com) wrote:
>> The problem is that you still need to track which queries within the
>> transaction can see the tuples. For example:
>
> Wow, that's a good point wrt cursors. Sounds more and more like we'd
> need a special data-loading mode for this, where we'd have to disallow
> those options. I've been thinking that's a frowned-upon approach in
> general, but let me ask specifically: are we uninterested in such a
> special 'data-load' mode? Or do we expect that the limitations would be
> too great to make it useful enough for users? That last I don't think I
> agree with...

I don't think we should put the onus on the user to choose the right data-loading mode. If we can reliably detect the cases where it's safe to do these tricks, we can apply them transparently when possible. I would be cool with tricks that apply only in narrow cases, as long as we don't require the user to do anything.

That said, it can be surprising if some trivial change to the schema happens to disable the optimization, and your query performance after a data load suddenly plummets. That can be a pain for the DBA to debug.

I'm skeptical of changes that need any extra checks in the fast path of HeapTupleSatisfiesMVCC. That is a hot spot already; any extra cycles there would add up and hurt performance. Careful performance testing is required.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Avoiding timeline generation
On Fri, Mar 25, 2011 at 12:38 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> On 25.03.2011 03:00, Daniel Farina wrote:
>> Here is the mechanism: I want to author a recovery.conf to perform some
>> amount of restore_command or streaming replication based recovery, but
>> I do *not* want to generate a new timeline. Rather, I want to stay in
>> hot standby mode to allow read-only connections.
>
> That's exactly what the standby mode is for. Add standby_mode=on to
> recovery.conf, and the server will do exactly that. Perhaps the
> documentation is not clear on this. Any suggestions on how to improve
> that?

I was actually pretty well aware of this option; if that is the case, I fat-fingered something or had a thinko (mental bit flip?), and then what I thought I knew about standby_mode=on was invalidated (perhaps incorrectly). I will confirm tomorrow.

-- 
fdr
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On 25.03.2011 09:51, Heikki Linnakangas wrote:
> I don't think we should put the onus on the user to choose the right
> data-loading mode. If we can reliably detect the cases where it's safe
> to do these tricks, we can apply them transparently when possible. I
> would be cool with tricks that apply only in narrow cases, as long as
> we don't require the user to do anything.

I believe the conditions for being able to set hint bits immediately at COPY are:

1. The table has been created or truncated in the same transaction.
2. We are not in a subtransaction (or the table was created or truncated in the same subtransaction).
3. There are no open portals.
4. Executing the COPY doesn't need to run any unsafe code that might access the same table. This includes triggers, check constraints and input functions. An expression is safe if it is immutable.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
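The four conditions above could be gathered into a single predicate; here is an illustrative Python sketch (the parameter names are invented for illustration and do not correspond to PostgreSQL internals):

```python
def copy_can_preset_hints(created_in_subxact, current_subxact,
                          open_portal_count, has_unsafe_code):
    """Sketch of the four safety conditions for pre-setting hint bits
    at COPY time. All names here are hypothetical."""
    # 1. The table must have been created or truncated in this transaction.
    if created_in_subxact is None:
        return False
    # 2. Not in a subtransaction, unless the create/truncate happened in
    #    the same subtransaction as the COPY.
    if created_in_subxact != current_subxact:
        return False
    # 3. No open portals (cursors) that could scan the table mid-load.
    if open_portal_count > 0:
        return False
    # 4. No triggers, check constraints, or non-immutable expressions
    #    that might access the same table during the COPY.
    if has_unsafe_code:
        return False
    return True
```

The first two checks model conditions 1 and 2; the portal count and unsafe-code flags stand in for the harder-to-compute conditions 3 and 4.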
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Fri, Mar 25, 2011 at 1:00 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> 1. The table has been created or truncated in the same transaction
> 2. We are not in a subtransaction (or the table was created and
>    truncated in the same subtransaction)
> 3. There are no open portals
> 4. Executing the COPY doesn't need to run any unsafe code that might
>    access the same table. This includes triggers, check constraints and
>    input functions. An expression is safe if it is immutable.

It sounds like these cases would, at the very least, be met by some forms of pg_restore... that's a highly desirable property, I feel.

-- 
fdr
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On 25.03.2011 00:15, Stephen Frost wrote:
> At the start of a load, we check if the table was created in the
> current transaction. If so, we check if we've already done a load which
> used the frozen XID. If we have, then we use the normal mechanics. If
> we haven't, then we stuff what the XID would have been in memory
> somewhere, mark that we've used the frozen XID, and load the data using
> the frozen XID. On subsequent queries, if we come across a frozen XID
> in a table created in this transaction, we use the XID we've stored
> instead of the frozen XID.

The tricky part here is how to check whether the table was created in the same transaction, within HeapTupleSatisfiesMVCC, with minimal overhead. If you do it naively, the check will be executed on every single tuple read in the system. It has to be really, really fast. I don't want to discourage you; maybe it's possible with some clever code arrangement. However, it needs a lot of performance testing.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] maximum digits for NUMERIC
On Sun, Mar 20, 2011 at 08:14:21PM -0400, Noah Misch wrote:
> On Fri, Mar 11, 2011 at 11:36:14AM +, Gianni Ciolli wrote:
>> maybe we should change the 1000 digits here:
>> http://developer.postgresql.org/pgdocs/postgres/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
>> because ISTM that up to 2^17 digits are supported (which makes more
>> sense than 1000).
>
> Agreed. The documentation is suggestive of this limit:
>
>   # CREATE TABLE n (c numeric(1001,0));
>   ERROR:  NUMERIC precision 1001 must be between 1 and 1000
>   LINE 1: CREATE TABLE n (c numeric(1001,0));
>
> However, that's indeed just a limit of the numeric typmod
> representation, not the data type itself. An unqualified numeric column
> hits no such limit.

For the record, the limits I found from my tests are:

* 2^17 - 1 maximum total digits
* 2^14 - 1 maximum fractional digits

(I did tests because I couldn't extract any obvious limit from the source code of numeric.c.)

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it
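As a quick sanity check on the reported limits, the two powers of two work out to:

```python
# Limits reported above for NUMERIC, expressed as concrete values.
max_total_digits = 2**17 - 1       # maximum total digits
max_fractional_digits = 2**14 - 1  # maximum fractional digits
print(max_total_digits, max_fractional_digits)  # 131071 16383
```

Both are far beyond the typmod-imposed limit of 1000 that the documentation quotes.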
Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)
On Fri, Mar 25, 2011 at 6:11 AM, Robert Haas robertmh...@gmail.com wrote:
> On Thu, Mar 24, 2011 at 2:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> The protocol supports sending two values, so two are displayed. If you
>> wish to remove one from the display then that only makes sense if you
>> also prevent the protocol from supporting two values. There is no
>> benefit in doing that, so why do it? We are going to put that back in
>> 9.2 if you remove it now. Why not leave it, so we don't need to
>> rewrite all the monitoring tools that will use this view?

What are you planning to use write_location for? BTW, I'm thinking to add recv_location (not write_location) in 9.2, to support another sync rep mode which makes transactions wait until the standby has received (not fsync'd) the WAL. You're planning the same?

> If we're going to put it back in 9.2, then we shouldn't remove it now.
> We should just make it work. It's a three-line patch. If 9.2 is going
> to meaningfully distinguish between write location and flush location,
> then we may as well do the same thing in 9.1. Then we'll be ahead of
> the game: not only will the view have the same columns in both
> releases, but they'll actually have the same semantics in both
> releases.

+1

Regards,
-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] really lazy vacuums?
2011/3/24 Jim Nasby j...@nasby.net:
> On Mar 22, 2011, at 11:46 AM, Cédric Villemain wrote:
>> 2011/3/22 Greg Stark gsst...@mit.edu:
>>> On Mon, Mar 21, 2011 at 6:08 PM, Jim Nasby j...@nasby.net wrote:
>>>> Has anyone looked at the overhead of measuring how long IO requests
>>>> to the kernel take? If we did that, not only could we get an idea of
>>>> what our IO workload looked like, we could also figure out whether a
>>>> block came out of cache or not. That information could potentially
>>>> be useful to the planner, but even if the database couldn't use that
>>>> knowledge itself it would be a damn useful statistic to have...
>>>> IMHO, far more useful than our current hit rate statistics.
>>>
>>> I've done this -- actually better, I used mincore to actually check
>>> whether the block was in cache before issuing the read -- but it
>>> turns out you can't get what you're looking for this way.
>>
>> The linux fincore() syscall never got into the kernel, maybe something
>> to revive...
>
> Is there an equivalent in other OSes? Could we use time measurement as
> an alternative if not?

fincore() syscall is a shortcut for mmap+mincore calls, suggested by people working on libprefetch; see http://lwn.net/Articles/371538/

The alternative via time measurement is interesting; it should be easy to output both measures in pg_statio_* and see what happens...

>>> It turns out when you do this you see one block being read from disk
>>> followed by n blocks that all appear to be cache hits. Because
>>> they've been prefetched by the kernel.
>>
>> I did the same. I now believe that it is not very important to have
>> the very exact numbers. Prefetched blocks *are* in memory when we
>> request them; the first read access reads more than one block because
>> the cost is the same.
>
> Yeah... there's places in the planner where we make guesses as to the
> likelihood of something being in cache. If we could actually track the
> complete hit rate over time (PG buffers + FS cache), then we wouldn't
> have to guess at things anymore. And having this info in pg_stats
> would be extremely valuable.

Yes; Robert also wrote some interesting items to keep in mind when thinking about that, in another thread, recently. A fs-cache snapshot, or just a 'percent_in_cache' per relation/file (?), is easy to add to some auto-analyze daemon. *But* making good use of it in the planner is not as trivial as it looks (i.e. without breaking what is working well). Once I get time to add hooks in costsize.c, a simple extension can do the trick (it just needs some shared_buffers to keep FS-cache stats and hooks to use them in some places).

>>> What you end up with is actually something like the number of iops,
>>> which is also an interesting measure but not really what you were
>>> looking for. My getrusage patch, which I should still dig out though
>>> it's rather too late to be committing now unless someone tells me
>>> otherwise, would tell you how much i/o a plan node actually did. But
>>> you won't know which blocks did the i/o since I was only tracking
>>> totals for the plan node.
>
> That's probably what you're looking for here.

Please show us the patch :)

> -- 
> Jim C. Nasby, Database Architect   j...@nasby.net
> 512.569.9461 (cell)                http://jim.nasby.net

-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Fri, Mar 25, 2011 at 8:09 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> The tricky part here is how to check if the table was created in the
> same transaction, within HeapTupleSatisfiesMVCC, with minimal overhead.
> If you do it naively, the check will be executed at every single tuple
> read in the system. It has to be really really fast.

You could have a single global boolean variable to indicate whether any tables have been created in this transaction and inserted into using this frozen-xid hack yet. That would be branch-predicted as false by any decent processor, which shouldn't cost anything. If it's true, then you go off and check a small hash table.

The question is how often the check would trigger with false positives. That is, how often do people create tables, insert a small number of records, then go off and do complex performance-sensitive things in the same transaction? I suppose this does happen due to the fairly common pattern of "I want temporary tables, but they need to be visible to this other transaction" -- for example, for batch jobs where you want to be able to display some sort of progress indicator.

-- 
greg
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
* Greg Stark (gsst...@mit.edu) wrote:
> You could have a single global boolean variable to indicate whether any
> tables have been created in this transaction and inserted into using
> this frozenxid hack in this transaction yet.

This was exactly where I was going, and, honestly, I was wondering if there'd be a way to possibly even avoid running through HeapTupleIsVisible for those relations entirely... up to the point where some further action modifies the table again and forces us to have to go through that path. I'm not sure I feel it's a real 'bad' hack. :)

> The question is how often would the insert trigger with false
> positives. That is, how often do people create tables, insert a small
> number of records, then go off and do complex performance-sensitive
> things in the same transaction. I suppose this does happen due to the
> fairly common pattern of "I want temporary tables but they need to be
> visible to this other transaction". For example for batch jobs where
> you want to be able to display some sort of progress indicator.

Right, that kind of issue is why I had been thinking it might be a different mode, but it also may not be all that bad to use an approach like what you described above. We could decide not to do it for temporary tables, perhaps, or, if we could possibly avoid having to check for visibility at the tuple level for these relations, perhaps it'd be faster in those use-cases?

We would also need to figure out whether people are doing batch operations, e.g. loading batches of records into a table one after another. We wouldn't want to lose these optimizations in those cases, provided they're done one after another.

Thanks,
Stephen
Re: [HACKERS] [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
On Sat, Mar 19, 2011 at 12:07 AM, Robert Haas robertmh...@gmail.com wrote:
> On Fri, Mar 18, 2011 at 10:55 AM, Greg Stark gsst...@mit.edu wrote:
>> On Thu, Mar 17, 2011 at 5:46 PM, Robert Haas robertmh...@gmail.com wrote:
>>> What makes more sense to me after having thought about this more
>>> carefully is to simply make a blanket rule that when
>>> synchronous_replication=on, synchronous_commit has no effect. That is
>>> easy to understand and document.
>>
>> For what it's worth, "has no effect" doesn't make much sense to me.
>> It's a boolean: either commits are going to block or they're not. What
>> happened to the idea of a three-way switch?
>>
>>   synchronous_commit = off
>>   synchronous_commit = disk
>>   synchronous_commit = replica
>>
>> With "on" being a synonym for "disk" for backwards compatibility. Then
>> we could add more options later for more complex conditions, like
>> waiting for one server in each data centre, or waiting for one of a
>> certain set of servers while ignoring the less reliable mirrors, etc.
>
> This is similar to what I suggested upthread, except that I suggested
> on/local/off, with the default being on. That way if you set
> synchronous_standby_names, you get synchronous replication without
> changing another setting, but you can say "local" instead if for some
> reason you want the middle behavior. If we're going to do it all with
> one GUC, I think that way makes more sense. If you're running sync rep,
> you might still have some transactions that you don't care about, but
> that's what async commit is for. It's a funny kind of transaction that
> we're OK with losing if we have a failover but not OK with losing if we
> have a local crash from which we recover without failing over.

I'm OK with this.

Regards,
-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
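For illustration, Robert's on/local/off variant would make a configuration look something like this sketch (the exact accepted values were still under discussion in this thread, so treat the comments as the proposal, not settled behavior):

```
# postgresql.conf
synchronous_standby_names = 'standby1'   # enables synchronous replication

synchronous_commit = on      # wait for the standby to acknowledge
# synchronous_commit = local # wait only for the local WAL flush
# synchronous_commit = off   # don't wait at all (async commit)
```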
Re: [HACKERS] Avoiding timeline generation
On Mar 24, 2011, at 9:00 PM, Daniel Farina dan...@heroku.com wrote:
> * Offline WAL application -- I want to be able to bring up a second
>   server, perform some amount of point-in-time recovery, and then stop
>   and archive. It would be nice to support read-only queries in this
>   case to test the recovered database. The goal of this is to reduce
>   recovery time in a disaster scenario without tying up resources on a
>   live server.

The new 9.1 feature pause_at_recovery_target seems like it might be what you need here.

...Robert
Re: [HACKERS] Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
On Sat, Mar 19, 2011 at 4:29 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On Fri, 2011-03-18 at 20:19 +0100, Markus Wanner wrote:
>> Simon,
>>
>> On 03/18/2011 05:19 PM, Simon Riggs wrote:
>>> In PostgreSQL other users cannot observe the commit until an
>>> acknowledgement has been received.
>>
>> On other nodes as well? To me that means the standby needs to hold
>> back COMMIT of an ACKed transaction until it receives a re-ACK from
>> the master that it committed the transaction there. How else could the
>> slave know when to commit its ACKed transactions?
>
> We could do that easily enough, actually, if we wished. Do we wish? No.

I'm not sure what the problem is with seeing, from the standby, data which is not yet visible on the master. And I'm really not sure that problem can be solved by making the data visible on the master before the standby. If we really want to see consistent data from each node, we should implement and use a cluster-wide snapshot, as Postgres-XC does.

Regards,
-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] How to Make a pg_filedump
Hi All,

I am trying to write a pg_filedump to read dead rows in pgsql. I am relatively new to PostgreSQL and have been trying to find out how to write and compile a pg_filedump to help display dead rows. Please can anyone help me:

i) Explain how I can write a pg_filedump to display dead rows, or how to write pg_filedump
ii) Explain how to compile a pg_filedump

Thanks
Re: [HACKERS] 2nd Level Buffer Cache
On Tue, Mar 22, 2011 at 3:53 PM, Robert Haas robertmh...@gmail.com wrote:
> On Tue, Mar 22, 2011 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On Fri, Mar 18, 2011 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote:
>>> On Fri, Mar 18, 2011 at 11:14 AM, Kevin Grittner
>>> kevin.gritt...@wicourts.gov wrote:
>>>> Maybe the thing to focus on first is the oft-discussed "benchmark
>>>> farm" (similar to the build farm), with a good mix of loads, so that
>>>> the impact of changes can be better tracked for multiple workloads
>>>> on a variety of platforms and configurations. Without something like
>>>> that it is very hard to justify the added complexity of an idea like
>>>> this in terms of the performance benefit gained.
>>>
>>> A related area that could use some looking at is why performance tops
>>> out at shared_buffers ~8GB and starts to fall thereafter.
>>
>> Under what circumstances does this happen? Can a simple pgbench -S
>> with a large scaling factor elicit this behavior?
>
> To be honest, I'm mostly just reporting what I've heard Greg Smith say
> on this topic. I don't have any machine with that kind of RAM.

I can sponsor a few hours (say 10) of one High-Memory On-Demand Quadruple Extra Large instance (26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of local instance storage, 64-bit platform). That's the largest-memory instance AWS has. Let me know if I can help.

Regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Thu, Mar 24, 2011 at 23:47, Stephen Frost sfr...@snowman.net wrote:
> I'd be happy with a data-loading mode that even disallowed
> subtransactions if necessary to achieve the write-once (well, plus WAL
> if you're archiving) operation...

Note that there's already an extension on pgFoundry for a data-loading mode; have you looked at pg_bulkload? http://pgbulkload.projects.postgresql.org/

Regards,
Marti
Re: [HACKERS] SSI bug?
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:
> hi,
>
>>> thanks for quickly fixing problems.
>>
>> Thanks for the rigorous testing. :-)
>
>>> i tested the later version (a2eb9e0c08ee73208b5419f5a53a6eba55809b92)
>>> and the only errors i got were "out of shared memory". i'm not sure
>>> if it was caused by SSI activities or not.
>>>
>>> PG_DIAG_SEVERITY: WARNING
>>> PG_DIAG_SQLSTATE: 53200
>>> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
>>> PG_DIAG_SOURCE_FILE: shmem.c
>>> PG_DIAG_SOURCE_LINE: 190
>>> PG_DIAG_SOURCE_FUNCTION: ShmemAlloc
>>>
>>> PG_DIAG_SEVERITY: ERROR
>>> PG_DIAG_SQLSTATE: 53200
>>> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
>>> PG_DIAG_SOURCE_FILE: dynahash.c
>>> PG_DIAG_SOURCE_LINE: 925
>>> PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value
>>
>> Nor am I. Some additional information would help.
>>
>> (1) Could you post the non-default configuration settings?
>
> none. it can happen with a just initdb+createdb'ed database.
>
>> (2) How many connections are in use in your testing?
>
> 4.
>
>> (3) Can you give a rough categorization of how many of what types of
>> transactions are in the mix?
>
> all transactions are SERIALIZABLE. no transactions are with READ ONLY.
> (but some of them are actually select-only.)
>
>> (4) Are there any long-running transactions?
>
> no.
>
>> (5) How many of these errors do you get in what amount of time?
>
> once it starts happening, i see them somewhat frequently.
>
>> (6) Does the application continue to run relatively sanely, or does it
>> fall over at this point?
>
> my application just exits on the error. if i re-run the application
> without rebooting postgres, it seems that i get the error sooner than
> on the first run. (but it might be just a matter of luck)
>
>> (7) The message hint would help pin it down, or a stack trace at the
>> point of the error would help more. Is it possible to get either?
>> Looking over the code, it appears that in the only places SSI could
>> generate that error, it would cancel that transaction with the hint
>> "You might need to increase max_pred_locks_per_transaction." and
>> otherwise allow normal processing.
>
> no message hints. these errors are not generated by SSI code, at least
> not directly. (please look at PG_DIAG_SOURCE_xxx in the above log.)
>
> YAMAMOTO Takashi

Even with the above information it may be far from clear where allocations are going past their maximum, since one HTAB could grab more than its share and starve another which is staying below its maximum. I'll take a look at the possibility of adding a warning or some such when an HTAB expands past its maximum size.

-Kevin
[HACKERS] Set hint bits upon eviction from BufMgr
Maybe I'm being overly simplistic or incorrect here, but I was thinking that there might be a route to reducing hint-bit impact for the main sufferers of the feature without adding too much pain in the general case. I'm unfortunately convinced there is no getting rid of them -- in fact their utility will become even more apparent with faster storage, as the pendulum of optimization swings back to the CPU side.

My idea is to reserve a bit in the page header, say PD_ALL_SAME_XMIN, that indicates all the tuples are from the same transaction: set it when the first inserted tuple hits the page, and clear it when any tuple is added with another xmin, or is touched or deleted. The point here is to set up a cheap page-level check that we can make when a page is getting evicted from the bufmgr. If the bit is set, we grab the xmin of the first tuple on the page and test it for visibility (assuming the hint bit is not already set). If we get a thumbs-up on the transaction, we can lock the page and set all tuple hints during the page evict/sync process. We don't need to worry about logging/crash safety on the 'all same' hint because it's only interesting to this bufmgr check (it can even be cleared when the page is loaded).

Without this bit, the only way to set hint bits during bufmgr eviction would be to do a visibility check on every tuple, which would probably be prohibitively expensive. Since OLTP environments would rarely see this bit, they would not have to pay for the check. Also, we could maybe tweak the bufmgr to prefer not to evict pages with this bit set if it's known they are not yet written out to primary storage.

Maybe this is impossible or not logical... just thinking out loud. Anyway, if this actually works, shared buffers can start to play a role in mitigating hint-bit I/O, as long as the transaction resolves before pages start jumping out to storage. If you couple this with a facility to do bulk loads that break up transactions at regular intervals, you have a good shot at getting all your hint bits written out properly in a large-load situation. You might be able to do similar tricks with deletes -- I haven't thought about that. Also there might be some interplay with vacuum or some other deal-breaker -- curious to see if I have something worth further thought here.

merlin
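The eviction-time check proposed above might look like the following sketch. Everything here is an invented stand-in for illustration, not PostgreSQL code; only the HEAP_XMIN_COMMITTED value (0x0100) is borrowed from htup.h:

```python
from dataclasses import dataclass

HEAP_XMIN_COMMITTED = 0x0100  # hint-bit value, borrowed from htup.h

@dataclass
class Tuple:
    xmin: int
    infomask: int = 0

@dataclass
class Page:
    tuples: list
    all_same_xmin: bool  # the proposed PD_ALL_SAME_XMIN header bit

def on_buffer_evict(page, is_xid_committed):
    """Cheap page-level hint-bit pass at bufmgr eviction (sketch)."""
    # Only pages written by a single transaction qualify; OLTP pages
    # touched by many xids skip the whole pass via this one cheap test.
    if not page.all_same_xmin or not page.tuples:
        return
    # One visibility check on the first tuple's xmin covers every
    # tuple on the page.
    if is_xid_committed(page.tuples[0].xmin):
        for tup in page.tuples:
            tup.infomask |= HEAP_XMIN_COMMITTED
```

The key property is that the per-eviction cost is a single bit test plus, at most, one transaction-status lookup for the whole page, rather than one per tuple.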
Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)
On Fri, Mar 25, 2011 at 5:03 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Fri, Mar 25, 2011 at 6:11 AM, Robert Haas robertmh...@gmail.com wrote:
>> If we're going to put it back in 9.2, then we shouldn't remove it now.
>> We should just make it work. It's a three-line patch. If 9.2 is going
>> to meaningfully distinguish between write location and flush location,
>> then we may as well do the same thing in 9.1. Then we'll be ahead of
>> the game: not only will the view have the same columns in both
>> releases, but they'll actually have the same semantics in both
>> releases.
>
> +1

I think we have adequate consensus on this topic, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Set hint bits upon eviction from BufMgr
On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote: Without this bit, the only way to set hint bits going during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. Since OLTP environments would rarely see this bit, they would not have to pay for the check. IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG lookup mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a delay on setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to worry about it. It's also possible that the visibility check itself could be simplified. BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real backend down. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Transactional DDL, but not Serializable
Greetings, We have a curious situation, consider this:

Process 1:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table1 (i integer);
INSERT INTO table1 VALUES (13);

Process 2:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
CREATE TABLE table2 (i integer);
INSERT INTO table2 VALUES (123);
COMMIT;

Process 1:
SELECT * FROM pg_class WHERE relname = 'table2';
-- no rows returned
SELECT * FROM table2;
-- works?! but at least no records returned
INSERT INTO table2 VALUES (456);
-- also works.. now we have a tuple in the table which appears to
-- have been added before the table existed..
COMMIT;

This happens, of course, because we use SysCache to look up table names to Oids and that uses SnapshotNow. In my view, this violates the basic principle of least surprise and means that while we have transactional DDL, it's not really serializable (no, I don't particularly care about that). What I do worry about is that the bulk load discussion going on could be shot down because of this. We won't let the earlier transaction see any records in the table today because those tuples have a later xmin, but if we were to insert those tuples with the frozen XID (as I proposed in the other thread) then they'd be visible. I don't believe fixing this would be terribly difficult and, I believe, would be similar to what we've done elsewhere (eg: with indexes) - basically, add a column to pg_class with the 'createdxmin' and then compare that against our transaction whenever we're doing table lookups. Thoughts? Thanks, Stephen
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Mar 24, 2011, at 4:42 PM, Greg Stark wrote: On Thu, Mar 24, 2011 at 9:39 PM, Greg Stark gsst...@mit.edu wrote: We could conceivably deal with that by not setting the frozenxid but setting the hint bit for those tuples and having a documented special case that if the hint bit is set but it's the same xid as your own you have to treat it as not-committed. Or I suppose we could set the frozenxid but maintain a hash table of relations that we use to remember which relations we created in this transaction and are treating this way. For any table in that hash table we ignore the xmin and just look at cmin/cmax. I'm not sure this solves the cases of subtransactions -- but perhaps we just wouldn't store frozenxid if we're in a subtransaction. And I'm not sure we have access to the relation id when we're doing visibility checks. I think we do. This would involve no serious ugliness in the actual on-disk storage and it would make data loads a write-once operation which is the holy grail :) I have an even broader idea... If we had the ability to specify in a transaction exactly what tables it was going to read from, enforce that it only read from those tables, and published that information to other backends, vacuum could then ignore long running transactions that are guaranteed not to hit the table they're vacuuming. This would eliminate a large pain-point for anyone that has a high-velocity table in an environment that also has to support longer running transactions. If we had that mechanism you could also make use of it for this idea; you would specify that your transaction will never read from the new table, which means that any data inserted into that table can be pre-frozen (assuming we make sure that no one else can see that table exists yet). -- Jim C. 
Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
Stephen Frost sfr...@snowman.net writes: I don't believe fixing this would be terribly difficult and, I believe, would be similar to what we've done elsewhere (eg: with indexes) - basically, add a column to pg_class with the 'createdxmin' and then compare that against our transaction whenever we're doing table lookups. Making DDL serializable is *not* simple, and half-baked hacks won't make that situation better ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \dt and table size
Excerpts from Pavel Stehule's message of vie mar 25 02:48:49 -0300 2011: 2011/3/24 Robert Haas robertmh...@gmail.com: On Wed, Mar 23, 2011 at 4:50 PM, Pavel Stehule pavel.steh...@gmail.com wrote: can we enhance a detail for table and show more accurate numbers? table size: xxx toast size: xxx indexes size: xxx Only if we don't mind going beyond 80 columns. sure, it is on new lines That could get very long ... are you proposing something like \d++ ? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 2nd Level Buffer Cache
On Mar 25, 2011, at 10:07 AM, Gurjeet Singh wrote: On Tue, Mar 22, 2011 at 3:53 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Mar 22, 2011 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Mar 18, 2011 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 18, 2011 at 11:14 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Maybe the thing to focus on first is the oft-discussed benchmark farm (similar to the build farm), with a good mix of loads, so that the impact of changes can be better tracked for multiple workloads on a variety of platforms and configurations. Without something like that it is very hard to justify the added complexity of an idea like this in terms of the performance benefit gained. A related area that could use some looking at is why performance tops out at shared_buffers ~8GB and starts to fall thereafter. Under what circumstances does this happen? Can a simple pgbench -S with a large scaling factor elicit this behavior? To be honest, I'm mostly just reporting what I've heard Greg Smith say on this topic. I don't have any machine with that kind of RAM. I can sponsor a few hours (say 10) of one High-memory on-demand Quadruple Extra Large instance (26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of local instance storage, 64-bit platform). That's the largest memory AWS has. Related to that... after talking to Greg Smith at PGEast last night, he felt it would be very valuable just to profile how much time is being spent waiting/holding the freelist lock in a real environment. I'm going to see if we can do that on one of our slave databases. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \dt and table size
2011/3/25 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Pavel Stehule's message of vie mar 25 02:48:49 -0300 2011: 2011/3/24 Robert Haas robertmh...@gmail.com: On Wed, Mar 23, 2011 at 4:50 PM, Pavel Stehule pavel.steh...@gmail.com wrote: can we enhance a detail for table and show more accurate numbers? table size: xxx toast size: xxx indexes size: xxx Only if we don't mind going beyond 80 columns. sure, it is on new lines That could get very long ... are you proposing something like \d++ ? \d++ is a good idea. I don't think it's necessary for the detail about sizes, but it can be used for super detail: * sizes of all indexes * usage statistics * index statistics, maybe - it is just an idea. Pavel -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
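Until psql grows such a display, the breakdown being discussed is already available from the size functions added in 9.0; an illustrative query (the table name is hypothetical):

```sql
-- Size breakdown for one table: heap alone, heap+TOAST, indexes, total.
SELECT pg_size_pretty(pg_relation_size('some_table'))        AS heap_only,
       pg_size_pretty(pg_table_size('some_table'))           AS table_and_toast,
       pg_size_pretty(pg_indexes_size('some_table'))         AS indexes,
       pg_size_pretty(pg_total_relation_size('some_table'))  AS total;
```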
Re: [HACKERS] How to Make a pg_filedump
Excerpts from aaronenabs's message of vie mar 25 10:43:49 -0300 2011: Hi All I am trying to write a pg_filedump to read dead rows in pgsql, I am relatively new to postgresql and have been trying to find out how to write and compile a pg_filedump to help display dead rows. Please can anyone help me i) Explain how i can write a pg_filedump to display dead rows or how to write pg_filedump ii) Explain how to compile a pg_filedump Are you aware that pg_filedump already exists? The source code lives at http://pgfoundry.org/projects/pgfiledump (for 9.0) and http://sources.redhat.com/rhdb (for older releases) Moreover, pg_filedump will display all rows, dead or otherwise. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 2nd Level Buffer Cache
On Thu, Mar 24, 2011 at 7:51 PM, Greg Stark gsst...@mit.edu wrote: On Thu, Mar 24, 2011 at 11:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: I tried under the circumstances I thought were most likely to show a time difference, and I was unable to detect a reliable difference in timing between free list and clock sweep. It strikes me that it shouldn't be terribly hard to add a profiling option to Postgres to dump out a list of precisely which blocks of data were accessed in which order. Then it's fairly straightforward to process that list using different algorithms to measure which generates the fewest cache misses. It is pretty easy to get the list by adding a couple of elog calls. To be safe you probably also need to record pins and unpins, as you can't evict a pinned buffer no matter how otherwise eligible it might be. For most workloads you might be able to get away with just assuming that if it is eligible for replacement under any reasonable strategy, then it is very unlikely to still be pinned. Also, if the list is derived from a concurrent environment, then the order of access you see under a particular policy might no longer be the same if a different policy were adopted. But whose workload would you use to do the testing? The ones I was testing were simple enough that I just know what the access pattern is: the root and 1st-level branch blocks are almost always in shared buffers, the leaf and table blocks almost never are. Here my concern was not how to choose which block to replace in a conceptual way, but rather how to code that selection in a way that is fast and concurrent and low latency for the latency-sensitive processes. Either method will evict the same blocks, with the exception of differences introduced by race conditions that get resolved differently. A benefit of focusing on the implementation rather than the high level selection strategy is that improvements in implementation are more likely to carry over to other workloads.
My high level conclusions were that the running of the selection is generally not a bottleneck, and in the cases where it was, the bottleneck was due to contention on the LWLock, regardless of what was done under that lock. Changing who does the clock-sweep is probably not meaningful unless it facilitates a lock-strength reduction or other contention reduction. I have also played with simulations of different algorithms for managing the usage_count, and I could get improvements but they weren't big enough or general enough to be very exciting. It was generally the case that if the data size was X, the improvement was maybe 30% over the current, but if the data size was 0.8X or 1.2X, there was no difference. So not very general. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] When and how many times does ExecSetParamPlan executes?
Hello all, I was going through the Expression Evaluator and was trying to understand how the expressions are formed and evaluated. I was informed on the IRC channel that the PARAM nodes are quite important and many well written client applications use PARAMs for sending queries to the backend. I found while looking at the ExecEvalParam function that a call is made to ExecSetParamPlan for the first time when the PARAM is caught in the ExprState tree. I am confused about how many times the call is made to the ExecSetParamPlan function. Since the executor runs the expression evaluator for each tuple (trying to find qual satisfaction) while the expression tree is created only once, the possibility (from my viewpoint) is that ExecSetParamPlan might be called once only; which would be when the first PARAM is encountered during the query execution. OR, it might get called individually for each tuple (when the PARAM node is caught in the tree). So I am confused about which case it is. Does ExecSetParamPlan get called on a per-tuple basis, or on first encounter during qual/expression evaluation? Kindly help. Regards, Vaibhav -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set hint bits upon eviction from BufMgr
On Fri, Mar 25, 2011 at 10:34 AM, Jim Nasby j...@nasby.net wrote: On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote: Without this bit, the only way to set hint bits going during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. Since OLTP environments would rarely see this bit, they would not have to pay for the check. IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG lookup mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a delay on setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to worry about it. It's also possible that the visibility check itself could be simplified. BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real backend down. Well, I'm not so sure -- as noted above, you only pay for the check when all the records in a page are new, and only once per page, not once per tuple. Basically, only when you are bulk jamming records through the buffers. The amortized cost of the clog lookup is going to be near zero (maybe you could put a fuse in that would get tripped if there weren't enough tuples in the page to justify the check). If you are bulk loading more data than you have shared buffers, then you get zero benefit. However, you might have the makings of a strategy for dealing with hint bit i/o in user land (by breaking up transactions, tweaking shared buffers, etc.).
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal: q-gram GIN and GiST indexes
Hackers, I would like to ask you about the currency of the work described below. I propose to develop GIN and GiST q-gram index functionality with the following features: 1) Handle edit distance (e.g. Levenshtein distance) and LIKE/ILIKE queries (using GIN partial match if no full q-grams can be extracted from the wildcard) 2) Support of various q 3) Support of positional q-grams in GIN (for more effective edit distance filtering) 4) Various signature sizes in GiST As you can see, there are some significant differences from pg_trgm. Do you see this functionality as useful? If you think it useful, where would you like to see it: a separate project, a contrib module, or core (of course, only if the code is of sufficient quality)? I have a strong confidence level about the implementability of this project in a few months. That's why I could propose this as a GSoC project. With best regards, Alexander Korotkov. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
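For comparison, the existing pg_trgm module already covers the q = 3 case, including GIN support for LIKE/ILIKE wildcard search in the 9.1 development branch; a sketch of current usage (table and index names are hypothetical):

```sql
-- pg_trgm indexes trigrams (q-grams with q = 3); a GIN index over them
-- supports similarity search, and in 9.1-era code also wildcard search.
CREATE EXTENSION pg_trgm;

CREATE TABLE words (w text);
CREATE INDEX words_trgm_idx ON words USING gin (w gin_trgm_ops);

SELECT w FROM words WHERE w % 'postgres';   -- trigram similarity match
SELECT w FROM words WHERE w LIKE '%gres%';  -- wildcard match via GIN
```

The proposal above generalizes this to arbitrary q, positional q-grams, and edit-distance queries, which pg_trgm does not handle.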
Re: [HACKERS] Transactional DDL, but not Serializable
* Tom Lane (t...@sss.pgh.pa.us) wrote: Making DDL serializable is *not* simple, and half-baked hacks won't make that situation better ... Sorry, that obviously didn't come across clearly (I think I've just been talking to Kevin far too much). I'm not interested in making them serializable. I'd like to not have tables randomly appear during a serializable transaction. Thanks, Stephen
Re: [HACKERS] Transactional DDL, but not Serializable
Making DDL serializable is *not* simple, and half-baked hacks won't make that situation better ... That seemed unnecessary. Whether or not you approve of Stephen's solution, he is dealing with a real issue. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
* Joshua Berkus (j...@agliodbs.com) wrote: That seemed unnecessary. Whether or not you approve of Stephen's solution, he is dealing with a real issue. The solution felt, to me at least, to have a lot of parallels to an index's indcheckxmin. We've dealt with this issue there and have a precedent for how to deal with it. Based on discussions with other folks it sounds like we may be forced to do it for constraints also, and I think we'd want to try to deal with all of them in a similar way. Perhaps the current solution for indexes is a hack and should be tossed out with a wholesale replacement which solves all these problems, which would certainly be quite a bit of work, but if that's necessary then let's discuss it and get an idea down on a wiki somewhere about what that should look like. Thanks, Stephen
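The index precedent mentioned here is visible in the catalogs: pg_index.indcheckxmin marks indexes whose pg_index row's xmin must be tested against the calling query's snapshot before the index may be used. A sketch of inspecting it:

```sql
-- Indexes flagged indcheckxmin are ignored by queries whose snapshot
-- cannot see the pg_index row yet -- the same style of xmin-based
-- visibility test being proposed above for newly created tables.
SELECT indexrelid::regclass AS index_name, indcheckxmin
FROM pg_index
WHERE indcheckxmin;
```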
Re: [HACKERS] 2nd Level Buffer Cache
On Mar 25, 2011, at 11:58 AM, Jim Nasby j...@nasby.net wrote: Related to that... after talking to Greg Smith at PGEast last night, he felt it would be very valuable just to profile how much time is being spent waiting/holding the freelist lock in a real environment. I'm going to see if we can do that on one of our slave databases. Yeah, that would be great. Also, some LWLOCK_STATS output or oprofile output would be definitely be useful. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] When and how many times does ExecSetParamPlan executes?
Vaibhav Kaushal vaibhavkaushal...@gmail.com writes: Hello all, I was going through the Expression Evaluator and was trying to understand how the expressions are formed and evaluated. I was informed on the IRC channel that the PARAM nodes are quite important and many well written client applications use PARAMs for sending query to the backend. I found while looking at the ExecEvalParam function that a call is made to the ExecSetParamPlan for the first time when the PARAM is caught in the ExprState tree. I am confused about how many times the call is made to the ExecSetParamPlan funtion. Indeed ... ExecSetParamPlan has nothing at all to do with Params representing values coming from a client (PARAM_EXTERN parameters). It's used in connection with PARAM_EXEC parameters that represent the outputs of InitPlan subplans (that is, execute-only-once subplans). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
Stephen Frost sfr...@snowman.net writes: Sorry, that obviously didn't come across clearly (I think I've just been talking to Kevin far too much). I'm not interested in making them serializable. I'd like to not have tables randomly appear during a serializable transaction. Well, basically, you can't have that. Example: you have an existing table with primary key, and while you're in the middle of doing some long transaction, somebody else creates a table with a foreign-key reference to the one you're about to do a delete from. Being serializable does not excuse you from the obligation to check for FK violations in that invisible table. It might be acceptable to fail entirely, but not to act as though the table isn't there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
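Tom's scenario can be written out concretely as a two-session sketch (the parent/child tables are hypothetical):

```sql
-- Session 1: a long-running serializable transaction.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM parent;   -- snapshot taken here

-- Session 2, concurrently: creates and commits a referencing table.
CREATE TABLE child (parent_id integer REFERENCES parent (id));
INSERT INTO child VALUES (1);

-- Session 1 again: child is invisible to its snapshot, yet this DELETE
-- must still check child's rows to enforce the foreign key -- so the
-- system cannot simply behave as though child does not exist.
DELETE FROM parent WHERE id = 1;
COMMIT;
```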
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Fri, Mar 25, 2011 at 4:00 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 25.03.2011 09:51, Heikki Linnakangas wrote: I don't think we should put the onus on the user to choose the right data loading mode. If we can reliably detect the cases where it's safe to do these tricks, we can transparently apply them when possible. I would be cool with tricks that apply only in narrow cases, as long as we don't require the user to do anything. I believe the conditions for being able to set hint bits immediately at COPY are:

1. The table has been created or truncated in the same transaction
2. We are not in a subtransaction (or the table was created and truncated in the same subtransaction)
3. There are no open portals
4. Executing the COPY doesn't need to run any unsafe code that might access the same table. This includes triggers, check constraints and input functions. An expression is safe if it is immutable.

That's not enough... some other transaction could see the data before the transaction commits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
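The loading pattern these conditions describe is the familiar create-and-load-in-one-transaction shape (the table and file names here are hypothetical); whether it is actually sufficient is exactly what the thread is debating:

```sql
-- The table is created in the same transaction as the COPY, so no other
-- transaction can even resolve its name until COMMIT -- the case in
-- which the proposed hint-bit/freeze tricks might apply.
BEGIN;
CREATE TABLE load_target (i integer);
COPY load_target FROM '/tmp/data.csv' WITH (FORMAT csv);
COMMIT;
```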
Re: [HACKERS] When and how many times does ExecSetParamPlan executes?
Thanks for the reply, Tom. So, I think that the function ExecSetParamPlan (as the code suggests too) is called _once_ in any plan/expression, and that should mostly be for a sub-select query. Kindly correct me if I am wrong. Since I am not able to understand this use case completely, a sample query which is capable of calling this function (ExecSetParamPlan) could shed some light. It would be really kind of you / anyone to show me a query executable through psql which can actually call ExecSetParamPlan and involves the use of an on-disk relation. The reason I am asking for a query which I could run on psql is that I am using gdb to debug (and observe the steps taken by) PG. Regards, Vaibhav On Fri, 2011-03-25 at 14:37 -0400, Tom Lane wrote: Vaibhav Kaushal vaibhavkaushal...@gmail.com writes: Hello all, I was going through the Expression Evaluator and was trying to understand how the expressions are formed and evaluated. I was informed on the IRC channel that the PARAM nodes are quite important and many well written client applications use PARAMs for sending queries to the backend. I found while looking at the ExecEvalParam function that a call is made to ExecSetParamPlan for the first time when the PARAM is caught in the ExprState tree. I am confused about how many times the call is made to the ExecSetParamPlan function. Indeed ... ExecSetParamPlan has nothing at all to do with Params representing values coming from a client (PARAM_EXTERN parameters). It's used in connection with PARAM_EXEC parameters that represent the outputs of InitPlan subplans (that is, execute-only-once subplans). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
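For what it's worth, an uncorrelated scalar sub-select is the classic way to get an InitPlan, and hence a PARAM_EXEC parameter whose value is filled in by ExecSetParamPlan; a sketch one could step through under gdb (table names hypothetical):

```sql
-- The sub-select does not reference the outer query, so the planner
-- turns it into an InitPlan: it runs once, stores its result in a
-- PARAM_EXEC slot, and each outer-row qual evaluation just reads the
-- parameter. EXPLAIN shows it as "InitPlan 1 (returns $0)".
CREATE TABLE t1 (a integer);
CREATE TABLE t2 (b integer);
INSERT INTO t1 SELECT generate_series(1, 1000);
INSERT INTO t2 SELECT generate_series(1, 100);

EXPLAIN SELECT * FROM t1 WHERE a > (SELECT max(b) FROM t2);
```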
Re: [HACKERS] SSI bug?
On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: I'm still looking at whether it's sane to try to issue a warning when an HTAB exceeds the number of entries declared as its max_size when it was created. I think this does it. If nothing else, it might be instructive to use it while testing the SSI patch. Would it make any sense to slip this into 9.1, or should I add it to the first 9.2 CF? I don't think it's too late to commit something like this, but I'm not clear on whether we want it. Is this checking for what should be a can't-happen case, or are these soft limits that we expect to be exceeded from time to time? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI bug?
On Fri, Mar 18, 2011 at 4:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Dan Ports d...@csail.mit.edu wrote: I am surprised to see that error message without SSI's hint about increasing max_predicate_locks_per_xact. After reviewing this, I think something along the following lines might be needed, for a start. I'm not sure the Asserts are actually needed; they basically are checking that the current behavior of hash_search doesn't change. I'm still looking at whether it's sane to try to issue a warning when an HTAB exceeds the number of entries declared as its max_size when it was created. I don't see much advantage in changing these to asserts - in a debug build, that will promote ERROR to PANIC; whereas in a production build, they'll cause a random failure somewhere downstream. The HASH_ENTER to HASH_ENTER_NULL changes look like they might be needed, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas robertmh...@gmail.com wrote: 1. The table has been created or truncated in the same transaction ,,, That's not enough... some other transaction could see the data before the transaction commits. How? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set hint bits upon eviction from BufMgr
On 25.03.2011 16:52, Merlin Moncure wrote: Without this bit, the only way to set hint bits going during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. I don't think the naive approach of scanning all tuples would be too bad, actually. The hint bits only need to be set once, and it'd be bgwriter shouldering the overhead. The problem with setting hint bits when a buffer is evicted is that it doesn't help with the bulk load case. The hint bits can't be set for a bulk load until the load is finished and the transaction commits. Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce I/O caused by hint bit updates in an OLTP workload, but that's not what people usually complain about. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Attached is a WIP patch that allows SQL-language functions to reference their parameters by name. It uses p_post_columnref_hook, so potentially ambiguous references prefer the column... that seems to make the most sense, both because it avoids a backwards incompatibility, and it conforms with SQL's usual notion of assuming you mean the nearest name. It allows the parameter name to be qualified with the function name, for when you really mean you want the parameter. This being my first foray into the PostgreSQL source, I expect the code is horribly wrong in a variety of ways. Other than that, the regression tests I've been using are a slight modification of existing queries; I imagine they should look measurably different. And finally, conspicuously absent are the documentation changes that will obviously need to accompany a real patch. (This builds and passes `make check` on current HEAD, a4425e3.) Thanks! Matthew -- matt...@trebex.net

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
new file mode 100644
index ce3b77b..be71fbb
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
*** typedef SQLFunctionCache *SQLFunctionCac
*** 116,122
--- 116,124
   */
  typedef struct SQLFunctionParseInfo
  {
+ 	char	   *name;		/* function's name */
  	Oid		   *argtypes;	/* resolved types of input arguments */
+ 	char	  **argnames;	/* names of input arguments */
  	int			nargs;		/* number of input arguments */
  	Oid			collation;	/* function's input collation, if known */
  } SQLFunctionParseInfo;
*** typedef struct SQLFunctionParseInfo
*** 124,129
--- 126,133
  /* non-export function prototypes */
  static Node *sql_fn_param_ref(ParseState *pstate, ParamRef *pref);
+ static Node *sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var);
+ static Node *sql_fn_param_ref_num(ParseState *pstate, SQLFunctionParseInfoPtr pinfo, int paramno, int location);
  static List *init_execution_state(List *queryTree_list,
  				 SQLFunctionCachePtr fcache,
  				 bool lazyEvalOK);
*** prepare_sql_fn_parse_info(HeapTuple proc
*** 163,168
--- 167,173
  	int			nargs;

  	pinfo = (SQLFunctionParseInfoPtr) palloc0(sizeof(SQLFunctionParseInfo));
+ 	pinfo->name = NameStr(procedureStruct->proname);

  	/* Save the function's input collation */
  	pinfo->collation = inputCollation;
*** prepare_sql_fn_parse_info(HeapTuple proc
*** 201,206
--- 206,241
  		pinfo->argtypes = argOidVect;
  	}

+ 	if (nargs > 0)
+ 	{
+ 		Datum		proargnames;
+ 		Datum		proargmodes;
+ 		int			argnum;
+ 		int			n_arg_names;
+ 		bool		isNull;
+
+ 		proargnames = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple,
+ 									  Anum_pg_proc_proargnames,
+ 									  &isNull);
+ 		if (isNull)
+ 			proargmodes = PointerGetDatum(NULL);	/* just to be sure */
+
+ 		proargmodes = SysCacheGetAttr(PROCNAMEARGSNSP, procedureTuple,
+ 									  Anum_pg_proc_proargmodes,
+ 									  &isNull);
+ 		if (isNull)
+ 			proargmodes = PointerGetDatum(NULL);	/* just to be sure */
+
+ 		n_arg_names = get_func_input_arg_names(proargnames, proargmodes, &pinfo->argnames);
+
+ 		if (n_arg_names < nargs)
+ 			pinfo->argnames = NULL;
+ 	}
+ 	else
+ 	{
+ 		pinfo->argnames = NULL;
+ 	}
+
  	return pinfo;
  }
*** prepare_sql_fn_parse_info(HeapTuple proc
*** 210,223
  void
  sql_fn_parser_setup(struct ParseState *pstate, SQLFunctionParseInfoPtr pinfo)
  {
- 	/* Later we might use these hooks to support parameter names */
  	pstate->p_pre_columnref_hook = NULL;
! 	pstate->p_post_columnref_hook = NULL;
  	pstate->p_paramref_hook = sql_fn_param_ref;
  	/* no need to use p_coerce_param_hook */
  	pstate->p_ref_hook_state = (void *) pinfo;
  }

  /*
   * sql_fn_param_ref		parser callback for ParamRefs ($n symbols)
   */
--- 245,354
  void
  sql_fn_parser_setup(struct ParseState *pstate, SQLFunctionParseInfoPtr pinfo)
  {
  	pstate->p_pre_columnref_hook = NULL;
! 	pstate->p_post_columnref_hook = sql_fn_post_column_ref;
  	pstate->p_paramref_hook = sql_fn_param_ref;
  	/* no need to use p_coerce_param_hook */
  	pstate->p_ref_hook_state = (void *) pinfo;
  }

+ static Node *
+ sql_fn_resolve_name(ParseState *pstate, SQLFunctionParseInfoPtr pinfo, const char *paramname, int location)
+ {
+ 	int			i;
+
+ 	for (i = 0; i < pinfo->nargs; i++)
+ 		if (pinfo->argnames[i] && strcmp(pinfo->argnames[i], paramname) == 0)
+ 			return sql_fn_param_ref_num(pstate, pinfo, i + 1, location);
+
+ 	return NULL;
+ }
+
+ /*
+  * sql_fn_post_column_ref		parser callback for ColumnRefs
+  */
+ static Node *
+ sql_fn_post_column_ref(ParseState *pstate, ColumnRef *cref, Node *var)
+ {
+ 	SQLFunctionParseInfoPtr pinfo = (SQLFunctionParseInfoPtr) pstate->p_ref_hook_state;
+ 	int			names;
+ 	Node	   *field1;
+ 	Node	   *subfield = NULL;
+ 	const char *pname;
+ 	Node	   *param;
+
+ 	if (var != NULL)
+ 		return NULL;			/* there's a table column, prefer
Re: [HACKERS] Set hint bits upon eviction from BufMgr
On Fri, Mar 25, 2011 at 2:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 25.03.2011 16:52, Merlin Moncure wrote: Without this bit, the only way to set hint bits during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. I don't think the naive approach of scanning all tuples would be too bad, actually. The hint bits only need to be set once, and it'd be bgwriter shouldering the overhead. The problem with setting hint bits when a buffer is evicted is that it doesn't help with the bulk load case. The hint bits can't be set for a bulk load until the load is finished and the transaction commits. Not in the true bulk load case. However, if you can break up a load into multiple transactions and sneak out 10-100mb of pages into the buffer per transaction, you have a good chance of getting most/all the bits set correctly before bgwriter eats them up. I was also thinking of teaching bgwriter to keep xmin-flagged pages in a separate, lower-priority pool so that it didn't race to them before the transaction had a chance to go in. Long term, I'm imagining more direct transaction control in the backend, either via autonomous transactions, or stored procedures with explicit transaction control, so we don't have to load N gigabytes in a single transaction. Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce I/O caused by hint bit updates in an OLTP workload, but that's not what people usually complain about. well, if bgwriter does it, you lose the ability to bail out of the clog check via TransactionIdIsCurrentTransactionId, right? If it's done in the bufmgr you at least have a chance to not have to go all the way out. Either way though, you at least have to teach bgwriter to be more cooperative. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Matthew Draper matt...@trebex.net writes: Attached is a WIP patch that allows SQL-language functions to reference their parameters by name. It uses p_post_columnref_hook, so potentially ambiguous references prefer the column... that seems to make the most sense, both because it avoids a backwards incompatibility, and it conforms with SQL's usual notion of assuming you mean the nearest name. Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period. That means you do need ways to disambiguate in both directions. For column references you can just qualify with the table name/alias. If the parameter reference is intended, allow qualification with the function name. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] When and how many times does ExecSetParamPlan executes?
Vaibhav Kaushal vaibhavkaushal...@gmail.com writes: So, I think that the function ExecSetParamPlan (as the code suggests too) is called _once_ in any plan/expression and that should be mostly for a sub-select query. Kindly correct me if I am wrong. Since I am not able to understand this usecase completely, a sample query which is capable of calling this function (ExecSetParamPlan) could show some light. It would be really kind of you / anyone to show me a query executable through psql which can actually call ExecSetParamPlan and involves the use of an on-disk relation.

regression=# explain verbose select *, (select sum(f1) from int4_tbl) ss from int8_tbl;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Seq Scan on public.int8_tbl  (cost=1.07..2.12 rows=5 width=16)
   Output: int8_tbl.q1, int8_tbl.q2, $0
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1.06..1.07 rows=1 width=4)
           Output: sum(int4_tbl.f1)
           ->  Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=4)
                 Output: int4_tbl.f1
(7 rows)

$0 here represents the PARAM_EXEC Param.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI bug?
Robert Haas robertmh...@gmail.com writes: On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I'm still looking at whether it's sane to try to issue a warning when an HTAB exceeds the number of entries declared as its max_size when it was created. I don't think it's too late to commit something like this, but I'm not clear on whether we want it. We do *not* want that. Up to now, I believe the lockmgr's lock table is the only shared hash table that is expected to grow past the declared size; that can happen anytime a session exceeds max_locks_per_transaction, which we consider to be only a soft limit. I don't know whether SSI has introduced any other hash tables that behave similarly. (If it has, we might want to rethink the amount of slop space we leave in shmem...) There might perhaps be some value in adding a warning like this if it were enabled per-table (and not enabled by default). But I'd want to see a specific reason for it, not just let's see if we can scare users with a WARNING appearing out of nowhere. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
2011/3/25 Tom Lane t...@sss.pgh.pa.us: Matthew Draper matt...@trebex.net writes: Attached is a WIP patch that allows SQL-language functions to reference their parameters by name. It uses p_post_columnref_hook, so potentially ambiguous references prefer the column... that seems to make the most sense, both because it avoids a backwards incompatibility, and it conforms with SQL's usual notion of assuming you mean the nearest name. Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period. That means you do need ways to disambiguate in both directions. For column references you can just qualify with the table name/alias. If the parameter reference is intended, allow qualification with the function name. I agree with Tom. There could be a GUC for controlling whether parameter names are used. I am for a GUC, because there will be a billion conflicts. But talk about priorities - SQL identifier or parameter - is useless. Regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Set hint bits upon eviction from BufMgr
On Fri, Mar 25, 2011 at 3:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 25.03.2011 16:52, Merlin Moncure wrote: Without this bit, the only way to set hint bits during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. I don't think the naive approach of scanning all tuples would be too bad, actually. The hint bits only need to be set once, and it'd be bgwriter shouldering the overhead. I was thinking the same thing. The only thing I'm worried about is whether it'd make the bgwriter less responsive; we already have some issues in that department. The problem with setting hint bits when a buffer is evicted is that it doesn't help with the bulk load case. The hint bits can't be set for a bulk load until the load is finished and the transaction commits. Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce I/O caused by hint bit updates in an OLTP workload, but that's not what people usually complain about. Yeah. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Pavel Stehule pavel.steh...@gmail.com writes: 2011/3/25 Tom Lane t...@sss.pgh.pa.us: I think the best idea is to throw error for ambiguous references, period. There could be a GUC for controlling whether parameter names are used. I am for a GUC, because there will be a billion conflicts. But talk about priorities - SQL identifier or parameter - is useless. GUCs are not tremendously helpful for problems such as this. If we actually wanted to preserve full backwards compatibility, we'd need to think of a way to mark SQL functions per-function as to what to do. But I don't think that's necessary. Up to now there's been relatively little use for naming the parameters of SQL functions, so I think there will be few conflicts in the field if we just change the behavior. The mess and complication we have for the comparable behavior in plpgsql seemed necessary because of the number of existing usages that would certainly break --- but I doubt that SQL-language functions will have anywhere near as big a problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On Fri, Mar 25, 2011 at 3:29 PM, Greg Stark gsst...@mit.edu wrote: On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas robertmh...@gmail.com wrote: 1. The table has been created or truncated in the same transaction ... That's not enough... some other transaction could see the data before the transaction commits. How? Hmm. Maybe I'm wrong. I guess the XID would still be in the MVCC snapshot of every other running transaction, so maybe it would be OK. *scratches head* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI bug?
On Fri, Mar 25, 2011 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I'm still looking at whether it's sane to try to issue a warning when an HTAB exceeds the number of entries declared as its max_size when it was created. I don't think it's too late to commit something like this, but I'm not clear on whether we want it. We do *not* want that. Up to now, I believe the lockmgr's lock table is the only shared hash table that is expected to grow past the declared size; that can happen anytime a session exceeds max_locks_per_transaction, which we consider to be only a soft limit. I don't know whether SSI has introduced any other hash tables that behave similarly. (If it has, we might want to rethink the amount of slop space we leave in shmem...) There might perhaps be some value in adding a warning like this if it were enabled per-table (and not enabled by default). But I'd want to see a specific reason for it, not just let's see if we can scare users with a WARNING appearing out of nowhere. What about a system view that shows declared and actual sizes of all these hash tables? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pre-set Hint bits/VACUUM FREEZE on data load..?
On 25.03.2011 22:21, Robert Haas wrote: On Fri, Mar 25, 2011 at 3:29 PM, Greg Stark gsst...@mit.edu wrote: On Fri, Mar 25, 2011 at 7:06 PM, Robert Haas robertmh...@gmail.com wrote: 1. The table has been created or truncated in the same transaction ... That's not enough... some other transaction could see the data before the transaction commits. How? Hmm. Maybe I'm wrong. I guess the XID would still be in the MVCC snapshot of every other running transaction, so maybe it would be OK. *scratches head* Right. You can't mark the tuples as frozen, but you can set the xmin-committed hint bits. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Problem with streaming replication, backups, and recovery (9.0.x)
hi,

So, I hit a strange problem with Streaming Replication that I cannot explain.

Executive summary: when using a hot backup made on a streaming replication slave, sometimes (depending on load) the generated backup is created in such a way that, while it can be brought back as a standalone Pg, and it can be brought back as a streaming slave, such a slave (created off the backup) cannot be promoted to standalone.

Disclaimer: I know that making hot backups on a slave is not the suggested way, yet I was doing it without any problem on earlier Postgres versions (8.2, 8.3, 8.4), and I do not have this problem with backups generated from the masters, so the problem I hit now is so peculiar that I thought it might be just an effect of some underlying, more serious, condition.

Longer explanation: First, let me explain how omnipitr-backup-slave works, because it's the tool that I use to make backups on the slave. Steps that it does:

1. gets pg_controldata for $PGDATADIR
2. compresses $PGDATA to data tar.gz, putting inside a backup_label file, which contains:

   START WAL LOCATION: %s (file %s)
   CHECKPOINT LOCATION: %s
   START TIME: %s
   LABEL: OmniPITR_Slave_Hot_Backup

   where START WAL LOCATION uses the value of "Latest checkpoint's REDO location" from pg_controldata from step #1, CHECKPOINT LOCATION is taken from "Latest checkpoint location" from pg_controldata taken in step #1, and START TIME is based on the current timestamp (as of before starting compression of $PGDATA)
3. gets another copy of pg_controldata for $PGDATA
4. repeats step #3 until the value of "Latest checkpoint location" changes
5. waits until the file containing the WAL location from "Minimum recovery ending location" from pg_controldata from step #4 becomes available
6. creates a .backup file which is named based on START WAL LOCATION (from step #2), and contains the same lines as the backup_label file from step #2, plus two more lines:

   STOP WAL LOCATION: %s (file %s)
   STOP TIME: %s

   where STOP WAL LOCATION is taken from "Minimum recovery ending location" from pg_controldata from step #4, and STOP TIME is the current timestamp as of before starting compression of WAL segments
7. compresses xlogs plus the .backup file generated in step #6

This approach worked for a long time on various hosts, systems, versions, etc. But now, it fails. I'm using for tests PostgreSQL 9.0.2 and 9.0.3 (mostly 9.0.2 as this is the most critical for me, but I tested on 9.0.3 too, and the problem is the same), on Linux (Ubuntu), 64bit.

I do the procedure as always, and it produces a backup. With this backup I can set up a new standalone server, and it works. I can also set up a streaming slave, and it also works, but when I create the trigger file to promote this slave to master, it fails with this error:

2011-03-24 21:01:58.051 CET @ 9680 LOG: trigger file found: /home/depesz/slave2/finish.recovery
2011-03-24 21:01:58.051 CET @ 9930 FATAL: terminating walreceiver process due to administrator command
2011-03-24 21:01:58.151 CET @ 9680 LOG: redo done at 0/1F58
2011-03-24 21:01:58.151 CET @ 9680 LOG: last completed transaction was at log time 2011-03-24 20:58:25.836333+01
2011-03-24 21:01:58.238 CET @ 9680 FATAL: WAL ends before consistent recovery point

Which is interesting, because this particular backup was done using a .backup file containing:

START WAL LOCATION: 0/A20 (file 0001000A)
STOP WAL LOCATION: 0/12C9D7E8 (file 00010012)
CHECKPOINT LOCATION: 0/B803050
START TIME: 2011-03-24 20:52:46 CET
STOP TIME: 2011-03-24 20:53:41 CET
LABEL: OmniPITR_Slave_Hot_Backup

Which means that the minimum recovery ending location was in fact reached (it was 0/12C9D7E8, and recovery continued till 0/1F58).
I have a set of scripts that can be used to replicate the problem, but the test takes some time (~ 30 minutes). What's most interesting is that this problem does not happen always. It happens only when there was non-trivial load on the db server - this is in my tests where both master and slave are the same machine. I think that in normal cases load on the slave is more important. If anyone would be able to help, I can give you access to a test machine and/or provide the set of scripts which (usually) replicate the problem. Alternatively - if there is anything I can do to help you solve the mystery - I'd be very willing to. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
* Tom Lane (t...@sss.pgh.pa.us) wrote: Well, basically, you can't have that. Example: you have an existing table with primary key, and while you're in the middle of doing some long transaction, somebody else creates a table with a foreign-key reference to the one you're about to do a delete from. Being serializable does not excuse you from the obligation to check for FK violations in that invisible table. It might be acceptable to fail entirely, but not to act as though the table isn't there. That's an excellent example and point. Is there a 'right' answer (with regard to the SQL spec, what other databases do, etc)? When you go to delete a record from the existing table you could get a FK violation due to the invisible table, which could end up being rolled back and removed. It seems like the semantics around this would call for the adding-FK transaction to be treated as if the table did already exist and then handle this case as we would if there wasn't any DDL involved. Another approach might be to wait till commit to check the FK, but that'd probably be considered unkind. If the spec doesn't dictate anything and/or we can't find anyone else's semantics that make sense, I suppose we'll need to define our own. To that end, perhaps we should put up something on a wiki or similar to start capturing these and considering what the 'right' answer would be. Apologies for my ignorance on this. Thanks, Stephen
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
2011/3/25 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: 2011/3/25 Tom Lane t...@sss.pgh.pa.us: I think the best idea is to throw error for ambiguous references, period. There could be a GUC for controlling whether parameter names are used. I am for a GUC, because there will be a billion conflicts. But talk about priorities - SQL identifier or parameter - is useless. GUCs are not tremendously helpful for problems such as this. If we actually wanted to preserve full backwards compatibility, we'd need to think of a way to mark SQL functions per-function as to what to do. But I don't think that's necessary. Up to now there's been relatively little use for naming the parameters of SQL functions, so I think there will be few conflicts in the field if we just change the behavior. The mess and complication we have for the comparable behavior in plpgsql seemed necessary because of the number of existing usages that would certainly break --- but I doubt that SQL-language functions will have anywhere near as big a problem. It would be nice to have some converting tool for pg_dump or pg_upgrade; it could dump SQL functions with only qualified identifiers. Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maximum digits for NUMERIC
On Fri, Mar 25, 2011 at 08:46:17AM +0000, Gianni Ciolli wrote: On Sun, Mar 20, 2011 at 08:14:21PM -0400, Noah Misch wrote: Agreed. The documentation is suggestive of this limit:

# CREATE TABLE n (c numeric(1001,0));
ERROR: NUMERIC precision 1001 must be between 1 and 1000
LINE 1: CREATE TABLE n (c numeric(1001,0));

However, that's indeed just a limit of the numeric typmod representation, not the data type itself. An unqualified numeric column hits no such limit. For the record, the limits I found from my tests are:

* 2^17 - 1 maximum total digits
* 2^14 - 1 maximum fractional digits

(I did tests as I couldn't extract any obvious limit from the source code of numeric.c)

The attached patch follows up on this short discussion.

Best regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 0bb6594..259523d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -476,7 +476,7 @@
     </indexterm>
 
     <para>
-     The type <type>numeric</type> can store numbers with up to 1000
+     The type <type>numeric</type> can store numbers with up to 131071
      digits of precision and perform calculations exactly. It is
      especially recommended for storing monetary amounts and other
      quantities where exactness is required. However, arithmetic on
@@ -493,7 +493,7 @@
     the whole number, that is, the number of digits to both sides of the
     decimal point. So the number 23.5141 has a precision of 6 and a
     scale of 4. Integers can be considered to have a scale of
-    zero.
+    zero. The maximum allowed scale is 16383.
    </para>
 
    <para>
@@ -525,6 +525,15 @@
     NUMERIC
     explicitly.)
    </para>
 
+   <note>
+    <para>
+     The maximum allowed precision when explicitly specified in the
+     type declaration is 1000; otherwise the <type>NUMERIC</type>
+     data type supports a maximum precision of 131071 and a maximum
+     scale of 16383.
+    </para>
+   </note>
+
    <para>
     If the scale of a value to be stored is greater than the declared
     scale of the column, the system will round the value to the specified

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Well, basically, you can't have that. Example: you have an existing table with primary key, and while you're in the middle of doing some long transaction, somebody else creates a table with a foreign-key reference to the one you're about to do a delete from. Being serializable does not excuse you from the obligation to check for FK violations in that invisible table. It might be acceptable to fail entirely, but not to act as though the table isn't there. That's an excellent example and point. Is there a 'right' answer (with regard to the SQL spec, what other databases do, etc)? I'm not aware that anybody's got an amazingly satisfactory solution. PG's answer is of course to use up-to-the-minute DDL regardless of what the transaction might see for other purposes, which certainly has got disadvantages if you're hoping for truly serializable behavior. But I'm not sure there's a better answer. You could make an argument for failing any serializable transaction that's affected by DDL changes that happen after it started. I don't know whether that cure would be worse than the disease. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactional DDL, but not Serializable
Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Well, basically, you can't have that. Example: you have an existing table with primary key, and while you're in the middle of doing some long transaction, somebody else creates a table with a foreign-key reference to the one you're about to do a delete from. Being serializable does not excuse you from the obligation to check for FK violations in that invisible table. It might be acceptable to fail entirely, but not to act as though the table isn't there. That's an excellent example and point. Is there a 'right' answer (with regard to the SQL spec, what other databases do, etc)? I'm not aware that anybody's got an amazingly satisfactory solution. PG's answer is of course to use up-to-the-minute DDL regardless of what the transaction might see for other purposes, which certainly has got disadvantages if you're hoping for truly serializable behavior. But I'm not sure there's a better answer. You could make an argument for failing any serializable transaction that's affected by DDL changes that happen after it started. I don't know whether that cure would be worse than the disease. If transaction A commits successfully before transaction B commits, regardless of when transaction B started, and transaction A changes/adds/etc any constraints on the database, then I would expect transaction B to only commit successfully if all of its data changes pass those new/changed constraints. If B were allowed to commit without that being the case, then it would leave the database in an inconsistent state, that is a state where its data doesn't conform to its constraints. A database should always be consistent on transaction boundaries, at the very least, if not on statement boundaries. 
As to whether B's failure happens when it tries to commit or happens earlier, based on visibility issues with A's changes, doesn't matter to me so much (do what works best for you/others), but it should fail at some point if it would otherwise cause inconsistencies. -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.1 Beta
Judging by the number of new threads about development for 9.2, I think it's time we declared 9.1 Beta. I just had a conversation with some Debian developers about how PostgreSQL 9.0 got pulled out of their release because we delayed by 3 weeks. So we missed our slot to deliver useful new features to our very best supporters by 2 years. I really hope that was deliberate. I've never understood why we timebox useful development, yet tweaking is allowed to go on without limit. Personally, I don't see the rationale to allow developers some kind of priority over their input. This tweaking period is essentially a time when insiders can put their votes in, but nobody else can. Beta is where we get feedback from a wider audience. The sooner we declare Beta, the sooner people will test. Then we will have user feedback, bugs to fix etc. Everybody is very clearly sitting idle. With a longer bug list we will make faster progress to release. We're just wasting time. If we had a hard date for feature freeze, let's have a hard date for Beta of +2 months (next time), and +2.5 months now. (I know +1 month was suggested, well that's just unrealistic.) Beta is a great time to resolve difficult decisions, by opening the floor to wider debate and feedback. Delaying beta because we still have unresolved issues is exactly backwards of what we should be doing. Let's hear from a wider audience. Vox populi, vox dei -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] locale operation to be invoked, but no collation was derived (in trigger)
Stefan Huehner ste...@huehner.org writes: first i am not sure how the state of the collation work in current git is supposed to be with all the discussion going on here... but wanted to get out that bug report:

create table test (test_id varchar(32), name varchar(32));

create function test_trg() RETURNS TRIGGER LANGUAGE plpgsql AS $function$
DECLARE
BEGIN
  --Check tab name starts with a upper case letter
  IF (not (substr(new.Name,1,1) between 'A' and 'Z')) THEN
    RAISE EXCEPTION '%', '@TabName1stCharUpper@' ; --OBTG:-2--
  END IF;
END ;
$function$ ;

create trigger test_trg after update on test for each row execute procedure test_trg();

insert into test values ('1', 'test');
update test set name = 'test2' where test_id = '1';
ERROR: locale operation to be invoked, but no collation was derived
CONTEXT: PL/pgSQL function test_trg line 4 at IF

Fixed, thanks for the test case. (I have a feeling I'd better grep the whole backend for other places that are likewise neglecting to set the collation fields in manually-created expression nodes ...)

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 4:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: GUCs are not tremendously helpful for problems such as this. If we actually wanted to preserve full backwards compatibility, we'd need to think of a way to mark SQL functions per-function as to what to do. But I don't think that's necessary. Up to now there's been relatively little use for naming the parameters of SQL functions, so I think there will be few conflicts in the field if we just change the behavior. Oh wow, I don't agree with that at all. People may name the parameters for documentation purposes, and then have things like WHERE foo = $1, foo happening also to be the name associated with $1. Boom! In any case, I think this is 9.2 material. We need to get a beta out the door, and I emphatically think we should be focusing on resolving the issues with features already committed, not committing new ones. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Robert Haas robertmh...@gmail.com writes: On Mar 25, 2011, at 4:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: But I don't think that's necessary. Up to now there's been relatively little use for naming the parameters of SQL functions, so I think there will be few conflicts in the field if we just change the behavior. Oh wow, I don't agree with that at all. People may name the parameters for documentation purposes, and then have things like WHERE foo = $1, foo happening also to be the name associated with $1. Boom! Well, maybe, but it's not like it's subtle or hard to fix. In any case, I think this is 9.2 material. Oh, of course. It *is* just a WIP patch, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)
On Wed, Mar 23, 2011 at 6:22 PM, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>> Specifically, if we're not going to remove write location, then I think we need to apply something like the attached.
>> The protocol supports different write/fsync values, so the view should display them.
> That's exactly the point.

No, it's not.

> Currently, we have a protocol that supports different write and fsync values, but the code as written does not actually ever send a reply at any time when the two values can ever be different. So there is no point in sending both of them. The write location is completely redundant with the fsync location and therefore completely useless. We shouldn't bother sending the value twice, or displaying it twice, if it's absolutely 100% guaranteed to be identical in every case.

As of 9.1, we now support other tools that use the protocol, so you cannot assume you know what is being sent just because one sender has certain characteristics.

> The point of the patch that I posted is that it restores the previous behavior, where we send an update before flushing WAL and again after flushing WAL. If we do that, then the write location can be ahead of the flush location when we've written but not flushed. If we don't do that, and only send replies after flushing everything, then the two fields are perforce always the same on the master. I don't see that as being a useful behavior, and in fact I think it could be quite confusing. Someone might assume that if we bother to expose both a write_location and a flush_location, they are somehow different.

They can be in 9.1, and almost certainly will be in 9.2.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)
On Thu, Mar 24, 2011 at 3:22 AM, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>> Specifically, if we're not going to remove write location, then I think we need to apply something like the attached.
>> The protocol supports different write/fsync values, so the view should display them.
> That's exactly the point. Currently, we have a protocol that supports different write and fsync values, but the code as written does not actually ever send a reply at any time when the two values can ever be different. So there is no point in sending both of them. The write location is completely redundant with the fsync location and therefore completely useless. We shouldn't bother sending the value twice, or displaying it twice, if it's absolutely 100% guaranteed to be identical in every case. The point of the patch that I posted is that it restores the previous behavior, where we send an update before flushing WAL and again after flushing WAL. If we do that, then the write location can be ahead of the flush location when we've written but not flushed. If we don't do that, and only send replies after flushing everything, then the two fields are perforce always the same on the master. I don't see that as being a useful behavior, and in fact I think it could be quite confusing. Someone might assume that if we bother to expose both a write_location and a flush_location, they are somehow different.

I agree with Robert. It's useless and confusing to redundantly send the same location as both flush_location and write_location. We should either remove write_location from the pg_stat_replication view and the protocol altogether, or apply the proposed patch.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)
On Thu, Mar 24, 2011 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Mar 24, 2011 at 3:22 AM, Robert Haas robertmh...@gmail.com wrote:
>> On Wed, Mar 23, 2011 at 12:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>>> Specifically, if we're not going to remove write location, then I think we need to apply something like the attached.
>>> The protocol supports different write/fsync values, so the view should display them.
>> That's exactly the point. Currently, we have a protocol that supports different write and fsync values, but the code as written does not actually ever send a reply at any time when the two values can ever be different. So there is no point in sending both of them. The write location is completely redundant with the fsync location and therefore completely useless. We shouldn't bother sending the value twice, or displaying it twice, if it's absolutely 100% guaranteed to be identical in every case. The point of the patch that I posted is that it restores the previous behavior, where we send an update before flushing WAL and again after flushing WAL. If we do that, then the write location can be ahead of the flush location when we've written but not flushed. If we don't do that, and only send replies after flushing everything, then the two fields are perforce always the same on the master. I don't see that as being a useful behavior, and in fact I think it could be quite confusing. Someone might assume that if we bother to expose both a write_location and a flush_location, they are somehow different.
> I agree with Robert. It's useless and confusing to redundantly send the same location as both flush_location and write_location. We should either remove write_location from the pg_stat_replication view and the protocol altogether, or apply the proposed patch.

You may be confused, but that doesn't mean it's useless. The protocol supports sending two values, so two are displayed.

If you wish to remove one from the display, that only makes sense if you also prevent the protocol from supporting two values. There is no benefit in doing that, so why do it? We are going to put that back in 9.2 if you remove it now. Why not leave it, so we don't need to rewrite all the monitoring tools that will use this view? Just say this in the docs:

    Currently, standbys return the same value for write and flush locations, so in most cases these values will be the same. It is possible to write a user program that sends replies at different times, in which case the values would differ. In a later release these values will differ for standbys, when more options are available.

It's almost certain that Magnus will fix the bug in pg_xlogstream (or whatever it's called) and a tool will be available, so let's leave this.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
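[For reference, the columns being argued over are visible in the monitoring view; a sketch using the 9.1-era column names (these were renamed to *_lsn in later releases).]

```sql
-- Run on the primary: one row per connected standby.  Under the
-- committed behavior, write_location and flush_location are reported
-- identical; Robert's patch would let write_location run ahead of
-- flush_location while the standby is still flushing.
SELECT application_name, state,
       sent_location, write_location, flush_location, replay_location
FROM pg_stat_replication;
```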
Re: [HACKERS] 9.1 Beta
Simon Riggs si...@2ndquadrant.com writes:
> Judging by the number of new threads about development for 9.2, I think it's time we declared 9.1 Beta. I just had a conversation with some Debian developers about how PostgreSQL 9.0 got pulled out of their release because we delayed by 3 weeks. So we missed our slot to deliver useful new features to our very best supporters by 2 years.

I really hope that was deliberate. We do *not* make release decisions based on Debian's schedule. Even if we wanted to, going beta is hardly likely to affect their decisions. The correct question is whether we're ready for beta, and I would say the answer is clearly no, unless you have a pretty low standard for what "ready for beta" means. Perhaps it would be suitable to discuss what the standard for that really ought to be; but I don't agree in the slightest that we ought to decide based on predetermined calendar dates rather than the state of the code.

> If we had a hard date for feature freeze, let's have a hard date for Beta of +2 months (next time), and +2.5 months now. (I know +1 month was suggested; well, that's just unrealistic.) Beta is a great time to resolve difficult decisions, by opening the floor to wider debate and feedback.

The reason we get wider testing during beta is that people have some confidence (perhaps misplaced) that the database won't eat their data. Releasing alpha-grade code and calling it "beta" isn't going to get us wider testing ... at least, not more than once.

regards, tom lane
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Well, maybe, but it's not like it's subtle or hard to fix.

Depends how much of it you have. I've become very skeptical of anything that breaks pg_dump-and-reload-ability. And doubly so now that such problems also mean breaking pg_upgrade after the old cluster has been shut down.

...Robert
Re: [HACKERS] GSoC 2011 - Mentors? Projects?
On 8.3.2011 07:44, Selena Deckelmann wrote:
> Hi! PostgreSQL is applying for GSoC again this year. We're looking for: * Mentors * Project ideas. Would you like to mentor? Please let me know! Our application closes on Friday, so please contact me *before* Friday. I've started a wiki page: http://wiki.postgresql.org/wiki/GSoC_2011

Hi, I spoke to a teacher from a local university last week, mainly as we were looking for a place where a local PUG could meet regularly. I realized this could be a good opportunity to head-hunt some students to participate in this GSoC. Are we still interested in new students?

I've never been involved in GSoC before, so I'm not quite sure how all this works. Should the student choose a project idea from the wiki, contact one of the mentors, write to pgsql-students, or how is that supposed to work? I've noticed the student application deadline is April 8, so it would be a bit of a rush, I guess. Is it worth a try?

regards Tomas
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Robert Haas robertmh...@gmail.com writes:
> On Mar 25, 2011, at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Well, maybe, but it's not like it's subtle or hard to fix.
> Depends how much of it you have. I've become very skeptical of anything that breaks pg_dump-and-reload-ability.

This wouldn't break pg_dump scripts, because they disable check_function_bodies. You would get a failure on first *use* of a function, which is something different.

Basically my concern here is that, in the name of easing a short-term conversion issue, we'll be condemning users to a future of subtle, hard-to-find bugs due to ambiguous names. How many hundreds of reports have we seen about the equivalent problem in plpgsql? You could argue that the frequency of plpgsql issues was at least partly due to having a poor choice of which way to resolve the ambiguity, but I don't think it can be entirely blamed on that.

regards, tom lane
Re: [HACKERS] GSoC 2011 - Mentors? Projects?
Tomas,

> I spoke to a teacher from a local university last week, mainly as we were looking for a place where a local PUG could meet regularly. I realized this could be a good opportunity to head-hunt some students to participate in this GSoC. Are we still interested in new students?

Yes, please! We have had students from Charles University several times before, and would be glad to have more. The wiki page has links to the information about the program. Talk to Zdenek if you have more questions.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com San Francisco
[HACKERS] race condition in sync rep
I believe I've figured out why synchronous replication has such terrible performance with fsync=off: it has a nasty race condition. It may happen - if the standby responds very quickly - that the standby acks the commit record and awakens waiters before the committing backend actually begins to wait. There's no cross-check for this: the committing backend waits unconditionally, with no regard to whether the necessary ACK has already arrived. At this point we may be in for a very long wait: another ACK will be required to release waiters, and that may not be immediately forthcoming.

I had thought that the next ACK (after at most wal_receiver_status_interval) would do the trick, but it appears to be even worse than that: by making the standby win the race, I was easily able to get the master to hang for over a minute, and it only got released when I committed another transaction. Had I been sufficiently patient, the next checkpoint probably would have done the trick. Of course, with fsync=off on the standby, it's much easier for the standby to win the race.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Fri, Mar 25, 2011 at 8:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Mar 25, 2011, at 7:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, maybe, but it's not like it's subtle or hard to fix.
>> Depends how much of it you have. I've become very skeptical of anything that breaks pg_dump-and-reload-ability.
> This wouldn't break pg_dump scripts, because they disable check_function_bodies. You would get a failure on first *use* of a function, which is something different. Basically my concern here is that in the name of easing a short-term conversion issue, we'll be condemning users to a future of subtle, hard-to-find bugs due to ambiguous names. How many hundreds of reports have we seen about the equivalent problem in plpgsql? You could argue that the frequency of plpgsql issues was at least partly due to having a poor choice of which way to resolve the ambiguity, but I don't think it can be entirely blamed on that.

As I've said before, I believe that the root cause of this problem is that using the same syntax for variables and column names is a bad idea in the first place. If we used $foo or ?foo or ${foo} or $.foo or foo!!$#? to mean the parameter called foo, then this would all be a non-issue.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Tom,

> Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period.

As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions when PL/pgSQL functions are throwing an error; I'd end up spending hours debugging this. Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com San Francisco
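[The PL/pgSQL behavior Josh refers to can be seen with a small invented example; since 9.0, plpgsql's default is to raise an error when a name could be either a variable/parameter or a column.]

```sql
CREATE TABLE accounts (balance numeric);

CREATE FUNCTION is_overdrawn(balance numeric) RETURNS boolean AS $$
BEGIN
    -- At execution time this raises:
    --   ERROR: column reference "balance" is ambiguous
    -- unless the references are qualified (accounts.balance,
    -- is_overdrawn.balance) or plpgsql.variable_conflict is changed
    -- from its default of "error".
    RETURN EXISTS (SELECT 1 FROM accounts WHERE balance < balance);
END;
$$ LANGUAGE plpgsql;
```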
Re: [HACKERS] Replication server timeout patch
On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> On 16.03.2011 11:11, Fujii Masao wrote:
>> On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote:
>>> Agreed. I'll change the patch.
>> Done. I attached the updated patch.
> I don't much like the API for this. Walsender shouldn't need to know about the details of the FE/BE protocol, and pq_putbytes_if_available() seems too low level to be useful. I think a better API would be to have a non-blocking version of pq_putmessage(). We can make the output buffer in pqcomm.c resizeable, so that when the message doesn't fit in the output buffer in pq_putmessage(), the buffer is enlarged instead of trying to flush it. Attached is a patch using that approach. This is a much smaller patch, and easier to understand. I'm not totally happy with the walsender main loop; it seems to work as it is, but the logic has become quite complicated. Ideas welcome on how to simplify that.

Heikki, are you planning to commit this, either with or without further revisions?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC 2011 - Mentors? Projects?
On 26.3.2011 02:05, Joshua Berkus wrote:
> Tomas,
>> I spoke to a teacher from a local university last week, mainly as we were looking for a place where a local PUG could meet regularly. I realized this could be a good opportunity to head-hunt some students to participate in this GSoC. Are we still interested in new students?
> Yes, please! We have had students from Charles University several times before, and would be glad to have more. The wiki page has links to the information about the program. Talk to Zdenek if you have more questions.

I know Zdenek was mentoring some students in the previous years, but he's been a bit hard to reach recently. And the deadline is near.

I've read some info about the program on the wiki, but I'm not sure what the students should do. Let's say they read the list of project ideas on the wiki and choose one or two of them. What should they do next? Should they write to the pgsql-students mailing list? I guess most of the students won't have much experience with PostgreSQL, and most of the ideas are described only very briefly, so they'll need help with their proposals.

regards Tomas
Re: [HACKERS] 9.1 Beta
On Fri, Mar 25, 2011 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote:
> I've never understood why we timebox useful development, yet tweaking is allowed to go on without limit. Personally, I don't see the rationale to allow developers some kind of priority over their input. This tweaking period is essentially a time when insiders can put their votes in, but nobody else can. Beta is where we get feedback from a wider audience.

I think 9.0 got delayed quite a bit by the fact that we need approximately 347 people to wrap a release, and they all had vacations at different times over the summer. The code was pretty stable by July 1; I think we could easily have released in August if we had a slightly less awkward process for getting these things out the door.

> The sooner we declare Beta, the sooner people will test. Then we will have user feedback, bugs to fix, etc. Everybody is very clearly sitting idle. With a longer bug list we will make faster progress to release. We're just wasting time.

I can't resist observing that if you want beta to happen sooner, it would be better not to commit major and largely unreviewed patches three weeks after the end of the last CommitFest. Before you insist that it was reviewed: the version that was actually committed bore so little resemblance to the versions that were posted earlier that any earlier review was basically meaningless in terms of ensuring that the final product was bug-free, and it wasn't and isn't. I complained *repeatedly* about the need to get both collation support and sync rep finished and committed sooner, for exactly this reason. We are now reaping the entirely predictable fruit of having failed to make that happen. But for those two patches, we would likely be in beta already, or darn close.

http://archives.postgresql.org/pgsql-hackers/2010-12/msg01257.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg01209.php
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02811.php
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00438.php

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.1 Beta
On Fri, Mar 25, 2011 at 8:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> The correct question is whether we're ready for beta, and I would say the answer is clearly no, unless you have a pretty low standard for what "ready for beta" means. Perhaps it would be suitable to discuss what the standard for that really ought to be; but I don't agree in the slightest that we ought to decide based on predetermined calendar dates rather than the state of the code.

Agreed. I think some discussion of which of the things on the open item lists need to be done before beta might be helpful, and we ought to add any items that are not there but are blockers.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote:
> Tom,
>> Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period.
> As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.

Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name.

...Robert
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Robert Haas robertmh...@gmail.com writes:
> On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote:
>> Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.
> Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name.

Yes, a function that previously worked might now throw an error, if we make ambiguous names be errors. But this is *not* a failure that would occur during dump/reload. You'd have to actually run the function.

regards, tom lane
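[Tom's point rests on a setting that pg_dump emits near the top of its output; a sketch of the relevant behavior, with the function and table names invented for illustration.]

```sql
-- pg_dump output disables body checking, so a function whose body
-- contains a newly-ambiguous reference still restores cleanly ...
SET check_function_bodies = false;

CREATE FUNCTION f(x integer) RETURNS integer AS $$
    SELECT x FROM some_table   -- not parsed at CREATE time
$$ LANGUAGE sql;

-- ... and any ambiguity error surfaces only when the function is
-- first executed, e.g. SELECT f(1);
```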
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 9:12 PM, Robert Haas wrote:
> As I've said before, I believe that the root cause of this problem is that using the same syntax for variables and column names is a bad idea in the first place. If we used $foo or ?foo or ${foo} or $.foo or foo!!$#? to mean the parameter called foo, then this would all be a non-issue.

Yes *please*. Man, that would make maintenance of such functions easier.

Best, David
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Robert Haas robertmh...@gmail.com writes:
> As I've said before, I believe that the root cause of this problem is that using the same syntax for variables and column names is a bad idea in the first place. If we used $foo or ?foo or ${foo} or $.foo or foo!!$#? to mean the parameter called foo, then this would all be a non-issue.

If this were PL/perl, or PL/almost-anything-except-SQL, I could get behind such a proposal. But it's not, it's SQL; and SQL doesn't do things that way. SQL's idea of disambiguation is qualified names. And even more to the point: to the extent you think that weird syntax might be a suitable solution, you have to keep in mind that the SQL committee could take over any such syntax at the drop of a hat. See the recent unpleasantness concerning => ...

regards, tom lane
[HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Robert Haas wrote:
> On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote:
>> Tom,
>>> Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period.
>> As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.
> Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name.

I mention 2 possible solutions here, both of which involve syntax alterations; each is set off between the ---- lines below. I personally like the second/lower option more.

----
Might it be reasonable, perhaps as a 9.2 feature, to add top-level namespaces (TLNs) so that one could always explicitly qualify what they are referring to? For example, you could have the 3 TLNs sch, lex, attr (I may have missed some useful ones). The sch TLN would unambiguously refer directly to a schema object, such as a database table. The lex TLN would unambiguously refer directly to a lexical, either a parameter of the current routine or a lexical variable. The attr TLN would unambiguously refer to a table/etc column/attribute in the manner typical for SQL. Use them like:

    sch.foo  - the table/etc foo
    lex.foo  - the lexical variable foo
    attr.foo - the column foo

Use of these TLNs is optional where there is no ambiguity. The TLNs are not reserved words, but if one has an entity named the same, then references to it must be TLN-qualified; eg: lex.sch, lex.lex, lex.attr. Now these are just examples; you may find a different set works better.
----

There are also alternate solutions.

----
For example, it could be mandated that lexical-scope aliases for any data/var-like schema object are required in routines, where the aliases are distinct from all lexical vars/params/etc, and then all SQL/code in the routines may only refer to the schema objects by the aliases. Effectively this makes it so that routines can no longer see non-lexical vars except for those from parameters, and this aliasing defines a parameter whose argument is supplied by the DBMS automatically rather than as an explicit routine-caller argument. That way, inside a routine body there are only lexical names for things, and so no namespace qualification is ever needed by the regular SQL.

Similarly, if you always think of table column names as referring to an attribute or element of a table variable, then just reference the column qualified by the table name (or the lexical alias thereof), same as you do in any other programming language. Of course, sometimes you don't have to qualify column-name references, as context could make them unambiguous. Or, a shorthand like a simple leading . could unambiguously say you're referring to a column of the particular table in context. With those in place, all unqualified references are straight to lexical variables or parameters. And so, this is also an effective way to resolve the ambiguity, and I prefer this latter design personally. Here's an example in quasi-PL/PgSQL:

    create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as
    $body$
    declare
      myvar integer := 5;
    begin
      select (.mycol + myvar * myparam) as mynewcol from mytbl;
    end;
    $body$

Note that I've already thought through this last example, as these methods of avoiding ambiguity are loosely speaking how my language Muldis D avoids the problem faced by many SQL procedures. The .mycol syntax specifically was inspired originally for me by Perl 6, where the lack of anything just before the . means that the implicit topic variable is referred to, as if you said $_.mycol. A Perl 6 analogy being something like:

    $mytbl.map:{ .mycol + $myvar * $myparam }

aka:

    $mytbl.map:{ $_.mycol + $myvar * $myparam }
----

-- Darren Duncan
[HACKERS] Open issues for collations
Robert Haas robertmh...@gmail.com writes:
> I think some discussion of which of the things on the open item lists need to be done before beta might be helpful, and we ought to add any items that are not there but are blockers.

Here's a quick enumeration of some things I think need discussion about the collations patch:

* Are we happy yet with the collation assignment behavior (see parse_collate.c)? A couple of specific subtopics:

** Selecting a field from a record-returning function's output. Currently, we'll use the field's declared collation; except that if the field has default collation, we'll replace that with the common collation of the function's inputs, if any. Is either part of that sane? Do we need to make this work for functions invoked with other syntax than a plain function call, eg operator or cast syntax?

** What to do with domains whose declaration includes a COLLATE clause? Currently, we'll impute that collation to the result of a cast to the domain type --- even if the cast's input expression includes an explicit COLLATE clause. It's not clear that that's per spec. If it is correct, should we behave similarly for functions that are declared to return a domain type? Should it matter if the cast-to-domain is explicit or implicit? Perhaps it'd be best if domain collations only mattered for columns declared with that domain type. Then we'd have a general rule that collations only come into play in an expression as a result of (a) the declared type of a column reference or (b) an explicit COLLATE clause.

* In plpgsql, is it OK for declared local variables to inherit the function's input collation? Should we provide a COLLATE option in variable declarations to let that be overridden? If Oracle understands COLLATE, probably we should look at what they do in PL/SQL.

* RI triggers should insert COLLATE clauses in generated queries to satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the referenced column's collation. Right now you may get either table's collation depending on which query type is involved. I think an obvious failure may not be possible so long as equality means the same thing in all collations, but it's definitely possible that the planner might decide it can't use the referenced column's unique index, which would suck for performance. (Note: this rule seems to prove that the committee assumes equality can mean different things in different collations, else they'd not have felt the need to specify.)

* It'd sure be nice if we had some nontrivial test cases that work in encodings besides UTF8. I'm still bothered that the committed patch failed to cover single-byte-encoding cases in upper/lower/initcap.

* Remove initdb's warning about useless locales? Seems like pointless noise, or at least something that can be relegated to debug mode.

* Is it worth adding a cares-about-collation flag to pg_proc? Probably too late to be worrying about such refinements for 9.1.

There are a bunch of other minor issues that I'm still working through, but these are the ones that seem to merit discussion.

regards, tom lane
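[For readers following along, a minimal sketch of the two sources of collation Tom proposes as the general rule; table, column, and locale names are invented, and available locale names vary by platform.]

```sql
-- (a) collation from the declared type of a column reference ...
CREATE TABLE names (n text COLLATE "en_US");

-- ... (b) overridden in an expression by an explicit COLLATE clause.
SELECT n FROM names ORDER BY n COLLATE "C";
```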