[HACKERS] Warm-Standby using WAL archiving / Separate pg_restorelog application
Hi, I've now set up a warm-standby machine using WAL archiving. The restore_command on the warm-standby machine loops until the WAL requested by postgres appears, instead of returning 1. Additionally, restore_command checks for two special flag-files, abort and take_online. If take_online exists, the restore_command exits with code 1 when the requested WAL does not exist - this allows me to take the slave online if the master fails.

This method seems to work, but it is neither particularly fool-proof nor administrator friendly. It's not possible, e.g., to reboot the slave without postgres aborting the recovery, and therefore processing all WALs generated since the last backup all over again. Monitoring this system is hard too, since there is no easy way to detect errors while restoring a particular WAL.

I think that all those problems could be solved if postgres provided a standalone application that could restore one WAL into a specified data-dir. It should be possible to call this application repeatedly to restore WALs as they are received from the master. Since pg_restorelog would be called separately for every WAL, it'd be easy to detect errors while recovering a specific WAL.

Do you think this idea is feasible? How hard would it be to turn the current archived-WAL-recovery code into a standalone executable? (That, of course, needs to be called when postgres is _not_ running.)

greetings, Florian Pflug
Re: [HACKERS] Warm-Standby using WAL archiving / Separate pg_restorelog
Merlin Moncure wrote: On 7/10/06, Florian G. Pflug [EMAIL PROTECTED] wrote: This method seems to work, but it is neither particularly fool-proof nor administrator friendly. It's not possible, e.g., to reboot the slave without postgres aborting the recovery, and therefore processing all WALs generated since the last backup all over again. Monitoring this system is hard too, since there is no easy way to detect errors while restoring a particular WAL. what I would really like to see is to have the postmaster start up in a special read only mode where it could auto-restore wal files placed there by an external process but not generate any of its own. This would be a step towards a pitr based simple replication method.

I didn't dare to ask for being able to actually _access_ a WAL-shipping based slave (in read-only mode) - from how I interpret the code, it's a _long_ way to get that working. So I figured a stand-alone executable that just recovers _one_ archived WAL would at least remove the administrative burden that my current solution brings. And it would be easy to monitor the slave - much easier than with any automatic pickup of WALs.

greetings, Florian Pflug
[HACKERS] Forcing wal rotation
Hi, for my warm-standby cluster I'm now saving the currently used WAL using rsync, to avoid losing data from a few hours (or days) ago when there is little traffic and thus the WAL isn't rotated. For online backups, the problem is even worse, because a backup might be unusable even hours after I called pg_stop_backup(), because the WAL segment needed to bring the backup to a consistent state might not have been archived at that time.

I've now thought about how to fix that without doing that rather crude rsync-pg_xlog hack. I've read through the code, and learned that WAL segments are expected to have a specific size - thus rotating them early is not that easy. But now I figure that another short-term solution could probably be implemented easily. I'd like to write a function that just fills the WAL with NOP records until the currently used WAL segment is full. Since I assume that there are already different kinds of WAL records, adding a NOP record that just takes up space, and does nothing else, should be quite easy. And even if WALs contain only one kind of record (write this page to that datafile), I could just repeat the last record over and over again until the WAL is filled, couldn't I?

Do you think that this is feasible? If so, I'd like to try it.

greetings, Florian Pflug
Re: [HACKERS] Forcing wal rotation
A.M. wrote: On Fri, July 14, 2006 11:20 am, Florian G. Pflug wrote: Hi, for my warm-standby cluster I'm now saving the currently used WAL using rsync, to avoid losing data from a few hours (or days) ago when there is little traffic and thus the WAL isn't rotated. For online backups, the problem is even worse, because a backup might be unusable even hours after I called pg_stop_backup(), because the WAL segment needed to bring the backup to a consistent state might not have been archived at that time. How about an SQL-level function that calls the wal scripts? This would also allow important transactions to push data to the standby server regardless of the wal size.

That was the idea - providing pg_rotate_wal(), which would guarantee that the WAL is rotated at least once if called. Thinking further about this, for a first proof of concept it'd be enough to write a C function pg_current_walsegment(). pg_rotate_wal() could then be a plpgsql function that, e.g., creates a temporary table and fills it with data until the return value of pg_current_walsegment() changes.
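For illustration, a minimal plpgsql sketch of that proof-of-concept idea. Note that pg_current_walsegment() is the *proposed* helper, not an existing function, and wal_filler(padding text) is an arbitrary scratch table - which, as the reply below points out, must be a regular (not temporary) table, since temporary tables aren't WAL-logged:

  create or replace function pg_rotate_wal() returns void as $body$
  declare
      v_start text;
  begin
      -- pg_current_walsegment() is hypothetical, as proposed above
      v_start := pg_current_walsegment();
      while pg_current_walsegment() = v_start loop
          -- generate some WAL traffic until the segment changes
          insert into wal_filler (padding)
              select repeat('x', 1000) from generate_series(1, 100);
          delete from wal_filler;
      end loop;
  end;
  $body$ language plpgsql;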
Re: [HACKERS] Forcing wal rotation
Martijn van Oosterhout wrote: On Fri, Jul 14, 2006 at 05:36:58PM +0200, Florian G. Pflug wrote: That was the idea - providing pg_rotate_wal(), which would guarantee that the WAL is rotated at least once if called. Thinking further about this, for a first proof of concept it'd be enough to write a C function pg_current_walsegment(). pg_rotate_wal() could then be a plpgsql function that, e.g., creates a temporary table and fills it with data until the return value of pg_current_walsegment() changes. Temporary tables don't get xlogged. It would probably be easier to hook into the xlog machinery and create NOP records, like you originally suggested.

From further source-code reading, I got the following implementation plan:
.) Add a new entry to RmgrTable (Should I add it at the end, or use one of the reserved entries?)
.) Create nop_redo and nop_desc - for nop_redo an empty function should be sufficient
.) Create pg_rotate_wal, which calls XLogInsert(RM_NOP_ID, XLOG_NO_TRAN, rdata) with
   rdata.data = pointer to WAL_SEGMENT_SIZE zero bytes
   rdata.len = WAL_SEGMENT_SIZE
   rdata.buffer = InvalidBuffer
   rdata.next = NULL

Since I insert WAL_SEGMENT_SIZE bytes, I shouldn't even have to loop, because that record has no chance of fitting into the current WAL segment, right? Am I overlooking something?

greetings, Florian Pflug
Re: [HACKERS] Forcing wal rotation
Simon Riggs wrote: On Fri, 2006-07-14 at 12:09 -0400, Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: I've now thought about how to fix that without doing that rather crude rsync-pg_xlog hack. I've read through the code, and learned that WAL segments are expected to have a specific size - thus rotating them early is not that easy. Simon was working on a patch for this at the code sprint; I think it's submitted to -patches already. Slightly different patch. I'm working on this one still.

Cool - what are the chances of this being included in 8.2?

Explicitly filling the segment as you propose would be really bad for performance. Yes, the current approach I am taking is better than that.

Well, my proposal wasn't really a long-term solution - I was thinking about a quick fix that I could implement for 8.1, basically to let my warm-standby setup feel less like a house of cards, as someone put it ;-) I didn't care too much about the performance hit - I don't expect the database I intend to use it for to have much load; otherwise the WAL segments are rotated quite often anyway. But I agree that for a general solution, my approach is not really ideal ;-)

Since we just ported the application in question to 8.1, I'm not sure that we will switch to 8.2 when it is released - so I'm still interested in finding a solution for 8.1. Do you think I could backport your patch to 8.1 - or does it depend on some other new features of 8.2?

greetings, Florian Pflug
Re: [HACKERS] UPDATE/DELETE XXX WHERE CURRENT OF cursor_name
Gavin Sherry wrote: On Mon, 24 Jul 2006, Golden Liu wrote:
begin;
declare foo cursor for select * from bar for update;
fetch foo;
update bar set abc='def' where current of foo;
fetch foo;
delete from bar where current of foo;
commit;
No one has stepped up to do this for 8.2 so unfortunately you will most likely not see this within the next year or so :-(.

Couldn't this be emulated by doing
begin;
declare foo cursor for select * from bar for update;
fetch foo into v_foo ;
update bar set abc='def' where ctid = v_foo.ctid;
fetch foo into v_foo ;
delete from bar where ctid = v_foo.ctid;
commit;
Or could a concurrent vacuum run lead to the wrong rows being updated/deleted?

greetings, Florian Pflug
Re: [HACKERS] UPDATE/DELETE XXX WHERE CURRENT OF cursor_name
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Couldn't this be emulated by doing begin; declare foo cursor for select * from bar for update; fetch foo into v_foo ; update bar set abc='def' where ctid = v_foo.ctid; That wouldn't follow the expected semantics if there's a concurrent update, because the updated row would always fail the WHERE clause, and thus the update would just silently not happen. (I'm thinking about READ COMMITTED mode of course --- in SERIALIZABLE you'd just get the expected error.) You'd have to find some way to pump the row's most up-to-date version through the cursor's query plan, a la EvalPlanQual, to see if it still met the cursor's WHERE condition.

How could there be a concurrent update of the _same_ row, when I do select * from bar *for update*? Or are you talking about concurrent updates to the same page that could somehow alter the ctid of _another_ tuple?

greetings, Florian Pflug
Re: [HACKERS] UPDATE/DELETE XXX WHERE CURRENT OF cursor_name
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: How could there be a concurrent update of the _same_ row, when I do select * from bar *for update*? AFAICT the spec doesn't require one to have written FOR UPDATE in order to use WHERE CURRENT OF. (In effect, they expect FOR UPDATE to be the default, which is certainly not a change we're going to want to make to DECLARE CURSOR.) If we did make that restriction then we could probably skip the EvalPlanQual mess.

But if they expect FOR UPDATE to be the default, then essentially they do require one to use a cursor with FOR UPDATE semantics when using WHERE CURRENT OF - or do they allow WHERE CURRENT OF even for non-FOR-UPDATE cursors?

If one restricted an implementation of WHERE CURRENT OF to FOR UPDATE, WITHOUT HOLD cursors, the only non-trivial problem that I can see is how to support more than one update of the same row. Because as far as I can see, if you'd do
begin;
declare foo cursor for select * from bar for update;
fetch foo into v_foo ;
update bar set ... where ctid = v_foo.ctid ;
update bar set ... where ctid = v_foo.ctid ;
commit;
the second update would silently be ignored. But since only updates happening in the same transaction would somehow need to be tracked, this should be much easier to do than supporting the non-FOR-UPDATE case.

greetings, Florian Pflug
Re: [HACKERS] [PATCHES] Patch for VS.Net 2005's strxfrm() bug
Bruce Momjian wrote: Why is this better than: #if _MSC_VER == 1400 Surely this will not be true if _MSC_VER is undefined? I experienced injustice and the reason of in OSX for it.

What was the problem with OSX? Did it throw a warning if you did an equality test on an undefined symbol? The following #if evaluated to true on OSX, although I'm pretty sure that _MSC_VER isn't defined on OSX ;-)
#if (_MSC_VER < 1300)
...
#endif
replacing it with
#ifdef WIN32
#if (_MSC_VER < 1300)
...
#endif
#endif
fixed the problem.

greetings, Florian Pflug
Re: [HACKERS] [PATCHES] Patch for VS.Net 2005's strxfrm() bug
[EMAIL PROTECTED] wrote: Bruce Momjian wrote: Why is this better than: #if _MSC_VER == 1400 Surely this will not be true if _MSC_VER is undefined? I experienced injustice and the reason of in OSX for it. What was the problem with OSX? Did it throw a warning if you did an equality test on an undefined symbol? The following #if evaluated to true on OSX, although I'm pretty sure that _MSC_VER isn't defined on OSX ;-) #if (_MSC_VER < 1300) ... #endif replacing it with #ifdef WIN32 #if (_MSC_VER < 1300) ... #endif #endif fixed the problem. No doubt, but that's quite a different test.

I mainly posted this to show what the offending ifdef in pgadmin3 looked like, since someone referenced it, not as an argument against #if _MSC_VER >= 1400. I guess _MSC_VER < 1300 gets interpreted as 0 < 1300 if _MSC_VER is undefined, so _MSC_VER >= 1400 would actually work. But it still surprised me a lot that _MSC_VER < 1300 evaluated to true if _MSC_VER is undefined - maybe that's the _real_ reason why some people don't like the tri-state logic in SQL - it's because they get confused when trying to use the C preprocessor ;-)

greetings, Florian Pflug
Re: [HACKERS] GUC with units, details
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: Tom Lane wrote: Peter's not said exactly how he plans to deal with this, but I suppose it'll round off one way or the other ... It'll get truncated by integer division. I wouldn't mind if someone proposed a patch to create a warning or error in this case, but I wanted to keep the initial version simple. I'd recommend against that. Apple recently changed OS X so that it rejects SHMMAX settings that aren't an exact multiple of something-or-other, and I've found that to be a *serious* PITA. Of course part of the problem is that there's no helpful message, but it's still a big loss from a usability standpoint, and quite unnecessary (every other Unix seems willing to round off...) One thought is that maybe we should round up not down? I'm having a hard time making a specific case either way, though. Rounding up would have the advantage that you could just specify 0 in the config file, and have postgres use the smallest value possible. greetings, Florian Pflug
Re: [HACKERS] GUC with units, details
Peter Eisentraut wrote: Florian G. Pflug wrote: Rounding up would have the advantage that you could just specify 0 in the config file, and have postgres use the smallest value possible. In most algebras, dividing zero by something is still zero, so there'd be no need to round anything.

I guess I clicked the send button a little too fast. You're right, of course.

greetings, Florian Pflug
Re: [HACKERS] [PATCHES] extension for sql update
Tom Lane wrote: Susanne Ebrecht [EMAIL PROTECTED] writes: ... We could provide the mixed update syntax and leave the typed row value expression for the next release. Do you agree? I don't really see the point --- the patch won't provide any new functionality in anything like its current form, because you can always just write the separate expressions in the simple one-to-one way. If we do offer the row-on-the-left syntax then people will try to put sub-selects on the right, and won't get anything beyond an unhelpful syntax error message. So my vote would be to leave it alone until we have a more complete implementation.

It has the advantage that inserts and updates look more alike. If your SQL statements are generated by code, then that removes the need for a special case for updates.

greetings, Florian Pflug
Re: [HACKERS] [PATCHES] Forcing current WAL file to be archived
Albe Laurenz wrote: Tim Allen wrote: Patch included to implement xlog switching, using an xlog record processing instruction and forcibly moving xlog pointers. 1. Happens automatically on pg_stop_backup() Oh - so it will not be possible to do an online backup _without_ forcing a WAL switch any more? Well, previously, you would have always had to simulate a wal switch, by working out which is the current wal file and copying that. Otherwise your online backup wouldn't be complete. What Simon is describing sounds like a big step forward from that situation. It should let me delete half the code in my pitr backup/failover scripts. Definitely a Good Thing. Certainly a Good Thing, and it should be on by default. But couldn't there be situations where you'd like to do an online backup without a WAL switch? To avoid generating an archive WAL every day on a database with few changes, e.g.? But the online backup would be impossible to restore, if you don't have enough wal archived to recover past the point where you called pg_stop_backup(). So, doing a wal switch when pg_stop_backup() is called greatly reduces the risk of a user error that leads to broken backups. greetings, Florian Pflug
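For reference, a minimal sketch of the online-backup procedure being discussed (pg_start_backup/pg_stop_backup are the existing functions; the label is arbitrary):

  SELECT pg_start_backup('nightly');
  -- copy the data directory at the filesystem level (tar, rsync, ...)
  SELECT pg_stop_backup();
  -- the backup only becomes restorable once the WAL segment that was
  -- current at pg_stop_backup() time has been archived; an automatic
  -- WAL switch at this point closes exactly that gap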
Re: [HACKERS] Constraint exclusion is not general enough
Tom Lane wrote: Rod Taylor [EMAIL PROTECTED] writes: A simple way of doing this might be to use a minimum cost number? But you don't have any cost numbers until after you've done the plan.

Couldn't this work similarly to geqo_effort? The planner could try planning the query using only cheap algorithms, and if the cost exceeds a certain value, it'd restart and use more sophisticated methods.

greetings, Florian Pflug
Re: [HACKERS] Constraint exclusion is not general enough
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Tom Lane wrote: But you don't have any cost numbers until after you've done the plan. Couldn't this work similarly to geqo_effort? The planner could try planning the query using only cheap algorithms, and if the cost exceeds a certain value, it'd restart and use more sophisticated methods. AFAICS this would be a net loss on average. Most of the time, the constraint exclusion code doesn't win, and so throwing away all your planning work to try it is going to be a loser most of the time.

On the other hand, if the consider-replanning threshold is high enough, then that additional time really doesn't matter - if a query runs for minutes, or even hours, a few wasted cycles during planning don't hurt.

greetings, Florian Pflug
[HACKERS] An Idea for planner hints
Hi, since the discussion about how to force a specific plan has come up, I thought I'd post an idea I had for this a while ago. It's not really well thought out yet, but anyway.

When the topic of optimizer hints comes up, people often suggest that there should be a way to force postgres to use a certain index, or do joins in a certain order. AFAIK, this mimics what Oracle does - you can put comments into your query that specify what index to use. This approach has two major drawbacks:
.) Plans that seem good now might not seem that good a few months later - your data might have changed, and other execution plans might fit better now
.) You have to change all your queries to make use of features in new postgres versions, like bitmap scans.

My experience with the postgres optimizer is that it usually performs great - and if it doesn't, that always boiled down to two problems (at least for me):
.) The query is autogenerated, and includes complex, and highly inter-dependent where (or join) conditions. This leads to wrong estimates of where selectivity, and thus to bad plans.
.) There are correlations between columns and/or tables that postgres doesn't know about (and has no chance of knowing about). Again, this leads to vastly wrong estimates of row counts, and to bad plans.

I think that those bad estimates of the selectivity of where-clauses (or on-clauses for joins) are where postgres could use hints. Imagine a query like select ... from t1 join t2 on t1.t2_id = t2.id and expr. Let's say that expr is true for only 1% of the rows in t2 - but those are exactly the rows that have matching rows in t1. Postgres would probably guess that this join will produce about 1/100 of the rows that t1 has - but I _know_ that it will produce 100 (!) times more rows (see the toy example below). Now, I'd like to hand that information to postgres. I wouldn't want to force any particular access method or join order, but rather I'd just tell it hey, this expression has selectivity 1 in this context, not 0.01 as you might think.

Could that work?

greetings, Florian Pflug
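To make the scenario concrete, here is a hypothetical toy schema (table and column names are made up) that exhibits exactly this kind of cross-correlation:

  -- t2.flag is true for only 1% of t2's rows, but those happen to be
  -- exactly the rows that t1 references
  create table t2 (id int primary key, flag boolean);
  create table t1 (t2_id int references t2(id));

  -- the planner will estimate that only about 1% of t1's rows survive
  -- the join, while in reality (nearly) all of them do
  explain select * from t1 join t2 on t1.t2_id = t2.id and t2.flag;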
Re: [HACKERS] An Idea for planner hints
Tom Lane wrote: Martijn van Oosterhout kleptog@svana.org writes: ISTM that the easiest way would be to introduce a sort of predicate like so: SELECT * FROM foo, bar WHERE pg_selectivity(foo.a = bar.a, 0.1); The one saving grace of Florian's proposal was that you could go hack the statistics *without* changing your queries. This throws that away again.

I think for this to be really effective, you'd actually need both - a query-independent way of specifying selectivities, and a way to influence the estimates for a _single_ query. Imagine a complex, autogenerated query which looks something like this: select ... from t1 join t2 on ... join t3 on ... join t4 on ... ... where big, complicated expression derived from some user input. This big, complicated expression looks different for every query - and currently, postgres often vastly overestimates the selectivity of this expression. This leads to weird join orders, and generally very bad performance. Of course, *I* don't know the selectivity of this expression myself - but experience tells me that on average it's something like 50%, and not 1% as postgres believes. So, in that case, being able to write select ... join ... where pg_selectivity(expression, 0.5) would be a big win.

The thing I object to about the I want to decorate my queries with planner hints mindset is that it's coming at it from the wrong direction. You should never be thinking in terms of fix this one query, because that just leads back into the same dead end that your fix doesn't work tomorrow. What you *should* be thinking about is why did the planner get this wrong, and how do I fix the generic problem?. If you attack it that way then your fix is much more likely to work on the next slightly-different query.

Fixing the generic problem is surely best _if_ there is a fix for the generic problem at all. But if your where-conditions involve fields from 10 different tables, then IMHO there is no way to _ever_ guarantee that postgres will get correct selectivity estimates. But since (at least for me) overestimating selectivity hurts far more than underestimating it, forcing postgres to just assume a certain selectivity could help. I'm not in any way saying that there should _only_ be selectivity annotations inside the query - a query-independent mechanism would be a very nice thing to have. But a query-independent mechanism won't be sufficient in all cases IMHO.

greetings, Florian Pflug
Re: [HACKERS] An Idea for planner hints
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Imagine a complex, autogenerated query which looks something like this: select ... from t1 join t2 on ... join t3 on ... join t4 on ... ... where big, complicated expression derived from some user input. This big, complicated expression looks different for every query - and currently, postgres often vastly overestimates the selectivity of this expression. This is a straw man. There is no way that your application can throw in a chosen-at-random selectivity value for a join condition that it doesn't understand and have that be more likely to be right than the planner's guess.

No, my application probably won't get it right, _but_
.) I can at least _choose_ what selectivity to use. My experience is that a selectivity that is too small (meaning that postgres underestimates the number of records resulting from a join or where) is usually much worse than an overly large selectivity (meaning that postgres expects more records than it actually finds). Forcing a high selectivity (thus letting postgres expect a lot of records) should therefore lead to better plans than letting postgres underestimate the selectivity.
.) Often, my application (or I) *can* guess better than postgres. My application, for example, executes the same set of about 100 queries every day to build cache tables. Since I _know_ how many records the query returned yesterday, I can use that value to get a *very* good approximation of the selectivity. This is something my app can do easily, while postgres would have a really hard time figuring that out.

greetings, Florian Pflug
Re: [HACKERS] An Idea for planner hints
Peter Eisentraut wrote: Arturo Pérez wrote: The DBA therefore pokes the right information into the planner's statistical tables (or, perhaps, a more human-manageable one that gets compiled into the planner's stats). I think we're perfectly capable of producing a system that can collect the statistics. We just don't want to collect every possible statistic, but just those that someone declared to be interesting beforehand. There need not be any manual poking. Just manual declaring. But we need to work this from the other end anyway. We need to determine first, what sort of statistics the planner could make use of. Then we can figure out the difficulties in collecting them.

I've been told that Oracle has an interesting feature regarding materialized views that gave me an idea of how to declare what statistics to gather. It seems as if Oracle is able to figure out that it can use a certain materialized view to speed up execution of a certain query, even if the query doesn't use that view explicitly. So, e.g., if you do
1) create materialized view v as select * from t1 join t2 on t1.t2_id = t2.id
2) select * from t1 join t2 on t1.t2_id = t2.id join t3 on t3.t2_id = t2.id
then Oracle seems to be able to use the already-joined tuples in v, and only needs to join t3 to those, instead of having to rejoin t1 and t2.

That gave me the idea that something similar could be used to declare what statistics to gather, in a very general way. Imagine that I could do
1) create statistics for select * from t1 join t2 on t1.t2_id = t2.id and t1.flag = TRUE
2) select * from t1 join t2 on t1.t2_id = t2.id and t1.flag = TRUE join t3 on ... join t4 on ...
Command 1) would basically gather the same statistics for the result of the query as it would gather for a normal table with the same signature. When planning 2), postgres would recognize that it can use those statistics (similar to how Oracle recognizes that it can use a certain materialized view), and would thus know the selectivity of that particular join very accurately.

I think there might even be a way to do (1) without actually executing the (whole) query. If every access method in the query plan could be told to deliver only, say, 10% of the rows it would deliver normally, but the rest of the plan was executed normally, then the result should have the same statistical properties as the complete result would have.

greetings, Florian Pflug
Re: [HACKERS] [PATCHES] WIP archive_timeout patch
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Revised patch enclosed, now believed to be production ready. This implements regular log switching using the archive_timeout GUC. Further patch enclosed implementing these changes plus the record type version of pg_xlogfile_name_offset() Applied with minor changes --- it seemed better to me to put tracking of the last xlog switch time directly into xlog.c, instead of having the bgwriter code try to determine whether a switch had happened recently. I noticed a minor annoyance while testing: when the system is completely idle, you get a forced segment switch every checkpoint_timeout seconds, even though there is nothing useful to log. The checkpoint code is smart enough not to do a checkpoint if nothing has happened since the last one, and the xlog switch code is smart enough not to do a switch if nothing has happened since the last one ... but they aren't talking to each other and so each one's change looks like something happened to the other one. I'm not sure how much trouble it's worth taking to prevent this scenario, though. If you can't afford a WAL file switch every five minutes, you probably shouldn't be using archive_timeout anyway ...

Actually, this behaviour IMHO even has its advantages - if you can be sure that at least one WAL will be archived every 5 minutes, then it's easy to monitor the replication - you can just watch the logfile of the slave, and send a failure notice if no logfile is imported at least every 10 minutes or so. Of course, for this to be useful, the documentation would have to tell people about that behaviour, and it couldn't easily be changed in the next release...

greetings, Florian Pflug
Re: [HACKERS] PostgreSQL on 64 bit Linux
[EMAIL PROTECTED] wrote: This is what I mean by after thought. PostgreSQL is designed for 32-bit processors. Which is fine. I'm not complaining. The question was whether there is an interest in pursuing 64-bit specific optimizations. In the PostgreSQL code, a quick check points me only to has long int 64 as a 64-bit source code #ifdef. Of the six places that reference this, five of them actually slow down the code, as they check for overflow of the 'long int' result beyond 4 bytes of data. The sixth place is used to define the 64-bit type in use by PostgreSQL, which I suspect is infrequently used. I believe the answer is no. No or few 64-bit optimization possibilities have been chased down, probably because some or many of these would: 1) require significant re-architecture 2) reduce the performance in a 32-bit world

Just out of interest - what areas in postgres do you think could be improved (performance-wise) on 64-bit machines? The only area that I can see is the int64 datatype - it's stored in palloc()'ed memory on 32-bit machines AFAIK - I'm not sure if it uses the long long datatype on 64-bit archs. But I can't imagine any other area that could be tuned by making use of (native) 64-bit ints.

greetings, Florian Pflug
Re: [HACKERS] Patch: Tie stats options to autovacuum in postgresql.conf
Jim C. Nasby wrote: On Thu, Sep 28, 2006 at 03:07:39PM -0700, David Wheeler wrote: PostgreSQLers, I just ran into an issue where a client thought that autovacuum was running but it wasn't. This is because it's not fatal when autovacuum is on but stats_start_collector and/or stats_row_level is off. I suspect that there's a reason that it's not fatal, so I thought that it might be useful to give folks just a little bit of help by telling them in postgresql.conf that they need to enable them in order for autovacuum to work. +1. I was just at a client today that had run into this problem. Actually, I'm in favor of refusing to start if autovac is on but the proper stats settings aren't. I'd rather that than people ending up with bloated databases and crappy performance.

I think that setting autovacuum to on should even force stats_start_collector and stats_row_level to on - together with a warning if they would otherwise be off. The risk of autovacuum being disabled by accident seems to carry a much worse performance penalty than having the statistics collector running by accident. Additionally, the statistics collector can easily be turned off within seconds even _if_ it was on accidentally, but if vacuuming was disabled by accident, the user might have to run vacuum full - with all the concurrency issues that this implies.

greetings, Florian Pflug
Re: [HACKERS] array_accum aggregate
Tom Lane wrote: Stephen Frost [EMAIL PROTECTED] writes: * Tom Lane ([EMAIL PROTECTED]) wrote: It looks like it should work to have just one polymorphic aggregate definition, eg, array_accum(anyelement) returns anyarray. I was hoping to do that, but since it's an aggregate the ffunc format is pre-defined to require accepting the 'internal state' and nothing else, and to return 'anyelement' or 'anyarray' one of the inputs must be an 'anyelement' or 'anyarray', aiui. Hmm ... I hadn't been thinking about what the state type would need to be, but certainly bytea is a lie given what you're really doing. We've run into this same problem in contrib/intagg: sometimes you'd like to use a state data structure that isn't any regular SQL datatype, and in particular isn't just a single blob of memory. That's a problem from nodeAgg's point of view because it expects to be responsible for copying the state value from one context to another. Don't have an immediate idea for a solution ...

I used an int8 as state type for an implementation of a kind of array_accum_unique aggregate. I know that it's an ugly hack, but it has been running on a production machine for months now, without a problem... The memory is alloc'd from the aggcontext, btw. Note that I only use this aggregate in one particular query - so there might be problems with my approach that just don't manifest in my particular situation. For example, the aggregate is used only on a table that is never updated, and it is only used in select queries. So there might be problems if the executor decides that it has to restart a query...

greetings, Florian Pflug
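For reference, the single polymorphic definition Tom alludes to can be sketched using the existing array_append function as the transition function and no final function at all - this sidesteps the internal-state / ffunc problem discussed above, at the cost of repeatedly copying the growing array:

  CREATE AGGREGATE array_accum (anyelement)
  (
      sfunc    = array_append,
      stype    = anyarray,
      initcond = '{}'
  );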
Re: [HACKERS] Transaction Snapshots and Hot Standby
Heikki Linnakangas wrote: BTW, we haven't talked about how to acquire a snapshot in the slave. You'll somehow need to know which transactions have not yet committed, but will in the future. In the master, we keep track of in-progress transactions in the ProcArray, so I suppose we'll need to do the same in the slave. Very similar to prepared transactions, actually. I believe the Abort records, which are not actually needed for normal operation, become critical here. The slave will need to put an entry into the ProcArray for any new XLogRecord.xl_xid it sees in the WAL, and remove the entry at a Commit or Abort record. And clear them all at a shutdown record.

For reference, here is how I solved the snapshot problem in my Summer-of-Code project last year, which dealt exactly with executing read-only queries on PITR slaves (but sadly never came out of alpha stage due to both my and Simon's lack of time).

The main idea was to invert the meaning of the xid array in the snapshot struct - instead of storing all the xids between xmin and xmax that are to be considered in-progress, the array contained all the xids > xmin that are to be considered completed. The current read-only snapshot (current meaning the corresponding state on the master at the time the last replayed WAL record was generated) was maintained in shared memory. Its xmin field was continually updated with the (newly added) XLogRecord.xl_xmin field, which contained the xid of the oldest running query on the master, with a pruning step after each ReadOnlySnapshot.xmin update to remove all entries < xmin from the xid array. If a commit was seen for an xid, that xid was added to the ReadOnlySnapshot.xid array.

The advantage of this concept is that it handles snapshotting on the slave without too much additional work for the master (the only change is the addition of the xl_xmin field to XLogRecord). It especially removes the need to track ShmemVariableCache->nextXid. The downside is that the size of the read-only snapshot is theoretically unbounded, which poses a bit of a problem if it's supposed to live inside shared memory...

regards, Florian Pflug
Re: [HACKERS] Transaction Snapshots and Hot Standby
Simon Riggs wrote: On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The main idea was to invert the meaning of the xid array in the snapshot struct - instead of storing all the xids between xmin and xmax that are to be considered in-progress, the array contained all the xids > xmin that are to be considered completed. The downside is that the size of the read-only snapshot is theoretically unbounded, which poses a bit of a problem if it's supposed to live inside shared memory... Why do it inverted? That clearly has problems.

Because it solves the problem of spontaneously appearing XIDs in the WAL. At least prior to 8.3 with virtual xids, a transaction might have allocated its xid long before actually writing anything to disk, and therefore long before this XID ever shows up in the WAL. And with a non-inverted snapshot such an XID would be considered to be completed by transactions on the slave... So, one either needs to periodically log a snapshot on the master or log XID allocations, which both seem to cause considerable additional load on the master. With an inverted snapshot, it's sufficient to log the current RecentXmin - a value that is readily available on the master, and therefore the cost amounts to just one additional 4-byte field per xlog entry.

regards, Florian Pflug
Re: [HACKERS] Transaction Snapshots and Hot Standby
Simon Riggs wrote: On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The current read-only snapshot (current meaning the corresponding state on the master at the time the last replayed WAL record was generated) was maintained in shared memory. Its xmin field was continually updated with the (newly added) XLogRecord.xl_xmin field, which contained the xid of the oldest running query on the master, with a pruning step after each ReadOnlySnapshot.xmin update to remove all entries < xmin from the xid array. If a commit was seen for an xid, that xid was added to the ReadOnlySnapshot.xid array. The advantage of this concept is that it handles snapshotting on the slave without too much additional work for the master (the only change is the addition of the xl_xmin field to XLogRecord). It especially removes the need to track ShmemVariableCache->nextXid. Snapshots only need to know which transactions are currently running during WAL apply. The standby can't remove any tuples itself, so it doesn't need to know what the master's OldestXmin is.

I used the logged xmin value to track the shared snapshot's xmin, which in turn allowed me to prune the xid array, eliminating all xids < that xmin.

regards, Florian Pflug
Re: [HACKERS] Listen / Notify - what to do when the queue is full
Heikki Linnakangas wrote: Joachim Wieland wrote: On Thu, Nov 19, 2009 at 4:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, I have been thinking about that also. So what should happen when you prepare a transaction that has sent a NOTIFY before? From the user's point of view, nothing should happen at prepare. At a quick glance, it doesn't seem hard to support 2PC. Messages should be put to the queue at prepare, as just before normal commit, but the backends won't see them until they see that the XID has committed.

Yeah, but if the server is restarted after the PREPARE but before the COMMIT, the notification will be lost, since all notification queue entries are lost upon restart with the SLRU design, no?

best regards, Florian Pflug
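To make the window concrete, the scenario under discussion is, schematically, the following (whether and how a transaction that has sent a NOTIFY may be prepared at all is exactly what is being debated in this thread):

  BEGIN;
  NOTIFY my_channel;
  PREPARE TRANSACTION 'tx1';
  -- a server restart here would, with a purely SLRU-based queue whose
  -- entries don't survive restarts, lose the queued notification
  COMMIT PREPARED 'tx1';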
Re: [HACKERS] Listen / Notify - what to do when the queue is full
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: A better approach is to do something similar to what we do now: at prepare, just store the notifications in the state file like we do already. In notify_twophase_postcommit(), copy the messages to the shared queue. Although it's the same approach we have now, it becomes a lot cleaner with the patch, because we're not piggybacking the messages on the backend-private queue of the current transaction, but sending the messages directly on behalf of the prepared transaction being committed. This is still ignoring the complaint: you are creating a clear risk that COMMIT PREPARED will fail. I'm not sure that it's really worth it, but one way this could be made safe would be for PREPARE to reserve the required amount of queue space, such that nobody else could use it during the window from PREPARE to COMMIT PREPARED. I'd see no problem with COMMIT PREPARED failing, as long as it was possible to retry the COMMIT PREPARED at a later time. There surely are other failure cases for COMMIT PREPARED too, like an IO error that prevents the clog bit from being set, or a server crash half-way through COMMIT PREPARED. best regards, Florian Pflug
Re: [HACKERS] Listen / Notify - what to do when the queue is full
Tom Lane wrote: Florian G. Pflug f...@phlo.org writes: Tom Lane wrote: This is still ignoring the complaint: you are creating a clear risk that COMMIT PREPARED will fail. I'd see no problem with COMMIT PREPARED failing, as long as it was possible to retry the COMMIT PREPARED at a later time. There surely are other failure cases for COMMIT PREPARED too, like an IO error that prevents the clog bit from being set, or a server crash half-way through COMMIT PREPARED. Yes, there are failure cases that are outside our control. That's no excuse for creating one that's within our control. True. On the other hand, people might prefer having to deal with (very unlikely) COMMIT PREPARED *transient* failures over not being able to use NOTIFY together with 2PC at all. Especially since any credible distributed transaction manager has to deal with COMMIT PREPARED failures anyway. Just my $0.02, though. best regards, Florian Pflug
[HACKERS] DEFAULT of domain ignored in plpgsql (8.4.1)
Hi, it seems that pl/pgsql ignores the DEFAULT value of domains for local variables. With the following definitions in place
create domain myint as int default 0;
create or replace function myint() returns myint as $body$
declare
  v_result myint;
begin
  return v_result;
end;
$body$ language plpgsql immutable;
issuing select myint(); returns NULL, not 0, on postgres 8.4.1. If the line v_result myint; is changed to v_result myint default 0; then 0 is returned as expected.

I've tried to create a patch, but didn't see how I'd convert the result from get_typedefault() (a Node*, presumably the parse tree corresponding to the default expression?) into a plan that I could store in a PLpgSQL_expr. I guess I'd need something like SPI_prepare_plan that takes a parse tree instead of a query string. Or am I on a completely wrong track there?

While trying to cook up a patch I've also stumbled over what I perceive as a bug relating to DOMAINs and column DEFAULTs. I'll write that up in a second e-mail to avoid confusion.

best regards, Florian Pflug
[HACKERS] column DEFAULTs and prepared statements
Hi, while trying to come up with a patch to handle domain DEFAULTs in plpgsql I've stumbled across the following behavior regarding domain DEFAULTs and prepared statements.

session 1: create domain myint as int default 0;
session 1: create table mytable (i myint);
session 2: prepare ins as insert into mytable (i) values (default);
session 2: execute ins;
session 1: alter domain myint set default 1;
session 2: execute ins;

select * from mytable returns:
 i
---
 0
 0
while I'd have expected:
 i
---
 0
 1

After doing the same without using a domain
session 1: create table mytable (i int default 0);
session 2: prepare ins as insert into mytable (i) values (default);
session 2: execute ins;
session 1: alter table mytable alter column i set default 1;
session 2: execute ins;

select * from mytable returns:
 i
---
 0
 1

As far as I understand the code, this happens because the dependency on the domain (for the default value) is not recorded in the plan cache entry. This would imply that the same error also occurs if the INSERT happens from a pl/pgsql function instead of a manually prepared statement, but I haven't tested that. If someone gives me a general idea where to start, I could try to come up with a patch.

best regards, Florian Pflug
Re: [HACKERS] DEFAULT of domain ignored in plpgsql (8.4.1)
Robert Haas wrote: On Thu, Nov 19, 2009 at 9:06 PM, Florian G. Pflug f...@phlo.org wrote: I've tried to create a patch, but didn't see how I'd convert the result from get_typedefault() (a Node*, presumably the parse tree corresponding to the default expression?) into a plan that I could store in a PLpgSQL_expr. I guess I'd need something like SPI_prepare_plan that takes a parse tree instead of a query string. Or am I on a completely wrong track there? While trying to cook up a patch I've also stumbled over what I perceive as a bug relating to DOMAINs and column DEFAULTs. I'll write that up in a second e-mail to avoid confusion. I suggest adding this to the open CommitFest (2010-01) at https://commitfest.postgresql.org/action/commitfest_view/open

Hm, but I don't (yet) have a patch to add...

best regards, Florian Pflug
Re: [HACKERS] DEFAULT of domain ignored in plpgsql (8.4.1)
Tom Lane wrote: Florian G. Pflug f...@phlo.org writes: It seems that pl/pgsql ignores the DEFAULT value of domains for local variables. The plpgsql documentation seems entirely clear on this: The DEFAULT clause, if given, specifies the initial value assigned to the variable when the block is entered. If the DEFAULT clause is not given then the variable is initialized to the SQL null value. Hm, must have missed that paragraph :-(. Sorry for that. Would a patch that changes that have any chance of being accepted? Or is the gain (not having to repeat the DEFAULT clause, and being able to maintain it at one place instead of many) considered too small compared to the risk of breaking existing code? best regards, Florian Pflug
Re: [HACKERS] DEFAULT of domain ignored in plpgsql (8.4.1)
Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: (2) this change, while very useful, does change what had been a simple rule (All variables are NULL unless specifically set otherwise) into a conditional one (All variables are NULL unless set otherwise OR unless they are declared as domain types with defaults). Do people feel that the new behavior would be sufficiently intuitive to avoid user confusion? I'm inclined to leave it alone. It complicates the mental model, and frankly attaching defaults to domains was not one of the SQL committee's better ideas anyway. It's *fundamentally* non-orthogonal.

I've always thought of domains as being a kind of subtype of their base type. In this picture, DEFAULTs for domains correspond to overriding the default constructor of the type (thinking C++ now), and seem like a natural thing to have. But maybe that's more a C++ programmer's than a database designer's point of view...

I've just checked how rowtypes behave, and while the set to null unless specifically set otherwise rule kind of holds for them, their NULL value seems to be special-cased enough to blur the line quite a bit:
create or replace function myt() returns t as $body$
declare
  r t;
begin
  raise notice 'r: %, r is null: %', r, (r is null);
  return r;
end;
$body$ language plpgsql immutable;
select myt(), myt() is null;
gives:
NOTICE: r: (,), r is null: t
NOTICE: r: (,), r is null: t
 myt | ?column?
-----+----------
 (,) | f
Strange, I think... and at least half of an exception to the simple always null unless specifically set otherwise rule.

It also seems that while domain DEFAULTs are ignored, the resulting (null-initialized) variable is still checked against the domain's constraints, including a potential NOT NULL constraint:
create domain myint as int not null;
create or replace function myint() returns myint as $body$
declare
  i myint;
begin
  return i;
end;
$body$ language plpgsql immutable;
raises
ERROR: domain myint does not allow null values
CONTEXT: PL/pgSQL function myint line 3 during statement block local variable initialization
This has the potential to cause some headache I think, if you use domains to prohibit NULL values because they make no semantic sense for your application, and depend on DEFAULT to fill in some other value (like an empty string or an empty array).

best regards, Florian Pflug
Re: [HACKERS] DEFAULT of domain ignored in plpgsql (8.4.1)
Gurjeet Singh wrote: On Sat, Nov 21, 2009 at 7:26 AM, Josh Berkus j...@agliodbs.com wrote: However, there are some other issues to be resolved: (1) what should be the interaction of DEFAULT parameters and domains with defaults? The function's DEFAULT parameter should take precedence over the default of the domain.

I think Josh was pondering whether
create domain myint as int default 0;
create function f(i myint) ...;
should behave like
create function f(i myint default 0) ...;
and hence call f(0) if you do select f();, or instead raise an error because no f with zero parameters is defined (as it does now).

I'd say no, because then "no default" would have to be treated the same as "default null", so for consistency we'd then have to also support
create function g(i int) ...;
select g();
And of course throw an error if there was another function defined as
create function g() ...;
This way leads to madness...

If one really wanted to do that, there'd have to be an OPTIONAL clause for function parameters that works like DEFAULT, but doesn't take a default value and instead uses the type's default (NULL except for domains with a DEFAULT clause). But I wouldn't go that far, personally...

best regards, Florian Pflug
[HACKERS] ALTER TABLE, find_composite_type_dependencies and locking
Hi, I'm currently investigating how much work it'd be to implement arrays of domains, since I have a client who might be interested in sponsoring that work.

The comments around the code handling ALTER DOMAIN ADD CONSTRAINT are pretty clear about the lack of proper locking in that code - altering a domain while simultaneously adding a column with that domain as its type might result in inconsistencies between the data in that column and the domain's constraints after both transactions committed.

I do, however, suspect that ALTER TABLE is plagued by similar problems. Currently, during the rewrite phase of ALTER TABLE, find_composite_type_dependencies is used to verify that the table's row type (or any type directly or indirectly depending on that type) is not used as a column's type anywhere in the database. But since this code does not take any permanent locks on the visited types, it seems that adding such a column concurrently is not prevented. If the original ALTER TABLE changed a column's type, data inserted into the newly added column before the original ALTER TABLE committed will have a type different from what the catalog says after the original ALTER TABLE commits. Or at least so I think - I haven't yet tested that theory...

I am aware that since a commit fest is currently running, now might not be the best time to bring up this topic. Since I feared forgetting this altogether, I decided to still post now, though. I figured people still have the option to ignore this for now if they're busy with getting those patches committed.

best regards, Florian Pflug
Re: [HACKERS] ALTER TABLE, find_composite_type_dependencies and locking (Confirmed)
Florian G. Pflug wrote: I do, however, suspect that ALTER TABLE is plagued by similar problems. Currently, during the rewrite phase of ALTER TABLE, find_composite_type_dependencies is used to verify that the table's row type (or any type directly or indirectly depending on that type) is not used as a column's type anywhere in the database. But since this code does not take any permanent locks on the visited types, it seems that adding such a column concurrently is not prevented. If the original ALTER TABLE changed a column's type, data inserted into the newly added column before the original ALTER TABLE committed will have a type different from what the catalog says after the original ALTER TABLE commits. Or at least so I think - I haven't yet tested that theory...

I was able to confirm that this is an actual bug in 8.5. I did, however, need to use an array-of-composite type. With only nested composite types it seems that CheckAttributeType() protects against the race, because it follows the dependency chain and opens each type's relation in AccessShareLock mode. This blocks once the traversal hits the type which is being altered, hence forcing the table creation to wait for the concurrent ALTER TABLE to complete.

Create two types in session 1:
session 1: create table t1 (t1_i int);
session 1: create type t2 as (t2_t1 t1);

Warm the type cache in session 2 (a simple select array[row(row(-1))::t2] would probably suffice):
session 2: create table bug (bug_t2s t2[]);
session 2: insert into bug (bug_t2s) values (array[row(row(-1))::t2]);
session 2: select bug.bug_t2s[1].t2_t1.t1_i from bug;
[select correctly returns one row containing -1]
session 2: drop table bug;

Alter the type of t1_i in session 1:
session 1: alter table t1 alter column t1_i type varchar;
[Pause session 1 using gdb *right* after the call to find_composite_type_dependencies in ATRewriteTable returned]

Create the bug table in session 2, and insert a record:
session 2: create table bug (bug_t2s t2[]);
session 2: insert into bug (bug_t2s) values (array[row(row(-1))::t2]);
session 2: select bug.bug_t2s[1].t2_t1.t1_i from bug;
[select correctly returns one row containing -1]

Complete the ALTER TABLE in session 1:
[Resume session 1 using gdb]
session 1: select bug.bug_t2s[1].t2_t1.t1_i from bug;
[select returns a bogus string. On my 8.5 debug+cassert build, it's a long chain of \x7F\x7F\x7F\x...]

I don't have any good idea how to fix this yet. If CheckAttributeType() really *does* offer sufficient protection in the non-array case, extending that to the general case might work. But OTOH it might equally well be that a more sophisticated race exists even in the non-array case, and I simply didn't manage to trigger it...

best regards, Florian Pflug
Re: [HACKERS] File IO - why does PG do things in pages?
Dan Eloff wrote: At the lower levels in PG, reading from the disk into cache, and writing from the cache to the disk is always done in pages. Why does PG work this way? Is it any slower to write whole pages rather than just the region of the page that changed? Conversely, is it faster? From what I think I know of operating systems, reading should bring the whole page into the os buffers anyway, so reading the whole page instead of just part of it isn't much more expensive. Perhaps writing works similarly?

First, data fetched from the disk is (except for data in temporary tables, I believe) not stored in private memory of the backend process doing the read(), but instead in a shared memory segment accessible by all backend processes. This allows two different backend processes to modify the data concurrently without them stepping on each other's toes. Note that immediately writing back any changes is *not* an option, since WAL logging mandates that all changes go to the WAL *first*. Hence, if you were to write out each changed tuple immediately, you'd have to first write the changes to the WAL *and* fsync the WAL to guarantee they hit the disk first.

Sharing the data between backend processes requires a fair amount of infrastructure. You need a way to locate a given chunk of on-disk data in the shared memory buffer cache, and be able to acquire and release locks on those buffers to prevent two backends from wreaking havoc when they try to update the same piece of information. Organizing data in fixed-size chunks (which is what pages are) helps keep the complexity of that infrastructure manageable, and the overhead reasonably low. There are also things like tracking the free space in a data file, which also gets easier if you only have to track it page-wise (is there free space on this page or not), instead of having to track arbitrary ranges of free space.

Finally, since data writes happen in units of blocks (and not bytes), you need to guarantee that you do your IO in some multiple of that unit anyway; otherwise you'd have a very hard time guaranteeing data consistency after a crash. Google for torn page writes; that should give you more details about this problem.

Note, however, that a postgres page (usually 8K) is usually larger than the filesystem's blocksize (usually 512b). So always reading in full pages induces *some* IO overhead. Just not that much - especially since the blocks comprising a page are extremely likely to be arranged consecutively on disk, so there is no extra seeking involved.

These, at least, are what I believe to be the main reasons for doing things in units of pages - hope this helps at least somewhat.

best regards, Florian Pflug
Re: [HACKERS] Application name patch - v4
Tom Lane wrote: : One possibility would be to make it possible to issue SETs that behave : as if set in a startup packet - imho it's an implementation detail that : SET currently is used. I think there's a good deal of merit in this, and it wouldn't be hard at all to implement, seeing that we already have SET LOCAL and SET SESSION. We could add a third keyword, say SET DEFAULT, that would have the behavior of setting the value in a fashion that would persist across resets. I'm not sure that DEFAULT is exactly le mot juste here, but agreeing on a keyword would probably be the hardest part of making it happen. Hm, but without a way to prevent the users of a connection pool from issuing SET DEFAULT, that leaves a connection pool with no way to revert a connection to a known state. How about SET CONNECTION, with an additional GUC called connection_setup which can only be set to true, never back to false. Once connection_setup is set to true, further SET CONNECTION attempts would fail. In a way, this mimics startup-packet SETs without actually doing things in the startup packet. best regards, Florian Pflug
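To illustrate the intended usage (purely hypothetical syntax - neither SET CONNECTION nor a connection_setup GUC exist, this is just a sketch of how a pooler might use the proposal):

-- pooler, right after opening the physical connection:
SET CONNECTION application_name = 'mypool';
SET connection_setup = true;        -- from here on, further SET CONNECTION is rejected
-- a pool client may still issue plain SETs ...
SET application_name = 'client-42';
-- ... which the pooler can later undo with a reset, without losing
-- the values established via SET CONNECTION.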
[HACKERS] Compiling HEAD with -Werror in 64-bit mode
Hi HEAD fails to compile in 64-bit mode on Mac OS X 10.6 with gcc 4.2 and -Werror. What happens is that INT64_FORMAT gets defined as %ld (which is correct - long and unsigned long are 64 bits wide on x86_64), but the check for a working 64-bit int fails, causing INT64_IS_BUSTED to get defined and int64 actually becoming a 32-bit type. This in turn causes warnings when these pseudo int64s are passed to printf with format specifier INT64_FORMAT, which get turned into errors by -Werror. configure fails to recognize long as a working 64-bit type because the does_int64_work configure test produces warnings due to a missing return value declaration for main() and a missing prototype for does_int64_work(). (Again, those warnings are turned into errors by -Werror.) I use the following envvar settings (when running ./configure) to force 64-bit mode and -Werror:

CC=gcc-4.2
CFLAGS=-arch x86_64 -Werror
LDFLAGS=-arch x86_64

The following patch fixed the problem for me - though I didn't yet try it on any other platform than Mac OS X 10.6 with gcc 4.2 and in 64-bit mode.

--
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 9ac2c30..c6bd523 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -35,7 +35,7 @@ AC_CACHE_CHECK([whether $1 is 64 bits], [Ac_cachevar],
 ac_int64 a = 2001;
 ac_int64 b = 4005;

-int does_int64_work()
+static int does_int64_work()
 {
   ac_int64 c,d;

@@ -49,8 +49,8 @@ does_int64_work()
     return 0;
   return 1;
 }
-main() {
-  exit(! does_int64_work());
+int main() {
+  return(! does_int64_work());
 }],
 [Ac_cachevar=yes],
 [Ac_cachevar=no],
--

best regards, Florian Pflug
Re: [HACKERS] Compiling HEAD with -Werror in 64-bit mode
On 15.12.09 16:02 , Tom Lane wrote: Florian G. Pflugf...@phlo.org writes: configure fails to recognize long as a working 64-bit type because the does_int64_work configure test produces warnings due to a missing return value declaration for main() and a missing prototype for does_int64_work(). (Again, those warnings are turned into errors by -Werror). autoconf's test programs tend to be sufficiently sloppy that I would expect -Werror to break a whole lot of things, not just this. We can possibly neaten up the particular test case but there are many tests whose expansion we don't have much control over. Yeah, I expected all hell to break loose - only to be pleasantly surprised by this being the only issue I encountered. So I figured fixing this might be worthwhile - even if this surely does not fix -Werror builds on all platforms and/or compilers. Alternatively - is there a way to use -Werror only for building the actual sources, not the configure tests? I didn't find one, but my autoconf-fu is pretty limited... best regards, Florian Pflug
Re: [HACKERS] Range types
On 15.12.09 15:52 , Tom Lane wrote: to...@tuxteam.de writes: (and as Andrew Dunstan pointed out off-list: I was wrong with my bold assertion that one can squeeze infinitely many (arbitrary length) strings between two given. This is not always the case). Really? If the string length is unbounded I think you were right. One example is a and aa (assuming a is the minimal character in your alphabet). The general case is the strings A and Aaa...a I think - it doesn't get any more exciting than this. This *is* a bit surprising, since one usually assumes that the ordering of strings and reals is fairly similar, since both are lexical. But note that the mapping of strings into the reals this intuition is based on (simply prefix the string with 0. and interpret it as a real, or something similar if the alphabet isn't {0,1}) isn't one-to-one - the strings 1, 10, 100, ... are all mapped to the *same* real number 0.1. So for reals, the statement is reduced to the trivial fact that for every x there is no y with x < y < x. Which is of course true.. best regards, Florian Pflug
Re: [HACKERS] Compiling HEAD with -Werror in 64-bit mode
On 15.12.09 23:38 , Tom Lane wrote: Peter Eisentrautpete...@gmx.net writes: So to summarize, this is just a bad idea. Creating a less obscure way to use -Werror might be worthwhile, though. I suppose we could add --with-Werror but it seems pretty specialized to me. A more appropriate solution would allow the user to provide flags that get added to CFLAGS only after we do all the configure tests (implying that it's on the user's head that these flags are right and don't break anything, but then again that's pretty much true of up-front CFLAGS too). And that basically describes COPTS ... the only thing lacking is documentation. For what it's worth, I agree. Though we might want to arrange for configure to store the value of COPT somewhere, so that COPT=-Werror ./configure followed by make works - which it currently doesn't seem to. best regards, Florian Pflug
[HACKERS] [WIP] Inspection of row types in pl/pgsql and pl/sql
Hi I've completed a (first) working version of an extension that allows easier introspection of composite types from SQL and PL/pgSQL. The original proposal and ensuing discussion can be found here: http://archives.postgresql.org/pgsql-hackers/2009-11/msg00695.php The extension can be found at: http://github.com/fgp/pg_record_inspect This is what the extension currently provides (all in schema record_inspect).

* fieldinfo [composite type]
Used by fieldinfos() to describe a record's fields. Contains the fields fieldname (name), fieldtype (regclass), fieldtypemod (varchar).

* fieldinfo[] fieldinfos(record)
Returns an array of fieldinfos describing the record's fields.

* anyelement fieldvalue(record, field name, defval anyelement, coerce boolean)
Returns the value of the field field, or defval should the value be null. If coerce is true, the value is coerced to defval's type if possible; otherwise an error is raised if the field's type and defval's type differ.

* anyelement fieldvalues(record, defval anyelement, coerce boolean)
Returns an array containing the values of the record's fields. NULL values are replaced by defval. If coerce is false, only the fields with the same type as defval are considered. Otherwise, the field values are coerced if possible, or an error is raised if not.

The most hacky part of the code is probably coerceDatum() - needed to coerce a field's value to the requested output type. I wanted to avoid creating and parsing an actual SQL statement for every cast, and instead chose to use coerce_to_target_type() to create the expression trees representing casts. I use the node type CoerceToDomainValue to inject the source value into the cast plan upon execution - see makeCastPlan() and execCastPlan() for details. If anyone has a better idea, please speak up. I personally would like to see this becoming a contrib module one day, but that of course depends on how much interest there is in such a feature. best regards, Florian Pflug
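For illustration, calling the functions might look roughly like this (a sketch only - I'm guessing at the exact call pattern, and f1/f2 are simply the default field names PostgreSQL assigns to an anonymous row):

select record_inspect.fieldinfos(row(42, 'foo'::text));
-- returns one fieldinfo per field, describing f1 (an integer) and f2 (a text)

select record_inspect.fieldvalue(row(42, 'foo'::text), 'f1', NULL::int4, true);
-- returns 42; with coerce = true the value would be cast to defval's type if necessary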
Re: [HACKERS] Serializable implementation
On 28.12.09 18:54 , Kevin Grittner wrote: To give some idea of the scope of development, Michael Cahill added SSI to InnoDB by modifying 250 lines of code and adding 450 lines of code; however, InnoDB already had the S2PL option and the prototype implementation isn't as sophisticated as I feel is necessary for real production use (particularly regarding the granularity of SIREAD locks). I'm assuming it would take more to reach real production quality in PostgreSQL. My SWAG would be to multiply by two or three. I believe the hard part of implementing true serializability is not the actual SSI or S2PL algorithm, but rather the necessary predicate locking strategy. So I think checking how InnoDB tackles that, and how much of its code is involved, might give a more realistic estimate of the effort required. best regards, Florian Pflug
Re: [HACKERS] GSoC - proposal - Materialized Views in PostgreSQL
On 11.04.10 20:47 , Robert Haas wrote: On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Robert Haas wrote: 2010/4/10 Andrew Dunstanand...@dunslane.net: Heikki Linnakangas wrote: 1. Keep the materialized view up-to-date when the base tables change. This can be further divided into many steps, you can begin by supporting automatic updates only on very simple views with e.g. a single table and a where clause. Then extend that to support joins, aggregates, subqueries etc. Keeping it really limited, you could even require the user to write the required triggers himself. That last bit doesn't strike me as much of an advance. Isn't the whole point of this to automate it? Creating greedy materialized views is usually not terribly difficult now, but you do have to write the triggers. Yeah, I agree. It doesn't accomplish anything interesting on its own. But if you do the planner changes to automatically use the materialized view to satisfy queries (item 2. in my previous email), it's useful. But you can't do that with a snapshot view, only a continuously updated one. If continuous updates prove to be too hard initially, you could instead update the view on select if it's outdated. Such a materialized view would be a kind of inter-session cache for subselects. The hard part would probably be to figure out how to decide whether the view is outdated or not, and to deal with two concurrent transactions trying to use an outdated view (and both trying to refresh it). What makes the second problem hard is that you wouldn't want one of the transactions to wait for the other to complete, because this is not how SELECTs traditionally behave. best regards, Florian Pflug
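For context, the kind of hand-written, trigger-maintained materialized view being referred to might look roughly like this (a toy sketch with made-up table names, not something any patch would generate):

create table item (id int primary key, price numeric not null);
create table item_stats (n bigint, total numeric);   -- the hand-rolled materialized view
insert into item_stats values (0, 0);

create function item_stats_maintain() returns trigger as $$
begin
  if tg_op = 'INSERT' then
    update item_stats set n = n + 1, total = total + new.price;
  elsif tg_op = 'DELETE' then
    update item_stats set n = n - 1, total = total - old.price;
  else  -- UPDATE
    update item_stats set total = total - old.price + new.price;
  end if;
  return null;   -- AFTER trigger, return value is ignored
end;
$$ language plpgsql;

create trigger item_stats_trg after insert or update or delete on item
  for each row execute procedure item_stats_maintain();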
Re: [HACKERS] Reduction in WAL for UPDATEs
Kenneth Marshall wrote: We use DSPAM as one of our anti-spam options. Its UPDATE pattern is to increment a spam counter or a not-spam counter while keeping the user and token information the same. This would benefit from this optimization. Currently we are forced to use MySQL with MyISAM tables to support the update load, although PostgreSQL 8.2 performance is right at the I/O break-even point for switching databases. With HOT and more optimized UPDATE I/O, 8.3 would give us enough I/O headroom to switch to PostgreSQL. Interesting. I've switched from MySQL to PostgreSQL for dspam, because of concurrency issues with MyISAM which caused bad performance. I am eager to see how much HOT speeds up my setup, though ;-) BTW, the COMMIT NOWAIT feature Simon Riggs proposed should provide a huge speedup too, since dspam runs one transaction for each token it has to update. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] CREATE INDEX and HOT - revised design
Simon Riggs wrote: On Wed, 2007-03-28 at 22:24 +0530, Pavan Deolasee wrote: Just when I thought we had nailed down CREATE INDEX, I realized that there is something more to worry about. The problem is with the HOT-chains created by our own transaction which is creating the index. We thought it will be enough to index the tuple at the head-of-the-chain since that would be the visible copy once the transaction commits. We thought of keeping the index unavailable for queries in pre-existing transactions by setting a new xid attribute in pg_index. The question is what value to assign to xid. I thought we would assign ReadNewTransactionId(). If you are indexing a table that hasn't just been created by you, set the xcreate field on pg_index at the *end* of the build using ReadNewTransactionId(). Any xid less than that sees the index as invalid. If you created the table in this transaction (i.e. createSubId != 0) then set xcreate to the creating xid. Couldn't you store the creating transaction's xid in pg_index, and let other transactions check that against their snapshot like they would for any tuple's xmin or xmax? (With one exception - the creating transaction would consider indices it built itself invalid, which is not how things usually work for xmin/xmax). This would mean that any transaction that believes that the creating transaction has committed also considers the index to be valid. greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Couldn't you store the creating transaction's xid in pg_index, and let other transactions check that against their snapshot like they would for any tuple's xmin or xmax? What snapshot? I keep having to remind people that system catalog operations are SnapshotNow. In the particular context here, the place where doing something else would break down is that the planner has no idea when it makes a plan what snapshot(s) the plan might later be used with. Sorry - now that you say it, I remember that you've said that already multiple times... So the question is, why did this work until now, and why doesn't CREATE INDEX+HOT fit into this scheme? I think the answer is that all other DDL statements manage to assure that any database objects they create or modify are usable for everybody else immediately after they are committed. This usually implies pretty strong locking requirements - for example, I think that the core reason why TRUNCATE needs an exclusive lock is precisely the guarantee it has to make. Maybe this could somehow be relaxed? Could, for example, the planner be allowed to base some of its decisions on the SerializableSnapshot that every transaction (even a read-only one) possesses? It seems that this would prevent plans from living longer than a transaction, but maybe plan invalidation could help here? greetings, Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: On 3/28/07, Tom Lane [EMAIL PROTECTED] wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Couldn't you store the creating transaction's xid in pg_index, and let other transactions check that against their snapshot like they would for any tuple's xmin or xmax? What snapshot? I keep having to remind people that system catalog operations are SnapshotNow. In the particular context here, the place where doing something else would break down is that the planner has no idea when it makes a plan what snapshot(s) the plan might later be used with. Tom, please correct me if I am wrong. But ISTM that this idea might work in this context. In get_relation_info(), we would check if the xcreate xid stored in pg_index for the index under consideration is seen as committed with respect to the snapshot at that point in time. Even if the snapshot changes later and the index becomes valid, we might not replan and hence not use the index. But that doesn't seem like a big problem to me. The problem is use cases like: PREPARE my_plan ; BEGIN; EXECUTE my_plan ; COMMIT ; Is that PREPARE even run inside a transaction? Even if it is, it probably won't have created a snapshot... I think allowing the use of some sort of snapshot from inside the planner would allow some locking to be relaxed, but there seem to be a lot of corner cases to consider :-( OTOH, if you manage to make this work, a TRUNCATE that doesn't block concurrent selects might become possible. This would for example allow dropping and rebuilding subscriptions on a slony node while it is in use. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: In this specific context, this particular case is easy to handle because we are only concerned about the serializable transactions started before CREATE INDEX commits. If PREPARE can see the new index, it implies that the CI transaction is committed. So a transaction starting after that can only see the tuple version that we have indexed. Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Imagine that it wasn't explicitly PREPARED (where you might say this is acceptable), but rather just a query inside a plpgsql function, maybe even called from some app using connection pooling. This means that the non-index-using plan might get used for quite a long time, which contradicts the work Tom did on plan invalidation, I think. Maybe Tom can comment on whether it's possible to use plan invalidation to eventually get rid of a stale plan in this context? greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: On 3/29/07, Florian G. Pflug [EMAIL PROTECTED] wrote: Yes, but the non-index plan PREPARE generated will be used until the end of the session, not only until the end of the transaction. Frankly I don't know how this works, but are you sure that the plan will be used until the end of the session ? Even if that's the case, it can happen even today if we create a new index, but the existing sessions will use the stale plan (assuming what you said is true) I've checked that:

test=# prepare myplan as select * from test where id=1 ;
PREPARE
test=# explain execute myplan ;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

Now I create an index in another session.

test=# explain select * from test where id=1 ;
                             QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=95.11..8248.45 rows=5000 width=36)
   Recheck Cond: (id = 1)
   ->  Bitmap Index Scan on idx  (cost=0.00..93.86 rows=5000 width=0)
         Index Cond: (id = 1)
(4 rows)

test=# explain execute myplan ;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on test  (cost=0.00..22897.70 rows=5421 width=36)
   Filter: (id = 1)
(2 rows)

!!! Index got used by the select .. but not by execute myplan ... !!!

test=# prepare myplan2 as select * from test where id=1 ;
PREPARE
test=# explain execute myplan2 ;
                           QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx on test  (cost=0.00..8.38 rows=1 width=37)
   Index Cond: (id = 1)
(2 rows)

!!! A newly prepared plan of course uses the index !!!

So yes, plans get cached until the end of the session, and yes, 8.2 won't notice index creation either ;-) The open question is how CVS HEAD with plan invalidation behaves. If it replans after the index-creating transaction commits, then basing index validity on a snapshot will break this, because upon replanning the index might not yet be usable, but later on it may very well be (and the plan invalidation machinery won't realize that). So this might not introduce a regression compared to 8.2, but to a future 8.3 with plan invalidation... Sorry for being so unclear in my previous emails - I had confused myself ;-) greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. I believe the original idea was to invent some kind of "on commit, run this transaction" hook - similar to how files are deleted on commit, I think. At least that's how I understood the "run another transaction on commit" suggestion... greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Simon Riggs wrote: On Thu, 2007-03-29 at 17:27 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ISTM that the run-another-transaction-afterwards idea is the only one that does everything I think we need. I really do wish we could put in a wait, like CIC, but I just think it will break existing programs. Actually, there's a showstopper objection to that: plain CREATE INDEX has to be able to run within a larger transaction. (To do otherwise breaks pg_dump --single-transaction, just for starters.) This means it can *not* commit partway through. The idea is to make note that the transaction has created an index within a transaction block, so that after the top level transaction commits we sneak in an extra hidden transaction to update the pg_index tuple with the xcreate of the first transaction. The only other alternative is to forcibly throw a relcache inval event in the same circumstances without running the additional transaction, but the solution is mostly the same. I think one alternative might be to store a list of xid's together with a cached plan, and replan if the commit status (as perceived by the transaction the plan will be executed in) of one of those xid's changes. greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: How about storing the snapshot which we used during planning in CachedPlanSource, if at least one index was seen unusable because its CREATE INDEX transaction was seen as in-progress ? I'm getting tired of repeating this, but: the planner doesn't use a snapshot. System catalogs run on SnapshotNow. But it would still do that - it would just compare the createxid of the index against some snapshot, and the query would be replanned if the cached result of this comparison differs from the one the current snapshot yields. It might well be that this won't work, because the planner is invoked in situations where there is no active snapshot - I'm not sure if your comment refers to that case, or not. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: What I am suggesting is to use ActiveSnapshot (actually Florian's idea) to decide whether the transaction that created the index was still running when we started. Isn't it the case that some snapshot will be active when we plan ? I do not think you can assume that the plan won't be used later with some older snapshot. Consider recursive plpgsql functions for a counterexample: the inner occurrence might be the first to arrive at a given line of the function, hence the first to plan it, yet when we return to the outer instance we might revert to an older snapshot. So maybe we'd need to use the SerializableSnapshot created at the start of each transaction for this check, and not the ActiveSnapshot? Could that work? What about doing PREPARE myplan AS select ... ; outside of a transaction? Will this be executed inside a transaction? Is a query always planned upon its first execution, and not when PREPARE is issued? greetings, Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Tom Lane wrote: I do not think you can assume that the plan won't be used later with some older snapshot. So maybe we'd need to use the SerializableSnapshot created at the start of each transaction for this check, and not the ActiveSnapshot? Could that work? That might work, but it doesn't seem to address the core objection: there's no mechanism to cause the query to be replanned once the snapshot is new enough, because no relcache inval will happen. So most likely existing backends will keep using old plans that don't consider the index. Pavan suggested storing the IndexSnapshot in the cached plan, and comparing it to the IndexSnapshot when the query is executed. If those two snapshots differ, the query would be replanned. My idea was to store together with the cached plan a list of xid's that are assumed to be uncommitted according to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xid's is committed. Those two ideas seem to be mostly equivalent; mine seems to be a bit more fine-grained, but at the cost of more work upon each query execution. greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote: My idea was to store a list of xid's together with the cached plan that are assumed to be uncommitted according to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xid's is committed. Actually, if we are using the Serializable Snapshot then there is no chance to replan the query before the transaction completes, and the next transaction to start in the session must see the index and hence we must replan. So it would be enough just to associate a transaction id with the cached plan. If this xid is set and our transaction id is different than that, we replan. I believe this is true for the CREATE INDEX scenario. However, comparing either the snapshot or the result of xid checks seems like it might be useful for other things besides CREATE INDEX. I'm specifically thinking about TRUNCATE here - the CREATE INDEX + HOT problems sound quite similar to the problems a non-exclusive-locking TRUNCATE would face. greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Minor changes to Recovery related code
Simon Riggs wrote: On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 2. pg_stop_backup() should wait until all archive files are safely archived before returning Not sure I agree with that one. If it fails, you can't tell whether the action is done and it failed while waiting for the archiver, or if you need to redo it. There's a slight delay between pg_stop_backup() completing and the archiver doing its stuff. Currently if somebody does a -m fast straight after the pg_stop_backup() the backup may be unusable. We need a way to plug that small hole. I suggest that pg_stop_backup() polls once per second until pg_xlog/archive_status/LOG.ready disappears, in which case it ends successfully. If it does this for more than 60 seconds it ends successfully but produces a WARNING. I fear that ending successfully despite having not archived all WALs will make this feature less worthwhile. If a DBA knows what he is doing, he can code a perfectly safe backup script using 8.2 too. He'll just have to check the current WAL position after pg_stop_backup() (there is a function for that, right?), and wait until the corresponding WAL has been archived. In reality, however, I fear that most people will just create a script that does 'echo select pg_stop_backup | psql' or something similar. If they're a bit more careful, they will enable ON_ERROR_STOP, and check the return value of psql. I believe that those are the people who would really benefit from a pg_stop_backup() that waits for archiving to complete. But they probably won't check for WARNINGs. Maybe doing it the other way round would be an option? pg_stop_backup() could wait for the archiver to complete forever, but spit out a warning every 60 seconds or so: WARNING: Still waiting for WAL archiving of wal ??? to complete. If someone really wants a 60-second timeout, he can just use statement_timeout. Anyway, just my 0.02 eurocents, maybe I'm totally mistaken about the postgresql DBAs out there... greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
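A rough sketch of the check such a careful script has to do today (this assumes a release that already has pg_xlogfile_name(); the call pattern is from memory, and the polling of the archive itself is left out):

-- pg_stop_backup() returns the WAL position up to which WAL must be archived;
-- translate it into the corresponding WAL segment file name:
select pg_xlogfile_name(pg_stop_backup()) as last_wal_needed;
-- the backup script then has to poll the archive (or pg_xlog/archive_status/)
-- until that segment - or a later one - has been archived successfully.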
[HACKERS] Fate of pgsnmpd
Hi Does anyone know if pgsnmpd is still actively developed? The last version (0.1b1) is about 15 months old. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] select ('{}'::text[])[1] returns NULL -- is it correct?
Nikolay Samokhvalov wrote: On 4/10/07, Tom Lane [EMAIL PROTECTED] wrote: Nikolay Samokhvalov [EMAIL PROTECTED] writes: I remember several cases when people (e.g. me :-) ) were spending some time trying to find an error in some pl/pgsql function and the reason lied in incorrect work with arrays (i.e. messages like index is out of bounds and index cannot be negative number would help, surely). Well, if indexes *couldn't* be negative numbers then that might be helpful, but they can. Ooops :-) OK, my proposal is narrowing to very simple one: what about triggering WARNINGs when user tries to access nonexistent element of array? Please don't ;-) There are two sane options - return an error, or return NULL. Both are sensible, and different programming languages make different choices. The only reason for a WARNING would be a long-term plan to change the existing behaviour. But this will cause lots of pain, for no real gain, because no matter which behaviour you pick, there are always situations where the other would be more convenient. Just look at the mess PHP has created by altering fundamental aspects of the language (4.4 - 5.0). greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
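For reference, this is the behaviour under discussion - out-of-range subscripts silently yield NULL rather than raising an error (a quick sanity check on any reasonably recent server):

select ('{}'::text[])[1] is null;   -- true: the empty array has no element 1
select ('{a,b}'::text[])[5];        -- NULL, no warning or error
select ('{a,b}'::text[])[-1];       -- also NULL: negative subscripts are legal, as Tom notes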
Re: [HACKERS] [DOCS] uuid type not documented
Neil Conway wrote: On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a good random source, for instance. http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php ISTM random() or similar sources is a sufficient PRNG for the purposes of UUID generation -- I can't see anything in the RFC that would contradict that. Maybe a short-term solution could be a UUID-generating function that takes some kind of seed as a parameter. People not concerned about collisions could just pass some random value, while others could use the MAC address of the client or something similar. greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] What tools do people use to hack on PostgreSQL?
Hi I'm very excited that my project for implementing read-only queries on PITR slaves was accepted for GSoC, and I'm now trying to work out what tools I'll use for that job. I'd like to be able to create some sort of branches and tags for my own work (only inside my local repository of course). I've considered using git, but I couldn't make the cvs-git gateway work - neither using the postgresql CVS repository directly, nor with a private copy obtained with CVSup. There is also svk, but I think I'd need an svn repo that mirrors the postgresql CVS for that to work. I think Joshua Drake created one once, but I don't know if it is kept up-to-date. What do you guys use for your development work? greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] What tools do people use to hack on PostgreSQL?
Joshua D. Drake wrote: Alexey Klyukin wrote: Alvaro Herrera wrote: But if you have a checked out tree, does it work to do an update after the tree has been regenerated? As far as I know, the repo is generated completely every few hours, so it wouldn't surprise me that the checked out copy is not compatible with the new repo. I admit I haven't tried. I have tried and svn up worked without issues. As a note we will be updating this to subversion 1.4 shortly so people can do svnsync too. Do I read this correctly as "This repository will stay around for a while, and isn't just an experiment that might be stopped tomorrow"? If so, I'll try using it - and lots of thanks for providing that. greetings, Florian Pflug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Build-Problem with pgc.c on OSX 10.4
Hi When I try to build CVS HEAD on OSX 10.4, compiling src/interfaces/ecpg/preproc/preproc.c fails with:

In file included from preproc.y:6951:
pgc.l:3:20: error: config.h: No such file or directory
In file included from pgc.l:28, from preproc.y:6951:
preproc.h:996: error: conflicting types for 'base_yylloc'
y.tab.c:18673: error: previous declaration of 'base_yylloc' was here
In file included from preproc.y:6951:
pgc.l:43: error: 'MAX_PARSE_BUFFER' undeclared here (not in a function)

If I delete pgc.c, it is rebuilt automatically, and then preproc.c compiles just fine. I'm using gcc 4.0.1, flex 2.5.4 and bison 2.3 greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Build-Problem with pgc.c on OSX 10.4
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: When I try to build CVS HEAD on OSX 10.4, compiling src/interfaces/ecpg/preproc/preproc.c fails with: ... If I delete pgc.c, it is rebuilt automatically, and then preproc.c compiles just fine. ... I'm using gcc 4.0.1, flex 2.5.4 and bison 2.3 Perhaps you changed bison versions and didn't force a rebuild? Those line numbers don't seem to sync up with my copies of the derived files. I just realized that this file isn't even in the postgresql CVS repo. But it _is_ part of the SVN mirror at https://projects.commandprompt.com/public/pgsql/repo. The version that shows up in the trunk of the SVN repo is the revision 1.5 from CVS (http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c?rev=1.5;content-type=text%2Fplain;hideattic=0) This is the same as https://projects.commandprompt.com/public/pgsql/repo/trunk/pgsql/src/interfaces/ecpg/preproc/pgc.c modulo the expansion of the $Header macro. Seems to be a bug in the CVS-SVN conversion process... greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Build-Problem with pgc.c on OSX 10.4
Alvaro Herrera wrote: Ah, it seems the SVN repo just got its first user ;-) Congratulations. Ask Joshua to send you a Command Prompt tee shirt, maybe he is excited enough. I hope the fact that I use the SVN repo just to get the changes into git doesn't reduce my chances of getting that t-shirt ;-) greetings, Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Build-Problem with pgc.c on OSX 10.4
Alvaro Herrera wrote: Florian G. Pflug wrote: Alvaro Herrera wrote: Ah, it seems the SVN repo just got its first user ;-) Congratulations. Ask Joshua to send you a Command Prompt tee shirt, maybe he is excited enough. I hope the fact that I use the SVN repo just to get the changes into git doesn't reduce my chances of getting that t-shirt ;-) Hum, why don't you just use the CVS directly then? That'd avoid this sort of infelicities. git-cvsimport didn't work for me - neither with the main CVS repo, nor with an rsync'ed copy. It complained about all sorts of problems - I don't have enough CVS knowhow to judge if those were actual problems with the repo, or just deficiencies of git-cvsimport. Plus I didn't find a way to import the current version of HEAD as one revision, and only go incrementally from there. It always wanted to mirror the whole history stored in CVS in my git repo, which is overkill. For SVN, there is git-svn, which does just what I want - I started with some revision a few days ago, and it just incrementally imports updates from there into a special branch of my git repo, and doesn't care about what happened before that revision. It's all not perfect, but I think for me it works better than just doing my changes in a CVS checkout. greetings, Florian Pflug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] What tools do people use to hack on PostgreSQL?
Joshua D. Drake wrote: http://projects.commandprompt.com/public/pgsql/browser or do the anonymous checkout with: svn co http://projects.commandprompt.com/public/pgsql/repo/ But if you have a checked out tree, does it work to do an update after the tree has been regenerated? As far as I know, the repo is generated completely every few hours, so it wouldn't surprise me that the checked out copy is not compatible with the new repo. I admit I haven't tried. Not a clue. Anyone try it yet? git-svn seems to work fine against the SVN repo, apart from the problem with the files deleted in CVS which still show up in SVN. It's only been running for about two days though... greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Hacking on PostgreSQL via GIT
Martin Langhoff wrote: Hi Florian, I am right now running an rsync of the Pg CVS repo to my work machine to get a git import underway. I'm rather keen on seeing your cool PITR Pg project go well and I have some git+cvs fu I can apply here (being one of the git-cvsimport maintainers) ;-) Cool - I'm new to git, so I really appreciate any help that I can get. For the kind of work you'll be doing (writing patches that you'll want to be rebasing onto the latest HEAD for merging later) git is probably the best tool. That's what I use it for... tracking my experimental / custom branches of projects that use CVS or SVN :-) That's how I figured I'd work - though I don't yet understand what the advantage of rebase is over merge. Currently, I've set up a git repo that pulls in the changes from the SVN repo and pushes them to my main soc git repo. On that main repo I have two branches, master and pgsql-head, and I call cg-merge pgsql-head if I want to merge with CVS HEAD. Initially, I'll post it on http://git.catalyst.net.nz/ and I can run a daily import for you - once that's in place you can probably get a repo with your work on http://repo.or.cz/ Having a git mirror of the pgsql CVS would be great. BTW, I've just checked out repo.or.cz, and noticed that there is already a git mirror of the pgsql CVS: http://repo.or.cz/w/PostgreSQL.git greetings + thanks Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Hacking on PostgreSQL via GIT
Aidan Van Dyk wrote: Martin Langhoff wrote: Well, now that more than one of us are working with git on PostgreSQL... I've had a repo conversion running for a while... I've only got it to what I consider stable last week: http://repo.or.cz/w/PostgreSQL.git git://repo.or.cz/PostgreSQL.git Ah - that's what I just stumbled over ;-) For those interested in the conversion process, I've used a slightly modified version of fromcvs (a ruby cvs to git/Hg tool), and it runs on all of pgsql in about 20 minutes. I gave up on git-svn (because of both speed and my inability to easily filter out Keywords, etc) and git-cvsimport (because cvsps doesn't seem to like pgsql's repo) Yeah, git-cvsimport didn't work for me either... I update the git repo daily, based on an anonymous rsync of the cvsroot. If the anon-rsync is updated much more frequently, and people think my git conversion should match it, I have no problem having cron run it more than daily. Also - note that I give *no* guarantees of its integrity, etc. I've diffed a CVS checkout and a git checkout, and they are *almost* identical. Almost, because it seems like my git repository currently has 3 files that a cvs checkout doesn't:

backend/parser/gram.c             |12088 +++
interfaces/ecpg/preproc/pgc.c     | 2887 ++
interfaces/ecpg/preproc/preproc.c |16988 ++

And at this point, I haven't been bothered to see where those files came from (and where they disappear) in CVS and why my import isn't picking that up... I could probably be pushed if others find this repo really useful, but those files problematic... That's interesting - the SVN mirror of the pgsql CVS at http://projects.commandprompt.com/public/pgsql/browser has exactly the same problem with those 3 files, as I found out the hard way ;-) In the case of pgc.c, I've compared the revisions in CVS with the one in SVN. SVN includes the CVS version 1.5 of this file in trunk, which seems to be the last version of that file in CVS HEAD. Interestingly, http://developer.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/ecpg/preproc/Attic/pgc.c shows no trace of the file being deleted from HEAD either - it just shows that it was removed from WIN32_DEV. But still a CVS checkout doesn't include that file... Since 3 tools (cvsweb, git-cvsimport and whatever commandprompt uses to create the SVN mirror) all come to the same conclusion regarding this file, I think that this is caused by some corruption of the CVS repository - but I don't have the cvs-fu to debug this... greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Re: IDENTITY/GENERATED v36 Re: [PATCHES] Final version of IDENTITY/GENERATED patch
Zoltan Boszormenyi wrote: Tom Lane írta: Zoltan Boszormenyi [EMAIL PROTECTED] writes: Also, the current grammar is made to give a syntax error if you say colname type GENERATED BY DEFAULT AS ( expr ). But it makes the grammar unbalanced, and gives me: bison -y -d gram.y conflicts: 2 shift/reduce It's been quite a while since I last used bison, but as far as I remember, you can tell it (via the -v / --verbose switch, IIRC) to write a rather detailed log about its analysis of the grammar. That log should include more detailed information about those conflicts - maybe that helps to figure out their exact cause, and to find a workaround. greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Hacking on PostgreSQL via GIT
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: These files are generated (from gram.y, pgc.l and preproc.y respectively) and are not present in the CVS repo, though I think they have been at some point. It's strange that other generated files (that have also been in the repo in the past) like preproc.h are not showing up. The weird thing about these files is that the CVS history shows commits on HEAD later than the file removal commit. I don't recall if Vadim unintentionally re-added the files before making those commits ... but if he did, you'd think it'd have taken another explicit removal to get rid of them in HEAD. More likely, there was some problem in his local tree that allowed a cvs commit to think it should update the repository with copies of the derived files he happened to have. I think this is a corner case that CVS handles in a particular way and the tools people are using to read the repository handle in a different way. Which would be a bug in those tools, since CVS's interpretation must be right by definition. The question is if it'd be acceptable to manually remove that last commit from the repository. I guess simply re-adding and then removing the files again should do the trick, though it'd be cleaner to remove the offending commit in the first place. Should postgres ever decide to switch to another version control system (which I don't advocate), that'd be one obstacle less to deal with... Or is the risk of causing breakage too high? greetings, Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] RESET command seems pretty disjointed now
Tom Lane wrote: Mark Kirkwood [EMAIL PROTECTED] writes: Tom Lane wrote: The current documentation for RESET exhibits a certain lack of, um, intellectual cohesiveness:

Synopsis
RESET configuration_parameter
RESET ALL
RESET { PLANS | SESSION | TEMP | TEMPORARY }

Maybe DISCARD for the plans etc might be more intuitive than extending RESET? DISCARD PLANS and DISCARD TEMP seem pretty reasonable, but DISCARD SESSION sounds a bit odd --- it seems like it might mean disconnect, which of course is exactly what we're trying to avoid. But possibly we could rename RESET SESSION as DISCARD ALL. Leastwise I haven't got any better ideas. Anyone have another proposal? What about

RESET parameter
RESET { PLANS | TEMP | TEMPORARY }
RESET ALL { PARAMETERS | STATE }

RESET ALL would become an abbreviation of RESET ALL PARAMETERS (for backwards compatibility), while RESET SESSION would be renamed to RESET ALL STATE. greetings, Florian Pflug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Hacking on PostgreSQL via GIT
Martin Langhoff wrote: Aidan Van Dyk wrote: And remember the warning I gave that my conversion is *not* a direct CVS import - I intentionally *unexpand* all Keywords before stuffing them into GIT so that merging and branching can ignore all the Keyword conflicts... My import is unexpanding those as well to support rebasing and merging better. So - if you are committed to providing your gateway long term to Florian, I'm happy to drop my gateway in favour of yours. There seem to be other people besides me who are interested in a git mirror. Maybe we could declare one of those mirrors the official one - I guess things would be easier if all people interested in using git would use the same mirror... What do you guys think? (Florian, before basing your code on either you should get a checkout of Aidan's and mine and check that the tips of the branches you are working on match the cvs branches -- the cvsimport code is good but wherever CVS is involved, there's a lot of interpretation at play, a sanity check is always good). I actually hoped that I could just take my current git repo, and rebase my branch onto one of those two repos - or does rebase only work from an ancestor to a descendant? greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync
Hi I believe I have discovered the following problem in pgsql 8.2 and HEAD, concerning warm-standbys using WAL log shipping. The problem is that after a crash, the master might complete incomplete actions via rm_cleanup() - but since it won't wal-log those changes, the slave won't know about this. This will at least prevent the creation of any further restart points on the slave (because safe_restartpoint() will never return true again) - it might even cause data corruption, if subsequent wal records are interpreted wrongly by the slave because it sees other data than the master did when it generated them. Attached is a patch that lets RecoveryRestartPoint call all rm_cleanup() methods and create a restart point whenever it encounters a shutdown checkpoint in the wal (because those are generated after recovery). This ought not cause a performance degradation, because shutdown checkpoints will occur very infrequently. The patch is per discussion with Simon Riggs. I've not yet had a chance to test this patch, I only made sure that it compiles. I'm sending this out now because I hope this might make it into 8.2.4. greetings, Florian Pflug

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6c67821..93c86a1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5060,10 +5060,13 @@ #endif
 	 * Perform a checkpoint to update all our recovery activity to disk.
 	 *
 	 * Note that we write a shutdown checkpoint rather than an on-line
-	 * one. This is not particularly critical, but since we may be
-	 * assigning a new TLI, using a shutdown checkpoint allows us to have
-	 * the rule that TLI only changes in shutdown checkpoints, which
-	 * allows some extra error checking in xlog_redo.
+	 * one. A slave will always create a restart point if it sees a
+	 * shutdown checkpoint, and will call all rm_cleanup() methods before
+	 * it does so. This guarantees that any actions taken by the master
+	 * in rm_cleanup will also be carried out on the slave.
+	 * Additionally, we may be assigning a new TLI, so using a shutdown
+	 * checkpoint allows us to have the rule that TLI only changes in shutdown
+	 * checkpoints, which allows some extra error checking in xlog_redo.
 	 */
 	CreateCheckPoint(true, true);

@@ -5672,35 +5675,56 @@ CheckPointGuts(XLogRecPtr checkPointRedo
  * restartpoint is needed or not.
  */
 static void
-RecoveryRestartPoint(const CheckPoint *checkPoint)
+RecoveryRestartPoint(const CheckPoint *checkPoint, const bool shutdownCheckpoint)
 {
 	int		elapsed_secs;
 	int		rmid;

 	/*
-	 * Do nothing if the elapsed time since the last restartpoint is less than
-	 * half of checkpoint_timeout. (We use a value less than
-	 * checkpoint_timeout so that variations in the timing of checkpoints on
-	 * the master, or speed of transmission of WAL segments to a slave, won't
-	 * make the slave skip a restartpoint once it's synced with the master.)
-	 * Checking true elapsed time keeps us from doing restartpoints too often
-	 * while rapidly scanning large amounts of WAL.
+	 * If the checkpoint we saw in the wal was a shutdown checkpoint, it might
+	 * have been written after the recovery following a crash of the master.
+	 * In that case, the master will have completed any actions that were
+	 * incomplete when it crashed *during recovery*, and these completions
+	 * are therefore *not* logged in the wal.
+	 * To prevent getting out of sync, we follow what the master did, and
+	 * call the rm_cleanup() methods. To be on the safe side, we then create
+	 * a RestartPoint, regardless of the time elapsed. Note that asking
+	 * the resource managers if they have partial state would be redundant
+	 * after calling rm_cleanup().
 	 */
-	elapsed_secs = time(NULL) - ControlFile->time;
-	if (elapsed_secs < CheckPointTimeout / 2)
-		return;
+	if (shutdownCheckpoint) {
+		for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
+		{
+			if (RmgrTable[rmid].rm_cleanup != NULL)
+				RmgrTable[rmid].rm_cleanup();
+		}
+	}
+	else {
+		/*
+		 * Do nothing if the elapsed time since the last restartpoint is less than
+		 * half of checkpoint_timeout. (We use a value less than
+		 * checkpoint_timeout so that variations in the timing of checkpoints on
+		 * the master, or speed of transmission of WAL segments to a slave, won't
+		 * make the slave skip a restartpoint once it's synced with the master.)
+		 * Checking true elapsed time keeps us from doing restartpoints too often
+		 * while rapidly scanning large amounts of WAL.
+		 */
+		elapsed_secs = time(NULL) - ControlFile->time;
+		if (elapsed_secs < CheckPointTimeout / 2)
+			return;

-	/*
-	 * Is it safe to checkpoint? We must ask each of the resource managers
-	 * whether they have any partial state information that might prevent a
-	 * correct restart from this point. If so, we skip this opportunity, but
-	 * return at the next checkpoint record for another try.
-	 */
-	for (rmid = 0; rmid <=
Re: [HACKERS] [PATCH] A crash and subsequent recovery of the master can cause the slave to get out-of-sync
Simon Riggs wrote: On Thu, 2007-04-19 at 22:37 +0200, Florian G. Pflug wrote: The problem is that after a crash, the master might complete incomplete actions via rm_cleanup() - but since it won't wal-log those changes, the slave won't know about this. This will at least prevent the creation of any further restart points on the slave (because safe_restartpoint() will never return true again) - it might even cause data corruption, if subsequent wal records are interpreted wrongly by the slave because it sees other data than the master did when it generated them. I agree the problem exists. It is somewhat rare because the idea is that if the Primary crashes we would failover to the Standby, which would mean that both Primary and Standby have executed rm_cleanup(), if needed. So in the case where the Primary fails and we choose *not* to failover, there is a potential difficulty on the Standby. It occurred to me today that this might plague 8.1 too. As you explain below, the problem is not really connected to restartpoints - failing to create them is merely a symptom of this. On 8.1, this might still lead to rm_cleanup() being called much later (if you consider the wal position to be the time) on the slave than on the master. I dunno if this really causes trouble - I don't yet understand the btree code well enough to judge this. The rationale for this fix could be described somewhat differently: When we shutdown, we know for certain that safe_restartpoint() is true. However, we don't know whether it is true because we successfully did a clean shutdown, or because we crashed, recovered and then issued a shutdown checkpoint following recovery. In the latter case we *must* execute rm_cleanup() on the standby because it has been executed on the primary. Not doing so at this point *might* be safe, but is not certain to be safe. We don't *need* to log a restartpoint at this time, but it seems sensible to do so. While creating the patch, I've been thinking whether it might be worthwhile to note in the ShutdownCheckpoint that we just did recovery (or to create a new checkpoint type RecoveryCheckpoint). This would allow for more error checking, because then the slave could check that safe_restartpoint() is true for all ShutdownCheckpoints that were not written after a recovery. We need to check that rm_cleanup() routines don't assume that they will only ever be called once or this will clearly fail. There is also no need to call rm_cleanup() unless rm_safe_restartpoint() is false. But a non-idempotent rm_cleanup() routine will cause trouble anyway, if postgres crashes after having called rm_cleanup() but before creating the ShutdownCheckpoint. greetings, Florian Pflug ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Hacking on PostgreSQL via GIT
Martin Langhoff wrote: So - if you are committed to providing your gateway long term to Florian, I'm happy to drop my gateway in favour of yours. (Florian, before basing your code on either you should get a checkout of Aidan's and mine and check that the tips of the branches you are working on match the cvs branches -- the cvsimport code is good but wherever CVS is involved, there's a lot of interpretation at play, a sanity check is always good). Sorry for responding so late - I was rather busy during the last 1 1/2 weeks with university stuff, and had only very little time to spend on SoC. I've tried to switch my repo to both git mirrors, but there seems to be something strange happening. The checkout pulls a _lot_ of objects (a few hundred thousand), and then takes ages to unpack them all, bloating my local repository (just rm-ing my local repo takes a few minutes after the checkout). It seems as if git pulls all revisions of all files during the pull - which it shouldn't do as far as I understand things - it should only pull those objects referenced by some head, no? The interesting thing is that exactly the same problem occurs with both of your mirrors... Any ideas? Or is this just how things are supposed to work? greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Hacking on PostgreSQL via GIT
Aidan Van Dyk wrote: * Florian G. Pflug [EMAIL PROTECTED] [070430 08:58]: It seems as if git pulls all revisions of all files during the pull - which it shouldn't do as far as I understand things - it should only pull those objects referenced by some head, no? Git pulls full history up to a common ancestor on the clone/pull. So the first pull on a repo *will* necessarily pull in the full object history. So unless you have a recent common ancestor, it will pull lots. Note that because git uses crypto hashes to identify objects, my conversion and Martin's probably do not have a recent common ancestor (because my header munging probably doesn't match Martin's exactly). Ah, OK - that explains things. The interesting thing is that exactly the same problem occurs with both of your mirrors... Any ideas? Or is this just how things are supposed to work? Until you have a local repository of it, you'll need to go through the full pull/clone. If you're really not interested in history you can truncate it with the --depth option to git clone. That will give you a shallow repository, which you can use, develop, branch, etc. in, but which won't give you all the history locally. I'll retry with the --depth option - I'm doing development on my powerbook, and OSX seems to cope badly with lots of little files - the initial unpacking took hours, literally. Also - what version of GIT are you using? I *really* recommend using at least 1.5 (1.5.2.X is current stable). Please, do yourself a favour and don't use 1.4.4. I'm using 1.5.0 currently - it was the latest stable release when I began to experiment with git. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Heap page diagnostic functions
Zdenek Kotala wrote: I did not find forensics in a translator, and it is mentioned in the Oxford dictionary, but the explanation is not clear to me. I agree with Bruce that it is not a good name. What about a short form of diagnostic, diag? Doesn't forensics basically mean finding the cause of something *after* it happened, based on the traces that the event left behind? Like finding the culprit of a crime using, for example, the fingerprints he left, or tracing the actions of an intruder by analyzing logfiles or modified binaries? In that case it doesn't accurately describe those functions anyway, I think, because you call them from inside the database while it's running, not from the outside after it was stopped or crashed. Just the 2 eurocents of a non-native speaker... Greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Concurrently updating an updatable view
Richard Huxton wrote: Richard Huxton wrote: Heikki Linnakangas wrote: The problem is that the new tuple version is checked only against the condition in the update rule, id=OLD.id, but not the condition in the original UPDATE clause, dt='a'. Yeah, that's confusing :(. Bit more than just normal rule confusion I'd say. Try the following two statements in parallel (assuming you've just run the previous): UPDATE test SET dt='c'; UPDATE test SET dt='x' FROM test t2 WHERE test.id=t2.id AND t2.dt='b'; This isn't a problem with the view mechanism - it's a problem with re-checking clauses involving subqueries or joins, I'd guess. I'm trying to decide if it's unexpected or just plain wrong, and I think I'd have to argue wrong. Or perhaps I'd not argue that :-/ Well, src/backend/executor/README agrees with you that it's wrong: Note a fundamental bogosity of this approach: if the relation containing the original tuple is being used in a self-join, the other instance(s) of the relation will be treated as still containing the original tuple, whereas logical consistency would demand that the modified tuple appear in them too. But we'd have to actually substitute the modified tuple for the original, while still returning all the rest of the relation, to ensure consistent answers. Implementing this correctly is a task for future work. This is really about MVCC in read committed mode, and the docs are just right for the simpler cases: http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html#XACT-READ-COMMITTED Clearly there needs to be a change to the sentence: Because of the above rule, it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands that affected the same rows it is trying to update. Not true if there's a subquery/join involved. If the cited part of the README is correct, then all joins and subqueries are fine, except if they refer to the table being updated. I think there should be a big, fat warning that self-referential updates have highly non-obvious behaviour in read-committed mode, and should be avoided. greetings, Florian Pflug ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Concurrently updating an updatable view
Richard Huxton wrote: Hiroshi Inoue wrote: Florian G. Pflug wrote: I think there should be a big, fat warning that self-referential updates have highly non-obvious behaviour in read-committed mode, and should be avoided. It seems pretty difficult for the PostgreSQL rule system to avoid that kind of update. I'm doubtful whether UPDATABLE VIEWS can be implemented using the rule system. Remember this affects all self-referential joins on an UPDATE (and DELETE?), not just views. It's just that a rule is more likely to produce that type of query. Is there consensus on what the correct behaviour should be for self-referential updates in read-committed mode? Does the SQL Spec have anything to say about this? greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Not ready for 8.3
Andrew Dunstan wrote: What would making a branch actually do for you? The only advantage I can see is that it will give you a way of checkpointing your files. As I remarked upthread, I occasionally use RCS for that. But mostly I don't actually bother. I don't see how you can do it reasonably off a local cvs mirror - rsync will just blow away any changes you have checked in the next time you sync with the master. I don't think we can make CVS behave like a distributed SCM system, and the ability to create local branches seems to me one of the fundamental points of such systems. If that's what the demand is for, then we should look again at moving to something like Mercurial. I think the great thing about DVCS systems is that not everybody necessarily needs to use the *same* system. And it doesn't really matter what the central repository runs on - there are gateways from/to nearly everything available... I currently use GIT for my SoC project, and it works quite well - I can create an arbitrary number of local branches, and syncing the currently active branch with CVS is achieved by just doing cg-update pgsql-head. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: Do we need a TODO? (was Re: [HACKERS] Concurrently updating an updatable view)
Simon Riggs wrote: On Mon, 2007-05-28 at 19:56 -0400, Bruce Momjian wrote: Added to TODO: * Fix self-referential UPDATEs seeing inconsistent row versions in read-committed mode http://archives.postgresql.org/pgsql-hackers/2007-05/msg00507.php I'm sorry guys but I don't agree this is a TODO item. Maybe the suggested TODO has too narrow a focus, but I think that *something* has to be done about this. IMHO this follows documented behaviour, even if y'all are shocked. Yes, but documented != sensible, documented != intuitive, documented != logical. If you don't want the example cases to fail you can - use SERIALIZABLE mode to throw an error if an inconsistency is detected - use SELECT FOR SHARE to lock the rows in the subselect, e.g. UPDATE foo SET pkcol = 'x' WHERE pkcol IN (SELECT pkcol FROM foo FOR SHARE); In the case of concurrent UPDATEs the second UPDATE will normally perform the subSELECT, then hang waiting to perform the UPDATE. If you use FOR SHARE the query will hang on the subSELECT (i.e. slightly earlier), which makes the second query return zero rows, as some of you were expecting. Sure, but with a similar argument you could question the whole update-in-read-committed-mode logic. After all, you wouldn't need that logic if you always obtained a share lock on the rows to be updated *before* you started updating them. Maybe we need a way of specifying that the non-UPDATE relation should be locked FOR SHARE in a self-referencing UPDATE? Though that syntax could look pretty weird from here, so I'd say cover this situation in a code example and be done. Also, methinks we should have agreed behaviour before we make something a TODO item. That would help us uncover this type of thing in more detail, or at least force the TODO to read "investigate whether ...". Ack. That's why I initially asked if there was consensus on what the correct behaviour is. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] recovery_target_xid crashes on the master
Hi I'm currently working on splitting StartupXLog into smaller parts, because I need to reuse some of the parts for concurrent wal recovery (for my GSoC project). The function recoveryStopsHere in xlog.c checks if we should stop recovery due to the values of recovery_target_xid and recovery_target_time. For recovery_target_xid, we stop if we see a commit or abort record for the given xid. Now I wonder what happens if an (admittedly rather confused) DBA uses, as recovery_target_xid, the xid of a transaction that was aborted by a crash of the master. The way I read the code, postgres will just recover until it reaches the end of the xlog in that case, because neither a COMMIT nor an ABORT record for that xid exists in the WAL. I'm not sure if this is worth fixing - it seems like a rather contrived corner case - but I thought I'd bring it up... greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
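(For illustration, a simplified sketch of the check described in the mail above, modelled on recoveryStopsHere() in xlog.c - not the actual code, but it shows why an xid that never committed or aborted can never match:)

static bool
stops_at_target_xid(XLogRecord *record, TransactionId targetXid)
{
    uint8   info = record->xl_info & ~XLR_INFO_MASK;

    /* only transaction commit/abort records are ever compared */
    if (record->xl_rmid != RM_XACT_ID)
        return false;
    if (info != XLOG_XACT_COMMIT && info != XLOG_XACT_ABORT)
        return false;

    /*
     * A transaction cut short by a crash on the master produces neither
     * record, so this never fires and recovery runs to the end of WAL.
     */
    return TransactionIdEquals(record->xl_xid, targetXid);
}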
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Matthew T. O'Connor wrote: Florian G. Pflug wrote: Work done so far: .) Don't start autovacuum and bgwriter. Do table stats used by the planner get replicated on a PITR slave? I assume so, but if not, you would need autovac to do analyzes. Yes - everything that gets wal-logged on the master gets replicated to the slave. In my design, it isn't possible to do analyze on the slave, because all datafiles are strictly readonly (well, with the small exception of hint-bit updates, actually). greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Jeff Davis wrote: On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: .) Since the slave needs to track a snapshot in shared memory, it cannot resize that snapshot to accommodate however many concurrent transactions might have been running on the master. My current plan is to detect if that global snapshot overflows, and to lock out readonly queries on the slave (and therefore remove the need to keep the snapshot current) until the number of active xids on the master has dropped below max_connections on the slave. A warning will be written to the postgres log suggesting that the DBA increase the max_connections value on the slave. If we did lock the slave while waiting for transactions to complete on the master, we'd need to document some stronger warnings against idle transactions so that administrators could notice and correct the problem. It's not exactly locking until transactions complete on the master, it's locking the slave until we reach a position in the wal on the slave with fewer than max_connections concurrent transactions. But yes, I agree, this will need to be documented. Are you referring to the size of the xip array being a problem? Would it help to tie the size of the xip array to max_connections? I understand that max_connections might be greater on the master, but maybe something similar? That's what I currently do - the xip array on the slave is sized to hold max_connections entries (actually, it's max_connections + max_prepared_xacts I think). The problem occurs exactly if those values are set too small on the slave - and since shared mem objects are not resizable, I don't see how the slave can handle an xip overflow gracefully other than by not publishing the information in shared memory as long as it doesn't fit there. On further thought - maybe locking out transactions isn't even necessary - they would just continue to see the old global snapshot, so time wouldn't advance for them until the number of concurrent transactions decreases again. greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
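(To make the overflow handling above concrete, a hedged sketch - the struct and function names are made up for illustration; only the sizing rule of max_connections + max_prepared_xacts slots comes from the mail:)

typedef struct SharedReadOnlySnapshot
{
    TransactionId   xmin;
    TransactionId   xmax;
    int             xcnt;        /* valid entries in xip[] */
    bool            overflowed;  /* master exceeded our capacity */
    TransactionId   xip[1];      /* really max_connections + max_prepared_xacts slots */
} SharedReadOnlySnapshot;

static void
AddRunningXid(SharedReadOnlySnapshot *snap, TransactionId xid, int capacity)
{
    if (snap->xcnt >= capacity)
        snap->overflowed = true;    /* stop publishing; DBA should raise max_connections */
    else
        snap->xip[snap->xcnt++] = xid;
}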
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Simon Riggs wrote: On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote: .) Added a new GUC operational_mode, which can be set to either readwrite or readonly. If it is set to readwrite (the default), postgres behaves as usual. All the following changes are only in effect if operational_mode is set to readonly. Do we need this? We are already InArchiveRecovery. If I understand you correctly, you suggest that readonly queries are always allowed during archive recovery - so upon startup postgres steps through these states: .) Initial recovery (until we reach a consistent state) .) Allow readonly queries .) Finish recovery in the background (might mean recovering forever on a PITR slave) .) Allow readwrite queries My plan was to have a global switch which lets you choose between .) All queries are readonly (until the next postmaster restart at least), but you get background replay .) No background replay, but once replay is done, readwrite queries can be executed (just what PG does now). The main reason why I invented that global switch operational_mode was to remove the need to switch between readonly mode and readwrite mode on the fly. .) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR) if postgres is not in readwrite mode. This macro protects the following functions to make sure that no writes occur in readonly mode: SimpleLruWritePage, SlruPhysicalWritePage, EndPrepare, FinishPreparedTransaction, XLogInsert, XLogWrite, ShutdownXLog, CreateCheckpoint, MarkBufferDirty. These are Asserts? The macro ASSUME_OPMODE_READWRITE just does if (!OperationalModeReadWrite) elog(ERROR, ...) - a minimal sketch follows at the end of this mail. .) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay is started, and it takes over the role that bgwriter plays in the shutdown process. Autovacuum - understood. What does bgreplay do? Why not just start bgwriter earlier and disable some of its other functionality while InRecovery? See above - it seemed simpler to clearly separate the two modes. .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that is considered to be later than any other xid. So you are bumping FirstNormalTransactionId up by one for this? In fact I changed MaxTransactionId to 0xFFFFFFFE, and set ReadOnlyTransactionId to 0xFFFFFFFF. Additionally, I changed IsNormalTransactionId to test not only for >= FirstNormalTransactionId, but also for <= MaxTransactionId. You're assuming then that we will freeze replay while we run a query? No. My plan is to first get to a point where replay freezes while queries are running, and to then figure out a more intelligent way to do this. I already have a few ideas about how to do this, but I want to complete the simple version before I start with that work. Otherwise doing this will mean the snapshot changes as a query executes. Why? It's only the xid of the transaction, not its xmin and xmax, that is set to ReadOnlyTransactionId. .) A global ReadOnlySnapshot is maintained in shared memory. This is copied into backend-local memory by GetReadonlySnapshotData (which replaces GetSnapshotData in readonly mode). .) Crash recovery is not performed in readonly mode - instead, postgres PANICs and tells the DBA to restart in readwrite mode. Archive recovery of course *will* be allowed, but I'm not that far yet. This is the very heart of the matter. This isn't just a technical issue, it goes to the heart of the use case for this feature. Can we recover while running queries? Yes. My comment applies only to crash recovery - i.e., recovery that happens *without* a recovery.conf present, after a crash. 
It only really matters if you do the following: .) Start pg in readwrite mode. .) Kill it / it crashes. .) Restart in readonly mode. The main difference between crash recovery and recovery from a filesystem-level backup is the additional information that the backup label gives us in the second case - more specifically, the minRecoveryLoc that we read from the backup label. Only with that knowledge is recovering until we reach a consistent state a well-defined operation. And readonly queries can only be executed *after* we did this minimal recovery. So if there is crash recovery to be done, the best we could do is to recover, and then start in readonly mode. If this is *really* what the DBA wants, he can just start in readwrite mode first, then cleanly shut PG down, and restart in readonly mode. If not, how much time will we spend in replay mode vs. query mode? Will we be able to run long-running queries *and* maintain a reasonable time to recover? Is this a mechanism for providing HA and additional query capacity, or is it just a mechanism for additional query capacity only? Those are open questions to which I don't have any answers yet myself. My goal is to allow replay and queries to run concurrently, at least as long as only inserts, updates
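(For reference on the mail above: a minimal sketch of the ASSUME_OPMODE_READWRITE guard as described earlier in this message. OperationalModeReadWrite is the backend-global flag assumed by this design, and the error text is made up.)

#define ASSUME_OPMODE_READWRITE() \
    do { \
        if (!OperationalModeReadWrite) \
            elog(ERROR, "write operation attempted while in readonly mode"); \
    } while (0)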
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Heikki Linnakangas wrote: Florian G. Pflug wrote: Jeff Davis wrote: Are you referring to the size of the xip array being a problem? Would it help to tie the size of the xip array to max_connections? I understand that max_connections might be greater on the master, but maybe something similar? That's what I currently do - the xip array on the slave is sized to hold max_connections entries (actually, it's max_connections + max_prepared_xacts I think). The problem occurs exactly if those values are set too small on the slave - and since shared mem objects are not resizable, I don't see how the slave can handle an xip overflow gracefully other than by not publishing the information in shared memory as long as it doesn't fit there. You could store the value of max_connections in the checkpoint xlog record, and read it from there in the slave. Though one could still change it on the master and restart without restarting the slave as well. But AFAIK shmem allocation happens before recovery starts... Even if this were solved, it would only be a partial solution since, as you note, the master might be restarted while the slave keeps running. So I think it's better not to add too much complexity, and just tell the DBA to increase max_connections on the slave, together with a comment in the documentation never to set max_connections smaller on the slave than on the master. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
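(Purely to illustrate the idea floated above - if the master's max_connections were carried in the checkpoint record, replay on the slave could sanity-check it roughly like this. Such a field does not exist today; only the MaxConnections global and ereport() are real.)

static void
CheckSlaveCapacity(int masterMaxConnections)
{
    if (masterMaxConnections > MaxConnections)
        ereport(WARNING,
                (errmsg("max_connections on the master (%d) exceeds this slave's setting (%d)",
                        masterMaxConnections, MaxConnections)));
}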
Re: [HACKERS] [RFC] GSoC Work on readonly queries done so far
Jeff Davis wrote: On Wed, 2007-06-06 at 22:36 +0100, Simon Riggs wrote: .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that is considered to be later than any other xid. So you are bumping FirstNormalTransactionId up by one for this? You're assuming then that we will freeze replay while we run a query? Otherwise doing this will mean the snapshot changes as a query executes. Is it possible to put a normal xmax for the snapshot? It wouldn't be a real transaction on the slave, and also the master will use that ID for a real transaction itself. However, I don't see a real problem on the slave because it would only be used for the purpose of the snapshot we need at that moment. My plan is the following: .) Initially, queries and recovery will run interleaved, but not concurrently. For that, an empty snapshot is sufficient, with xmin=xid=xmax=ReadOnlyTransactionId. .) Then, I'll work on running them concurrently. The replay process will publish a current snapshot in shared memory, using real xmin and xmax values it generates by maintaining a list of currently active (as in: running when the wal was written on the master) transactions. In that case, only xid is set to ReadOnlyTransactionId. greetings, Florian Pflug ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
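(A tiny sketch of the first step described in the mail above - the empty snapshot. SnapshotData and its xmin/xmax/xcnt fields are real; ReadOnlyTransactionId is the constant invented for this project, and the function name is illustrative.)

static void
InitEmptyReadOnlySnapshot(Snapshot snap)
{
    snap->xmin = ReadOnlyTransactionId;
    snap->xmax = ReadOnlyTransactionId;
    snap->xcnt = 0;        /* no in-progress transactions are visible */
}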
Re: [HACKERS] Is this a feature?
Joshua D. Drake wrote: Take the following: INFO: analyzing pg_catalog.pg_authid INFO: pg_authid: scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows The above is completely redundant. Why not just say: INFO: pg_authid: scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows If the first line is meant to be an indicator, then make the above line do this: INFO: analyzing pg_catalog.pg_authid: Don't add a new line, and when the next step of information comes up, append it to the existing line to get: INFO: analyzing pg_catalog.pg_authid: scanned 1 of 1 pages, containing 5 live rows and 0 dead rows; 5 rows in sample, 5 estimated total rows But then the line could only be pushed to the client *after* the analysis of the table has finished, while with the current output you know what postgres is currently doing, because you get "analyzing ..." *before* the operation starts. greetings, Florian Pflug ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Selecting a constant question
Dann Corbit wrote: -Original Message- From: Hannu Krosing [mailto:[EMAIL PROTECTED] Since libpq function PQfsize returns -2 for all constant character strings in SQL statements ... What is the proper procedure to determine the length of a constant character column after query execution but before fetching the first row of data? Why not just get the first row and determine the width from it before you actually use any of the data? What if the second row is 1000x longer? That's exactly the point. Consider select mytext from mytable; How can PostgreSQL possibly know the maximum length of the returned values *before* it has scanned the whole table? greetings, Florian Pflug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
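(To illustrate the point above with libpq: PQfsize() only reports a type's fixed size - negative for varlena columns and unknown-typed constants - so actual value widths are only known per row, via PQgetlength(). The connection string and table name are placeholders.)

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");   /* placeholder connection */
    PGresult   *res = PQexec(conn, "SELECT mytext FROM mytable");
    int         i;

    /* negative result: the column is not fixed-width */
    printf("PQfsize of column 0: %d\n", PQfsize(res, 0));

    /* the only way to learn actual widths is row by row */
    for (i = 0; i < PQntuples(res); i++)
        printf("row %d: %d bytes\n", i, PQgetlength(res, i, 0));

    PQclear(res);
    PQfinish(conn);
    return 0;
}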
Re: [HACKERS] EXPLAIN omits schema?
Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I agree. XML seems like a fairly natural fit for this. Just as people should not try to shoehorn everything into XML, neither should they try to shoehorn everything into a relational format either. Now all we need is an XML schema for it ;-) Well I am not a big fan of XML but it certainly seems applicable in this case. I'm not a fan either so perhaps I'm biased, but this seems like a good example of where it would be an *awful* idea. Once you have an XML plan what can you do with it? All you can do is parse it into constituent bits and display it. You can't do any sort of comparison between plans, aggregate results, search for plans matching constraints, etc. How would I, with XML output, do something like: SELECT distinct node.relation FROM plan_table WHERE node.expected_rows > node.actual_rows*2; or SELECT node.type, average(node.ms/node.cost) FROM plan_table GROUP BY node.type; I believe that XQuery actually supports such queries. So if postgres supported XQuery (or does it already? I honestly don't know), writing such a query wouldn't be that hard, I think. The execution probably won't be super-efficient, but for query plans that seems OK. greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Controlling Load Distributed Checkpoints
Heikki Linnakangas wrote: Jim C. Nasby wrote: On Thu, Jun 07, 2007 at 10:16:25AM -0400, Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Thinking about this whole idea a bit more, it occurred to me that the current "write all, then fsync all" approach is really a historical artifact of the fact that we used to use the system-wide sync call instead of fsyncs to flush the pages to disk. That might not be the best way to do things in the new load-distributed-checkpoint world. How about interleaving the writes with the fsyncs? I don't think it's a historical artifact at all: it's a valid reflection of the fact that we don't know enough about disk layout to do low-level I/O scheduling. Issuing more fsyncs than necessary will do little except guarantee a less-than-optimal scheduling of the writes. If we extended relations by more than 8k at a time, we would know a lot more about disk layout, at least on filesystems with a decent amount of free space. I doubt it makes that much difference. If there was a significant amount of fragmentation, we'd hear more complaints about seq scan performance. OTOH, extending a relation that uses N pages by something like min(ceil(N/1024), 1024) pages might help some filesystems to avoid fragmentation, and would hardly introduce any waste (about 0.1% in the worst case). So if it's not too hard to do, it might be worthwhile, even if it turns out that most filesystems deal well with the current allocation pattern. greetings, Florian Pflug ---(end of broadcast)--- TIP 6: explain analyze is your friend
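(The rule above, spelled out as code - a sketch only, not a patch. For a relation of N pages it yields between 1 and 1024 pages of growth, i.e. roughly 0.1% over-allocation for anything but tiny tables.)

static int
extension_size(int npages)
{
    int     grow = (npages + 1023) / 1024;   /* ceil(N / 1024) */

    if (grow < 1)
        grow = 1;
    if (grow > 1024)
        grow = 1024;
    return grow;            /* pages to add when extending the relation */
}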
Re: [HACKERS] Worries about delayed-commit semantics
PFC wrote: On Fri, 22 Jun 2007 16:43:00 +0200, Bruce Momjian [EMAIL PROTECTED] wrote: Simon Riggs wrote: On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: Tom Lane wrote: untrustworthy disk hardware, for instance. I'd much rather use names derived from deferred commit or delayed commit or some such. Honestly, I prefer these names as well, as it seems directly related, versus transaction guarantee which sounds more like us saying: if we turn it off, our transactions are bogus. That was the intention..., but name change accepted. Hm, another possibility: synchronous_commit = off Ooo, I like that. Any other takers? Yea, I like that too, but I am now realizing that we are not really deferring or delaying the COMMIT command but rather the recovery of the commit. GUC as full_commit_recovery? commit_waits_for_fsync = force_yes: makes all commits hard; yes: commits are hard unless specified otherwise [default]; no: commits are soft unless specified otherwise [should replace the fsync=off use case]; force_no: makes all commits soft (controller with write cache emulator). I think you got the last line backwards - without the fsync() after a commit, you can't be sure that the data made it into the controller cache. To be safe you *always* need the fsync() - but it will probably be much cheaper if your controller doesn't have to actually write to the disks, but can cache in battery-backed RAM instead. Therefore, if you own such a controller, you probably don't need deferred commits. BTW, I like synchronous_commit too - but maybe asynchronous_commit would be even better, with inverted semantics of course. Then you'd have asynchronous_commit = off as the default. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Worries about delayed-commit semantics
Richard Huxton wrote: Bruce Momjian wrote: Tom Lane wrote: What's wrong with synchronous_commit? It's accurate and simple. That is fine too. My concern would be that it can be read two ways: 1. When you commit, sync (something or other - unspecified) 2. Synchronise commits (to each other? to something else?)* It's obvious to people on the -hackers list what we're talking about, but is it so clear to a newbie, perhaps non-English speaker? * I can see people thinking this means something like commit_delay. OTOH, the concept of synchronous vs. asynchronous (function) calls should be pretty well-known among database programmers and administrators. And (at least to me), this is really what this is about - the commit happens asynchronously, at the convenience of the database, and not the instant that I requested it. greetings, Florian Pflug ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Still recommending daily vacuum...
Michael Paesold wrote: Alvaro Herrera wrote: So what you are proposing above amounts to setting scale factor = 0.05. The threshold is unimportant -- in the case of a big table it matters not if it's 0 or 1000, it will be almost irrelevant in calculations. In the case of small tables, then the table will be vacuumed in almost every iteration if the threshold is 0, which is fine because the table is small anyway. So why not let the threshold be 0 and be done with it? For very small tables, setting a threshold of 0 could mean a vacuum after every single row update (or every other row). I think that is just burning cycles. What about a threshold of 10 or 50, to have at least some sanity limit? Even though the cost of vacuuming a small table is low, it is still not free, IMHO, no? A bit off-topic (because probably not realistic in an 8.3 timeframe) - but maybe the threshold should be specified in terms of the expected number of pages to be freed, instead of specifying a bias on the number of modified rows as it is done now. Then 1 would probably be a reasonable default, because a vacuum that won't free at least one page seems to be not really worth the effort - it won't save any future IO bandwidth. Just an idea I got while following this thread... greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
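(For reference, the trigger condition being discussed above, as a sketch; the parameter names mirror the autovacuum_vacuum_threshold / autovacuum_vacuum_scale_factor settings.)

static int
needs_vacuum(double reltuples, double dead_tuples,
             double threshold, double scale_factor)
{
    /*
     * With threshold = 0 and scale_factor = 0.05, a 10-row table qualifies
     * after a single dead row - hence the "burning cycles" concern above.
     */
    return dead_tuples > threshold + scale_factor * reltuples;
}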
Re: [HACKERS] [PATCH] A crash and subsequent recovery of themaster can cause the slave to get out-of-sync
Tom Lane wrote: [ back to dealing with this patch, finally ] Florian G. Pflug [EMAIL PROTECTED] writes: While creating the patch, I've been thinking if it might be worthwile to note that we just did recovery in the ShutdownCheckpoint (or create a new checkpoint type RecoveryCheckpoint). This wouldl allow for more error checking, because then the slave could check that safe_restartpoint() is true for all ShutdownCheckpoints that were not after recovering. I concur that this is a good idea --- we should have a third checkpoint record type that shows that a crash recovery occurred. However, we can probably only do that for 8.3 and beyond. If we try to do it in existing release branches then there's likelihood of trouble due to WAL incompatibility between master and standby. While we do advise people to update their standbys first, I don't think it's worth risking such problems just to add some more error checking. Conclusion: we should apply Florian's patch as-is in 8.2, do something morally equivalent in 8.1 and before, and invent a CrashRecoveryCheckpoint record type in HEAD. Sounds good. Do you want me to code up such patches for 8.1 and 8.3 in the next days, or is someone else already working on it? greetings, Florian Pflug ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org