Re: [HACKERS] PITR Dead horse?
-----Original Message----- From: Nicolai Tufar [mailto:[EMAIL PROTECTED] Sent: 05 February 2004 00:01 To: [EMAIL PROTECTED] Subject: Re: [HACKERS] PITR Dead horse? Totally agree. Robustness and rock-solidness are the only things missing for PostgreSQL to become the killer of certain commercial enterprise databases out there. Well I've only been using PostgreSQL since 1997 and the *only* release I ever had problems with was 6.3.2. We also use(d) Informix SE, DB2, Unidata and SQL Server and only Informix and Unidata come close to the robustness of PostgreSQL - and they're not the ones we need to worry about. Now I'm not saying we shouldn't be continually looking to improve things, but I don't think this is quite the problem you imply. Regards, Dave. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PITR Dead horse?
-----Original Message----- From: Nicolai Tufar [mailto:[EMAIL PROTECTED] Sent: 05 February 2004 08:15 To: Dave Page Subject: RE: [HACKERS] PITR Dead horse? -----Original Message----- From: Dave Page [mailto:[EMAIL PROTECTED] Well I've only been using PostgreSQL since 1997 and the *only* release I ever had problems with was 6.3.2. We also use(d) Informix SE, DB2, Unidata and SQL Server and only Informix and Unidata come close to the robustness of PostgreSQL - and they're not the ones we need to worry about. Don't know. But apparently different users will have different demands from a database. Of course, but I would argue that my claim that PostgreSQL is reliable is backed up by the lack of people posting messages like 'we had a powercut and now my DB is hosed'. Now I'm not saying we shouldn't be continually looking to improve things, but I don't think this is quite the problem you imply. For the customers I am dealing with it is quite a problem, believe me. Do they have specific problems with the reliability of PostgreSQL then? Perhaps you could post details of how things have gone wrong for them (assuming you haven't already - I don't recall anything on -hackers recently). Regards, Dave ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
I don't think the bgwriter is going to be able to keep up with I/O bound backends, but I do think it can scan and set those booleans fast enough for the backends to then perform the writes. As long as the bgwriter does not do sync writes (which it does not, since that would need a whole lot of work to be performant) it calls write which returns more or less at once. So the bottleneck can only be the fsync. From those you would want at least one per pg disk open in parallel. But I think it should really be left to the OS when it actually does the IO for the writes from the bgwriter in between checkpoints. So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. Andreas ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] log session end - again
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Right. And if we have sessionids we would want them logged there, I think. And that would rule out anything based on xid or backend pid. Uh, what's wrong with backend pid? Since we fork before we start doing anything with a connection, it should surely be available soon enough for the connection log message. Larry's idea about combining PID and backend start time didn't sound too unreasonable to me. Wow, like the idea too --- pid plus time_t start time of backend. Actually, it would be good to have start time first so you can sort everything in order of start time. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073
Re: [HACKERS] [PATCHES] log session end - again
Bruce Momjian wrote: Tom Lane wrote: Larry's idea about combining PID and backend start time didn't sound too unreasonable to me. Wow, like the idea too --- pid plus time_t start time of backend. Actually, it would be good to have start time first so you can sort everything in order of start time. Yes. Of course, you would need either to sort additionally on log timestamp or use an order-preserving sort. Based on Larry's idea, I had in mind to provide a third escape in the log_line_info string (in addition to the %U and %D that I had previously done) of %S for sessionid, which would look something like this: 402251fc.713f I will start redoing this feature when the log_disconnections patch is dealt with. cheers andrew
Re: [HACKERS] PITR Dead horse?
Dave Page wrote: -----Original Message----- From: Nicolai Tufar [mailto:[EMAIL PROTECTED] Sent: 05 February 2004 00:01 To: [EMAIL PROTECTED] Subject: Re: [HACKERS] PITR Dead horse? Totally agree. Robustness and rock-solidness are the only things missing for PostgreSQL to become the killer of certain commercial enterprise databases out there. Well I've only been using PostgreSQL since 1997 and the *only* release I ever had problems with was 6.3.2. We also use(d) Informix SE, DB2, Unidata and SQL Server and only Informix and Unidata come close to the robustness of PostgreSQL - and they're not the ones we need to worry about. Now I'm not saying we shouldn't be continually looking to improve things, but I don't think this is quite the problem you imply. I assume he was talking about the lack of data recovery in cases of hard drive failure --- we now require you restore from backup or use a replicated machine/drive setup. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the bgwriter and hope that the kernel will see fit to write them before checkpoint time arrives. I am not sure if that hope has basis in fact or if it's just wishful thinking. Most likely, if it does have basis in fact it's because there is a standard syncer daemon forcing a sync() every thirty seconds. That means that instead of an I/O storm every checkpoint interval, we get a smaller I/O storm every 30 seconds. Not sure this is a big improvement. Jan already found out that issuing very frequent sync()s isn't a win. People keep saying that the bgwriter mustn't write pages synchronously because it'd be bad for performance, but I think that analysis is faulty. Performance of what --- the bgwriter? Nonsense, the *point* of the bgwriter is to do the slow tasks. The only argument that has any merit is that O_SYNC or immediate fsync will prevent us from having multiple writes outstanding and thus reduce the efficiency of disk write scheduling. This is a valid point but there is a limit to how many writes we need to have in flight to keep things flowing smoothly. What I'm thinking now is that the bgwriter should issue frequent fsyncs for its writes --- not immediate, but a lot more often than once per checkpoint. Perhaps take one recently-written unsynced file to fsync every time it is about to sleep. You could imagine various rules for deciding which one to sync; perhaps the one with the most writes issued against it since last sync. When we have tablespaces it'd make sense to try to distribute the syncs across tablespaces, on the assumption that the tablespaces are probably on different drives. regards, tom lane
Re: [HACKERS] PITR Dead horse?
Tatsuo Ishii wrote: Has this been beaten to death now? Just curious if PITR was in Dev tree yet. Been out of the loop. TIA. My co-workers and I are very interested in implementing PITR. We will tackle this for 7.5 if no one objects. I have put up a PITR project page: http://momjian.postgresql.org/main/writings/pgsql/project
Re: [HACKERS] PITR Dead horse?
Koichi Suzuki wrote: Hi, This is Suzuki from NTT DATA Intellilink. I told Bruce Momjian that I and my colleagues are interested in implementing PITR in BOF in NY LW2004. NTT's laboratory is very interested in this issue and I'm planning to work with them. I hope we could cooperate. Yes, I am going to focus on this next week when I return. With Win32 moving along, PITR is my next big target. I want to get things moving. The first step is for Tom to get the PITR WAL patches in. Then we need to discuss what else we need and get those on the PITR project page: http://momjian.postgresql.org/main/writings/pgsql/project
Re: [HACKERS] PITR Dead horse?
Nicolai Tufar wrote: I would like to join this effort too. I was afraid that people at RedHat are already halfway through and were to release their work shortly. But it does not seem to be the case. We are a long way away from completion: http://momjian.postgresql.org/main/writings/pgsql/project ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] PITR Dead horse?
Tom Lane wrote: Marc G. Fournier [EMAIL PROTECTED] writes: Is this something large enough, like the win32 stuff, that having a side list for discussions is worth setting up? In terms of the amount of code to be written, I expect it's larger than the win32 porting effort. And it should be mostly pretty separate from hacking the core backend, since most of what remains to do is writing external management utilities (I think). I've been dissatisfied with having the separate pgsql-hackers-win32 list; I feel it just fragments the discussion, and people tend to end up crossposting to -hackers anyway. But a separate list for PITR work might be a good idea despite that experience, since it seems like it'd be a more separable project. I think the win32 email list has worked well. What it has allowed is people who want to track only win32 to get only those emails. It doesn't help people already on hackers because hacker input is needed. There are currently 102 Win32 subscribers, and most are not on the hackers list.
Re: [HACKERS] PITR Dead horse?
Marc G. Fournier wrote: On Wed, 4 Feb 2004, Tatsuo Ishii wrote: I and some other developers are also interested in. Do you think we can work together? Sure. Why not. I think it would be practical to decide who is the leader of this project, though. Is this something large enough, like the win32 stuff, that having a side list for discussions is worth setting up? Yes, I would like to have such a list, and will advertise it on the PITR project page: http://momjian.postgresql.org/main/writings/pgsql/project
Re: [HACKERS] PITR Dead horse?
Bruce Momjian wrote: Dave Page wrote: -----Original Message----- From: Nicolai Tufar [mailto:[EMAIL PROTECTED] Sent: 05 February 2004 00:01 To: [EMAIL PROTECTED] Subject: Re: [HACKERS] PITR Dead horse? Totally agree. Robustness and rock-solidness are the only things missing for PostgreSQL to become the killer of certain commercial enterprise databases out there. Well I've only been using PostgreSQL since 1997 and the *only* release I ever had problems with was 6.3.2. We also use(d) Informix SE, DB2, Unidata and SQL Server and only Informix and Unidata come close to the robustness of PostgreSQL - and they're not the ones we need to worry about. Now I'm not saying we shouldn't be continually looking to improve things, but I don't think this is quite the problem you imply. I assume he was talking about the lack of data recovery in cases of hard drive failure --- we now require you restore from backup or use a replicated machine/drive setup. I retract this email. He clearly was talking about PostgreSQL reliability, and Dave is right, it is pretty much a non-issue, though maybe mindshare needs some help. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
On Thursday 05 February 2004 20:24, Tom Lane wrote: Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: So Imho the target should be to have not much IO open for the checkpoint, so the fsync is fast enough, even if serial. The best we can do is push out dirty pages with write() via the bgwriter and hope that the kernel will see fit to write them before checkpoint time arrives. I am not sure if that hope has basis in fact or if it's just wishful thinking. Most likely, if it does have basis in fact it's because there is a standard syncer daemon forcing a sync() every thirty seconds. There are other benefits of writing pages earlier even though they might not get synced immediately. It would tell kernel that this is latest copy of updated buffer. Kernel VFS should make that copy visible to every other backend as well. The buffer manager will fetch the updated copy from VFS cache next time. All without going to disk actually..(Within the 30 seconds window of course..) People keep saying that the bgwriter mustn't write pages synchronously because it'd be bad for performance, but I think that analysis is faulty. Performance of what --- the bgwriter? Nonsense, the *point* of the bgwriter is to do the slow tasks. The only argument that has any merit is that O_SYNC or immediate fsync will prevent us from having multiple writes outstanding and thus reduce the efficiency of disk write scheduling. This is a valid point but there is a limit to how many writes we need to have in flight to keep things flowing smoothly. Is it a valid assumption for platforms-that-postgresql-supports that a write call would make changes visible across processes? What I'm thinking now is that the bgwriter should issue frequent fsyncs for its writes --- not immediate, but a lot more often than once per checkpoint. Frequent fsyncs, or frequent fsyncs per file descriptor written? I thought it was the latter. Just a thought. 
Shridhar ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] dollar quoting
What has become of the dollar quoting mechanism that we had so much discussion about back in August/September? IIRC, a consensus was reached on the actual format of the quote delimiters (either $$ or $identifier$), and Tom had a proof of concept patch to the parser to handle it, but work was needed on psql, plpgsql, pg_dump (and pg_restore?) + docs. Is anyone working on it? cheers andrew
[HACKERS] prepared queries
The current release of DBD::Pg contains this in the CHANGES file: - $dbh->prepare() rewrites the SQL statement into an internal form, stripping out comments and whitespace, and if PostgreSQL >= 7.3 takes the stripped statement and passes that to Postgres' PREPARE statement, then rewrites the statement as 'EXECUTE DBD::PG::cached_query n ($1, $2, ... $n, $n+1)' for DBD::Pg's execute. -- Currently disabled until PREPARE works a little better Is there a TODO here, or is it just that the driver is a bit out of date? Perhaps one of these items relates to the matter?: . Allow clients to query a list of WITH HOLD cursors and prepared statements . Allow libpq to return information about prepared queries It's a pity if such an important client as DBD::Pg can't really use PQs. Is the problem that you need to supply the data types of the params at PREPARE time rather than discovering them from the result of a PREPARE? cheers andrew
Re: [HACKERS] dollar quoting
On Thu, 5 Feb 2004, Andrew Dunstan wrote: What has become of the dollar quoting mechanism that we had so much discussion about back in August/September? IIRC, a consensus was reached on the actual format of the quote delimiters (either $$ or $identifier$), and Tom had a proof of concept patch to the parser to handle it, but work was needed on psql, plpgsql, pg_dump (and pg_restore?) + docs. Is anyone working on it? I am, but not very quickly. I hope to have some time in the next month, but if someone else beats me to it I'll just be happy it got done. Jon
Re: [HACKERS] PITR Dead horse?
[EMAIL PROTECTED] I set myself as owner, since I didn't figure it was something you really needed added to your plate? :) Just means you don't have to go through and do the Approvals for postings when they need it, I'll just do it as my normal stuff ... On Thu, 5 Feb 2004, Bruce Momjian wrote: Marc G. Fournier wrote: On Wed, 4 Feb 2004, Tatsuo Ishii wrote: I and some other developers are also interested in. Do you think we can work together? Sure. Why not. I think it would be practical to decide who is the leader of this project, though. Is this something large enough, like the win32 stuff, that having a side list for discussions is worth setting up? Yes, I would like to have such a list, and will advertise it on the PITR project page: http://momjian.postgresql.org/main/writings/pgsql/project Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Shridhar Daithankar [EMAIL PROTECTED] writes: There are other benefits of writing pages earlier even though they might not get synced immediately. Such as? It would tell kernel that this is latest copy of updated buffer. Kernel VFS should make that copy visible to every other backend as well. The buffer manager will fetch the updated copy from VFS cache next time. All without going to disk actually..(Within the 30 seconds window of course..) This seems quite irrelevant given the way we handle shared buffers. frequent fsyncs or frequent fsyncs per file descriptor written? I thought it was the latter. You can only fsync one FD at a time (too bad ... if there were a multi-file-fsync API it'd solve the overspecified-write-ordering issue). regards, tom lane
Re: [HACKERS] dollar quoting
Andrew Dunstan [EMAIL PROTECTED] writes: IIRC, a consensus was reached on the actual format of the quote delimiters (either $$ or $identifier$), and Tom had a proof of concept patch to the parser to handle it, but work was needed on psql, plpgsql, pg_dump (and pg_restore?) + docs. I think someone has to fix psql before we can consider applying the backend patch. Fixing the other stuff can come after. Is anyone working on it? I kinda thought you had volunteered to work on the psql part... regards, tom lane
Re: [HACKERS] dollar quoting
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: IIRC, a consensus was reached on the actual format of the quote delimiters (either $$ or $identifier$), and Tom had a proof of concept patch to the parser to handle it, but work was needed on psql, plpgsql, pg_dump (and pg_restore?) + docs. I think someone has to fix psql before we can consider applying the backend patch. Fixing the other stuff can come after. Makes sense. Is anyone working on it? I kinda thought you had volunteered to work on the psql part... I don't recall being that specific, but you could be right. In any case, I didn't want to trip over anyone else, which is why I asked. I will try to coordinate with Jon. cheers andrew
Re: [HACKERS] PITR Dead horse?
Wow. What a wonderful response. Thanks all! On Thu, 2004-02-05 at 08:57, Bruce Momjian wrote: Tatsuo Ishii wrote: Has this been beaten to death now? Just curious if PITR was in Dev tree yet. Been out of the loop. TIA. My co-workers and I are very interested in implementing PITR. We will tackle this for 7.5 if no one objects. I have put up a PITR project page: http://momjian.postgresql.org/main/writings/pgsql/project -- Austin Gonyou [EMAIL PROTECTED] Coremetrics, Inc.
Re: [HACKERS] [pgsql-hackers-win32] Sync vs. fsync during checkpoint
People keep saying that the bgwriter mustn't write pages synchronously because it'd be bad for performance, but I think that analysis is faulty. Performance of what --- the bgwriter? Nonsense, the *point* Imho that depends on the workload. For a normal OLTP workload this is certainly correct. I do not think it is correct for mass loading, or an otherwise IO bound db. of the bgwriter is to do the slow tasks. The only argument that has any merit is that O_SYNC or immediate fsync will prevent us from having multiple writes outstanding and thus reduce the efficiency of disk write scheduling. This is a valid point but there is a limit to how many writes we need to have in flight to keep things flowing smoothly. But that is imho the main point. The difference for modern disks is 1Mb/s for random 8k vs. 20 Mb/s for random 256k. Don't get me wrong: I think sync writing would achieve maximum performance, but you have to try to write physically adjacent 256k, and you need a vague idea which blocks to write in parallel. And since that is not so easy I think we could leave it to the OS. And as an aside I think 20-30 minute checkpoint intervals would be sufficient with a bgwriter. Andreas PS: don't most syncers have 60s intervals, not 30?
Re: [HACKERS] PITR Dead horse?
-----Original Message----- From: Dave Page [mailto:[EMAIL PROTECTED] Sent: Thursday, February 05, 2004 11:02 AM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [HACKERS] PITR Dead horse? Of course, but I would argue that my claim that PostgreSQL is reliable is backed up by the lack of people posting messages like 'we had a powercut and now my DB is hosed'. It's not like that. It's more like 'what will happen if we had a powercut/ disk failure/cpu failure/memory failure, etc, etc.' and the answer I have to give is 'why, there is PITR of course!'. No other answer will pass in the enterprise world. Those people are not open-minded, they'd rather be safe than sorry. Do they have specific problems with the reliability of PostgreSQL then? Perhaps you could post details of how things have gone wrong for them (assuming you haven't already - I don't recall anything on -hackers recently). Nothing remarkable. PostgreSQL just works. But as I said before, in the enterprise world, good sleep at night is treasured above all. Regards, Dave
Re: [HACKERS] PITR Dead horse?
-----Original Message----- From: Nicolai Tufar [mailto:[EMAIL PROTECTED] Sent: 05 February 2004 17:35 To: Dave Page; [EMAIL PROTECTED] Subject: RE: [HACKERS] PITR Dead horse? -----Original Message----- From: Dave Page [mailto:[EMAIL PROTECTED] Sent: Thursday, February 05, 2004 11:02 AM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [HACKERS] PITR Dead horse? Of course, but I would argue that my claim that PostgreSQL is reliable is backed up by the lack of people posting messages like 'we had a powercut and now my DB is hosed'. It's not like that. It's more like 'what will happen if we had a powercut/ disk failure/cpu failure/memory failure, etc, etc.' and the answer I have to give is 'why, there is PITR of course!'. No other answer will pass in the enterprise world. Those people are not open-minded, they'd rather be safe than sorry. Ahh, that's not quite what I thought you meant. It sounded like you were questioning the reliability of PostgreSQL, not its ability to be recovered to point of failure. Do they have specific problems with the reliability of PostgreSQL then? Perhaps you could post details of how things have gone wrong for them (assuming you haven't already - I don't recall anything on -hackers recently). Nothing remarkable. PostgreSQL just works. But as I said before, in the enterprise world, good sleep at night is treasured above all. My SQL2K servers give me far more sleepless nights than PostgreSQL ever did! Regards, Dave.
[HACKERS] It's past time to redo the smgr API
A long time ago Vadim proposed that we should revise smgr.c's API so that it does not depend on Relations (relcache entries); rather, only a RelFileNode value should be needed to access a file in smgr and lower levels. This would allow us to get rid of the concept of blind writes. As of CVS tip, with most writes being done by the bgwriter, most writes are blind. Try tracing the bgwriter: every write involves a file open and close, which has got to be hurting its performance. The same has been true for awhile now of the background checkpointer. I'm motivated to do something about this today because I'm reviewing the PITR patch that J.R. Nield and Patrick Macdonald were working on, and one of the things I don't care for is the kluges it adds to the smgr API to work around lack of a Relation in some operations. I propose the following revisions: * smgr.c should internally keep a hashtable (indexed by RelFileNode) with struct SMgrRelation entries for each relation that has recently been accessed. File access information for the lower-level modules (md.c, fd.c) would appear in this hashtable entry. * smgropen takes a RelFileNode and returns a pointer to a hashtable entry, which is used as the argument to all other smgr operations. smgropen itself does not cause any actual I/O; thus it doesn't matter if the file doesn't exist yet. smgropen followed by smgrcreate is the way to create a new relation on disk. Without smgrcreate, file existence will be checked at the first read or write attempt. * smgrclose closes the md.c-level file and drops the hashtable entry. Hashtable entries remain valid unless explicitly closed (thus, multiple smgropens for the same file are legal). * The rd_fd field of relcache nodes will be replaced by an SMgrRelation * pointer, so that callers having a Relation reference can access the smgr file easily. 
(In particular, this means that most calls to the bufmgr will still pass Relations, and we don't need to propagate the API change up to the bufmgr level.) * The existing places in the bufmgr that do RelationNodeCacheGetRelation followed by either regular or blind write would be replaced by smgropen followed by smgrwrite. Since smgropen is just a hash table lookup, it is no more expensive than the RelationNodeCacheGetRelation it replaces. Note these places would *not* do smgrclose afterwards. * Because we don't smgrclose after a write, it is possible to have dangling smgr entries that aren't useful any more, as well as open file descriptors underneath them. This isn't too big a deal on Unix but it will be fatal for the Windows port, since it would prevent a DROP TABLE if some other backend happens to have touched the table. What I propose to do about this is: 1. In the bgwriter, at each checkpoint do smgrcloseall to close all open files. 2. In regular backends, receipt of a relcache flush message will result in smgrclose(), even if there is not a relcache entry for the given relation ID. A global cache flush event (sinval buffer overflow) causes smgrcloseall. This ensures that open descriptors for a dead relation will not persist past the next checkpoint. We had already agreed, I think, that the best solution for Windows' problem with deleting open files is to retry pending deletes at checkpoint. This smgr rewrite will not contribute to actually making that happen, but it won't make things worse either. * I'm going to get rid of the "which" argument that once selected a particular smgr implementation (md.c vs mm.c). It's vestigial at present and is misdefined anyway. If we were to resurrect the concept of multiple storage managers, we'd need the selection to be determinable from the RelFileNode value, rather than being a separate argument. 
(When we add tablespaces, we could imagine associating smgrs with tablespaces, but it would have to be the responsibility of smgr.c to determine which smgr to use based on the tablespace ID.) * We can remove the dummy cache support in relcache.c, as well as RelationNodeCacheGetRelation() and the hashtable that supports it. * The smgr hashtable entries could possibly be used for recording other status; for instance keeping track of which relations need fsync in the bgwriter. I'm also thinking of merging smgr.c's existing list-of-rels-to-be-deleted into this data structure. * AFAICS the only downside of not having a Relation available in smgr.c and md.c is that error messages could only refer to the RelFileNode numbers and not to the relation name. I'm not sure this is bad, since in my experience what you want to know about such errors is the actual disk filename, which RelFileNode tells you and relation name doesn't. We could preserve the current behavior by passing the relation name to smgropen when available, and saving the name in struct SMgrRelation. But I'm inclined not to. Comments? regards, tom lane
Re: [HACKERS] PITR Dead horse?
Dave Page [EMAIL PROTECTED] writes:
> Ahh, that's not quite what I thought you meant. It sounded like you were questioning the reliability of PostgreSQL, not its ability to be recovered to the point of failure.

I think the waters got muddied a bit by the suggestion elsewhere in the thread (not from Nicolai, IIRC) that we needed a mailing list to talk about reliability issues in general. We know we need PITR to help us become a more credible enterprise-grade database, so that discussion is short and sweet. What people were confused about was whether there were enough other issues to need ongoing discussion.

regards, tom lane
Re: [HACKERS] PITR Dead horse?
-----Original Message-----
From: Dave Page [mailto:[EMAIL PROTECTED]
> My SQL2K servers give me far more sleepless nights than PostgreSQL ever did!
> Regards, Dave.

You bet! I totally agree with you. Technicians like you, me, and most people on this list already know that PostgreSQL is stable and reliable. It is management that needs to be convinced, and for this we need to have PITR in the feature list.

Regards, Nicolai
Re: [HACKERS] PITR Dead horse?
On Thu, 2004-02-05 at 14:00, Nicolai Tufar wrote:
> > My SQL2K servers give me far more sleepless nights than PostgreSQL ever did!
> You bet! I totally agree with you. Technicians like you, me, and most people on this list already know that PostgreSQL is stable and reliable. It is management that needs to be convinced, and for this we need to have PITR in the feature list.

As previously stated by Bruce, I believe, the mindshare department needs some work. PITR is a necessity for this, but it also helps when comparing features with other DBs that people and businesses are currently familiar with.

-- Austin Gonyou [EMAIL PROTECTED] Coremetrics, Inc.
Re: [HACKERS] Why has postmaster shutdown gotten so slow?
Jan Wieck wrote:
> Tom Lane wrote:
> > Shutdown of an idle postmaster used to take about two or three seconds (mostly due to the sync/sleep(2)/sync in md_sync). For the last couple of days it's taking more like a dozen seconds. I presume somebody broke something, but I'm unsure whether to pin the blame on bgwriter or Windows changes. Anyone care to fess up?
> I guess it could well be the bgwriter, which when it has nothing to do at all sleeps for 10 seconds. Not sure, will check.

I checked the background writer for this and I cannot reproduce the behaviour. If the bgwriter has zero blocks to write, it does a PG_USLEEP for 10 seconds, which on Unix is done by select(), and that is correctly interrupted when the postmaster sends it the term signal on shutdown.

Jan

-- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] #
Re: [HACKERS] Recursive queries?
hello hackers,

Some time ago I played with PgSQL and wrote the simplest working prototype of a WITH clause for it. It doesn't do any checks and performs only the simplest selects, but it works. I can contribute it and develop it further to a production state.

regards, .evgen
[HACKERS] Preventing duplicate vacuums?
Folks,

Just occurred to me that we have no code to prevent a user from running two simultaneous lazy vacuums on the same table. I can't think of any circumstance in which running two vacuums would be desirable behavior; how difficult would it be to make this an exception? This becomes a more crucial issue now, since the introduction of vacuum_delay makes overlapping vacuums more probable.

-- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] It's past time to redo the smgr API
On Thu, 5 Feb 2004, Tom Lane wrote:
> * Because we don't smgrclose after a write, it is possible to have dangling smgr entries that aren't useful any more, as well as open file descriptors underneath them. This isn't too big a deal on Unix but it will be fatal for the Windows port, since it would prevent a DROP TABLE if some other backend happens to have touched the table. What I propose to do about this is:
> 1. In the bgwriter, at each checkpoint do smgrcloseall to close all open files.
> 2. In regular backends, receipt of a relcache flush message will result in smgrclose(), even if there is not a relcache entry for the given relation ID. A global cache flush event (sinval buffer overflow) causes smgrcloseall.
> This ensures that open descriptors for a dead relation will not persist past the next checkpoint. We had already agreed, I think, that the best solution for Windows' problem with deleting open files is to retry pending deletes at checkpoint. This smgr rewrite will not contribute to actually making that happen, but it won't make things worse either.

'k, only comment is on this one ... would it not be a bit more efficient to add a flag to the SMgrRelation * structure that acts as a timer? If it is -1, the relation has been deleted; otherwise, set it to the epoch time at which the relation was last accessed. Then your 'smgrcloseall', instead of closing all files (even active/undeleted ones), would only close those that have either been idle for x minutes (maybe a postgresql.conf tunable there?) or been DROP'd.

Assuming, that is, that your end goal is to reduce the overall # of opens/closes of files ... this way, instead of closing 20 just to reopen 19 of them, you only close the one that needs it ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] It's past time to redo the smgr API
Marc G. Fournier [EMAIL PROTECTED] writes:
> 'k, only comment is on this one ... would it not be a bit more efficient to add a flag to the SMgrRelation * structure that acts as a timer?

Hm, we could try that, although I'm not sure it would help much. You'd have to set the timeout to be longer than a checkpoint interval to make any difference.

In the back of my mind is the thought that the Windows guys are going to end up passing file-delete requests over to the bgwriter anyway, which would largely eliminate the issue --- the bgwriter would know which files need to be smgrclose'd and wouldn't have to do smgrcloseall. (If they don't do this, how are they going to cope with backends that exit before their file deletion is completed?)

I'll do it the easy way for now and we can refine it after we see what the file-close solution for Windows ends up looking like.

regards, tom lane
Re: [HACKERS] Preventing duplicate vacuums?
On Thu, 2004-02-05 at 15:37, Josh Berkus wrote:
> Just occurred to me that we have no code to prevent a user from running two simultaneous lazy vacuums on the same table. I can't think of any circumstance in which running two vacuums would be desirable behavior; how difficult would it be to make this an exception?

You have an 8 billion row table with some very high turnover tuples (lots of updates to a few thousand rows). A partial or targeted vacuum would be best; failing that, you kick them off fairly frequently, especially if I/O isn't really an issue.
Re: [HACKERS] It's past time to redo the smgr API
On Thu, Feb 05, 2004 at 02:05:46PM -0500, Tom Lane wrote:
> * smgrclose closes the md.c-level file and drops the hashtable entry. Hashtable entries remain valid unless explicitly closed (thus, multiple smgropens for the same file are legal).

So, will there be a refcount on each cache entry? Else, how would smgrclose know when to drop the cache entry if there had been multiple smgropen calls for the same file? (Unless, of course, each smgropen call yields a different hash entry?)

> I'm also thinking of merging smgr.c's existing list-of-rels-to-be-deleted into this data structure.

Please don't. In the nested transaction environment, each subxact has to keep track of which files should be deleted. That's why I was trying to put the list inside a transaction status stack. Another way to do it would be keeping the list of files to delete along with the deleting Xid, but that would also require keeping a list of which xacts aborted and which ones didn't.

> * AFAICS the only downside of not having a Relation available in smgr.c and md.c is that error messages could only refer to the RelFileNode numbers and not to the relation name. I'm not sure this is bad, since in my experience what you want to know about such errors is the actual disk filename, which RelFileNode tells you and the relation name doesn't.

I agree it's best to have the filename in the error message, but IMHO the relation name should also be presented to the user for clarity.

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Endurecerse, pero jamás perder la ternura (E. Guevara)
Re: [HACKERS] Preventing duplicate vacuums?
Rod,
> You have an 8 billion row table with some very high turnover tuples (lots of updates to a few thousand rows). A partial or targeted vacuum would be best; failing that, you kick them off fairly frequently, especially if I/O isn't really an issue.

Yes, but we don't have partial or targeted vacuums yet. When tablespaces are finished, presumably the lock would be per tablespace.

-- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] [PATCHES] log session end - again
Chester Kustarz wrote:
> On Thu, 5 Feb 2004, Bruce Momjian wrote:
> > Wow, I like the idea too --- pid plus time_t start time of backend. Actually, it would be good to have the start time first so you can sort everything in order of start time.
> Why not just add a printf-like pattern so the user can output whatever they like? %p - backend pid, %t - connect time, %C - standard connection string (%t:%p), %a - autoincrement number

log_line_info, where this would be implemented, is indeed done with a printf-like string, which is easily extensible. The following is adapted from the description of the original patch, which it is my intention to bring up to date soon and resubmit:

Prefixes each message in the server log file or syslog with extra information as specified in the string. The default is an empty string. The following sequences in the string are translated: '%U' is replaced with the username, '%D' is replaced by the database name, and '%%' is replaced by '%'. All other characters are copied literally to the log. For example, log_line_info = '%U%%%D ' will produce output like this:

myuser%mydb LOG: query: create table foo(blah text);

cheers andrew
Re: [HACKERS] Preventing duplicate vacuums?
Tom,
> Yes we do: there's a lock.

Sorry, bad test. Forget I said anything. Personally, I would like to have the 2nd vacuum error out instead of blocking. However, I'll bet that a lot of people won't agree with me.

-- -Josh Berkus Aglio Database Solutions San Francisco
Re: [HACKERS] It's past time to redo the smgr API
Alvaro Herrera [EMAIL PROTECTED] writes:
> On Thu, Feb 05, 2004 at 02:05:46PM -0500, Tom Lane wrote:
> > * smgrclose closes the md.c-level file and drops the hashtable entry. Hashtable entries remain valid unless explicitly closed (thus, multiple smgropens for the same file are legal).
> So, will there be a refcount on each cache entry?

No, I deliberately intended not. There is at most one persistent reference to an smgr hashtable entry; that's the one in the relcache entry for the relation. The only caller of smgrclose will be relcache.c, and it'll do it only when clearing the relcache reference (or, in the sinval case, upon determining that there is no relcache entry for the relation). Adding a reference count would just complicate matters --- for instance, we'd have to be able to reset them after an error, which would be rather hard from smgr's point of view since it really shouldn't be looking into the relcache to see if there's a reference there.

> > I'm also thinking of merging smgr.c's existing list-of-rels-to-be-deleted into this data structure.
> Please don't. In the nested transaction environment, each subxact has to keep track of which files should be deleted. That's why I was trying to put the list inside a transaction status stack.

Hm. Okay, I'll leave that separate for now, although I think we do want to merge the structures eventually.

> Another way to do it would be keeping the list of files to delete along with the deleting Xid, but that would also require keeping a list of which xacts aborted and which ones didn't.

Really? When a subtransaction finishes, you either delete its files immediately (ie, if it aborted, you can drop files it created) or you reassign them as the responsibility of the parent (since any effects of the subtransaction really depend on whether the parent commits, you can't actually do its deletes yet). I don't see the need to remember subtransaction state further than that. So ISTM all you need is one field added to the existing list entries to remember which subtransaction is currently on the hook for a given file-deletion request.

regards, tom lane
Re: [HACKERS] It's past time to redo the smgr API
On Thu, Feb 05, 2004 at 04:55:44PM -0500, Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
> > Another way to do it would be keeping the list of files to delete along with the deleting Xid, but that would also require keeping a list of which xacts aborted and which ones didn't.
> Really? When a subtransaction finishes, you either delete its files immediately (ie, if it aborted, you can drop files it created) or you reassign them as the responsibility of the parent (since any effects of the subtransaction really depend on whether the parent commits, you can't actually do its deletes yet).

Right, that's the mechanism I originally envisioned. (A similar one is needed for deferred triggers, async notifies, heavy locks and on-commit actions.)

-- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Y dijo Dios: Que sea Satanás, para que la gente no me culpe de todo a mí. Y que hayan abogados, para que la gente no culpe de todo a Satanás
Re: [HACKERS] It's past time to redo the smgr API
On Thu, 5 Feb 2004, Tom Lane wrote:
> Marc G. Fournier [EMAIL PROTECTED] writes:
> > 'k, only comment is on this one ... would it not be a bit more efficient to add a flag to the SMgrRelation * structure that acts as a timer?
> Hm, we could try that, although I'm not sure it would help much. You'd have to set the timeout to be longer than a checkpoint interval to make any difference.

Why? Setting it to the checkpoint interval itself should be sufficient, no? All you want to do is avoid closing any files that were used during the last checkpoint interval, since there is a good chance you'd have to reopen them once more in the next checkpoint interval ...

Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [HACKERS] It's past time to redo the smgr API
Marc G. Fournier [EMAIL PROTECTED] writes:
> Why? Setting it to the checkpoint interval itself should be sufficient, no? All you want to do is avoid closing any files that were used during the last checkpoint interval ...

If we did that then (on Windows) every DROP TABLE would take one extra checkpoint interval to take effect in terms of freeing disk space. Not sure if this is a good tradeoff for avoiding some file opens. In any case, I think we should leave it to be debated until after we see what the Windows file-closing solution turns out to be. It might become a non-issue.

regards, tom lane
Re: [HACKERS] It's past time to redo the smgr API
Marc G. Fournier [EMAIL PROTECTED] writes:
> 'k, but that would be a different scenario, no? As I mentioned in my original, a DROP TABLE would reset its timeout to -1, meaning to close it and drop it on the next checkpoint interval ...

How would it do that? These structs are local to particular backends, so there's no way for a DROP TABLE occurring in one backend to reach over and reset the timeout in the bgwriter process. If we add communication that lets the bgwriter know the table is being dropped, then the problem is solved directly.

regards, tom lane
Re: [HACKERS] It's past time to redo the smgr API
> * AFAICS the only downside of not having a Relation available in smgr.c and md.c is that error messages could only refer to the RelFileNode numbers and not to the relation name. I'm not sure this is bad, since in my experience what you want to know about such errors is the actual disk filename, which RelFileNode tells you and the relation name doesn't. We could preserve the current behavior by passing the relation name to smgropen when available, and saving the name in struct SMgrRelation. But I'm inclined not to. Comments?

That all sounds pretty nice. From my point of view, I recall you saying a long time ago that this would need to be done for tablespaces - so I just request that the rewrite be done with future tablespaces in mind :)

Chris
[HACKERS] Vacuum Delay feature
The attached patch applies to CVS tip as of 02/05/2004 and implements the cost-based vacuum delay feature. A detailed description with charts of different configuration settings can be found here: http://developer.postgresql.org/~wieck/vacuum_cost/

There is a problem left that seems to be related to Tom's observations on the shutdown behaviour of the postmaster. My current guess is that the napping done via select(2) somehow prevents responding to the query abort signal.

Jan

[Attachment: vacuum_cost.75devel.diff.gz - GNU Zip compressed data]
Re: [HACKERS] Vacuum Delay feature
Attached is a corrected version that solves the query cancel problem by not napping any more and going at full speed as soon as any signal is pending. If nobody objects, I'm going to commit this tomorrow.

Jan

Jan Wieck wrote:
> The attached patch applies to CVS tip as of 02/05/2004 and implements the cost-based vacuum delay feature. A detailed description with charts of different configuration settings can be found here: http://developer.postgresql.org/~wieck/vacuum_cost/
> There is a problem left that seems to be related to Tom's observations on the shutdown behaviour of the postmaster. My current guess is that the napping done via select(2) somehow prevents responding to the query abort signal.

[Attachment: vacuum_cost.75devel.diff.gz - GNU Zip compressed data]