[BUGS] 8.2b2: update.bad in windows release points to wrong .msi
as the subject says - the upgrade.bat in the b2 release that's currently being mirrored points to the installation files of 8.1 instead of the 8.2 ones.

best regards,
thomas
Re: [BUGS] 8.2b2: update.bad in windows release points to wrong .msi
Thanks, fixed in CVS.

Regards, Dave.

From: [EMAIL PROTECTED] On Behalf Of Thomas H.
Sent: 25 October 2006 12:16
To: pgsql-bugs@postgresql.org
Subject: [BUGS] 8.2b2: update.bad in windows release points to wrong .msi

> as the subject says - the upgrade.bat in the b2 release thats currently being mirrored points to the installation files of 8.1 instead of the 8.2 ones.
>
> best regards,
> thomas
Re: [BUGS] BUG #2712: could not fsync segment: Permission
>> The same problem exists in 8.1 too. See this thread
>> http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php
>> Tom and Magnus tracked down a cause, but I don't think a fix was ever implemented.
>
> Thomas seems to have two different issues there: the "could not rename file" problem on the pg_xlog file is probably explained by the mechanism we identified back then (and I'm not sure why no fix has been installed), but there is no known reason other than antivirus software for the "could not fsync" problem.
>
> As for fixing the problem we do understand: ISTM it's just an awful idea for pgrename and pgunlink to be willing to loop forever. I think they should time out and report the failure after some reasonable period (say between 10 sec and a minute). If we simply made that change, then the behavior when there's an idle backend sitting on a filehandle for an old xlog segment would be that checkpoints would fail at the MoveOfflineLogs stage, which would not be fatal, but it'd be annoying. We'd probably want to further tweak InstallXLogFileSegment so that rename failure isn't an ERROR, at least not on Windows. (I think we could just make it return false, which'd cause the caller to try to delete the xlog segment, which should work even though rename doesn't.)
>
> I'm not in a position to test this though. Magnus or Bruce?

I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
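The bounded-retry idea proposed in the quoted text above (give up after some period instead of looping forever, and let the caller fall back to deleting the file) can be sketched roughly as follows. This is an illustrative Python model, not the PostgreSQL C code: the names rename_with_timeout and install_or_discard, the default timeout, and the injectable rename/clock/sleep parameters are all assumptions made for the example.

```python
import os
import time


def rename_with_timeout(src, dst, timeout_s=30.0, interval_s=0.1,
                        rename=os.replace, clock=time.monotonic,
                        sleep=time.sleep):
    """Retry a rename that fails with PermissionError until timeout_s elapses.

    Returns True on success and False on timeout, so the caller can decide
    how to react instead of being stuck in the loop forever.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            rename(src, dst)
            return True
        except PermissionError:
            # Typical symptom on Windows when another process still
            # holds the file open.
            if clock() >= deadline:
                return False
            sleep(interval_s)


def install_or_discard(src, dst, **kwargs):
    """Hypothetical caller: if the rename times out, delete the source instead."""
    if rename_with_timeout(src, dst, **kwargs):
        return "installed"
    os.unlink(src)
    return "discarded"
```

Returning False rather than raising mirrors the suggestion that a rename failure should not be a hard ERROR; whether deletion really succeeds where rename fails depends on how the file was opened, which this sketch does not model.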
Re: [BUGS] BUG #2712: could not fsync segment: Permission
>> I'm not in a position to test this though. Magnus or Bruce?
>
> I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building.

a binary would be marvelous. if too much hassle, i can set up a msvc++ 2005 here and try to build it on my own, but would obviously prefer if i won't have to...

b2 is installed here, but i'm seeing the same problems, so yes, i'm ready for testing ;-)

thanks,
thomas
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Magnus Hagander writes:
> I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building.

Utterly untested ...

BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't?

regards, tom lane

Attachment: xlog-rename.patch.gz
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Tom Lane wrote:
> Magnus Hagander writes:
>> I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building.
>
> Utterly untested ...
>
> BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't?

Seems like a bug --- they both should have it.

--
Bruce Momjian
EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)
just a small update: this problem is also present in beta 2. not a big problem for the moment, as we currently have disabled fulltext search capabilities on the website.

regards, thomas

----- Original Message -----
From: [EMAIL PROTECTED]
To: Tom Lane
Cc: pgsql-bugs@postgresql.org
Sent: Tuesday, October 17, 2006 10:19 PM
Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)

>> the following query will crash the server process:
>> INSERT INTO news.news SELECT * FROM news.news2;
>
> This is undoubtedly data-dependent. Can you supply some sample data that makes it happen?

it's not only happening with INSERTs, but also with updates. as that's easier to test, here's how i can reproduce the error:

1. create new database (encoding: UTF8) with tsearch2 on 8.2b1 win32 (system locale: de_CH.1252)
2. insert the data from the zip file [http://alternize.com/pgsql/tsearch2test.zip] (be sure to also update pg_ts_cf / pg_ts_cfgmap as we have WIN1252 locale)
3. execute
   UPDATE test SET idxFTI = to_tsvector('default', sometext);
   or similar queries
4. hopefully see the process crashing as i do ;-)

2006-10-17 17:23:44 LOG: server process (PID 4584) exited with exit code -1073741819
2006-10-17 17:23:44 LOG: terminating any other active server processes
2006-10-17 17:23:44 WARNING: terminating connection because of crash of another server process
2006-10-17 17:23:44 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
{snipp}
2006-10-17 17:23:44 LOG: all server processes terminated; reinitializing
2006-10-17 17:23:44 LOG: database system was interrupted at 2006-10-17 17:23:41 W. Europe Daylight Time
2006-10-17 17:23:44 LOG: Windows fopen(recovery.conf,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG: Windows fopen(pg_xlog/0001.history,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG: Windows fopen(backup_label,r) failed: code 2, errno 2
2006-10-17 17:23:44 LOG: checkpoint record is at 0/E2ECA728
2006-10-17 17:23:44 LOG: redo record is at 0/E2ECA728; undo record is at 0/0; shutdown FALSE
2006-10-17 17:23:44 LOG: next transaction ID: 0/514299; next OID: 6276957
2006-10-17 17:23:44 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2006-10-17 17:23:44 LOG: database system was not properly shut down; automatic recovery in progress
2006-10-17 17:23:44 LOG: redo starts at 0/E2ECA778
2006-10-17 17:23:44 LOG: unexpected pageaddr 0/DB0CC000 in log file 0, segment 227, offset 835584
2006-10-17 17:23:44 LOG: redo done at 0/E30CBE78
2006-10-17 17:23:45 LOG: database system is ready
2006-10-17 17:23:45 LOG: Windows fopen(global/pg_fsm.cache,rb) failed: code 2, errno 2
2006-10-17 17:23:45 LOG: transaction ID wrap limit is 2147484172, limited by database postgres
2006-10-17 17:23:45 LOG: Windows fopen(global/pgstat.stat,rb) failed: code 2, errno 2

i've also tried to update each record on its own in a for-loop. here the crash happens as well, sometimes after 10 updates, sometimes after 100 updates, sometimes even after 1 update. but eventually every record can be updated. so i do not think it's entirely content-related...

for what it's worth, here's the output of pg_controldata:

pg_control version number:            822
Catalog version number:               200609181
Database system identifier:           4986650172201464825
Database cluster state:               in production
pg_control last modified:             17.10.2006 17:44:29
Current log file ID:                  0
Next log file segment:                230
Latest checkpoint location:           0/E4E0F978
Prior checkpoint location:            0/E46BF420
Latest checkpoint's REDO location:    0/E4E03098
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/531333
Latest checkpoint's NextOID:          6285149
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            17.10.2006 17:43:45
Minimum recovery ending location:     0/0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           German_Switzerland.1252
LC_CTYPE:                             German_Switzerland.1252

let me know if more information / data is needed.

on a sidenote: are those fopen() errors debug-code-leftovers or something one should worry about? i can't find those files on the file system.

- thomas
[BUGS] Out of memory error causes Abort, Abort tries to allocate memory
I found the root cause of the bug I reported at:
http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php

What happens is this:
* Out of memory condition causes an ERROR
* ERROR triggers an AbortTransaction()
* AbortTransaction() calls RecordTransactionAbort()
* RecordTransactionAbort() calls smgrGetPendingDeletes()
* smgrGetPendingDeletes() calls palloc()
* palloc() fails, resulting in ERROR, causing infinite recursion
* elog.c detects infinite recursion, and elevates it to PANIC

I'm not sure how easy this is to fix, but I asked on IRC and got some agreement that this is a bug.

It seems to me, in order to fix it, we would have to avoid allocating memory on the AbortTransaction path. All smgrGetPendingDeletes() needs to allocate is a few dozen bytes (depending on the number of relations to be deleted). Perhaps it could allocate those bytes as the list of pending deletes fills up. Or maybe we can somehow avoid needing to record the relnodes to be deleted in order for the abort to succeed.

I'm still not sure why foreign keys on large insert statements don't eat memory on 7.4, but do on 8.0+.

Regards, Jeff Davis
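The suggestion to allocate the space "as the list of pending deletes fills up" can be modelled with a toy allocator. This is only a sketch under loud assumptions: Arena stands in for palloc with a fixed budget, REL_SIZE is an invented entry size, and PendingDeletes and its method names are hypothetical; none of this is the actual smgr code.

```python
class Arena:
    """Simulated allocator with a fixed byte budget (a stand-in for palloc)."""

    def __init__(self, budget):
        self.remaining = budget

    def alloc(self, nbytes):
        if nbytes > self.remaining:
            raise MemoryError("out of memory")
        self.remaining -= nbytes


REL_SIZE = 12  # assumed size of one relation entry, for illustration only


class PendingDeletes:
    def __init__(self, arena):
        self.arena = arena
        self.rels = []
        self.reserved = 0  # entries already paid for in the abort buffer

    def add(self, relnode):
        # Reserve abort-record space now, while an ERROR is still harmless.
        self.arena.alloc(REL_SIZE)
        self.reserved += 1
        self.rels.append(relnode)

    def get_for_abort_naive(self):
        # Allocates at abort time: fails if memory is already exhausted.
        self.arena.alloc(REL_SIZE * len(self.rels))
        return list(self.rels)

    def get_for_abort_reserved(self):
        # Uses only space reserved by add(); no arena allocation here.
        assert self.reserved >= len(self.rels)
        return list(self.rels)
```

With the arena exhausted, the naive path raises MemoryError (the recursion trigger described above), while the reserved path still returns the list because the cost was paid when each relation was added.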
Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory
Jeff Davis wrote:
> I found the root cause of the bug I reported at:
> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
>
> What happens is this:
> * Out of memory condition causes an ERROR
> * ERROR triggers an AbortTransaction()
> * AbortTransaction() calls RecordTransactionAbort()
> * RecordTransactionAbort calls smgrGetPendingDeletes()
> * smgrGetPendingDeletes() calls palloc()
> * palloc() fails, resulting in ERROR, causing infinite recursion
> * elog.c detects infinite recursion, and elevates it to PANIC
>
> I'm not sure how easy this is to fix, but I asked on IRC and got some agreement that this is a bug.

Hmm, maybe we could have AbortTransaction switch to ErrorContext, which has some preallocated space, before calling RecordTransactionAbort (or maybe have RecordTransactionAbort itself do it).

Problem is, what happens if ErrorContext is filled up by doing this? At that point we will be severely fscked up, and you probably won't get the PANIC either. (Maybe it doesn't happen in this particular case, but seems a real risk.)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [BUGS] Out of memory error causes Abort, Abort tries to
On Wed, 2006-10-25 at 16:20 -0300, Alvaro Herrera wrote:
> Jeff Davis wrote:
>> I found the root cause of the bug I reported at:
>> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00211.php
>>
>> What happens is this:
>> * Out of memory condition causes an ERROR
>> * ERROR triggers an AbortTransaction()
>> * AbortTransaction() calls RecordTransactionAbort()
>> * RecordTransactionAbort calls smgrGetPendingDeletes()
>> * smgrGetPendingDeletes() calls palloc()
>> * palloc() fails, resulting in ERROR, causing infinite recursion
>> * elog.c detects infinite recursion, and elevates it to PANIC
>>
>> I'm not sure how easy this is to fix, but I asked on IRC and got some agreement that this is a bug.
>
> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which has some preallocated space, before calling RecordTransactionAbort (or maybe have RecordTransactionAbort itself do it).
>
> Problem is, what happens if ErrorContext is filled up by doing this? At that point we will be severely fscked up, and you probably won't get the PANIC either. (Maybe it doesn't happen in this particular case, but seems a real risk.)

If we have a way to allocate memory and recover if it fails, perhaps RecordTransactionAbort() could set the "rels to delete" part of the log record to some special value that means "There might be relations to delete, but I don't know which ones." Then, if necessary, it could determine the relations that should be deleted at recovery time.

This idea assumes that we can figure out which relations are abandoned, and also assumes that smgrGetPendingDeletes() is the only routine that allocates memory on the path to abort a transaction due to an out of memory error.

Regards, Jeff Davis
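The "special value" idea can be illustrated with a small model: the abort record carries a sentinel when the relation list could not be built, and recovery falls back to a separate lookup. Everything named here (UNKNOWN_RELS, build_abort_record, rels_to_delete_at_recovery, and the find_abandoned_rels callback) is hypothetical and only mirrors the shape of the proposal; as the message itself notes, whether abandoned relations can actually be identified at recovery is an open assumption.

```python
UNKNOWN_RELS = None  # sentinel: "there might be relations to delete"


def build_abort_record(xid, get_pending_deletes):
    """Build an abort record, tolerating failure to collect the relation list."""
    try:
        rels = get_pending_deletes()
    except MemoryError:
        # Could not allocate the list; defer the question to recovery time.
        rels = UNKNOWN_RELS
    return {"xid": xid, "rels": rels}


def rels_to_delete_at_recovery(record, find_abandoned_rels):
    """Resolve the relation list, consulting the fallback only for the sentinel."""
    if record["rels"] is UNKNOWN_RELS:
        return find_abandoned_rels(record["xid"])
    return record["rels"]
```

Using a distinct sentinel rather than an empty list keeps "no relations to delete" and "unknown" apart, which is the distinction the proposal depends on.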
Re: [BUGS] Out of memory error causes Abort, Abort tries to allocate memory
Alvaro Herrera writes:
> Jeff Davis wrote:
>> * smgrGetPendingDeletes() calls palloc()
>> * palloc() fails, resulting in ERROR, causing infinite recursion
>
> Hmm, maybe we could have AbortTransaction switch to ErrorContext, which has some preallocated space, before calling RecordTransactionAbort (or maybe have RecordTransactionAbort itself do it).

Seems like it'd be smarter to try to free some memory before we push forward with transaction abort. ErrorContext has only a limited amount of space ...

regards, tom lane
Re: [BUGS] BUG #2712: could not fsync segment: Permission
> As for fixing the problem we do understand: ISTM it's just an awful idea for pgrename and pgunlink to be willing to loop forever. I think they should time out and report the failure after some reasonable period (say between 10 sec and a minute).

is the main problem really in the rename/delete function? while i'm in no position of actually knowing what's going on under the hood, my observations in 10+ cases during this afternoon/evening revealed some patterns:

it is definitely the writer process that blocks the db. but in every case the writer process seems to fail to rename the file due to another postgresql process still holding a filehandle to the very xlog file that should be renamed. ProcessExplorer lets you force a close of the file handle - as soon as you do this [which is a bad thing to do, i assume], the rename succeeds and processing continues normally.

i actually can reproduce the error at will now - i just need to pump enough data into the db (~200mb data seems sufficient) to have it lock up.

- thomas
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. writes:
> it is definitely the writer process that blocks the db. but in every case the writer process seems to fail to rename the file due to another postgresql still holding a filehandle to the very xlog file that should be renamed.

Right, all you need is a backend process that's made at least one xlog write and then is left to sit idle for long enough that that xlog file is due for recycling.

However, the fact that the writer process is stuck should not in itself cause the DB to lock up. I think what's really happening is that after the writer process gets stuck, the remaining backends chew all the available WAL, and then they need to create more WAL segments for themselves, and the writer process is holding the ControlFileLock so they can't.

It might be interesting to think about not requiring the ControlFileLock to be held while making new WAL segments. I think the only reason it does that is to ensure that link/rename failure can be treated as a hard error (because no one could have beat us to the filename), but we're already having to back off that stance for Windows ...

regards, tom lane
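As a rough illustration of the last point, installing a new segment without a lock means that losing the race for the filename has to be an ordinary outcome rather than a hard error. The sketch below uses Python's os.link, which fails if the target name already exists, as a stand-in for the link/rename step; install_segment is a hypothetical name and this is not how PostgreSQL's code is structured.

```python
import os


def install_segment(tmp_path, target_path):
    """Try to install tmp_path under target_path without holding any lock.

    Returns True if this caller installed the file, and False if another
    process got to the filename first (a soft failure, not an error).
    """
    try:
        os.link(tmp_path, target_path)  # fails if target_path already exists
    except FileExistsError:
        # Someone beat us to the name; leave tmp_path for the caller to
        # reuse or remove.
        return False
    os.unlink(tmp_path)
    return True
```

A caller that gets False can try the next segment name or drop its temporary file, which is the kind of behavior a lock-free variant would have to accept.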