Re: [BUGS] BUG #2712: could not fsync segment: Permission
It seems to me that it's not been included in b3. Tom? //Magnus -Original Message- From: Thomas H. [mailto:[EMAIL PROTECTED] Sent: den 8 november 2006 12:59 To: Magnus Hagander Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission magnus, has this fix made it into the 8.2b3 codebase? at least it fixed 1 of my 3 problems ;-) regards, thomas - Original Message - From: Magnus Hagander [EMAIL PROTECTED] To: Thomas H. [EMAIL PROTECTED] Sent: Monday, October 30, 2006 1:21 PM Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission Yeah, IIRC it was only for the xlog stuff. //Magnus -Original Message- From: Thomas H. [mailto:[EMAIL PROTECTED] Sent: den 30 oktober 2006 13:21 To: Magnus Hagander Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission the fsync error still appears: 2006-10-30 13:14:52 [4896] LOG: 42501: could not fsync segment 0 of relation 1663/3964774/6554576: Permission denied 2006-10-30 13:14:52 [4896] LOCATION: mdsync, md.c:785 2006-10-30 13:14:52 [4896] ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-30 13:14:52 [4896] LOCATION: smgrsync, smgr.c:888 the patch is for the xlog rename error / lockup, right? i'll try to test if the xlog lockup is fixed by the patch later this evening when doing a bulk data load. regards, thomas - Original Message - From: Magnus Hagander [EMAIL PROTECTED] To: Thomas H. [EMAIL PROTECTED] Sent: Monday, October 30, 2006 12:49 PM Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission That's permission denied. Did you by any chanc eunpack tghe EXE into a different directory and then move it into place? Hmm. It may be in the ZIP, but isn't in the MSI. I'll talk to Dave about getting it out of the ZIP. //Magnus -Original Message- From: Thomas H. [mailto:[EMAIL PROTECTED] Sent: den 30 oktober 2006 12:49 To: Magnus Hagander Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission - C:\Program Files\PostgreSQL\8.2beta1\binpg_ctl start -D D:\DB\PostgreSQL-8.2\data Server startet could not open process token: error code 5 - does this help? FYI: postmaster.exe still is (was) in the current beta2 zip file... cheers, thomas - Original Message - From: Magnus Hagander [EMAIL PROTECTED] To: Thomas H. [EMAIL PROTECTED] Sent: Monday, October 30, 2006 12:37 PM Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission I didn't *think* so - it should have no changes affecting that. Can you start it manually frmot he commandprmopt using pg_ctl and see if you get a different error msg on that? WRT postmaster.exe, you can just remove that file, it's not used anymore. And it's not installed by the current MSI installer anymore. //Magnus -Original Message- From: Thomas H. [mailto:[EMAIL PROTECTED] Sent: den 30 oktober 2006 00:18 To: Magnus Hagander Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission hmm thats what i did - b2 is running here all the time (so pg_ctl really is v8.2b2). still, replacing didn't work. so what i meant by build-dependent is, that the new patched 8.2b2 postgres.exe might also need a newer pg_ctl than the one that comes with the b2 package? cheers, thomas - Original Message - From: Magnus Hagander [EMAIL PROTECTED] To: Thomas H. [EMAIL PROTECTED] Sent: Sunday, October 29, 2006 10:37 PM Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission Hi! Yeah, it might be versio ndependent. If you can, please first install 8.2beta2 and *then* replace the files. That's the version I patched and used. //Magnus -Original Message- From: Thomas H. [mailto:[EMAIL PROTECTED] Sent: Sunday, October 29, 2006 10:35 PM To: Magnus Hagander Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission hi magnus sorry for directly writing to you - don't want to bother the others with my noise ;-) unfortunately it cannot even start with the replaced files. (don't mind the service name - i updated from b1 to b2 without changing folder/service names) - The PostgreSQL Database Server 8.2beta1 service is starting. The PostgreSQL Database Server 8.2beta1 service could not be started. A system error has occurred. System error 1067 has occurred. The process terminated unexpectedly. - the eventlog says: pg_ctl: konnte postgres Programmdatei nicht finden (which means: pg_ctl: could not locate postgres file
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Magnus Hagander [EMAIL PROTECTED] writes: It seems to me that it's not been included in b3. Tom? I'm waiting for some report of whether it fixes the problems? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Magnus Hagander [EMAIL PROTECTED] writes: It seems to me that it's not been included in b3. Tom? I'm waiting for some report of whether it fixes the problems? voilĂ : Sent: Tuesday, October 31, 2006 7:23 AM Subject: Re: [BUGS] xlog lockup patch (was: BUG #2712: could not fsync segment: Permission) i've loaded 1gb of data without any xlog-problems, whereas with the 8.2b2 executable it locked up after ~100mb. the xlog-files are cycling... if i need to test for some specific behaviour let me know. after using the patched b2 executable, the db system never locked up again, and no xlog error messages in the logs either. the other problem with locked files within the data directories is unaffected by the patch, obviously - but as that does not lock up the system, its not that severe (but still scary ;-)) regards, thomas ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [BUGS] BUG #2712: could not fsync segment: Permission
I wrote: It might be interesting to think about not requiring the ControlFileLock to be held while making new WAL segments. I think the only reason it does that is to ensure that link/rename failure can be treated as a hard error (because no one could have beat us to the filename), but we're already having to back off that stance for Windows ... Just for the archives: I looked into doing this, and realized that the reason we use a lock here is that on platforms without working link() we use rename() to emplace new WAL segments. And rename() is defined to unlink any existing file of the target name, which would be completely unsafe to do. On platforms with link() we could probably get away without a lock, but I'm disinclined to introduce that much of a platform-specific difference in behavior when we don't have to. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. Utterly untested ... BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't? Ok, I've built a .EXE with this patch. It's compiled without pretty much all other options, hope that still works :-) (Meaning no NLS, no kerberos, no SSL etc) Grab the exe from http://www.hagander.net/download/postgres_renamepatch.zip. Sorry about the delay. //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [BUGS] BUG #2712: could not fsync segment: Permission
thanks alot. using only standard features, so the missing bits shouldn't be a problem. will try to test later tonight or tomorrow. Grab the exe from http://www.hagander.net/download/postgres_renamepatch.zip. report will follow. regards, thomas - Original Message - From: Magnus Hagander [EMAIL PROTECTED] To: Tom Lane [EMAIL PROTECTED] Cc: Peter Brant [EMAIL PROTECTED]; Thomas H. [EMAIL PROTECTED]; pgsql-bugs@postgresql.org; Bruce Momjian [EMAIL PROTECTED] Sent: Sunday, October 29, 2006 6:10 PM Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. Utterly untested ... BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't? Ok, I've built a .EXE with this patch. It's compiled without pretty much all other options, hope that still works :-) (Meaning no NLS, no kerberos, no SSL etc) Grab the exe from http://www.hagander.net/download/postgres_renamepatch.zip. Sorry about the delay. //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
It might be interesting to think about not requiring the ControlFileLock to be held while making new WAL segments. I think the only reason it does that is to ensure that link/rename failure can be treated as a hard error (because no one could have beat us to the filename), but we're already having to back off that stance for Windows ... on a sidenote, i was able to work around the total xlog-lock by ingreasing checkpoint_segments from 3 (default) to 12. that seems enough to have the processes release (timeout?) the filehandles before writer-process wants to rename the xlog file, at least under normal workload. if there is a data load, the lockup still happens, but i can live with that for now. the logs are still being swamped with the other delete error messages, tho: 2006-10-27 16:16:58 [5828] ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-27 16:16:58 [5828] LOCATION: smgrsync, smgr.c:888 2006-10-27 16:16:59 [5828] LOG: 42501: could not fsync segment 0 of relation 1663/3964774/6495380: Permission denied 2006-10-27 16:16:59 [5828] LOCATION: mdsync, md.c:785 2006-10-27 16:16:59 [5828] ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-27 16:16:59 [5828] LOCATION: smgrsync, smgr.c:888 magnus, where you able to do a debug build for me to test the patch? would be nice if a solution could be found for the final 8.2 release. cheers, thomas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #2712: could not fsync segment: Permission
The same problem exists in 8.1 too. See this thread http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php Tom and Magnus tracked down a cause, but I don't think a fix was ever implemented. Thomas seems to have two different issues there: the could not rename file problem on the pg_xlog file is probably explained by the mechanism we identified back then (and I'm not sure why no fix has been installed), but there is no known reason other than antivirus software for the could not fsync problem. As for fixing the problem we do understand: ISTM it's just an awful idea for pgrename and pgunlink to be willing to loop forever. I think they should time out and report the failure after some reasonable period (say between 10 sec and a minute). If we simply made that change, then the behavior when there's an idle backend sitting on a filehandle for an old xlog segment would be that checkpoints would fail at the MoveOfflineLogs stage, which would not be fatal, but it'd be annoying. We'd probably want to further tweak InstallXLogFileSegment so that rename failure isn't an ERROR, at least not on Windows. (I think we could just make it return false, which'd cause the caller to try to delete the xlog segment, which should work even though rename doesn't.) I'm not in a position to test this though. Magnus or Bruce? I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
I'm not in a position to test this though. Magnus or Bruce? I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. a binary would be marvelous. if too much hasle, i can setup a msvc++ 2005 here and try to build it on my own, but would obviously prefere if i won't have to... b2 is installed here, but i'm seeing the same problems, so yes, i'm ready for testing ;-) thanks, thomas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Magnus Hagander [EMAIL PROTECTED] writes: I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. Utterly untested ... BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't? regards, tom lane binIB0etwwbRx.bin Description: xlog-rename.patch.gz ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: I haven't reproduced this on my box. But if you can give me a patch to try I can build binaries for Thomas to test, if he can do testing but not building. Utterly untested ... BTW, why does pgrename have an #if to check either GetLastError() or errno, but pgunlink doesn't? Seems like a bug --- they both should have it. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [BUGS] BUG #2712: could not fsync segment: Permission
As for fixing the problem we do understand: ISTM it's just an awful idea for pgrename and pgunlink to be willing to loop forever. I think they should time out and report the failure after some reasonable period (say between 10 sec and a minute). is the main problem realy in the rename/delete function? while i'm in no position of actually knowing whats going on under the hood, my observations in +10 cases during this afternoon/evening revealed some patterns: it is defenitely the writer process that blocks the db. but in every case the writer process seems to fail to rename the file due to another postgresql still holding a filehandle to the very xlog file that should be renamed. ProcessExplorer lets you force a close of the file handle - as soon as you do this [which is a bad thing to do, i assume], the rename succeeds and processing continues normally. i actually can reproduce the error at will now - i just need do pump enough data into the db (~200mb data seems sufficient) to have it lock up. - thomas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: it is defenitely the writer process that blocks the db. but in every case the writer process seems to fail to rename the file due to another postgresql still holding a filehandle to the very xlog file that should be renamed. Right, all you need is a backend process that's made at least one xlog write and then is left to sit idle for long enough that that xlog file is due for recycling. However, the fact that the writer process is stuck should not in itself cause the DB to lock up. I think what's really happening is that after the writer process gets stuck, the remaining backends chew all the available WAL, and then they need to create more WAL segments for themselves, and the writer process is holding the ControlFileLock so they can't. It might be interesting to think about not requiring the ControlFileLock to be held while making new WAL segments. I think the only reason it does that is to ensure that link/rename failure can be treated as a hard error (because no one could have beat us to the filename), but we're already having to back off that stance for Windows ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #2712: could not fsync segment: Permission
i've installed Filemon (http://www.sysinternals.com/Utilities/Filemon.html) now. this gives more insight what happens to the file. ... D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F This is quite interesting, because it says that Filemon knows how to distinguish a delete pending error from other errors. If we could do that, then my prior worries about ignoring all EACCES errors would go away. What's it looking at exactly? filemon is a kernel mode filter driver. So it's looking at kernel-only datastructures. AFAIK, there is no way to see that from userspace. //Magnus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #2712: could not fsync segment: Permission denied
unfortunately not. and this is not happening with 8.1 regards, thomas - Original Message - From: Tom Lane [EMAIL PROTECTED] To: Thomas H [EMAIL PROTECTED] Cc: pgsql-bugs@postgresql.org Sent: Monday, October 23, 2006 4:07 AM Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied Thomas H [EMAIL PROTECTED] writes: Operating system: windows 2003 standard Description:could not fsync segment: Permission denied The usual answer to this has been that you're running some overenthusiastic antivirus software. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission denied
there is defenitely something terribly wrong in the windows 8.2b1 regarding file access/locking. 2nd total db lockup today due to file access locks (all hold by postmaster): {...} 2006-10-23 17:48:10 LOG: 42501: could not fsync segment 0 of relation 1663/3964774/6419608: Permission denied 2006-10-23 17:48:10 LOCATION: mdsync, md.c:785 2006-10-23 17:48:10 ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-23 17:48:10 LOCATION: smgrsync, smgr.c:888 2006-10-23 17:48:10 LOG: 0: duration: 327.999 ms statement: SELECT threads.*, first.login AS first_user, last.login AS last_user FROM forum.threads JOIN users.users AS first ON first.id = threads.t_first_user LEFT JOIN users.users AS last ON last.id = threads.t_last_user WHERE t_b_id = 4 AND t_status_deleted = false ORDER BY t_status_sticky DESC, t_last_post DESC 2006-10-23 17:48:10 LOCATION: exec_simple_query, postgres.c:1007 2006-10-23 17:48:14 LOG: 0: could not rename file pg_xlog/00010004002E to pg_xlog/000100040037, continuing to try 2006-10-23 17:48:14 LOCATION: pgrename, dirmod.c:142 2006-10-23 18:12:05 LOG: 0: received fast shutdown request 2006-10-23 18:12:05 LOCATION: pmdie, postmaster.c:1903 2006-10-23 18:12:05 LOG: 0: aborting any active transactions 2006-10-23 18:12:05 LOCATION: pmdie, postmaster.c:1910 2006-10-23 18:12:05 FATAL: 57P01: terminating connection due to administrator command 2006-10-23 18:12:05 LOCATION: ProcessInterrupts, postgres.c:2465 2006-10-23 18:12:06 ERROR: XX000: could not rename file pg_xlog/00010004002E to pg_xlog/000100040037 (initialization of log file 4, segment 55): A blocking operation was interrupted by a call to WSACancelBlockingCall. 2006-10-23 18:12:06 LOCATION: InstallXLogFileSegment, xlog.c:2201 {...} from 17:48:14 on pgsql didn't handle anymore queries until shutdown. as soon as one restarts postmaster, the file locks are cleared up. and no, there are no other file locking tools (av scanners and the such) running - 8.1 on the same box (even on same partition) run fine. regarnds, - thomas - Original Message - From: Thomas H. [EMAIL PROTECTED] To: Tom Lane [EMAIL PROTECTED] Cc: pgsql-bugs@postgresql.org Sent: Monday, October 23, 2006 11:52 AM Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied unfortunately not. and this is not happening with 8.1 regards, thomas - Original Message - From: Tom Lane [EMAIL PROTECTED] To: Thomas H [EMAIL PROTECTED] Cc: pgsql-bugs@postgresql.org Sent: Monday, October 23, 2006 4:07 AM Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied Thomas H [EMAIL PROTECTED] writes: Operating system: windows 2003 standard Description:could not fsync segment: Permission denied The usual answer to this has been that you're running some overenthusiastic antivirus software. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #2712: could not fsync segment: Permission
The same problem exists in 8.1 too. See this thread http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php Tom and Magnus tracked down a cause, but I don't think a fix was ever implemented. FWIW, we were bitten by the fsync problem which you noticed too. Unfortunately we were never able to track down a cause (see the mailing list archives). They are separate problems though. Pete Thomas H. [EMAIL PROTECTED] 23.10.2006 18:21 there is defenitely something terribly wrong in the windows 8.2b1 regarding file access/locking. 2nd total db lockup today due to file access locks (all hold by postmaster): 2006-10-23 17:48:10 LOCATION: exec_simple_query, postgres.c:1007 2006-10-23 17:48:14 LOG: 0: could not rename file pg_xlog/00010004002E to pg_xlog/000100040037, continuing to try ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Peter Brant [EMAIL PROTECTED] writes: The same problem exists in 8.1 too. See this thread http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php Tom and Magnus tracked down a cause, but I don't think a fix was ever implemented. Thomas seems to have two different issues there: the could not rename file problem on the pg_xlog file is probably explained by the mechanism we identified back then (and I'm not sure why no fix has been installed), but there is no known reason other than antivirus software for the could not fsync problem. As for fixing the problem we do understand: ISTM it's just an awful idea for pgrename and pgunlink to be willing to loop forever. I think they should time out and report the failure after some reasonable period (say between 10 sec and a minute). If we simply made that change, then the behavior when there's an idle backend sitting on a filehandle for an old xlog segment would be that checkpoints would fail at the MoveOfflineLogs stage, which would not be fatal, but it'd be annoying. We'd probably want to further tweak InstallXLogFileSegment so that rename failure isn't an ERROR, at least not on Windows. (I think we could just make it return false, which'd cause the caller to try to delete the xlog segment, which should work even though rename doesn't.) I'm not in a position to test this though. Magnus or Bruce? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
The same problem exists in 8.1 too. See this thread its only appearing in 8.2 here, i've just rechecked our logs... is there any workaround? how did you get around that problem of having a total lockdown? thanks, thomas Thomas H. [EMAIL PROTECTED] 23.10.2006 18:21 there is defenitely something terribly wrong in the windows 8.2b1 regarding file access/locking. 2nd total db lockup today due to file access locks (all hold by postmaster): 2006-10-23 17:48:10 LOCATION: exec_simple_query, postgres.c:1007 2006-10-23 17:48:14 LOG: 0: could not rename file pg_xlog/00010004002E to pg_xlog/000100040037, continuing to try ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
That might be one cause (or it might otherwise exacerbate the problem), but it isn't the only cause. We weren't running anti-virus software and neither is Thomas. Unfortunately with the last go around, we collectively ran out of ideas before an underlying cause could be identified. Pete Tom Lane [EMAIL PROTECTED] 23.10.2006 19:49 installed), but there is no known reason other than antivirus software for the could not fsync problem. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Move to Linux. :-) In our case, everything but the database servers were already Linux so it was an easy choice. Things have been rock solid since then. Once things get stuck, I don't think there is an alternative besides stop -m immediate. However, since the problem is caused by an idle backend holding onto an old WAL segment, maybe having your middle tier/connection pool close and reopen the connections to the database every so often would function as a workaround. Somebody with more knowledge of PG internals than I would have to define every so often though (if the idea is viable at all). Pete Thomas H. [EMAIL PROTECTED] 23.10.2006 20:00 is there any workaround? how did you get around that problem of having a total lockdown? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Peter Brant [EMAIL PROTECTED] writes: FWIW, we were bitten by the fsync problem which you noticed too. Unfortunately we were never able to track down a cause (see the mailing list archives). They are separate problems though. Actually, now that I look back in the archives, I think we had theorized that the fsync errors come from attempting to fsync a file that's already been deleted but some backend still has a reference to. Apparently that leads to EACCES instead of ENOENT (which the code is already prepared to expect). http://archives.postgresql.org/pgsql-bugs/2006-04/msg00215.php regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Actually, now that I look back in the archives, I think we had theorized that the fsync errors come from attempting to fsync a file that's already been deleted but some backend still has a reference to. Apparently that leads to EACCES instead of ENOENT (which the code is already prepared to expect). with process explorer i can actually check which postgres.exe instance (in all cases i've checked its just 1 instance, and always just 1 file) holds the lock for the file in question. but will that help in determining why it is still holding a reference? the postgres instance that holds the lock eventually closes the filehandle after some minutes. the process itself is not killed but continues thereafter. let me know if i can be of any assistance. since we do regurarly reindex one table whose index size keeps growing despite of often vacuuming, the fsync-problem happens almost 4-5 times per hour. regards, thomas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: with process explorer i can actually check which postgres.exe instance (in all cases i've checked its just 1 instance, and always just 1 file) holds the lock for the file in question. So which one is it? the postgres instance that holds the lock eventually closes the filehandle after some minutes. the process itself is not killed but continues thereafter. That sounds a bit like what I'd expect the bgwriter to do, but the bgwriter is also the one trying to issue the fsync. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [BUGS] BUG #2712: could not fsync segment: Permission
with process explorer i can actually check which postgres.exe instance (in all cases i've checked its just 1 instance, and always just 1 file) holds the lock for the file in question. So which one is it? it's always one of the db-slaves and not logger process, stats collector process or writer process: right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow outnow1 127.0.0.1(2122) idle) that tries to write D:\DB\PostgreSQL-8.2\data\base\3964774\6422331 can i somehow check what object that file-OID belong(ed/s) to? - thomas ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow outnow1 127.0.0.1(2122) idle) that tries to write D:\DB\PostgreSQL-8.2\data\base\3964774\6422331 Do you actually mean it's trying to write that file? Or is it just sitting there holding the open filehandle? can i somehow check what object that file-OID belong(ed/s) to? You can check in pg_class.relfilenode and pg_class.oid of that database to see if you get a match. But our theory is that this table has been deleted ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow outnow1 127.0.0.1(2122) idle) that tries to write D:\DB\PostgreSQL-8.2\data\base\3964774\6422331 Do you actually mean it's trying to write that file? Or is it just sitting there holding the open filehandle? well, hard to tell :-) according to the log-messages i would assume it is *trying* to write. but the file in question isn't physically there anymore, it's just the open file handle that keeps it from vanish totally - you do not have access to the file (permission denied / access denied) if you for example try to read it or its attributes in file explorer. i've installed Filemon (http://www.sysinternals.com/Utilities/Filemon.html) now. this gives more insight what happens to the file. in this case its file 6422806, the first error message appeared at 23:45:21, the last one at 23:45:26 (only short duration this time). {} 23:44:57 postgres.exe:1944 WRITE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS Offset: 16384 Length: 8192 23:44:57 postgres.exe:1944 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:44:57 postgres.exe:1944 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS Options: Open Access: 00010080 23:44:57 postgres.exe:1944 QUERY INFORMATION D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS FileAttributeTagInformation 23:44:57 postgres.exe:1944 DELETE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:44:57 postgres.exe:1944 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:44:57 postgres.exe:1944 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806.1 NOT FOUND Options: Open Access: 00010080 23:44:57 postgres.exe:5364 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:44:57 postgres.exe:2780 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:44:59 postgres.exe:6036 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:45:11 postgres.exe:5196 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:45:20 postgres.exe:1268 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:45:21 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:22 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:23 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:24 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:25 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:26 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F 23:45:26 postgres.exe:5428 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:45:26 postgres.exe:2200 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 23:45:27 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 NOT FOUND Options: Open Access: 0012019F i have earlier log data for this file if needed, but at :45:27 was the last entry. unfortunately i wasn't quick enough to find the blocking process in processviewer, but i guess its pid 5196 can i somehow check what object that file-OID belong(ed/s) to? You can check in pg_class.relfilenode and pg_class.oid of that database to see if you get a match. But our theory is that this table has been deleted ... nothing there as assumed. - thomas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: Do you actually mean it's trying to write that file? Or is it just sitting there holding the open filehandle? well, hard to tell :-) according to the log-messages i would assume it is *trying* to write. The log messages you have don't make it clear which process is trying to do the fsync, but I would expect it to be the bgwriter. (Possibly you should modify log_line_prefix to include PID so we can tell a bit better.) regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [BUGS] BUG #2712: could not fsync segment: Permission
The log messages you have don't make it clear which process is trying to do the fsync, but I would expect it to be the bgwriter. (Possibly you should modify log_line_prefix to include PID so we can tell a bit better.) you're right (as always :-)). its the writer process (pid 5196) that outputs the error messages: 2006-10-24 00:09:09 [5196] ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-24 00:09:09 [5196] LOCATION: smgrsync, smgr.c:888 2006-10-24 00:09:10 [5196] LOG: 42501: could not fsync segment 0 of relation 1663/3964774/6422947: Permission denied 2006-10-24 00:09:10 [5196] LOCATION: mdsync, md.c:785 and in this case, its process 5988 that keeps the file handle open (its entry in pg_class is already deleted): \BaseNamedObjects\pgident: postgres: db_outnow outnow1 127.0.0.1(2362) idle D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 (1 references, 1 handle) ... while pid 5196 constantly tries to open the file (for over 15min in this case), until... 00:22:18 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 DELETE PEND Options: Open Access: 0012019F 00:22:19 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 DELETE PEND Options: Open Access: 0012019F 00:22:20 postgres.exe:5988 CLOSE D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 SUCCESS 00:22:20 postgres.exe:5196 OPEN D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 NOT FOUND Options: Open Access: 0012019F is that of any use? what more logging options would be interesting? - thomas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [BUGS] BUG #2712: could not fsync segment: Permission
Thomas H. [EMAIL PROTECTED] writes: i've installed Filemon (http://www.sysinternals.com/Utilities/Filemon.html) now. this gives more insight what happens to the file. ... D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open Access: 0012019F This is quite interesting, because it says that Filemon knows how to distinguish a delete pending error from other errors. If we could do that, then my prior worries about ignoring all EACCES errors would go away. What's it looking at exactly? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[BUGS] BUG #2712: could not fsync segment: Permission denied
The following bug has been logged online: Bug reference: 2712 Logged by: Thomas H Email address: [EMAIL PROTECTED] PostgreSQL version: 8.2b1 Operating system: windows 2003 standard Description:could not fsync segment: Permission denied Details: sometimes we're seeing loads of errors in the log: 2006-10-22 23:48:50 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:50 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:51 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:51 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:52 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:52 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:53 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:53 ERROR: storage sync failed on magnetic disk: Permission denied {...} when this happens, there are also files locked within the data\base\{dbid}\. access to those files are denied by the os - the files vanish as soon as postmaster ist stopped restarted. i haven't yet found a possible reason - i suspect the error to appear *sometimes* after issuing a VACUUM FULL ANALYZE {tablename} / REINDEX TABLE {tablename}. the hardware is checked and ok. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [BUGS] BUG #2712: could not fsync segment: Permission denied
in verbose mode, the log shows a little bit more: 2006-10-23 03:23:14 LOG: 42501: could not fsync segment 0 of relation 1663/3964774/6411190: Permission denied 2006-10-23 03:23:14 LOCATION: mdsync, md.c:785 2006-10-23 03:23:14 ERROR: XX000: storage sync failed on magnetic disk: Permission denied 2006-10-23 03:23:14 LOCATION: smgrsync, smgr.c:888 - thomas - Original Message - From: Thomas H [EMAIL PROTECTED] To: pgsql-bugs@postgresql.org Sent: Monday, October 23, 2006 1:28 AM Subject: [BUGS] BUG #2712: could not fsync segment: Permission denied The following bug has been logged online: Bug reference: 2712 Logged by: Thomas H Email address: [EMAIL PROTECTED] PostgreSQL version: 8.2b1 Operating system: windows 2003 standard Description:could not fsync segment: Permission denied Details: sometimes we're seeing loads of errors in the log: 2006-10-22 23:48:50 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:50 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:51 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:51 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:52 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:52 ERROR: storage sync failed on magnetic disk: Permission denied 2006-10-22 23:48:53 LOG: could not fsync segment 0 of relation 1663/3964774/6409340: Permission denied 2006-10-22 23:48:53 ERROR: storage sync failed on magnetic disk: Permission denied {...} when this happens, there are also files locked within the data\base\{dbid}\. access to those files are denied by the os - the files vanish as soon as postmaster ist stopped restarted. i haven't yet found a possible reason - i suspect the error to appear *sometimes* after issuing a VACUUM FULL ANALYZE {tablename} / REINDEX TABLE {tablename}. the hardware is checked and ok. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [BUGS] BUG #2712: could not fsync segment: Permission denied
Thomas H [EMAIL PROTECTED] writes: Operating system: windows 2003 standard Description:could not fsync segment: Permission denied The usual answer to this has been that you're running some overenthusiastic antivirus software. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend