Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-11-08 Thread Magnus Hagander
It seems to me that it's not been included in b3. Tom?

//Magnus 

 -Original Message-
 From: Thomas H. [mailto:[EMAIL PROTECTED] 
 Sent: den 8 november 2006 12:59
 To: Magnus Hagander
 Cc: pgsql-bugs@postgresql.org
 Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission
 
 magnus, has this fix made it into the 8.2b3 codebase? at 
 least it fixed 1 of my 3 problems ;-)
 
 regards,
 thomas
 
 - Original Message -
 From: Magnus Hagander [EMAIL PROTECTED]
 To: Thomas H. [EMAIL PROTECTED]
 Sent: Monday, October 30, 2006 1:21 PM
 Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission
 
 
 Yeah, IIRC it was only for the xlog stuff.
 
 //Magnus
 
  -Original Message-
  From: Thomas H. [mailto:[EMAIL PROTECTED]
  Sent: den 30 oktober 2006 13:21
  To: Magnus Hagander
  Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission
 
  the fsync error still appears:
  2006-10-30 13:14:52 [4896] LOG:  42501: could not fsync
  segment 0 of relation 1663/3964774/6554576: Permission denied
  2006-10-30 13:14:52 [4896] LOCATION:  mdsync, md.c:785
  2006-10-30 13:14:52 [4896] ERROR:  XX000: storage sync failed
  on magnetic
  disk: Permission denied
  2006-10-30 13:14:52 [4896] LOCATION:  smgrsync, smgr.c:888
 
  the patch is for the xlog rename error / lockup, right?
 
  i'll try to test if the xlog lockup is fixed by the patch
  later this evening when doing a bulk data load.
 
  regards,
  thomas
 
  - Original Message -
  From: Magnus Hagander [EMAIL PROTECTED]
  To: Thomas H. [EMAIL PROTECTED]
  Sent: Monday, October 30, 2006 12:49 PM
  Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission
 
 
  That's permission denied. Did you by any chanc eunpack tghe 
 EXE into a
  different directory and then move it into place?
 
  Hmm. It may be in the ZIP, but isn't in the MSI. I'll talk to
  Dave about
  getting it out of the ZIP.
 
  //Magnus
 
 
   -Original Message-
   From: Thomas H. [mailto:[EMAIL PROTECTED]
   Sent: den 30 oktober 2006 12:49
   To: Magnus Hagander
   Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission
  
   -
   C:\Program Files\PostgreSQL\8.2beta1\binpg_ctl start -D
   D:\DB\PostgreSQL-8.2\data Server startet could not open
   process token: error code 5
   -
  
   does this help?
  
   FYI: postmaster.exe still is (was) in the current beta2 
 zip file...
  
   cheers,
   thomas
  
   - Original Message -
   From: Magnus Hagander [EMAIL PROTECTED]
   To: Thomas H. [EMAIL PROTECTED]
   Sent: Monday, October 30, 2006 12:37 PM
   Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission
  
  
   I didn't *think* so - it should have no changes affecting
   that. Can you
   start it manually frmot he commandprmopt using pg_ctl and see
   if you get
   a different error msg on that?
  
   WRT postmaster.exe, you can just remove that file, it's not used
   anymore. And it's not installed by the current MSI
  installer anymore.
  
   //Magnus
  
-Original Message-
From: Thomas H. [mailto:[EMAIL PROTECTED]
Sent: den 30 oktober 2006 00:18
To: Magnus Hagander
Subject: Re: [BUGS] BUG #2712: could not fsync segment: 
 Permission
   
hmm thats what i did - b2 is running here all the time (so
pg_ctl really is v8.2b2). still, replacing didn't work. so
what i meant by build-dependent is, that the new patched
8.2b2 postgres.exe might also need a newer pg_ctl than the
one that comes with the b2 package?
   
cheers,
thomas
   
- Original Message -
From: Magnus Hagander [EMAIL PROTECTED]
To: Thomas H. [EMAIL PROTECTED]
Sent: Sunday, October 29, 2006 10:37 PM
Subject: RE: [BUGS] BUG #2712: could not fsync segment: 
 Permission
   
   
Hi!
   
Yeah, it might be versio ndependent. If you can, please
   first install
8.2beta2 and *then* replace the files. That's the version I
patched and
used.
   
//Magnus
   
 -Original Message-
 From: Thomas H. [mailto:[EMAIL PROTECTED]
 Sent: Sunday, October 29, 2006 10:35 PM
 To: Magnus Hagander
 Subject: Re: [BUGS] BUG #2712: could not fsync segment:
  Permission

 hi magnus

 sorry for directly writing to you - don't want to bother the
 others with my noise ;-)

 unfortunately it cannot even start with the replaced files.
 (don't mind the service name - i updated from b1 to b2
 without changing folder/service names)

 -
 The PostgreSQL Database Server 8.2beta1 service is starting.
 The PostgreSQL Database Server 8.2beta1 service could not
be started.
 A system error has occurred.
 System error 1067 has occurred.
 The process terminated unexpectedly.
 -

 the eventlog says:
 pg_ctl: konnte postgres Programmdatei nicht finden (which
 means: pg_ctl: could not locate postgres file

Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-11-08 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 It seems to me that it's not been included in b3. Tom?

I'm waiting for some report of whether it fixes the problems?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-11-08 Thread Thomas H.

Magnus Hagander [EMAIL PROTECTED] writes:

It seems to me that it's not been included in b3. Tom?


I'm waiting for some report of whether it fixes the problems?


voilĂ :


Sent: Tuesday, October 31, 2006 7:23 AM
Subject: Re: [BUGS] xlog lockup patch (was: BUG #2712: could not fsync 
segment: Permission)


i've loaded 1gb of data without any xlog-problems, whereas with the 8.2b2
executable it locked up after ~100mb. the xlog-files are cycling...

if i need to test for some specific behaviour let me know.



after using the patched b2 executable, the db system never locked up again, 
and no xlog error messages in the logs either.


the other problem with locked files within the data directories is 
unaffected by the patch, obviously - but as that does not lock up the 
system, its not that severe (but still scary ;-))


regards,
thomas 




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-11-08 Thread Tom Lane
I wrote:
 It might be interesting to think about not requiring the ControlFileLock
 to be held while making new WAL segments.  I think the only reason it
 does that is to ensure that link/rename failure can be treated as a hard
 error (because no one could have beat us to the filename), but we're
 already having to back off that stance for Windows ...

Just for the archives: I looked into doing this, and realized that the
reason we use a lock here is that on platforms without working link()
we use rename() to emplace new WAL segments.  And rename() is defined
to unlink any existing file of the target name, which would be
completely unsafe to do.

On platforms with link() we could probably get away without a lock,
but I'm disinclined to introduce that much of a platform-specific
difference in behavior when we don't have to.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-29 Thread Magnus Hagander
  I haven't reproduced this on my box. But if you can give me 
 a patch to 
  try I can build binaries for Thomas to test, if he can do 
 testing but 
  not building.
 
 Utterly untested ... BTW, why does pgrename have an #if to 
 check either GetLastError() or errno, but pgunlink doesn't?

Ok, I've built a .EXE with this patch. It's compiled without pretty much
all other options, hope that still works :-) (Meaning no NLS, no
kerberos, no SSL etc)

Grab the exe from
http://www.hagander.net/download/postgres_renamepatch.zip.

Sorry about the delay.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-29 Thread Thomas H.
thanks alot. using only standard features, so the missing bits shouldn't be 
a problem. will try to test later tonight or tomorrow.



Grab the exe from
http://www.hagander.net/download/postgres_renamepatch.zip.


report will follow.

regards,
thomas

- Original Message - 
From: Magnus Hagander [EMAIL PROTECTED]

To: Tom Lane [EMAIL PROTECTED]
Cc: Peter Brant [EMAIL PROTECTED]; Thomas H. 
[EMAIL PROTECTED]; pgsql-bugs@postgresql.org; Bruce Momjian 
[EMAIL PROTECTED]

Sent: Sunday, October 29, 2006 6:10 PM
Subject: RE: [BUGS] BUG #2712: could not fsync segment: Permission



 I haven't reproduced this on my box. But if you can give me
a patch to
 try I can build binaries for Thomas to test, if he can do
testing but
 not building.

Utterly untested ... BTW, why does pgrename have an #if to
check either GetLastError() or errno, but pgunlink doesn't?


Ok, I've built a .EXE with this patch. It's compiled without pretty much
all other options, hope that still works :-) (Meaning no NLS, no
kerberos, no SSL etc)

Grab the exe from
http://www.hagander.net/download/postgres_renamepatch.zip.

Sorry about the delay.

//Magnus



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-27 Thread Thomas H.

It might be interesting to think about not requiring the ControlFileLock
to be held while making new WAL segments.  I think the only reason it
does that is to ensure that link/rename failure can be treated as a hard
error (because no one could have beat us to the filename), but we're
already having to back off that stance for Windows ...


on a sidenote, i was able to work around the total xlog-lock by ingreasing 
checkpoint_segments from 3 (default) to 12. that seems enough to have the 
processes release (timeout?) the filehandles before writer-process wants to 
rename the xlog file, at least under normal workload. if there is a data 
load, the lockup still happens, but i can live with that for now.


the logs are still being swamped with the other delete error messages, tho:

2006-10-27 16:16:58 [5828] ERROR:  XX000: storage sync failed on magnetic 
disk: Permission denied

2006-10-27 16:16:58 [5828] LOCATION:  smgrsync, smgr.c:888
2006-10-27 16:16:59 [5828] LOG:  42501: could not fsync segment 0 of 
relation 1663/3964774/6495380: Permission denied

2006-10-27 16:16:59 [5828] LOCATION:  mdsync, md.c:785
2006-10-27 16:16:59 [5828] ERROR:  XX000: storage sync failed on magnetic 
disk: Permission denied

2006-10-27 16:16:59 [5828] LOCATION:  smgrsync, smgr.c:888

magnus, where you able to do a debug build for me to test the patch? would 
be nice if a solution could be found for the final 8.2 release.


cheers,
thomas 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Magnus Hagander
  The same problem exists in 8.1 too.  See this thread 
  http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php
  Tom and Magnus tracked down a cause, but I don't think a 
 fix was ever 
  implemented.
 
 Thomas seems to have two different issues there: the could 
 not rename file problem on the pg_xlog file is probably 
 explained by the mechanism we identified back then (and I'm 
 not sure why no fix has been installed), but there is no 
 known reason other than antivirus software for the could not 
 fsync problem.
 
 As for fixing the problem we do understand: ISTM it's just an 
 awful idea for pgrename and pgunlink to be willing to loop 
 forever.  I think they should time out and report the failure 
 after some reasonable period (say between 10 sec and a minute).
 
 If we simply made that change, then the behavior when there's 
 an idle backend sitting on a filehandle for an old xlog 
 segment would be that checkpoints would fail at the 
 MoveOfflineLogs stage, which would not be fatal, but it'd be 
 annoying.  We'd probably want to further tweak 
 InstallXLogFileSegment so that rename failure isn't an ERROR, 
 at least not on Windows.  (I think we could just make it 
 return false, which'd cause the caller to try to delete the 
 xlog segment, which should work even though rename doesn't.)
 
 I'm not in a position to test this though.  Magnus or Bruce?

I haven't reproduced this on my box. But if you can give me a patch to
try I can build binaries for Thomas to test, if he can do testing but
not building.

//Magnus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Thomas H.



I'm not in a position to test this though.  Magnus or Bruce?


I haven't reproduced this on my box. But if you can give me a patch to
try I can build binaries for Thomas to test, if he can do testing but
not building.


a binary would be marvelous. if too much hasle, i can setup a msvc++ 2005 
here and try to build it on my own, but would obviously prefere if i won't 
have to...


b2 is installed here, but i'm seeing the same problems, so yes, i'm ready 
for testing ;-)


thanks,
thomas 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 I haven't reproduced this on my box. But if you can give me a patch to
 try I can build binaries for Thomas to test, if he can do testing but
 not building.

Utterly untested ... BTW, why does pgrename have an #if to check
either GetLastError() or errno, but pgunlink doesn't?

regards, tom lane



binIB0etwwbRx.bin
Description: xlog-rename.patch.gz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Bruce Momjian
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  I haven't reproduced this on my box. But if you can give me a patch to
  try I can build binaries for Thomas to test, if he can do testing but
  not building.
 
 Utterly untested ... BTW, why does pgrename have an #if to check
 either GetLastError() or errno, but pgunlink doesn't?

Seems like a bug ---  they both should have it.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Thomas H.

As for fixing the problem we do understand: ISTM it's just an
awful idea for pgrename and pgunlink to be willing to loop
forever.  I think they should time out and report the failure
after some reasonable period (say between 10 sec and a minute).


is the main problem realy in the rename/delete function? while i'm in no 
position of actually knowing whats going on under the hood, my observations 
in +10 cases during this afternoon/evening revealed some patterns:


it is defenitely the writer process that blocks the db. but in every case 
the writer process seems to fail to rename the file due to another 
postgresql still holding a filehandle to the very xlog file that should be 
renamed. ProcessExplorer lets you force a close of the file handle - as soon 
as you do this [which is a bad thing to do, i assume], the rename succeeds 
and processing continues normally.


i actually can reproduce the error at will now - i just need do pump enough 
data into the db (~200mb data seems sufficient) to have it lock up.


- thomas 




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-25 Thread Tom Lane
Thomas H. [EMAIL PROTECTED] writes:
 it is defenitely the writer process that blocks the db. but in every case 
 the writer process seems to fail to rename the file due to another 
 postgresql still holding a filehandle to the very xlog file that should be 
 renamed.

Right, all you need is a backend process that's made at least one xlog
write and then is left to sit idle for long enough that that xlog file
is due for recycling.

However, the fact that the writer process is stuck should not in itself
cause the DB to lock up.  I think what's really happening is that after
the writer process gets stuck, the remaining backends chew all the
available WAL, and then they need to create more WAL segments for
themselves, and the writer process is holding the ControlFileLock so
they can't.

It might be interesting to think about not requiring the ControlFileLock
to be held while making new WAL segments.  I think the only reason it
does that is to ensure that link/rename failure can be treated as a hard
error (because no one could have beat us to the filename), but we're
already having to back off that stance for Windows ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-24 Thread Magnus Hagander
  i've installed Filemon 
  (http://www.sysinternals.com/Utilities/Filemon.html)
  now. this gives more insight what happens to the file.
  ...
  D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: 
  Open
  Access: 0012019F
 
 This is quite interesting, because it says that Filemon knows 
 how to distinguish a delete pending error from other 
 errors.  If we could do that, then my prior worries about 
 ignoring all EACCES errors would go away.  What's it looking 
 at exactly?

filemon is a kernel mode filter driver. So it's looking at kernel-only
datastructures. AFAIK, there is no way to see that from userspace.

//Magnus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] BUG #2712: could not fsync segment: Permission denied

2006-10-23 Thread Thomas H.
unfortunately not. 
and this is not happening with 8.1


regards,
thomas

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]

To: Thomas H [EMAIL PROTECTED]
Cc: pgsql-bugs@postgresql.org
Sent: Monday, October 23, 2006 4:07 AM
Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied



Thomas H [EMAIL PROTECTED] writes:

Operating system:   windows 2003 standard
Description:could not fsync segment: Permission denied


The usual answer to this has been that you're running some
overenthusiastic antivirus software.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission denied

2006-10-23 Thread Thomas H.
there is defenitely something terribly wrong in the windows 8.2b1 regarding 
file access/locking. 2nd total db lockup today due to file access locks (all 
hold by postmaster):


{...}
2006-10-23 17:48:10 LOG:  42501: could not fsync segment 0 of relation 
1663/3964774/6419608: Permission denied

2006-10-23 17:48:10 LOCATION:  mdsync, md.c:785
2006-10-23 17:48:10 ERROR:  XX000: storage sync failed on magnetic disk: 
Permission denied

2006-10-23 17:48:10 LOCATION:  smgrsync, smgr.c:888
2006-10-23 17:48:10 LOG:  0: duration: 327.999 ms  statement: SELECT 
threads.*, first.login AS first_user, last.login AS last_user FROM 
forum.threads JOIN users.users AS first ON first.id = threads.t_first_user 
LEFT JOIN users.users AS last ON last.id = threads.t_last_user WHERE t_b_id 
= 4 AND t_status_deleted = false ORDER BY t_status_sticky DESC, t_last_post 
DESC

2006-10-23 17:48:10 LOCATION:  exec_simple_query, postgres.c:1007
2006-10-23 17:48:14 LOG:  0: could not rename file 
pg_xlog/00010004002E to pg_xlog/000100040037, 
continuing to try

2006-10-23 17:48:14 LOCATION:  pgrename, dirmod.c:142
2006-10-23 18:12:05 LOG:  0: received fast shutdown request
2006-10-23 18:12:05 LOCATION:  pmdie, postmaster.c:1903
2006-10-23 18:12:05 LOG:  0: aborting any active transactions
2006-10-23 18:12:05 LOCATION:  pmdie, postmaster.c:1910
2006-10-23 18:12:05 FATAL:  57P01: terminating connection due to 
administrator command

2006-10-23 18:12:05 LOCATION:  ProcessInterrupts, postgres.c:2465
2006-10-23 18:12:06 ERROR:  XX000: could not rename file 
pg_xlog/00010004002E to pg_xlog/000100040037 
(initialization of log file 4, segment 55): A blocking operation was 
interrupted by a call to WSACancelBlockingCall.

2006-10-23 18:12:06 LOCATION:  InstallXLogFileSegment, xlog.c:2201
{...}

from 17:48:14 on pgsql didn't handle anymore queries until shutdown. as soon 
as one restarts postmaster, the file locks are cleared up.


and no, there are no other file locking tools (av scanners and the such) 
running - 8.1 on the same box (even on same partition) run fine.


regarnds,
- thomas




- Original Message - 
From: Thomas H. [EMAIL PROTECTED]

To: Tom Lane [EMAIL PROTECTED]
Cc: pgsql-bugs@postgresql.org
Sent: Monday, October 23, 2006 11:52 AM
Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied



unfortunately not. and this is not happening with 8.1

regards,
thomas

- Original Message - 
From: Tom Lane [EMAIL PROTECTED]

To: Thomas H [EMAIL PROTECTED]
Cc: pgsql-bugs@postgresql.org
Sent: Monday, October 23, 2006 4:07 AM
Subject: Re: [BUGS] BUG #2712: could not fsync segment: Permission denied



Thomas H [EMAIL PROTECTED] writes:

Operating system:   windows 2003 standard
Description:could not fsync segment: Permission denied


The usual answer to this has been that you're running some
overenthusiastic antivirus software.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
The same problem exists in 8.1 too.  See this thread

http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php 

Tom and Magnus tracked down a cause, but I don't think a fix was ever
implemented.

FWIW, we were bitten by the fsync problem which you noticed too. 
Unfortunately we were never able to track down a cause (see the mailing
list archives).  They are separate problems though.

Pete

 Thomas H. [EMAIL PROTECTED] 23.10.2006 18:21 
there is defenitely something terribly wrong in the windows 8.2b1
regarding 
file access/locking. 2nd total db lockup today due to file access locks
(all 
hold by postmaster):

2006-10-23 17:48:10 LOCATION:  exec_simple_query, postgres.c:1007
2006-10-23 17:48:14 LOG:  0: could not rename file 
pg_xlog/00010004002E to
pg_xlog/000100040037, continuing to try


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Peter Brant [EMAIL PROTECTED] writes:
 The same problem exists in 8.1 too.  See this thread
 http://archives.postgresql.org/pgsql-bugs/2006-04/msg00177.php 
 Tom and Magnus tracked down a cause, but I don't think a fix was ever
 implemented.

Thomas seems to have two different issues there: the could not rename
file problem on the pg_xlog file is probably explained by the mechanism
we identified back then (and I'm not sure why no fix has been
installed), but there is no known reason other than antivirus software
for the could not fsync problem.

As for fixing the problem we do understand: ISTM it's just an awful idea
for pgrename and pgunlink to be willing to loop forever.  I think they
should time out and report the failure after some reasonable period
(say between 10 sec and a minute).

If we simply made that change, then the behavior when there's an idle
backend sitting on a filehandle for an old xlog segment would be that
checkpoints would fail at the MoveOfflineLogs stage, which would not
be fatal, but it'd be annoying.  We'd probably want to further tweak
InstallXLogFileSegment so that rename failure isn't an ERROR, at least
not on Windows.  (I think we could just make it return false, which'd
cause the caller to try to delete the xlog segment, which should work
even though rename doesn't.)

I'm not in a position to test this though.  Magnus or Bruce?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Thomas H.

The same problem exists in 8.1 too.  See this thread


its only appearing in 8.2 here, i've just rechecked our logs...
is there any workaround? how did you get around that problem of having a 
total lockdown?


thanks,
thomas


Thomas H. [EMAIL PROTECTED] 23.10.2006 18:21 

there is defenitely something terribly wrong in the windows 8.2b1
regarding
file access/locking. 2nd total db lockup today due to file access locks
(all
hold by postmaster):

2006-10-23 17:48:10 LOCATION:  exec_simple_query, postgres.c:1007
2006-10-23 17:48:14 LOG:  0: could not rename file
pg_xlog/00010004002E to
pg_xlog/000100040037, continuing to try


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
That might be one cause (or it might otherwise exacerbate the problem),
but it isn't the only cause.  We weren't running anti-virus software and
neither is Thomas.  Unfortunately with the last go around, we
collectively ran out of ideas before an underlying cause could be
identified.

Pete

 Tom Lane [EMAIL PROTECTED] 23.10.2006 19:49 
installed), but there is no known reason other than antivirus software
for the could not fsync problem.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Peter Brant
Move to Linux. :-)  In our case, everything but the database servers
were already Linux so it was an easy choice.  Things have been rock
solid since then.

Once things get stuck, I don't think there is an alternative besides
stop -m immediate.  However, since the problem is caused by an idle
backend holding onto an old WAL segment, maybe having your middle
tier/connection pool close and reopen the connections to the database
every so often would function as a workaround.  Somebody with more
knowledge of PG internals than I would have to define every so often
though (if the idea is viable at all).

Pete

 Thomas H. [EMAIL PROTECTED] 23.10.2006 20:00 
is there any workaround? how did you get around that problem of having
a 
total lockdown?


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Peter Brant [EMAIL PROTECTED] writes:
 FWIW, we were bitten by the fsync problem which you noticed too. 
 Unfortunately we were never able to track down a cause (see the mailing
 list archives).  They are separate problems though.

Actually, now that I look back in the archives, I think we had theorized
that the fsync errors come from attempting to fsync a file that's
already been deleted but some backend still has a reference to.
Apparently that leads to EACCES instead of ENOENT (which the code is
already prepared to expect).

http://archives.postgresql.org/pgsql-bugs/2006-04/msg00215.php

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Thomas H.

Actually, now that I look back in the archives, I think we had theorized
that the fsync errors come from attempting to fsync a file that's
already been deleted but some backend still has a reference to.
Apparently that leads to EACCES instead of ENOENT (which the code is
already prepared to expect).


with process explorer i can actually check which postgres.exe instance (in 
all cases i've checked its just 1 instance, and always just 1 file) holds 
the lock for the file in question. but will that help in determining why it 
is still holding a reference?
the postgres instance that holds the lock eventually closes the filehandle 
after some minutes. the process itself is not killed but continues 
thereafter.


let me know if i can be of any assistance. since we do regurarly reindex one 
table whose index size keeps growing despite of often vacuuming, the 
fsync-problem happens almost 4-5 times per hour.


regards,
thomas 




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Thomas H. [EMAIL PROTECTED] writes:
 with process explorer i can actually check which postgres.exe instance (in 
 all cases i've checked its just 1 instance, and always just 1 file) holds 
 the lock for the file in question.

So which one is it?

 the postgres instance that holds the lock eventually closes the filehandle 
 after some minutes. the process itself is not killed but continues 
 thereafter.

That sounds a bit like what I'd expect the bgwriter to do, but the
bgwriter is also the one trying to issue the fsync.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Thomas H.
with process explorer i can actually check which postgres.exe instance 
(in

all cases i've checked its just 1 instance, and always just 1 file) holds
the lock for the file in question.


So which one is it?


it's always one of the db-slaves and not logger process, stats 
collector process or writer process:


right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow 
outnow1 127.0.0.1(2122) idle) that tries to write 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422331


can i somehow check what object that file-OID belong(ed/s) to?

- thomas 




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Thomas H. [EMAIL PROTECTED] writes:
 right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow 
 outnow1 127.0.0.1(2122) idle) that tries to write 
 D:\DB\PostgreSQL-8.2\data\base\3964774\6422331

Do you actually mean it's trying to write that file?  Or is it just
sitting there holding the open filehandle?

 can i somehow check what object that file-OID belong(ed/s) to?

You can check in pg_class.relfilenode and pg_class.oid of that database
to see if you get a match.  But our theory is that this table has been
deleted ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Thomas H.

Thomas H. [EMAIL PROTECTED] writes:

right now its PID 4844 (\BaseNamedObjects\pgident: postgres: db_outnow
outnow1 127.0.0.1(2122) idle) that tries to write
D:\DB\PostgreSQL-8.2\data\base\3964774\6422331


Do you actually mean it's trying to write that file?  Or is it just
sitting there holding the open filehandle?


well, hard to tell :-)
according to the log-messages i would assume it is *trying* to write. but 
the file in question isn't physically there anymore, it's just the open file 
handle that keeps it from vanish totally - you do not have access to the 
file (permission denied / access denied) if you for example try to read it 
or its attributes in file explorer.


i've installed Filemon (http://www.sysinternals.com/Utilities/Filemon.html) 
now. this gives more insight what happens to the file. in this case its file 
6422806, the first error message appeared at 23:45:21, the last one at 
23:45:26 (only short duration this time).


{}
23:44:57 postgres.exe:1944 WRITE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS Offset: 16384 Length: 
8192
23:44:57 postgres.exe:1944 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:44:57 postgres.exe:1944 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS Options: Open 
Access: 00010080
23:44:57 postgres.exe:1944 QUERY INFORMATION 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS 
FileAttributeTagInformation
23:44:57 postgres.exe:1944 DELETE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:44:57 postgres.exe:1944 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:44:57 postgres.exe:1944 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806.1 NOT FOUND Options: Open 
Access: 00010080
23:44:57 postgres.exe:5364 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:44:57 postgres.exe:2780 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:44:59 postgres.exe:6036 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:45:11 postgres.exe:5196 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:45:20 postgres.exe:1268 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:45:21 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:22 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:23 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:24 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:25 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:26 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
Access: 0012019F
23:45:26 postgres.exe:5428 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:45:26 postgres.exe:2200 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 SUCCESS
23:45:27 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 NOT FOUND Options: Open 
Access: 0012019F


i have earlier log data for this file if needed, but at :45:27 was the last 
entry. unfortunately i wasn't quick enough to find the blocking process in 
processviewer, but i guess its pid 5196



can i somehow check what object that file-OID belong(ed/s) to?


You can check in pg_class.relfilenode and pg_class.oid of that database
to see if you get a match.  But our theory is that this table has been
deleted ...


nothing there as assumed.

- thomas 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Thomas H. [EMAIL PROTECTED] writes:
 Do you actually mean it's trying to write that file?  Or is it just
 sitting there holding the open filehandle?

 well, hard to tell :-)
 according to the log-messages i would assume it is *trying* to write.

The log messages you have don't make it clear which process is trying to
do the fsync, but I would expect it to be the bgwriter.  (Possibly you
should modify log_line_prefix to include PID so we can tell a bit
better.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Thomas H.

The log messages you have don't make it clear which process is trying to
do the fsync, but I would expect it to be the bgwriter.  (Possibly you
should modify log_line_prefix to include PID so we can tell a bit
better.)


you're right (as always :-)). its the writer process (pid 5196) that 
outputs the error messages:


2006-10-24 00:09:09 [5196] ERROR:  XX000: storage sync failed on magnetic 
disk: Permission denied

2006-10-24 00:09:09 [5196] LOCATION:  smgrsync, smgr.c:888
2006-10-24 00:09:10 [5196] LOG:  42501: could not fsync segment 0 of 
relation 1663/3964774/6422947: Permission denied

2006-10-24 00:09:10 [5196] LOCATION:  mdsync, md.c:785

and in this case, its process 5988 that keeps the file handle open (its 
entry in pg_class is already deleted):


\BaseNamedObjects\pgident: postgres: db_outnow outnow1 127.0.0.1(2362) idle
D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 (1 references, 1 handle)

... while pid 5196 constantly tries to open the file (for over 15min in this 
case), until...



00:22:18 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 DELETE PEND Options: Open 
Access: 0012019F
00:22:19 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 DELETE PEND Options: Open 
Access: 0012019F
00:22:20 postgres.exe:5988 CLOSE 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 SUCCESS
00:22:20 postgres.exe:5196 OPEN 
D:\DB\PostgreSQL-8.2\data\base\3964774\6422947 NOT FOUND Options: Open 
Access: 0012019F


is that of any use? what more logging options would be interesting?

- thomas 




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #2712: could not fsync segment: Permission

2006-10-23 Thread Tom Lane
Thomas H. [EMAIL PROTECTED] writes:
 i've installed Filemon (http://www.sysinternals.com/Utilities/Filemon.html) 
 now. this gives more insight what happens to the file.
 ...
 D:\DB\PostgreSQL-8.2\data\base\3964774\6422806 DELETE PEND Options: Open 
 Access: 0012019F

This is quite interesting, because it says that Filemon knows how to
distinguish a delete pending error from other errors.  If we could
do that, then my prior worries about ignoring all EACCES errors would
go away.  What's it looking at exactly?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[BUGS] BUG #2712: could not fsync segment: Permission denied

2006-10-22 Thread Thomas H

The following bug has been logged online:

Bug reference:  2712
Logged by:  Thomas H
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.2b1
Operating system:   windows 2003 standard
Description:could not fsync segment: Permission denied
Details: 

sometimes we're seeing loads of errors in the log:

2006-10-22 23:48:50 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:50 ERROR:  storage sync failed on magnetic disk: Permission
denied
2006-10-22 23:48:51 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:51 ERROR:  storage sync failed on magnetic disk: Permission
denied
2006-10-22 23:48:52 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:52 ERROR:  storage sync failed on magnetic disk: Permission
denied
2006-10-22 23:48:53 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:53 ERROR:  storage sync failed on magnetic disk: Permission
denied
{...}

when this happens, there are also files locked within the data\base\{dbid}\.
access to those files are denied by the os - the files vanish as soon as
postmaster ist stopped  restarted.

i haven't yet found a possible reason - i suspect the error to appear
*sometimes* after issuing a VACUUM FULL ANALYZE {tablename} / REINDEX
TABLE {tablename}.

the hardware is checked and ok.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [BUGS] BUG #2712: could not fsync segment: Permission denied

2006-10-22 Thread Thomas H.

in verbose mode, the log shows a little bit more:

2006-10-23 03:23:14 LOG:  42501: could not fsync segment 0 of relation 
1663/3964774/6411190: Permission denied

2006-10-23 03:23:14 LOCATION:  mdsync, md.c:785
2006-10-23 03:23:14 ERROR:  XX000: storage sync failed on magnetic disk: 
Permission denied

2006-10-23 03:23:14 LOCATION:  smgrsync, smgr.c:888

- thomas

- Original Message - 
From: Thomas H [EMAIL PROTECTED]

To: pgsql-bugs@postgresql.org
Sent: Monday, October 23, 2006 1:28 AM
Subject: [BUGS] BUG #2712: could not fsync segment: Permission denied




The following bug has been logged online:

Bug reference:  2712
Logged by:  Thomas H
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.2b1
Operating system:   windows 2003 standard
Description:could not fsync segment: Permission denied
Details:

sometimes we're seeing loads of errors in the log:

2006-10-22 23:48:50 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:50 ERROR:  storage sync failed on magnetic disk: 
Permission

denied
2006-10-22 23:48:51 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:51 ERROR:  storage sync failed on magnetic disk: 
Permission

denied
2006-10-22 23:48:52 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:52 ERROR:  storage sync failed on magnetic disk: 
Permission

denied
2006-10-22 23:48:53 LOG:  could not fsync segment 0 of relation
1663/3964774/6409340: Permission denied
2006-10-22 23:48:53 ERROR:  storage sync failed on magnetic disk: 
Permission

denied
{...}

when this happens, there are also files locked within the 
data\base\{dbid}\.

access to those files are denied by the os - the files vanish as soon as
postmaster ist stopped  restarted.

i haven't yet found a possible reason - i suspect the error to appear
*sometimes* after issuing a VACUUM FULL ANALYZE {tablename} / REINDEX
TABLE {tablename}.

the hardware is checked and ok.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [BUGS] BUG #2712: could not fsync segment: Permission denied

2006-10-22 Thread Tom Lane
Thomas H [EMAIL PROTECTED] writes:
 Operating system:   windows 2003 standard
 Description:could not fsync segment: Permission denied

The usual answer to this has been that you're running some
overenthusiastic antivirus software.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend