Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Andrew Dunstan
problem area found. see below.
Reini Urban wrote:
Andrew Dunstan schrieb:
Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we 
have deleted successfully.

300 secs (!) fs timeout is really broken.
Looks more like a locking or network timeout issue.
What error codes does unlink(3) return?

success.
Why don't you use DeletFileA() instead of unlink()?
Or even better, why don't you use this delete on close snippet instead:

[snip]
Before I tried anything like that I tried one more thing. I disabled the 
background writer and the problem stopped. So now we know the culprit.


It should only happen a ERROR_SHARING_VIOLATION on NT systems with 
such a long timeout. This is then a concurrency problem. win95 will 
not return ERROR_SHARING_VIOLATION, only ERROR_ACCESS_DENIED


We don't support W95/W98/WME at all. The tests were done on XP-Pro.
cheers
andrew
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Reini Urban
Andrew Dunstan schrieb:
problem area found. see below.
Reini Urban wrote:
Andrew Dunstan schrieb:
Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we 
have deleted successfully.
300 secs (!) fs timeout is really broken.
Looks more like a locking or network timeout issue.
What error codes does unlink(3) return?

success.
Oops! 5min timeout for success is certainly problematic.
Why don't you use DeletFileA() instead of unlink()?
Or even better, why don't you use this delete on close snippet instead:
[snip]
Before I tried anything like that I tried one more thing. I disabled the 
background writer and the problem stopped. So now we know the culprit.
Good! Relieve.
It should only happen a ERROR_SHARING_VIOLATION on NT systems with 
such a long timeout. This is then a concurrency problem. win95 will 
not return ERROR_SHARING_VIOLATION, only ERROR_ACCESS_DENIED

We don't support W95/W98/WME at all. The tests were done on XP-Pro.
Ah sorry. I forgot.
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Andrew Dunstan

Reini Urban wrote:
300 secs (!) fs timeout is really broken.
Looks more like a locking or network timeout issue.
What error codes does unlink(3) return?


success.

Oops! 5min timeout for success is certainly problematic.

You misunderstood. The 300 secs is not in waiting for unlink() to 
return, it is in waiting for its effects to be seen by rmdir() (i.e. for 
the entry to actually be cleared from the directory). unlink() is 
returning very quickly. If the bgwriter is disabled then the entries are 
cleared very quickly too (i.e. before we even get to rmdir())

cheers
andrew
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Reini Urban wrote:
 
 
  300 secs (!) fs timeout is really broken.
  Looks more like a locking or network timeout issue.
  What error codes does unlink(3) return?
 
 
  success.
 
 
  Oops! 5min timeout for success is certainly problematic.
 
 
 
 You misunderstood. The 300 secs is not in waiting for unlink() to 
 return, it is in waiting for its effects to be seen by rmdir() (i.e. for 
 the entry to actually be cleared from the directory). unlink() is 
 returning very quickly. If the bgwriter is disabled then the entries are 
 cleared very quickly too (i.e. before we even get to rmdir())

What I am thinking is that the way we open these files allows us to
delete them while they are still open, but perhaps it doesn't allow us
to actually delete the directory that contains the files.

Also, it is possible bgwriter is keeping some files open in that
directory and that is causing the long delays.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Zeugswetter Andreas DAZ SD

  300 secs (!) fs timeout is really broken.
  Looks more like a locking or network timeout issue.
  What error codes does unlink(3) return?
 
 
  success.
 
 
  Oops! 5min timeout for success is certainly problematic.
 
 
 
 You misunderstood. The 300 secs is not in waiting for unlink() to 
 return, it is in waiting for its effects to be seen by 
 rmdir() (i.e. for 

because the bgwriter obviously keeps them open, no ?

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Before I tried anything like that I tried one more thing. I disabled the 
 background writer and the problem stopped. So now we know the culprit.

Okay.  So what that says is that win32_open's claim to allow unlinking
an open file is a lie; or at least, it does not work the way the
equivalent facility on Unix does.  It sounds to me like Windows is
simply marking the open file as to be deleted on last close --- the
directory entry remains present and so the directory can't be dropped
either.

One relatively low-impact workaround would be to force a checkpoint
(on Windows only) during DROP DATABASE, just before we actually fire
the rmtree() operation.  The bgwriter is already coded to close all its
open files after a checkpoint ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
 

Before I tried anything like that I tried one more thing. I disabled the 
background writer and the problem stopped. So now we know the culprit.
   

Okay.  So what that says is that win32_open's claim to allow unlinking
an open file is a lie; or at least, it does not work the way the
equivalent facility on Unix does.  It sounds to me like Windows is
simply marking the open file as to be deleted on last close --- the
directory entry remains present and so the directory can't be dropped
either.
 

Looks that way to me too.
One relatively low-impact workaround would be to force a checkpoint
(on Windows only) during DROP DATABASE, just before we actually fire
the rmtree() operation.  The bgwriter is already coded to close all its
open files after a checkpoint ...
 

Works for me. If someone gives me a patch I'll be happy to test it.
I did wonder if there should be a call that instead of forcing a flush 
could tell bgwriter just to forget about the file(s) because we're 
discarding them. But that was just idle speculation - I haven't looked 
at bgwriter at all.

cheers
andrew
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Andrew Dunstan

Zeugswetter Andreas DAZ SD wrote:
You misunderstood. The 300 secs is not in waiting for unlink() to 
return, it is in waiting for its effects to be seen by 
rmdir() (i.e. for 
   

because the bgwriter obviously keeps them open, no ?
 

Yes.
cheers
andrew
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 One relatively low-impact workaround would be to force a checkpoint
 (on Windows only) during DROP DATABASE, just before we actually fire
 the rmtree() operation.  The bgwriter is already coded to close all its
 open files after a checkpoint ...

 Works for me. If someone gives me a patch I'll be happy to test it.

Try putting RequestCheckpoint(true) in dbcommands.c just before
remove_dbtablespaces (about line 630).

It looks like the bgwriter is not quite up-to-speed for this, either;
you should rearrange things near line 350 of bgwriter.c so that 
smgrcloseall is performed before marking the checkpoint done in shmem.
Else RequestCheckpoint could come back before the files are all closed.

 I did wonder if there should be a call that instead of forcing a flush 
 could tell bgwriter just to forget about the file(s) because we're 
 discarding them. But that was just idle speculation - I haven't looked 
 at bgwriter at all.

Not necessary, as long as you put the checkpoint after the DropBuffers
call in dbcommands.c.  The bgwriter won't find anything to write.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] rmtree() failure on Windows

2004-10-27 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 You misunderstood. The 300 secs is not in waiting for unlink() to 
 return, it is in waiting for its effects to be seen by rmdir() (i.e. for 
 the entry to actually be cleared from the directory). unlink() is 
 returning very quickly. If the bgwriter is disabled then the entries are 
 cleared very quickly too (i.e. before we even get to rmdir())

More specifically, 300 seconds is the most time you'd have to wait for a
checkpoint to occur (with default checkpoint settings) ...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Andrew Dunstan

Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we have 
deleted successfully.

Bottom line, this is a real mess. Surely postgres is not the only 
application in the world that wants to be able to delete a directory 
tree reliably on Windows. What do other apps do?

cheers
andrew
2004-10-26 10:26:35 [2496] LOG:  connection received: host=127.0.0.1 
port=1918
2004-10-26 10:26:35 [2496] LOG:  connection authorized: user=pgrunner 
database=template1
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1247
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1249
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1255
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1259
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16384
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16386
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16388
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16390
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16392
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16394
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16396
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16398
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16400
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16402
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16404
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16406
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16408
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16410
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16412
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16414
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16416
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16418
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16672
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16674
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16676
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16678
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16679
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16680
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16681
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16682
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16683
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16684
2004-10-26 10:26:35 [2496] DEBUG:  unlinking 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16685
2004-10-26 10:26:35 [2496] 

Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Here is some more info. Below is a trace from dropdb. There is a loop 
 around the rmdir() calls which I have set to time out at 600 seconds. 
 The call eventually succeeds after around 300 seconds (I've seen this 
 several times). It looks like we are the victim of some caching - the 
 directory still thinks it has some of the files it has told us we have 
 deleted successfully.

If you rescan the directory after deleting the files, does it show
as empty?

 Bottom line, this is a real mess. Surely postgres is not the only 
 application in the world that wants to be able to delete a directory 
 tree reliably on Windows. What do other apps do?

I'm wondering if this is a side effect of the way win32_open does
things.  It's hard to believe that rmdir is that bogus in general,
but perhaps win32_open is causing us to exercise a corner case?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
 

Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we have 
deleted successfully.
   

If you rescan the directory after deleting the files, does it show
as empty?
 

No! That's how I got the list of files it still thinks are there. 
Gross, eh?

 

Bottom line, this is a real mess. Surely postgres is not the only 
application in the world that wants to be able to delete a directory 
tree reliably on Windows. What do other apps do?
   

I'm wondering if this is a side effect of the way win32_open does
things.  It's hard to believe that rmdir is that bogus in general,
but perhaps win32_open is causing us to exercise a corner case?
 

I don't know. I tried to reproduce it in a simple case using 
fopen/fclose and wasn't able to.

cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] rmtree() failure on Windows

2004-10-26 Thread Reini Urban
Andrew Dunstan schrieb:
Here is some more info. Below is a trace from dropdb. There is a loop 
around the rmdir() calls which I have set to time out at 600 seconds. 
The call eventually succeeds after around 300 seconds (I've seen this 
several times). It looks like we are the victim of some caching - the 
directory still thinks it has some of the files it has told us we have 
deleted successfully.
300 secs (!) fs timeout is really broken.
Looks more like a locking or network timeout issue.
What error codes does unlink(3) return?
Why don't you use DeletFileA() instead of unlink()?
Or even better, why don't you use this delete on close snippet instead:
HANDLE h;
h = CreateFile (win32_name, 0, FILE_SHARE_READ, sec_none_nih,
	  OPEN_EXISTING, FILE_FLAG_DELETE_ON_CLOSE, 0);
if (h != INVALID_HANDLE_VALUE)
{
	(void) SetFileAttributes (win32_name, (DWORD) win32_name);
	BOOL res = CloseHandle (h);
	//syscall_printf (%d = CloseHandle (%p), res, h);
	if (GetFileAttributes (win32_name) == INVALID_FILE_ATTRIBUTES)
	{
	  //syscall_printf (CreateFile (FILE_FLAG_DELETE_ON_CLOSE) 
succeeded);
	  goto ok;
	}
	  else
	{
	  //syscall_printf (CreateFile (FILE_FLAG_DELETE_ON_CLOSE) failed);
	  SetFileAttributes (win32_name, (DWORD) win32_name  
~(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_SYSTEM));
	}
	}
}

  /* Try a delete with attributes reset */
  if (DeleteFile (win32_name))
{
  syscall_printf (DeleteFile after CreateFile/CloseHandle succeeded);
  goto ok;
}
It should only happen a ERROR_SHARING_VIOLATION on NT systems with such 
a long timeout. This is then a concurrency problem. win95 will not 
return ERROR_SHARING_VIOLATION, only ERROR_ACCESS_DENIED

...
2004-10-26 10:31:09 [2496] WARNING:  rmtree: rmdir took 274 secs/loops
2004-10-26 10:31:09 [2496] LOG:  disconnection: session time: 0:04:34.11 
user=pgrunner database=template1 host=127.0.0.1 port=1918
--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] rmtree() failure on Windows

2004-10-25 Thread Andrew Dunstan
Houston, we have a problem.
Shown below is an extract from the traces of make installcheck in 
contrib. It is decorated with some extra traces I built into 
src/port/dirmod.c::rmtree(). It shows quite reproducible failure of 
rmtree(), mostly at the rmdir calls, but even more worryingly there are 
consistent unlink failures also.

At this stage I have no idea how to go about fixing it.
cheers
andrew

../../src/test/regress/pg_regress init int2 int4 int8 float4 float8 cash 
oid timestamp timestamptz time timetz date interval macaddr inet cidr 
text varchar char bytea bit varbit numeric
== dropping database regression ==
DROP DATABASE

../../src/test/regress/pg_regress cube
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/55646: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/55646

../../src/test/regress/pg_regress dblink
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/69688: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/69688

../../src/test/regress/pg_regress earthdistance
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72850: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72850

../../src/test/regress/pg_regress _int
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72914: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72914

../../src/test/regress/pg_regress ltree
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not unlink 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72987/80059: 
No such file or directory
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/72987

../../src/test/regress/pg_regress pg_trgm
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/80061: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/80061

../../src/test/regress/pg_regress init md5 sha1 hmac-md5 hmac-sha1 
blowfish rijndael crypt-des crypt-md5 crypt-blowfish crypt-xdes
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/82213: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/82213

../../src/test/regress/pg_regress rtree_gist
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not unlink 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/83241/83277: 
No such file or directory
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/83241

../../src/test/regress/pg_regress seg
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not unlink 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/83283/86678: 
No such file or directory
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/83283

../../src/test/regress/pg_regress tablefunc
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/90065: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/90065

../../src/test/regress/pg_regress tsearch
== dropping database regression ==
DROP DATABASE
WARNING:  rmtree could not rmdir 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/92694: 
Directory not empty
WARNING:  could not remove database directory 
C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/92694


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Shown below is an extract from the traces of make installcheck in 
 contrib. It is decorated with some extra traces I built into 
 src/port/dirmod.c::rmtree(). It shows quite reproducible failure of 
 rmtree(), mostly at the rmdir calls, but even more worryingly there are 
 consistent unlink failures also.

I kinda suspect that what you are looking at is a problem with the
delayed-unlinking feature that we built to cope with Windows' inability
to unlink open files, ie, it's being a little too slow to do the
unlinks.  Would you refresh my memory about exactly where and when the
unlink happens if the initial try fails?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
 

Shown below is an extract from the traces of make installcheck in 
contrib. It is decorated with some extra traces I built into 
src/port/dirmod.c::rmtree(). It shows quite reproducible failure of 
rmtree(), mostly at the rmdir calls, but even more worryingly there are 
consistent unlink failures also.
   

I kinda suspect that what you are looking at is a problem with the
delayed-unlinking feature that we built to cope with Windows' inability
to unlink open files, ie, it's being a little too slow to do the
unlinks.  Would you refresh my memory about exactly where and when the
unlink happens if the initial try fails?
 

Same file, although Bruce says that looping code is now redundant, as we 
open files in a way that allows unlinking even if they are open.

cheers
andrew
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I kinda suspect that what you are looking at is a problem with the
 delayed-unlinking feature that we built to cope with Windows' inability
 to unlink open files, ie, it's being a little too slow to do the
 unlinks.  Would you refresh my memory about exactly where and when the
 unlink happens if the initial try fails?
 
 Same file, although Bruce says that looping code is now redundant, as we 
 open files in a way that allows unlinking even if they are open.

Does the problem go away if you insert a sleep 1 at the bottom of the
pg_regress script?  I have been thinking of proposing that anyway,
because on some platforms I consistently get database removal failed:
database regression is being accessed by other users failures from
the contrib regression tests because the previous backend doesn't get
enough time to quit before the next test tries to drop and recreate the
regression database.  I think what you are looking at may be some
variant of the same issue, ie, old backend still running.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Reini Urban
Tom Lane schrieb:
Andrew Dunstan [EMAIL PROTECTED] writes:
Shown below is an extract from the traces of make installcheck in 
contrib. It is decorated with some extra traces I built into 
src/port/dirmod.c::rmtree(). It shows quite reproducible failure of 
rmtree(), mostly at the rmdir calls, but even more worryingly there are 
consistent unlink failures also.

I kinda suspect that what you are looking at is a problem with the
delayed-unlinking feature that we built to cope with Windows' inability
to unlink open files, ie, it's being a little too slow to do the
unlinks.  Would you refresh my memory about exactly where and when the
unlink happens if the initial try fails?
You can have a look into the cygwin sources how we do that :)
kinda problematic.
http://sources.redhat.com/cgi-bin/cvsweb.cgi/winsup/cygwin/delqueue.cc?cvsroot=uberbaum
http://sources.redhat.com/cgi-bin/cvsweb.cgi/winsup/cygwin/syscalls.cc?cvsroot=uberbaum
in short:
if the return status of DeleteFileA() is ERROR_SHARING_VIOLATION, defer 
deletion until the end of the process.
but win95 reports ERROR_ACCESS_DENIED and not ERROR_SHARING_VIOLATION as 
NT does.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
 

Another data point - when rmdir() fails it fails quickly, but when 
unlink (i.e. our pg_unlink()) fails it takes a very long time (minutes) 
to fail. And the file is actually not there. So it looks like we loop 
over and over and keep getting EACCESS, and then get ENOENT, but the 
last one that failed with EACCESS actually succeeded. *sigh*
   

... or someone else deleted it in between our last EACCESS failure and
the ENOENT try.  What someone else would that be?  More than likely,
the same guy who was holding it open to cause the EACCESS failures.
Perhaps there are paths in the code that don't go through win32_open?
 

Well, on looking at the MS API at 
http://msdn.microsoft.com/library/en-us/vclib/html/vcrefRunTimeLibraryReference.asp 
I see that opendir() and friends aren't there, which means we might be 
at the mercy of what mingw does for us.

If I increase the sleep time before dropdb enormously (35 secs) the 
unlink errors seem to go away, and instead they become rmdir errors like 
the rest.

I am wondering if we need to look at using direct calls to the Windows 
API (findfirst() and friends).

cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 If I increase the sleep time before dropdb enormously (35 secs) the 
 unlink errors seem to go away, and instead they become rmdir errors like 
 the rest.

Do you have anything equivalent to ps that you could use to check
whether there is still an old backend alive during this interval?

Does Windows forbid deleting a directory that is the current working
directory of some process (assuming they even have the same concept
of cwd as Unixen)?  If so, is this the same error symptom you would get?
I am wondering if the rmdir failure occurs because an old backend is
still chdir'd into the database directory.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] rmtree() failure on Windows

2004-10-25 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
 

If I increase the sleep time before dropdb enormously (35 secs) the 
unlink errors seem to go away, and instead they become rmdir errors like 
the rest.
   

Do you have anything equivalent to ps that you could use to check
whether there is still an old backend alive during this interval?
Does Windows forbid deleting a directory that is the current working
directory of some process (assuming they even have the same concept
of cwd as Unixen)?  If so, is this the same error symptom you would get?
I am wondering if the rmdir failure occurs because an old backend is
still chdir'd into the database directory.
 

No, testing shows you get permission denied rather than directory not 
empty in this case.

cheers
andrew
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])