Re: [HACKERS] EINTR error in SunOS

2006-01-02 Thread Doug Royer



Doug McNaught wrote:


c) treat EINTR as an I/O error (I don't know how easy this would be)


So then at this point - it is detected, so problem solved?

If a LOCAL hard drive fails to reply, you hang. Same with hard,intr
NFS file system.


bytesRead = read(fd, buffer, requestedBytes);

if (bytesRead < 0) {
    switch (errno) {

    case EAGAIN:
#ifdef USING_RECORD_LOCKING_OR_NON_BLOCKING_IO
        ...do the above read() again...
#else
        /*FALLTHRU*/
#endif
    default:
        ... log error and errno...
        break;
    }

} else if (bytesRead == 0) {
    ...AT EOF...

} else if (bytesRead < requestedBytes) {
    ...if you care, loop on read until
    remaining bytes are fetched
    or at EOF...
}

return(bytesRead);
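
For anyone who wants the outline above in compilable form, here is a minimal sketch. The name read_fully is made up for illustration and is not a PostgreSQL function, and the choice to retry on EINTR is an assumption: whether retrying is the right policy is exactly what this thread is arguing about.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_fully: hypothetical helper, not taken from the PostgreSQL sources.
 * Keeps calling read() until `count` bytes have arrived, EOF is hit,
 * or a real error occurs. Returns the number of bytes read (less than
 * `count` only at EOF), or -1 with errno set on error.
 */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    char   *p = buf;
    size_t  total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* assumption: interrupted, so just retry */
            return -1;          /* caller logs the error and errno */
        }
        if (n == 0)
            break;              /* at EOF */
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

A caller compares the return value with the requested size: a smaller non-negative value means EOF came first, and -1 means errno holds the reason.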




d) say if you mount 'soft' and lose data, tough luck for you


I seem to recall from my days at Sun, you should NOT use soft
mount for NFS writes at all. Soft mounts are for non-critical
disk resources. (Solaris admin manual?)

--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards

begin:vcard
fn:Doug Royer
n:Royer;Doug
org:INET-Consulting.com
adr:;;U.S.A
email;internet:[EMAIL PROTECTED]
title:CEO
tel;work:866-594-8574
tel;fax:866-594-8574
note;quoted-printable:AOL: SupportUnix=0D=0A=
	MSN: [EMAIL PROTECTED]
	Yahoo: Help4Unix
x-mozilla-html:FALSE
url:http://Royer.com
version:2.1
end:vcard



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] EINTR error in SunOS

2006-01-02 Thread Martijn van Oosterhout
On Mon, Jan 02, 2006 at 08:55:47AM -0700, Doug Royer wrote:
 
 
 Doug McNaught wrote:
 
 c) treat EINTR as an I/O error (I don't know how easy this would be)
 
 So then at this point - it is detected, so problem solved?
 
 If a LOCAL hard drive fails to reply, you hang. Same with hard,intr
 NFS file system.

Not really. If a local hard drive fails to respond, the kernel times
out the request and returns EIO to the app. That's the most annoying
thing about NFS. At least even with reading bad floppies where the
kernel keeps retrying, eventually the read() returns and you can
cancel. With NFS, it never returns if the server never comes back.

The kernel is trying to be helpful by returning EINTR to say ok, it
didn't complete. There's no error yet but it may yet work. With local
hard drives if they don't respond, you assume they're broken. When NFS
servers don't respond you assume someone has temporarily pulled a
cable and it will come back soon. Huh?

I would vote for the kernel, if the server didn't respond within 5
seconds, to simply return EIO. At least we know how to handle that...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] EINTR error in SunOS

2006-01-02 Thread Doug McNaught
Martijn van Oosterhout kleptog@svana.org writes:

 I would vote for the kernel, if the server didn't respond within 5
 seconds, to simply return EIO. At least we know how to handle that...

You can do this now by mounting 'soft' and setting the timeout
appropriately.  Whether it's really the best idea, well...

-Doug

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] EINTR error in SunOS

2006-01-02 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 The kernel is trying to be helpful by returning EINTR to say ok, it
 didn't complete. There's no error yet but it may yet work. 

Well it only returns EINTR if a signal was received. 

 With local hard drives if they don't respond, you assume they're broken.
 When NFS servers don't respond you assume someone has temporarily pulled a
 cable and it will come back soon. Huh?

Well firstly with local hard drives you never get EINTR. Interrupts won't be
delivered until after the syscall returns. You don't get EINTR because in the
original BSD implementation it was more efficient to implement it that way and
since disk i/o was always extremely fast it didn't threaten to delay your
signals.

You're mixing up operations timing out with signals being received. The reason
you don't want NFS filesystem operations timing out (and you really don't) is
that it's *possible* it will come back later.

If you're the sysadmin and you're told your NFS server is down so you fix it
and it comes back up properly you should be able to expect that the world
returns to normal.

If you have the soft option enabled then you now have to run around
restarting every other service in your data center because you don't know
which ones might have received an error and crashed.

Worse, if any of those programs failed to notice the error (and they're not
wrong to, traditionally certain operations never signaled errors) then your
data is now corrupt. Some updates have been made but not others, and later
updates may be based on the incorrect data.

Now on the other hand the intr option is entirely reasonable to enable as
long as you know you don't have software that doesn't expect it. It only kicks
in if an actual signal is received, such as the user hitting C-c. Even if the
server comes back 20m later the user isn't going to be upset that his C-c got
handled. The only problem is that some software doesn't expect to get EINTR
and handles it poorly.

 I would vote for the kernel, if the server didn't respond within 5
 seconds, to simply return EIO. At least we know how to handle that...

How do you handle it? By having Postgres shut down? And then the NFS server
comes back and then what?

-- 
greg




Re: [HACKERS] EINTR error in SunOS

2006-01-02 Thread Doug Royer



Greg Stark wrote:


I would vote for the kernel, if the server didn't respond within 5
seconds, to simply return EIO. At least we know how to handle that...



How do you handle it? By having Postgres shut down? And then the NFS server
comes back and then what?


Log the error if you can.
Refuse new connections - until it is back up.
Refuse or hang new queries - until it is back up.

Retry?

What should be done?

--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 I understand put a CHECK_FOR_INTERRUPTS() in the retry-loop may make more
 graceful stop, but it won't work in some cases -- notice that the io
 routines we will patch can be used before the signal mechanism is setup.

I don't think it will help much at all: too many of the operations in
question are invoked in places where CHECK_FOR_INTERRUPTS is a no-op.
Examples:
* disk writes are mostly done by the bgwriter and not backends at all
* unlinks are generally done during xact commit/rollback

Qingqing's point about failures in system()-invoked commands (think
archive_command for PITR) is a mighty good one too.  That puts a
serious crimp into any illusion that we can really fix this in any
reliable way.

regards, tom lane



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Qingqing Zhou


On Sun, 1 Jan 2006, Tom Lane wrote:

 Qingqing Zhou [EMAIL PROTECTED] writes:
  I understand put a CHECK_FOR_INTERRUPTS() in the retry-loop may make more
  graceful stop, but it won't work in some cases -- notice that the io
  routines we will patch can be used before the signal mechanism is setup.

 I don't think it will help much at all: too many of the operations in
 question are invoked in places where CHECK_FOR_INTERRUPTS is a no-op.
 Examples:
 * disk writes are mostly done by the bgwriter and not backends at all
 * unlinks are generally done during xact commit/rollback

Right.

 Qingqing's point about failures in system()-invoked commands (think
 archive_command for PITR) is a mighty good one too.  That puts a
 serious crimp into any illusion that we can really fix this in any
 reliable way.


Not my credit; I just collected Rod & Greg's posts about this here :-) And I'm
still not sure exactly what problem we want to fix here -- I think our
target is that operations should not fail because of EINTR.

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Doug Royer


From the Linux 'nfs' man page:

 intr   If  an  NFS file operation has a major timeout and it is
hard mounted, then allow signals to  interupt  the  file
operation  and  cause  it to return EINTR to the calling
program.  The default is to not allow file operations to
be interrupted.

Solaris 'mount_nfs' man page

 intr | nointr
Allow (do not allow) keyboard interrupts to kill
a  process  that  is  hung  while  waiting for a
response on  a  hard-mounted  file  system.  The
default  is  intr,  which  makes it possible for
clients to interrupt applications  that  may  be
waiting for a remote mount.

The Solaris and Linux defaults seem to be the opposite of each other.

So I think we are saying the same thing.

You can get EINTR with hard+intr mounts.

I am not sure what you get with soft mounts on a timeout.

Doug McNaught wrote:

Doug Royer [EMAIL PROTECTED] writes:



The 'intr' option to NFS is not the same as EINTR. It
means 'if the server does not respond for a while,
then return an EINTR', just like any other disk read()
or write() does when it fails to reply.



No, you're thinking of 'soft'.  'intr' (which is actually a modifier
to the 'hard' setting) causes the I/O to hang until the server comes
back or the process gets a signal (in which case EINTR is returned).

-Doug



--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Doug McNaught
Doug Royer [EMAIL PROTECTED] writes:

  From the Linux 'nfs' man page:

   intr   If  an  NFS file operation has a major timeout and it is
  hard mounted, then allow signals to  interupt  the  file
  operation  and  cause  it to return EINTR to the calling
  program.  The default is to not allow file operations to
  be interrupted.

 Solaris 'mount_nfs' man page

   intr | nointr
  Allow (do not allow) keyboard interrupts to kill
  a  process  that  is  hung  while  waiting for a
  response on  a  hard-mounted  file  system.  The
  default  is  intr,  which  makes it possible for
  clients to interrupt applications  that  may  be
  waiting for a remote mount.

 The Solaris and Linux defaults seem to be the opposite of each other.

Actually they're the same, though differently worded.  Major timeout
means the server has not responded for N milliseconds, not that the
client has decided to time out the request.  If 'hard' is set, the
client will keep trying indefinitely, though you can interrupt it if
you've specified 'intr'.

 So I think we are saying the same thing.

 You can get EINTR with hard+intr mounts.

Yes, *only* if the user specifically decides to send a signal, or if
it uses SIGALRM or whatever.  I agree that if you expect 'intr' to be
used, your code needs to handle EINTR.

 I am not sure what you get with soft mounts on a timeout.

The Linux manpage implies you get EIO.

-Doug



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Bruce Momjian

Let me give you a sky-high view of this.  Database reliability requires
that the disk drive be 100% reliable.  If any part of the disk storage
fails (I/O write failure, NFS failure) we have to assume that the disk
storage is corrupt and the database needs to be restored from backup. 
The NFS failure modes seem to suggest that any kind of NFS failure makes
our storage suspect, meaning we want NFS to be as failure-free as
possible.  Running PostgreSQL on NFS is itself risky, and allowing it to
run on systems that will soft-fail on writes seems even worse.

---

Doug McNaught wrote:
 Doug Royer [EMAIL PROTECTED] writes:
 
   From the Linux 'nfs' man page:
 
intr   If  an  NFS file operation has a major timeout and it is
   hard mounted, then allow signals to  interupt  the  file
   operation  and  cause  it to return EINTR to the calling
   program.  The default is to not allow file operations to
   be interrupted.
 
  Solaris 'mount_nfs' man page
 
intr | nointr
   Allow (do not allow) keyboard interrupts to kill
   a  process  that  is  hung  while  waiting for a
   response on  a  hard-mounted  file  system.  The
   default  is  intr,  which  makes it possible for
   clients to interrupt applications  that  may  be
   waiting for a remote mount.
 
  The Solaris and Linux defaults seem to be the opposite of each other.
 
 Actually they're the same, though differently worded.  Major timeout
 means the server has not responded for N milliseconds, not that the
 client has decided to time out the request.  If 'hard' is set, the
 client will keep trying indefinitely, though you can interrupt it if
 you've specified 'intr'.
 
  So I think we are saying the same thing.
 
  You can get EINTR with hard+intr mounts.
 
 Yes, *only* if the user specifically decides to send a signal, or if
 it uses SIGALRM or whatever.  I agree that if you expect 'intr' to be
 used, your code needs to handle EINTR.
 
  I am not sure what you get with soft mounts on a timeout.
 
 The Linux manpage implies you get EIO.
 
 -Doug
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Doug Royer


Yes - if you assume that EINTR only happens on NFS mounts.
My point is that independent of NFS, the error checking
that I have found in the code is not complete even for
non-NFS file systems.


The read() and write() LINUX man pages do NOT specify that EINTR
is an NFS-only error.

 EINTR  The call was interrupted by a signal before any data was
read.

The read() and write() SOLARIS man pages say:

 EINTR A signal was caught during the read operation  and  no
   data was transferred.

There are other SVR read() and write() errors:

EOVERFLOW (read)
   The file is a regular file, nbyte is greater  than  0,
   the  starting  position is before the end-of-file, and
   the starting position is greater than or equal to  the
   offset  maximum  established in the open file descrip-
   tion associated with fildes.

EDEADLK
   The write was going  to  go  to  sleep   and  cause  a
   deadlock situation to occur.

 EDQUOT
   The user's quota of disk blocks  on  the  file  system
   containing the file has been exhausted.

 EFBIG  (write)
   An attempt is made to write a file  that  exceeds  the
   process's  file  size  limit  or the maximum file size
   (see getrlimit(2) and ulimit(2)).

 EFBIG The file is a regular file, nbyte is greater  than  0,
   and  the starting position is greater than or equal to
   the offset maximum established in the file description
   associated with fildes.

 ENOSPC
   During a write to an ordinary file, there is no   free
   space left on the device.
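
One way to act on a list like this is to sort errno values into a few buckets before deciding what to do. The following is only a sketch: the enum, the function name, and the particular grouping are assumptions made for illustration, not an established policy from any of the man pages quoted above.

```c
#include <errno.h>

/* Hypothetical buckets for this sketch. */
enum io_verdict {
    IO_RETRY,        /* transient: try the call again */
    IO_OUT_OF_SPACE, /* disk, quota, or file-size limit reached */
    IO_FATAL         /* anything else: log errno and give up */
};

enum io_verdict classify_write_errno(int err)
{
    switch (err) {
    case EINTR:      /* interrupted by a signal before any data moved */
    case EAGAIN:     /* non-blocking I/O or record locking in use */
        return IO_RETRY;
    case ENOSPC:     /* no free space left on the device */
    case EDQUOT:     /* user's disk block quota exhausted */
    case EFBIG:      /* file size limit or offset maximum exceeded */
        return IO_OUT_OF_SPACE;
    default:
        return IO_FATAL;
    }
}
```

Anything not listed (EIO, EDEADLK, EOVERFLOW, ...) falls into IO_FATAL here, which matches the "log error and errno" default branch of the earlier outline.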




Bruce Momjian wrote:

Let me give you a sky-high view of this.  Database reliability requires
that the disk drive be 100% reliable.  If any part of the disk storage
fails (I/O write failure, NFS failure) we have to assume that the disk
storage is corrupt and the database needs to be restored from backup. 
The NFS failure modes seem to suggest that any kind of NFS failure makes
our storage suspect, meaning we want NFS to be as failure-free as
possible.  Running PostgreSQL on NFS is itself risky, and allowing it to
run on systems that will soft-fail on writes seems even worse.


--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Doug McNaught
Doug Royer [EMAIL PROTECTED] writes:

 The MOUNT options are opposite.

 Linux NFS mount   - defaults to no-intr
 Solaris NFS mount - defaults to intr

Oh, right--I didn't realize that was what you were talking about.

-Doug



Re: [HACKERS] EINTR error in SunOS

2006-01-01 Thread Doug McNaught
Doug Royer [EMAIL PROTECTED] writes:

 Yes - if you assume that EINTR only happens on NFS mounts.
 My point is that independent of NFS, the error checking
 that I have found in the code is not complete even for
 non-NFS file systems.


 The read() and write() LINUX man pages do NOT specify that EINTR
 is an NFS-only error.

   EINTR  The call was interrupted by a signal before any data was
  read.

Right, but I think that's because read() and write() also work on
sockets and serial ports, which are always interruptible.  I have not
heard of local-disk filesystem code on any Unix I've seen ever giving
EINTR--a process waiting for disk is always in D state, which means
it's not interruptible by signals.  If I have the time maybe I'll
grovel through the Linux sources and verify this, but I'm pretty sure
of it. 

I'm not a PG internals expert by any means, but my $0.02 on this is
that we should:

a) recommend NOT using NFS for the database storage
b) if NFS must be used, recommend 'hard,nointr' mounts
c) treat EINTR as an I/O error (I don't know how easy this would be)
d) say if you mount 'soft' and lose data, tough luck for you

-Doug



Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Qingqing Zhou [EMAIL PROTECTED] writes:
 I have patched IO routines in backend/storage that POSIX says EINTR is
 possible except unlink(). Though POSIX says EINTR is not possible, during
 many regressions, I found it sometimes sets this errno on NFS (I still
 don't know where is the smoking-gun):

 Well there is a reason intr is not the default for NFS mounts. It's precisely
 because it breaks the traditional unix filesystem interface.

Yeah.  We have looked at this before and decided that trying to defend
against it is too invasive and too fragile (how will you ever be sure
you've fixed everyplace, or keep other places from sneaking in later?)

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

regards, tom lane



Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Tom Lane wrote:

 What I'd rather do is document prominently that running a DB over NFS
 isn't recommended, and running it over NFS with interrupts allowed is
 just not going to work.


Agreed. IO syscalls are not the only problem for NFS -- if we can't fix
them all in one go, then let's not do it.

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Rod Taylor
On Sat, 2005-12-31 at 14:40 -0500, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  Qingqing Zhou [EMAIL PROTECTED] writes:
  I have patched IO routines in backend/storage that POSIX says EINTR is
  possible except unlink(). Though POSIX says EINTR is not possible, during
  many regressions, I found it sometimes sets this errno on NFS (I still
  don't know where is the smoking-gun):
 
  Well there is a reason intr is not the default for NFS mounts. It's 
  precisely
  because it breaks the traditional unix filesystem interface.

 What I'd rather do is document prominently that running a DB over NFS
 isn't recommended, and running it over NFS with interrupts allowed is
 just not going to work.

Are there issues with having an archive_command which does things with
NFS based filesystems?

-- 




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 On Sat, 31 Dec 2005, Tom Lane wrote:
 
  What I'd rather do is document prominently that running a DB over NFS
  isn't recommended, and running it over NFS with interrupts allowed is
  just not going to work.
 
 Agreed. IO syscalls are not the only problem for NFS -- if we can't fix
 them all in one go, then let's not do it.

I don't think that's reasonable. The NFS intr option breaks the traditional
unix filesystem semantics which breaks a lot of older or naive programs. But
that's no reason to decide that Postgres can't handle the new semantics.

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard. And Postgres of all systems has the infrastructure necessary to
handle error conditions, abort and roll back the transaction when a file
system error occurs. I think mainly this means it would be possible to hit C-c
or shut down postgres (uncleanly) when there's a network outage.

-- 
greg




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Greg Stark wrote:


 I don't think that's reasonable. The NFS intr option breaks the traditional
 unix filesystem semantics which breaks a lot of older or naive programs. But
 that's no reason to decide that Postgres can't handle the new semantics.


Is EINTR turned off by default in NFS? If so, I don't see that it
will be a problem. Sorry for my limited knowledge; are there any
requirements/benefits that make people turn on EINTR?

 Handling EINTR after all file system calls doesn't sound like it would be
 terribly hard.

The problem is not restricted to the file system. Actually my patched
version (only backend/storage) passed the regression tests hundreds of times
without any problem, but EINTR can hurt other syscalls as well. Finding *all*
the EINTR situations may take a big effort AFAICS.

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 On Sat, 31 Dec 2005, Greg Stark wrote:
 
 
  I don't think that's reasonable. The NFS intr option breaks the traditional
  unix filesystem semantics which breaks a lot of older or naive programs. But
  that's no reason to decide that Postgres can't handle the new semantics.
 
 
 Is EINTR turned off by default in NFS? If so, I don't see that it
 will be a problem. Sorry for my limited knowledge; are there any
 requirements/benefits that make people turn on EINTR?

That's why the intr option (and the soft option) has traditionally not
been enabled by default in NFS implementations. But many people don't like
that when their NFS server disappears their client applications become
unkillable. They like to be able to hit C-c and stop whatever is running.

In the case of Postgres having intr off on the NFS mount point would mean
you couldn't C-c a query stuck because the database is on NFS. Of course it's
not like you would be able to run any more queries after that, but you might
want your terminal back.

You wouldn't even be able to shut down Postgres, even with kill -9. If your
NFS server is unrecoverable and you want to bring up a Postgres instance using
a backup restored some other place you would have to bring it up on another
port or reboot your machine.

That's the kind of thing that leads lots of sysadmins to use the intr and
soft options. And those sysadmins generally aren't aware of these kinds of
consequences since it's more of a programming level issue.

  Handling EINTR after all file system calls doesn't sound like it would be
  terribly hard.
 
 The problem is not restricted to the file system. Actually my patched
 version (only backend/storage) passed the regression tests hundreds of times
 without any problem, but EINTR can hurt other syscalls as well. Finding *all*
 the EINTR situations may take a big effort AFAICS.

Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)

-- 
greg




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sat, 31 Dec 2005, Greg Stark wrote:

 Qingqing Zhou [EMAIL PROTECTED] writes:

 
  Is EINTR turned off by default in NFS? If so, I don't see that it
  will be a problem. Sorry for my limited knowledge; are there any
  requirements/benefits that make people turn on EINTR?

 That's why the intr option (and the soft option) has traditionally not
 been enabled by default in NFS implementations. But many people don't like
 that when their NFS server disappears their client applications become
 unkillable. They like to be able to hit C-c and stop whatever is running.


Thanks Greg and Martin, I now understand intr better :-) So whether we can
kill Postgres or not depends on our signal handler. The Query Cancel signal
won't work because ImmediateInterruptOK forbids it and the retry-style
code in read/write will put the Postgres process into uninterruptible
sleep again. But the die signal will work, I think.

Regards,
Qingqing




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Rod Taylor [EMAIL PROTECTED] writes:

 Are there issues with having an archive_command which does things with
 NFS based filesystems?

Well, whatever command you use for archive_command -- probably just cp if
you're using NFS -- would hang if the NFS server went away. What would happen
then might be interesting. If Postgres finds the archive_command hanging
indefinitely will it correctly avoid recycling the WAL log indefinitely? I
assume so.

What's nonoptimal here is that I don't think there would be any warning that
anything was wrong until the WAL logs eventually filled up their filesystem
and then postgres stopped running. In the meantime your archived WAL logs
would be getting older and older and you would have no indication that
anything was failing.

This was the intention with the NFS error handling. The theory being that
eventually the server comes back up and things resume functioning exactly
where they left off with no lost operations. The upside is you don't have
things failing, then resuming later and unhandled errors in the meantime
leading to data corruption. The downside is there's no way for cp and
ultimately Postgres to know anything's wrong except to have a timeout itself
and an arbitrary maximum amount of time to expect operations to take.
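
As a rough illustration of "have a timeout itself", here is a sketch that runs a shell command and gives up after a caller-chosen number of milliseconds. The function name run_with_timeout and the 10 ms polling interval are assumptions for the example; this is not how Postgres invokes archive_command.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Runs `cmd` via /bin/sh and waits at most `timeout_ms` milliseconds.
 * Returns the shell's exit status, or -1 on fork/wait failure, abnormal
 * termination, or timeout (after sending SIGKILL to the child).
 */
int run_with_timeout(const char *cmd, long timeout_ms)
{
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    long  waited_ms = 0;
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);             /* exec failed */
    }

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        if (waited_ms >= timeout_ms) {
            kill(pid, SIGKILL);
            /* Reap the child. On a hard,nointr mount even SIGKILL may not
             * take effect until the server returns, so this can block. */
            waitpid(pid, &status, 0);
            return -1;
        }
        nanosleep(&tick, NULL); /* poll every 10 ms */
        waited_ms += 10;
    }
}
```

The caveat in the comment is the one raised earlier in the thread: with intr off, a process stuck on the NFS server cannot be killed, so a timeout like this only helps when the mount allows interruption.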

-- 
greg




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Doug Royer


EINTR on read() or write() is not unique to NFS.
It can happen on many file systems - it is just seen
less frequently on most of them.

The code should be able to handle ANY valid read()
and write() errno. And EINTR is documented on Linux, BSD,
Solaris (1 and 2), and POSIX.

Even the Linux man pages say read() and write() can return EINTR.
This can happen on soft-mirrors, SCSI disks, and SOME
other disk drivers when they have errors.

The 'intr' option to NFS is not the same as EINTR. It
means 'if the server does not respond for a while,
then return an EINTR', just like any other disk read()
or write() does when it fails to reply.

I have seen lots of open source code that assumes that all
disk reads and writes work 100% or fail 100%. Many do not
check the return value to see if all data was written or
read from disk. And many do not look at errno at all.
I have NOT looked to see how postgres does it.
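
What checking the return value looks like in practice can be sketched as follows. write_fully is a hypothetical name, and both the retry on EINTR and the treatment of a zero-byte write as ENOSPC are assumptions made for this example.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 once all `count` bytes are written, or -1 with errno set. */
int write_fully(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* assumption: nothing was written, retry */
            return -1;          /* real error: errno says which one */
        }
        if (n == 0) {
            errno = ENOSPC;     /* assumption: no progress means no space */
            return -1;
        }
        p += n;                 /* short write: advance and go again */
        count -= (size_t) n;
    }
    return 0;
}
```

The point is only that a write() which returns fewer bytes than requested is neither a full success nor a failure, so the caller has to loop.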

If storage/*.c is where the reads occur, it does
very LITTLE when checking for errors.



Handling EINTR after all file system calls doesn't sound like it would be
terribly hard.


The problem is not restricted to the file system. Actually my patched
version (only backend/storage) passed the regression tests hundreds of times
without any problem, but EINTR can hurt other syscalls as well. Finding *all*
the EINTR situations may take a big effort AFAICS.



Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)



--

Doug Royer | http://INET-Consulting.com
---|-

  We Do Standards - You Need Standards



Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Doug McNaught
Doug Royer [EMAIL PROTECTED] writes:

 The 'intr' option to NFS is not the same as EINTR. It
 means 'if the server does not respond for a while,
 then return an EINTR', just like any other disk read()
 or write() does when it fails to reply.

No, you're thinking of 'soft'.  'intr' (which is actually a modifier
to the 'hard' setting) causes the I/O to hang until the server comes
back or the process gets a signal (in which case EINTR is returned).

-Doug

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou

Greg Stark [EMAIL PROTECTED] wrote

 Well NFS is only going to affect filesystem calls. If there are other syscalls
 that can signal EINTR on some obscure platform where Postgres isn't handling
 it then that's just a run-of-the-mill porting issue.


Ok, NFS just affects filesystem calls (I mixed it up with another problem). If
possible, I hope we can draw some conclusions / sketch a fix plan here for
future developers who want to come up with a patch. The question is:

Where and how exactly should we fix things in order to support intr NFS on the
server side?

The more details we write down here, the better we can judge whether a plan is
feasible. I could think of these places:

+ direct file system calls
    - open() family, fopen() family in backend/storage
    - scattered open() etc in the whole backend (seems unlink is with
      biggest problem)

The problem of above is if a signal sneaks in, these syscalls will fail.
With a retry, we can fix it.

+ indirect file system calls
    - system(xxx) calls, xxx = cp, etc.

If intr NFS is enabled, what's the problem exactly?
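
The retry idea for the direct file system calls listed above can be sketched
as follows. This is only an illustration under stated assumptions: open_retry
is a hypothetical name, not an existing backend function, and the sketch does
not address the indirect system() case.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * open_retry: hypothetical wrapper used only for this sketch.
 * Repeats open() for as long as it fails with EINTR.
 */
int
open_retry(const char *path, int flags, mode_t mode)
{
    int fd;

    do
    {
        fd = open(path, flags, mode);
    } while (fd < 0 && errno == EINTR);

    return fd;      /* on failure, errno holds an error other than EINTR */
}
```

The same loop shape would apply to the other calls in the list; whether a
blind retry is always the right policy is a separate question.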


Any others?

Regards,
Qingqing






Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 The problem of above is if a signal sneaks in, these syscalls will fail. 
 With a retry, we can fix it.

It's a bit stickier than that but only a bit. If you just retry then you're
saying users have to use kill -9 to get away from the situation. For some
filesystem operations that may be the best we can do. But for most it ought to
be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
or kill -1 normally. Even having the single backend exit (to avoid file
resource leaks) is nicer than having to restart the entire instance.
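
A rough sketch of that distinction, assuming a simple flag in place of the
backend's real interrupt machinery: the loop retries on EINTR but stops once a
cancel has been requested. cancel_requested, note_cancel and
unlink_interruptible are hypothetical names for illustration only; the actual
CHECK_FOR_INTERRUPTS() logic in the backend is more involved.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Stand-in for the backend's pending-interrupt state (hypothetical). */
volatile sig_atomic_t cancel_requested = 0;

/* Signal handler: only records that the user asked to stop. */
void
note_cancel(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/*
 * Retry unlink() on EINTR, but give up if the interruption was a cancel
 * request. Returns 0 on success, -1 on error or cancel (errno stays
 * EINTR in the cancel case).
 */
int
unlink_interruptible(const char *path)
{
    for (;;)
    {
        if (unlink(path) == 0)
            return 0;
        if (errno != EINTR)
            return -1;          /* ordinary failure */
        if (cancel_requested)
            return -1;          /* let the caller abort cleanly */
        /* some unrelated signal: loop and try again */
    }
}
```

With this shape, C-c or kill -1 can still end the operation, while unrelated
signals only cause another attempt.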

-- 
greg




Re: [HACKERS] EINTR error in SunOS

2005-12-31 Thread Qingqing Zhou


On Sun, 1 Jan 2006, Greg Stark wrote:

 Qingqing Zhou [EMAIL PROTECTED] writes:

  The problem of above is if a signal sneaks in, these syscalls will fail.
  With a retry, we can fix it.

 It's a bit stickier than that but only a bit. If you just retry then you're
 saying users have to use kill -9 to get away from the situation. For some
 filesystem operations that may be the best we can do. But for most it ought to
 be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
 or kill -1 normally. Even having the single backend exit (to avoid file
 resource leaks) is nicer than having to restart the entire instance.


I understand that putting a CHECK_FOR_INTERRUPTS() in the retry loop may make
for a more graceful stop, but it won't work in some cases -- note that the I/O
routines we would patch can be used before the signal mechanism is set up.

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2005-12-30 Thread Qingqing Zhou


On Fri, 30 Dec 2005, Tom Lane wrote:

 I've heard of this in connection with NFS ... is your DB on an NFS
 filesystem by any chance?


I have patched the IO routines in backend/storage for which POSIX says EINTR
is possible, which excludes unlink(). Though POSIX says EINTR is not possible
for unlink(), during many regression runs I found that it sometimes sets this
errno on NFS (I still don't know where the smoking gun is):

  TRUNCATE TABLE trunc_c,trunc_d,trunc_e;   -- ok
+ WARNING:  could not remove relation 1663/16384/37822: Interrupted system call

There are many other unlink() calls scattered around the backend, some even
without an error check. Shall we add a pg_unlink for this situation and
replace them, like this:

int
pg_unlink(const char *path, int elevel)
{
    int returnCode;

retry:
    returnCode = unlink(path);
    if (returnCode < 0 && errno == EINTR)
        goto retry;

    if (returnCode < 0)     /* any error other than EINTR */
        elog(elevel, ...);

    return returnCode;
}

Or

pg_unlink(const char* path)
{
/* no elog -- but we still have to do error check */
}

Or

let it be ...

If we decide to do something for unlink(), then we'd better do something
for other EINTR-possible IO routines for fairness :-)

By the way, seems POSIX is not very consistent with EINTR. For example,
closedir() can set EINTR, but opendir()/readdir() can't. Any magic in it?

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2005-12-30 Thread Greg Stark

Qingqing Zhou [EMAIL PROTECTED] writes:

 On Fri, 30 Dec 2005, Tom Lane wrote:
 
  I've heard of this in connection with NFS ... is your DB on an NFS
  filesystem by any chance?
 
 I have patched IO routines in backend/storage that POSIX says EINTR is
 possible except unlink(). Though POSIX says EINTR is not possible, during
 many regressions, I found it sometimes sets this errno on NFS (I still
 don't know where is the smoking-gun):

Well there is a reason intr is not the default for NFS mounts. It's precisely
because it breaks the traditional unix filesystem interface. Syscalls that
historically are not interruptible become interruptible and not all programs
behave properly when that occurs.

In any case POSIX explicitly allows functions to return other errors aside
from those specified as long as it's for error conditions not listed.

[Chapter 2 Section 3, paragraph 6]

  Implementations may support additional errors not included in this list, may
  generate errors included in this list under circumstances other than those
  described here, or may contain extensions or limitations that prevent some
  errors from occurring. The ERRORS section on each reference page specifies
  whether an error shall be returned, or whether it may be returned.
  Implementations shall not generate a different error number from the ones
  described here for error conditions described in this volume of IEEE Std
  1003.1-2001, but may generate additional errors unless explicitly disallowed
  for a particular function


Ironically EINTR *is* singled out to be specifically forbidden to be returned
from some system calls but only those in the Threads option which are mostly
pthread* functions. unlink isn't covered by that prohibition.

-- 
greg




[HACKERS] EINTR error in SunOS

2005-12-29 Thread Qingqing Zhou

I encountered an error today (can't repeat) on SunOS 5.8:

  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
+ ERROR:  could not open relation 1663/16384/37713: Interrupted system call

The reason I guess is the open() call is interrupted by a signal (what
signal BTW?). This error may be specific to SunOS/Solaris, but POSIX does
say that an EINTR is possible on open(), close(), read(), write() and also
the fopen() family:

http://www.opengroup.org/onlinepubs/007908799/xsh/open.html

We have patched read()/write(), shall we do so to open()/close() and also
fopen() family? Patching files other than fd.c seems unnecessary for two
reasons: (1) they are not frequently exercised; (2) they don't have the
basic errno-check code there.
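
How a signal can turn a blocking call into EINTR can be reproduced in
isolation. The sketch below is only an illustration and not the SunOS/NFS
case itself: it installs a SIGALRM handler without SA_RESTART, blocks in
read() on an empty pipe, and returns the errno left behind. The function
names and the 50 ms timer are assumptions made for the example.

```c
#define _XOPEN_SOURCE 700   /* for sigaction() and setitimer() under strict -std modes */

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void
on_alarm(int signo)
{
    (void) signo;               /* nothing to do; arrival is enough */
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns the errno left by read(), or 0 if read() did not fail.
 */
int
errno_of_interrupted_read(void)
{
    int fds[2];
    char c;
    int saved = 0;
    struct sigaction sa, old;
    struct itimerval timer;

    if (pipe(fds) != 0)
        return errno;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;   /* sa_flags stays 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50000;     /* fire once after 50 ms */
    setitimer(ITIMER_REAL, &timer, NULL);

    if (read(fds[0], &c, 1) < 0)
        saved = errno;

    sigaction(SIGALRM, &old, NULL);     /* restore the previous handler */
    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

Installing the handler with SA_RESTART instead would let many calls resume
automatically on systems that honor it, which is one reason the error shows
up only in some configurations.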

Regards,
Qingqing



Re: [HACKERS] EINTR error in SunOS

2005-12-29 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 + ERROR:  could not open relation 1663/16384/37713: Interrupted system call

 The reason I guess is the open() call is interrupted by a signal (what
 signal BTW?).

I've heard of this in connection with NFS ... is your DB on an NFS
filesystem by any chance?

regards, tom lane



Re: [HACKERS] EINTR error in SunOS

2005-12-29 Thread Qingqing Zhou

Tom Lane [EMAIL PROTECTED] wrote
 Qingqing Zhou [EMAIL PROTECTED] writes:
 + ERROR:  could not open relation 1663/16384/37713: Interrupted system call

 The reason I guess is the open() call is interrupted by a signal (what
 signal BTW?).

 I've heard of this in connection with NFS ... is your DB on an NFS
 filesystem by any chance?


Exactly. I guess school machines love NFS.

Regards,
Qingqing 


