Re: select() setting ERESTARTNOHAND (514).
On Thu, Jan 11, 2007 at 11:22:46PM +0100, bert hubert wrote:
>Anything else relevant? Do you know which signal interrupted select? Is this
>a single or multithreaded application? And where did the signal come from?

It is, AFAIK, a multi-threaded application. I don't have any information
on which signal interrupted the process. I'll ask the person who reported
it to me, Doug, to respond with additional information.

>I tried to reproduce your problem in various ways on 2.6.20-rc4, but it
>didn't appear.

Thanks.

Sean
-- 
 [...] Premature optimization is the root of all evil.
                 -- Donald Knuth
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: select() setting ERESTARTNOHAND (514).
On Thu, Jan 11, 2007 at 01:25:16AM -0700, Sean Reifschneider wrote:
> Nope, I haven't looked in strace at all. It's definitely making it to
> user-space. The code in question is (abbreviated):
>
>    if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
>        PyErr_SetFromErrno(PyExc_IOError);
>        return -1;
>    }

Anything else relevant? Do you know which signal interrupted select? Is this
a single or multithreaded application? And where did the signal come from?

I tried to reproduce your problem in various ways on 2.6.20-rc4, but it
didn't appear.

Thanks.

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software
http://netherlabs.nl         Open and Closed source services
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Thursday 11 January 2007 02:02, Neil Brown wrote:
> If regs->rax is unsigned long, then I would think the compiler would
> be allowed to convert
>
>    switch (regs->rax) {
>        case -514 : whatever;
>    }
>
> to a no-op, as regs->rax will never have a negative value.

In C, you never actually compare different types. They are always
promoted to some common type first. Both sides of the (implicit) == here
get promoted to the "biggest" integer type, in this case unsigned long.
"-514" is an int, so it gets sign extended to the width of "long" and
then converted to unsigned long.
--
vda
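Strictly, C99 section 6.8.4.2 says each case constant is converted to the promoted type of the controlling expression; the effect is the one vda describes. The sketch below checks that the unpatched and patched switch forms match the same register value. `KERNEL_ERESTARTNOHAND` and the helper functions are hypothetical names chosen for illustration, and the `(long)` cast relies on GCC's two's-complement behaviour for unsigned-to-signed conversion, which the standard leaves implementation-defined.

```c
#include <assert.h>   /* used by the accompanying checks */
#include <limits.h>   /* ULONG_MAX, used by the accompanying checks */

/* Value of ERESTARTNOHAND in the kernel's include/linux/errno.h.
 * User-space headers do not define it, so it is spelled out here. */
#define KERNEL_ERESTARTNOHAND 514

/* The value the case label -514 takes once converted to unsigned long. */
unsigned long promoted_case_label(void)
{
    return (unsigned long)-KERNEL_ERESTARTNOHAND;
}

/* Unpatched form: switch on the unsigned register value. The case label
 * is converted to unsigned long, so it still matches -514 stored in rax. */
int matches_uncast(unsigned long rax)
{
    switch (rax) {
    case -KERNEL_ERESTARTNOHAND:
        return 1;
    default:
        return 0;
    }
}

/* Patched form: cast to long first, as in the proposed diff. */
int matches_cast(unsigned long rax)
{
    switch ((long)rax) {
    case -KERNEL_ERESTARTNOHAND:
        return 1;
    default:
        return 0;
    }
}
```

Both functions return 1 for `(unsigned long)-514L` and 0 for `514UL`, which is consistent with Neil's later observation that the compiler did not turn the unpatched switch into a no-op.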
Re: select() setting ERESTARTNOHAND (514).
On Wed, Jan 10, 2007 at 05:15:20PM -0800, David Miller wrote:
>If you're only seeing it in strace, that's expected due to some

Nope, I haven't looked in strace at all. It's definitely making it to
user-space. The code in question is (abbreviated):

    if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
        PyErr_SetFromErrno(PyExc_IOError);
        return -1;
    }

which causes the Python interpreter to raise an IOError exception,
including the value of errno, which is 514.

Thanks,
Sean
-- 
 This mountain is PURE SNOW! Do you know what the street value of this
 mountain is!?!                                 -- Better Off Dead
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
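To make the abbreviated fragment concrete, here is a minimal stand-alone sketch of a select()-based sleep. It assumes the caller passes a non-negative number of seconds and roughly mirrors how the interpreter splits a float into the `struct timeval t` that is passed as `&t`; `select_sleep` is a hypothetical name, not CPython's function.

```c
#define _DEFAULT_SOURCE   /* expose POSIX declarations under strict -std modes */
#include <assert.h>       /* used by the accompanying checks */
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>

/* Sleep for a non-negative number of seconds using select() with no file
 * descriptors, to get sub-second resolution.
 * Returns 0 on success or the errno value select() failed with. */
int select_sleep(double secs)
{
    struct timeval t;

    /* Split into whole seconds and microseconds; truncation equals
     * floor() here because secs is assumed non-negative. */
    t.tv_sec = (time_t)secs;
    t.tv_usec = (long)((secs - (double)t.tv_sec) * 1000000.0);

    if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0)
        return errno;   /* EINTR is legitimate here; 514 never should be */
    return 0;
}
```

With no descriptors in any set, select() can only time out (returning 0) or fail, so any nonzero result is an error whose errno the caller reports, exactly the path that surfaced errno=514 as an IOError.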
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Thu, Jan 11, 2007 at 12:02:53PM +1100, Neil Brown wrote:
>On Thursday January 11, [EMAIL PROTECTED] wrote:
>> Normally it should be only visible in strace. Did you see it without
>> strace?
>
>No, only in strace.

I am absolutely seeing it outside of strace. It is showing up as an errno
to the select call:

    if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
        if (errno != EINTR) {
            PyErr_SetFromErrno(PyExc_IOError);
            return -1;
        }
    }

This code is seeing errno=514.

>> > You don't mention in the Email which kernel version you use but I see
>> > from the web page you reference it is 2.6.19.1. I'm using

The production system is running CentOS 4.4, 2.6.9 kernel. However, it
looks to be the same issue all the way up to 2.6.19.1, and Google shows
reports of it on 2.6.17.

Thanks,
Sean
-- 
 George Washington was first in war, first in peace -- and first to have
 his birthday juggled to make a long weekend.
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
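Because the caller only special-cases EINTR, a leaked kernel-internal code falls through to the generic IOError path. The hypothetical helper below sketches one way a caller could tell the cases apart while debugging. The constants are the values from the kernel's include/linux/errno.h of this era (512 to 516, including ENOIOCTLCMD at 515); they are not exported to user space, so they are redefined here under made-up `KERNEL_` names.

```c
#include <assert.h>   /* used by the accompanying checks */
#include <errno.h>

/* Kernel-internal codes from include/linux/errno.h (2.6.19 era). */
#define KERNEL_ERESTARTSYS           512
#define KERNEL_ERESTARTNOINTR        513
#define KERNEL_ERESTARTNOHAND        514
#define KERNEL_ENOIOCTLCMD           515
#define KERNEL_ERESTART_RESTARTBLOCK 516

enum select_errno_class {
    SELECT_INTERRUPTED,  /* EINTR: a signal arrived, as documented */
    SELECT_LEAKED,       /* a kernel-internal code reached user space */
    SELECT_REAL_ERROR    /* any other errno: report it normally */
};

/* Classify the errno left behind by a failed select(). */
enum select_errno_class classify_select_errno(int err)
{
    if (err == EINTR)
        return SELECT_INTERRUPTED;
    if (err >= KERNEL_ERESTARTSYS && err <= KERNEL_ERESTART_RESTARTBLOCK)
        return SELECT_LEAKED;
    return SELECT_REAL_ERROR;
}
```

Logging `SELECT_LEAKED` separately would make reports like this one distinguishable from genuine I/O errors without changing how EINTR is handled.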
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Thursday 11 January 2007 02:02, Neil Brown wrote:
> On Thursday January 11, [EMAIL PROTECTED] wrote:
> > > Just a 'me too' at this point.
> > > The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> > > with 'select' repeatedly returning ERESTARTNOHAND. It is most
> > > annoying!
> >
> > Normally it should be only visible in strace. Did you see it without
> > strace?
>
> No, only in strace.

strace leaks internal errors. At some point that should be fixed, but it's
not really a serious problem.

There was one other report of internal errors leaking without strace,
but it was vague and I never got confirmation.

> Still, I think it would be safer to have the cast, in case the compiler
> decided to be clever or does the C standard ensure against that?

It does.

-Andi
Re: select() setting ERESTARTNOHAND (514).
From: Sean Reifschneider <[EMAIL PROTECTED]>
Date: Wed, 10 Jan 2007 18:04:29 -0700

> On Wed, Jan 10, 2007 at 04:27:47PM -0800, David Miller wrote:
> >It gets caught by the return into userspace code.
>
> Ok, so somehow it is leaking. I have a system in the lab that is the same
> hardware as production, but it currently has no, you know, hard drives in
> it, so some assembly is required. I'll see if I can reproduce it in a test
> environment and then see if I can get more information on when/where it is
> leaking.

If you're only seeing it in strace, that's expected due to some
unfortunate things in the way that x86 and x86_64 handle signal return
events via ptrace().

On sparc and sparc64 I fixed this long ago such that ptrace() will
update the user registers before ptrace parents are notified, and
therefore you'll never see those kernel internal error codes.

The upside of this is that you'll really need to see what value is
making it to the application.

What the kernel is probably doing is looping trying to restart the
system call and sending the signal. If it's doing that the application
is being rewound to call the system call again once the signal handler
returns (if that is even being run at all).
Re: select() setting ERESTARTNOHAND (514).
On Wed, Jan 10, 2007 at 04:27:47PM -0800, David Miller wrote:
>It gets caught by the return into userspace code.

Ok, so somehow it is leaking. I have a system in the lab that is the same
hardware as production, but it currently has no, you know, hard drives in
it, so some assembly is required. I'll see if I can reproduce it in a test
environment and then see if I can get more information on when/where it is
leaking.

>Note that select() only returns these values when signal_pending()
>is true.

Yes, I saw that. I didn't fully understand it, but I saw it.

Thanks,
Sean
-- 
 CChheecckk yyoouurr dduupplleexx sswwiittcchh..
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability
      Back off man. I'm a scientist.              http://HackingSociety.org/
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Thursday January 11, [EMAIL PROTECTED] wrote:
> > Just a 'me too' at this point.
> > The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> > with 'select' repeatedly returning ERESTARTNOHAND. It is most
> > annoying!
>
> Normally it should be only visible in strace. Did you see it without
> strace?

No, only in strace.

> >
> > You don't mention in the Email which kernel version you use but I see
> > from the web page you reference it is 2.6.19.1. I'm using
> > 2.6.18.something.
> >
> > I thought I'd have a quick look at the code, comparing i386 to x86-64
> > and guess what I found.
> >
> > On x86-64, regs->rax is "unsigned long", so the following is
> > needed
>
> regs->rax is unsigned long.
> I don't think your patch will make any difference. What do you think
> it will change?

If regs->rax is unsigned long, then I would think the compiler would
be allowed to convert

   switch (regs->rax) {
       case -514 : whatever;
   }

to a no-op, as regs->rax will never have a negative value.

However it appears that the current compiler doesn't make that
optimisation so I guess I was too hasty.

Still, I think it would be safer to have the cast, in case the compiler
decided to be clever or does the C standard ensure against that?

Sorry for the noise,
NeilBrown
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
From: Neil Brown <[EMAIL PROTECTED]>
Date: Thu, 11 Jan 2007 11:37:05 +1100

> On x86-64, regs->rax is "unsigned long", so the following is
> needed
>
> I haven't tried it yet.

Doesn't type promotion take care of that? Did you verify that
assembler?

I checked the assembler on sparc64 for similar constructs and it
does the right thing.
Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Thursday 11 January 2007 01:37, Neil Brown wrote:
> On Wednesday January 10, [EMAIL PROTECTED] wrote:
> >
> > In looking at the Linux code for ERESTARTNOHAND, I see that
> > include/linux/errno.h says this errno should never make it to the user.
> > However, in this instance we ARE seeing it. Looking around on google shows
> > others are seeing it as well, though hits are few.
> ..
> >
> > Thoughts?
>
> Just a 'me too' at this point.
> The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> with 'select' repeatedly returning ERESTARTNOHAND. It is most
> annoying!

Normally it should be only visible in strace. Did you see it without
strace?

>
> You don't mention in the Email which kernel version you use but I see
> from the web page you reference it is 2.6.19.1. I'm using
> 2.6.18.something.
>
> I thought I'd have a quick look at the code, comparing i386 to x86-64
> and guess what I found.
>
> On x86-64, regs->rax is "unsigned long", so the following is
> needed

regs->rax is unsigned long.
I don't think your patch will make any difference. What do you think
it will change?

-Andi
PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).
On Wednesday January 10, [EMAIL PROTECTED] wrote:
>
> In looking at the Linux code for ERESTARTNOHAND, I see that
> include/linux/errno.h says this errno should never make it to the user.
> However, in this instance we ARE seeing it. Looking around on google shows
> others are seeing it as well, though hits are few.
..
>
> Thoughts?

Just a 'me too' at this point.
The X server on my shiny new notebook (Core 2 Duo) occasionally dies
with 'select' repeatedly returning ERESTARTNOHAND. It is most
annoying!

You don't mention in the Email which kernel version you use but I see
from the web page you reference it is 2.6.19.1. I'm using
2.6.18.something.

I thought I'd have a quick look at the code, comparing i386 to x86-64
and guess what I found.

On x86-64, regs->rax is "unsigned long", so the following is
needed

I haven't tried it yet.

NeilBrown

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./arch/x86_64/kernel/signal.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/arch/x86_64/kernel/signal.c ./arch/x86_64/kernel/signal.c
--- .prev/arch/x86_64/kernel/signal.c	2007-01-11 11:33:27.0 +1100
+++ ./arch/x86_64/kernel/signal.c	2007-01-11 11:34:01.0 +1100
@@ -331,7 +331,7 @@ handle_signal(unsigned long sig, siginfo
 	/* Are we from a system call? */
 	if ((long)regs->orig_rax >= 0) {
 		/* If so, check system call restarting.. */
-		switch (regs->rax) {
+		switch ((long)regs->rax) {
 			case -ERESTART_RESTARTBLOCK:
 			case -ERESTARTNOHAND:
 				regs->rax = -EINTR;
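The hunk above shows only the top of the restart switch. The following is a simplified user-space model of the whole fixup as it looked in this era's handle_signal(), run just before a user signal handler is set up. The ERESTARTSYS and ERESTARTNOINTR branches are reconstructed from a reading of the x86-64 code of that period and should be checked against the real file. `struct model_regs` and `model_restart_fixup` are hypothetical stand-ins for `struct pt_regs` and the in-kernel code, and the case where no handler runs (the kernel restarts the call instead) is not modelled.

```c
#include <assert.h>   /* used by the accompanying checks */
#include <errno.h>

/* Kernel-internal codes from include/linux/errno.h; not in user headers. */
#define KERNEL_ERESTARTSYS           512
#define KERNEL_ERESTARTNOINTR        513
#define KERNEL_ERESTARTNOHAND        514
#define KERNEL_ERESTART_RESTARTBLOCK 516

/* Only the register fields the restart logic touches. */
struct model_regs {
    unsigned long rax;       /* syscall return value (negative errno on failure) */
    unsigned long orig_rax;  /* syscall number, or (unsigned long)-1 if none */
    unsigned long rip;       /* user-space instruction pointer */
};

/* sa_restart is nonzero if the handler being invoked has SA_RESTART. */
void model_restart_fixup(struct model_regs *regs, int sa_restart)
{
    if ((long)regs->orig_rax < 0)
        return;                                /* not from a system call */

    switch ((long)regs->rax) {
    case -KERNEL_ERESTART_RESTARTBLOCK:
    case -KERNEL_ERESTARTNOHAND:
        regs->rax = (unsigned long)-EINTR;     /* user space sees EINTR */
        break;
    case -KERNEL_ERESTARTSYS:
        if (!sa_restart) {
            regs->rax = (unsigned long)-EINTR;
            break;
        }
        /* fall through: SA_RESTART handlers get the call restarted */
    case -KERNEL_ERESTARTNOINTR:
        regs->rax = regs->orig_rax;            /* reload the syscall number */
        regs->rip -= 2;                        /* back up over the 2-byte syscall insn */
        break;
    default:
        break;                                 /* ordinary result: leave it alone */
    }
}
```

In this model, a -514 that reaches user space means this switch was skipped or never matched, which is why the thread focuses on the comparison's signedness and on whether strace is involved.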
Re: select() setting ERESTARTNOHAND (514).
From: Sean Reifschneider <[EMAIL PROTECTED]>
Date: Wed, 10 Jan 2007 16:42:38 -0700

> In looking at the select() code, I see that there are definitely cases
> where sys_select() or sys_pselect7() can return -ERESTARTNOHAND. However,
> I don't know if this is expected to be caught elsewhere, or if returning it
> here would send it back to user-space. Worse, I don't fully understand
> what the impact would be of trapping the ERESTARTNOHAND in the
> sys_select/sys_pselect7 functions.

It gets caught by the return into userspace code.

Specifically the signal dispatch should repair that return value
to a valid error return code when it tries to dispatch the signal
that select() set in the task struct.

Note that select() only returns these values when signal_pending()
is true.
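From user space, the repair described above should be observable: when a caught signal interrupts select(), the call returns -1 with errno set to EINTR, never 514. A minimal sketch to check this on a given machine follows. It assumes a 50 ms one-shot SIGALRM from setitimer() reliably lands while select() is blocked. `select_errno_after_signal` and `on_alarm` are hypothetical names. On Linux, select() is not restarted after a handler even with SA_RESTART, so leaving SA_RESTART unset here is only for clarity.

```c
#define _DEFAULT_SOURCE   /* expose sigaction/setitimer under strict -std modes */
#include <assert.h>       /* used by the accompanying checks */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

static volatile sig_atomic_t got_alarm;

static void on_alarm(int sig)
{
    (void)sig;
    got_alarm = 1;        /* having a handler is what makes select() return */
}

/* Block in select() for up to 2 s, arrange for SIGALRM after 50 ms, and
 * return the errno select() reported (0 if it simply timed out). */
int select_errno_after_signal(void)
{
    struct sigaction sa;
    struct itimerval it;
    struct timeval t = { 2, 0 };
    int rc, saved;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                      /* no SA_RESTART */
    sigaction(SIGALRM, &sa, NULL);

    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 50000;          /* one-shot timer, 50 ms */
    setitimer(ITIMER_REAL, &it, NULL);

    rc = select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t);
    saved = (rc == -1) ? errno : 0;

    signal(SIGALRM, SIG_DFL);             /* restore the default disposition */
    return saved;
}
```

If this returned 514 on the affected dual-core Xeon while returning EINTR elsewhere, that would be the leak the thread is chasing, observed without strace.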
select() setting ERESTARTNOHAND (514).
I've been looking at an issue in Python where a "time.sleep(1)" will
sporadically raise an IOError exception with errno=514. time.sleep() is
implemented with select(), to get sub-second resolution.

In looking at the Linux code for ERESTARTNOHAND, I see that
include/linux/errno.h says this errno should never make it to the user.
However, in this instance we ARE seeing it. Looking around on Google shows
others are seeing it as well, though hits are few.

In looking at the select() code, I see that there are definitely cases
where sys_select() or sys_pselect7() can return -ERESTARTNOHAND. However,
I don't know if this is expected to be caught elsewhere, or if returning it
here would send it back to user-space. Worse, I don't fully understand
what the impact would be of trapping the ERESTARTNOHAND in the
sys_select/sys_pselect7 functions.

Is this something that's intended to be returned back to the user, in which
case the message in include/linux/errno.h should be corrected and people
using time.sleep() in Python will just have to live with it sometimes
raising an exception? Or is it something that definitely should never
reach the user-space code, and there's some leak?

Just to be clear, this is happening only on one machine out of at least 4
where this has been tested. The machine where it's happening is a dual
processor, dual core Xeon 2GHz 51xx series system. The other systems where
it's not happening are single CPU Celeron or P4 class systems, though one
is a 2-year-old quad CPU Xeon running something <2GHz, IIRC.

More details on my investigation are at:

   http://www.tummy.com/journals/entries/jafo_20070110_154659

Thoughts?

Thanks,
Sean
(Not subscribed, I'll use the list archive to follow-up)
-- 
 Electricity travels a foot in a nanosecond.
                 -- Commodore Grace Murray Hopper
Sean Reifschneider, Member of Technical Staff <[EMAIL PROTECTED]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability