Re: Kernel bug with UNIX sockets not detecting other end gone?
On Thu, 17 May 2001, Alan Cox wrote:

> > The following program blocks indefinitely on Linux (2.2, 2.4 not tested).
> > Since the other end is clearly gone, I would expect some sort of error
> > condition. Indeed, FreeBSD gives ECONNRESET.
>
> Since its a datagram socket Im not convinced thats a justifiable assumption.

Hmm - there's definitely a Linux inconsistency here. With SOCK_DGRAM, read() is blocking but write() is giving ECONNRESET.

The ECONNRESET makes sense to me (despite this being a datagram socket), because the sockets are anonymous. Once one end goes away, the other end is pretty useless.

Cheers
Chris

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Kernel bug with UNIX sockets not detecting other end gone?
Hi,

I wonder if the following is a bug? It certainly differs from FreeBSD 4.2 behaviour, which gives the behaviour I would expect.

The following program blocks indefinitely on Linux (2.2, 2.4 not tested). Since the other end is clearly gone, I would expect some sort of error condition. Indeed, FreeBSD gives ECONNRESET.

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, const char* argv[])
{
  int the_sockets[2];
  int retval;
  char the_char;
  int opt = 1;

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, the_sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  close(the_sockets[0]);
  /* Linux (2.2) blocks here; FreeBSD does not */
  retval = read(the_sockets[1], &the_char, sizeof(the_char));
}

Cheers
Chris
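A minimal sketch of how the report above could be probed without hanging the test program. It is an illustration, not part of the original post: the function name `probe_read_after_peer_close` and the use of poll() with a timeout are assumptions made here so that the check returns even on a kernel where read() would block. It does not assert which behaviour a given kernel shows.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous datagram socketpair, close one
 * end, and ask poll() whether the surviving end becomes readable (or
 * reports an error) within timeout_ms.
 *
 * Returns -1 on setup/poll failure, 0 if poll() timed out (a blocking
 * read() would still be waiting), or 1 if the kernel reported readiness.
 * If revents_out is non-NULL it receives the raw poll flags.
 */
int probe_read_after_peer_close(int timeout_ms, short *revents_out)
{
    int sv[2];
    struct pollfd pfd;
    int n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    close(sv[0]);                   /* the "other end" goes away */

    pfd.fd = sv[1];
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (revents_out)
        *revents_out = pfd.revents;
    close(sv[1]);

    if (n < 0)
        return -1;
    return n > 0;
}
```

A return of 0 corresponds to the "blocks" behaviour described for Linux 2.2; a return of 1 means the kernel signalled something (data, error, or hangup) on the surviving socket.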
2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN? (fwd)
Resend (no response first time)

-- Forwarded message --
Date: Wed, 24 Jan 2001 21:09:09 + (GMT)
From: Chris Evans <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: 2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN?

Hi,

Looking at the code for sock_no_fcntl() in net/core.c, I cannot specify "0" as a value for F_SETOWN, unless I'm the superuser.

I believe this to be a bug: it prevents de-registering an interest in SIGURG signals.

Let me know if you want a patch.

Cheers
Chris
Re: Linux 2.4.4-ac10
On Thu, 17 May 2001, Alan Cox wrote:

> 2.4.4-ac10
[...]
> - now 2.4.5pre vm seems sane dump other vmscan
>   experiments

Has anyone benched 2.4.5pre3 vs 2.4.4 vs. ?

Cheers
Chris
Re: Linux 2.4 Scalability, Samba, and Netbench
On Wed, 9 May 2001, Alan Cox wrote:

> > significant problems with lockmeter. csum_partial_copy_generic was the
> > highest % in profile, at 4.34%. I'll see if we can get some space on
>
> Are you using Antons optimisations to samba to use sendfile ?

And you might like to try 2.4.4 (I saw 2.4.0 and 2.4.3 mentioned). 2.4.4 has the zerocopy TCP stuff (or was it 2.4.3 :)

Also, if the load is not disk limited, you might like to try Mingo's pagecache/timers scalability patches.

etc.

Cheers
Chris
Re: [CHECKER] copy_*_user length bugs?
On Wed, 18 Apr 2001, Russell King wrote:

> > Now, providing the malicious user passes a low user space pointer (e.g.
> > just above 0), the kernel's virtual address space wrap check will not
> > trigger because ~0 + ~2Gb does not exceed 4G. And the result is the user
> > being able to read kernel memory.
>
> But ~0 + ~2GB = ~2GB. Last time I checked, ~2GB is less than 3GB, and 3GB
> is the start of kernel memory on x86. Therefore, I don't see that the
> user will be able to read kernel memory.

The problem is that (up to) a 2Gb copy is attempted into userspace. The source is a kernel object which is not 2Gb large! So, we read off the end of some kernel object, and there is often something very interesting after it ;-)

For a good real-world example, please see my Bugtraq post regarding sysctl():
http://www.securityfocus.com/archive/1/161764

Cheers
Chris
Re: [CHECKER] copy_*_user length bugs?
On Wed, 18 Apr 2001, David Schleef wrote:

> On Tue, Apr 17, 2001 at 09:39:15PM -0700, Dawson Engler wrote:
> > Hi All,
> >
> > at the suggestion of Chris ([EMAIL PROTECTED]) I wrote a simple
> > checker to warn when the length parameter to copy_*_user was (1) an
> > integer and (2) not checked < 0.
> >
> > As an example, the ipv6 routine rawv6_geticmpfilter gets an integer 'len'
> > from user space, checks that it is smaller than a struct size and then
> > uses length as an argument to copy_to_user:
> >
> >     if (get_user(len, optlen))
> >         return -EFAULT;
> >     if (len > sizeof(struct icmp6_filter))
> >         len = sizeof(struct icmp6_filter);
> >     if (put_user(len, optlen))
> >         return -EFAULT;
> >     if (copy_to_user(optval, &sk->tp_pinfo.tp_raw.filter, len))
> >         return -EFAULT;
> >
> > Is this a real bug? Or is the checked rule only applicable to
> > __copy_*_user routines rather than copy_*_user routines? (If its a real
> > bug, theres about 8 others that we found).
>
> The len parameter is an unsigned value, so this code is ok as
> long as access_ok() correctly checks that the range to copy
> doesn't stray outside of the userspace range, including the
> possible wraparound for a very large len. access_ok() on i386
> checks for the wraparound. m68k doesn't use it. PowerPC
> is correct, but only because TASK_SIZE is 0x8000. If it
> is ever changed, there could be a problem. I didn't check
> other architectures, because I don't understand the asm.

Incorrect - if the "len" variable is a signed integer, this is a nasty bug.

To justify this, consider if len were set to minus 2 billion. This will pass the sanity check, and pass the value straight on to copy_to_user. The copy_to_user parameter is unsigned, so this value becomes approximately +2Gb.

Now, providing the malicious user passes a low user space pointer (e.g. just above 0), the kernel's virtual address space wrap check will not trigger because ~0 + ~2Gb does not exceed 4G.

And the result is the user being able to read kernel memory.

Cheers
Chris
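The arithmetic in the argument above can be checked in userspace. This is a sketch only: `as_unsigned_len` and `range_wraps32` are hypothetical helpers written for illustration, and they model a 32-bit address space with plain integers rather than any real kernel check.

```c
#include <stdint.h>

/*
 * What an unsigned 32-bit size parameter sees when handed a signed
 * 32-bit length. The conversion is well-defined in C: the value is
 * reduced modulo 2^32.
 */
uint32_t as_unsigned_len(int32_t len)
{
    return (uint32_t)len;
}

/*
 * Return 1 if the range [addr, addr + len) runs past the top of a
 * 32-bit address space (i.e. addr + len exceeds 4G), else 0.
 */
int range_wraps32(uint32_t addr, uint32_t len)
{
    return (uint64_t)addr + (uint64_t)len > UINT64_C(0x100000000);
}
```

With len = -2000000000 the unsigned value is 2294967296, a little over 2Gb. Added to a low pointer such as 0x1000 it stays below 4G, so a wrap check of this shape does not fire; added to 0xC0000000 it would.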
Re: [CHECKER] security rules?
Hi Dawson,

Excellent project. Can I suggest that you check for signedness issues? A typical signature of a signedness problem is:

int i = get_from_userspace_somehow();
/* Sanity check i */
if (i > MAX_LEN_FOR_I)
  goto bad_bad_out;
/* Bug here!! i can be negative! */

I suspect you will find a lot of these sorts of errors. I've already nailed a few.

Cheers
Chris

On Fri, 13 Apr 2001, Dawson Engler wrote:

>
> We're looking at making a set of security checkers. Does anyone have
> suggestions for good things to go after in addition to the usual
> copy_*_user and buffer overrun bugs? For example, are there any
> documents that describe the rules for when/how 'capable' is supposed to
> be used?
>
> Thanks for any help,
> Dawson
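The pattern above, next to one possible corrected form, as a standalone sketch. The value of `MAX_LEN_FOR_I` and both function names are made up for illustration. The sketch assumes the limit is a plain signed int constant; if the limit were a sizeof expression, the usual arithmetic conversions on common platforms would make the comparison unsigned and it would behave differently.

```c
#define MAX_LEN_FOR_I 128   /* hypothetical limit, a signed int constant */

/* Upper-bound check only: returns 1 ("looks sane") for any negative i. */
int buggy_len_ok(int i)
{
    if (i > MAX_LEN_FOR_I)
        return 0;
    return 1;               /* bug: i can be negative here */
}

/* Both bounds checked: negative values are rejected as well. */
int checked_len_ok(int i)
{
    return i >= 0 && i <= MAX_LEN_FOR_I;
}
```

For i = -2000000000, buggy_len_ok() accepts the value and checked_len_ok() rejects it. Declaring the variable as an unsigned type in the first place is another way to close the hole.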
Re: [PATCH] Non-root sshd and capabilities
[cc: security-audit, because it's interesting :-)]

On Sun, 18 Mar 2001, Topi Miettinen wrote:

> (Please cc: me, I'm not subscribed.)
>
> Using the magical prctl() call it's possible to run daemons as non-root
> while still possessing some capabilities. For full support, patched kernel
> with ext2 capabilities is required, but if the daemon doesn't exec()
> anything (for example, by emulating exec() with mmap()), stock 2.4 is
> enough.

Kernel 2.2.18 (I think) also added this prctl().

> This works well for programs like pppd, hwclock and XFree86. There is a
> problem if the daemon uses setuid() and setgid() to change identity, like
> sshd or cron. In function cap_emulate_setxuid() (in kernel/sys.c) the
> capabilities are cleared when IDs are switched. However, the check misses
> the case where old_*uid are already nonzero. This patch attempts to fix
> the problem.
[...]
> Any suggestions?

No comments on the patch/bug you've highlighted, but I've got some comments on the general approach.

Firstly, changing sshd so it runs with minimal privilege is an excellent project. You only need to look at the recent deattack.c vulnerability to see why. I was going to tackle this once I finished vsftpd (also makes use of capabilities and the prctl()).

However, I don't think running any daemon with CAP_SETUID can be considered running with "minimal privilege". With CAP_SETUID, you can change your uid to the owner of any number of critical system files, and gain full access, as if you hadn't bothered using capabilities at all. Even inside a chroot() jail, you have to be careful with CAP_SETUID. Think "ptrace(), sysctl()".

Of course, _something_ needs to have CAP_SETUID, otherwise you cannot switch to the authenticated userid at all. The solution is to have a minimal privileged helper process, which takes authentication details from the main sshd process over a pipe or socket. The helper process carefully validates the authentication details, and if they are correct, switches to the authenticated user, drops privileges, and runs some action on behalf of sshd.

The above is a bit of hassle, but extremely powerful and secure. If you also throw in a bit of chroot(), you can make future sshd holes very low severity indeed.

For bonus points, make sure that sensitive information such as the private host key is only accessible to the privileged helper. Trickier (maybe not feasible), but useful.

Finally, a compromised sshd session should not be able to compromise other sshd sessions. This can be accomplished by ensuring the sshd session processes all have "dumpable == 0" in the kernel, e.g. by starting sshd as root and doing setuid() to some other userid without any exec().

Cheers
Chris
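A short, Linux-specific sketch related to the "dumpable == 0" point. The message reaches that state by doing setuid() from root without an exec(); the sketch below instead clears the flag directly with prctl(PR_SET_DUMPABLE), which is a different route chosen here only because it can be tried without root. The helper name `make_undumpable` is an assumption for illustration.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: clear the calling process's "dumpable" flag and
 * return the value PR_GET_DUMPABLE reports afterwards (0 when cleared),
 * or -1 if prctl() failed.
 */
int make_undumpable(void)
{
    if (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}
```

A session process would call this early, before handling any untrusted input. It does not replace the privilege separation described above; it only covers the last point about one session interfering with another.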
Re: system hang with "__alloc_page: 1-order allocation failed"
On Tue, 13 Mar 2001, Manfred Spraul wrote:

> * bugfixes for get_pid(). This is the longest part of the patch, but
> it's only necessary if you have more than 10.000 threads running. If you
> have enough memory: launch a forkbomb. If ~ 32760 thread are running the
> kernel enters an endless loop in get_pid() (or around 11000 threads if
> they intentionally create additional sessions and process groups)

I thought (on Intel) there was a 4092 hard limit?

Cheers
Chris
Re: [patch][rfc][rft] vm throughput 2.4.2-ac4
On Thu, 1 Mar 2001, Rik van Riel wrote:

> True. I think we want something in-between our ideas...
  ^^^
> a while. This should make it possible for the disk reads to
  ^^

Oh dear.. not more "vm design by waving hands in the air". Come on people, improve the vm by careful profiling, tweaking and benching, not by throwing random patches in that seem cool in theory.

Cheers
Chris
Re: 2.4.1 under heavy network load - more info
On Wed, 21 Feb 2001, Rik van Riel wrote:

> I'm really interested in things which make Linux 2.4 break
> performance-wise since I'd like to have them fixed before the
> distributions start shipping 2.4 as default.

Hi Rik,

With kernel 2.4.1, I found that caching is way too aggressive. I was running konqueror in 32Mb (the quest for a lightweight browser!)

Unfortunately, the system seemed to insist on keeping 16Mb used for caches, with 15Mb given to the application and X. This led to a lot of swapping and paging by konqueror. I think the browser would be fully usable in 32Mb, were the caching not out of balance.

Cheers
Chris
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Mon, 19 Feb 2001 [EMAIL PROTECTED] wrote:

> Wakeup does not happen until _enough_ (1/3 of snbuf) of space in sndbuf
> is released, otherwise you will overschedule. So, as soon as
> write() goes to sleep, it will sleep waiting until 1/3 is released.

Of course. Thank you.

> If it is interrupted, it use all the released space immediately before
> exit. Again, to make more for in this context. This can be even wrong
> and, probably, we should return instantly with -EAGAIN/-EINTR/partial
> count, but it is most likely suboptimal (though I have already changed
> this to instant return). But this does not look essential from
> caller's viewpoint, except for sendfile() of course. 8)

Cool. I think the proper fix, long term, is to fix our internal I/O routine APIs so that they are capable of returning a byte count _and_ an error. One day, that might be a useful thing to export to userspace.

Cheers
Chris
sendfile() breakage was Re: SO_SNDTIMEO: 2.4 kernel bugs
On Mon, 19 Feb 2001, Chris Evans wrote:

> > BTW, if you have enough fast network, you probably can observe
> > that sendfile() is even not interrupted by signals. 8) But this
> > is possible to fix at least. BTW the same fix will repair SO_*TIMEO
> > partially, i.e. it will timeout after n*timeo, where n is an arbitrary
> > number not exceeding size/sndbuf.
>
> Hi Alexey,
>
> You are right - our sendfile() implementation is broken. I have fixed it
> (patch at end of mail).

Actually the whole mess stems from our broken internal ->write() and ->read() APIs.

The _single_ return value is trying to convey _two_ pieces of information - always a bad move. They are:
1) Success/failure (and error code if it's a failure)
2) Amount of bytes read or written

This bogon does not allow for the following information to be returned (assume I asked for 8192 bytes to be written):
"4096 bytes were written, and the operation was aborted due to EINTR"

Cheers
Chris
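One way the "byte count _and_ an error" idea could look from userspace, as a rough sketch. `struct io_result` and `write_fully` are hypothetical names; this is not a proposal for the kernel-internal ->write() signature, only an illustration of returning both pieces of information together.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type carrying both pieces of information. */
struct io_result {
    size_t bytes;   /* bytes transferred before the operation stopped */
    int    err;     /* 0 on success, otherwise an errno value */
};

/*
 * Write up to len bytes from buf to fd. Stops at the first failing
 * write() and reports the error alongside the partial count.
 * EINTR is deliberately not retried, so an interrupted transfer is
 * reported as "N bytes written, stopped by EINTR".
 */
struct io_result write_fully(int fd, const void *buf, size_t len)
{
    struct io_result res = { 0, 0 };
    const char *p = buf;

    while (res.bytes < len) {
        ssize_t n = write(fd, p + res.bytes, len - res.bytes);
        if (n < 0) {
            res.err = errno;        /* keep the partial count as well */
            break;
        }
        if (n == 0)
            break;                  /* no progress; avoid spinning */
        res.bytes += (size_t)n;
    }
    return res;
}
```

With this shape, the case from the message is expressible as { .bytes = 4096, .err = EINTR } for an 8192-byte request.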
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): > > None of the options apply to sendfile(). It is not socket level > operation. You have to use alarm for it. > > BTW, if you have enough fast network, you probably can observe > that sendfile() is even not interrupted by signals. 8) But this > is possible to fix at least. BTW the same fix will repair SO_*TIMEO > partially, i.e. it will timeout after n*timeo, where n is an arbitrary > number not exceeding size/sndbuf. Hi Alexey, You are right - our sendfile() implementation is broken. I have fixed it (patch at end of mail). However, I believe something is still wrong in the networking layer, even with my fix applied. Before I go into details, I want to step back and describe things from a _users_ perspective. That is most important after all. Take two different operations: write() to a socket and sendfile() down a socket. In both cases, the socket has a send timeout of 10 seconds. From a users' point of view, these are two socket write operations. The source of data is different (a buffer or a file descriptor), but that is irrelevant. The user has the right to expect a timeout after 10 seconds of no progress, on both operations. I have tried this on FreeBSD, and this is what happens: both sendfile() and write() timeout in the same way. On Linux, this is not the case => bug. I fixed a small sendfile() issue, which did not recognise partial writes as an interruption, but as I said above, the bug still remains. Investigation shows that the Linux network layer is behaving oddly. It seems that we are writing 4096 bytes to a socket. This proceeds in 4096 byte chunks until the send buffer on the socket is full, and a 4096 byte write blocks. This blocking write is eventually interrupted by the timeout, and the write call returns.. wait for it.. 4096! This suggests there was socket space after all, and the call should not have blocked. 
I wonder what is going on? I'd like to get this fixed. I think the FreeBSD behaviour is definitely correct and we want it on Linux.

Cheers
Chris

--- filemap.c.old	Sun Feb 18 23:35:06 2001
+++ filemap.c	Mon Feb 19 00:13:38 2001
@@ -1062,7 +1062,7 @@
 	for (;;) {
 		struct page *page, **hash;
-		unsigned long end_index, nr;
+		unsigned long end_index, nr, actor_ret;
 		end_index = inode->i_size >> PAGE_CACHE_SHIFT;
 		if (index > end_index)
@@ -1110,13 +1110,13 @@
 		 * "pos" here (the actor routine has to update the user buffer
 		 * pointers and the remaining count).
 		 */
-		nr = actor(desc, page, offset, nr);
-		offset += nr;
+		actor_ret = actor(desc, page, offset, nr);
+		offset += actor_ret;
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
 		page_cache_release(page);
-		if (nr && desc->count)
+		if (actor_ret == nr && desc->count)
 			continue;
 		break;
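The effect of the changed loop test can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct sink`, `model_actor()` and `copy_loop()` are hypothetical names, `CHUNK` stands in for the page size, and the sink simply accepts a limited number of bytes, roughly like a socket whose send buffer fills up. The old test keeps going after any non-zero return; the patched test keeps going only after a full chunk.

```c
#include <stddef.h>

#define CHUNK 4096  /* stand-in for the page size (assumption) */

/* Hypothetical sink: accepts at most `room` more bytes, counts calls. */
struct sink {
    size_t room;
    int calls;
};

/* Hypothetical stand-in for the actor: may accept fewer bytes than asked. */
static size_t model_actor(struct sink *s, size_t nr)
{
    size_t n = nr < s->room ? nr : s->room;

    s->calls++;
    s->room -= n;
    return n;
}

/*
 * Copy up to `count` bytes in CHUNK-sized pieces.
 * stop_on_short == 0 models the old test (any progress => continue),
 * stop_on_short == 1 models the patched test (full chunk => continue).
 */
size_t copy_loop(struct sink *s, size_t count, int stop_on_short)
{
    size_t done = 0;

    while (count) {
        size_t nr = count < CHUNK ? count : CHUNK;
        size_t ret = model_actor(s, nr);

        done += ret;
        count -= ret;
        if (stop_on_short ? (ret == nr && count) : (ret && count))
            continue;
        break;
    }
    return done;
}
```

With a sink that has room for one full chunk plus 100 bytes, both variants report the same byte count, but the old test calls the actor once more after the short write. In the kernel that extra call is one that can only block again, which would be consistent with the doubled timeout described elsewhere in this thread.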
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > So the actual timeout would be 2 * SO_SNDTIMEO.
>
> It will timeout if write of some page blocks for SO_SNDTIMEO.

.. unless that page was partially written, in which case a short write count is returned (rather than a timeout error), and the loop goes around again.

> If transmission of any page never takes more than SO_SNDTIMEO it never
> times out.

Which is good, because SO_SNDTIMEO is an inactivity monitor.

Cheers
Chris
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():
>
> None of the options apply to sendfile(). It is not socket level
> operation. You have to use alarm for it.

Hi Alexey,

Actually sendfile() _does_ time out using SO_SNDTIMEO. It just takes longer to time out because the kernel sendfile() page loop will (usually) need to time out a short write, and then time out a 0 byte write. So the actual timeout would be 2 * SO_SNDTIMEO.

Unfortunately, I'm seeing a timeout at (I think) 3 * SO_SNDTIMEO, which I can't account for.

Cheers
Chris
sendfile() breakage was Re: SO_SNDTIMEO: 2.4 kernel bugs
On Mon, 19 Feb 2001, Chris Evans wrote:

> > BTW, if you have enough fast network, you probably can observe
> > that sendfile() is even not interrupted by signals. 8) But this
> > is possible to fix at least. BTW the same fix will repair SO_*TIMEO
> > partially, i.e. it will timeout after n*timeo, where n is an arbitrary
> > number not exceeding size/sndbuf.
>
> Hi Alexey,
>
> You are right - our sendfile() implementation is broken. I have fixed it
> (patch at end of mail).

Actually the whole mess stems from our broken internal ->write() and ->read() APIs. The _single_ return value is trying to convey _two_ pieces of information - always a bad move. They are:

1) Success/failure (and error code if it's a failure)
2) Number of bytes read or written

This bogon does not allow for the following information to be returned (assume I asked for 8192 bytes to be written):

"4096 bytes were written, and the operation was aborted due to EINTR"

Cheers
Chris
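One way to picture the complaint above is a userspace wrapper that reports progress and status separately. The following is a minimal sketch under stated assumptions: `struct io_result`, `write_fully()` and `demo_full_pipe()` are hypothetical names invented for illustration, not a kernel or libc interface, and the demo uses a non-blocking pipe (so the error it sees is EAGAIN rather than EINTR) because that is easy to provoke.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical result type: byte count and status are separate fields. */
struct io_result {
    size_t done;  /* bytes actually written */
    int err;      /* 0 on success, otherwise an errno value */
};

/* Keep writing until everything is out or write() fails. */
struct io_result write_fully(int fd, const char *buf, size_t len)
{
    struct io_result r = { 0, 0 };

    while (r.done < len) {
        ssize_t n = write(fd, buf + r.done, len - r.done);

        if (n < 0) {
            r.err = errno;   /* progress so far stays in r.done */
            break;
        }
        if (n == 0)
            break;           /* no progress and no error: give up */
        r.done += (size_t)n;
    }
    return r;
}

/* Demo: push 1 MiB into a non-blocking pipe that nobody reads. */
struct io_result demo_full_pipe(void)
{
    struct io_result r = { 0, EIO };
    size_t len = (size_t)1 << 20;
    char *buf = calloc(1, len);
    int fds[2];

    if (buf == NULL) {
        r.err = ENOMEM;
        return r;
    }
    if (pipe(fds) != 0) {
        r.err = errno;
        free(buf);
        return r;
    }
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) == 0)
        r = write_fully(fds[1], buf, len);
    else
        r.err = errno;
    close(fds[0]);
    close(fds[1]);
    free(buf);
    return r;
}
```

The caller of `demo_full_pipe()` gets both facts at once: some bytes went out (`done` is non-zero but less than 1 MiB) and the operation then stopped with an error (`err`). A single `ssize_t` return value has to drop one of the two.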
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hi,

By the way - I tested SO_RCVLOWAT, another 2.4 addition. Good news this time - seems to work fine.

Cheers
Chris
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hi Alexey,

This patch fixes my simple read()/write() tests, nice one. The behaviour also now matches BSD (someone kindly donated me a FreeBSD shell for testing).

Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile():

- Connect an AF_INET, SOCK_STREAM socket to a local listening socket.
- Set 5 seconds SO_SNDTIMEO on the connected socket
- Do a sendfile() from a big file down the connected socket. Make sure the size is big (e.g. 1 MB) so the call blocks.

--> BUG!! The call blocks indefinitely rather than being interrupted after 5 seconds.

Cheers
Chris

On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, it seems to be very buggy. Here are two buggy scenarios.
>
>
> --- ../vger3-010210/linux/net/ipv4/tcp.c	Sat Feb 10 23:16:51 2001
> +++ linux/net/ipv4/tcp.c	Sat Feb 17 23:27:43 2001
> @@ -691,6 +691,8 @@
>
>  		set_current_state(TASK_INTERRUPTIBLE);
>
> +		if (!timeo)
> +			break;
>  		if (signal_pending(current))
>  			break;
>  		if (tcp_memory_free(sk) && !vm_wait)
> --- ../vger3-010210/linux/net/core/sock.c	Tue Jan 30 21:20:16 2001
> +++ linux/net/core/sock.c	Sat Feb 17 23:27:44 2001
> @@ -727,6 +727,8 @@
>  	clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
>  	add_wait_queue(sk->sleep, &wait);
>  	for (;;) {
> +		if (!timeo)
> +			break;
>  		if (signal_pending(current))
>  			break;
>  		set_bit(SOCK_NOSPACE, &sk->socket->flags);
Re: SO_SNDTIMEO: 2.4 kernel bugs
Alexey,

Damn you are quick! :) Testing immediately

Cheers
Chris

On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > Unfortunately, it seems to be very buggy. Here are two buggy scenarios.
>
>
> --- ../vger3-010210/linux/net/ipv4/tcp.c	Sat Feb 10 23:16:51 2001
> +++ linux/net/ipv4/tcp.c	Sat Feb 17 23:27:43 2001
> @@ -691,6 +691,8 @@
>
>  		set_current_state(TASK_INTERRUPTIBLE);
>
> +		if (!timeo)
> +			break;
>  		if (signal_pending(current))
>  			break;
>  		if (tcp_memory_free(sk) && !vm_wait)
> --- ../vger3-010210/linux/net/core/sock.c	Tue Jan 30 21:20:16 2001
> +++ linux/net/core/sock.c	Sat Feb 17 23:27:44 2001
> @@ -727,6 +727,8 @@
>  	clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
>  	add_wait_queue(sk->sleep, &wait);
>  	for (;;) {
> +		if (!timeo)
> +			break;
>  		if (signal_pending(current))
>  			break;
>  		set_bit(SOCK_NOSPACE, &sk->socket->flags);
SO_SNDTIMEO: 2.4 kernel bugs
Hi,

I was glad to see Linux gain SO_SNDTIMEO in kernel 2.4. It is a very useful feature which can avoid complexity and pain in userspace programs.

Unfortunately, it seems to be very buggy. Here are two buggy scenarios.

1) Create a socketpair(), PF_UNIX, SOCK_STREAM.
   Set a 5 second SO_SNDTIMEO on the socket.
   write() 100k down the socket in one write(), i.e. enough to cause the write to have to block.
   --> BUG!!! The call blocks indefinitely instead of returning after 5 seconds.
   (Note that the same test but with SO_RCVTIMEO and a read() works as expected - I get EAGAIN after 5 seconds).

2) Create a localhost listening socket - AF_INET, SOCK_STREAM.
   Connect to the listening port.
   Set a 5 second SO_SNDTIMEO on the socket.
   write() 1 MB down the socket in one write(), i.e. enough to cause it to have to block.
   -> The write() will return after 5 seconds with a partial write count. GOOD!
   Repeat the write() - send another 1 MB.
   --> BUG!! The call blocks indefinitely instead of returning with EAGAIN after 5s.

I hope this is detailed enough. I'm trying to gain access to a FreeBSD box to compare results..

Cheers
Chris
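Scenario 1 above can be written down as a short test program. What follows is a minimal sketch, not the original test code (which was not posted): the helper names `set_send_timeout()` and `sndtimeo_demo()` are made up for illustration, and the timeout and buffer size are parameters so a short timeout can be used. It records the result of two back-to-back write() calls into a socketpair that nobody reads.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Set a send timeout given in milliseconds; returns 0 on success. */
int set_send_timeout(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/*
 * Write `len` bytes twice into one end of a UNIX stream socketpair whose
 * other end is never read. Stores both write() results and the errno of
 * the second call. Returns 0 if the setup worked, -1 otherwise.
 */
int sndtimeo_demo(long ms, size_t len, ssize_t *first, ssize_t *second,
                  int *second_errno)
{
    int sv[2];
    char *buf;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    buf = calloc(1, len);
    if (buf == NULL || set_send_timeout(sv[0], ms) != 0) {
        free(buf);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    *first = write(sv[0], buf, len);   /* fills the buffer, then stalls */
    errno = 0;
    *second = write(sv[0], buf, len);  /* nothing can be queued now */
    *second_errno = (*second < 0) ? errno : 0;

    free(buf);
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

On a kernel where the send timeout is honoured, one would expect the first call to come back with a short count once the buffer is full and the timeout has passed, and the second to fail with EAGAIN; a call such as `sndtimeo_demo(200, 1 << 20, &a, &b, &e)` keeps the wait short. On the kernel described in this report the program would presumably hang instead.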
SO_RCVTIMEO, SO_SNDTIMEO
Hi,

I notice the entities in the subject line have appeared in Linux 2.4.

What is their functional specification? I guess they trigger if no bytes are received/sent within a consecutive period. How does the app get the error? -EPIPE for a blocking read/write? If so, does SIGPIPE get raised? Or is -ETIMEDOUT used? ...

TIA,
Chris
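The question of which error the application sees can be checked empirically. A later message in this thread reports EAGAIN for a read() with SO_RCVTIMEO set; the sketch below, with the hypothetical helper name `timed_read_errno()`, sets a receive timeout on one end of a socketpair, reads from it while no data is pending, and returns the errno that read() produced.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one byte from an empty socketpair with a receive timeout of
 * `ms` milliseconds. Returns the errno set by read(), 0 if read() did
 * not fail, or -1 if the setup itself failed.
 */
int timed_read_errno(long ms)
{
    int sv[2];
    struct timeval tv;
    char c;
    ssize_t n;
    int saved;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = read(sv[0], &c, 1);          /* blocks until the timeout expires */
    saved = (n < 0) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return saved;
}
```

Comparing the return value against EAGAIN, EPIPE and ETIMEDOUT answers the question for the kernel at hand.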
Re: BUG: SO_LINGER + shutdown() does not block?
On Sun, 11 Feb 2001, Andi Kleen wrote:

> On Sun, Feb 11, 2001 at 08:41:04PM +0000, Chris Evans wrote:
> >
> > [cc: Andi]
>
> Missing context..

[...]

> What do you exactly think is wrong?

man socket(7) says that setting SO_LINGER on a socket will make shutdown() and close() block. That's incorrect; only close() blocks.

Sorry for the missing context.

Cheers
Chris
Re: BUG: SO_LINGER + shutdown() does not block?
[cc: Andi]

On Sun, 11 Feb 2001 [EMAIL PROTECTED] wrote:

> Hello!
>
> > I'm not seeing shutdown(2) block on a TCP socket. This is Linux kernel
> > 2.2.16 (RH7.0). Is this a kernel bug, a documentation bug,
>
> Man page is wrong.

Yes, man socket(7) seems to be wrong. I don't have access to a genuine BSD at the moment, but from man pages:

- HP/UX specifically states that SO_LINGER has no effect on shutdown()
- Solaris SO_LINGER only mentions that close() is affected.
- Likewise FreeBSD

Cheers
Chris
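For reference, this is how SO_LINGER is switched on from userspace. The sketch below only sets and reads back the option; the helper names `enable_linger()` and `linger_seconds()` are hypothetical, and nothing here demonstrates the blocking behaviour itself.

```c
#include <sys/socket.h>

/* Enable SO_LINGER with a timeout in seconds; returns 0 on success. */
int enable_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;         /* lingering on */
    lg.l_linger = seconds;  /* how long to wait for queued data */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Read the linger timeout back; -1 if lingering is off or on error. */
int linger_seconds(int fd)
{
    struct linger lg;
    socklen_t len = sizeof(lg);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) != 0 || !lg.l_onoff)
        return -1;
    return lg.l_linger;
}
```

A test of the behaviour discussed in this thread would call `enable_linger()` on a connected TCP socket with unsent data queued and then time close() and shutdown() separately; according to the messages here, only close() should wait.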
BUG: SO_LINGER + shutdown() does not block?
Hi,

From socket(7):

  SO_LINGER ... When enabled, a close(2) or shutdown(2) will not return until all queued messages for the socket have been successfully sent or the linger timeout has been reached.

I'm not seeing shutdown(2) block on a TCP socket. This is Linux kernel 2.2.16 (RH7.0). Is this a kernel bug, a documentation bug, or does it all work fine and it's a Chris bug?

Cheers
Chris
Re: sard on kernel 2.4
On Fri, 2 Feb 2001, Marcelo Tosatti wrote:

>
> Linus,
>
> There is a significative amount of people who use sard's additional block
> layer statistics (I'm one of them). It would be nice to have it in the
> official free.

Definitely.

Cheers
Chris
Re: Serious reproducible 2.4.x kernel hang
[cc: davem because of the severity]

On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> rid of the hang. So it looks as though some combination of
> shutdown(2) and SIGABRT is at fault. After the hang the kernel-side

Nope - I've nailed it to a _really_ simple test case. It looks like a read() on a shutdown() unix dgram socket just kills the kernel. Demo code below.

I wonder if this affects UP or is SMP only? Malcolm, does the below code reproduce the problem for you?

Cheers
Chris

#include <stdio.h>
#include <stdlib.h>      /* exit() */
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int
main(int argc, const char* argv[])
{
  int retval;
  int sockets[2];
  char buf[1];

  retval = socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets);
  if (retval != 0)
  {
    perror("socketpair");
    exit(1);
  }
  /* Shut down one end, then read from that same end. */
  shutdown(sockets[0], SHUT_RDWR);
  read(sockets[0], buf, 1);
  return 0;
}
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Mapping the addresses from whichever ScrollLock combination produced
> the task list to symbols produces the call trace
> do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return
>
> The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> anywhere at all and the next function before it in memory is
> inet_shutdown which looks more believable. I have checked I'm looking

Probably, the empty SIGPIPE handler triggered. The response to this is a lot of shutdown() close() and finally an exit().

The trace you give above looks like the child process trace. I always see the parent process go nuts. The parent process is almost always blocking on read() of a unix dgram socket, which it shares with the child. The child does a shutdown() on this socket just before exit().

Cheers
Chris
Re: Serious reproducible 2.4.x kernel hang
On Thu, 1 Feb 2001, Malcolm Beattie wrote:

> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> >
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
>
> I got this just before lunch too. I was trying out 2.4.1 + zerocopy
> (with netfilter configured off, see the sendfile/zerocopy thread for

[...]

I reproduced with 2.4.1.

> Looking at the kernel's EIP every so often to see what was going
> showed remove_wait_queue, add_wait_queue, skb_recv_datagram and
> wait_for_packet mostly. Random thought: if vsftpd did a sendfile and
> then exited, becoming a zombie, could there be a problem with
> tearing down a sendfile mapping? I'm off to read some code.

I get it simply doing CTRL-C at the ftp logon prompt. No sendfile has been used at this point.

Trying to distill a test case...

Cheers
Chris
Serious reproducible 2.4.x kernel hang
Hi,

I've just managed to reproduce this personally on 2.4.0. I've had a report that 2.4.1 is also affected. Both myself and the other person who reproduced this have SMP i686 machines, which may or may not be relevant.

To reproduce, all you need to do is get my vsftpd ftp server:
ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz

It runs from inetd. Connect using the Linux command line ftp client, to localhost, and simply press CTRL-C. If it matters, I'm using RH7.0 software.

After the first iteration of this, I'm left with:

[chris@localhost chris]$ ps auwx | grep ftp
root       713 99.9  0.4  1416  592 ?  SN  22:01  38:17 vsftpd /etc/vsftpd.conf
nobody     715  0.0  0.0     0    0 ?  ZN  22:01   0:00 [vsftpd <defunct>]

As you can see, the root process is burning 100% of one of my CPUs. It _cannot_ be killed with kill -9!

From Alt-Sysrq-T:

Jan 30 22:01:52 localhost kernel: vsftpd S 860 713 670 715 (NOTLB)
Jan 30 22:01:52 localhost kernel: Call Trace: [smp_apic_timer_interrupt+240/272] [smp_apic_timer_interrupt+240/272] [update_process_times+32/160] [smp_apic_timer_interrupt+240/272] [remove_wait_queue+6/48] [wait_for_packet+273/288] [skb_recv_datagram+205/240]
Jan 30 22:01:52 localhost kernel: [unix_dgram_recvmsg+69/256] [sock_recvmsg+53/176] [sock_read+134/144] [sys_read+150/208] [system_call+51/56]
Jan 30 22:01:52 localhost kernel: vsftpd Z C5E07040 1408 715 713 (L-TLB)
Jan 30 22:01:52 localhost kernel: Call Trace: [do_exit+628/672] [system_call+51/56]

As we can see, the 100% CPU broken process has got stuck in a blocking read() on a unix socket.

If I repeat the ftp connect/CTRL-C process again, I get a totally dead machine.

Hope this is sufficient info. I'll try and write a minimal test case.

Cheers
Chris
[SLUG] RE: Linux Disk Performance/File IO per process
On Mon, 29 Jan 2001 [EMAIL PROTECTED] wrote: > Thanks to both Jens and Chris - this provides the information I need to > obtain our busy rate > It's unfortunate that the kernel needs to be patched to provide this > information - hopefully it will become part of the kernel soon. > > I had a response saying that this shouldn't become part of the kernel due to > the performance cost that obtaining such data will involve. I agree that a > cost is involved here, however I think it's up to the user to decide which > cost is more expensive to them - getting the data, or not being able to see > how busy their disks are. My feeling here is that this support could be user > configurable at run time - eg 'cat 1 > /proc/getdiskperf'. Hi, I disagree with this runtime variable. It is unnecessary complexity. Maintaining a few counts is total noise compared with the time I/O takes. Cheers Chris -- SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/ More Info: http://slug.org.au/lists/listinfo/slug
Re: Linux Disk Performance/File IO per process
On Mon, 29 Jan 2001 [EMAIL PROTECTED] wrote: > All, > > I work for a company that develops a systems and performance management > product for Unix (as well as PC and TANDEM) called PROGNOSIS. Currently we > support AIX, HP, Solaris, UnixWare, IRIX, and Linux. > > I've hit a bit of a wall trying to expand the data provided by our Linux > solution - I can't seem to find anywhere that provides the metrics needed to > calculate disk busy in the kernel! This is a major piece of information that > any mission critical system administrator needs to successfully monitor > their systems. Stephen Tweedie has a rather funky i/o stats enhancement patch which should provide what you need. It comes with RedHat7.0 and gives decent disk statistics in /proc/partitions. Unfortunately this patch is not yet in the 2.2 or 2.4 kernel. I'd like to see it make the kernel as a 2.4.x item. Failing that, it'll probably make the 2.5 kernel. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.2, 2.4 bug in sock_no_fcntl()/F_SETOWN?
Hi,

Looking at the code for sock_no_fcntl() in net/core.c, I cannot specify "0" as a value for F_SETOWN unless I'm the superuser.

I believe this to be a bug: it stops a process from de-registering its interest in SIGURG signals.

Let me know if you want a patch.

Cheers
Chris
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Ingo Molnar wrote: > This is one of the busiest and most complex block-IO Linux systems i've > ever seen, this is why i quoted it - the talk was about block-IO > performance, and Stephen said that our block IO sucks. It used to suck, > but in 2.4, with the right patch from Jens, it doesnt suck anymore. ) Is this "right patch from Jens" on the radar for 2.4 inclusion? Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.2 vs. 2.4 benchmarks
Hi,

I ran some 2.2 vs. 2.4 benchmarks, particularly in the area of file i/o, using bonnie++. The machine is an SMP 128Mb PII-350 with a udma2 drive capable of some 20Mb/sec+. Kernels involved are 2.4.0, and the default RH7.0 kernel (2.2.16 plus more patches than you can shake a stick at).

Not going too much into the gory details, here are the differences exposed between 2.2 and 2.4:

1) Amazing 2.4 increase in streaming write performance; 13Mb/sec -> 20Mb/sec. I suspect this is the result of the "last minute" 2.4.0 dirty buffer/sync waiting handling changes.

2) Slight 2.4 increase in streaming read performance; 16Mb/sec -> 17Mb/sec. This leaves 2.4.0 writing faster than reading, which I find surprising.

3) Some 10% drop in rewrite performance from 2.2 -> 2.4 (possibly because page aging, like LRU, isn't too hot for the 2nd+ linear scan over data).

4) File creation 30% faster in 2.4; random deletes 30% faster; sequential deletes 10% slower.

I did one other quick test, with disappointing results for 2.4.0. I did a kernel build with 32Mb. 2.4.0 was taking about 10 mins to do the build. 2.2.x was 1min30 quicker :(

I was hoping/expecting the 2.4.0 page aging to do better, due to keeping the more useful pages in RAM better. I have no explanation.

Cheers
Chris
Re: reiserfs patch for 2.4.0-final
On Fri, 5 Jan 2001, Chris Mason wrote:

> > Could someone create one single patch for the 2.4.0 ?
>
> I put all the code into CVS, and Yura is making the official patch now.

Since 2.4.0 final should fix a few i/o performance issues (particularly under heavy write loads), a few quick ext2 vs. reiserfs benchmarks would make very interesting reading ;-)

Cheers
Chris
Re: [PATCH] dcache 2nd chance replacement
On Thu, 4 Jan 2001, Alan Cox wrote: > > On Thu, Jan 04, 2001 at 02:59:49PM -0200, Rik van Riel wrote: > > > Unfortunately you seem to ignore my arguments, so lets > > I've not ignored them, as said they were either obviously wrong of offtopic. > > Would the two of you ajourn this debate to alt.flame Better still stop _theorizing_ and start _measuring_ Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: test13-pre7...
On Sat, 30 Dec 2000, Linus Torvalds wrote: > On Sat, 30 Dec 2000, Steven Cole wrote: > > > > It looks like 2.4.0-test13-pre7 is a clear winner when running dbench 48 > > on my somewhat slow test machine (450 Mhz P-III, 192MB, IDE). > > This is almost certainly purely due to changing (some would say "fixing") > the bdflush synchronous wait point. Nice:) Did Rik's drop_behind performance fix make it in or can we look forward to another jump in the dbench benchmarks? Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Modprobe local root exploit
On Tue, 14 Nov 2000, Jakub Jelinek wrote: > > Rather than add sanity checking to modprobe, it would be a lot easier > > and safer from a security audit point of view to have the kernel call > > /sbin/kmodprobe instead of /sbin/modprobe. Then kmodprobe can sanitise > > all the data and exec the real modprobe. That way the only thing that > > needs auditing is a string munging/sanitising program. > > Well, no matter what kernel needs auditing as well, the fact that dev_load > will without any check load any module the user wants is already problematic > and no munging helps with it at all, especially loading old ISA drivers > might not be a good idea. FWIW: A quick look at the kernel source, and dev_load() seems to be the only place that does this. Other places apply prefixes to user supplied names. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Modprobe local root exploit
On Mon, 13 Nov 2000, Torsten Duwe wrote: > > "Francis" == Francis Galiegue <[EMAIL PROTECTED]> writes: > > >> + if ((*p & 0xdf) >= 'a' && (*p & 0xdf) <= 'z') continue; > > Francis> Just in case... Some modules have uppercase letters too :) > > That's what the &0xdf is intended for... Code in a security sensitive area needs to be crystal clear. What's wrong with isalnum() ? Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.2.18Pre Lan Performance Rocks!
On Mon, 30 Oct 2000, Andrea Arcangeli wrote: > functionality that needs high performance completly in kernel? People > may need to write high performance network code for custom protocols, > this way they will end creating kernel modules with system-crashing > bugs, memory leaks and kernel buffer overflows (chroot+nobody+logging > won't work anymore). (plus they will get into pain while debugging) I'm glad _someone_ is connected to reality with regards the security implications of throwing loads of servers into kernel space. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: LMbench 2.4.0-test10pre-SMP vs. 2.2.18pre-SMP
On Mon, 23 Oct 2000, Jeff Garzik wrote: > First test was with 2.4.0-test10-pre3. > Next four tests were with 2.4.0-test10-pre4. > Final four tests were with 2.2.18-pre17. > > All are 'virgin' kernels, without any patches. [...] I'll take the liberty of highlighting some big changes, v2.2 vs v2.4 *Local* Communication latencies in microseconds - smaller is better --- Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP ctxsw UNIX UDP TCP conn - - - - - - - - rum.normn Linux 2.4.0-t 620 4563 10681 157 146 rum.normn Linux 2.2.18p 212 1856 123 106 159 237 - So we broke pipe/AF UNIX latencies File & VM system latencies in microseconds - smaller is better -- Host OS 0K File 10K File MmapProtPage Create Delete Create Delete Latency Fault Fault - - -- -- -- -- --- - - rum.normn Linux 2.4.0-t 15 1 28 3 1016 10.0K rum.normn Linux 2.2.18p 16 1 29 2 7658 20.6K - But gave steroids to mmap latencies *Local* Communication bandwidths in MB/s - bigger is better --- HostOS Pipe AFTCP File Mmap Bcopy Bcopy Mem Mem UNIX reread reread (libc) (hand) read write - - -- -- -- -- - rum.normn Linux 2.4.0-t 152 105 98151326138144 326 171 rum.normn Linux 2.2.18p 264 106 55152326137142 326 180 - Mixed fortunes here. A serious boost to TCP bandwidth but pipe bandwidth dies a bit Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [patch(?)] question wrt context switching during disk i/o
On Sat, 21 Oct 2000, Bill Wendling wrote:

> } > bdflush is broken in current kernels. I posted to linux-mm about this,
> } > but Rik et al haven't shown any interest. I normally see bursts of
> } > up to around 40K cs/second when doing writes; I hacked a little
> } > preemption counter into the kernel and verified that they're practically
> } > all bdflush...
> }
> There's some strangeness in bdflush(). The comment says:
>
> /*
>  * If there are still a lot of dirty buffers around,
>  * skip the sleep and flush some more. Otherwise, we
>  * go to sleep waiting a wakeup.
>  */
> if (!flushed || balance_dirty_state(NODEV) < 0) {
>         run_task_queue(&tq_disk);
>         schedule();
> }

Speaking of bdflush brokenness, I was trying to tune it using /proc/sys/vm/bdflush. I was trying to eliminate the bursty write behaviour Linux always seems to have had (exhibited by e.g. find /).

Unfortunately, different /proc/sys/vm/bdflush settings didn't seem to have much (if any) effect. Is this another case of /proc/sys/vm/* settings being ignored? If so they should be removed.

I was hoping to get a steady trickle of writes instead of the occasional mammoth burst.

Cheers
Chris
[FIXED] Re: 2.4.0test-9: IDE problems
On Wed, 11 Oct 2000, Alan Cox wrote: > > > have the rules for testing if the driver/host/device register and report > > > that all signals are valid and stable. > > > > Yes, I had some "interesting" modifications to a lot of my /usr when I > > tried to activate UDMA4 under RH7.0 (I don't believe my hardware is > > capable of UDMA4!) > > The 2.2 kernel we ship doesnt have the ide patches either so Im not suprised > it got upset 8) OK, so in case anyone is tracking open issues, this was "pilot error". My motherboard only does ATA33 (UDMA2). It just happens to work under ATA44 (UDMA3). Since ATA44 is out of my machine's spec, and could corrupt data, the 2.4 kernel is correct to reject attempts to set UDMA3.[1] Cheers Chris [1] But if you're mad, you can still boot with idex=ata66 and force the issue. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Raw i/o usage wrecks block device performance??
Hi,

Here's a very strange (and repeatable) result. It affects 2.2.x + raw device patches (i.e. RH7.0), and had a similar effect on 2.4.0test9!

The problem is best described with a little sequence. After using raw i/o facilities, streamed block device reads from the same underlying device exhibit much poorer performance than before the raw i/o.

Example

[root@localhost /root]# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  64 MB in  3.81 seconds = 16.80 MB/sec
[root@localhost /root]# time dd if=/dev/raw/raw1 of=/dev/null bs=1024k count=64
64+0 records in
64+0 records out

real    0m2.990s
user    0m0.010s
sys     0m0.450s
[root@localhost /root]# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  64 MB in  6.12 seconds = 10.46 MB/sec

The read figures before and after the raw i/o are repeatable with only a little jitter. Raw device reads are consistent and not affected by this phenomenon.

Anyone know what's going on? Looks like a bug or inefficiency somewhere in the kernel.

Cheers
Chris
2.4.0test9 vm: disappointing streaming i/o under load
Hi,

Finally got round to checking out 2.4.0test9. Unfortunately, 2.4.0test9 exhibits poor streaming i/o performance when under a bit of memory pressure.

The test is this: boot with mem=32M, log onto GNOME and start xmms playing a big .wav ripped from a CD (this requires 100-200k of read i/o per second). Then I start netscape and kill it. Finally, I start a find / and fire up gnumeric at the same time.

Results
=======

2.2 RH7.0: the music skipped maybe twice briefly during the test.

2.4.0test9: music stuttered repeatedly while netscape started. Worse, when firing up gnumeric with the find / on the go, there were big pauses in sound output. One pause was over 5 seconds!!!

So not so hot. Could this perhaps be related to the drop_behind magic penalizing streaming i/o pages too much? Perhaps the greater age on the i/o pages means that when there is a little memory pressure, they are getting thrown out of the page cache before the app (xmms) gets a chance to use them!

Might it be useful for me to try pre10-1? I note it has more "balancing fixes".

Cheers
Chris
Re: 2.4.0test-9: IDE problems
On Wed, 11 Oct 2000, Alan Cox wrote: > The 2.2 kernel we ship doesnt have the ide patches either so Im not suprised > it got upset 8) Ah yes you're correct. I saw the patch in the kernel SRPM but didn't look far enough to see: ... # IDE patch provides UDMA66 support, but is known to corrupt filesystems # on a few systems, so is not applied by default. Patch151: linux-2.2.16-ide-2805.patch ... # Dangerous IDE patch available but off by default #%patch151 -p1 ... Still, with hdparm -d1 -X67, I can presumably get UDMA3 and good 2.2 speeds (17Mb/s, or 21Mb/s with rawio) without this patch. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0test-9: IDE problems
On Wed, 11 Oct 2000, Alan Cox wrote: > > hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } > > hda: dma_intr: error=0x84 { DriveStatusError BadCRC } > > Bad CRC is a cable error. That could be misconfiguration but could also be > crap cables It went away when I enabled PIIX4 support + PIIX4 tuning support. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0test-9: IDE problems
On Tue, 10 Oct 2000, Andre Hedrick wrote: > Also set this option "CONFIG_IDEDMA_IVB" because you are in the > transistion period of drive manufacturing. Turned that on, applied the patch. BTW, your patch seems to make the "Speed warnings" failure _more_ likely?? Still refuses to activate UDMA3. 11.5Mb/sec vs. 17Mb/sec in 2.2. Is my hardware trying to tell me that 2.2 shouldn't be allowing me to run with UDMA3? It's rock solid and yes I've given it a pounding. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0test-9: IDE problems
On Tue, 10 Oct 2000, Andre Hedrick wrote: > Basically you have drive that caught in the word93 rules change. > > However, the error you got were real and the kernel did properly respeed > the drive to one step slower. The problem above prevented you from going > from ATA66 to ATA44, thus you fell to ATA33. > > You RHS 7.0 kernel does not have all the fallback and rules testing to > keep things running the very best and in the safest way. Also you do not > have the rules for testing if the driver/host/device register and report > that all signals are valid and stable. Yes, I had some "interesting" modifications to a lot of my /usr when I tried to activate UDMA4 under RH7.0 (I don't believe my hardware is capable of UDMA4!) > If you did not set TUNING option if the chipset has it specifically > flagged then you will not be able to retune the chipset/drive and the IO > will be out of sync. Shortly after my first post, I noticed and activated the Intel PIIX4 support + tuning. This got rid of the nasty errors but didn't get my 17Mb/sec. Trying your patch now. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4.0test-9: IDE problems
Hi,

Finally got around to trying out 2.4.0test9. I'm going to do some VM performance comparisons (incidentally because VM should be a carefully measured science, not the random cool idea of the day which we have seen too much of recently).

Unfortunately, I can't start fair tests yet because UDMA3 refuses to activate in 2.4.0test-9. I get these messages on boot:

ide0: Speed warnings UDMA 3/4/5 is not functional.
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide0: reset: success
hda: DMA disabled

Subsequently, if I run

hdparm -d1 -X67 /dev/hda

I get told

ide0: Speed warnings UDMA 3/4/5 is not functional.

And this leaves me with

/dev/hda:
 Timing buffered disk reads:  64 MB in  5.71 seconds = 11.21 MB/sec

Under the stock 2.2 RedHat 7.0 kernel, the same hdparm tuning gives me about 17Mb/s.

I selected the ATA option. Might this be causing the failure? Anyone got any hints?

Cheers
Chris
2.4.0test-9: IDE problems
Hi, Finally got around to trying out 2.4.0test9. I'm going to do some VM performance comparisons (incidentally because VM should be a carefully measured science not random cool idea of the day which we have seen too much of recently). Unfortunately, I can't start fair tests yet because UDMA3 refuses to activate in 2.4.0test-9. I get these messages on boot ide0: Speed warnings UDMA 3/4/5 is not functional. hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } hda: dma_intr: status=0x51 { DriveReady SeekComplete Error } hda: dma_intr: error=0x84 { DriveStatusError BadCRC } ide0: reset: success hda: DMA disabled Subsequently, if I run hdparm -d1 -X67 /dev/hda I get told ide0: Speed warnings UDMA 3/4/5 is not functional. And this leaves me with /dev/hda: Timing buffered disk reads: 64 MB in 5.71 seconds = 11.21 MB/sec Under the stock 2.2 RedHat 7.0 kernel, the same hdparm tuning gives me about 17Mb/s. Anyone got any hints? I selected the ATA option. Might this be causing the failure? Anyone got any hints? Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0test-9: IDE problems
On Tue, 10 Oct 2000, Andre Hedrick wrote:

> Basically you have a drive that got caught in the word93 rules change.
> However, the errors you got were real and the kernel did properly respeed
> the drive to one step slower. The problem above prevented you from going
> from ATA66 to ATA44, thus you fell to ATA33. Your RHS 7.0 kernel does not
> have all the fallback and rules testing to keep things running the very
> best and in the safest way. Also you do not have the rules for testing if
> the driver/host/device register and report that all signals are valid and
> stable.

Yes, I had some "interesting" modifications to a lot of my /usr when I tried to activate UDMA4 under RH7.0 (I don't believe my hardware is capable of UDMA4!)

> If you did not set the TUNING option (if the chipset has it specifically
> flagged), then you will not be able to retune the chipset/drive and the
> IO will be out of sync.

Shortly after my first post, I noticed and activated the Intel PIIX4 support + tuning. This got rid of the nasty errors but didn't get me my 17 MB/sec.

Trying your patch now.

Cheers
Chris
Re: 2.4.0test-9: IDE problems
On Tue, 10 Oct 2000, Andre Hedrick wrote:

> Also set this option "CONFIG_IDEDMA_IVB" because you are in the
> transition period of drive manufacturing.

Turned that on, applied the patch. BTW, your patch seems to make the "Speed warnings" failure _more_ likely??

Still refuses to activate UDMA3: 11.5 MB/sec vs. 17 MB/sec in 2.2.

Is my hardware trying to tell me that 2.2 shouldn't be allowing me to run with UDMA3? It's rock solid, and yes, I've given it a pounding.

Cheers
Chris
Re: 2.4.0test-9: IDE problems
On Wed, 11 Oct 2000, Alan Cox wrote:

> > hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
>
> Bad CRC is a cable error. That could be misconfiguration but could also
> be crap cables

It went away when I enabled PIIX4 support + PIIX4 tuning support.

Cheers
Chris
Re: 2.4.0test-9: IDE problems
On Wed, 11 Oct 2000, Alan Cox wrote:

> The 2.2 kernel we ship doesn't have the ide patches either so I'm not
> surprised it got upset 8)

Ah yes, you're correct. I saw the patch in the kernel SRPM but didn't look far enough to see:

  ...
  # IDE patch provides UDMA66 support, but is known to corrupt filesystems
  # on a few systems, so is not applied by default.
  Patch151: linux-2.2.16-ide-2805.patch
  ...
  # Dangerous IDE patch available but off by default
  #%patch151 -p1
  ...

Still, with hdparm -d1 -X67, I can presumably get UDMA3 and good 2.2 speeds (17 MB/sec, or 21 MB/sec with rawio) without this patch.

Cheers
Chris
2.4.0test9 vm: disappointing streaming i/o under load
Hi,

Finally got round to checking out 2.4.0test9. Unfortunately, 2.4.0test9 exhibits poor streaming i/o performance when under a bit of memory pressure.

The test is this: boot with mem=32M, log onto GNOME and start xmms playing a big .wav ripped from a CD (this requires 100-200k of read i/o per second). Then I start, then kill, netscape. I then start a find / and fire up gnumeric at the same time.

Results
=======

2.2 RH7.0: the music skipped maybe twice, briefly, during the test.

2.4.0test9: music stuttered repeatedly while netscape started. Worse, when firing up gnumeric with the find / on the go, there were big pauses in sound output. One pause was over 5 seconds!!!

So not so hot. Could this perhaps be related to the drop_behind magic penalizing streaming i/o pages too much? Perhaps the greater age on the i/o pages means that when there is a little memory pressure, they are getting thrown out of the page cache before the app (xmms) gets a chance to use them!

Might it be useful for me to try pre10-1? I note it has more "balancing fixes".

Cheers
Chris
Re: VM in v2.4.0test9
On Wed, 4 Oct 2000, Rik van Riel wrote:

> Handling out-of-memory in a clean and predictable way is the
> next thing on the feature list. I'll add it RSN (I'm reasonably
> sure now that the current VM features are stable ... time for
> OOM handling).

Stable is good. But before moving on, wouldn't it be nice to have some test8 vs. test9 vs. 2.2.14 (or so) benchmarks, to confirm it was worth the pain of a whole pre-patch series weeding out deadlocks?

Cheers
Chris