Re: [PATCH] Re: BUG: race-cond with partition-check
[EMAIL PROTECTED] writes:
> --- partitions/check.c~	Thu May 31 22:26:56 2001
> +++ partitions/check.c	Fri Jun  8 10:44:02 2001
> @@ -418,11 +418,10 @@
>  	blk_size[dev->major] = NULL;
>  
>  	dev->part[first_minor].nr_sects = size;
> -	/* No Such Agen^Wdevice or no minors to use for partitions */
> +	/* No such device or no minors to use for partitions */

Any reason why you're silently removing a good old anti-NSA joke?
Conspiracy theorists may have fun with that... :-)

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Why side-effects on open(2) are evil. (was Re: [RFD w/info-PATCH] device arguments from lookup)
[cc list reduced]

Andreas Dilger writes:
> PS - I used to think shrinking a filesystem online was useful, but there
>      are a huge amount of problems with this and very few real-life
>      benefits, as long as you can at least do offline shrinking. With
>      proper LVM usage, the need to shrink a filesystem never really
>      happens in practise, unlike the partition case where you always
>      have to guess in advance how big a filesystem needs to be, and then
>      add 10% for a safety margin. With LVM you just create the minimal
>      sized device you need now, and freely grow it in the future.

In an attempt to nudge you back towards your previous opinion:
consider a system-wide spool or tmp filesystem. It would be nice to be
able to add in a few extra volumes for a busy period but then shrink
it down again when usage returns to normal.

In the absence of the ability to shrink a live filesystem, storage
management becomes a much harder job. You can't throw in a spare
volume or two where it's needed without careful thought because you'll
be ratcheting up the space on that one filesystem without being able
to change your mind and reduce it again later. You'll end up with
stingy storage admins who refuse to give you a bunch of extra
filesystem space for a while because they can't get it back again
afterwards.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: LANANA: To Pending Device Number Registrants
Alexander Viro writes:
> thing, we could turn mount(2) into
> 	open appropriate fs type
> 	convince the sucker that you are allowed, tell which device you want, etc.
> 	open mountpoint
> 	mount(fs_fd, dir_fd)
> Would work like charm, especially since we could fit the network filesystems
> into the same scheme and get rid of the kludges a-la ncpfs mount sequence.
>
> There's only one sore spot: how'd you mount _that_ fs? ;-)

Start up init with fs_fd on file descriptor 3 and init can put it
where it likes.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
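The fd-based mount sequence proposed above might look like this from
userland. This is pure pseudocode: none of these interfaces exist in
2.4, and the "/fstype" path and FS_SET_DEVICE ioctl name are invented
purely for illustration.

```
/* Hypothetical sketch only -- no such interfaces exist */
fs_fd  = open("/fstype/ext2", O_RDWR);     /* open appropriate fs type  */
ioctl(fs_fd, FS_SET_DEVICE, "/dev/hda1");  /* say which device you want */
dir_fd = open("/mnt", O_RDONLY);           /* open mountpoint           */
mount(fs_fd, dir_fd);                      /* two-fd mount(2)           */
```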
Re: Not a typewriter
Jonathan Lundell writes:
> FWIW, the comment in errno.h under Solaris 2.6 is "Inappropriate
> ioctl for device". I believe that's the POSIX interpretation.

POSIX has

    [ENOTTY]
        Inappropriate I/O control operation
        A control function was attempted for a file or special file
        for which the operation was inappropriate.

which is quite a nice way of putting it.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Wow! Is memory ever cheap!
Larry McVoy writes:
> On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
> > My understanding is suns big machines stopped using ecc and they
>
> The SUN problem was a cache problem and there is no way that I believe
> that SUN would turn of ECC in the cache. There are good reasons for
> not doing so. If you think through the end to end argument, you will
> see that you have no way to do checks on the data path into/out of the
> processor. If that part of the datapath is not checked then no amount
> of checking elsewhere does any good, the processor can be corrupting
> your data and never know it. If SUN was so stupid as to remove this,
> then it is a dramatically different place. I heard that there was a
> bug in the cache controller, I never heard that they had removed ECC.

There are issues with error detection/correction/recovery with
different designs of L1 and L2 caches. There's a good paper:

    IBM S/390 storage hierarchy - G5 and G6 performance considerations
    IBM Journal of Research and Development, Vol. 43, No. 5/6

available at

    http://www.research.ibm.com/journal/rd/435/jackson.html

which covers IBM's choice of L1 and L2 design for S/390. The section
on "S/390 reliability and performance implications" is relevant here.
In particular, they use a solution which isn't best from the
performance point of view but ensures you don't discover "too late"
about an error. I heard a rumour (now I get to the unsubstantiated
part :-) that Sun chose a higher-performing design for their cache
subsystem but which has a nastier failure mode in the case of cache
errors.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Block device strategy and requests
I'm designing a block device driver for a high-performance disk
subsystem with unusual characteristics.

To what extent is the limited number of "struct request"s (128 by
default) necessary for back-pressure? With this I/O subsystem it would
be possible for the strategy function to rip the requests from the
request list straight away and arrange for the I/Os to be done to/from
the buffer_heads (with no additional state required) with no memory
"leak". This would effectively mean that the only limit on the number
of I/Os queued would be the number of buffer_heads allocated, not a
fixed number of "struct request"s in flight. Is this reasonable or
does any memory or resource balancing depend on the number of I/Os
outstanding being bounded?

Also, there is a lot of flexibility in how often interrupts are sent
to mark the buffer_heads up-to-date. (With the requests pulled
straight off the queue, the job of end_that_request_first() in doing
the linked list updates and bh->b_end_io() callbacks would be done by
the interrupt routine directly.) At one extreme, I could take an
interrupt for each 4K block issued and mark it up-to-date very
quickly, making for very low-latency I/O but a very large interrupt
rate when I/O throughput is high. The alternative would be to arrange
for an interrupt every n buffer_heads (or based on some other
criterion) and only mark buffers up-to-date on each of those. Are
there any rules of thumb on which is best or doesn't it matter too
much?

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
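The queue-draining scheme described above might be sketched, in
2.4-flavoured pseudocode, as follows. The list helpers are from 2.4's
blkdev machinery, but hw_submit_io(), the bh-iteration shorthand and
the batching policy are hypothetical, for illustration only.

```
/* Pseudocode sketch: a strategy function that drains the request
 * queue immediately and leaves completion to the interrupt handler. */
static void mydev_request_fn(request_queue_t *q)
{
        struct request *req;

        while (!list_empty(&q->queue_head)) {
                req = blkdev_entry_next_request(&q->queue_head);
                blkdev_dequeue_request(req);      /* rip it off straight away */
                for (each buffer_head bh on req->bh chain)
                        hw_submit_io(bh);         /* no extra state kept */
        }
        /* The interrupt handler then calls bh->b_end_io(bh, uptodate)
         * itself, either per-bh (low latency, high interrupt rate) or
         * batched every n buffer_heads. */
}
```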
Re: zSeries
Frank Fiene writes:
> Who can tell me, how is the performance of the big irons (zSeries)
> from IBM comparing a pc server with linux?

Different. Very, very different. Elaborate on what problem you're
trying to solve and then there'll be more chance of comparing
platforms.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: ftruncate not extending files?
bert hubert writes:
> I would've sworn, based on the fact that I saw people do it, that ftruncate
> was a legitimate way to extend a file

Well it's not SUSv2 standards compliant:

    http://www.opengroup.org/onlinepubs/007908799/xsh/ftruncate.html

    If the file previously was larger than length, the extra data is
    discarded. If it was previously shorter than length, it is
    unspecified whether the file is changed or its size increased. If
    ^^^
    the file is extended, the extended area appears as if it were
    zero-filled.

How "legitimate" relates to "SUSv2 standards compliant" is your call.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes:
>
> On Thu, 1 Feb 2001, Malcolm Beattie wrote:
>
> > Mapping the addresses from whichever ScrollLock combination produced
> > the task list to symbols produces the call trace
> > do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return
> >
> > The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
> > anywhere at all and the next function before it in memory is
> > inet_shutdown which looks more believable. I have checked I'm looking
>
> Probably, the empty SIGPIPE handler triggered. The response to this is a
> lot of shutdown() close() and finally an exit().
>
> The trace you give above looks like the child process trace. I always see
> the parent process go nuts. The parent process is almost always blocking
> on read() of a unix dgram socket, which it shares with the child. The
> child does a shutdown() on this socket just before exit().

We've done some more detective work. I can reproduce the hang too by
quitting the ftp client abruptly (^Z and kill %1 in my case). Inducing
the hang while stracing the daemon shows a recv returning 0 as
expected when the socket closes. The daemon then calls "die":

    die(const char* p_text)
    {
      /* Going down hard... */
    #ifdef DIE_DEBUG
      bug(p_text);
    #endif

and DIE_DEBUG is defined. bug() writes an error message and then does
three things:

    shutdown(2) on the sockets
    close(2) on the sockets
    abort()

the last of which libc implements as

    rt_sigprocmask(SIG_UNBLOCK, [SIGABRT])
    kill(getpid(), SIGABRT)

Here's the interesting thing: doing an exit(0) before the shutdowns
and abort gets rid of the hang.

The only unusual and potentially untested thing I could find about the
program was that it uses capset() and prctl(PR_SET_KEEPCAPS). However,
replacing the "retval = capset(...)" call with a dummy "retval = 0"
doesn't get rid of the hang. So it looks as though some combination of
shutdown(2) and SIGABRT is at fault.
After the hang the kernel-side stack trace is always either the one I
gave above (and I *did* write down the address for inet_ioctl
correctly; it's definitely not inet_shutdown) or else

    do_exit <- do_signal <- schedule <- syscall_trace <- signal_return

(with exactly the same addresses as above except for the differing
schedule and syscall_trace ones) which appeared after the hang while
vsftpd was being run under strace.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Serious reproducible 2.4.x kernel hang
Malcolm Beattie writes:
> Chris Evans writes:
> > I've just managed to reproduce this personally on 2.4.0. I've had a report
> > that 2.4.1 is also affected. Both myself and the other person who
> > reproduced this have SMP i686 machines, which may or may not be relevant.
> >
> > To reproduce, all you need to do is get my vsftpd ftp server:
> > ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz
[...]
> As in Chris' case, vsftpd was a zombie (so Foo-ScrollLock told me) and
> all other processes were looking OK in R or S state.

Mapping the addresses from whichever ScrollLock combination produced
the task list to symbols produces the call trace

    do_exit <- do_signal <- tcp_destroy_sock <- inet_ioctl <- signal_return

The inet_ioctl is odd there--vsftpd doesn't explicitly call ioctl
anywhere at all and the next function before it in memory is
inet_shutdown, which looks more believable. I have checked I'm looking
at the right System.map but I suppose I may have mis-transcribed the
address when writing it down.

vsftpd doesn't make use of signal handlers except to unset some
existing ones and a SIGALRM handler which I don't think would have
triggered. Something like a seg fault may have caused it (I should
have seen an oops if it had happened in kernel space) perhaps?

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Serious reproducible 2.4.x kernel hang
Chris Evans writes:
> I've just managed to reproduce this personally on 2.4.0. I've had a report
> that 2.4.1 is also affected. Both myself and the other person who
> reproduced this have SMP i686 machines, which may or may not be relevant.
>
> To reproduce, all you need to do is get my vsftpd ftp server:
> ftp://ferret.lmh.ox.ac.uk/pub/linux/vsftpd-0.0.9.tar.gz

I got this just before lunch too. I was trying out 2.4.1 + zerocopy
(with netfilter configured off, see the sendfile/zerocopy thread for
more details and hardware specs) and tried running vsftpd on the
slower machine instead of the faster machine as before. I connected to
vsftpd with an ftp client and got a

    500 OOPS: chdir
    Login failed.
    421 Service not available, remote server has closed connection

(ftpd's idea of an OOPS; not the kernel's idea of an oops, of course).
That was probably because I hadn't configured the directory properly,
but following that the machine hung, in the following way: userland
hung: no more logins, and existing xterm processes didn't refresh
their windows on my (remote) display. The machine was still pingable,
though. I configured Magic SysRq into the kernel but hadn't played
with it before so I hadn't enabled it in /proc (D'oh. Next time I'll
know.) As in Chris' case, vsftpd was a zombie (so Foo-ScrollLock told
me) and all other processes were looking OK in R or S state. Looking
at the kernel's EIP every so often to see what was going on showed
remove_wait_queue, add_wait_queue, skb_recv_datagram and
wait_for_packet mostly.

Random thought: if vsftpd did a sendfile and then exited, becoming a
zombie, could there be a problem with tearing down a sendfile mapping?
I'm off to read some code.

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: [UPDATE] Fresh zerocopy patch on kernel.org
David S. Miller writes:
>
> Malcolm Beattie writes:
>  > David S. Miller writes:
>  > >
>  > > At the usual place:
>  > >
>  > > ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.1-1.diff.gz
>  >
>  > Hmm, disappointing results here; maybe I've missed something.
>
> As discussed elsewhere there is a %10 to %15 performance hit for
> normal write()'s done with the new code.
>
> If you do your testing using sendfile() as the data source, you'll
> results ought to be wildly different and more encouraging.

I did say that the ftp test used sendfile() as the data source and it
dropped from 86 MB/s to 62 MB/s.

Alexey has mailed me suggesting the problem may be that netfilter is
turned on. It is indeed turned on in both the 2.4.1 config and the
2.4.1+zc config but maybe it has a far higher detrimental effect in
the zerocopy case. I'm currently building new non-netfilter kernels
and I'll go through the exercise again. I'm confident I'll end up
being impressed with the numbers even if it takes some tweaking to get
there :-)

--Malcolm

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: [UPDATE] Fresh zerocopy patch on kernel.org
David S. Miller writes:
> At the usual place:
>
>     ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.1-1.diff.gz

Hmm, disappointing results here; maybe I've missed something. Setup is
a Pentium II 350MHz (tusk) connected to a Pentium III 733MHz
(heffalump) (both 512MB RAM) with SX fibre, each with a 3Com 3C985 NIC.
Kernels compared are 2.4.1 and 2.4.1+zc (the 2.4.1-1 diff above) using
the acenic driver with MTU set to 9000. Sysctls set are

    # Raise socket buffer limits
    net.core.rmem_max = 262144
    net.core.wmem_max = 262144
    # Increase TCP write memory
    net.ipv4.tcp_wmem = 10 10 10

on both sides. Comparison tests done were

    gensink4: 10485760 (10MB) buffer size, 262144 (256K) socket buffer
    ftp: server does sendfile() from a 300MB file in page cache, client
         does read from socket/write to /dev/null in 4K chunks.

                           2.4.1                      2.4.1+zc
                  KByte/s  tusk%CPU  heff%CPU  KByte/s  tusk%CPU  heff%CPU
    gensink4
     tusk->heffalump  94000  58-100    93        54000   98-102    11-45
     heffalump->tusk  72000  86-100    46-59     7       71-93     53-71

                           2.4.1     2.4.1+zc
                           KByte/s   KByte/s
    ftp
     heffalump->tusk       86000     62000

I was impressed with the raw 2.4.1 figures and hoped to be even more
impressed with the 2.4.1+zc numbers. Is there something I'm missing or
can change or do to help improve matters or track down potential
problems?

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
Ingo Molnar writes:
> On Tue, 30 Jan 2001, jamal wrote:
> > > - is this UDP or TCP based? (UDP i guess)
> >
> > TCP
>
> well then i'd suggest to do:
>
>     echo 10 10 10 > /proc/sys/net/ipv4/tcp_wmem
>
> does this make any difference?

For the last week I've been benchmarking Linux network and I/O
performance on a couple of machines with 3c985 gigabit cards and some
other stuff (see below). One of the things I tried yesterday was a beta
test version of a secure ftpd written by Chris Evans which happens to
use sendfile(), making it a convenient extra benchmark. I'd already put
net.core.{r,w}mem_max up to 262144 for the sake of gensink and other
benchmarks which raise SO_{SND,RCV}BUF. I hadn't, however, tried
raising tcp_wmem as per your suggestion above.

Currently the systems are linked back to back with fibre with jumbo
frames (MTU 9000) on and running pure kernel 2.4.1. I transferred a
300 MByte file repeatedly from the server to the client with an ftp
"get" client-side. The file will have been completely in page cache on
the server (both machines have 512MB RAM) and was written to /dev/null
on the client side. (Yes, I checked the client was doing ordinary
read/write and not throwing it away.)

Without the raised tcp_wmem setting I was getting 81 MByte/s. With
tcp_wmem set as above I got 86 MByte/s. Nice increase. Any other
setting I can tweak apart from {r,w}mem_max and tcp_{w,r}mem? The CPU
on the client (350 MHz PII) is the bottleneck: gensink4 maxes out at
69 MByte/s pulling TCP from the server and 94 MByte/s pushing. (The
other system, a 733 MHz PIII, pushes >100 MByte/s UDP with ttcp but
the client drops most of it.)

I'll be following up Dave Miller's "please benchmark zerocopy" request
when I've got some more numbers written down since I've only just put
the zerocopy patch in and haven't rebooted yet.

If anyone wants any other specific benchmarks done (I/O or network) I
may get some time to do them: the PIII system has an 8-port Escalade
card with 8 x 46GB disks (117 MByte/s block writes as measured by
Bonnie on a RAID1/0 mixed RAIDset) and there are also four dual-port
eepro fast ethernet cards, a Cisco 8-port 3508G gigabit switch and a
24-port 3524 fast ethernet switch (gigastack linked to the 3508G). I'm
benchmarking and looking into the possibility of a DIY NAS- or
SAN-type thing.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Modprobe local root exploit
Keith Owens writes:
> All these patches against request_module are attacking the problem at
> the wrong point. The kernel can request any module name it likes,
> using any string it likes, as long as the kernel generates the name.
> The real problem is when the kernel blindly accepts some user input
> and passes it straight to modprobe, then the kernel is acting like a
> setuid wrapper for a program that was never designed to run setuid.

Rather than add sanity checking to modprobe, it would be a lot easier
and safer from a security audit point of view to have the kernel call
/sbin/kmodprobe instead of /sbin/modprobe. Then kmodprobe can sanitise
all the data and exec the real modprobe. That way the only thing that
needs auditing is a string munging/sanitising program.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
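The kmodprobe idea boils down to one tiny, auditable string check in front of the real modprobe. A rough userspace sketch in Python, purely for illustration: the accepted character set, the length limit and the function names are my assumptions, not anything from modutils; the `-s -k --` argv shape mirrors what the kernel's exec_modprobe historically passed.

```python
import re

# Assumed whitelist: module names the kernel itself generates are
# alphanumerics plus '-' and '_', and must not start with '-' so an
# attacker-supplied name can never be parsed as a modprobe option.
_SAFE_NAME = re.compile(r'^[A-Za-z0-9][A-Za-z0-9_-]{0,63}$')

def sanitise_module_name(name):
    """Return name unchanged if it is safe to hand to the real
    modprobe; raise ValueError otherwise."""
    if not _SAFE_NAME.match(name):
        raise ValueError("kmodprobe: rejected module name %r" % (name,))
    return name

def kmodprobe_argv(name):
    """Build the argv a kmodprobe-style wrapper would exec, with '--'
    so nothing after it is ever treated as an option."""
    return ["/sbin/modprobe", "-s", "-k", "--", sanitise_module_name(name)]
```

The whole wrapper stays small enough to audit by eye, which was the point of the suggestion.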
Re: Topic for discussion: OS Design
Marty Fouts writes:
> I have had the good fortune of working with one architecture (PA-RISC)
> which gets the separation of addressability and accessability 'right'
> enough to be able to partition efficiently and use ordinary procedure
> calls (with some magic at server boundaries) rather than IPCs. There
> are others, but PA-RISC is the one I am aware of.

Like S/390 secondary address space and cross-address-space services?

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: New Benchmark tools, lookie looky........
Larry McVoy writes:
> On Tue, Oct 17, 2000 at 09:21:00AM -0700, Andre Hedrick wrote:
> > Expand 'traces' ... O-SCOPE analyizer?
>
> Insert a ring buffer into the disk sort entry point. Add a userland
> process which reads this ring buffer and gets the actual requests in
> the actual order they are sent to the drive[s]. Then take that data
> and write a simulator into which you can plug in different algs. I
> have all this crud for SunOS if you want it, including elevator.c,
> hacksaw.c, and inorder.c.

I wrote a lightweight kernel->userland ring buffer device for Linux
called bufflink and a block-request logger that uses it called reqlog.
reqlog writes a structure

    struct reqlog {
        unsigned int  major;
        unsigned int  minor;
        unsigned long sector;
        long          nr_sectors;
    };

to the ring buffer when an ioctl is done to enable logging. The current
patch isn't quite what you were suggesting since it does roughly

    add_request()
    {
        ...
        elevator_queue(req, tmp, latency, dev, current_request);
    +   if (bl_reqlog && enable_reqlog) {
    +       ...
    +       bufflink_append(bl_reqlog, (unsigned char *)&rl, sizeof rl);
    +   }
        if (queue_new_request)
            (dev->request_fn)();
    }

but it would be easy to write the record (instead or as well) before
the elevator_queue(). The patches are available from

    http://users.ox.ac.uk/~mbeattie/linux-kernel.html

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
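bufflink itself lives kernel-side, but the shape of the scheme is easy to model. A toy Python sketch of a fixed-size ring carrying packed reqlog records: the class and function names, the fixed 32/64-bit field sizes and the drop-oldest overflow policy are all my assumptions for illustration, not taken from the patch.

```python
import struct

# Packed counterpart of struct reqlog: two unsigned ints, an unsigned
# 64-bit sector and a signed 64-bit count (sizes assumed for the demo).
REQLOG = struct.Struct("=IIQq")

class RingBuffer:
    """Fixed-capacity byte ring: append() drops the oldest bytes on
    overflow, drain() hands everything to the userland reader."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray()

    def append(self, record):
        self.buf += record
        excess = len(self.buf) - self.capacity
        if excess > 0:
            del self.buf[:excess]   # overwrite oldest data, ring-style

    def drain(self):
        data, self.buf = bytes(self.buf), bytearray()
        return data

def log_request(ring, major, minor, sector, nr_sectors):
    """What the add_request() hook would do per block request."""
    ring.append(REQLOG.pack(major, minor, sector, nr_sectors))

def records(data):
    """Decode a drained byte run back into (major, minor, sector, nr)."""
    return [REQLOG.unpack_from(data, off)
            for off in range(0, len(data), REQLOG.size)]
```

A simulator of the kind Larry describes would replay the tuples from records() through different elevator algorithms.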
Re: mapping user space buffer to kernel address space
[EMAIL PROTECTED] writes:
> I have a user buffer and i want to map it to kernel address space
> can anyone tell how to do this like in AIX we have xmattach

In 2.2, you're better off providing a fake character device driver
which allocates the memory in kernel space and lets the user mmap it.

In 2.4, you could try out kiobufs which, if my reading of the section
marked "#ifdef HACKING", "case BTTV_JUST_HACKING" and also

    /* playing with kiobufs and dma-to-userspace */

is correct, goes (modulo error handling) roughly like:

    struct kiobuf *iobuf;
    alloc_kiovec(1, &iobuf);       /* allocate a(n array of one) kiobuf */
    map_user_kiobuf(READ, iobuf, va, len); /* userland vaddr and length */
                                           /* s/READ/WRITE/ for write */
    /* now you have an iobuf containing pinned down user pages */
    ...
    lock_kiovec(1, &iobuf, 1);     /* Lock pages down for I/O */
                                   /* first 1 is vector count */
                                   /* second means wait for lock */
    ...
    /* do I/O on it */
    free_kiovec(1, &iobuf);        /* does an implicit unlock_kiovec */

It doesn't do an unmap_kiobuf(iobuf) so I don't understand where the
per-page map->count that map_user_kiobuf incremented gets decremented
again. Anyone?

Lowlevel I/O on a kiovec can be done with something like an
ll_rw_kiovec which sct said was going to get put in but since I haven't
read anything more recent than 2.4.0-test5 at the moment, I can't say
if it's there or what it looks like.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: large memory support for x86
Timur Tabi writes:
> ** Reply to message from Jeff Epler <[EMAIL PROTECTED]> on Thu, 12 Oct
> 2000 13:08:19 -0500
> > What the support for >4G of memory on x86 is about, is the "PAE",
> > Page Address Extension, supported on P6 generation of machines, as
> > well as on Athlons (I think). With these, the kernel can use >4G of
> > memory, but it still can't present a >32bit address space to user
> > processes. But you could have 8G physical RAM and run 4 ~2G or
> > 2 ~4G processes simultaneously in core.
>
> How about the kernel itself? How do I access the memory above 4GB
> inside a device driver?

It depends on what you have already. If you're given a (kernel) virtual
address, just dereference it. The unit of currency for physical pages
is the "struct page". If you want to allocate a physical page for your
own use (from anywhere in physical memory) then you do

    struct page *page = alloc_page(GFP_FOO);

If you want to read/write to that page directly from kernel space then
you need to map it into kernel space:

    char *va = kmap(page);
    /* read/write from the page starting at virtual address va */
    kunmap(va);

The implementations of kmap and kunmap are such that mappings are
cached (within reason) so doing kmap/kunmap is reasonably fast. If you
want to do something else with the page (like get some I/O done
to/from it) then the new (and forthcoming) kiobuf functions take
struct page units and handle all the internal mapping gubbins without
you having to worry about it.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: want tool to open RPM package on Window 95
Michal Jaegermann writes:
> > Somewhere floating around there is a perl version of rpm2cpio.
>
> This is what I wrote one day a long time ago:
>
> #!/usr/bin/perl -w
> use strict;
>
> my ($buffer, $pos, $gzmagic);
> $gzmagic = "\037\213";
> open OUT, "| gunzip" or die "cannot find gunzip; $!\n";
> while (1) {
>     exit 1 unless defined($pos = read STDIN, $buffer, 8192) and $pos > 0;
>     next unless ($pos = index $buffer, $gzmagic) >= 0;
>     print OUT substr $buffer, $pos;
>     last;
> }
> print OUT <STDIN>;
> exit 0;
>
> Yes, I know that I should not mix 'read' with stdio but it worked
> every time I tried the above. :-)

The good news is that "read" does use stdio (along with seek and
print). The syscall ones are sys{read,write,seek}. The less good news
is that your "print OUT <STDIN>" sucks up all the RPM file into memory
before dumping it out again, which is inelegant and leads those who
copy the idiom without understanding it to run into problems when they
use similar code on large files. One way of doing it a bit differently
is

    #!/usr/bin/perl
    die "Usage: rpm2cpio foo.rpm | cpio ...\n" unless @ARGV == 1;
    open(RPM, $ARGV[0]) or die "$ARGV[0]: $!\n";
    open(STDOUT, "| gunzip") or die "cannot find gunzip: $!\n";
    while (read(RPM, $_, 8192)) {
        if (!$found_gzmagic) {
            s/^.*?(?=\037\213)//s or next;
            $found_gzmagic = 1;
        }
        print;
    }

> Can we go back now to kernel issues?

Oops, yes, we now return you to your regularly scheduled kgcc wars.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
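One subtlety both scripts share: if the \037\213 magic happens to straddle an 8192-byte read boundary, a per-buffer index or regex search misses it. A sketch of boundary-safe scanning, written in Python rather than Perl purely for illustration (the function name and interface are mine):

```python
def find_gzip_offset(stream, chunk=8192):
    """Return the stream offset of the first gzip magic (bytes 0x1f
    0x8b), or -1 if absent.  The last byte of each chunk is carried
    over so a magic pair split across two reads is still found."""
    magic = b"\x1f\x8b"
    offset = 0          # bytes consumed before the current chunk
    carry = b""
    while True:
        data = stream.read(chunk)
        if not data:
            return -1
        hay = carry + data
        pos = hay.find(magic)
        if pos >= 0:
            # hay[0] sits len(carry) bytes before the current offset
            return offset - len(carry) + pos
        carry = hay[-1:]
        offset += len(data)
```

An rpm2cpio built on this would locate the offset once and then stream everything from there through gunzip, still without holding the whole RPM in memory.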
Re: Availability of kdb
Marty Fouts writes:
> Here's another piece of free advice, worth less than you paid for it:
> in 25 years, only the computer history trivia geeks are going to
> remember you, just as only a very small handful of us now remember
> who wrote OS/360.

You mean like Fred Brooks, who managed the development of OS/360, had
some innovative ideas about how large software projects should be run,
whose ideas clashed with contemporary ones, and who became a celebrity?
You don't spot any parallels there? He whose book "The Mythical
Man-Month", along with "No Silver Bullet" and "The Second System
Effect", is quoted around the industry decades later? And you think
that's only a small handful of people?

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
Re: Adding set_system_gate fails in arch/i386/kernel/traps.
Petr Vandrovec writes:
> On 12 Sep 00 at 21:25, Keith Owens wrote:
> > > 0x85) vanish after the system has booted further. printk shows
> > > that idt_table is correctly updated immediately after the
> > > set_system_gate but once the system has booted the entries for my
> > > new traps have reverted. (printk telemetry available on request).
> > > However, once the system has booted, a little module which simply
> > > updates idt_table[MY_NEW_VECTOR] directly works fine and "sticks".
> > > Help? (Or, more accurately "Aaarrrgh?").
> >
> > I can confirm that this sometimes occurs in 2.4.0-testx, AFAIK I
> > have only seen the problem in SMP kernels.
>
> What about arch/i386/kernel/io_apic.c:assign_irq_vector() ?

I'm not using SMP but you both put me on the right track.
init/main.c:start_kernel does:

    setup_arch(&command_line);
    trap_init();
    init_IRQ();

trap_init does the set_system_gate(FOO_VECTOR, &handler) lines I
extended but then init_IRQ() does

    for (i = 0; i < NR_IRQS; i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (vector != SYSCALL_VECTOR)
            set_intr_gate(vector, interrupt[i]);
    }

and promptly zaps everything except SYSCALL_VECTOR.
(FIRST_EXTERNAL_VECTOR is 0x20.) Many thanks.

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
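The ordering bug is easy to model outside the kernel. A toy Python rendition of the boot sequence described above, showing why a gate installed in trap_init vanishes once init_IRQ runs; the vector 0x85 comes from the thread, while NR_IRQS = 224 and the string labels are illustrative assumptions:

```python
FIRST_EXTERNAL_VECTOR = 0x20
SYSCALL_VECTOR = 0x80
NR_IRQS = 224
MY_NEW_VECTOR = 0x85          # the gate that kept "reverting"

idt = {}                      # stand-in for idt_table

def trap_init():
    idt[SYSCALL_VECTOR] = "system_call"
    idt[MY_NEW_VECTOR] = "my_new_trap"    # set_system_gate(...)

def init_IRQ():
    # The loop quoted above: every external vector except the syscall
    # gate is repointed at the generic interrupt stub.
    for i in range(NR_IRQS):
        vector = FIRST_EXTERNAL_VECTOR + i
        if vector != SYSCALL_VECTOR:
            idt[vector] = "interrupt[%d]" % i

trap_init()
init_IRQ()    # silently overwrites the gate trap_init just installed
```

After the two calls, idt[0x85] holds interrupt[101] (0x85 - 0x20), not the new trap, while the syscall gate alone survives: exactly the symptom of gates "reverting" after boot.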
Re: Flavours of deceased bovine
Keith Owens writes:
> Just had an ext2 filesystem on SCSI that was corrupt. The first two
> words of the group descriptor had been overwritten with 0xdeadbeef,
> 0x. The filesystem is fixed now but trying to track down the
> problem is difficult, there are 50+ places in the kernel that use
> 0xdeadbeef.
>
> I strongly suggest that people use different variants of dead beef to
> make it easier to work out where any corruption is coming from.
> Perhaps change the last 2-3 digits so magic values would be 0xdeadb000
> to 0xdeadbfff, assuming it does not affect any other code.

Nah, choose new words which stand out. There are plenty of them and it
avoids the problem of a 0xdeadb001 being decremented before being
noticed and thus confused with a 0xdeadb000. Be inventive:

    egrep -x '[abcdefilos]{3,8}' /usr/dict/words

and make one up whenever needed. For example,

    0baff1ed  acce55ed  decea5ed  d15ab1ed  d15ea5ed

along with multiword ones like

    fee1dead  dead1055  badca5e5

--Malcolm

--
Malcolm Beattie <[EMAIL PROTECTED]>
Unix Systems Programmer
Oxford University Computing Services
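The egrep recipe can be pushed one step further. A small Python sketch that spells a candidate word as a zero-padded 32-bit hex constant; the letter-to-digit substitutions (o→0, i/l→1, s→5) are inferred from the examples above, and the function name is mine:

```python
SUBST = {"o": "0", "i": "1", "l": "1", "s": "5"}
HEX_OK = set("abcdef0123456789")

def hexspeak(word, width=8):
    """Spell word as a width-digit hex constant, zero-padded on the
    left; return None if some letter has no hex spelling or the word
    is too long to fit."""
    digits = []
    for ch in word.lower():
        ch = SUBST.get(ch, ch)       # leet-map o, i, l, s to digits
        if ch not in HEX_OK:
            return None              # letter with no hex spelling
        digits.append(ch)
    if len(digits) > width:
        return None
    return "".join(digits).rjust(width, "0")
```

Fed the egrep output, this turns "baffled" into 0baff1ed and "diseased" into d15ea5ed, matching the hand-made constants above.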