Re: [OT]: DRI doesn't work on 2.4.0 but does on prerelease-ac5
On Mon, 8 Jan 2001, J Sloan wrote:

> This is a little OT for linux-kernel, but I'll take a swing at it
> since I'm running 2.4 and Xfree 4 with a voodoo 3.
>
> After upgrading to Red Hat 7.0, I noticed 3D screensavers
> and Quake 3 Arena were dog slow - in the end, I basically
> had to make sure the mesa libs didn't get found before the
> real opengl libs.
>
> In my case, that meant nuking mesa from my system and
> letting Linux use what was left, which got me back the good
> accelerated performance - you may choose a less drastic
> option. I don't see any breakage from the absence of mesa.

Sounds like the version you blew away was not the one built in 4.0.2.
(Mesa is built along with XFree86 now, not as an add-on.)

I will test with my current configuration and see if I can duplicate the
slow down.

I am currently using a Matrox G400 max card with 4.0.2cvs. I get about
1285 frames per second on the gears demo currently. We will see if that
changes with the 2.4.0 final release version.

[EMAIL PROTECTED] | Note to AOL users: for a quick shortcut to reply
Alan Olsen        | to my mail, just hit the ctrl, alt and del keys.
"In the future, everything will have its 15 minutes of blame."

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.2 vs. 2.4 benchmarks
Chris Evans wrote:
>
> Hi,
>
> I ran some 2.2 vs. 2.4 benchmarks, particularly in the area of file i/o,
> using bonnie++.
>
> The machine is a SMP 128Mb PII-350 with a udma2 drive capable of some
> 20Mb/sec+. Kernels involved are 2.4.0, and the default RH7.0 kernel
> (2.2.16 plus more patches than you can shake a stick at).
>
> Not going too much into the gory details, here are the differences exposed
> between 2.2 and 2.4:
>
> 1) Amazing 2.4 increase in streaming write performance; 13Mb/sec ->
> 20Mb/sec. I suspect this is the result of the "last minute" 2.4.0 dirty
> buffer/sync waiting handling changes.
>
> 2) Slight 2.4 increase in streaming read performance; 16Mb/sec ->
> 17Mb/sec. This leaves 2.4.0 writing faster than reading, I find that
> surprising.
>

I am not surprised. A read _has_ to fetch the data before it can present
a result, so reads are completely bound by IO waiting unless the data is
cached. And test programs tend to empty the cache first.

Writes can be partially buffered even if the test file is much larger
than memory. The extra 3Mb/s might be going into RAM. Filling 128M at
3M/s takes about 43s, and 20M/s for 43s is about 850M. Did you use a
test file in the 500MB-1000MB range?

> 3) Some 10% drop in rewrite performance from 2.2 -> 2.4 (possibly because
> page aging, like LRU, isn't too hot for the 2nd+ linear scan over data)
>
> 4) File creation 30% faster in 2.4; random deletes 30% faster; sequential
> deletes 10% slower.
>
> I did one other quick test, with disappointing results for 2.4.0. I did a
> kernel build with 32Mb.
>
> 2.4.0 was taking about 10 mins to do the build. 2.2.x was 1min30 quicker
> :( I was hoping/expecting the 2.4.0 page aging to do better, due to
> keeping the more useful pages in RAM better.

I have no explanation. Did you build exactly the same kernel in each
case (version and options)? With the same amount of other software
(X, daemons, ...) running? Using the same source tree? (Different disk
locations may have large speed differences.)

Were the circumstances the same? (Doing a make [dist]clean in order to
get rid of files from the previous build will cache the directory
contents, which is unfair if it happened in only one of the cases.)

Helge Hafting
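The buffering estimate in the reply above is simple arithmetic and can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and the inputs are the figures quoted in the thread (128 MB of RAM, a 20 MB/s write rate, and the 3 MB/s by which writes outpace reads).

```c
/* Back-of-the-envelope check of the write-buffering estimate.
 * Inputs are the figures quoted in the thread; the function names
 * are illustrative only. */

/* Seconds until RAM is full if surplus_mb_per_s is being buffered. */
double seconds_to_fill_ram(double ram_mb, double surplus_mb_per_s)
{
	return ram_mb / surplus_mb_per_s;
}

/* Megabytes written at the observed rate during that interval. */
double mb_written(double write_mb_per_s, double seconds)
{
	return write_mb_per_s * seconds;
}
```

With 128 MB and a 3 MB/s surplus the first function gives a little under 43 seconds, and 20 MB/s over that interval comes to roughly 850 MB, which is why a test file in the 500MB-1000MB range could show a write rate inflated by buffering.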
Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.
I have RedHat7, glibc-2.2-9, gcc-2.96-69. I can build 2.4.0 while
running kernel 2.2.16. If I try to rebuild 2.4.0 while running the new
kernel, I get random compiler errors. It happens on two machines: one
runs 2.4.0-test12, the other 2.4.0. Both have the above-mentioned
updates installed. I know this is a RedHat issue, but it may be useful
for some to know.

--
Systems and Network Administrator - Delta Romania
Phone +4093-267961
2.4.0-vmbigpatch compile problem
Hi!

PF_RSSTRIM is not declared anywhere either in the linux-2.4.0 sources
or in the 2.4.0-vmbigpatch.

Regards,
Zoltan Boszormenyi
Re: [PATCH] hisax/sportster dependency error
hi.

Alan Cox <[EMAIL PROTECTED]> writes:

> > > according to sportster.c:get_io_range, this appears to be perfectly
> > > intentional, request_regioning 64x8 byte from 0x268 in 1024byte-steps.
> >
> > AFAIK, this is because the hardware is stupid and does decode the higher
> > address lines. Therefore, the IO ports are mirrored every 1024 bytes and
> > should be reserved to avoid potential conflicts with other devices.
>
> Almost every 10bit decode ISA card is like that. You don't need to do the
> work. The PCI alloc rules already cover it.

so, if i understand this correctly, since all offsets actually in use
are 1024B multiples the following would be sufficient, or more elegant..?

or should

#define SPORTSTER_ISAC		0xC000
#define SPORTSTER_HSCXA		0x0000
#define SPORTSTER_HSCXB		0x4000
#define SPORTSTER_RES_IRQ	0x8000

still get requested explicitly in such cases?

--- linux-2.4/drivers/isdn/hisax/sportster.c.orig	Tue Jan 9 09:31:36 2001
+++ linux-2.4/drivers/isdn/hisax/sportster.c	Tue Jan 9 09:54:18 2001
@@ -133,13 +133,10 @@
 void
 release_io_sportster(struct IsdnCardState *cs)
 {
-	int i, adr;
 
 	byteout(cs->hw.spt.cfg_reg + SPORTSTER_RES_IRQ, 0);
-	for (i=0; i<64; i++) {
-		adr = cs->hw.spt.cfg_reg + i *1024;
-		release_region(adr, 8);
-	}
+
+	release_region(cs->hw.spt.cfg_reg, 8);
 }
 
 void
@@ -185,27 +182,18 @@
 static int __init
 get_io_range(struct IsdnCardState *cs)
 {
-	int i, j, adr;
+	int adr = cs->hw.spt.cfg_reg;
+
+	if ( check_region(adr, 8) ) {
+		printk(KERN_WARNING
+			"HiSax: %s config port %x-%x already in use\n",
+			CardType[cs->typ], adr, adr + 8);
+		return 0;
+	}
 
-	for (i=0;i<64;i++) {
-		adr = cs->hw.spt.cfg_reg + i *1024;
-		if (check_region(adr, 8)) {
-			printk(KERN_WARNING
-				"HiSax: %s config port %x-%x already in use\n",
-				CardType[cs->typ], adr, adr + 8);
-			break;
-		} else
-			request_region(adr, 8, "sportster");
-	}
-	if (i==64)
-		return(1);
-	else {
-		for (j=0; j<i; j++) {
-			adr = cs->hw.spt.cfg_reg + j *1024;
-			release_region(adr, 8);
-		}
-		return(0);
-	}
+	request_region(adr, 8, "sportster");
+
+	return 1;
 }
 
 int __init

best regards,
dns

--
___ mailto:[EMAIL PROTECTED]
Re: Journaling: Surviving or allowing unclean shutdown?
On Mon, Jan 08, 2001 at 12:02:49PM +0000, Stephen C. Tweedie wrote:

> Right. There are two distinct meanings:
>
> 1) Do not write to this medium, ever (physical readonly); and
>
> 2) Do not allow modifications to the filesystem (logical readonly).
>
> The fact is that the kernel confuses the two, but that just isn't
>[snip]
> We just don't have a way of specifying these two things independently.

Does this call for a new mount option, or should we just clutter /dev
even further with devices whose ro permissions serve as the marker?

TTFN
--
Roger
Think of the mess on the carpet. Sensible people do all their
demon-summoning in the garage, which you can just hose down afterwards.
-- [EMAIL PROTECTED]
Re: kernel network problem ?
Nicolas Noble wrote:
[...]

As others have said already, this is the ECN problem.

> I noticed the same bug. This is very weird, I can send a list of sites
> which I can't connect anymore.

You have a list? Send all of them a message stating that they ought to
upgrade the firewalls that cause this problem, or they will lose
customers/visitors. Cisco already has an upgrade for them, so fixing it
is dead easy, and they can then boast compatibility with the latest
internet standards.

If they don't care about Linux users, tell them that Windows will
eventually use ECN too. They definitely don't want to have an ECN
problem when that happens.

Helge Hafting
Re: [PATCH] advansys.c: include missing restore_flags, etc
> > save_flags(flags);
> > cli();
> > @@ -9965,7 +9972,7 @@
> > }
>
> Err, according to wise ppl on this list, this does not work on
> MIPSes. The flags thing must stay in the same stackframe.

Certainly doesn't work on sparc32, but then it didn't before. Inline it might
Re: [PATCH] advansys.c: include missing restore_flags, etc
Arnaldo Carvalho de Melo writes:
> Please consider applying, comments in the patch.

Can't the following be fixed properly?

> -STATIC int
> +STATIC unsigned long
> DvcEnterCritical(void)
> {
> -	int flags;
> +	unsigned long flags;
>
> 	save_flags(flags);
> 	cli();

Guess what happens here?

	return flags;

> @@ -9965,7 +9972,7 @@
> }
>
> STATIC void
> -DvcLeaveCritical(int flags)
> +DvcLeaveCritical(unsigned long flags)
> {
> 	restore_flags(flags);
> }

The above doesn't work on some architectures. It's better to use a macro
if you want to separate this out, i.e. something like (davem will have
to okay it tho):

#define DvcEnterCritical() \
	({ unsigned long __flags; save_flags(__flags); cli(); __flags; })

#define DvcLeaveCritical(flags) \
	do { restore_flags(flags); } while (0)

This should then ensure that you don't end up with problems associated
with register windows on the sparc or whatever.

Even better would be to use a spinlock instead of Dvc?Critical.

Russell King [EMAIL PROTECTED]
http://www.arm.linux.org.uk/personal/aboutme.html
THE developer of ARM Linux
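The macro approach suggested above can be tried outside the kernel with stand-ins for the interrupt primitives. The sketch below is not kernel code: `save_flags()`, `cli()` and `restore_flags()` are replaced by hypothetical userspace macros that only track an "interrupts enabled" bit, and `update_counter()` and `irqs_enabled()` are invented for the demonstration. The `({ ... })` form is a GNU C statement expression, so this assumes gcc or a compatible compiler.

```c
/* Userspace stand-ins for the kernel's save_flags()/cli()/restore_flags().
 * They only model an "interrupts enabled" bit so that the macros can be
 * exercised outside the kernel; they are NOT the real primitives. */
static unsigned long fake_irq_enabled = 1;

#define save_flags(x)    ((x) = fake_irq_enabled)
#define cli()            (fake_irq_enabled = 0)
#define restore_flags(x) (fake_irq_enabled = (x))

/* The macros proposed in the message. The statement expression lets
 * DvcEnterCritical() yield a value while still expanding inside the
 * caller's own stack frame. */
#define DvcEnterCritical() \
	({ unsigned long __flags; save_flags(__flags); cli(); __flags; })

#define DvcLeaveCritical(flags) \
	do { restore_flags(flags); } while (0)

/* Reports whether the modelled interrupt bit is currently set. */
int irqs_enabled(void)
{
	return fake_irq_enabled != 0;
}

/* Hypothetical caller: the flags are saved and restored in one frame.
 * Returns 1 if interrupts were (modelled as) disabled inside the section. */
int update_counter(int *counter)
{
	unsigned long flags = DvcEnterCritical();
	int were_disabled = !irqs_enabled();

	(*counter)++;
	DvcLeaveCritical(flags);
	return were_disabled;
}
```

Because both macros expand in the caller, the save and the restore happen in the same stack frame, which is the property the function-based version loses.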
Re: [PATCH] hisax/sportster dependency error
> > Almost every 10bit decode ISA card is like that. You don't need to do the
> > work. The PCI alloc rules already cover it.
>
> so, if i understand this correctly, since all offsets actually in use
> are 1024B multiples the following would be sufficient, or more elegant..?

PCI allocation rules handle all of this. PCI I/O is not allocated in the
ranges 0x[1-F][0-3]xx

Alan
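The "10bit decode" remark can be illustrated with a small model. The sketch below assumes a card that looks only at the low 10 bits of the I/O address, so any two ports that differ by a multiple of 1024 look identical to it. The function names are made up; the base 0x268 and the 64 steps of 1024 bytes are the figures from the thread.

```c
/* Model of a card that decodes only the low 10 I/O address bits.
 * Names are illustrative; 0x268 and the 64 x 1024-byte stride are the
 * figures quoted in the thread. */
#define ISA_DECODE_MASK 0x3FFu	/* low 10 bits */

/* The address as seen by a 10-bit-decode card. */
unsigned int isa_decoded_port(unsigned int port)
{
	return port & ISA_DECODE_MASK;
}

/* Returns 1 if every base + i*1024 (i < count) aliases onto base. */
int all_alias_to_base(unsigned int base, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++)
		if (isa_decoded_port(base + i * 1024u) != isa_decoded_port(base))
			return 0;
	return 1;
}
```

All 64 addresses the old driver loop reserved collapse onto the same decoded port, which is why the mirrors exist in the first place.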
[testcase] madvise->semaphore deadlock 2.4.0
Greetings,

While trying to configure ftpsearch, the process hangs while running its
madvise confidence test below. It appears to be taking a fault in
madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and
immediately deadlocking forever on mm->mmap_sem per IKD. (Virgin 2.4.0
agrees.) Accesses to /proc afterward (i.e. ps) leave hangers.

kdb> bp sys_madvise
Instruction(i) BP #0 at 0xc0129aa4 (sys_madvise)
    is enabled globally adjust 1
kdb> go
Instruction(i) breakpoint #0 at 0xc0129aa4 (adjusted)
0xc0129aa4 sys_madvise:
0xc0129aa4 sys_madvise int3
Entering kdb (current=0xc4232000, pid 260) due to Breakpoint @ 0xc0129aa4
kdb> bp __down_failed
Instruction(i) BP #1 at 0xc0107c84 (__down_failed)
    is enabled globally adjust 1
kdb> go
Instruction(i) breakpoint #1 at 0xc0107c84 (adjusted)
0xc0107c84 __down_failed:
0xc0107c84 __down_failed int3
Entering kdb (current=0xc4232000, pid 260) due to Breakpoint @ 0xc0107c84
kdb> sr z
SysRq: Suspending trace
kdb> rd
eax = 0x ebx = 0xc4232000 ecx = 0xc7f6963c edx = 0x0010
esi = 0xc7f69620 edi = 0xc4232000 esp = 0xc4233e58 eip = 0xc0107c84
ebp = 0xc4233efc xss = 0x0018 xcs = 0x0010 eflags = 0x0296
xds = 0x0018 xes = 0x0018 origeax = 0x &regs = 0xc4233e24
kdb> bt
EBP        EIP        Function(args)
0xc4233efc 0xc0107c84 __down_failed (0xc4232000, 0x2, 0xc0114c00, 0x0, 0x3)
                      kernel .text 0xc010 0xc0107c84 0xc0107c9c
           0xc0226571 stext_lock+0x12d
                      kernel .text.lock 0xc0226444 0xc0226444 0xc0227840
           0xc0114c77 do_page_fault+0x77 (0xc4233f0c, 0x2, 0xc370bd20, 0x4017, 0x2)
                      kernel .text 0xc010 0xc0114c00 0xc0115060
           0xc0109284 error_code+0x34
                      kernel .text 0xc010 0xc0109250 0xc010928c
Interrupt registers:
eax = 0x ebx = 0xc370bd20 ecx = 0x4017 edx = 0x0002
esi = 0xc370bce0 edi = 0xc370bca0 esp = 0xc4233f40 eip = 0xc012964a
ebp = 0xc4233f50 xss = 0x0018 xcs = 0x0010 eflags = 0x00010202
xds = 0x0018 xes = 0x0018 origeax = 0x &regs = 0xc4233f0c
           0xc012964a madvise_fixup_middle+0xb6 (0xc370bca0, 0x4016, 0x4017, 0x0)
                      kernel .text 0xc010 0xc0129594 0xc01296fc
0xc4233f74 0xc0129789 madvise_behavior+0x8d (0xc370bca0, 0x4016, 0x4017, 0x0)
                      kernel .text 0xc010 0xc01296fc 0xc0129798
0xc4233f90 0xc0129a7d madvise_vma+0x35 (0xc370bca0, 0x4016, 0x4017, 0x0)
                      kernel .text 0xc010 0xc0129a48 0xc0129aa4
0xc4233fbc 0xc0129b48 sys_madvise+0xa4 (0x4016, 0x1, 0x0, 0x4000e6d0, 0xb86c)
                      kernel .text 0xc010 0xc0129aa4 0xc0129b94
more>
           0xc0109154 system_call+0x3c
                      kernel .text 0xc010 0xc0109118 0xc0109158
kdb> go
pid 260 starving for fork.c205
pid 260 starving for fork.c205
pid 260 starving for fork.c205

ksymoops 2.3.5 on i686 2.4.0. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0/ (default)
     -m /lib/modules/2.4.0/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010791d <__down+55/9c>
Trace; c0107a68 <__down_failed+8/c>
Trace; c020989d
Trace; c0112ef4
Trace; c012b805 <__alloc_pages+dd/2d0>
Trace; c012206c
Trace; c01220fa
Trace; c0122260
Trace; c0113037
Trace; c0108e80
Trace; c0125965
Trace; c0125a9f
Trace; c0125d60
Trace; c0125e1a
Trace; c0108d63

2 warnings issued. Results may not be reliable.
#include <stdlib.h>	/* malloc, exit */
#include <sys/mman.h>	/* madvise, MADV_* */

int main(int argc,char **argv)
{
	char *dummy;
	char *base;

	dummy = malloc(2 * 64 * 1024 );
	/* round up to the next 64K boundary inside the allocation */
	base = (char *) (( ((unsigned long) dummy) + 64 * 1024UL - 1 ) & - (64 * 1024UL));
	if (madvise(base,64*1024,MADV_NORMAL)) exit(1);
	if (madvise(base,64*1024,MADV_RANDOM)) exit(1);
	if (madvise(base,64*1024,MADV_SEQUENTIAL)) exit(1);
	if (madvise(base,64*1024,MADV_WILLNEED)) exit(1);
	if (madvise(base,64*1024,MADV_DONTNEED)) exit(1);
	exit(0);
}
Re: FS callback routines
"Sean R. Bright" <[EMAIL PROTECTED]> writes: > Ok, before I begin, don't shoot me down, but I had an idea for a kernel > modification and was wondering how feasible the group thought it was. > > I was writing a user space application to monitor a folder's contents. The > folder itself contained 100 folders, and each of those contained 24 folders. > While writing the code to traverse the directory structure I realized that > instead of my software figuring out when things change, why not just have > the fs tell my application when something was updated. For example, say we > had a function called watch_fs(), that took an inode reference and a > function pointer and maybe a bitmask of events to watch for. When that > inode (or its children) were changed, why couldn't the fs code call the > callback function I specified? > > I have no idea how expensive this would be or if its even worth it at this > point. It also wouldn't be portable at all considering that I know of no > other OS that does this (could be wrong). > > Like I said, I am not asking that this be (necessarily) implemented, I am > just curious as to what the percieved performance ramifications would be if > it were to implemented, say, by a virgin kernel developer ;) you want to have a look at http://oss.sgi.com/projects/fam/ resp. imon, the corresponding kernel modules. this has been around for quite some time now. enlightenment has been/still is? using it since it's earliest incarnations of its file manager extension efm. (same with kde? not sure..) i'm wondering whether this could get into the mainstream kernels soon? i'm not really deep in the filesystem layers, but this sounds to me like an extremely useful feature. could anyone comment on section 2 of http://oss.sgi.com/projects/fam/imon.txt ? would this actually be the way to do it or is there any better method? 
regards,
dns

--
___ mailto:[EMAIL PROTECTED]
Re: Unified power management userspace policy
John Fremlin wrote:
>
> Hi all!
>
> At the moment there are two power management drivers in the linux
> kernel (AFAIK). They each have different userspace interfaces --
> /proc/apm and /dev/apmctl and /proc/sys/acpi/events or something. This
> is not altogether bad, but as they do the same thing, it might be nice
> to unify (part) of the interface. In fact this is already done for the
> in kernel interface with pm_send_all.
>

John,

Could you please use call_usermodehelper() in this patch
rather than exec_usermodehelper()? I want to kill
exec_usermodehelper() sometime.

Plus your code will be simpler - no need to create
your own kernel thread.
Re: `rmdir .` doesn't work in 2.4
On Tue, 9 Jan 2001, Stefan Traby wrote:

> "rmdir `pwd`" is required to fail (at least under csh, bash, ksh) if the
> path component contains a white space and thereof it can't be a valid
> replacement for Andreas "rmdir ." which was what Al initially suggested.
>
> Yes, I'm very pickey about that; but hey, I don't want to force anyone
> to write GNU/Linux like rms; just valid shell code. :)

Of course you should do

	rmdir "`pwd`"

But this is a userspace issue.

Eric
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
   Date: Tue, 9 Jan 2001 11:31:45 +0100
   From: Christoph Hellwig <[EMAIL PROTECTED]>

   Yuck. A new file_op just to get a few benchmarks right ... I
   hope the writepages stuff will not be merged in Linus tree (but I
   wish the code behind it!)

It's a "I know how to send a page somewhere via this filedescriptor
all by myself" operation. I don't see why people need to take
painkillers over this for 2.4.x. I think f_op->write is stupid, such
a special case file operation just to get a few benchmarks right.
This is the kind of argument I am hearing.

Orthogonal to f_op->write being for specifying a low-level
implementation of sys_write, f_op->writepage is for specifying a
low-level implementation of sys_sendfile. Can you grok that?

Linus has already seen this. Originally he had a gripe because an
older revision of the code allowed multiple pages to be passed in an
array to the writepage(s) operation. He didn't like that, so I made it
take only one page as he requested. He had no other major objections to
the infrastructure.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Mon, 8 Jan 2001, Rik van Riel wrote:

> I really think the zerocopy network stuff should be ported to kiobuf
> proper.

yep, we talked to Stephen Tweedie about this already, but it involves some
changes in kiovec support and we didn't want to touch too much code for
2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses vectors of
struct page *, offset, size entities), so transition to a finalized kiovec
framework (or whatever other mechanism) is trivial. Right now kiovecs are
*way* too bloated for the purposes of skb fragments.

> The usefulness of the patch you posted is rather .. umm .. limited.
> [...]

i violently disagree :-) The upcoming TUX release is based on David's and
Alexey's cleaned-up zerocopy framework. [thus TUX and zerocopy are
separated.] David's patch adds a *very* scalable implementation of
zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
(webserver) scalability - it can be used by Apache, Samba and other
fileservers. The new zerocopy networking code DMA-s straight out of the
pagecache, natively supports hardware-checksumming and highmem (64-bit DMA
on 32-bit systems) zerocopy as well and multi-fragment DMA - no
limitations. We can saturate a gigabit link with TCP traffic, at about 20%
CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is cool -
check it out!

> Having proper kiobuf support would make it possible to, for example,
> do zerocopy network->disk data transfers and lots of other things.

i used to think that this is useful, but these days it isn't. It's a waste
of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM
instead of doing direct disk=>network DMA *all the time* some resource is
requested.

> Furthermore, by using kiobuf for the network zerocopy stuff there's a
> good chance the networking code will be integrated.

David and Alexey are TCP/IP networking code maintainers. So if you see a
'test this' networking framework patch from them on l-k, it has quite high
chances of being integrated into the networking code :-)

	Ingo
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, Jan 09, 2001 at 11:23:41AM +0100, Ingo Molnar wrote:
>
> On Mon, 8 Jan 2001, Rik van Riel wrote:
>
> > I really think the zerocopy network stuff should be ported to kiobuf
> > proper.
>
> yep, we talked to Stephen Tweedie about this already, but it involves some
> changes in kiovec support and we didnt want to touch too much code for
> 2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses vectors of
> struct page *, offset, size entities),

Yep. That is why I was so worried about the writepages file op.
It's rather hackish (only write, looks useful only for networking)
instead of the proposed rw_kiovec fop.

>
> > The usefulness of the patch you posted is rather .. umm .. limited.
> > [...]
>
> i violently disagree :-) The upcoming TUX release is based on David's and
> Alexey's cleaned-up zerocopy framework. [thus TUX and zerocopy are
> separated.] David's patch adds a *very* scalable implementation of
> zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
> (webserver) scalability - it can be used by Apache, Samba and other
> fileservers. The new zerocopy networking code DMA-s straight out of the
> pagecache, natively supports hardware-checksumming and highmem (64-bit DMA
> on 32-bit systems) zerocopy as well and multi-fragment DMA - no
> limitations. We can saturate a gigabit link with TCP traffic, at about 20%
> CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is cool -
> check it out!

Yuck. A new file_op just to get a few benchmarks right ...
I hope the writepages stuff will not be merged in Linus tree (but I wish
the code behind it!)

	Christoph

--
Whip me. Beat me. Make me maintain AIX.
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Christoph Hellwig wrote:

> > 2.4. In any case, the zerocopy code is 'kiovec in spirit' (uses
> > vectors of struct page *, offset, size entities),

> Yep. That is why I was so worried aboit the writepages file op.

i believe you misunderstand. kiovecs (in their current form) are simply
too bloated for networking purposes. Due to its nature and nonpersistency,
networking is very lightweight and memory-footprint-sensitive code (as
opposed to eg. block IO code). Right now a 'struct skb_shared_info' [which
is roughly equivalent to a kiovec] is 12+4*6 == 36 bytes, which includes
support for 6 distinct fragments (each fragment can be on any page, any
offset, any size). A *single* kiobuf (which is roughly equivalent to an
skb fragment) is 52+16*4 == 116 bytes. 6 of these would be 696 bytes, for
a single TCP packet (!!!). This is simply not something to be used for
lightweight zero-copy networking.

so it's easy to say 'use kiovecs', but right now it's simply not
practical. kiobufs are a loaded concept, and i'm not sure whether it's
desirable at all to mix networking zero-copy concepts with
block-IO/filesystem zero-copy concepts. Just to make it even more clear:
although i do believe it to be desirable from an architectural point of
view, i'm not sure at all whether it's possible, based on the experience
we gathered while implementing TCP-zerocopy.

we talked (and are talking) to Stephen about this problem, but it's
clearly a 2.5 kernel issue. Merging to a finalized zero-copy framework
will be easy. (The overwhelming percentage of zero-copy code is in the
networking code itself and is insensitive to any kiovec issues.)

> It's rather hackish (only write, looks usefull only for networking)
> instead of the proposed rw_kiovec fop.

i'm not sure what you are trying to say. You mean we should remove
sendfile() as well? It's only write, looks useful mostly for networking. A
substantial percentage of kernel code is useful only for networking :-)

> > zerocopy sendfile() and zerocopy sendmsg(), the panacea of fileserver
> > (webserver) scalability - it can be used by Apache, Samba and other
> > fileservers. The new zerocopy networking code DMA-s straight out of the
> > pagecache, natively supports hardware-checksumming and highmem (64-bit
> > DMA on 32-bit systems) zerocopy as well and multi-fragment DMA - no
> > limitations. We can saturate a gigabit link with TCP traffic, at about
> > 20% CPU usage on a 500 MHz x86 UP system. David and Alexey's patch is
> > cool - check it out!

> Yuck. A new file_opo just to get a few benchmarks right ...

no. As David said, it's direct sendfile() support. It's completely
isolated, it's 20 lines of code, it does not impact filesystems, it only
shows up in sendfile(). So i truly don't understand your point. This
interface has gone through several iterations and was actually further
simplified.

	Ingo

ps1. "first they say it's impossible, then they ridicule you, then they
oppose you, finally they say it's self-evident". Looks like, after
many many years, zero-copy networking for Linux is now finally in
phase III. :-)

ps2. i'm joking :-)
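The size comparison in the message above is plain arithmetic on the quoted byte counts, and can be reproduced as follows. This is a sketch only: `footprint()` and `kiobuf_bytes_per_packet()` are hypothetical helpers, and splitting each figure into a "fixed" part and a "per entry" part is an assumption read off the way the sums are written (12+4*6 and 52+16*4), not something the message spells out.

```c
#include <stddef.h>

/* fixed + per_entry * entries, using the byte counts quoted in the
 * message. The fixed/per-entry split is an interpretation of how the
 * sums are written there. */
size_t footprint(size_t fixed, size_t per_entry, size_t entries)
{
	return fixed + per_entry * entries;
}

/* Bytes needed to describe one packet with nfrags fragments if every
 * fragment were represented by its own kiobuf (52 + 16*4 bytes each). */
size_t kiobuf_bytes_per_packet(size_t nfrags)
{
	return nfrags * footprint(52, 16, 4);
}
```

`footprint(12, 4, 6)` gives the 36 bytes quoted for the skb structure, `footprint(52, 16, 4)` the 116 bytes for one kiobuf, and `kiobuf_bytes_per_packet(6)` the 696 bytes per six-fragment packet, roughly nineteen times the skb figure.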
Re: FS callback routines
On Mon, 8 Jan 2001, Sean R. Bright wrote:

> I was writing a user space application to monitor a folder's contents. The
> folder itself contained 100 folders, and each of those contained 24 folders.
> While writing the code to traverse the directory structure I realized that
> instead of my software figuring out when things change, why not just have
> the fs tell my application when something was updated. For example, say we
> had a function called watch_fs(), that took an inode reference and a
> function pointer and maybe a bitmask of events to watch for. When that
> inode (or its children) were changed, why couldn't the fs code call the
> callback function I specified?

RTFM: linux-2.4.0/Documentation/dnotify.txt

BYtE
Philipp
--
Philipp Hahn
[EMAIL PROTECTED]
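For readers who have not used the dnotify interface referred to above, a minimal sketch follows. `watch_directory()` is a made-up helper, not part of any library; it relies on the Linux-specific `fcntl()` commands `F_SETSIG` and `F_NOTIFY` and the `DN_*` event flags, which need `_GNU_SOURCE` with glibc. The caller is assumed to have installed a handler for `signo` beforehand.

```c
#define _GNU_SOURCE	/* F_NOTIFY, DN_* and F_SETSIG are Linux-specific */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Ask the kernel to send `signo` to this process whenever entries in the
 * directory `path` change. Returns the open directory fd (which must stay
 * open for as long as notifications are wanted), or -1 on failure. */
int watch_directory(const char *path, int signo)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fcntl(fd, F_SETSIG, signo) < 0 ||
	    fcntl(fd, F_NOTIFY, DN_MODIFY | DN_CREATE | DN_DELETE |
				DN_RENAME | DN_MULTISHOT) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A realtime signal such as `SIGRTMIN + 1` with an `SA_SIGINFO` handler is a common choice, since the handler then receives the fd of the directory that changed in `si_fd`. Note that a watch covers only the directory it was set on, not its subdirectories, so the 100 x 24 layout described in the question would need one open fd per directory.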
PROBLEM: loop device doesnt reset it's flags
[1.] The loop device doesn't seem to reset its read-only status after it
gets used on a file that is on a read-only filesystem.

[6.] (Pretty chopped up, but it demonstrates the problem)

# mount
/dev/hda1 on / type reiserfs (rw)
/dev/scd0 on /mnt/cdrom type iso9660 (ro,noexec,nosuid,nodev,sync,unhide)
# list /mnt/cdrom/floppy.img
-r--r--r-- 1 root root 1474560 Dec 16 15:40 /mnt/cdrom/floppy.img
# cp /mnt/cdrom/floppy.img /
# mount -o loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (rw,loop=/dev/loop0)
# umount /mnt/disk
# mount -o ro,loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk
# mount -o rw,loop=/dev/loop0 /floppy.img /mnt/disk
# mount
/floppy.img on /mnt/disk type ext2 (rw,loop=/dev/loop0)
# umount /mnt/disk

(All that above is normal)

# mount -o loop=/dev/loop0 /mnt/cdrom/floppy.img /mnt/disk
# mount
/mnt/cdrom/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

(Now loop0 is screwed)

# mount -o loop=/dev/loop0 /floppy.img /mnt/disk
mount: floppy.img is write-protected, mounting read-only
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk
# mount -o rw,loop=/dev/loop0 /floppy.img /mnt/disk
mount: floppy.img is write-protected, mounting read-only
# mount
/floppy.img on /mnt/disk type ext2 (ro,loop=/dev/loop0)
# umount /mnt/disk

The same behavior as shown above is exhibited by:

losetup /dev/loop1 /mnt/cdrom/floppy.img
losetup -d /dev/loop1

now loop1 thinks it is always read-only.

[7.1.]
Linux dave 2.4.0 #1 i586 unknown
Kernel modules         2.4.0
Gnu C                  2.95.2
Gnu Make               3.79.1
Binutils               2.10.1
Linux C Library        2.2
Dynamic linker         ldd: version 1.9.9
Procps                 2.0.7
Mount                  2.10r
Net-tools              1.57
Console-tools          0.2.3
Sh-utils               2.0
Modules Loaded

[X.] This patch seems to fix the problem on my machine.
--- linux/drivers/block/loop.c.orig Tue Jan 9 12:16:02 2001 +++ linux/drivers/block/loop.c Tue Jan 9 12:16:57 2001 @@ -412,13 +412,14 @@ error = -EINVAL; inode = file->f_dentry->d_inode; + lo->lo_flags = 0; + + if (S_ISBLK(inode->i_mode)) { /* dentry will be wired, so... */ error = blkdev_get(inode->i_bdev, file->f_mode, file->f_flags, BDEV_FILE); lo->lo_device = inode->i_rdev; - lo->lo_flags = 0; /* Backed by a block device - don't need to hold onto a file structure */
Re: Subtle MM bug (really 830MB barrier question)
> 08048000-08b5c000 r-xp 00000000 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08b5c000-08cc9000 rw-p 00b13000 03:05 1130923 /tmp/newmagma/magma.exe.dyn
> 08cc9000-0bd00000 rwxp 00000000 00:00 0

> Now, subsequent to each memory allocation, only the second number in the
> third line changes. It becomes 23a78000, then 3b7f0000, and finally
> 3b808000 (after the failed allocation).

OK, it's fairly obvious what's happening here. Your program is using
its own allocator, which relies solely on brk() to obtain more
memory. On x86 Linux, brk()-allocated memory (the heap) begins right
above the executable and grows upward - the increasing number you
noted above is the top of the heap, which grows with every brk().

The problem is that the heap can't keep growing forever - as you
discovered, on x86 Linux the upper bound is just below 0x40000000.
That boundary is where shared libraries and other memory-mapped files
start to appear.

Note that there is still plenty (~2GB) of address space left, in the
region between the shared libraries and the top of user address space
(just under 0xBFFFFFFF). How do you use that space? You need an
allocation scheme based on mmap'ing /dev/zero. As others pointed out,
glibc's allocator does just that.

Here's your short answer: ask the authors of your program to either
1) replace their custom allocator with regular malloc() or
2) enhance their custom allocator to use mmap.
(or, buy some 64-bit hardware =)...)

Dan
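An allocation scheme based on mmap'ing /dev/zero, as suggested above, might look roughly like this at its core. `zero_map_alloc()` and `zero_map_free()` are hypothetical names, and a real allocator would add free lists, size classes and so on; the sketch only shows how to obtain and release memory without touching the brk() heap.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Obtain `len` bytes of zero-filled memory by mapping /dev/zero instead
 * of growing the heap with brk(). Returns NULL on failure. */
void *zero_map_alloc(size_t len)
{
	void *p;
	int fd = open("/dev/zero", O_RDWR);

	if (fd < 0)
		return NULL;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	return p == MAP_FAILED ? NULL : p;
}

/* Return a region obtained from zero_map_alloc() to the kernel. */
int zero_map_free(void *p, size_t len)
{
	return munmap(p, len);
}
```

Because the kernel chooses the address, such mappings are placed in the mmap region rather than in the heap, so they are not limited by the ceiling the brk()-based allocator ran into.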
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, Jan 09, 2001 at 02:31:13AM -0800, David S. Miller wrote: >Date: Tue, 9 Jan 2001 11:31:45 +0100 >From: Christoph Hellwig <[EMAIL PROTECTED]> > >Yuck. A new file_opo just to get a few benchmarks right ... I >hope the writepages stuff will not be merged in Linus tree (but I >wish the code behind it!) > > It's a "I know how to send a page somewhere via this filedescriptor > all by myself" operation. I don't see why people need to take > painkillers over this for 2.4.x. I think f_op->write is stupid, such > a special case file operation just to get a few benchmarks right. > This is the kind of argument I am hearing. > > Orthogonal to f_op->write being for specifying a low-level > implementation of sys_write, f_op->writepage is for specifying a > low-level implementation of sys_sendfile. Can you grok that? Sure. But sendfile is not one of the fundamental UNIX operations... If there was no alternative to this I would probably have not said anything, but with the rw_kiovec file op just before the door I don't see any reason to add this _very_ specific file operation. An alloc_kiovec before and an free_kiovec after the actual call and the memory overhaed of a kiobuf won't hurt so much that it stands against a clean interface, IMHO. > > Linus has already seen this. Originally he had a gripe because in an > older revision of the code used to allow multiple pages to be passed > in an array to the writepage(s) operation. He didn't like that, so I > made it take only one page as he requested. He had no other major > objections to the infrastructure. You get that multiple page call with kiobufs for free... Christoph -- Whip me. Beat me. Make me maintain AIX. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] hashed device lookup (New Benchmarks)
On Mon, Jan 08, 2001 at 04:23:41PM +0100, Ben Greear wrote: > I don't argue that ifconfig shouldn't be fixed, but the hash speeds up It's already fixed since months. There was one stupid algorithm, which I was to blame for when I changed ifconfig to use a device list two years ago. At that time I didn't think that anybody would be ever crazy enough to set up 4000 interfaces and just chosed the simplest list management. I fixed it when you first complained a few months ago and now the list insertion works that the list does not need to be walked fully in the usual case. It could be optimized more in user space, but it's probably not worth it. > ip by about 2X too. Is that not useful enough? ip seems to be implemented > pretty efficient, so if the hash helps it significantly then maybe it > can help other efficient programs too. Notice that it is the system > (ie kernel) time that stays remarkably flat with the hash + ip graph. Just does your benchmark represent anything that real users do frequently ? If you really want to optimize I'm sure there are lots of areas in the kernel where your efforts are better spent ;) [just run with a the kernel profiler on for a few days on your box and look at all the real hot spots] BTW, if you just want to optimize ip link ls speed it would be probably enough to keep a one behind cache that just caches the next member after the last search. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Date: Tue, 9 Jan 2001 12:28:10 +0100 From: Christoph Hellwig <[EMAIL PROTECTED]> Sure. But sendfile is not one of the fundamental UNIX operations... It's a fundamental Linux interface and VFS-->networking interface. An alloc_kiovec before and an free_kiovec after the actual call and the memory overhaed of a kiobuf won't hurt so much that it stands against a clean interface, IMHO. This whole exercise is pointless unless it performs well. The overhead _DOES_ matter, we've tested and profiled all of this with full specweb99 runs, zerocopy ftp server loads, etc. Removing one word of information from anything involved in these code paths makes enormous differences. Have you run such tests with your suggested kiobuf scheme? Know what I really hate? People who are talking, "almost done", and "designing" the "real solution" to a problem and have no code to show for it. Ie. a total working implementation. Often they have not one line of code to show. Then the folks who actually get off their lazy asses and make something real, which works, and in fact exceeded most of our personal performance expectations, are the ones who are getting told that what they did was crap. What was the first thing out of people's mouths? Not "nice work", but "I think writepage is ugly and an eyesore, I hope nobody seriously considers this code for inclusion." Keep designing... like Linus says, "show me the code". Later, David S. Miller [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Problem compiling linux-2.4.0 for Athlon/K7
Hello, if Athlon/K7 is selected as processor type i get the following error messages when compiling make -C kernel CFLAGS="-D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 -march=i686 -malign-functions=4 -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h" MAKING_MODULES=1 modules make[1]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/kernel' make[1]: Nothing to be done for `modules'. make[1]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/kernel' make -C drivers CFLAGS="-D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 -march=i686 -malign-functions=4 -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h" MAKING_MODULES=1 modules make[1]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers' make -C block modules make[2]: Entering directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers/block' gcc -D__KERNEL__ -I/usr/src/linux-2.4.x/linux-2.4.0/include -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -fno-strict-aliasing -pipe -mpreferred-stack-boundary=2 -march=i686 -malign-functions=4 -DMODULE -DMODVERSIONS -include /usr/src/linux-2.4.x/linux-2.4.0/include/linux/modversions.h -DEXPORT_SYMTAB -c loop.c In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/irq.h:57, from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hardirq.h:6, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:45, from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:296, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5, from 
/usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9, from loop.c:53: /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h: In function `x86_do_profile': /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: `current' undeclared (first use in this function) /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: (Each undeclared identifier is reported only once /usr/src/linux-2.4.x/linux-2.4.0/include/asm/hw_irq.h:198: for each function it appears in.) In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:296, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9, from loop.c:53: /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function `raise_softirq': /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:89: `current' undeclared (first use in this function) /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function `tasklet_schedule': /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:160: `current' undeclared (first use in this function) /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h: In function `tasklet_hi_schedule': /usr/src/linux-2.4.x/linux-2.4.0/include/linux/interrupt.h:174: `current' undeclared (first use in this function) In file included from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/string.h:21, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/fs.h:23, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/capability.h:17, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/binfmts.h:5, from /usr/src/linux-2.4.x/linux-2.4.0/include/linux/sched.h:9, from loop.c:53: /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h: In function `__constant_memcpy3d': 
/usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:305: `current' undeclared (first use in this function) /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h: In function `__memcpy3d': /usr/src/linux-2.4.x/linux-2.4.0/include/asm/string.h:312: `current' undeclared (first use in this function) {standard input}: Assembler messages: {standard input}:8: Warning: Ignoring changed section attributes for .modinfo make[2]: *** [loop.o] Error 1 make[2]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers/block' make[1]: *** [_modsubdir_block] Error 2 make[1]: Leaving directory `/usr/src/linux-2.4.x/linux-2.4.0/drivers' make: *** [_mod_drivers] Error 2 - To unsubscribe fr
Re: Benchmarking 2.2 and 2.4 using hdparm and dbench 1.1
> Where is the size defined, and is it easy to modify?

Look in fs/buffer.c:buffer_init()

> I noticed that /proc/sys/vm/freepages is not writable any more. Is there
> any reason for this?

I am not sure why.

> Hmm... I'm still using samba 2.0.7. I'll try 2.2 to see if it
> helps. What are tdb spinlocks?

samba 2.2 uses tdb, which is an SMP-safe, gdbm-like database. By default
it uses byte range fcntl locks to provide locking, but has the option of
using spinlocks (./configure --with-spinlocks). I doubt it would make a
difference on your setup.

> Have you actually compared the same setup with 2.2 and 2.4 kernels and a
> single client transferring a large file, preferably from a slow server
> with little memory? Most samba servers that people benchmark are fast
> computers with lots of memory. So far, every major kernel upgrade has
> given me a performance boost, even for slow computers, and I would hate to
> see that trend break for 2.4...

I haven't done any testing on slow hardware, and the high end stuff is
definitely performing better in 2.4, but I agree we shouldn't forget about
the slower stuff.

Narrowing down where the problem is would help. My guess is it is a TCP
problem; can you check if it is performing worse in your case? (eg ftp
something against 2.2 and 2.4)

Anton
Re: [testcase] madvise->semaphore deadlock 2.4.0
Mike Galbraith wrote:
>
> Greetings,
>
> While trying to configure ftpsearch, the process hangs while running
> its madvise confidence test below. It appears to be taking a fault
> in madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and
> immediately deadlocking forever on mm->mmap_sem per IKD. (Virgin 2.4.0
> agrees)
>

This should fix it.

We're still in disagreement with the HPUX 11 manpage though. HP say
that MADV_DONTNEED requires an underlying file, and thus implies that
MADV_WILLNEED doesn't need an underlying file. We have it the other
way round, which seems more sensible.

--- linux-2.4.0/mm/filemap.c	Fri Jan  5 21:37:20 2001
+++ linux-akpm/mm/filemap.c	Tue Jan  9 23:05:00 2001
@@ -1835,7 +1835,8 @@
 	n->vm_end = end;
 	setup_read_behavior(n, behavior);
 	n->vm_raend = 0;
-	get_file(n->vm_file);
+	if (n->vm_file)
+		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
 	lock_vma_mappings(vma);
@@ -1861,7 +1862,8 @@
 	n->vm_pgoff += (n->vm_start - vma->vm_start) >> PAGE_SHIFT;
 	setup_read_behavior(n, behavior);
 	n->vm_raend = 0;
-	get_file(n->vm_file);
+	if (n->vm_file)
+		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
 	lock_vma_mappings(vma);
@@ -1893,7 +1895,8 @@
 	right->vm_pgoff += (right->vm_start - left->vm_start) >> PAGE_SHIFT;
 	left->vm_raend = 0;
 	right->vm_raend = 0;
-	atomic_add(2, &vma->vm_file->f_count);
+	if (vma->vm_file)
+		atomic_add(2, &vma->vm_file->f_count);
 
 	if (vma->vm_ops && vma->vm_ops->open) {
 		vma->vm_ops->open(left);
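The failing case can be approximated from user space. The sketch below is not Mike's original test, which is not reproduced in this message. It assumes that a read-behaviour hint such as MADV_RANDOM, applied to the middle page of an anonymous mapping, is enough to reach madvise_fixup_middle() with vma->vm_file == NULL. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE	/* MAP_ANONYMOUS, madvise() */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical reproducer: split an anonymous (file-less) mapping by
 * advising only its middle page. Returns madvise()'s result, or -1 if
 * the mapping could not be set up. */
int advise_middle_of_anon_mapping(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *p;
	int ret;

	if (page <= 0)
		return -1;
	len = 3 * (size_t)page;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Only the middle page: the kernel has to split the vma in three. */
	ret = madvise(p + page, (size_t)page, MADV_RANDOM);

	munmap(p, len);
	return ret;
}
```

On a kernel with the NULL checks in place the call is expected to simply return; on an unpatched 2.4.0 the report above suggests it would hang instead.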
Re: [PATCH] advansys.c: include missing restore_flags, etc
On Tue, Jan 09, 2001 at 08:30:07AM +0100, Pauline Middelink wrote:
> > +STATIC unsigned long
> >  DvcEnterCritical(void)
> >  {
> > -    int flags;
> > +    unsigned long flags;
> >
> >      save_flags(flags);
> >      cli();
> > @@ -9965,7 +9972,7 @@
> >  }
>
> Err, according to wise ppl on this list, this does not work on
> MIPSes. The flags thing must stay in the same stackframe.
>
> (I know, not your fault, but since you are patching the driver...)

yap, know that, just thought that this beast was only for i386, will
submit another patch, and I think that some other drivers do this as
well...

- Arnaldo
Re: `rmdir .` doesn't work in 2.4
> I trust your specs said so, however I'm not sure which are the specs > we should follow for Linux. > At least for LFS 2.2.x fixage I always followed the SuSv2 specs We are Linux, and free to do whatever we want. However, following POSIX makes a large body of software available. It would be very unwise to deviate from POSIX if it can be avoided. Now POSIX describes only part of Unix, and for other parts we get our inspiration from SVID, or X/Open, or SUSv2, or by looking at what other Unix-like systems do, like *BSD*, Solaris, AIX, etc. But these sources are often contradictory. The next version of the POSIX standard (which will simultaneously be SUSv3) is expected a few months from now. As soon as it exists, we'll want to follow it, as much as possible. Today it doesnt exist, but in case of doubt it is reasonable to follow the draft. (And in case the draft is really ridiculous, there is still time to file a change request.) Andries See http://www.opengroup.org/austin/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Confirmation request about new 2.4.x. kernel limits
On Mon, Jan 08, 2001 at 11:11:05PM -0500, Venkatesh Ramamurthy wrote: > > > Max. RAM size: 64 GB (any slowness accessing RAM over 4 GB > > with 32 bit machines ?) > more than 4GB in RAM is bounce buffered, so there is performance > penalty as the data have to be copied into the 4GB RAM area Actual memory limit is lower, your run-of-the-mill Pentium-PAE36 capable system has PCI bus(es) for IO, and address space for that/those need to stay in area below 4G for bootup to access any devices, thus very likely your system doesn't have more than, say, 3 GB of RAM below 4G. Pick your processors. You need XEONs to have L1/L2 cacheing on memory above the 4 GB address (PAE36 mapped physical addresses.) For IO on usual systems you have 32 bit address space PCI busmasters, so those can access only the lowest 4GB of address space, and to have a block of data in upper area, it needs to be "bounced", that is, CPU must copy it. Linux 2.4.0 system doesn't support 64-bit PCI addresses at 32-bit systems (not at 64-bit Alpha either, I recall.) On the other hand, Alpha systems and SPARC systems have IOMMU hardware, and we do support that (to some extent), but 32-bit intel world doesn't have similar things. For userspace, if parts of userspace are physically mapped above 4G, it might not be very harmfull at all -- presuming you have XEONs which cache the memory accesses there also. The libc and similar multiply shared objects might as well reside in high memory. Userspace process doesn't see, after all, where each page resides physically. /Matti Aarnio - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Which Bind version..
On Mon, Jan 08, 2001 at 12:56:37PM +0500, Mike wrote:
> Hi!!
> I wanna install Bind on my DNS. Which Bind version is most stable and
> secure?

9.0.1 is the latest release in the 9.x series and if you are
interested in "SecureDNS", that's the way to go. I'm currently running
9.1.0beta2, and it seems rock solid to me.

There is also 8.2.2P7 if you want to stick with the older 8.x series,
but I certainly wouldn't if I were setting up a new DNS server. The 4.x
series is totally deprecated at this point.

Personally, I wouldn't use anything less than 9.0.1 and I currently
support over 100 domains on my servers (my partner runs a hosting
service).

> Regards,
> Nauman Ansari

Mike
--
 Michael H. Warfield    |  (770) 985-6132   |  [EMAIL PROTECTED]
  (The Mad Wizard)      |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Christoph Hellwig wrote: > Sure. But sendfile is not one of the fundamental UNIX operations... Neither were eg. kernel-based semaphores. So what? Unix wasnt perfect and isnt perfect - but it was a (very) good starting point. If you are arguing against the existence or importance of sendfile() you should re-think, sendfile() is a unique (and important) interface because it enables moving information between files (streams) without involving any interim user-space memory buffer. No original Unix API did this AFAIK, so we obviously had to add it. It's an important Linux API category. > If there was no alternative to this I would probably have not said > anything, but with the rw_kiovec file op just before the door I don't > see any reason to add this _very_ specific file operation. I do think that the kiovec code has to be rewritten substantially before it can be used for networking zero-copy, so right now we do the least damange if we do not increase the coverage of kiovec code. > An alloc_kiovec before and an free_kiovec after the actual call and > the memory overhaed of a kiobuf won't hurt so much that it stands > against a clean interface, IMHO. please study the networking portions of the zerocopy patch and you'll see why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the thing we cannot afford in a sendfile() operation. sendfile() is lightweight, the setup times of kiovecs are not. basically the current kiovec design does not deal with the realities of high-speed, featherweight networking. DO NOT talk in hypotheticals. The code is there, do it, measure it. You might not care about performance, we do. another, more theoretical issue is that i think the kernel should not be littered with multi-page interfaces, we should keep the one "struct page * at a time" interfaces. Eg. check out how the new zerocopy code generates perfect MTU sized frames via the ->writepage() interface. No interim container objects are necessary. 
Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Mon, 8 Jan 2001, David S. Miller wrote:

>    All I am asking is that someone lets me know if they make major
>    changes to my code so I can keep track of whats happening.
>
> We have not made any major changes to your code, in lieu of this
> not being code which is actually being submitted yet.
>
> If it bothers you that publicly someone has published changes to your
> driver which you disagree with, oh well... :-)

i did tell Jes about our zerocopy work, months ago (and IIRC we even
exchanged emails about technical issues briefly). The changes were first
published in the TUX 1.0 source code last August, and subsequent cleanups
(more than 10 iterations) were published on Alexey's public FTP site:

	ftp://ftp.inr.ac.ru/ip-routing/

i think this whole issue got miscommunicated because Jes moved to Canada
exactly when we wrote the fragmented-API changes. I do believe Jes will
like most of our changes though, and i can surely tell that the elegant
and clean code of the Acenic driver made these changes so much easier.
Jes's Acenic driver was the first Linux networking driver in history to
support zero-copy TCP.

	Ingo
Re: Subtle MM bug
Linus Torvalds <[EMAIL PROTECTED]> writes: > On 8 Jan 2001, Eric W. Biederman wrote: > > > Zlatko Calusic <[EMAIL PROTECTED]> writes:> > > > > > > Yes, but a lot more data on the swap also means degraded performance, > > > because the disk head has to seek around in the much bigger area. Are > > > you sure this is all OK? > > > > I don't think we have more data on the swap, just more data has an > > allocated home on the swap. > > I think Zlatko's point is that because of the extra allocations, we will > have worse locality (more seeks etc). Yes that was my concern. But in the end I'm not sure. I made two simple tests and haven't found any problems with 2.4.0 mm logic (opposed to 2.2.17). In fact, the new kernel was faster in the more interesting (make -j32) test. Also I have found that new kernel allocates 4 times more swap space under some circumstances. That may or may not be alarming, it remains to be seen. -- Zlatko - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel network problem ?
On Tue, 9 Jan 2001, Helge Hafting wrote: > Nicolas Noble wrote: > [...] > As others have told already, this is the ECN problem. > > > I noticed the same bug. This is very weired, I can send a list of sites > > which I can't connect anymore. > > You have a list? Send all of them a message stating that they ought > to upgrade their firewalls which cause this problem. Or they > will loose customers/visitors. Cisco already have an upgrade for them, > so fixing is dead easy, and they can then boast compatibility with > the latest internet standards. > > If they don't care about linux users, tell them that windows eventually > will use ECN too. They definitely don't want to have a ECN problem when > that happens. After upgrading to kernel 2.4.0, I found myself unable to retrieve mail from Adelphia's (2-way cable ISP) POP server. It took several days to figure out that _one_ of their routers was configured to block ECN. After bringing this to the attention of their network engineers, I was informed that their policy prohibits making any router changes on the basis of one trouble report. The person I spoke with did NOT try to defend their setup, but it was made clear that they'll do nothing until Windows breaks. If I were packaging a Linux distribution, I'd be sure to have ECN disabled by default, FWIW. Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ad1848.c: include missing restore_flags
Alan,

	Please apply.

- Arnaldo

--- linux-2.4.0-ac4/drivers/sound/ad1848.c	Thu Aug 24 07:40:05 2000
+++ linux-2.4.0-ac4.acme/drivers/sound/ad1848.c	Tue Jan  9 08:55:58 2001
@@ -28,6 +28,7 @@
  *                        of irqs. Use dev_id.
  * Christoph Hellwig    : adapted to module_init/module_exit
  * Aki Laukkanen        : added power management support
+ * Arnaldo C. de Melo   : added missing restore_flags in ad1848_resume
  *
  * Status:
  *     Tested. Believed fully functional.
@@ -2751,6 +2752,7 @@
 	bits = interrupt_bits[devc->irq];
 	if (bits == -1) {
 		printk(KERN_ERR "MSS: Bad IRQ %d\n", devc->irq);
+		restore_flags(flags);
 		return -1;
 	}
 
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Ingo Molnar wrote: > On Tue, 9 Jan 2001, Christoph Hellwig wrote: > >> Sure. But sendfile is not one of the fundamental UNIX operations... > > Neither were eg. kernel-based semaphores. So what? Unix wasnt > perfect and isnt perfect - but it was a (very) good starting > point. If you are arguing against the existence or importance of > sendfile() you should re-think, sendfile() is a unique (and > important) interface because it enables moving information between > files (streams) without involving any interim user-space memory > buffer. No original Unix API did this AFAIK, so we obviously had to > add it. It's an important Linux API category. Ehh, that's not correct. HP-UX was the first to implement sendfile(). Linux (and other commercial unices) then copied the idea... For the record, sendfile() exists because we (Zeus) asked HP for it. (So of course we agree that sendfile is important!) Regards, Stephen -- Stephen Landamore, <[EMAIL PROTECTED]> Zeus Technology Tel: +44 1223 525000 Universally Serving the Net Fax: +44 1223 525100 http://www.zeus.com Zeus Technology, Zeus House, Cowley Road, Cambridge, CB4 0ZT, ENGLAND - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: FS callback routines
"Michael D. Crawford" wrote: > > Regarding notification when there's a change to the filesystem: > > This is one of the most significant things about the BeOS BFS filesystem, and > something I'd dearly love to see Linux adopt. It makes an app very efficient, > you just get notified when a directory changes and you never waste time polling. > > I think it would require changes to the VFS layer, not just to the filesystems, > because this is a concept POSIX filesystems do not presently possess. > > The other is indexed filesystem attributes, for example a file can have its > mimetype in the filesystem, and any application can add an attribute and have it > indexed. > > There's a method to do boolean queries on indexed attributes, and you can find > files in an entire filesystem that match a query in a blazingly short time, much > faster than walking the directory tree. > > If you want to try out the BeOS, there's a free-as-in-beer version at > http://free.be.com for Pentium PC's. You can also purchase a version that comes > for both PC's and certain PowerPC macs. > > There are read-only versions of this for Linux which I believe are under the > GPL. The original author is here: > > http://hp.vector.co.jp/authors/VA008030/bfs/ > > He refers you to here to get a version that works under 2.2.16: > > http://milosch.net/beos/ > > The author's intention was to take it read-write, but it's complex because it is > a journaling filesystem. > > Daniel Berlin, a BeOS developer modified the Linux BFS driver so it works with > 2.4.0-test1. I don't know if it works with 2.4.0. The web site where it used > to be posted isn't there anymore, and the laptop where I had it is in for > repair. I may have it on a backup, and I'll see if I can track Daniel down. > > While Be, Inc.'s implementation is closed-source, the design of the BFS (_not_ > "befs" as it is sometimes called) is explained in Practical File System Design > with the Be File System by Dominic Giampolo, ISBN 1-55860-497-9. 
Dominic has > since left Be and I understand works at Google now. fs/dnotify.c: /* * Directory notifications for Linux. * * Copyright (C) 2000 Stephen Rothwell ... The currently defined events are: DN_ACCESS A file in the directory was accessed (read) DN_MODIFY A file in the directory was modified (write,truncate) DN_CREATE A file was created in the directory DN_DELETE A file was unlinked from directory DN_RENAME A file in the directory was renamed DN_ATTRIB A file in the directory had its attributes changed (chmod,chown) It was done last year, quietly and without fanfare, by Stephen Rothwell: http://www.linuxcare.com/about-us/os-dev/rothwell.epl This may be the most significant new feature in 2.4.0, as it allows us to take a fundamentally different approach to many different problems. Three that come to mind: mail (get your mail instantly without polling); make (don't rely on timestamps to know when rebuilding is needed, don't scan huge directory trees on each build); locate (reindex only those directories that have changed, keep index database current). As you noticed, there are many others. Stephen, it would be very interesting to know more about the development process you went through and what motivated you to provide this fundamental facility. -- Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
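For readers who want to try the events listed above, here is a minimal user-space sketch. It assumes the interface described in the Linux fcntl(2) manual page: F_NOTIFY on an open directory descriptor, with SIGIO as the default signal. DN_MULTISHOT comes from that manual page, not from the list in this message. The names watch_directory(), dn_handler() and dn_events are made up for the example.

```c
#define _GNU_SOURCE	/* F_NOTIFY and DN_* are Linux extensions */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Incremented from the signal handler each time the directory changes. */
static volatile sig_atomic_t dn_events;

static void dn_handler(int sig)
{
	(void)sig;
	dn_events++;
}

/* Hypothetical helper: ask for a SIGIO whenever a file is created,
 * deleted, renamed or modified in 'path'. Returns the directory fd
 * (keep it open for as long as you want notifications) or -1. */
int watch_directory(const char *path)
{
	struct sigaction sa;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = dn_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* DN_MULTISHOT keeps the watch armed after the first event. */
	if (fcntl(fd, F_NOTIFY, DN_CREATE | DN_DELETE | DN_RENAME |
				DN_MODIFY | DN_MULTISHOT) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A mail reader, for instance, could call watch_directory() on its spool directory and rescan only when dn_events changes, instead of polling on a timer.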
Re: Subtle MM bug
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On 8 Jan 2001, Eric W. Biederman wrote:
>
> > Zlatko Calusic <[EMAIL PROTECTED]> writes:
> > >
> > > Yes, but a lot more data on the swap also means degraded performance,
> > > because the disk head has to seek around in the much bigger area. Are
> > > you sure this is all OK?
> >
> > I don't think we have more data on the swap, just more data has an
> > allocated home on the swap.
>
> I think Zlatko's point is that because of the extra allocations, we will
> have worse locality (more seeks etc).
>
> Clearly we should not actually do any more actual IO. But the sticky
> allocation _might_ make the IO we do be more spread out.

The tradeoff when implemented correctly is that writes will tend to be
more spread out and reads should be better clustered together.

> To offset that, I think the sticky allocation makes us much better able to
> handle things like clustering etc more intelligently, which is why I think
> it's very much worth it. But let's not close our eyes to potential
> downsides.

Certainly, keeping our eyes open is a good thing.

But it has been apparent for a long time that by doing allocation as
we were doing it, that when it came to heavy swapping we were taking a
performance hit. So I'm relieved that we are now being more
aggressive.

From the sounds of it what we are currently doing actually sucks worse
for some heavy loads. But it still feels like the right direction.

It's been my impression that work loads where we are actively swapping
are a lot different from work loads where we really don't swap. To
the extent that it might make sense to make the actively swapping case
a config option to get our attention in the code.

It would be nice to have a linux kernel for once that handles heavy
swapping (below the level of thrashing) gracefully. :)

Eric
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Stephen Landamore wrote: > >> Sure. But sendfile is not one of the fundamental UNIX operations... > > Neither were eg. kernel-based semaphores. So what? Unix wasnt > Ehh, that's not correct. HP-UX was the first to implement sendfile(). i dont think we disagree. What i was referring to was the 'original' Unix idea, the 30 years old one, which did not include sendfile() :-) We never claimed that sendfile() first came up in Linux [that would be a blatant lie] - and the Linux API itself was indeed influenced by existing sendfile()/copyfile() interfaces. (at the time Linus implemented sendfile() there already existed several similar interfaces.) > For the record, sendfile() exists because we (Zeus) asked HP for it. good move :-) [honestly.] > (So of course we agree that sendfile is important!) :-) I think sendfile() should also have its logical extensions: receivefile(). I dont know how the HPUX implementation works, but in Linux, right now it's only possible to sendfile() from a file to a socket. The logical extension of this is to allow socket->file IO and file->file, socket->socket IO as well. (the later one could be interesting for things like web proxies.) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
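As a rough illustration of the file-to-socket direction discussed here, the sketch below pushes a whole file to a descriptor with sendfile(2), so the data never passes through a user-space buffer. The helper name send_file_by_name() is made up for this example, and out_fd is assumed to be an already-connected socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file 'path' to out_fd.
 * Returns the number of bytes sent, or -1 on error or short transfer. */
ssize_t send_file_by_name(int out_fd, const char *path)
{
	struct stat st;
	off_t off = 0;
	int in_fd = open(path, O_RDONLY);

	if (in_fd < 0)
		return -1;
	if (fstat(in_fd, &st) < 0) {
		close(in_fd);
		return -1;
	}

	/* sendfile() advances 'off' by the number of bytes it moved. */
	while (off < st.st_size) {
		ssize_t n = sendfile(out_fd, in_fd, &off,
				     (size_t)(st.st_size - off));
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, try again */
		if (n <= 0)
			break;		/* error or unexpected EOF */
	}

	close(in_fd);
	return off == st.st_size ? (ssize_t)off : -1;
}
```

A server would typically call this with the descriptor returned by accept(), after writing any protocol headers with an ordinary write().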
Re: [testcase] madvise->semaphore deadlock 2.4.0
On Tue, 9 Jan 2001, Andrew Morton wrote: > Mike Galbraith wrote: > > > > Greetings, > > > > While trying to configure ftpsearch, the process hangs while running > > it's madvise confidence test below. It appears to be taking a fault > > in madvise_fixup_middle():atomic_add(2, &vma->vm_file->f_count) and > > immediately deadlocking forever on mm->mmap_sem per IKD. (Virgin 2.4.0 > > agrees) > > > > This should fix it. Indeed it does. (benchmark _that_ OS rags;) -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel network problem ?
> trouble report. The person I spoke with did NOT try to defend their
> setup, but it was made clear that they'll do nothing until Windows breaks.
>
> If I were packaging a Linux distribution, I'd be sure to have ECN disabled
> by default, FWIW.

Probably the case. However the more people who pester the faulty sites the
better. Did you ask the person how many reports he needed?

I certainly intend to run ECN on my mailhost once I trust 2.4 a bit more.

Alan
Re: Broken tty handling
Drat it, don't you hate it when you get around to reporting a long-standing bug and it's already fixed?

Thank you,
-d
Re: `rmdir .` doesn't work in 2.4
On Mon, 8 Jan 2001, Stefan Traby wrote:

> On Mon, Jan 08, 2001 at 12:58:20PM -0500, Alexander Viro wrote:
>
> > Shell equivalent is rmdir `pwd`. Also portable.
>
> Very portable - not.
>
> rmdir "`pwd`" !!!

OK, got me on that. Yes, you'll need quoting here. Sorry.

Notice that there are two effects in the game:

	* Some Unices refuse to rmdir() busy directories. For them removing
the pwd is impossible. Period. You can chdir() away, but there is no
promise that after
		chdir foo
		chdir ..
foo will refer to the directory that used to be your pwd in the middle.
That's pretty obvious - consider the effect of
		chdir foo
		chdir ..
combined with another process doing
		mv foo bar
		mv baz foo
So unless you have external information about behaviour of other processes
the only way to pinpoint a directory is to keep it opened/pwd/root. Each
of these will keep it busy, and Unices that refuse to rmdir() busy ones
will return -EBUSY on that. On such systems there is _no_ reliable way to
remove your current pwd unless you can guarantee that it won't be renamed
away by another process. No matter what you are doing.

	* All Unices are required to refuse rmdir() on pathnames that end
in "." or "..". 2.2 is an exception in that respect - usually it allows
such an operation. However, that is _still_ unreliable. rename() called by
another process at the right time will make rmdir(".") return -ENOENT,
even though at any moment "." would resolve to the same directory.
Including the window when rmdir() would fail. Notice that the error value
is indistinguishable from the other cases, so blindly repeating rmdir(".")
while you get -ENOENT is not a solution (as a matter of fact, it can
trivially turn into an infinite loop).

All examples mentioned in that thread (HP-UX, Solaris, *BSD) _will_ fail
with "rmdir .". Some of them will fail with "rmdir " too - see discussion
of -EBUSY above.

The bottom line: without external information about behaviour of other
processes you can't reliably remove the directory that is your pwd now.
"chdir away and rmdir by the name it used to have" works around the
problem with -EBUSY (on the systems that refuse to remove busy ones)
_BUT_ it is still vulnerable to "rename by another process" kind of races.

If you _have_ such external warranties - a trivial wrapper will do the
trick on the systems that allow rmdir() of busy directories, and the same
wrapper combined with chdir away will solve the problem for all systems.
There is no reason to put that in the kernel - it will not give you any
additional warranties.

We _can_ pinpoint the link and do rmdir() on it reliably. We can't do the
same to the inode. In principle the kernel could do that, but NONE of the
existing Unices (2.2 included) do such things and it would require more
trickery than it's worth.
Re: `rmdir .` doesn't work in 2.4
- Received message begins Here - > > Hello Al, > > why `rmdir .` is been deprecated in 2.4.x? I wrote software that depends on > `rmdir .` to work (it's local software only for myself so I don't care that it > may not work on unix) and I'm getting flooded by failing cronjobs since I put > 2.4.0 on such machine. `rmdir .` makes perfect sense, the cwd dentry remains > pinned by me until I `cd ..`, when it gets finally deleted from disk. I'd like > if we could resurrect such fine feature (adapting userspace is just a few liner > but that isn't the point). Comments? Not exactly valid, since a file could be created in that "pinned" directory after the rmdir... - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Ingo Molnar wrote: > > On Tue, 9 Jan 2001, Stephen Landamore wrote: > > > >> Sure. But sendfile is not one of the fundamental UNIX operations... > > > > Neither were eg. kernel-based semaphores. So what? Unix wasnt > > > Ehh, that's not correct. HP-UX was the first to implement sendfile(). > > i dont think we disagree. What i was referring to was the 'original' Unix > idea, the 30 years old one, which did not include sendfile() :-) We never > claimed that sendfile() first came up in Linux [that would be a blatant > lie] - and the Linux API itself was indeed influenced by existing > sendfile()/copyfile() interfaces. (at the time Linus implemented > sendfile() there already existed several similar interfaces.) > y'know our pals have patented it? http://www.delphion.com/details?pn=US05845280__ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: kernel network problem ?
On Tue, Jan 09, 2001 at 01:32:49PM, Alan Cox wrote:
> > If I were packaging a Linux distribution, I'd be sure to have ECN disabled
> > by default, FWIW.
>
> Probably the case. However the more people who pester the faulty sites the
> better. Did you ask the person how many reports he needed?
>
> I certainly intend to run ECN on my mailhost once I trust 2.4 a bit more.
>
> Alan

Is anyone maintaining an automated sweep of sites that I can complain to
all at once (for each 2.4 ecn system I install of course) rather than
finding them one at a time as my connections fail? :)
[PATCH] sscape.c: include missing restore_flags
Alan,

	Please apply.

- Arnaldo

--- linux-2.4.0-ac4/drivers/sound/sscape.c	Mon Jan  8 20:39:30 2001
+++ linux-2.4.0-ac4.acme/drivers/sound/sscape.c	Tue Jan  9 09:16:39 2001
@@ -16,6 +16,7 @@
  * Christoph Hellwig : adapted to module_init/module_exit
  * Bartlomiej Zolnierkiewicz : added __init to attach_sscape()
  * Chris Rankin: Specify that this module owns the coprocessor
+ * Arnaldo C. de Melo : added missing restore_flags in sscape_pnp_upload_file
  */
 
 #include
@@ -969,7 +970,10 @@
 		memcpy(devc->raw_buf, dt, l); dt += l;
 		sscape_start_dma(devc->dma, devc->raw_buf_phys, l, 0x48);
 		sscape_pnp_start_dma ( devc, 0 );
-		if (sscape_pnp_wait_dma ( devc, 0 ) == 0) return 0;
+		if (sscape_pnp_wait_dma ( devc, 0 ) == 0) {
+			restore_flags(flags);
+			return 0;
+		}
 	}
 
 	restore_flags(flags);
Re: [PATCH] hashed device lookup (Does NOT meet Linus' sumissionpolicy!)
> Actually if you count arp which is also part of ip; ip becomes smaller > by about 15K. ...i always forget some small detail. thx -d - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
2.4.0 bug in SHM an via-rhine or is it my fault?
Hi folks!

I searched the kernel archives for information on this, going back at
least half a year, but I found only one article on the subject and that
was never replied to:

I'm using a via-rhine chip (DFE-530TX) on a 10 Mbit network. I use 2.4.0
final, Athlon (classic) 1Gig, Abit-KA7 mobo (via KX133), Debian woody.

Whenever I try to get a file on my local network, meaning I get close to
the 10Mbit barrier, the network card hangs up. Traffic just stops. One
ifdown/ifup and everything works fine again (for about 10 seconds).
This problem has persisted for some time now; I thought it would be fixed
in the final, but, alas, it hasn't. It only happens during high traffic,
too: at about 400k, no problem!

Something new that cropped up in prerelease: my SHM stopped working!
Everything was fine in test12, and after that all I got was "no space
left on device". Has anything changed that one should know about? I
mounted shm like it's written in the help, and on a friend's celeron SMP
machine it works fine; I just don't know what I did wrong.

Any ideas on either of the 2 problems?

TIA
Felix Maibaum
Re: [PATCH] cramfs is ro only, so honour this in inode->mode
Shane Nay writes: > but the bits are useless in the "normal interpretation" of it, ... > But then you pull out the write bits, If you need to steal a bit, grab one that won't hurt. Take the owner's read bit. (owner may read own files) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] More compile warning fixes for 2.4.0
[about labels w/o statements after them]

>> Is this really a kernel bug? This is common idiom in C, so gcc
>> shouldn't warn about it. If it does, it is a bug in gcc IMHO.
>
> No, it is not a common idiom in C. It has _never_ been valid C.
>
> GCC originally allowed it due to a mistake in the grammar; we
> now warn for it. Fix your source.

Since neither -ansi nor -std=foo was specified, gcc should just
shut up and be happy. Consider this as another GNU extension.
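For context, the construct under discussion is a label placed directly before a closing brace. The usual source-level fix is to follow the label with a null statement. The sketch below is an illustrative user-space example with a made-up function name, not code from the warning-fix patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the rows of a rows x cols matrix (stored row by row) that contain
 * no zero entry. The label `next_row` sits at the end of the loop body, so
 * it is followed by a lone ';' to give it a statement to label.
 */
int count_rows_without_zero(const int *m, size_t rows, size_t cols)
{
	int count = 0;
	size_t r, c;

	assert(m != NULL);
	for (r = 0; r < rows; r++) {
		for (c = 0; c < cols; c++) {
			if (m[r * cols + c] == 0)
				goto next_row;	/* skip the increment below */
		}
		count++;
next_row:
		;	/* null statement: without it the label precedes '}' */
	}
	return count;
}
```

Removing the `;` after `next_row:` reproduces the kind of diagnostic the thread is arguing about on compilers of that era.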
Re: Unified power management userspace policy
Hi! Andrew Morton <[EMAIL PROTECTED]> writes: > Could you please use call_usermodehelper() in this patch > rather than exec_usermodehelper()? I want to kill > exec_usermodehelper() sometime. The reason I used exec_usermodehelper is that I wanted to waitpid on the process to see how it exited. Am I still allowed to do that if it runs as a child of keventd? [...] -- http://www.penguinpowered.com/~vii - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: `rmdir .` doesn't work in 2.4
Hi, On Tue, Jan 09, 2001 at 01:01:25AM +0100, Andrea Arcangeli wrote: > On Mon, Jan 08, 2001 at 03:27:21PM -0800, Linus Torvalds wrote: > > However, it is against all UNIX standards, and Linux-2.4 will explicitly > > I may be missing something but apparently SuSv2 allows it, you can check here: > > http://www.opengroup.org/onlinepubs/007908799/xsh/rmdir.html > > Infact SuSv2 doesn't even allow rmdir to return -EINVAL. SuS always allows implementations to return other errors than the ones listed: Implementations will not generate a different error number from the ones described here for error conditions described in this specification, but may generate additional errors unless explicitly disallowed for a particular function. See http://www.opengroup.org/onlinepubs/007908799/xsh/errors.html --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: `rmdir .` doesn't work in 2.4
Alexander Viro writes: > [...] If you really need to destroy the directory > that happens to be your pwd - sorry, no reliable way to do that without > interesting locking. On _any_ UNIX out there. 2.2 included. It will > happily give you -ENOENT and refuse to perform the action above in > case if some other process renames your pwd. Yes, for rmdir("."); Well, this bites. Locking guess: use a global read-write lock, with the "write" case being deletion of "." and the "read" case being everything else. You could have one lock per CPU, with the writer needing to grab all of them in order. So removal of "." pays the cost. If the standards gripe, well, rmdot() is a nice name. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: `rmdir .` doesn't work in 2.4
Hi,

On Mon, Jan 08, 2001 at 09:28:33PM +0100, Andrea Arcangeli wrote:
> On Mon, Jan 08, 2001 at 12:58:20PM -0500, Alexander Viro wrote:
> > It's a hell of a pain wrt locking. You need to lock the parent, but it can
>
> This is a no-brainer and bad implementation, but shows it's obviously right
> wrt locking. (pseudocode, I ignored the uaccess details and all the other not
> relevant things)
>
>	err = sys_getcwd(buf, PAGE_SIZE)
>	if (!memcmp(path, ".", 2))
>		path = buf
>	err = 2_4_0_sys_rmdir(path)
>
> Could you enlighten me on where's the locking pain?

Do the above while another process is renaming one of your parents and
watch an innocent directory get shot down in flames, or prepare for an
incorrect ENOENT.

--Stephen
Re: Network Performance?
On Mon, Jan 08, 2001 at 07:07:18PM +0100, Erik Mouw wrote: > I had similar problems two weeks ago. Turned out the connection between > two switches: one of them was hard wired to 100Mbit/s full duplex, the > other one to 100Mbit/s half duplex. Just to rule out the obvious... We check that as the first thing. Both are set the same. No collisions out of the ordinary. Tim -- Tim Sailer <[EMAIL PROTECTED]> Cyber Security Operations Brookhaven National Laboratory (631) 344-3001 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0 bug in SHM an via-rhine or is it my fault?
On Tue, 9 Jan 2001, Felix Maibaum wrote: > My SHM stopped working! > everything was fine in test12, and after that all I got was "no space > left on device". > Has anything changed that one should know about? I mounted shm like it's > written in the help, and on a friends celeron SMP machine it works fine, > I just don't know what I did wrong. You used a buggy version of powertweak which set kernel.shmall to 0 in /etc/sysctl.conf. Remove the offending line in /etc/sysctl.conf and either reboot the machine or "echo 2097152 > /proc/sys/kernel/shmall". Ciao, Nils -- Nils Philippsen / Berliner Straße 39 / D-71229 Leonberg // +49.7152.209647 [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence. -- Edsger W. Dijkstra - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
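A quick way to check for the condition described above from a program is to read the sysctl file directly. The sketch below assumes the file holds a single decimal number, as /proc/sys/kernel/shmall does; the helper names are made up, and the path is a parameter so the parser can be exercised on any file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysctl-style file such as
 * /proc/sys/kernel/shmall. Returns 0 on success, -1 on failure.
 */
int read_sysctl_ulong(const char *path, unsigned long *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* A limit of 0 is the broken setting described in the message above. */
int shmall_is_zero(unsigned long shmall)
{
	return shmall == 0;
}
```

A caller would pass "/proc/sys/kernel/shmall" and treat a zero result as the misconfiguration to fix.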
Re: FS callback routines
Daniel Phillips <[EMAIL PROTECTED]>: > "Michael D. Crawford" wrote: > > > > Regarding notification when there's a change to the filesystem: > > > > This is one of the most significant things about the BeOS BFS filesystem, and > > something I'd dearly love to see Linux adopt. It makes an app very efficient, > > you just get notified when a directory changes and you never waste time polling. > > > > I think it would require changes to the VFS layer, not just to the filesystems, > > because this is a concept POSIX filesystems do not presently possess. > > > > The other is indexed filesystem attributes, for example a file can have its > > mimetype in the filesystem, and any application can add an attribute and have it > > indexed. > > > > There's a method to do boolean queries on indexed attributes, and you can find > > files in an entire filesystem that match a query in a blazingly short time, much > > faster than walking the directory tree. > > > > If you want to try out the BeOS, there's a free-as-in-beer version at > > http://free.be.com for Pentium PC's. You can also purchase a version that comes > > for both PC's and certain PowerPC macs. > > > > There are read-only versions of this for Linux which I believe are under the > > GPL. The original author is here: > > > > http://hp.vector.co.jp/authors/VA008030/bfs/ > > > > He refers you to here to get a version that works under 2.2.16: > > > > http://milosch.net/beos/ > > > > The author's intention was to take it read-write, but it's complex because it is > > a journaling filesystem. > > > > Daniel Berlin, a BeOS developer modified the Linux BFS driver so it works with > > 2.4.0-test1. I don't know if it works with 2.4.0. The web site where it used > > to be posted isn't there anymore, and the laptop where I had it is in for > > repair. I may have it on a backup, and I'll see if I can track Daniel down. 
> > > > While Be, Inc.'s implementation is closed-source, the design of the BFS (_not_ > > "befs" as it is sometimes called) is explained in Practical File System Design > > with the Be File System by Dominic Giampolo, ISBN 1-55860-497-9. Dominic has > > since left Be and I understand works at Google now. > > fs/dnotify.c: > >/* > * Directory notifications for Linux. > * > * Copyright (C) 2000 Stephen Rothwell > ... > > The currently defined events are: > > DN_ACCESS A file in the directory was accessed (read) > DN_MODIFY A file in the directory was modified (write,truncate) > DN_CREATE A file was created in the directory > DN_DELETE A file was unlinked from directory > DN_RENAME A file in the directory was renamed > DN_ATTRIB A file in the directory had its attributes > changed (chmod,chown) > > It was done last year, quietly and without fanfare, by Stephen Rothwell: > > http://www.linuxcare.com/about-us/os-dev/rothwell.epl > > This may be the most significant new feature in 2.4.0, as it allows us > to take a fundamentally different approach to many different problems. > Three that come to mind: mail (get your mail instantly without polling); > make (don't rely on timestamps to know when rebuilding is needed, don't > scan huge directory trees on each build); locate (reindex only those > directories that have changed, keep index database current). As you > noticed, there are many others. > > Stephen, it would be very interesting to know more about the development > process you went through and what motivated you to provide this > fundamental facility. It would also be very nice if the security of the feature could be confirmed. The problem with SGI's implementation is that it becomes possible to monitor files that you don't own, don't have access to, or are not permitted to know even exist. For these reasons, we have disabled the feature. - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. 
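For readers who want to try the directory notification facility quoted above, here is a minimal user-space sketch using fcntl(F_NOTIFY) with the DN_* flags from the list. The function names, the choice of the default SIGIO signal and the scratch directory are illustrative assumptions, not taken from fs/dnotify.c.

```c
#define _GNU_SOURCE	/* must come first: exposes F_NOTIFY and DN_* */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t dir_changed;

static void on_dir_event(int sig)
{
	(void)sig;
	dir_changed = 1;	/* only set a flag inside the handler */
}

/*
 * Ask for SIGIO whenever a file is created or modified in `dir`.
 * Returns the directory fd (keep it open while watching), or -1.
 */
int watch_directory(const char *dir)
{
	struct sigaction sa;
	int fd;

	assert(dir != NULL);
	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_dir_event;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) != 0)
		return -1;

	fd = open(dir, O_RDONLY);
	if (fd < 0)
		return -1;
	/* DN_MULTISHOT keeps the watch armed after the first event */
	if (fcntl(fd, F_NOTIFY, DN_CREATE | DN_MODIFY | DN_MULTISHOT) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Returns 0 if creating a file triggered a notification, 1 if the watch
 * could not be set up in this environment, -1 if it was armed but silent.
 */
int dnotify_demo(void)
{
	char tmpl[] = "/tmp/dnotify-demo-XXXXXX";
	char file[64];
	int dir_fd, fd, i;

	if (mkdtemp(tmpl) == NULL)
		return 1;
	dir_fd = watch_directory(tmpl);
	if (dir_fd < 0) {
		rmdir(tmpl);
		return 1;
	}

	snprintf(file, sizeof file, "%s/new", tmpl);
	fd = open(file, O_CREAT | O_WRONLY, 0600);	/* should raise DN_CREATE */
	if (fd >= 0)
		close(fd);

	/* give the signal a moment in case delivery is not immediate */
	for (i = 0; i < 100 && !dir_changed; i++)
		usleep(10000);

	unlink(file);
	close(dir_fd);
	rmdir(tmpl);
	if (fd < 0)
		return 1;
	return dir_changed ? 0 : -1;
}
```

The handler only records that something changed; a real consumer would rescan the directory from its main loop, since the signal does not say which file was touched.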
Re: `rmdir .` doesn't work in 2.4
On Tue, Jan 09, 2001 at 07:41:21AM -0600, Jesse Pollard wrote: > Not exactly valid, since a file could be created in that "pinned" directory > after the rmdir... In 2.2.x no file can be created in the pinned directory after the rmdir. Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Confirmation request about new 2.4.x. kernel limits
Hi, On Fri, Jan 05, 2001 at 11:46:04PM +0100, Pavel Machek wrote: > > > Max. file size: 1 TB(?) > > Max. file system size: 2 TB(?) > > Again, maybe on i386 with ext2. Actually, the 2TB limit affects all architectures, as we assume that block indexes fit into 32 bits. Blocks are passed around as unsigned longs in some cases, but even on 64-bit machines that doesn't help us as the limit still persists in the filesystem (32-bit block numbers) and device drivers (ints and 4-byte sector numbers used when generating SCSI commands). Auditing the whole driver path to allow 64-bit block numbers, and adding the logic to generate the 5th sector address byte in the scsi command when we're doing 10-byte commands, are all possible extensions for 2.5. For now, though, the 2TB device limit is with us for all architectures and all filesystems on 2.4. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
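The 2TB figure follows from simple arithmetic once a sector size is fixed. The sketch below assumes 512-byte sectors, which the message does not state explicitly, and uses a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest device size, in bytes, addressable with `index_bits`-bit sector
 * numbers and `sector_size`-byte sectors. Valid for index_bits < 64.
 */
uint64_t max_device_bytes(unsigned index_bits, uint64_t sector_size)
{
	assert(index_bits < 64);
	return ((uint64_t)1 << index_bits) * sector_size;
}
```

With 32-bit sector numbers and 512-byte sectors this gives 2^32 * 2^9 = 2^41 bytes, i.e. 2 TB in binary units.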
Re: Unified power management userspace policy
John Fremlin wrote: > > Hi! > > Andrew Morton <[EMAIL PROTECTED]> writes: > > > Could you please use call_usermodehelper() in this patch > > rather than exec_usermodehelper()? I want to kill > > exec_usermodehelper() sometime. > > The reason I used exec_usermodehelper is that I wanted to waitpid on > the process to see how it exited. Am I still allowed to do that if it > runs as a child of keventd? Oh foo. I missed that. In the patch-which-didn't-make-it, yes, it can be called synchronously. Or you can be called back with the exit code when the subprocess exits. It does all the waitpid stuff, the signal management, handles chrootedness, etc. But that's vapourware now. In the current implementation of call_usermodehelper(), it looks like the commentary is incorrect - it returns a negative error code or the subprocess's pid, but you can't wait on that because it's parented by keventd. Sorry for the noise - stick with what you have now. - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Confirmation request about new 2.4.x. kernel limits
Hi, On Mon, Jan 08, 2001 at 11:11:05PM -0500, Venkatesh Ramamurthy wrote: > > > Max. RAM size:64 GB (any slowness > accessing RAM over 4 GB > * with 32 bit machines ?) > Imore than 4GB in RAM is bounce buffered, so there is performance > penalty as the data have to be copied into the 4GB RAM area Any memory over 1GB is bounce-buffered, but we don't use that memory for anything other than process data pages or file cache, so only swapping and disk IO to regular files gets the extra copy. In particular, things like network buffers are still all kept in the low 1GB so never need to be buffered. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: VM subsystem bug in 2.4.0 ?
Hi, On Mon, Jan 08, 2001 at 04:30:10PM -0200, Rik van Riel wrote: > On Mon, 8 Jan 2001, Linus Torvalds wrote: > > > > The only solution I see is something like a "active_immobile" > > list, and add entries to that list whenever "writepage()" > > returns 1 - instead of just moving them to the active list. > > Just marking them with a special "do not deactivate me" > bit seems to work fine enough. When this special bit is > set, we simply move the page to the back of the active > list instead of deactivating. But again, how do you clear the bit? Locking is a per-vma property, not per-page. I can mmap a file twice and mlock just one of the mappings. If you get a munlock(), how are you to know how many other locked mappings still exist? --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0 bug in SHM an via-rhine or is it my fault?
Nils Philippsen wrote: > reboot the machine or "echo 2097152 > /proc/sys/kernel/shmall". now thats what I call a quick response, thanks, it did the trick. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cramfs is ro only, so honour this in inode->mode
"Albert D. Cahalan" <[EMAIL PROTECTED]> writes:

> Shane Nay writes:
>
> > but the bits are useless in the "normal interpretation" of it,
> ...
> > But then you pull out the write bits,
>
> If you need to steal a bit, grab one that won't hurt.
> Take the owner's read bit. (owner may read own files)

Er,

bash-2.03$ cd /tmp
bash-2.03$ cat >foo
This is a test.
bash-2.03$ chmod u-r foo
bash-2.03$ cat foo
cat: foo: Permission denied
bash-2.03$ ls -l foo
--w-r--r--    1 doug     doug           16 Jan  9 09:16 foo
bash-2.03$

This is Linux 2.4.0.

-Doug
2.4.0-ac4 lockups
Hi,

I'm currently running ROCK Linux 1.3.11 on a MiTAC 6033 laptop, XFree86
4.0.1 and the rest of the linux install is quite bleeding edge (I can
find out version numbers for most things if needed). In this box there is
a PCMCIA Token Ring card (IBM Turbo 16/4 PC Card 2) and to drive this,
pcmcia-cs-3.1.23.

The problem that is showing its ugly face is that after some prolonged
network activity the system will lock solidly. The magic SysRq keys still
work, well, sort of anyway. Alt-SysRq-s does inspire the system to disk
activity. Alt-SysRq-u doesn't do enough for the disk-led to light up, but
Alt-SysRq-b does reboot the system. Upon reboot I go and fetch a coffee
while the system is fsck'ing the filesystems.

I have had several lockups in the last couple of days. It started in
2.4.0-prer and with the Token Ring card. Some crash messages have been
relating to virtual memory at invalid addresses, and when I get a good
crash message I will write it down and post it to the list.

Any ideas anyone?

/Anders
Re: `rmdir .` doesn't work in 2.4
On Tue, 9 Jan 2001, Jesse Pollard wrote: > Not exactly valid, since a file could be created in that "pinned" directory > after the rmdir... No, it couldn't (if you can show a testcase when it would - please do, you've found a bug). Moreover, busy directories can be removed in 2.4 quite fine - it's about pathname, not about the thing being your (or somebody else) pwd. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] via-macii.c: restore_flags on failure
Hi,

	Please consider applying.

- Arnaldo

--- linux-2.4.0-ac4/drivers/macintosh/via-macii.c	Tue Dec 19 11:25:39 2000
+++ linux-2.4.0-ac4.acme/drivers/macintosh/via-macii.c	Tue Jan  9 10:18:17 2001
@@ -9,6 +9,9 @@
  *
  * Rewrite for Unified ADB by Joshua M. Thompson ([EMAIL PROTECTED])
  *
+ * Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
+ * - restore_flags on failure in macii_init - 09/01/2001
+ *
  * 1999-08-02 (jmt) - Initial rewrite for Unified ADB.
  */
 
@@ -147,15 +150,16 @@
 	cli();
 
 	err = macii_init_via();
-	if (err) return err;
+	if (err) goto out;
 
 	err = request_irq(IRQ_MAC_ADB, macii_interrupt, IRQ_FLG_LOCK, "ADB",
 			  macii_interrupt);
-	if (err) return err;
+	if (err) goto out;
 
 	macii_state = idle;
-	restore_flags(flags);
-	return 0;
+	err = 0;
+out:	restore_flags(flags);
+	return err;
 }
 
 /* initialize the hardware */
Re: `rmdir .` doesn't work in 2.4
> On Tue, 9 Jan 2001, Jesse Pollard wrote: > > > Not exactly valid, since a file could be created in that "pinned" directory > > after the rmdir... > > No, it couldn't (if you can show a testcase when it would - please do, you've > found a bug). Moreover, busy directories can be removed in 2.4 quite fine - > it's about pathname, not about the thing being your (or somebody else) pwd. Apologies to all, foot-in-mouth disease - Jesse I Pollard, II Email: [EMAIL PROTECTED] Any opinions expressed are solely my own. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
   From: Trond Myklebust <[EMAIL PROTECTED]>
   Date: 09 Jan 2001 14:52:40 +0100

   I don't really want to be chiming in with another 'make it a
   kiobuf', but given that you already have written
   'do_tcp_sendpages()' why did you make sock->ops->sendpage() take
   the single page as an argument rather than just have it take the
   'struct page **'?

It was like that to begin with.

But to do it cleanly you have to pass in not a vector of "pages" but a
vector of "page+offset+len" triplets.

Linus hated it, and I understood why, so I reverted the API to be
single page based.

   I would have thought one of the main interests of doing something
   like this would be to allow us to speed up large writes to the
   socket for ncpfs/knfsd/nfs/smbfs/...

This is what TCP_CORK/MSG_MORE et al. are all for, things get
coalesced perfectly.

Sending in a vector of pages seems nice, but none of the page cache
infrastructure works like this, all of the core routines work on a
page at a time. It actually simplifies a lot.

The writepage interface optimizes large file writes to a socket just
fine.

Later,
David S. Miller
[EMAIL PROTECTED]
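As a user-space illustration of the coalescing mentioned above, the sketch below sends a header with MSG_MORE and then the body without it. The function names are made up, and the demo uses a Unix-domain socketpair purely so it runs without a network; on such a socket the flag is merely accepted, and the coalescing benefit only applies to a real TCP connection.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue the header with MSG_MORE ("more data is
 * coming"), then send the body without it so everything can go out
 * together. Returns 0 on success, -1 on error or short send.
 */
int send_header_and_body(int sock, const void *hdr, size_t hdr_len,
			 const void *body, size_t body_len)
{
	assert(hdr != NULL && body != NULL);
	if (send(sock, hdr, hdr_len, MSG_MORE) != (ssize_t)hdr_len)
		return -1;
	if (send(sock, body, body_len, 0) != (ssize_t)body_len)
		return -1;
	return 0;
}

/* Round trip over a socketpair; returns 0 if both parts arrive in order. */
int msg_more_demo(void)
{
	const char hdr[] = "HDR:";
	const char body[] = "payload";
	char buf[32] = {0};
	size_t want = strlen(hdr) + strlen(body), got = 0;
	int sv[2], rc;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	rc = send_header_and_body(sv[0], hdr, strlen(hdr), body, strlen(body));
	close(sv[0]);	/* queued data stays readable, then EOF */

	while (rc == 0 && got < want) {
		ssize_t n = read(sv[1], buf + got, want - got);

		if (n <= 0)
			break;
		got += (size_t)n;
	}
	close(sv[1]);
	return (rc == 0 && got == want &&
		memcmp(buf, "HDR:payload", want) == 0) ? 0 : -1;
}
```

TCP_CORK achieves a similar effect with a socket option set around a group of writes instead of a per-call flag.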
Re: adding a system call
> > What is the procedure for adding a new system call to the Linux kernel?
>
> hack away, the code's free. don't expect Linus to accept your
> changes into the "real" kernel without a VERY good argument.

I know. However, the Kernel Hacker's Guide writes about sys.h. After a
bit of exploring, I found that sys.h has been replaced by something else
in later kernels, which leaves me wondering where in the kernel I should
insert my code, and where the dispatcher is located for the other system
calls, in case my system call would need them.

My system call idea is to allow a superuser process to request a mmap on
behalf of a user process. To see how this would be useful, let us
consider svgalib. Until now, there were two ways to allow an application
access to the video array. The first was by making it setuid root, but
this compromises system security by allowing it too many permissions. The
second was by having a helper module which allows user applications
access to the video card. However, this allows any remote user to set the
screen in flames. With my new system call, a superuser process can set
the graphics mode in a safe manner and then ask for an mmap of the video
array into the application data segment.

Mihai
Re: Change of policy for future 2.2 driver submissions
Hi, On Fri, Jan 05, Linus Torvalds wrote: [...] > But that's very different from having somebody like RedHat, SuSE or > Debian make such a kernel part of their standard package. No, I don't > expect that they'll switch over completely immediately: that would show > a lack of good judgement. The prudent approach has always been to have > both a 2.2.19 and a 2.4.0 kernel on there, and ask the user if he wants > to test the new kernel first. Right, but now there is a problem: Software RAID. The RAID code of 2.4.0 is not backwards compatible to the one in 2.2.18; if somebody has used 2.4.0 on softraid and discovers some problem, he can not switch back to some official 2.2 kernel. In order to make it possible to switch between kernel releases, every vendor now really is forced to integrate the new RAID0.90 code to their 2.2 kernel. IMHO this code should be integrated to the next official 2.2 kernel so people can use whatever they want. > Linus -o) Hubert Mantel Goodbye, dots... /\\ _\_v - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Change of policy for future 2.2 driver submissions
> some official 2.2 kernel. In order to make it possible to switch between
> kernel releases, every vendor now really is forced to integrate the new
> RAID0.90 code to their 2.2 kernel. IMHO this code should be integrated to
> the next official 2.2 kernel so people can use whatever they want.

Then people using a newer 2.2 cannot go back to an older 2.2. That's really far, far worse.
Re: VM subsystem bug in 2.4.0 ?
Hi Stephen,

On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:
> But again, how do you clear the bit? Locking is a per-vma property,
> not per-page. I can mmap a file twice and mlock just one of the
> mappings. If you get a munlock(), how are you to know how many
> other locked mappings still exist?

It's worse: The issue we are talking about is SYSV IPC_LOCK. This is a per-segment thing. A user can (un)lock a segment at any time. But we do not have the references to the vmas attached to the segments or to the pages allocated.

Greetings
Christoph
Re: adding a system call
[EMAIL PROTECTED] said:
> What is the procedure for adding a new system call to the Linux
> kernel?

First: Convince people that it's necessary.

-- dwmw2
Re: `rmdir .` doesn't work in 2.4
On Tue, 9 Jan 2001, Albert D. Cahalan wrote: > Alexander Viro writes: > > > [...] If you really need to destroy the directory > > that happens to be your pwd - sorry, no reliable way to do that without > > interesting locking. On _any_ UNIX out there. 2.2 included. It will > > happily give you -ENOENT and refuse to perform the action above in > > case if some other process renames your pwd. Yes, for rmdir("."); > > Well, this bites. > > Locking guess: use a global read-write lock, with the "write" case > being deletion of "." and the "read" case being everything else. > You could have one lock per CPU, with the writer needing to grab all > of them in order. So removal of "." pays the cost. It's _so_ far from the SMP cache issues that it's not even funny. So reference to brw-locks is completely bogus. What you are proposing is to serialize rmdir() and rename() (including lookups) wrt rmdir and rename. Globally. Fun, fun... > If the standards gripe, well, rmdot() is a nice name. If anything, frmdir() might be a better name. However, it's really inconsistent with the whole namespace-modifying stuff. You don't have flink(fd, newname). frename() and funlink() are not even funny - _which_ link would you want to be renamed/removed? Filesystem consists of two types of objects - files (and that includes directories, etc.) and links. Pathname can be evaluated to link and to file. Namespace syscalls (creat()/mkdir()/mknod()/symlink()/link()/ unlink()/rmdir()/rename()) operate on links. open(), truncate(), stat(), lstat(), etc. operate on files - completely different can of worms. 2.2 tried (without success) to make rmdir() and some cases of rename() act on files. Notice that if you have /foo as pwd, "." and "/foo" will evaluate to the same file, but to different links. That's what it's really about. We could add new syscalls. However, I'm yet to see the real-world situation where they would be needed enough to warrant their inclusion. 
And I mean real-world, not an exercise asking for that functionality. Occam's Razor... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Delay in authentication.
On Mon, 8 Jan 2001, Scott Laird wrote: > > Is syslog running correctly? When syslog screws up, it very frequently > results in this sort of problem. > I would guess that syslog is okay. I'm getting plenty of entries in my various logs, along with a few boxes remote logging into this server. Another interesting thing I have noticed about this delay. If I remove the data in the password field from the shadow file ("username::...") there is no pause during login. -Chris -- Two penguins were walking on an iceberg. The first penguin said to the second, "you look like you are wearing a tuxedo." The second penguin said, "I might be..." --David Lynch, Twin Peaks - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] More compile warning fixes for 2.4.0
On Tue, 9 Jan 2001, Albert D. Cahalan wrote:

> [about labels w/o statements after them]
>
> >> Is this really a kernel bug? This is common idiom in C, so gcc
> >> shouldn't warn about it. If it does, it is a bug in gcc IMHO.
> >
> > No, it is not a common idiom in C. It has _never_ been valid C.
> >
> > GCC originally allowed it due to a mistake in the grammar; we
> > now warn for it. Fix your source.
>
> Since neither -ansi nor -std=foo was specified, gcc should just
> shut up and be happy. Consider this as another GNU extension.
>

It has to do with ; "a label at the end of a compound statement..." This has never been correctly allowed. Many don't realize that

    case 'X':
    case 'Y':
    default:

... are all labels. Modern compilers are now enforcing the rules. When a 'switch' is a compound statement, it follows the rules for other compound statements. For instance, you can code (correctly)

    switch(a)
        case 1:
            a--;

... this, with no braces at all. If a == 1, it gets changed to 0, otherwise it is untouched. If we need another test, it becomes a compound statement requiring braces as:

    switch(a)
    {
        case 1:
            a--;
        default:
    }

Observe that we have tricked the compiler into generating code without using a ';' denoting the end of a statement. The standards makers don't like this and are now requiring that the above be coded as:

    switch(a)
    {
        case 1:
            a--;
        default:
            ;
    }       ^___ no tricks allowed.

A 'program unit', denoted by {} braces, has never required a terminating semicolon, so putting a ';' at the end of the physical statement just won't do it in this case.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
RE: Confirmation request about new 2.4.x. kernel limits
> Any memory over 1GB is bounce-buffered, but we don't use that memory
> for anything other than process data pages or file cache, so only
> swapping and disk IO to regular files gets the extra copy. In
> particular, things like network buffers are still all kept in the low
> 1GB so never need to be buffered.

[Venkatesh Ramamurthy] If anything over 1GB is bounce buffered, then what is the purpose of setting the pci_dev->dma_mask field? On an IA32 system we set it to 32 1's and on IA64 to 64 1's.

- Venkat
Re: [PATCH] hashed device lookup (New Benchmarks)
Andi Kleen wrote: > > On Mon, Jan 08, 2001 at 04:23:41PM +0100, Ben Greear wrote: > > I don't argue that ifconfig shouldn't be fixed, but the hash speeds up > > It's already fixed since months. There was one stupid algorithm, which > I was to blame for when I changed ifconfig to use a device list two years ago. The benchmark was run against this one: [root@candle lanforge]# ifconfig --version net-tools 1.57 ifconfig 1.40 (2000-05-21) The latest I could find anywhere Please tell me the version of a newer one if it exists. > > ip by about 2X too. Is that not useful enough? ip seems to be implemented > > pretty efficient, so if the hash helps it significantly then maybe it > > can help other efficient programs too. Notice that it is the system > > (ie kernel) time that stays remarkably flat with the hash + ip graph. > > Just does your benchmark represent anything that real users do frequently ? I'm going to write something that binds to a raw device, which is something users (DHCP, for sure) does. If it does not show any significant improvement, then I'll drop the issue untill many-many interfaces are more common. > > If you really want to optimize I'm sure there are lots of areas in the kernel > where your efforts are better spent ;) [just run with a the kernel profiler on > for a few days on your box and look at all the real hot spots] I was just trying to smooth VLAN's adoption into the kernel by removing the one linear-lookup that I know of relating to lots of VLANs. It obviously isn't horribly important, but it was fun :) > > BTW, if you just want to optimize ip link ls speed it would be probably enough > to keep a one behind cache that just caches the next member after the last > search. That is still linear in the kernel...or do you mean cache in the kernel? At any rate, I'm more concerned about random access. 
> > -Andi -- Ben Greear ([EMAIL PROTECTED]) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
[PATCH] dn_keyb.c: restore_flags on failure
Alan,

Please consider applying. I don't know who the maintainer is; there are no references in the driver, CREDITS or MAINTAINERS.

- Arnaldo

--- linux-2.4.0-ac4/drivers/char/dn_keyb.c	Fri Jul 28 06:34:40 2000
+++ linux-2.4.0-ac4.acme/drivers/char/dn_keyb.c	Tue Jan 9 10:32:17 2001
@@ -435,15 +435,14 @@
 	for(;length;length--) {
 		keyb_cmds[keyb_cmd_write++]=*(cmd++);
 		if(keyb_cmd_write==keyb_cmd_read)
-			return;
+			goto out;
 		if(keyb_cmd_write==APOLLO_KEYB_CMD_ENTRIES)
 			keyb_cmd_write=0;
 	}
 	if(!keyb_cmd_transmit) {
 		sio01.BRGtest_cra=5;
 	}
-	restore_flags(flags);
-
+out:	restore_flags(flags);
 }
 
 static struct busmouse apollo_mouse = {
Re: Network Performance?
On Mon, Jan 08, 2001 at 01:40:57PM -0500, Craig I. Hagan wrote: > > 101 packets transmitted, 101 packets received, 0% packet loss > > round-trip min/avg/max = 109.6/110.3/112.2 ms > > > > > Does the problem occur in both directions? > > > > Good question. I'll find out. > > > > > Are you _sure_ the window size is being set correctly? How > > > is it being set? > > > > I'm fairly sure. We echo the value to the file. catting it back > > shows the correct value. If we go lower than default, it slows > > down even more. > > what are you setting it to on the solaris machine? what window > sizes have you tried? > > Your pipe looks like it will have quite a few bits in flight due to its Yup. That's why the tuning. WAN performance here is very important. > latency. From my quick guess math, which sucks, it appears that you can fit 1.2 > to 1.5 megabytes on the wire (100mbit machine<-> machine) times 100-120ms wire Hmm. 100/8 is about 12, no? > time. This is a rather large number, so you may want to see what hosts really > support, perhaps starting with 64k or 128k and work up. Make sure that you have > window scaling turned on if you go with very large windows. Yes, we have that enabled too. > Also, have you upped your socket buffers to match your window sizes? We are using straight ftp for the testing. > Last, solaris tends to have poorly tuned tcp values out of the box, look at > this link and tune the solaris stack to better reflect reality. > >http://www.google.com/search?q=cache:www.rvs.uni-hannover.de/people/voeckler/tune/EN/tune.html+%2Bwan+%2Bwindow+%2Bscale+%2Bsize+%2Bnetwork&hl=en > > linux tuning has a decent amount of data in the docs section of the kernel > sources. I'll take a look. THanks. 
Tim -- Tim Sailer <[EMAIL PROTECTED]> Cyber Security Operations Brookhaven National Laboratory (631) 344-3001 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
> designing for them. Eg. if an IO operation (eg. streaming video webcast)
> does a DMA from a camera card to an outgoing networking card, would it be

Most mpeg2 hardware isn't set up for that kind of use. And webcast protocols like h.263 tend to be software implemented. Capturing raw video for pre-processing is similar. Right now that's best done with mmap() on the ring buffer and O_DIRECT I/O, it seems.

Alan
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Stephen C. Tweedie wrote: > > please study the networking portions of the zerocopy patch and you'll see > > why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the > > thing we cannot afford in a sendfile() operation. sendfile() is > > lightweight, the setup times of kiovecs are not. > > > Right. However, kiobufs can be kept around for as long as you want > and can be reused easily, and even if allocating and freeing them is > more work than you want, populating an existing kiobuf is _very_ > cheap. we do have SLAB [which essentially caches structures, on a per-CPU basis] which i did take into account, but still, initializing a 600+ byte kiovec is probably more work than the rest of sending a packet! I mean i'd love to eliminate the 200+ bytes skb initialization as well, it shows up. > > another, more theoretical issue is that i think the kernel should not be > > littered with multi-page interfaces, we should keep the one "struct page * > > at a time" interfaces. > > Bad bad bad. We already have SCSI devices optimised for bandwidth > which don't approach decent performance until you are passing them 1MB > IOs, [...] The fact that we're using single-page interfaces doesnt preclude us from having nicely clustered requests, this is what IO-plugging is about! > and even in networking the 1.5K packet limit kills us in some cases > and we need an interface capable of generating jumbograms. which cases? > Perhaps tcp can merge internal 4K requests, [...] yes, because depending on the application to send properly sized requests is a futile act IMO. So we do have intelligent buffering and clustering in basically every kernel subsystem - and we'll continue to have it because we have no choice - most of Linux's user-visible IO APIs have byte granularity (which is good btw.). Adding a multi-page interface will IMO mostly just complicate the design and the implementation. 
Do you have empirical (or theoretical) proof which shows that single-page interfaces cannot perform well?

> but if you're doing udp jumbograms (or STP or VIA), you do need an
> interface which can give the networking stack more than one page at
> once.

nothing prevents the introduction of specialized interfaces - if they feel like they can get enough traction. I was talking about the normal Linux IO APIs, read()/write()/sendfile(), which are byte granularity and invoke an almost mandatory buffering/clustering mechanism in every kernel subsystem they deal with.

	Ingo
Re: Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.
Silviu Marin-Caea wrote: > > I have RedHat7, glibc-2.2-9, gcc-2.96-69. > > I can build 2.4.0 while running kernel 2.2.16. > > If I try to rebuild 2.4.0 while running the new kernel, I get random > compiler errors. > > It happens on two machines. One of them runs 2.4.0-test12, the other > 2.4.0. Both of them with the updates above mentioned. > > I know this is a RedHat issue, but it may be useful to know for some. I know this isn't since I already built 2.4.0-ac2 and -ac3 on this laptop and never got any compiler error :) [asuardi@princess asuardi]$ rpm -q glibc gcc glibc-2.2-9 gcc-2.96-69 random compiler errors => bad hardware. On two machines ? Yes. --alessandro <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> Linux: kernel 2.2.19p6/2.4.0 glibc-2.2 gcc-2.96-69 binutils-2.10.1.0.4 Oracle: Oracle8i 8.1.7.0.0 Enterprise Edition for Linux motto: Tell the truth, there's less to remember. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Failure building 2.4 while running 2.4. Success in building 2.4 while running 2.2.
> I have RedHat7, glibc-2.2-9, gcc-2.96-69. Ditto > If I try to rebuild 2.4.0 while running the new kernel, I get random > compiler errors. Now I don't. What hardware are you using ? > It happens on two machines. One of them runs 2.4.0-test12, the other > 2.4.0. Both of them with the updates above mentioned. What hardware what errors ? > I know this is a RedHat issue, but it may be useful to know for some. It may well be compiler optimisation where the new gcc is optimising out something someone forgot in a driver or miscompiling a specific driver. One good way to test if its compiler or kernel triggered would be to rebuild 2.4.0 with egcs (aka kgcc). I'd like to know what drivers you are running so I can try and duplicate it - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0-ac3 write() to tcp socket returning errno of -3 (ESRCH:"No such process")
On Tue, 9 Jan 2001, Andrew Morton wrote: > is this still reproducible? If so can I send you a debugging > patch to diagnose a bit further? Yes to both. If I get a patch in the next hour or so, I can have it running before I go to work. Otherwise I won't be able to try it until this evening. With the appended patch, I got these logged, and the application produces the expected error, all with the same timestamp: tcp.c:1165:tcp_sendmsg: err is unexpectedly -375. tcp.c:963:tcp_sendmsg: err is unexpectedly -375. tcp_sendmsg:991: copy = -375, mss_now = 512, skb->len = 887, skb_tailroom(skb) = 521, seglen = 37. The second message is misleading; err is not -375 at this point, copied is. I'm looking at how these were produced, and they seem to be in the opposite order that the code produces them? If you're trying to find these in an unpatched file, The first (line 1165 above) printk() is in the err = copied case of do_fault2. The second is in the if(err) goto do_fault2 check. The last is right after this in tcp_sendmsg. if(copy > seglen) copy = seglen; This is kind of frightening; the printk on line 991 is effectively inside if(mss_now - skb->len > 0) and mss_now seems to be less than skb->len when the printk happens. My copy of K&R is at work; could that comparison be being done unsigned because of skb->len? I wouldn't think so, but the alternative seems somewhat worse... Most of this patch is to tcp_sendmsg. 
diff -ru linux-2.4.0-ac3/net/ipv4/tcp.c linux-2.4.0-ac3-debugging/net/ipv4/tcp.c --- linux-2.4.0-ac3/net/ipv4/tcp.c Mon Jan 8 22:41:14 2001 +++ linux-2.4.0-ac3-debugging/net/ipv4/tcp.cMon Jan 8 23:02:03 2001 @@ -451,6 +451,23 @@ #define TCP_PAGES(amt) (((amt)+TCP_MEM_QUANTUM-1)/TCP_MEM_QUANTUM) +#define CHECK_TCP_RET() check_tcp_ret(err, __FILE__, __LINE__, __FUNCTION__) + +void check_tcp_ret(int ret, char *file, int line, char *func) { + if(ret < 0) { + switch(-ret) { + case EAGAIN: case EBADF: case EPIPE: case ENOSPC: case EIO: case ECONNRESET: + case EINTR: case ETIMEDOUT: case EFAULT: case EINVAL: case EMSGSIZE: case +ENOMEM: + case ENOBUFS: case ENOTCONN: case ECONNREFUSED: case ERESTARTSYS: case +EHOSTUNREACH: + break; + + default: + printk(KERN_ERR "%s:%d:%s: err is unexpectedly %d.\n", file, line, +func, ret); + } + } +} + + int tcp_mem_schedule(struct sock *sk, int size, int kind) { int amt = TCP_PAGES(size); @@ -883,6 +900,8 @@ } current->state = TASK_RUNNING; remove_wait_queue(sk->sleep, &wait); + if(timeo < 0) + printk(KERN_ERR "wait_for_tcp_memory: timeo == %ld\n", timeo); return timeo; } @@ -916,8 +935,10 @@ /* Wait for a connection to finish. */ if ((1 << sk->state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) - if((err = wait_for_tcp_connect(sk, flags, &timeo)) != 0) - goto out_unlock; + if((err = wait_for_tcp_connect(sk, flags, &timeo)) != 0) { + CHECK_TCP_RET(); + goto out_unlock; + } /* This should be in poll */ clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); @@ -938,8 +959,11 @@ while (seglen > 0) { int copy, tmp, queue_it; - if (err) - goto do_fault2; + if (err) { + if(copied) check_tcp_ret(copied, __FILE__, __LINE__, +__FUNCTION__); + else CHECK_TCP_RET(); + goto do_fault2; + } /* Stop on errors. */ if (sk->err) @@ -948,7 +972,7 @@ /* Make sure that we are established. */ if (sk->shutdown & SEND_SHUTDOWN) goto do_shutdown; - + /* Now we need to check if we have a half * built packet we can tack some data onto. 
*/ @@ -964,6 +988,7 @@ copy = skb_tailroom(skb); if(copy > seglen) copy = seglen; + if(copy < 0) printk(KERN_ERR "tcp_sendmsg:%d: +copy = %d, mss_now = %d, skb->len = %d, skb_tailroom(skb) = %d, seglen = %d.\n", +__LINE__ copy, mss_now, skb->len, skb_tailroom(skb), seglen); if(last_byte_was_odd) { if(copy_from_user(skb_put(skb, copy), from, copy)) @@ -975,6 +1000,7 @@ csum_and_copy_from_user( from, skb_put(skb, copy),
Re: Failure building 2.4 while running 2.4. Success in building 2.4
> I know this isn't since I already built 2.4.0-ac2 and -ac3 on this > laptop and never got any compiler error :) > > [asuardi@princess asuardi]$ rpm -q glibc gcc > glibc-2.2-9 > gcc-2.96-69 > > random compiler errors => bad hardware. On two machines ? Yes. My guess is a bad driver. Two machines with random errors from hardware only in 2.4 is pushing it - possible but pushing it. Alan - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [rlug] Failure building 2.4 while running 2.4. Success inbuilding 2.4 while running 2.2.
I compiled 2.4.0 while running 2.4.0-test12 and it worked.

> I can build 2.4.0 while running kernel 2.2.16.
>
> If I try to rebuild 2.4.0 while running the new kernel, I get random
> compiler errors.
>
> It happens on two machines. One of them runs 2.4.0-test12, the other
> 2.4.0. Both of them with the updates above mentioned.
>
> I know this is a RedHat issue, but it may be useful to know for some.
>
> --
> Systems and Network Administrator - Delta Romania
> Phone +4093-267961

Eugen
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Hi, On Tue, Jan 09, 2001 at 03:40:56PM +0100, Ingo Molnar wrote: > > i'd love to first see these kinds of applications (under Linux) before > designing for them. Things like Beowulf have been around for a while now, and SGI have been doing that sort of multimedia stuff for ages. I don't think that there's any doubt that there's a demand for this. > Eg. if an IO operation (eg. streaming video webcast) > does a DMA from a camera card to an outgoing networking card, would it be > possible to access the packet data in case of a TCP retransmit? I'm not thinking about pci-to-pci as much as pci-to-memory-to-pci with no memory-to-memory copies. That's no different to writepage: doing a zero-copy writepage on a page cache page still gives you the problem of maintaining retransmit semantics if a user mmaps the file or writes to it after your initial transmit. And if you want other examples, we have applications such as Oracle who want to do raw disk IO in chunks of at least 128K. Going through a page-by-page interface for large IOs is almost as bad as the existing buffer_head-by-buffer_head interface, and we have already demonstrated that to be a bottleneck in the block device layer. Jes has also got hard numbers for the performance advantages of jumbograms on some of the networks he's been using, and you ain't going to get udp jumbograms through a page-by-page API, ever. Cheers, Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
* Ingo Molnar ([EMAIL PROTECTED]) wrote: > > On Tue, 9 Jan 2001, Stephen C. Tweedie wrote: > > > but it just doesn't apply when you look at some other applications, > > such as streaming out video data or performing fileserving in a > > high-performance compute cluster where you are serving bulk data. > > The multimedia and HPC worlds typically operate on datasets which are > > far too large to cache, so you want to keep them in memory as little > > as possible when you ship them over the wire. > > i'd love to first see these kinds of applications (under Linux) before > designing for them. Eg. if an IO operation (eg. streaming video webcast) > does a DMA from a camera card to an outgoing networking card, would it be > possible to access the packet data in case of a TCP retransmit? Basically > these applications are limited enough in scope to justify even temporary > 'hacks' that enable them - and once we *see* things in action, we could > design for them. Not the other way around. Well, I know I for one use a system that you might have heard of called 'MOSIX'. It's a (kinda large) kernel patch with some user-space tools but allows for migration of processes between machines without modifying any code. There are some limitations (threaded applications and shared memory and whatnot) but it works very well for the rendering work we use it for. We use radiance which in general has pretty little inter- process communication and what it has is done through the filesystem. Now, the interesting bit here is that the processes can grow to be pretty large (200M+, up as high as 500M, higher if we let it ;) ) and what happens with MOSIX is that entire processes get sent over the wire to other machines for work. MOSIX will also attempt to rebalance the load on all of the machines in the cluster and whatnot so it can often be moving processes back and forth. 
So, anyhow, this is just an fyi, if you weren't aware of it, that I believe more than a few people are using MOSIX these days for similar applications and that it's available at http://www.mosix.org if you're curious.

	Stephen
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
> David S Miller <[EMAIL PROTECTED]> writes: >I would have thought one of the main interests of doing >something like this would be to allow us to speed up large >writes to the socket for ncpfs/knfsd/nfs/smbfs/... > This is what TCP_CORK/MSG_MORE et al. are all for, things get > coalesced perfectly. Sending in a vector of pages seems nice, > but none of the page cache infrastructure works like this, all > of the core routines work on a page at a time. It actually > simplifies a lot. > The writepage interface optimizes large file writes to a socket > just fine. OK, but can you eventually generalize it to non-stream protocols (i.e. UDP)? After all, it doesn't make sense to differentiate between zero-copy on stream and non-stream sockets, and Linux NFS, at least, remains heavily UDP-oriented... Cheers, Trond - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Hi, On Tue, Jan 09, 2001 at 11:23:41AM +0100, Ingo Molnar wrote: > > > Having proper kiobuf support would make it possible to, for example, > > do zerocopy network->disk data transfers and lots of other things. > > i used to think that this is useful, but these days it isnt. It's a waste > of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM > instead of doing direct disk=>network DMA *all the time* some resource is > requested. No. I'm certain you're right when talking about things like web serving, but it just doesn't apply when you look at some other applications, such as streaming out video data or performing fileserving in a high-performance compute cluster where you are serving bulk data. The multimedia and HPC worlds typically operate on datasets which are far too large to cache, so you want to keep them in memory as little as possible when you ship them over the wire. Cheers, Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
Hi, On Tue, Jan 09, 2001 at 01:04:49PM +0100, Ingo Molnar wrote: > > On Tue, 9 Jan 2001, Christoph Hellwig wrote: > > please study the networking portions of the zerocopy patch and you'll see > why this is not desirable. An alloc_kiovec()/free_kiovec() is exactly the > thing we cannot afford in a sendfile() operation. sendfile() is > lightweight, the setup times of kiovecs are not. > Right. However, kiobufs can be kept around for as long as you want and can be reused easily, and even if allocating and freeing them is more work than you want, populating an existing kiobuf is _very_ cheap. > another, more theoretical issue is that i think the kernel should not be > littered with multi-page interfaces, we should keep the one "struct page * > at a time" interfaces. Bad bad bad. We already have SCSI devices optimised for bandwidth which don't approach decent performance until you are passing them 1MB IOs, and even in networking the 1.5K packet limit kills us in some cases and we need an interface capable of generating jumbograms. Perhaps tcp can merge internal 4K requests, but if you're doing udp jumbograms (or STP or VIA), you do need an interface which can give the networking stack more than one page at once. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
> Bad bad bad. We already have SCSI devices optimised for bandwidth
> which don't approach decent performance until you are passing them 1MB
> IOs, and even in networking the 1.5K packet limit kills us in some

Even low end cheap raid cards like the AMI megaraid dearly want 128K
writes. It's quite a difference on them.
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
> " " == David S Miller <[EMAIL PROTECTED]> writes:

 > I've put a patch up for testing on the kernel.org mirrors:

 > /pub/linux/kernel/people/davem/zerocopy-2.4.0-1.diff.gz

 .

 > Finally, regardless of networking card, there should be a
 > measurable performance boost for NFS clients with this patch
 > due to the delayed fragment coalescing. KNFSD does not take
 > full advantage of this facility yet.

Hi David,

I don't really want to be chiming in with another 'make it a kiobuf',
but given that you already have written 'do_tcp_sendpages()' why did
you make sock->ops->sendpage() take the single page as an argument
rather than just have it take the 'struct page **'?

I would have thought one of the main interests of doing something like
this would be to allow us to speed up large writes to the socket for
ncpfs/knfsd/nfs/smbfs/...

After all, in both the case of the client WRITE requests and the
server READ responses, we end up with a set of several pages that just
need to be pushed down the network without further ado. Unless I
misunderstood the code, it seems that do_tcp_sendpages() fits the bill
nicely...

Cheers,
 Trond
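[Editor's sketch] The difference between a "one page per call" and an "array of pages per call" interface can be seen from userspace with a gather write: `socket.sendmsg()` accepts a list of buffers and hands them to the stack in a single system call. This is only an analogy for the `struct page **` question Trond asks. The function name `send_pages` and the use of 4 KB byte strings as stand-ins for pages are assumptions for the example.

```python
import socket


def send_pages(sock: socket.socket, pages) -> int:
    """Send a list of bytes-like 'pages' using gather writes.

    Each sendmsg() call passes all remaining buffers at once, so the
    stack sees the whole request rather than one page at a time.
    """
    views = [memoryview(p) for p in pages if len(p)]  # skip empty buffers
    total = 0
    while views:
        sent = sock.sendmsg(views)  # one call, many buffers
        total += sent
        # Drop buffers that went out completely; trim a partial one.
        while sent and views:
            if sent >= len(views[0]):
                sent -= len(views[0])
                views.pop(0)
            else:
                views[0] = views[0][sent:]
                sent = 0
    return total
```

The operating system caps how many buffers one call may carry (IOV_MAX, commonly 1024), so a very long list would need to be sent in batches.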
Re: [PLEASE-TESTME] Zerocopy networking patch, 2.4.0-1
On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:

> > i used to think that this is useful, but these days it isn't. It's a waste
> > of PCI bandwidth resources, and it's much cheaper to keep a cache in RAM
> > instead of doing direct disk=>network DMA *all the time* some resource is
> > requested.
>
> No. I'm certain you're right when talking about things like web
> serving, [...]

yep, i was concentrating on fileserving load.

> but it just doesn't apply when you look at some other applications,
> such as streaming out video data or performing fileserving in a
> high-performance compute cluster where you are serving bulk data.
> The multimedia and HPC worlds typically operate on datasets which are
> far too large to cache, so you want to keep them in memory as little
> as possible when you ship them over the wire.

i'd love to first see these kinds of applications (under Linux) before
designing for them. Eg. if an IO operation (eg. streaming video webcast)
does a DMA from a camera card to an outgoing networking card, would it be
possible to access the packet data in case of a TCP retransmit? Basically
these applications are limited enough in scope to justify even temporary
'hacks' that enable them - and once we *see* things in action, we could
design for them. Not the other way around.

 Ingo
Re: VM subsystem bug in 2.4.0 ?
Hi,

On Tue, Jan 09, 2001 at 03:53:55PM +0100, Christoph Rohland wrote:
>
> On Tue, 9 Jan 2001, Stephen C. Tweedie wrote:
> > But again, how do you clear the bit? Locking is a per-vma property,
> > not per-page. I can mmap a file twice and mlock just one of the
> > mappings. If you get a munlock(), how are you to know how many
> > other locked mappings still exist?
>
> It's worse: The issue we are talking about is SYSV IPC_LOCK.

The issue is locked VA pages. SysV is just one of the ways in which
it can happen: the solution has got to address both that and
mlock()/mlockall().

> This is a
> per segment thing. A user can (un)lock a segment at any time. But we
> do not have the references to the vmas attached to the segments

Why not? Won't the address space mmap* lists give you this?

--Stephen
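[Editor's sketch] The question "how are you to know how many other locked mappings still exist?" is easier to see with a toy model. Below, a page stays pinned for as long as at least one locked mapping covers it, which a single per-page bit cannot express. This is purely an illustration of the problem: the class `LockedPageTracker` and its counting scheme are assumptions for the example, not something proposed in the thread and not how the 2.4 VM tracks locked pages.

```python
from collections import Counter


class LockedPageTracker:
    """Toy model of per-page lock state across several mappings."""

    def __init__(self):
        # page number -> number of locked mappings currently covering it
        self._locks = Counter()

    def mlock(self, pages):
        """Record that one more locked mapping covers each page."""
        for page in pages:
            self._locks[page] += 1

    def munlock(self, pages):
        """Record that one locked mapping over these pages went away."""
        for page in pages:
            if self._locks[page] <= 0:
                raise ValueError(f"page {page} is not locked")
            self._locks[page] -= 1
            if self._locks[page] == 0:
                del self._locks[page]

    def is_pinned(self, page) -> bool:
        """A page may be evicted only when no locked mapping remains."""
        return self._locks[page] > 0
```

With a single bit, the first munlock() would clear the flag even though a second locked mapping of the same file is still in place; the count (or a walk over the mappings, as Stephen suggests) is what distinguishes the two cases.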
Re: FS callback routines
Jesse Pollard wrote:
> Daniel Phillips <[EMAIL PROTECTED]>:
> > This may be the most significant new feature in 2.4.0, as it allows us
> > to take a fundamentally different approach to many different problems.
> > Three that come to mind: mail (get your mail instantly without polling);
> > make (don't rely on timestamps to know when rebuilding is needed, don't
> > scan huge directory trees on each build); locate (reindex only those
> > directories that have changed, keep index database current). As you
> > noticed, there are many others.
>
> ...
>
> It would also be very nice if the security of the feature could be
> confirmed. The problem with SGI's implementation is that it becomes
> possible to monitor files that you don't own, don't have access to,
> or are not permitted to know even exist.

To receive notification about events in a given directory you have to be
able to open it. Is this adequate for your needs?

> For these reasons, we have disabled the feature.

It's nice to have that option, isn't it? ;-)

--
Daniel
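[Editor's sketch] Assuming the feature under discussion is the directory notification interface reached through `fcntl(fd, F_NOTIFY, ...)`, the access rule Daniel describes falls out of the API shape: a watch is registered on an open directory descriptor, so a caller who cannot open the directory cannot watch it. The sketch below is Linux-only and hedged accordingly. The names `watch_directory`, `events` and `_on_change` are made up for the example, and registration may fail on kernels or sandboxes without dnotify support, in which case the function returns None.

```python
import fcntl
import os
import signal

events = []  # signal numbers received, appended by the handler


def _on_change(signum, frame):
    events.append(signum)


def watch_directory(path: str):
    """Request one SIGIO when an entry in `path` is created, deleted or modified.

    Returns the open directory descriptor (the watch lasts only while it
    stays open), or None if directory notification is unavailable.
    """
    # This open is the permission check: it raises PermissionError if the
    # caller is not allowed to read the directory.
    fd = os.open(path, os.O_RDONLY)
    try:
        mask = fcntl.DN_CREATE | fcntl.DN_DELETE | fcntl.DN_MODIFY
        fcntl.fcntl(fd, fcntl.F_NOTIFY, mask)  # one-shot without DN_MULTISHOT
    except (AttributeError, OSError):
        os.close(fd)
        return None
    return fd


# SIGIO is the default signal for directory notifications.
signal.signal(signal.SIGIO, _on_change)
```

A mail client in the spirit of Daniel's first example would call `watch_directory()` on the spool directory and re-register after each notification, instead of polling.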