Re: test10-pre7
Ok, how about this approach? It only works for the case where we do not have the kind of multiple stuff that drivers/net has, but hey, we don't actually need to handle all the cases right now. We can leave that for the future, as the configuration process is likely to change anyway during 2.5.x, and the multiple object case may go away entirely (ie the case of slhc and 8390 will become just a normal configuration dependency: you'd have a "CONFIG_SLHC" entry that is computed by the dependency graph at configuration time, rather than by the Makefile at build time).

This is the simplest rule base that I could come up with that should work for both SCSI and USB:

	# Translate to Rules.make lists.

	multi-used	:= $(filter $(list-multi), $(obj-y) $(obj-m))
	multi-objs	:= $(foreach m, $(multi-used), $($(basename $(m))-objs))
	active-objs	:= $(sort $(multi-objs) $(obj-y) $(obj-m))

	O_OBJS		:= $(obj-y)
	M_OBJS		:= $(obj-m)
	MIX_OBJS	:= $(filter $(export-objs), $(active-objs))

Does anybody see any problems with it? Basically, we're sidestepping the sorting, because neither SCSI nor USB need it. Making the problem simpler is always good.

Now, the above won't work for drivers/net, but I think it will work for just about anything else. So let's just leave drivers/net alone for now. Simplicity is good.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Linux-2.4.0-test10
Ok, test10-final is out there now. This has no _known_ bugs that I consider show-stoppers, for what it's worth. And when I don't know of a bug, it doesn't exist. Let us rejoice. In traditional kernel naming tradition, this kernel hereby gets anointed as one of the "greased weasel" kernel series, one of the final steps in a stable release.

We're still waiting for the Vatican to officially canonize this kernel, but trust me, that's only a matter of time. It's a little known fact, but the Pope likes penguins too.

		Linus
Re: Linux-2.4.0-test10
On Tue, 31 Oct 2000, Rik van Riel wrote:
> On Tue, 31 Oct 2000, Linus Torvalds wrote:
> > Ok, test10-final is out there now. This has no _known_ bugs that I consider show-stoppers, for what it's worth. And when I don't know of a bug, it doesn't exist. Let us rejoice. In traditional kernel naming tradition, this kernel hereby gets anointed as one of the "greased weasel" kernel series, one of the final steps in a stable release.
>
> Well, there's the thing with RAW IO being done into a process' address space and the data arriving only after the page gets unmapped from the process.

Yes. But that doesn't count like a "show-stopper" for me, simply because it's one of those small details that are known, and never materialize under normal load.

Yes, it will have to be fixed before anybody starts doing RAW IO in a major way. And I bet it will be fixed. But it's not on my list of "I cannot release a 2.4.0 before this is done" - even if I think it will actually be fixed for the common case before that anyway.

(Note: I suspect that we may just have to accept the fact that due to NFS etc issues, RAW IO into a shared mapping might not really be supported at all. I don't think any raw IO user uses it that way anyway, so I think the big and worrisome case is actually only the swap-out case).

> > We're still waiting for the Vatican to officially canonize this kernel, but trust me, that's only a matter of time. It's a little known fact, but the Pope likes penguins too.
>
> Lets just hope he doesn't need RAW IO ;)

Naah, he mainly just does some browsing with netscape, and (don't tell a soul) plays QuakeIII with the door locked.

		Linus
Re: test10-pre7
On Tue, 31 Oct 2000, Russell King wrote:
> Linus Torvalds writes:
> > On Wed, 1 Nov 2000, Keith Owens wrote:
> > > LINK_FIRST is processed in the order it is specified, so a.o will be linked before z.o when both are present. See the patch.
> >
> > So why don't you do the same thing for obj-y, then?
> >
> > Why can't you do LINK_FIRST=$(obj-y) and be done with it?
>
> Hmm, so why don't we just call it obj-y and be done with it? ;)

That was going to be my next question if somebody actually said "sure".

The question was rhetorical, since the way LINK_FIRST is implemented means that it has all the same problems that $(obj-y) has, and is hard to get right in the generic case (but you can get it trivially right for the subset case, like for USB).

		Linus
Re: Linux-2.4.0-test10
On Tue, 31 Oct 2000, Miles Lane wrote:
> Were there no changes between test10-pre7 and test10? I notice you didn't send out a Changelist. The Changelists help me focus my testing.

Sorry. Here it is..

		Linus

-----

 - final:
    - Jeff Garzik: ISA network driver cleanup, wrapper.h fixes, 8139too update, etc
    - Mike Coleman: fix TracerPid in /proc/n/status
    - Thomas Molina: mark NAT packet drop message KERN_DEBUG
    - Marcelo Tosatti: nbd should use GFP_BUFFER, not GFP_ATOMIC
    - Steve Pratt: TLB flush order fix
    - David Miller: network and sparc updates
    - Alan Cox: various details (NULL ptr checks in SCSI etc)
    - Daniel Roesen: pretty up microcode revision printouts
    - Mike Coleman: fix ptrace ambiguity issues
    - Paul Mackerras: make yenta work even in the absence of ISA irqs
    - me: make USB Makefile do the right thing for export-objs.
    - Randy Dunlap, USB: fix race conditions, usb enumeration etc.

 - pre7:
    - Niels Jensen: remove no-longer-needed workarounds for old gcc versions
    - Ingo Molnar & Rik v Riel: VM inactive list maintenance correction
    - Randy Dunlap, USB: printer.c, usb-storage, usb identification and memory leak fixes
    - David Miller: networking updates
    - David Mosberger: add AT_CLKTCK to elf information. And make AT_PAGESZ work for static binaries too.
    - oops. pcmcia broke by mistake
    - Me: truncate vs page access race fix.

 - pre6:
    - Jeremy Fitzhardinge: autofs4 expiry fix
    - David Miller: sparc driver updates, networking updates
    - Mathieu Chouquet-Stringer: buffer overflow in sg_proc_dressz_write
    - Ingo Molnar: wakeup race fix (admittedly the window was basically non-existent, but still..)
    - Rasmus Andersen: notice that "this_slice" is no longer used for scheduling - delete the code that calculates it.
    - ALI pirq routing update. It's even uglier than we initially thought..
    - Dimitrios Michailidis: fix ipip locking bugs
    - Various: face it - gcc-2.7.2.3 miscompiles structure initializers.
    - Paul Cassella: locking comments on dev_base
    - Trond Myklebust: NFS locking atomicity. refresh inode properly.
    - Andre Hedrick: Serverworks Chipset driver, IDE-tape fix
    - Paul Gortmaker: kill unused code from 8390 support.
    - Andrea Arcangeli: fix nfsv3d wrong truncates over 4G
    - Maciej W. Rozycki: PIIX4 needs the same USB quirk handling as PIIX3.
    - me: if we cannot figure out the PCI bridge windows, just "inherit" the window from the parent. Better than not booting.
    - Ching-Ling Lee: ALI 5451 Audio core support update

 - pre5:
    - Mikael Pettersson: more Pentium IV cleanup.
    - David Miller: non-x86 platforms missed "pte_same()".
    - Russell King: NFS invalidate_inode_pages() can do bad things!
    - Randy Dunlap: usb-core.c is gone - module fix
    - Ben LaHaise: swapcache fixups for the new atomic pte update code
    - Oleg Drokin: fix nm256_audio memory region confusion
    - Randy Dunlap: USB printer fixes
    - David Miller: sparc updates
    - David Miller: off-by-one error in /proc socket dumper
    - David Miller: restore non-local bind() behaviour.
    - David Miller: wakeups on socket shutdown()
    - Jeff Garzik: DEPCA net drvr fixes and CodingStyle
    - Jeff Garzik: netsemi net drvr fix
    - Jeff Garzik & Andrea Arcangeli: keyboard cleanup
    - Jeff Garzik: VIA audio update
    - Andrea Arcangeli: mxcsr initialization cleanup and fix
    - Gabriel Paubert: better twd_i387_to_fxsr() emulation
    - Andries Brouwer: proper error return in ext2 mkdir()

 - pre4:
    - disable writing to /proc/xxx/mem. Sure, it works now, but it's still a security risk.
    - IDE driver update (Victory66 SouthBridge support)
    - i810 rng driver cleanup
    - fix sbus Makefile
    - named initializers in module..
    - pppoe: remove explicit initializer - it's done with initcalls.
    - x86 WP bit detection: do it cleanly with exception handling
    - Arnaldo Carvalho de Melo: memory leaks in drivers/media/video
    - Bartlomiej Zolnierkiewicz: video init functions get __init
    - David Miller: get rid of net/protocols.c - they get to initialize themselves
    - David Miller: get rid of dev_mc_lock - we hold dev->xmit_lock anyway.
    - Geert Uytterhoeven: Zorro (Amiga) bus support update
    - David Miller: work around gcc-2.7.2 bug
    - Geert Uytterhoeven: mark struct consw's "const".
    - Jeff Garzik: network driver cleanups, ns558 joystick driver oops fix
    - Tigran Aivazian: clean up __alloc_pages(), kill_super() and notify_change()
    - Tigran Aivazian: move stuff from .data to .bss
    - Jeff Garzik: divert.h typename cleanups
    - James Simmons: mdacon using spinlocks
    - Tigran Aivazian: fix BFS free block calculation
    - David Miller: sparc32 works again
    - Bernd Schmidt: fix undefined C code (set/use without a sequence point)
    - Mikael Pettersson: nicer Pentium IV setup handling.
    - Georg
Re: Poll and OSS API
On Thu, 2 Nov 2000, Thomas Sailer wrote:
> The OSS API (http://www.opensound.com/pguide/oss.pdf, page 102ff) specifies that a select _with the sounddriver's filedescriptor set in the read mask_ should start the recording.

So fix the stupid API.

The above is just idiocy.

		Linus
Re: [PATCH] Re: Negative scalability by removal of lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
In article [EMAIL PROTECTED], Andrew Morton [EMAIL PROTECTED] wrote:
> neither flock() nor fcntl() serialisation are effective on linux 2.2 or linux 2.4. This is because the file locking code still wakes up _all_ waiters. In my testing with fcntl serialisation I have seen a single Apache instance get woken and put back to sleep 1,500 times before the poor thing actually got to service a request.

Indeed.

flock() is the absolute worst case, and always has been. I guess nobody ever actually bothered to benchmark it.

> For kernel 2.2 I recommend that Apache consider using sysv semaphores for serialisation. They use wake-one.
>
> For kernel 2.4 I recommend that Apache use unserialised accept.

No.

Please use unserialized accept() _always_, because we can fix that.

Even 2.2.x can be fixed to do the wake-one for accept(), if required. It's not going to be any worse than the current apache config, and basically the less games apache plays, the better the kernel can try to accommodate what apache _really_ wants done. When playing games, you hide what you really want done, and suddenly kernel profiles etc end up being completely useless, because they no longer give the data we needed to fix the problem.

Basically, the whole serialization crap is all about the Apache people saying the equivalent of "the OS does a bad job on something we consider to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky fcntl() thing or using SysV semaphores, it's going to hide the real issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance.

Instead, if apache had just done the thing it wanted to do in the first place, the wake-one accept() semantics would have happened a hell of a lot earlier.

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play games trying to outsmart the OS, it will just hurt Apache in the long run.

		Linus
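[Editorial illustration, not part of the original message.] The cost difference between wake-all and wake-one described above can be sketched with a toy counting model. It is not kernel code, and the function names `count_wakeups` and `wasted_wakeups` are made up for this sketch; it only assumes that each incoming connection can be serviced by exactly one sleeping process.

```c
#include <assert.h>

/*
 * Toy model, not kernel code: n_waiters processes sleep waiting for
 * connections, and n_events connections arrive one at a time.
 * Returns the total number of process wakeups used to service them.
 */
long count_wakeups(int n_waiters, int n_events, int wake_one)
{
    long wakeups = 0;

    assert(n_waiters > 0 && n_events >= 0);
    for (int ev = 0; ev < n_events; ev++) {
        if (wake_one)
            wakeups += 1;          /* one sleeper is woken and takes the connection */
        else
            wakeups += n_waiters;  /* everybody is woken; one wins, the rest sleep again */
    }
    return wakeups;
}

/* Wakeups that did no useful work (woken, then put straight back to sleep). */
long wasted_wakeups(int n_waiters, int n_events, int wake_one)
{
    return count_wakeups(n_waiters, n_events, wake_one) - n_events;
}
```

With 100 sleeping servers and 10 connections, the wake-all variant of this model spends 1000 wakeups where wake-one spends 10, which is the shape of the behaviour reported for the file-locking case.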
Re: Poll and OSS API
On Sat, 4 Nov 2000, Jeff Garzik wrote:
> > So fix the stupid API.
> >
> > The above is just idiocy.
>
> We're pretty much stuck with the API, until we look at merging ALSA in 2.5.x. Broken API or not, OSS is a mature API, and there are spec-correct apps that depend on this behavior.

Considering that about 100% of the sound drivers do not follow that particular API damage anyway (they can't, as has been pointed out: the driver doesn't even receive enough information to be _able_ to follow the documented API), I doubt that there are all that many programs that depend on it.

Yes, some drivers apparently _try_ to follow the spec to some degree, but we should just change the documentation asap.

		Linus
Re: [PATCH] Re: Negative scalability by removal of
On Sat, 4 Nov 2000, Alan Cox wrote:
> > Even 2.2.x can be fixed to do the wake-one for accept(), if required.
>
> Do we really want to retrofit wake_one to 2.2. I know Im not terribly keen to try and backport all the mechanism. I think for 2.2 using the semaphore is a good approach. Its a hack to fix an old OS kernel. For 2.4 its not needed

We don't need to backport the full exclusive wait queues: we could do the equivalent of the semaphore inside the kernel around just accept().

It wouldn't be a generic thing, but it would fix the specific case of accept().

Otherwise we're going to have old binaries of apache lying around forever that do the wrong thing..

		Linus
Re: [PATCH] Re: Negative scalability by removal of
On Tue, 7 Nov 2000, Andrew Morton wrote:
> Alan Cox wrote:
> > > Even 2.2.x can be fixed to do the wake-one for accept(), if required.
> >
> > Do we really want to retrofit wake_one to 2.2. I know Im not terribly keen to try and backport all the mechanism. I think for 2.2 using the semaphore is a good approach. Its a hack to fix an old OS kernel. For 2.4 its not needed
>
> It's a 16-liner! I'll cheerfully admit that this patch may be completely broken, but hey, it's free. I suggest that _something_ has to be done for 2.2 now, because Apache has switched to unserialised accept().

This is why I'd love to _not_ see silly work-arounds in apache: we obviously _can_ fix the places where our performance sucks, but only if we don't have other band-aids hiding the true issues.

For example, with a file-locking apache, we'd have to fix the (noticeably harder) file locking thing to be wake-one instead, and even then we'd never be able to do as well as something that gets the same wake-one thing without the two extra system calls.

The patch looks superficially fine to me, although it does seem to add another cache-line to the wakeup setup - it might be worth-while to have the exclusive state closer. But maybe I just didn't count right.

		Linus
test11-pre1
Mostly driver updates. With a few notable exceptions: two rather subtle MM race conditions that happened with SMP and highmem respectively. And the FXCSR and file locking that was already discussed on the list.

		Linus

-----

 - pre1:
    - me: make PCMCIA work even in the absence of PCI irq's
    - me: add irq mapping capabilities for Cyrix southbridges
    - me: make IBMMCA compile right as a module
    - me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
    - Andrea Arcangeli: don't allow people to set security-conscious bits in mxcsr through ptrace SETFPXREGS.
    - Jürgen Fischer: aha152x update
    - Andrew Morton, Trond Myklebust: file locking fixes
    - me: TLB invalidate race with highmem
    - Paul Fulghum: synclink/n_hdlc driver updates
    - David Miller: export sysctl_jiffies, and have the proper no-sysctl version handy
    - Neil Brown: RAID driver deadlock and nfsd read access to execute-only files fix
    - Keith Owens: clean up module information passing, remove "get_module_symbol()".
    - Jeff Garzik: network (and other) driver fixes and cleanups
    - Andrea Arcangeli: scheduler cleanup.
    - Ching-Ling Li: fix ALi sound driver memory leak
    - Anton Altaparmakov: upcase fix for NTFS
    - Thomas Woller: CS4281 audio update
Re: [PATCH] deadlock fix
On Tue, 7 Nov 2000, Gary E. Miller wrote:
> I see this patch did not make it into test11-pre1. Without it raid1 and SMP do not work together. Please consider for test11-pre2.

You must have a different test11-pre1 than the one I have.

It's already there in -pre1, as far as I can see.

		Linus
Re: [RANT] Linux-IrDA status
On Wed, 8 Nov 2000, Michael Rothwell wrote:
> Linus Torvalds wrote:
> > Also, I've never seen much in the form of explanation, and at least the last patch I saw just the first screenful was so off-putting that I just went "Ok, I have real bugs to fix, I don't need this crap".
>
> Like what? I'm not sure what you're saying here. It seems that the people writing the IrDA code have gotten no feedback from you as to why their patch is never accepted -- could you clarify?

There's one _major_ reason why things never get accepted:

	CVS trees

I'm not fed patches. I'm force-fed big changes every once in a while. I don't like it.

I like it even less when the very first screen of a patch is basically a stupid change that implies that somebody calls ioctl's from interrupts.

When I get a big patch like that, where the very first screen is bletcherous, what the hell am I supposed to do? I'm not going to waste my time on people who cannot send multiple small and well-defined patches, and who send me big, ugly, "non-maintained" (as far as I'm concerned) patches.

I'm surprised Alan rants about this. He knows VERY well how I work, and is (along with Jeff Garzik and Randy Dunlap) one of the people who are very good at sending me 25 separate patches with explanations of what they do.

Basically, if you send me a big patch with tons of changes, how the hell DO you expect me to answer them? Does anybody really expect me to go through ten thousand lines of code that I do not know, and comment on it?

Obviously not, as anybody with an ounce of sense would see.

So what choice do I have? Apply them blindly?

Quite frankly, I'd rather have a few people hate me deeply than apply stuff I don't like. If I just start blindly applying big patches, I can avoid nasty discussions. But I'd rather have people flame me.

Maybe some day people will instead start sending me smaller commented patches.

I'm NOT going to do other people's work for them. If people can't be bothered to send me well-specified patches ESPECIALLY now that we're close to 2.4.x, then I can't be bothered to apply them.

Live with it. Hate me all you like. I do not care. The thing I care about is not letting too much crap through unchecked.

		Linus
Re: [RANT] Linux-IrDA status
On Wed, 8 Nov 2000, Michael Rothwell wrote:
> Like what? I'm not sure what you're saying here. It seems that the people writing the IrDA code have gotten no feedback from you as to why their patch is never accepted -- could you clarify?

Just to clarify.

The ONLY message from the IrDA people I've gotten during the last few weeks has been a SINGLE email from Dag Brattli, with a 330kB patch.

The whole, full, unabridged explanation for those 330kB of patches:

	Hello Linus,

	Here is the latest IrDA patch for Linux-2.4.0-test10.

	Short summary:

	o Fixes IrDA in 2.4
	o Touches _no_ other files.

	Please apply!

	Best regards

	Dag Brattli

That's it.

ONE message during the last month. ONE huge patch. From people who should have known about 2.4.x being pending for some time.

10,000+ lines of diff, with _no_ effort to split it up, or explain it with anything but "o Fixes IrDA in 2.4" and these people expect me to reply, sending long explanations of why I don't like them? After they did nothing of the sort for the code they claim should have been applied? Nada.

Get a grip.

		Linus
Re: Pentium 4 and 2.4/2.5
In article [EMAIL PROTECTED], Alan Cox [EMAIL PROTECTED] wrote:
> Be careful with the intel patches. The ones I've seen so far tried to call the cpu 'if86' breaking several tools that do cpu model checking off uname. They didnt fix the 2GHz CPU limit, they use 'rep nop' in the locks which is explicitly 'undefined behaviour' for non intel processors and they use the TSC without checking it had one.

"rep nop" is definitely not undefined behaviour except in some older Intel manuals.

Do you actually know of a CPU where it doesn't work? Every single intel-compatible CPU I know of has the rep prefixes as no-ops if they aren't used (lock - ILL being a later, documented, addition), and the way the prefixes work it almost has to be that way.

As prefixes they can't be part of the instruction, because you can legally have other prefixes in between the rep and the real instruction, which means that any sane implementation will just set a flag when it sees the prefix, and an instruction that doesn't care will just ignore the flag.

So you'd almost have to do _extra_ work to make "rep nop" fail, even if it used to be specified as "undefined".

Standard 2.4.x will definitely be using "rep nop" unless somebody can show me a CPU where it doesn't work (and even then I probably won't care unless that CPU is also SMP-capable). It's documented by intel these days, and it works on all CPU's I've ever heard of, and it even makes sense to me (*).

(*) Well.. More sense than _some_ instruction set extensions I've seen. After all, "repeat no-op" for a longer delay sounds almost logical. Certainly better than that IV == 15 thing, ugh ;)

Also, at least part of the reason Intel removed the TSC check was that Linux actually seems to get the extended CPU capability flags wrong, overwriting the _real_ capability flags which in turn caused the TSC check on Linux to simply not work. Peter Anvin is working on fixing this. I suspect that Linux-2.2 has the same problem.

There's a few other minor details that need to be fixed for Pentium 4 features (aka "not very well documented errata"), and I think I have them all except for waiting for Peter to get the capabilities flag handling right.

So I suspect that we'll have good support for Pentium IV soon enough..

		Linus
Re: Pentium 4 and 2.4/2.5
In article [EMAIL PROTECTED], Alan Cox [EMAIL PROTECTED] wrote:
> rep;nop is a magic instruction on the PIV and possibly some PIII series CPUs [not sure]. As far as I can make out it naps momentarily or until bus activity thus saving power on spinlocks.

From what I've heard, the reason Intel _really_ wants "rep nop" is that without it the CPU will heat up quite efficiently (that's what you do when you want to run at an eventual 2GHz with all cylinders firing all the time), causing thermal meltdown on non-thermally protected CPU's and CPU speed throttling on the ones that _are_ thermally protected (which will obviously have to be all the shipping ones).

And the thermal throttling will severely cripple performance.

> The problem is 'rep nop' is not defined on other cpus so we can only really use it on the PIII/PIV kernel builds

Intel retroactively defined it for all their CPU's. And I very strongly suspect that every single other x86 CPU vendor does the same. Why not? They get a new instruction for free, by just documenting it. Maybe they can sell the same old chip with a new name ("The X Wonderchip. Now with documented 'rep nop' support! Get one today!").

		Linus
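[Editorial illustration, not part of the original message.] A minimal userspace sketch of where "rep nop" sits in a spin-wait loop. This is not the kernel's spinlock; the names `toy_spinlock` and `cpu_relax_hint` are hypothetical, C11 atomics stand in for the kernel's own primitives, and the inline assembly assumes a GCC-compatible compiler (on non-x86 targets the hint compiles to nothing).

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical helper: emits "rep; nop" on x86, nothing elsewhere. */
static inline void cpu_relax_hint(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

typedef struct { atomic_flag locked; } toy_spinlock;

void toy_spin_init(toy_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
}

/* Returns 1 if the lock was taken, 0 if somebody already holds it. */
int toy_spin_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

void toy_spin_lock(toy_spinlock *l)
{
    /* Busy-wait, telling the CPU on every failed attempt that we are spinning. */
    while (!toy_spin_trylock(l))
        cpu_relax_hint();
}

void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The hint changes nothing about correctness: the loop behaves the same with or without it, which is why an implementation that simply ignores the prefix is still fine.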
Re: Pentium 4 and 2.4/2.5
On Wed, 8 Nov 2000, Alan Cox wrote:
> > unless that CPU is also SMP-capable). It's documented by intel these days, and it works on all CPU's I've ever heard of, and it even makes sense to me (*).
>
> Do the intel docs guarantee it works on i486 and higher, if so SMP athlon will be the only check needed for the SMP users. You work for an x86 chip cloning company so if you say it works I trust you 8)

Well, we don't make low-power SMP laptops, so as such Transmeta doesn't much care. It will work, though.

And yes, as far as I know Intel made it an "architecture feature", meaning that they claim it works on all their ia32 chips.

Now, I could imagine that Intel would select an instruction that didn't work on Athlon on purpose, but I really don't think they did. I don't have an athlon to test.

It's easy enough to generate a test-program. If the following works, you're pretty much guaranteed that it's ok

	#include <stdio.h>

	int main()
	{
		printf("Testing 'rep nop' ... ");
		asm volatile("rep ; nop");
		printf("okey-dokey\n");
		return 0;
	}

(there's not much a "rep nop" _can_ do, after all - the most likely CPU extension would be to raise an "Illegal Opcode" fault).

> > Also, at least part of the reason Intel removed the TSC check was that Linux actually seems to get the extended CPU capability flags wrong, overwriting the _real_ capability flags which in turn caused the TSC check on Linux to simply not work. Peter Anvin is working on fixing this. I suspect that Linux-2.2 has the same problem.
>
> I've not seen incorrect TSC detection in 2.2, do you know the precise circumstances this occurs and I'll check over them. I've also got no bug reports of this failing.

It won't fail on other CPU's.

The bug is, as far as I can tell, in get_model_name(),

	cpuid(0x80000001, &dummy, &dummy, &dummy, &(c->x86_capability));

Notice how we overwrite the x86_capability state with whatever we read from the extended register 0x80000001. So we overwrite the _real_ capabilities that we got the right way in head.S.

This is wrong. It just happens to work on other, non-Pentium IV, processors. The extended capabilities are an _extension_, not replacement, for the regular capabilities.

> check_config would also panic with the 'Kernel compiled for ..' message if it occurred.

Which is what it apparently does, if you compile for TSC. Even though very obviously a Pentium IV _does_ have a TSC.

NOTE! I don't actually have access to a Pentium IV myself yet, although I'm promised one soon enough. So I've only got second-hand reports on the cpuid thing so far.

		Linus
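[Editorial illustration, not part of the original message.] The "extension, not replacement" point can be sketched by keeping the standard and extended capability words in separate fields. This is not the actual kernel fix: `struct toy_cpuinfo`, `toy_read_capabilities` and the fake CPUID reader are hypothetical, and the values the fake returns are made up. The only hardware fact assumed is that the TSC flag is bit 4 of EDX for CPUID leaf 1.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURE_TSC (1u << 4)   /* TSC is bit 4 of CPUID leaf 1, EDX */

/* Hypothetical record: standard and extended flags live in separate words. */
struct toy_cpuinfo {
    uint32_t capability;      /* from CPUID leaf 0x00000001, EDX */
    uint32_t ext_capability;  /* from CPUID leaf 0x80000001, EDX */
};

/* Signature of a CPUID reader, so a fake can stand in for real hardware. */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx);

void toy_read_capabilities(struct toy_cpuinfo *c, cpuid_fn cpuid)
{
    uint32_t eax, ebx, ecx, edx;

    assert(c && cpuid);
    cpuid(0x00000001u, &eax, &ebx, &ecx, &edx);
    c->capability = edx;

    cpuid(0x80000000u, &eax, &ebx, &ecx, &edx);  /* EAX = highest extended leaf */
    c->ext_capability = 0;
    if (eax >= 0x80000001u) {
        cpuid(0x80000001u, &eax, &ebx, &ecx, &edx);
        c->ext_capability = edx;  /* stored on the side, nothing is overwritten */
    }
}

int toy_has_tsc(const struct toy_cpuinfo *c)
{
    return (c->capability & FEATURE_TSC) != 0;
}

/* Fake CPU for illustration: has a TSC, but its extended leaf reports 0. */
void fake_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                uint32_t *ecx, uint32_t *edx)
{
    *eax = *ebx = *ecx = *edx = 0;
    if (leaf == 0x00000001u)
        *edx = FEATURE_TSC;
    else if (leaf == 0x80000000u)
        *eax = 0x80000001u;
}
```

Calling `toy_read_capabilities(&c, fake_cpuid)` leaves `toy_has_tsc(&c)` true; had the extended word been written into `capability` instead, the TSC bit would have been lost for this fake CPU.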
Re: PATCH: rd - deadlock removal
On Thu, 9 Nov 2000, Jens Axboe wrote:
> > The second is more elegant in that it side steps the problem by giving rd.c a make_request function instead of using the default _make_request. This means that io_request_lock is simply never claimed by rd.
>
> And this solution is much better, even given the freeze I think that is the way to go.

I agree, I already applied it. The second approach just makes the problem go away, and also avoids needlessly merging the request etc.

I suspect that the lack of request-merging could also eventually be used to simplify the driver a bit, as it now wouldn't need to worry about that issue any more at all.

		Linus
Re: [BUG] /proc/pid/stat access stalls badly for swapping process,2.4.0-test10
As to the real reason for stalls on /proc/pid/stat, I bet it has nothing to do with IO except indirectly (the IO is necessary to trigger the problem, but the _reason_ for the problem lies elsewhere).

And it has everything to do with the fact that the way Linux semaphores are implemented, a non-blocking process has a HUGE advantage over a blocking one. Linux kernel semaphores are extremely unfair in that way.

What happens is that some process is getting a lot of VM faults and gets its VM semaphore. No contention yet. It holds the semaphore over the IO, and now another process does a "ps".

The "ps" process goes to sleep on the semaphore. So far so good.

The original process releases the semaphore, which increments the count, and wakes up the process waiting for it. Note that it _wakes_ it, it does not give the semaphore to it. Big difference.

The process that got woken up will run eventually. Probably not all that immediately, because the process that woke it (and held the semaphore) just slept on a page fault too, so it's not likely to immediately relinquish the CPU.

The original running process comes back faulting again, finds the semaphore still unlocked (the "ps" process is awake but has not gotten to run yet), gets the semaphore, and falls asleep on the IO for the next page.

The "ps" process actually gets to run now, but it's a bit late. The semaphore is locked again.

Repeat until luck breaks the bad circle.

(This scenario, btw, is much harder to trigger on SMP than on UP. And it's completely separate from the issue of simple disk bandwidth issues which can obviously cause no end of stalls on anything that needs the disk, and which can also happen on SMP).

NOTE! If somebody wants to fix this, the fix should be reasonably simple but needs to be quite exhaustively checked and double-checked. It's just too easy to break the semaphores by mistake.

The way to make semaphores more fair is to NOT allow a new process to just come in immediately and steal the semaphore in __down() if there are other sleepers. This is most easily accomplished by something along the lines of the following in __down() in arch/i386/kernel/semaphore.c

	 	spin_lock_irq(&semaphore_lock);
	 	sem->sleepers++;
	+
	+	/*
	+	 * Are there other people waiting for this?
	+	 * They get to go first.
	+	 */
	+	if (sem->sleepers > 1)
	+		goto inside;
	 	for (;;) {
	 		int sleepers = sem->sleepers;

	 		/*
	 		 * Add "everybody else" into it. They aren't
	 		 * playing, because we own the spinlock.
	 		 */
	 		if (!atomic_add_negative(sleepers - 1, &sem->count)) {
	 			sem->sleepers = 0;
	 			break;
	 		}
	 		sem->sleepers = 1;	/* us - see -1 above */
	+inside:
	 		spin_unlock_irq(&semaphore_lock);

	 		schedule();
	 		tsk->state = TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE;
	 		spin_lock_irq(&semaphore_lock);
	 	}
	 	spin_unlock_irq(&semaphore_lock);

But note that the above is UNTESTED and also note that from a throughput (as opposed to latency) standpoint being unfair tends to be nice.

Anybody want to try out something like the above? (And no, I'm not applying it to my tree yet. It needs about a hundred pairs of eyes to verify that there isn't some subtle "lost wakeup" race somewhere).

		Linus
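[Editorial illustration, not part of the original message.] The starvation cycle described above can be sketched as a toy uniprocessor model. It is not the kernel's semaphore code and says nothing about lost-wakeup races; `rounds_until_waiter_runs` is a made-up name, and the model assumes the "hog" always faults again before the woken waiter gets the CPU.

```c
#include <assert.h>

/*
 * Toy model: a "hog" task takes the semaphore, sleeps on IO, releases
 * it (which only *wakes* the waiter), and then faults again before the
 * waiter has run. Returns the round in which the waiter finally gets
 * the semaphore, or -1 if it is still starved after max_rounds.
 */
int rounds_until_waiter_runs(int fair, int max_rounds)
{
    const int sleepers = 1;   /* the "ps" task, asleep on the semaphore */

    assert(max_rounds > 0);
    for (int round = 1; round <= max_rounds; round++) {
        /*
         * The hog calls down() again. Under the unfair rule it finds the
         * semaphore free and takes it. Under the fair rule it sees that
         * there are sleepers and queues up behind them instead.
         */
        int hog_retakes = !(fair && sleepers > 0);

        /* The hog now sleeps (on IO, or on the semaphore); the waiter runs. */
        if (!hog_retakes)
            return round;     /* the semaphore is still free: the waiter gets it */
        /* Otherwise the waiter finds it locked and goes back to sleep. */
    }
    return -1;
}
```

In this model the unfair rule starves the waiter for as long as the hog keeps faulting, while the "sleepers go first" rule lets it in on the first round. Real systems break the cycle by luck, as the message says; the model leaves luck out on purpose.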
Re: [bug] usb-uhci locks up on boot half the time
In article [EMAIL PROTECTED], David Ford [EMAIL PROTECTED] wrote:
> The oddity is that kdb shows the machine to lock up on the popf in pci_conf_write_word()+0x2c. I never did get around to digging up this routine and looking at the code, but I suspect this is a final return from the routine. I'm rather confused however, I have no idea why a flags pop would hang the hardware.

Educated guess: it enables interrupts, after it has done something to the hardware that causes an infinite stream of them.

		Linus
test11-pre2
Nothing stands out as affecting most people here. Security fix for /proc, and various cleanups. Alpha and sparc fixes.

If you use RAID or ramdisk, upgrade.

		Linus

-----

 - pre2:
    - Stephen Rothwell: directory notify could return with the lock held
    - Richard Henderson: CLOCKS_PER_SEC on alpha.
    - Jeff Garzik: ramfs and highmem: kmap() the page to clear it
    - Asit Mallick: enable the APIC in the official order
    - Neil Brown: avoid rd deadlock on io_request_lock by using a private rd-request function. This also avoids unnecessary request merging at this level.
    - Ben LaHaise: vmalloc threading and overflow fix
    - Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
    - Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
    - Alan Cox: various (Athlon mmx copy, NULL ptr checks for scsi_register etc).
    - Al Viro: fix /proc permission check security hole.
    - Can-Ru Yeou: SiS301 fbcon driver
    - Andrew Morton: NMI oopser and kernel page fault punch through both console_lock and timerlist_lock to make sure it prints out..
    - Jeff Garzik: clean up "kmap()" return type (it returns a kernel virtual address, ie a "void *").
    - Jeff Garzik: network driver docs, various one-liners.
    - David Miller: add generic "special" flag to page flags, to be used by architectures as they see fit. Like keeping track of cache coherency issues.
    - David Miller: sparc64 updates, make sparc32 boot again
    - Davdi Millner: spel "synchronous" correctly
    - David Miller: networking - fix some bridge issues, and correct IPv6 sysctl entries.
    - Dan Aloni: make fork.c use proper macro rather than doing get_exec_domain() by hand.

 - pre1:
    - me: make PCMCIA work even in the absence of PCI irq's
    - me: add irq mapping capabilities for Cyrix southbridges
    - me: make IBMMCA compile right as a module
    - me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
    - Andrea Arcangeli: don't allow people to set security-conscious bits in mxcsr through ptrace SETFPXREGS.
    - Jürgen Fischer: aha152x update
    - Andrew Morton, Trond Myklebust: file locking fixes
    - me: TLB invalidate race with highmem
    - Paul Fulghum: synclink/n_hdlc driver updates
    - David Miller: export sysctl_jiffies, and have the proper no-sysctl version handy
    - Neil Brown: RAID driver deadlock and nfsd read access to execute-only files fix
    - Keith Owens: clean up module information passing, remove "get_module_symbol()".
    - Jeff Garzik: network (and other) driver fixes and cleanups
    - Andrea Arcangeli: scheduler cleanup.
    - Ching-Ling Li: fix ALi sound driver memory leak
    - Anton Altaparmakov: upcase fix for NTFS
    - Thomas Woller: CS4281 audio update
Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10
In article [EMAIL PROTECTED], Mike Galbraith [EMAIL PROTECTED] wrote:
>
>> (This scenario, btw, is much harder to trigger on SMP than on UP. And it's completely separate from the issue of simple disk bandwidth issues which can obviously cause no end of stalls on anything that needs the disk, and which can also happen on SMP).
>
> Unfortunately, it didn't help in the scenario I'm running.
>
> time make -j30 bzImage:
>
> real 14m19.987s (within stock variance)
> user 6m24.480s
> sys 1m12.970s

Note that the above kind of "throughput performance" should not have been affected, and was not what I was worried about.

> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 31 2 1 12 1432 4440 12660 0 1227 151 202 848 89 11 0
> 34 4 1 1908 2584536 5376 248 1904 602 763 785 4094 63 32 5
> 13 19 1 64140 67728604 33784 106500 84612 43625 21683 19080 52168 28 22 50

Looks like there was a big delay in vmstat there - that could easily be due to simple disk throughput issues..

Does it feel any different under the original load that got the original complaint? The patch may have just been buggy and ineffective, for all I know.

		Linus
Re: [BUG] /proc/pid/stat access stalls badly for swapping process, 2.4.0-test10
In article [EMAIL PROTECTED], David Mansfield [EMAIL PROTECTED] wrote:
> Linus Torvalds wrote:
>> ...
>> And it has everything to do with the fact that the way Linux semaphores are implemented, a non-blocking process has a HUGE advantage over a blocking one. Linux kernel semaphores are extremely unfair in that way.
>> ...
>> The original running process comes back faulting again, finds the semaphore still unlocked (the "ps" process is awake but has not gotten to run yet), gets the semaphore, and falls asleep on the IO for the next page.
>>
>> The "ps" process actually gets to run now, but it's a bit late. The semaphore is locked again.
>>
>> Repeat until luck breaks the bad circle.
>
> But doesn't __down have a fast path coded in assembly? In other words, it only hits your patched code if there is already contention, which there isn't in this case, and therefore the bug...?

The __down() case should be hit if there's a waiter, even if that waiter has not yet been able to pick up the lock (the waiter _will_ have decremented the count to negative in order to trigger the proper logic at release time).

But as I mentioned, the pseudo-patch was certainly untested, so somebody should probably walk through the cases to check that I didn't miss something.

		Linus
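The point about the count going negative can be shown with a toy model. This is only a sketch of the counting, assuming a count of 1 means "free"; it leaves out the sleepers bookkeeping, the wait queue and the assembly entirely, and the toy_* names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model of the semaphore count only: no sleeping, no wait queue. */
struct toy_sem {
	atomic_int count;	/* 1 = free, 0 = held, negative = held with waiters */
};

/*
 * Returns 1 if the uncontended fast path succeeded, 0 if the caller
 * would have to enter the slow path (the job __down() does in the kernel).
 */
static int toy_down_fast(struct toy_sem *sem)
{
	/* atomic_fetch_sub() returns the old value, so the new one is old - 1 */
	return atomic_fetch_sub(&sem->count, 1) - 1 >= 0;
}

/* Returns 1 if the release has to take the slow path and wake somebody. */
static int toy_up_needs_wakeup(struct toy_sem *sem)
{
	int old = atomic_fetch_add(&sem->count, 1);

	assert(old < 1);	/* releasing a free semaphore is a caller bug */
	return old + 1 <= 0;
}
```

Once a second caller has pushed the count below zero, the release side sees a non-positive result and knows it must do a wakeup, even though the waiter has not run yet.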
Re: sendfile(2) fails for devices?
In article [EMAIL PROTECTED], Jeff Garzik [EMAIL PROTECTED] wrote:
> sendfile(2) fails with -EINVAL every time I try to read from a device file. This sounds like a bug... is it? (the man page doesn't mention such a restriction)

sendfile() on purpose only works on things that use the page cache. EINVAL is basically sendfile's way of saying "I would fall back on doing a read+write, so you might as well do it yourself in user space because it might actually be more efficient that way".

> I am using kernel 2.4.0-test11-pre2. All other tests with sendfile(2) succeed: file->file, file->STDOUT, STDIN->file...

Yes, as long as STDIN is a file ;)

sendfile() wants the source to be in the page cache, because the whole point of sendfile() was to avoid a copy.

The current device model does _not_ use the page cache. Now, arguably that's a bug - it also means that you cannot mmap() a block device - but as it could be easily documented (maybe it is, somewhere), I'll call it a bad feature for now.

Now, if you want to add the code to do address spaces for block devices, I wouldn't be all that unhappy. I've wanted to see it for a while. I'm not likely to apply it for 2.4.x any more, but I'd love to have it early for 2.5.x.

		Linus
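The "do it yourself in user space" advice might look roughly like the sketch below. It assumes a Linux system with <sys/sendfile.h>; copy_fd is a made-up helper name, and which descriptor types make sendfile() answer EINVAL depends on the kernel version, so treat this as an illustration rather than a recipe.

```c
#include <assert.h>
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy up to `count` bytes from in_fd to out_fd. Tries sendfile() first and
 * falls back to a plain read()/write() loop when the kernel refuses with
 * EINVAL (or ENOSYS). Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_fd(int out_fd, int in_fd, size_t count)
{
	char buf[4096];
	size_t done = 0;

	assert(out_fd >= 0 && in_fd >= 0);

	while (done < count) {
		ssize_t n = sendfile(out_fd, in_fd, NULL, count - done);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			return (ssize_t)done;	/* end of input */
		if (errno == EINTR)
			continue;
		if (errno != EINVAL && errno != ENOSYS)
			return -1;
		break;	/* sendfile() cannot handle this pair of descriptors */
	}

	while (done < count) {
		size_t want = count - done < sizeof(buf) ? count - done : sizeof(buf);
		ssize_t n = read(in_fd, buf, want);

		if (n == 0)
			break;
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Passing NULL as the offset makes sendfile() advance the descriptor's own file position, so the fallback loop picks up exactly where sendfile() stopped.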
Re: [patch] patch-2.4.0-test10-irda24 (resend)
On Sun, 12 Nov 2000, Dag Brattli wrote:
>
> (resending in case it got lost, didn't show up on linux-kernel)

Didn't get lost, but I think the linux-kernel size filter killed it from the kernel list.

Everything applied.

Thanks,

		Linus
Re: [patch] wakeup_bdflush related fixes and nfsd optimizations for test10
On Sat, 11 Nov 2000, Ying Chen/Almaden/IBM wrote:
>
> This patch includes two sets of things against test10: First, there are several places where schedule() is called after wakeup_bdflush(1) is called. This is completely unnecessary

Fair enough.

> Second, (I have posted this to the kernel mailing list, but I forgot to cc to Linus.) I made some optimizations on racache in nfsd in test10.

..but this would need a lot more testing/feedback, especially from the nfs client maintainers (I see that Neil Brown did some querying already, I think more is in order).

Also, I'd _really_ like those lists to be real linux/list.h lists instead of duplicating code.

		Linus
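For anybody who has not looked at linux/list.h: the idea is to embed a small node structure in the object and share one set of list routines, instead of hand-rolling next/prev pointers per data type. The following is a user-space imitation of that idea, not the kernel header; list_node, container_of_node and the ra_entry structure with its fields are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* User-space imitation of the <linux/list.h> idea; not the kernel header. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Insert `item` right after `head`, i.e. at the front of the list. */
static void list_add_front(struct list_node *item, struct list_node *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static void list_remove(struct list_node *item)
{
	assert(item->next != item);	/* must be linked into some list */
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = item;
}

/* Recover the containing structure from a pointer to its embedded node. */
#define container_of_node(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical cache entry; the field names are made up for this sketch. */
struct ra_entry {
	int ino;
	struct list_node lru;
};

static int list_count(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Because the node is embedded, the same three routines serve any structure, which is the "stop duplicating code" argument made above.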
Re: The IrDA patches !!! (was Re: [RANT] Linux-IrDA status)
Ok, thanks to the work of Jean, everything seems to be applied now. I'll make a test3 one of these days (probably tomorrow), please verify that everything looks happy.

		Linus
Re: [PATCH] show_task() and thread_saved_pc() fix for x86
On Fri, 10 Nov 2000, Alexander Viro wrote:
>
> diff -urN rc11-2/include/asm-i386/processor.h rc11-2-show_task/include/asm-i386/processor.h
> --- rc11-2/include/asm-i386/processor.h	Fri Nov 10 09:14:04 2000
> +++ rc11-2-show_task/include/asm-i386/processor.h	Fri Nov 10 16:08:15 2000
> @@ -412,7 +412,7 @@
>   */
>  extern inline unsigned long thread_saved_pc(struct thread_struct *t)
>  {
> -	return ((unsigned long *)t->esp)[3];
> +	return ((unsigned long **)t->esp)[0][1];
>  }

The above needs to get verified: it should be something like

	unsigned long *ebp = *((unsigned long **)t->esp);

	if ((void *) ebp < (void *) t)
		return 0;
	if ((void *) ebp >= (void *) t + 2*PAGE_SIZE)
		return 0;
	if (3 & (unsigned long)ebp)
		return 0;
	return *ebp;

because otherwise I guarantee that we'll eventually have a bug with an invalid pointer reference in the debugging code and that would be bad.

		Linus
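The three checks (not below the base, not past the end, properly aligned) are easy to try out in user space. This is a portable sketch under the assumption that the "stack" is just a byte range [base, base + size); pointer_in_stack is a hypothetical helper, not anything from the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if `p` lies inside [base, base + size) and is 4-byte aligned,
 * mirroring the three tests suggested in the reply above. Returns 0 for
 * anything that should not be dereferenced.
 */
static int pointer_in_stack(const void *base, size_t size, const void *p)
{
	uintptr_t lo = (uintptr_t)base;
	uintptr_t v = (uintptr_t)p;

	assert(size > 0);
	if (v < lo)
		return 0;	/* below the start of the area */
	if (v - lo >= size)
		return 0;	/* at or past the end of the area */
	if (v & 3)
		return 0;	/* not aligned for a 32-bit word */
	return 1;
}
```

Doing the range test as `v - lo >= size` avoids computing base + size, which could overflow for an area near the top of the address space.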
2.4.0-test11-pre3
Drivers, drivers, drivers. IrDA and ISDN. PPC.

The most interesting part is probably the exclusive wait-queue patch. David Miller noticed that exclusivity doesn't nest correctly the way we used to do it: being on multiple wait-queues would potentially cause lost wake-up events if a non-exclusive waiter got mistaken for an exclusive one because the exclusive bit was a per-process thing.

Moving the exclusivity bit from the process into the wait-queue cleaned up the interfaces and also made it nest properly. No known uses were actually buggy, but at least one case was apparently ok only by pure luck.

		Linus

-----

 - pre3:
    - James Simmons: vgacon "printk()" deadlock with global irq lock.
    - don't poke blanked console on console output
    - Ching-Ling: get channels right on ALI audio driver
    - Dag Brattli and Jean Tourrilhes: big IrDA update
    - Paul Mackerras: PPC updates
    - Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin serial converter.
    - Jeff Garzik: pcnet32 and lance net driver fix/cleanup
    - Mikael Pettersson: clean up x86 ELF_PLATFORM
    - Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and cleanups
    - Al Viro: Jeff missed some kmap()'s. sysctl cleanup
    - Kai Germaschewski: ISDN updates
    - Alan Cox: SCSI driver NULL ptr checks
    - David Miller: networking updates, exclusive waitqueues nest properly, SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
    - Stephen Rothwell: directory notify could return with the lock held
    - Richard Henderson: CLOCKS_PER_SEC on alpha.
    - Jeff Garzik: ramfs and highmem: kmap() the page to clear it
    - Asit Mallick: enable the APIC in the official order
    - Neil Brown: avoid rd deadlock on io_request_lock by using a private rd-request function. This also avoids unnecessary request merging at this level.
    - Ben LaHaise: vmalloc threading and overflow fix
    - Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
    - Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
    - Alan Cox: various (Athlon mmx copy, NULL ptr checks for scsi_register etc).
    - Al Viro: fix /proc permission check security hole.
    - Can-Ru Yeou: SiS301 fbcon driver
    - Andrew Morton: NMI oopser and kernel page fault punch through both console_lock and timerlist_lock to make sure it prints out..
    - Jeff Garzik: clean up "kmap()" return type (it returns a kernel virtual address, ie a "void *").
    - Jeff Garzik: network driver docs, various one-liners.
    - David Miller: add generic "special" flag to page flags, to be used by architectures as they see fit. Like keeping track of cache coherency issues.
    - David Miller: sparc64 updates, make sparc32 boot again
    - Davdi Millner: spel "synchronous" correctly
    - David Miller: networking - fix some bridge issues, and correct IPv6 sysctl entries.
    - Dan Aloni: make fork.c use proper macro rather than doing get_exec_domain() by hand.

 - pre1:
    - me: make PCMCIA work even in the absence of PCI irq's
    - me: add irq mapping capabilities for Cyrix southbridges
    - me: make IBMMCA compile right as a module
    - me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
    - Andrea Arcangeli: don't allow people to set security-conscious bits in mxcsr through ptrace SETFPXREGS.
    - Jürgen Fischer: aha152x update
    - Andrew Morton, Trond Myklebust: file locking fixes
    - me: TLB invalidate race with highmem
    - Paul Fulghum: synclink/n_hdlc driver updates
    - David Miller: export sysctl_jiffies, and have the proper no-sysctl version handy
    - Neil Brown: RAID driver deadlock and nfsd read access to execute-only files fix
    - Keith Owens: clean up module information passing, remove "get_module_symbol()".
    - Jeff Garzik: network (and other) driver fixes and cleanups
    - Andrea Arcangeli: scheduler cleanup.
    - Ching-Ling Li: fix ALi sound driver memory leak
    - Anton Altaparmakov: upcase fix for NTFS
    - Thomas Woller: CS4281 audio update
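The exclusive wait-queue change described in the pre3 announcement can be modelled in a few lines. This is a toy, single-threaded sketch assuming a simple singly linked queue; the toy_* names and the WQ_EXCLUSIVE flag value are invented here and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_EXCLUSIVE 0x01	/* the flag lives in the queue entry, not the task */

struct toy_task {
	int woken;		/* how many times this task was woken */
};

struct toy_wait {
	struct toy_task *task;
	unsigned int flags;
	struct toy_wait *next;
};

struct toy_wait_queue {
	struct toy_wait *head, *tail;
};

/* Append `w` to the queue on behalf of task `t`. */
static void toy_add_wait(struct toy_wait_queue *q, struct toy_wait *w,
			 struct toy_task *t, unsigned int flags)
{
	assert(t != NULL);
	w->task = t;
	w->flags = flags;
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/*
 * Wake waiters in queue order, stopping after the first exclusive one.
 * Returns the number of waiters woken.
 */
static int toy_wake_up(struct toy_wait_queue *q)
{
	int woken = 0;

	for (struct toy_wait *w = q->head; w; w = w->next) {
		w->task->woken++;
		woken++;
		if (w->flags & WQ_EXCLUSIVE)
			break;
	}
	return woken;
}
```

With the flag stored per entry, a task that waits exclusively on one queue and non-exclusively on another no longer cuts the wakeup walk short on the second queue, which is the lost wake-up the announcement describes.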
test11-pre5
More drivers. The x86 capabilities cleanup is here.

		Linus

-----

 - pre5:
    - Rasmus Andersen: add proper "linux/init.h" for sound drivers
    - David Miller: sparc64 and networking updates
    - David Trcka: MOXA numbering starts from 0, not 1.
    - Jeff Garzik: sysctl.h standalone
    - Dag Brattli: IrDA finishing touches
    - Randy Dunlap: USB fixes
    - Gerd Knorr: big bttv update
    - Peter Anvin: x86 capabilities cleanup
    - Stephen Rothwell: apm initcall fix - smp poweroff should work
    - Andrew Morton: setscheduler() spinlock ordering fix
    - Stephen Rothwell: directory notification documentation
    - Petr Vandrovec: ncpfs capabilities check cleanup
    - David Woodhouse: fix jffs to use generic is() library
    - Chris Swiedler: oom_kill selection fix
    - Jens Axboe: re-merge after sleeping in ll_rw_block.
    - Randy Dunlap: USB updates (pegasus and ftdi_sio)
    - Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
    - Andrea Arcangeli: SMP scheduler memory barrier fixup
    - Richard Henderson: fix alpha semaphores and spinlock bugs.
    - Richard Henderson: clean up the file from hell: "xor.c"

 - pre3:
    - James Simmons: vgacon "printk()" deadlock with global irq lock.
    - don't poke blanked console on console output
    - Ching-Ling: get channels right on ALI audio driver
    - Dag Brattli and Jean Tourrilhes: big IrDA update
    - Paul Mackerras: PPC updates
    - Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin serial converter.
    - Jeff Garzik: pcnet32 and lance net driver fix/cleanup
    - Mikael Pettersson: clean up x86 ELF_PLATFORM
    - Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and cleanups
    - Al Viro: Jeff missed some kmap()'s. sysctl cleanup
    - Kai Germaschewski: ISDN updates
    - Alan Cox: SCSI driver NULL ptr checks
    - David Miller: networking updates, exclusive waitqueues nest properly, SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
    - Stephen Rothwell: directory notify could return with the lock held
    - Richard Henderson: CLOCKS_PER_SEC on alpha.
    - Jeff Garzik: ramfs and highmem: kmap() the page to clear it
    - Asit Mallick: enable the APIC in the official order
    - Neil Brown: avoid rd deadlock on io_request_lock by using a private rd-request function. This also avoids unnecessary request merging at this level.
    - Ben LaHaise: vmalloc threading and overflow fix
    - Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
    - Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
    - Alan Cox: various (Athlon mmx copy, NULL ptr checks for scsi_register etc).
    - Al Viro: fix /proc permission check security hole.
    - Can-Ru Yeou: SiS301 fbcon driver
    - Andrew Morton: NMI oopser and kernel page fault punch through both console_lock and timerlist_lock to make sure it prints out..
    - Jeff Garzik: clean up "kmap()" return type (it returns a kernel virtual address, ie a "void *").
    - Jeff Garzik: network driver docs, various one-liners.
    - David Miller: add generic "special" flag to page flags, to be used by architectures as they see fit. Like keeping track of cache coherency issues.
    - David Miller: sparc64 updates, make sparc32 boot again
    - Davdi Millner: spel "synchronous" correctly
    - David Miller: networking - fix some bridge issues, and correct IPv6 sysctl entries.
    - Dan Aloni: make fork.c use proper macro rather than doing get_exec_domain() by hand.

 - pre1:
    - me: make PCMCIA work even in the absence of PCI irq's
    - me: add irq mapping capabilities for Cyrix southbridges
    - me: make IBMMCA compile right as a module
    - me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
    - Andrea Arcangeli: don't allow people to set security-conscious bits in mxcsr through ptrace SETFPXREGS.
    - Jürgen Fischer: aha152x update
    - Andrew Morton, Trond Myklebust: file locking fixes
    - me: TLB invalidate race with highmem
    - Paul Fulghum: synclink/n_hdlc driver updates
    - David Miller: export sysctl_jiffies, and have the proper no-sysctl version handy
    - Neil Brown: RAID driver deadlock and nfsd read access to execute-only files fix
    - Keith Owens: clean up module information passing, remove "get_module_symbol()".
    - Jeff Garzik: network (and other) driver fixes and cleanups
    - Andrea Arcangeli: scheduler cleanup.
    - Ching-Ling Li: fix ALi sound driver memory leak
    - Anton Altaparmakov: upcase fix for NTFS
    - Thomas Woller: CS4281 audio update
Re: [PATCH] Re: test11-pre5
On Wed, 15 Nov 2000, Dan Aloni wrote:
>
> summary: dev_3c501.name shouldn't be NULL, or we get oops

Note that these days "name" is not a pointer at all, but an array, and as such cannot be NULL any more. Not initializing it will just cause it to be empty (ie is the same as initializing it to "").

		Linus
Re: [PATCH] Re: test11-pre5
In article [EMAIL PROTECTED], Dan Aloni [EMAIL PROTECTED] wrote:
> On Tue, 14 Nov 2000, Jeff Garzik wrote:
>
>> Dan Aloni wrote:
>>> reason: Correct me if I'm wrong, but 3c501.c:init_module() calls net_init.c:register_netdev(dev_3c501), which calls strchr(), {and might also,which might} dereference dev_3c501.name.
>>
>> There is no dereferencing involved, and therefore no problem.
>
> Well, at least I was alertive. Almost a bug fix ;-)
>
> Is there a special reason why dev->name is not a pointer?

It used to be.

And we used to have an incredible number of bugs with initialization and with creating these things dynamically. A lot of Space.c was due to horrible hackery with getting the static allocation right for these things.

Turning it into a plain array got rid of all the hackery, and saved memory anyway.

		Linus
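The array-versus-pointer point is plain C and can be checked in user space. A small sketch, assuming a cut-down device structure: toy_device, TOY_IFNAMSIZ and toy_needs_name are invented names, and treating a '%' in the name as a template that still needs filling in is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

#define TOY_IFNAMSIZ 16

/* Hypothetical cut-down device structure; only the name field matters here. */
struct toy_device {
	char name[TOY_IFNAMSIZ];	/* embedded array: its address is never NULL */
	int irq;
};

/* Members left out of the initializer are zeroed, so name starts out as "". */
static struct toy_device toy_dev = { .irq = 5 };

/* Returns 1 if the name is empty or still a "%d"-style template. */
static int toy_needs_name(const struct toy_device *dev)
{
	assert(dev != NULL);
	return dev->name[0] == '\0' || strchr(dev->name, '%') != NULL;
}
```

Calling strchr() on the array is always safe: at worst it scans an empty string, which is why leaving the field uninitialized behaves the same as initializing it to "".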
Re: Memory management bug
In article [EMAIL PROTECTED],
>
> After some trickery with some special hardware feature (storage keys) I found out that empty_bad_pmd_table and empty_bad_pte_table have been put to the page table quicklists multiple(!) times.

This is definitely bad, and means that something else really bad is going on.

In fact, I have this fairly strong suspicion that we should just get rid of the "bad" page tables altogether, and make the stuff that now uses them BUG() instead.

The whole concept of "bad" page tables comes from very early on in Linux, when the way the page fault handler worked was that if it ran out of memory or something else really bad happened, it would insert a dummy page table entry that was guaranteed to let the CPU continue. That way the page fault handler was always "successful" from a hardware standpoint, even if it ended up trying to kill the process.

This used to be required simply because a page fault in kernel space originally needed to let the process unwind sanely and cleanly.

These days, the requirement that page faults always "succeed" is long long gone. The exception handling mechanism handles the cases where we validly can take a page fault, and in other cases we will just kill the process outright.

As such, the bad page tables should no longer be needed, and are apparently just hiding some nasty bugs. What happens if you just replace all places that would use a bad page table with a BUG()? (Ie do _not_ add the bug to the place where you added the test: by that time it's too late. I'm talking about the places where the bad page tables are used, like in the error cases of "get_pte_kernel_slow()" etc.)

		Linus
Re: BUG: isofs broken (2.2 and 2.4)
On Thu, 16 Nov 2000, Andries Brouwer wrote:
>
> Has there been a kernel version that could read these? It looks like it proclaims blocksize 512 and uses blocksize 2048 or so.

The (de_len == 0) check in do_isofs_readdir() seems to imply that the blocksize is always 2048. So at the very least something is inconsistent. We use ISOFS_BUFFER_SIZE(inode) (512 in this case) for some sector sizes, and then ISOFS_BLOCK_SIZE (2048) for others.

But the way isofs_bmap() works, we need to work with ISOFS_BUFFER_SIZE(inode). And I don't know if directories are always _aligned_ at 2048 bytes even if they should be blocked at 2k.

Looking at the isofs lookup() logic, it will actually handle split entries, instead of complaining about them. And I suspect readdir() did too at some point, and the code was just removed (probably due to excessive confusion) when one of the many readdir() reorganizations was done.

readdir() probably worked a long time ago. Is the thing documented somewhere?

It looks like we should just allow entries that are split and not complain about them. We have the temporary buffer for it already..

		Linus
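What "allow entries that are split" means in practice can be sketched in user space. This is a toy model, not the isofs code: it assumes records whose first byte is their total length, a directory that is only reachable one buffer at a time, and that a zero length byte means "skip to the next buffer boundary" (whether the real code should round up to 2048 or to the buffer size is exactly the open question in this thread). All toy_* names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy "device": the directory can only be reached one buffer at a time. */
struct toy_dir {
	const unsigned char *data;
	size_t size;		/* total bytes in the directory */
	size_t bufsize;		/* bytes per buffer, a power of two */
};

/* Returns the start of buffer `block`, or NULL past the end. */
static const unsigned char *toy_bread(const struct toy_dir *d, size_t block)
{
	if (block * d->bufsize >= d->size)
		return NULL;
	return d->data + block * d->bufsize;
}

/*
 * Copy the record starting at *pos into `out` (at least 256 bytes) and
 * advance *pos past it. A record that straddles two buffers is assembled
 * from both halves. Returns the record length, or 0 at end of directory.
 */
static size_t toy_next_record(const struct toy_dir *d, size_t *pos,
			      unsigned char *out)
{
	while (*pos < d->size) {
		size_t block = *pos / d->bufsize;
		size_t offset = *pos & (d->bufsize - 1);
		const unsigned char *buf = toy_bread(d, block);
		size_t len, first;

		if (!buf)
			return 0;
		len = buf[offset];
		if (len == 0) {
			/* Padding: move on to the next buffer boundary. */
			*pos = (*pos & ~(d->bufsize - 1)) + d->bufsize;
			continue;
		}
		first = d->bufsize - offset;	/* bytes left in this buffer */
		if (first > len)
			first = len;
		memcpy(out, buf + offset, first);
		if (first < len) {
			/* Split entry: fetch the remainder from the next buffer. */
			assert(len - first <= d->bufsize);
			buf = toy_bread(d, block + 1);
			if (!buf)
				return 0;
			memcpy(out + first, buf, len - first);
		}
		*pos += len;
		return len;
	}
	return 0;
}
```

The caller always sees a complete record in `out`, so nothing downstream needs to know whether the entry happened to cross a buffer boundary.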
Re: BUG: isofs broken (2.2 and 2.4)
Does this patch fix it for you?

Warning: TOTALLY UNTESTED!!! Please test carefully.

Also, I'd be interested to know whether somebody really knows if the zero length handling is correct. Should we really round up to 2048, or should we perhaps round up only to the next bufsize?

		Linus

-----
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Wed Nov 15 17:14:26 2000
@@ -94,6 +94,14 @@
 	return retnamlen;
 }
 
+static struct buffer_head *isofs_bread(struct inode *inode, unsigned int bufsize, unsigned int block)
+{
+	unsigned int blknr = isofs_bmap(inode, block);
+	if (!blknr)
+		return NULL;
+	return bread(inode->i_dev, blknr, bufsize);
+}
+
 /*
  * This should _really_ be cleaned up some day..
  */
@@ -105,7 +113,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +125,25 @@
 		return 0;
 
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
 
 		de_len = *(unsigned char *) de;
 #ifdef DEBUG
 		printk("de_len = %d\n", de_len);
-#endif
-
+#endif
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector. If we are at the end of the directory, we
@@ -164,36 +151,36 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;
 			offset = 0;
-
-			if (filp->f_pos >= inode->i_size)
-				return 0;
-
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
-			if (!bh)
-				return 0;
 			continue;
 		}
 
-		offset +=  de_len;
+		offset += de_len;
+		if (offset == bufsize) {
+			offset = 0;
+			block++;
+			brelse(bh);
+			bh = NULL;
+		}
+
+		/* Make sure we have a full directory entry */
 		if (offset > bufsize) {
-			/*
-			 * This would only normally happen if we had
-			 * a buggy cdrom image. All directory
-			 * entries should terminate with a null size
-			 * or end exactly at the end of the sector.
-			 */
-			printk("next_offset (%x) > bufsize (%lx)\n",
-			       offset,bufsize);
-			break;
+			int slop = bufsize - offset + de_len;
+			memcpy(tmpde, de, slop);
+			offset &= bufsize - 1;
+
Re: BUG: isofs broken (2.2 and 2.4)
On Wed, 15 Nov 2000, Linus Torvalds wrote:
>
> Does this patch fix it for you?
>
> Warning: TOTALLY UNTESTED!!! Please test carefully.

Ok, I tested it with the broken image.

It looks like "readdir()" is ok now (but not really knowing what the right output should be I cannot guarantee that). HOWEVER, doing an "ls -l" on some of the files gets ENOENT, implying that "lookup()" still has some problems with the image.

I suspect the code to handle split entries in isofs_find_entry() has some simple bug, but I'm too lazy to check it out right now. Anybody else willing to finish this one off?

		Linus
Re: BUG: isofs broken (2.2 and 2.4)
On Thu, 16 Nov 2000 [EMAIL PROTECTED] wrote:
>
> If noone else does, I suppose I can.

Thanks.

> ( .. gets ENOENT .. and that is not because it only is a partial image?)

I don't think so, but I obviously have no way of actually confirming my suspicion. If the stat information was wrong due to the partial image, the lookup should still have succeeded (the directory entries certainly were there - otherwise they'd not have shown up in readdir), and we would just have gotten garbage inode information etc.

I think.

		Linus
Re: [BUG] Inconsistent behaviour of rmdir
On Thu, 16 Nov 2000, Jean-Marc Saffroy wrote:
>
> As you see, it looks like the rmdir fails simply because the dir name ends with a dot !! This is confirmed by sys_rmdir in fs/namei.c, around line 1384 :
>
>         switch(nd.last_type) {
>                 case LAST_DOTDOT:
>                         error = -ENOTEMPTY;
>                         goto exit1;
>                 case LAST_ROOT: case LAST_DOT:
>                         error = -EBUSY;
>                         goto exit1;
>         }
>
> Should we rip off the offending "case LAST_DOT" ? Or do we need a smarter patch ? Is it really a problem that a process has its current directory deleted ? How about the root ?

The cwd is not the problem. The '.' is.

The reason for that check is that allowing "rmdir(".")" confuses a lot of UNIX programs, because it wasn't traditionally allowed.

> The man page for rmdir(2) should be updated as well, the current one states :
>
>        EBUSY  pathname is the current working directory or root directory of some process.

That's definitely wrong. You can do

	rmdir `pwd`

and that's fine (not all filesystems will let you do that, but that's a low-level filesystem issue). It's really only the special names "." and ".." that cannot be removed.

		Linus
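The distinction between "the name is '.'" and "the directory happens to be somebody's cwd" is purely about the last path component. Here is a user-space sketch of that classification, assuming trailing slashes are ignored; classify_last and rmdir_precheck are hypothetical helpers, and only the enum names and the error codes are taken from the quoted switch statement.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum last_type { LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT };

/* Classify the final component of a non-empty path, ignoring trailing slashes. */
static enum last_type classify_last(const char *path)
{
	size_t end, start;

	assert(path != NULL && path[0] != '\0');
	end = strlen(path);
	while (end > 0 && path[end - 1] == '/')
		end--;
	if (end == 0)
		return LAST_ROOT;	/* the path consisted only of slashes */
	start = end;
	while (start > 0 && path[start - 1] != '/')
		start--;
	if (end - start == 1 && path[start] == '.')
		return LAST_DOT;
	if (end - start == 2 && path[start] == '.' && path[start + 1] == '.')
		return LAST_DOTDOT;
	return LAST_NORM;
}

/* Same mapping as the quoted switch: 0 means "go ahead and try". */
static int rmdir_precheck(const char *path)
{
	switch (classify_last(path)) {
	case LAST_DOTDOT:
		return -ENOTEMPTY;
	case LAST_ROOT:
	case LAST_DOT:
		return -EBUSY;
	default:
		return 0;
	}
}
```

A directory spelled by its real name passes the check even if it is the caller's cwd, which matches the "rmdir `pwd`" remark; only the literal names "." and ".." are refused.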
Re: Memory management bug
On Thu, 16 Nov 2000 [EMAIL PROTECTED] wrote:
>
> Ok, the BUG() hit in get_pmd_slow:
>
> pmd_t *
> get_pmd_slow(pgd_t *pgd, unsigned long offset)
> {
>         pmd_t *pmd;
>         int i;
>
>         pmd = (pmd_t *) __get_free_pages(GFP_KERNEL,2);

You really need 4 pages?

There's no way to reliably get 4 consecutive pages when you're even close to being low on memory. I would suggest just failing with a NULL return here.

What is the architecture setup for this machine? I have no clue about S/390 memory management. Maybe you can modify the pmd layout?

One potential fix for this is to just make the page size bigger. Make "Linux pages" be _two_ hardware pages, and make a Linux pte contain two "hardware pte's". That way the pmd would be an order-1 allocation instead of an order-2 one. Which is statistically _much_ more likely to be around (exponential distribution).

		Linus
Re: Memory management bug
On Thu, 16 Nov 2000, Andrea Arcangeli wrote:
>
> If they absolutely needs 4 pages for pmd pagetables due hardware constraints I'd recommend to use _four_ hardware pages for each softpage, not two.

Yes.

However, it definitely is an issue of making trade-offs. Most 64-bit MMU models tend to have some flexibility in how you set up the page tables, and it may be possible to just move bits around too (ie making both the pmd and the pgd twice as large, and getting the expansion of 4 by doing two expand-by-two's, for example, if the hardware has support for doing things like that).

		Linus
Re: PATCH: 8139too kernel thread
On Thu, 16 Nov 2000, Alexander Viro wrote:
> On Thu, 16 Nov 2000, Alan Cox wrote:
>
>>> The only disadvantage to this scheme is the added cost of a kernel thread over a kernel timer. I think this is an ok cost, because this is a low-impact thread that sleeps a lot..
>>
>> 8K of memory, two tlb flushes, cache misses on the scheduler. The price is
>>                   ^^^
>> actually extremely high.
>
> <confused> Does it really need non-lazy TLB?

If Alan wants to back-port it into 2.2.x, the lazy tlb won't work.

But yes, on 2.4.x the cost of threads is fairly low. The biggest cost by far is probably the locking needed for the scheduler etc, and there the best rule of thumb is probably to see whether the driver really ends up being noticeably simpler.

The event stuff that we are discussing for pcmcia may make all of this moot, maybe media selection is the perfect example of how to do the very same thing. I'll forward Jeff the emails on that.

		Linus
Re: [PATCH (2.4)] atomic use count for proc_dir_entry
On Thu, 16 Nov 2000, Dan Aloni wrote:

Makes procfs use an atomic use count for dir entries, to avoid using the Big kernel lock. Axboe says it looks ok.

There's a race there. Look at what happens if de_put() races with remove_proc_entry(): we'd do free_proc_entry() twice. Not good.

Leave the kernel lock for now.

Linus
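The double-free hazard can be illustrated with a small userspace model. This is not the fix suggested in the message (which is to keep the kernel lock) and it is not the procfs code. It is a sketch of one common alternative, assuming the directory tree holds its own reference so that removal never frees directly. `struct entry` and the `entry_*` functions are hypothetical names, and the sketch ignores the separate problem of a lookup racing with the unlink.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for proc_dir_entry; not the kernel structure. */
struct entry {
	atomic_int count;	/* one reference per user, plus one for the tree */
};

int entries_freed;		/* instrumentation for this sketch only */

struct entry *entry_create(void)
{
	struct entry *e = malloc(sizeof(*e));

	assert(e != NULL);
	atomic_init(&e->count, 1);	/* the reference held by the tree */
	return e;
}

void entry_get(struct entry *e)
{
	atomic_fetch_add(&e->count, 1);
}

/* Drop one reference; only the caller that drops the last one frees. */
void entry_put(struct entry *e)
{
	if (atomic_fetch_sub(&e->count, 1) == 1) {
		free(e);
		entries_freed++;
	}
}

/* Removal gives up the tree's reference and never frees on its own. */
void entry_remove(struct entry *e)
{
	/* unlinking from the directory tree would happen here */
	entry_put(e);
}
```

Because the decrement and the "was I last?" test are a single atomic operation, exactly one caller sees the count reach zero, whichever order the put and the remove happen in.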
test11-pre6
The log-file says it all..

Linus

-----

 - pre6:
    - Intel: start to add Pentium IV specific stuff (128-byte cacheline etc)
    - David Miller: search-and-destroy places that forget to mark us running after removing us from a wait-queue.
    - me: NFS client write-back ref-counting SMP instability.
    - me: fix up non-exclusive waiters
    - Trond Myklebust: Be more careful about SMP in NFS and RPC code
    - Trond Myklebust: inode attribute update race fix
    - Charles White: don't do unaligned accesses in cpqarray driver.
    - Jeff Garzik: continued driver cleanup and fixes
    - Peter Anvin: integrate more of the Intel patches.
    - Robert Love: add i815 signature to the intel AGP support
    - Rik Faith: DRM update to make it easier to sync up 2.2.x
    - David Woodhouse: make old 16-bit pcmcia controllers work again (ie i82365 and TCIC)

 - pre5:
    - Rasmus Andersen: add proper "linux/init.h" for sound drivers
    - David Miller: sparc64 and networking updates
    - David Trcka: MOXA numbering starts from 0, not 1.
    - Jeff Garzik: sysctl.h standalone
    - Dag Brattli: IrDA finishing touches
    - Randy Dunlap: USB fixes
    - Gerd Knorr: big bttv update
    - Peter Anvin: x86 capabilities cleanup
    - Stephen Rothwell: apm initcall fix - smp poweroff should work
    - Andrew Morton: setscheduler() spinlock ordering fix
    - Stephen Rothwell: directory notification documentation
    - Petr Vandrovec: ncpfs capabilities check cleanup
    - David Woodhouse: fix jffs to use generic is() library
    - Chris Swiedler: oom_kill selection fix
    - Jens Axboe: re-merge after sleeping in ll_rw_block.
    - Randy Dunlap: USB updates (pegasus and ftdi_sio)
    - Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
    - Andrea Arcangeli: SMP scheduler memory barrier fixup
    - Richard Henderson: fix alpha semaphores and spinlock bugs.
    - Richard Henderson: clean up the file from hell: "xor.c"

 - pre3:
    - James Simmons: vgacon "printk()" deadlock with global irq lock.
    - don't poke blanked console on console output
    - Ching-Ling: get channels right on ALI audio driver
    - Dag Brattli and Jean Tourrilhes: big IrDA update
    - Paul Mackerras: PPC updates
    - Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin serial converter.
    - Jeff Garzik: pcnet32 and lance net driver fix/cleanup
    - Mikael Pettersson: clean up x86 ELF_PLATFORM
    - Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and cleanups
    - Al Viro: Jeff missed some kmap()'s. sysctl cleanup
    - Kai Germaschewski: ISDN updates
    - Alan Cox: SCSI driver NULL ptr checks
    - David Miller: networking updates, exclusive waitqueues nest properly, SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
    - Stephen Rothwell: directory notify could return with the lock held
    - Richard Henderson: CLOCKS_PER_SEC on alpha.
    - Jeff Garzik: ramfs and highmem: kmap() the page to clear it
    - Asit Mallick: enable the APIC in the official order
    - Neil Brown: avoid rd deadlock on io_request_lock by using a private rd-request function. This also avoids unnecessary request merging at this level.
    - Ben LaHaise: vmalloc threading and overflow fix
    - Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
    - Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
    - Alan Cox: various (Athlon mmx copy, NULL ptr checks for scsi_register etc).
    - Al Viro: fix /proc permission check security hole.
    - Can-Ru Yeou: SiS301 fbcon driver
    - Andrew Morton: NMI oopser and kernel page fault punch through both console_lock and timerlist_lock to make sure it prints out..
    - Jeff Garzik: clean up "kmap()" return type (it returns a kernel virtual address, ie a "void *").
    - Jeff Garzik: network driver docs, various one-liners.
    - David Miller: add generic "special" flag to page flags, to be used by architectures as they see fit. Like keeping track of cache coherency issues.
    - David Miller: sparc64 updates, make sparc32 boot again
    - Davdi Millner: spel "synchronous" correctly
    - David Miller: networking - fix some bridge issues, and correct IPv6 sysctl entries.
    - Dan Aloni: make fork.c use proper macro rather than doing get_exec_domain() by hand.

 - pre1:
    - me: make PCMCIA work even in the absence of PCI irq's
    - me: add irq mapping capabilities for Cyrix southbridges
    - me: make IBMMCA compile right as a module
    - me: uhhuh. Major atomic-PTE SMP race boo-boo. Fixed.
    - Andrea Arcangeli: don't allow people to set security-conscious bits in mxcsr through ptrace SETFPXREGS.
    - Jürgen Fischer: aha152x update
    - Andrew Morton, Trond Myklebust: file locking fixes
    - me: TLB invalidate race with highmem
    - Paul Fulghum: synclink/n_hdlc driver updates
    - David Miller: export sysctl_jiffies, and
Re: [PATCH] pcmcia event thread. (fwd)
On Fri, 17 Nov 2000, Russell King wrote:
Alan Cox writes:

From a practical point of view that currently means 'delete Linus tree pcmcia regardless of what you are doing' since the modules from David Hinds and Linus pcmcia are not 100% binary compatible for all cases.

However, deleting that code would render a significant number of ARM platforms without PCMCIA support, which would be real bad.

Right now, I suspect that the in-kernel pcmcia code is actually at the point where it _is_ possible to use it. David Hinds has been keeping the cs layer in synch with the external versions, and tons of people have helped make the low-level drivers stable again.

If somebody still has a problem with the in-kernel stuff, speak up.

Linus
Re: [PATCH] pcmcia event thread. (fwd)
On Fri, 17 Nov 2000, Alan Cox wrote:

regardless of what you are doing' since the modules from David Hinds and Linus pcmcia are not 100% binary compatible for all cases.

However, deleting that code would render a significant number of ARM platforms without PCMCIA support, which would be real bad.

It would actually have made no difference as said code didn't actually work anyway. Dwmw2 seems to have solved that

Alan, Russell is talking about CardBus controllers (it's also PCMCIA, in fact, these days it's the _only_ pcmcia in any machine made less than five years ago).

The patches to get i82365 and TCIC up and running again are interesting mainly for laptops with i486 CPUs and for desktops with pcmcia add-in cards (which are basically always ISA i82365-clones). They aren't interesting to ARM, I suspect.

Linus
Re: [PATCH] pcmcia event thread. (fwd)
On Fri, 17 Nov 2000, Alan Cox wrote:

Alan, Russell is talking about CardBus controllers (it's also PCMCIA, in fact, these days it's the _only_ pcmcia in any machine made less than five years ago).

I have at least two machines here that are 2 years old but disagree with you. One is only months old.

Who makes those pieces of crap? And who _buys_ them?

I can understand it in embedded stuff simply because the chips are simpler and smaller, but in a laptop you should definitely try to avoid it.

Linus
Re: Memory management bug
On Fri, 17 Nov 2000 [EMAIL PROTECTED] wrote:

What's the reasoning behind these ifs?

To catch memory corruption or things running out of control in the kernel.

I was referring to the "if (!order) goto try_again" ifs in alloc_pages, not the "if (something) BUG()" ifs.

Basically, if you try to wait for orders > 0, you may have to wait for a LOONG time. It actually works reasonably well on machines with big memories, because a buddy allocator _will_ try to coalesce memory allocations as much as possible. But it has nasty cases where you can be really unlucky.

Feel free to run simulations to see, but basically if you have reasonably random allocation and free patterns and you want to get an order-X contiguous allocation, you may have to free up a noticeable portion of your memory before it succeeds.

Sure, you could do "directed freeing", where you actually try to look at which pages would be worth freeing to find a large free area, but the complexity is not insignificant, and quite frankly the proper approach has always been "don't do that then".

Don't rely on big contiguous chunks of memory. Having an mm that can guarantee contiguous chunks of physical memory would be cool, but I suspect strongly that it would have some serious downsides.

Linus
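A tiny model shows why plenty of free memory does not imply a usable higher-order block. This is a sketch under simplifying assumptions: memory is one flat array with one byte per page, and a request of order N needs a free run of 2^N pages that is also aligned to 2^N, as in a buddy allocator. `has_free_block()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if `used` (one byte per page, nonzero = allocated) contains a
 * free, naturally aligned run of 2^order pages, 0 otherwise.
 */
int has_free_block(const unsigned char *used, size_t npages, unsigned int order)
{
	size_t block = (size_t)1 << order;
	size_t start, i;

	assert(used != NULL);
	for (start = 0; start + block <= npages; start += block) {
		/* Scan one aligned candidate block for any allocated page. */
		for (i = 0; i < block; i++)
			if (used[start + i])
				break;
		if (i == block)
			return 1;
	}
	return 0;
}
```

If every second page is allocated, half of memory is free and order-0 requests succeed, yet no order-1 block exists until a specific neighbouring page is freed.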
Re: [PATCH] pcmcia event thread. (fwd)
On Fri, 17 Nov 2000, Jeff Garzik wrote:

2. Even when I specify cs_irq=27, it resorts to polling:

Intel PCIC probe:
  Intel i82365sl DF ISA-to-PCMCIA at port 0x8400 ofs 0x00, 2 sockets
    host opts [0]: none
    host opts [1]: none
    ISA irqs (default) = none! polling interval = 1000 ms
  Intel i82365sl DF ISA-to-PCMCIA at port 0x8400 ofs 0x80, 2 sockets
    host opts [2]: none
    host opts [3]: none
    ISA irqs (default) = none! polling interval = 1000 ms

For these two, it sounds to me like you need to be doing a PCI probe, and getting the irq and I/O port info from pci_dev. And calling pci_enable_device, which may or may not be a showstopper here...

The i82365 stuff actually used to do much of this, but it was so intimately intertwined with the cardbus handling that I pruned it out for my sanity.

It should be possible to do the same thing with a nice simple concentrated PCI probe, instead of having stuff quite as spread out as it used to be.

As to why it doesn't show any ISA interrupts, who knows... Some of the PCI PCMCIA bridges need to be initialized.

Linus
Re: BUG: isofs broken (2.2 and 2.4)
On Fri, 17 Nov 2000, Harald Koenig wrote:

this seems to make things much worse: starting with ~90M free memory "du" again started leaking (or maybe just using memory?) down to ~80M free memory when the system suddenly locked up completely, no console switch was possible anymore (but Sysrq-B did reboot).

How about this version (full patch against test10 - it includes a slightly corrected version of my earlier dir.c patch)?

It's entirely untested, but it looks good and compiles. Ship it!

Linus

-----
diff -u --recursive --new-file v2.4.0-test10/linux/fs/isofs/dir.c linux/fs/isofs/dir.c
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Fri Nov 17 13:38:01 2000
@@ -94,6 +94,14 @@
 	return retnamlen;
 }
 
+static struct buffer_head *isofs_bread(struct inode *inode, unsigned int bufsize, unsigned int block)
+{
+	unsigned int blknr = isofs_bmap(inode, block);
+	if (!blknr)
+		return NULL;
+	return bread(inode->i_dev, blknr, bufsize);
+}
+
 /*
  * This should _really_ be cleaned up some day..
  */
@@ -105,7 +113,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +125,25 @@
 		return 0;
 
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (bh->b_blocknr << bufbits) + offset;
 
 		de_len = *(unsigned char *) de;
 #ifdef DEBUG
 		printk("de_len = %d\n", de_len);
-#endif
-
+#endif
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector.  If we are at the end of the directory, we
@@ -164,36 +151,33 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;
 			offset = 0;
-
-			if (filp->f_pos >= inode->i_size)
-				return 0;
-
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
-			if (!bh)
-				return 0;
 			continue;
 		}
 
-		offset += de_len;
-		if (offset > bufsize) {
-			/*
-			 * This would only normally happen if we had
-			 * a buggy cdrom image. All directory
-			 * entries should terminate with a null size
-			 * or end exactly at the end of the sector.
-			 */
-			printk("next_offset (%x) > bufsize (%lx)\n",
-			       offset,bufsize);
-			break;
+		offset += de_len;
+
+		/* Make sure we have a full directory entry */
+		if (offset >= bufsize) {
+			int slop =
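The position arithmetic that the readdir loop in this patch relies on can be checked in isolation. The sketch below assumes power-of-two buffer and sector sizes (2048-byte sectors in the example values). The function names are hypothetical helpers written for this note, not isofs functions.

```c
#include <assert.h>

/* Byte offset of `pos` inside its buffer; `bufsize` must be a power of two. */
unsigned long offset_in_buffer(unsigned long pos, unsigned long bufsize)
{
	assert(bufsize != 0 && (bufsize & (bufsize - 1)) == 0);
	return pos & (bufsize - 1);
}

/* Logical block index of `pos`, where bufsize == 1 << bufbits. */
unsigned long block_of(unsigned long pos, unsigned int bufbits)
{
	return pos >> bufbits;
}

/* First byte of the sector following the one that contains `pos`. */
unsigned long next_sector_start(unsigned long pos, unsigned long sector_size)
{
	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
	return (pos & ~(sector_size - 1)) + sector_size;
}
```

For a file position of 2100 with 2048-byte buffers, the offset is 52, the logical block is 1, and a zero length byte would move the position on to 4096.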
Re: BUG: isofs broken (2.2 and 2.4)
On Fri, 17 Nov 2000, Harald Koenig wrote:

Linus:   0.380u 76.850s 1:19.12 97.6% 0+0k 0+0io 113pf+0w
Andries: 0.470u 97.220s 1:40.29 97.4% 0+0k 0+0io 112pf+0w

The biggest difference is just the system times and the fact that it's more efficient coding.

BUT: there are some obvious bugs in the output of "du" and "find". some samples (all file names (should) match the format "xe%03d/xe%03d.%c%c" with both %03d being the _same_ number and both %c are in [a-z0-9]).

Yes. There's a silly bug there, now that I've tested it a bit. Basically the test for stuff that traversed a boundary was wrong.

The whole name conversion code is pretty horrible. It's been written over the years, and it was doing the same thing with small modifications in both readdir() and lookup().

I've got a cleaned up version that also should have the above bug fixed. Still ready to test? This time I went over the files rather carefully, and while I've not tested the fixed version I'm getting pretty happy with it. I'll merge some more of the name translation logic, but before I do that here's the newest patch..

Linus

-----
diff -u --recursive --new-file v2.4.0-test10/linux/fs/isofs/dir.c linux/fs/isofs/dir.c
--- v2.4.0-test10/linux/fs/isofs/dir.c	Fri Aug 11 14:29:01 2000
+++ linux/fs/isofs/dir.c	Fri Nov 17 15:43:36 2000
@@ -40,14 +40,17 @@
 	lookup: isofs_lookup,
 };
 
-static int isofs_name_translate(char * old, int len, char * new)
+int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode)
 {
-	int i, c;
+	char * old = de->name;
+	int len = de->name_len[0];
+	int i;
 
 	for (i = 0; i < len; i++) {
-		c = old[i];
+		unsigned char c = old[i];
 		if (!c)
 			break;
+
 		if (c >= 'A' && c <= 'Z')
 			c |= 0x20;	/* lower case */
 
@@ -74,8 +77,7 @@
 {
 	int std;
 	unsigned char * chr;
-	int retnamlen = isofs_name_translate(de->name,
-				de->name_len[0], retname);
+	int retnamlen = isofs_name_translate(de, retname, inode);
 	if (retnamlen == 0) return 0;
 	std = sizeof(struct iso_directory_record) + de->name_len[0];
 	if (std & 1) std++;
@@ -105,7 +107,7 @@
 	unsigned char bufbits = ISOFS_BUFFER_BITS(inode);
 	unsigned int block, offset;
 	int inode_number = 0;	/* Quiet GCC */
-	struct buffer_head *bh;
+	struct buffer_head *bh = NULL;
 	int len;
 	int map;
 	int high_sierra;
@@ -117,46 +119,22 @@
 		return 0;
 
 	offset = filp->f_pos & (bufsize - 1);
-	block = isofs_bmap(inode, filp->f_pos >> bufbits);
+	block = filp->f_pos >> bufbits;
 	high_sierra = inode->i_sb->u.isofs_sb.s_high_sierra;
 
-	if (!block)
-		return 0;
-
-	if (!(bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size)))
-		return 0;
-
 	while (filp->f_pos < inode->i_size) {
 		int de_len;
-#ifdef DEBUG
-		printk("Block, offset, f_pos: %x %x %x\n",
-		       block, offset, filp->f_pos);
-		printk("inode->i_size = %x\n",inode->i_size);
-#endif
-		/* Next directory_record on next CDROM sector */
-		if (offset >= bufsize) {
-#ifdef DEBUG
-			printk("offset >= bufsize\n");
-#endif
-			brelse(bh);
-			offset = 0;
-			block = isofs_bmap(inode, (filp->f_pos) >> bufbits);
-			if (!block)
-				return 0;
-			bh = breada(inode->i_dev, block, bufsize, filp->f_pos, inode->i_size);
+
+		if (!bh) {
+			bh = isofs_bread(inode, bufsize, block);
 			if (!bh)
 				return 0;
-			continue;
 		}
 
 		de = (struct iso_directory_record *) (bh->b_data + offset);
-		if(first_de) inode_number = (block << bufbits) + (offset & (bufsize - 1));
+		if (first_de) inode_number = (bh->b_blocknr << bufbits) + offset;
 
 		de_len = *(unsigned char *) de;
-#ifdef DEBUG
-		printk("de_len = %d\n", de_len);
-#endif
-
 
 		/* If the length byte is zero, we should move on to the next
 		   CDROM sector.  If we are at the end of the directory, we
@@ -164,36 +142,33 @@
 
 		if (de_len == 0) {
 			brelse(bh);
-			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1))
-				       + ISOFS_BLOCK_SIZE);
+			bh = NULL;
+			filp->f_pos = ((filp->f_pos & ~(ISOFS_BLOCK_SIZE - 1)) + ISOFS_BLOCK_SIZE);
+			block = filp->f_pos >> bufbits;
Re: BUG: isofs broken (2.2 and 2.4)
On Sat, 18 Nov 2000 [EMAIL PROTECTED] wrote:

But now that you did two-thirds of the job I take it you'll also do the third part? It is again precisely the same stuff.

Are you talking about isofs_lookup_grandparent()?

The code is now dead, and has been for a long time actually (as the VFS layer keeps track of ".." for us these days). Removed.

I'll look at the isofs_read_level3_size() thing. At least that one doesn't have the name translation crap in it.

Linus
Re: BUG: isofs broken (2.2 and 2.4)
Oh, and sorry - the last patch doesn't contain the (obvious) fixes to the header files to take some of the calling convention changes into account.

Linus

-----
--- v2.4.0-test10/linux/include/linux/iso_fs.h	Fri Sep  8 12:52:56 2000
+++ linux/include/linux/iso_fs.h	Fri Nov 17 15:52:03 2000
@@ -177,16 +177,17 @@
 extern int parse_rock_ridge_inode(struct iso_directory_record *, struct inode *);
 extern int get_rock_ridge_filename(struct iso_directory_record *, char *, struct inode *);
+extern int isofs_name_translate(struct iso_directory_record *, char *, struct inode *);
 extern int find_rock_ridge_relocation(struct iso_directory_record *, struct inode *);
 
-int get_joliet_filename(struct iso_directory_record *, struct inode *, unsigned char *);
+int get_joliet_filename(struct iso_directory_record *, unsigned char *, struct inode *);
 int get_acorn_filename(struct iso_directory_record *, char *, struct inode *);
 
 extern struct dentry *isofs_lookup(struct inode *, struct dentry *);
 extern int isofs_get_block(struct inode *, long, struct buffer_head *, int);
 extern int isofs_bmap(struct inode *, int);
-extern int isofs_lookup_grandparent(struct inode *, int);
+extern struct buffer_head *isofs_bread(struct inode *, unsigned int, unsigned int);
 
 extern struct inode_operations isofs_dir_inode_operations;
 extern struct file_operations isofs_dir_operations;
Re: BUG: isofs broken (2.2 and 2.4)
There's a test11-pre7 there now, and I'd really ask people to check out the isofs changes because slight worry about those is what held me up from just calling it test11 outright.

It's almost guaranteed to be better than what we had before, but anyway..

Linus
Re: test11-pre7 compile failure
In article [EMAIL PROTECTED], J Sloan [EMAIL PROTECTED] wrote:

looks like the md fixes broke something -

In file included from /usr/src/linux/include/linux/pagemap.h:17,
                 from /usr/src/linux/include/linux/locks.h:9,
                 from /usr/src/linux/include/linux/raid/md.h:37,
                 from init/main.c:25:
/usr/src/linux/include/linux/highmem.h: In function `bh_kmap':
/usr/src/linux/include/linux/highmem.h:23: structure has no member named `p_page'

The "p_page" should be a "b_page". Duh.

Linus
Re: Freeze on FPU exception with Athlon
In article [EMAIL PROTECTED], Markus Schoder [EMAIL PROTECTED] wrote:

The following small program (linked against glibc 2.1.3) reliably freezes my system (Athlon Thunderbird CPU) with at least kernels 2.4.0-test10 and 2.4.0-test11-pre5. Even the SysRq keys do not work after the freeze.

Are you sure sysrq doesn't work? Many distributions will disable the kernel printing to the console, or move it to console 7 or similar. It would be really good to get the EIP trace of RightAlt+ScrollLock pressed a few times if you can try to see if you can use klogd to enable proper printk's.

Older kernels (e.g. 2.3.40) seem to work. Any Ideas?

The FP exception handling has certainly changed, but the changes should all have affected mainly just PIII kernels with XMM support enabled. An Athlon system should have been pretty unaffected. But I'll take a look if I see something obvious.

One thing to try: if interrupts really don't work for you (and if SysRq doesn't work, that may be the case), please test out a kernel that simply ignores irq13 by just commenting out the line

	setup_irq(13, &irq13);

in arch/i386/kernel/i8259.c. Does that make any difference?

(irq13 shouldn't be used any more, it's horrible legacy crap, but we do want to support even horrible legacy systems).

Linus
Re: Freeze on FPU exception with Athlon
In article [EMAIL PROTECTED], Markus Schoder [EMAIL PROTECTED] wrote:

The following small program (linked against glibc 2.1.3) reliably freezes my system (Athlon Thunderbird CPU) with at least kernels 2.4.0-test10 and 2.4.0-test11-pre5. Even the SysRq keys do not work after the freeze. Older kernels (e.g. 2.3.40) seem to work. Any Ideas?

It certainly doesn't happen for me on any of the machines I work with, but it wouldn't compile as-is for me, so I exchanged the FPU setting with a simpler

	asm("fldcw %0": :"m" (0));

which should do the equivalent (ie unmask divide by zero errors). Does that make a difference for you?

Can you try to figure out where it started happening? Ie try test9 and back too, to figure out what might be bringing it on...

I sure as hell hope this isn't an Athlon issue. Can other people try the test-program and see if we have a pattern (ie "it happens only on Athlons", or "Linus is on drugs and it happens for everybody else").

Thanks,

Linus
Re: BUG: isofs broken (2.2 and 2.4)
On Sat, 18 Nov 2000, Keith Owens wrote:
On Fri, 17 Nov 2000 17:21:53 -0800 (PST), Linus Torvalds [EMAIL PROTECTED] wrote:

There's a test11-pre7 there now, and I'd really ask people to check out the isofs changes because slight worry about those is what held me up from just calling it test11 outright.

It's almost guaranteed to be better than what we had before, but anyway..

Linus

namei.c: In function `isofs_find_entry':
namei.c:130: warning: passing arg 2 of `get_joliet_filename' from incompatible pointer type
namei.c:130: warning: passing arg 3 of `get_joliet_filename' from incompatible pointer type

Thanks. The second and third arguments were switched around to match all the other filename conversion stuff, and because I don't have joliet enabled I didn't notice this.

Just switch them around where the warning occurs, and you should be golden.

Linus
Re: [patch] potential death in disassociate_ctty()
In article [EMAIL PROTECTED], Andrew Morton [EMAIL PROTECTED] wrote:

Also, somewhere on the path from kernel 2.2 to 2.4 the call to do_notify_parent() was moved inside the tasklist lock. Why was this?

Ehh.. Because that is also what protects our "parent" pointer.

Linus
Re: Please send Changelog info and patch notices for the test and-pre releases.
On Fri, 17 Nov 2000, Miles Lane wrote:

I haven't seen any announcements of recent test and test-pre releases. Can you begin sending those again, please?

You can actually get them off kernel.org these days: Peter Anvin set up a system whereby when I upload a changelog it automatically gets added to the web-site (main page, bottom).

Linus

-----

 - pre7:
    - Kai Germaschewski: more ISDN cleanups and small fixes.
    - Al Viro: fix ntfs_new_inode() that he broke. Cleanups.
    - various: handle !CONFIG_HOTPLUG properly
    - David Miller: sparc and networking
    - me: more iso9660 fixes.
    - Neil Brown: fix rd and RAID on highmem machines
    - Vojtech Pavlik: input driver fixes
    - David Woodhouse: module unload races - up_and_exit()

 - pre6:
    - Intel: start to add Pentium IV specific stuff (128-byte cacheline etc)
    - David Miller: search-and-destroy places that forget to mark us running after removing us from a wait-queue.
    - me: NFS client write-back ref-counting SMP instability.
    - me: fix up non-exclusive waiters
    - Trond Myklebust: Be more careful about SMP in NFS and RPC code
    - Trond Myklebust: inode attribute update race fix
    - Charles White: don't do unaligned accesses in cpqarray driver.
    - Jeff Garzik: continued driver cleanup and fixes
    - Peter Anvin: integrate more of the Intel patches.
    - Robert Love: add i815 signature to the intel AGP support
    - Rik Faith: DRM update to make it easier to sync up 2.2.x
    - David Woodhouse: make old 16-bit pcmcia controllers work again (ie i82365 and TCIC)

 - pre5:
    - Rasmus Andersen: add proper "linux/init.h" for sound drivers
    - David Miller: sparc64 and networking updates
    - David Trcka: MOXA numbering starts from 0, not 1.
    - Jeff Garzik: sysctl.h standalone
    - Dag Brattli: IrDA finishing touches
    - Randy Dunlap: USB fixes
    - Gerd Knorr: big bttv update
    - Peter Anvin: x86 capabilities cleanup
    - Stephen Rothwell: apm initcall fix - smp poweroff should work
    - Andrew Morton: setscheduler() spinlock ordering fix
    - Stephen Rothwell: directory notification documentation
    - Petr Vandrovec: ncpfs capabilities check cleanup
    - David Woodhouse: fix jffs to use generic is() library
    - Chris Swiedler: oom_kill selection fix
    - Jens Axboe: re-merge after sleeping in ll_rw_block.
    - Randy Dunlap: USB updates (pegasus and ftdi_sio)
    - Kai Germaschewski: ISDN ppp header compression fixed

 - pre4:
    - Andrea Arcangeli: SMP scheduler memory barrier fixup
    - Richard Henderson: fix alpha semaphores and spinlock bugs.
    - Richard Henderson: clean up the file from hell: "xor.c"

 - pre3:
    - James Simmons: vgacon "printk()" deadlock with global irq lock.
    - don't poke blanked console on console output
    - Ching-Ling: get channels right on ALI audio driver
    - Dag Brattli and Jean Tourrilhes: big IrDA update
    - Paul Mackerras: PPC updates
    - Randy Dunlap: USB ID table support, LEDs with usbkbd, belkin serial converter.
    - Jeff Garzik: pcnet32 and lance net driver fix/cleanup
    - Mikael Pettersson: clean up x86 ELF_PLATFORM
    - Bartlomiej Zolnierkiewicz: sound and drm driver init fixes and cleanups
    - Al Viro: Jeff missed some kmap()'s. sysctl cleanup
    - Kai Germaschewski: ISDN updates
    - Alan Cox: SCSI driver NULL ptr checks
    - David Miller: networking updates, exclusive waitqueues nest properly, SMP i_shared_lock/page_table_lock lock order fix.

 - pre2:
    - Stephen Rothwell: directory notify could return with the lock held
    - Richard Henderson: CLOCKS_PER_SEC on alpha.
    - Jeff Garzik: ramfs and highmem: kmap() the page to clear it
    - Asit Mallick: enable the APIC in the official order
    - Neil Brown: avoid rd deadlock on io_request_lock by using a private rd-request function. This also avoids unnecessary request merging at this level.
    - Ben LaHaise: vmalloc threading and overflow fix
    - Randy Dunlap: USB updates (plusb driver). PCI cacheline size.
    - Neil Brown: fix a raid1 on top of lvm bug that crept in in pre1
    - Alan Cox: various (Athlon mmx copy, NULL ptr checks for scsi_register etc).
    - Al Viro: fix /proc permission check security hole.
    - Can-Ru Yeou: SiS301 fbcon driver
    - Andrew Morton: NMI oopser and kernel page fault punch through both console_lock and timerlist_lock to make sure it prints out..
    - Jeff Garzik: clean up "kmap()" return type (it returns a kernel virtual address, ie a "void *").
    - Jeff Garzik: network driver docs, various one-liners.
    - David Miller: add generic "special" flag to page flags, to be used by architectures as they see fit. Like keeping track of cache coherency issues.
    - David Miller: sparc64 updates, make sparc32 boot again
    - Davdi Millner: spel "synchronous" correctly
    - David Miller: networking - fix some bridge issues, and correct
Re: [PATCH] pcmcia event thread. (fwd)
On Sat, 18 Nov 2000, David Ford wrote:
Linus Torvalds wrote:

[...] If somebody still has a problem with the in-kernel stuff, speak up.

The kernel's irq detection for the card sockets doesn't work for me. It's the NEC Versa LX story. The DH code also reports no IRQ found but still figures out a working IRQ (normally 3) and assigns it for the tulip card. I use the i82365 module w/ the DH code. The below is the output of the kernel pcmcia code.

PCI: No IRQ known for interrupt pin B of device 00:03.1. Please try using pci=biosirq.
PCI: No IRQ known for interrupt pin A of device 00:03.0. Please try using pci=biosirq.

Strange. Your interrupt router is a bog-standard PIIX4, we know how to route the thing, AND your device shows up:

# dump_pirq
Interrupt routing table found at address 0xf5a80:
  Version 1.0, size 0x0080
  Interrupt router is device 00:07.0
  PCI exclusive interrupt mask: 0x
  Compatible router: vendor 0x8086 device 0x1234

Device 00:03.0 (slot 0):
  INTA: link 0x60, irq mask 0x0420
  INTB: link 0x61, irq mask 0x0420

Interrupt router: Intel 82371AB PIIX4/PIIX4E PCI-to-ISA bridge
  PIRQ1 (link 0x60): irq 10
  PIRQ2 (link 0x61): irq 5
  PIRQ3 (link 0x62): unrouted
  PIRQ4 (link 0x63): irq 9
  Serial IRQ: [enabled] [continuous] [frame=21] [pulse=4]

Can you (you've probably done this before, but anyway) enable DEBUG in arch/i386/kernel/pci-i386.h? I wonder if the kernel for some strange reason doesn't find your router, even though "dump_pirq" obviously does.. If there's something wrong with the checksumming for example..

Linus
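The "irq mask" values in the dump_pirq output are bitmaps in which bit n set means IRQ n is a permitted routing target. A minimal sketch of decoding such a mask follows. `decode_irq_mask()` is a hypothetical helper written for this note, not part of the kernel or of dump_pirq.

```c
#include <assert.h>

/*
 * Write the IRQ numbers whose bits are set in a 16-bit PIRQ mask into
 * `irqs` (room for 16 entries) and return how many were found.
 */
int decode_irq_mask(unsigned int mask, int irqs[16])
{
	int irq, n = 0;

	assert(irqs != 0);
	for (irq = 0; irq < 16; irq++)
		if (mask & (1u << irq))
			irqs[n++] = irq;
	return n;
}
```

For the mask 0x0420 shown above this yields IRQ 5 and IRQ 10, which matches the two routed links (PIRQ1 on irq 10, PIRQ2 on irq 5).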
Re: Freeze on FPU exception with Athlon
On Sat, 18 Nov 2000, Brian Gerst wrote:

I get Floating Point Exception (core dumped), but I needed to use the modified program below to keep GCC from optimizing the division away as a constant. This is on test11-pre5.

I'm starting to suspect that it's really a combination of three things:

 - 3dnow optimization (ie you have to compile the kernel with Athlon support)
 - pending, but not yet noticed, FPU exceptions.
 - a bug/feature in the kernel, where a process exit does not bother to clear the FPU, only marks it as "unused".

If I'm right, the proper test-program should be something like

	int main(int argc, char **argv)
	{
		asm("fldcw %0": :"m" (0));
		asm("fldz ; fld1 ; fdiv");
		sleep(1);
		return 0;
	}

where it's important that we do not wait for the result of the fdiv, we just exit after having caused a pending exception (and you cannot do this reliably from C code - depending on compiler version and optimizations gcc may try to write the bad value back to memory etc).

Now, with the pending exception, do a 3dnow MMX memcpy() - which will clear the TS bit (because it decides that the FP state can be thrown away and doesn't need to do a full save/restore) and start using the FPU. Boom. Instant FP exception. With the exception handler deciding that nobody owns the FP state, and thus doing nothing sane.

If I'm right (and I'm _always_ right), the following patch would make a difference. Markus?

Linus

--- v2.4.0-test10/linux/arch/i386/kernel/traps.c	Tue Oct 31 12:42:26 2000
+++ linux/arch/i386/kernel/traps.c	Fri Nov 17 21:52:55 2000
@@ -643,6 +640,12 @@
 asmlinkage void do_coprocessor_error(struct pt_regs * regs, long error_code)
 {
 	ignore_irq13 = 1;
+
+	/* Due to lazy error handling, we might have false pending errors! */
+	if (!current->used_math) {
+		init_fpu();
+		return;
+	}
 	math_error((void *)regs->eip);
 }
 
@@ -700,6 +703,12 @@
 	if (cpu_has_xmm) {
 		/* Handle SIMD FPU exceptions on PIII+ processors. */
 		ignore_irq13 = 1;
+
+		/* Due to lazy error handling, we might have false pending errors! */
+		if (!current->used_math) {
+			init_fpu();
+			return;
+		}
 		simd_math_error((void *)regs->eip);
 	} else {
 		/*
Re: Freeze on FPU exception with Athlon
On Sat, 18 Nov 2000, Alan Cox wrote: Linus Torvalds wrote: I sure as hell hope this isn't an Athlon issue. Can other people try the test-program and see if we have a pattern (ie "it happens only on Athlons", or "Linus is on drugs and it happens for everybody else"). I've tried both variants (fesetenv and inline-asm) with glibc-2.1.3, 2.4.0-test11pre7 and an AMD Thunderbird. Neither does freeze, but both yield: Floating point exception (core dumped) Compiler specific ? There's almost certainly more than that. I'd love to have a report on my asm-only version, but even so I suspect it also requires the 3dnow stuff, because I'm not able to trigger anything like this on any machines I have access to (none of them are AMD, though) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: test11-pre6 still very broken
In article [EMAIL PROTECTED], Greg KH [EMAIL PROTECTED] wrote:
> On Fri, Nov 17, 2000 at 11:25:50PM -0800, Ben Ford wrote:
> > Here is lspci output from the laptop in question. Is this not UHCI?
>
> Yes it is. Just a bit funny if you think about it, but with Intel and Via putting the UHCI core into their chipsets I guess it makes sense.
>
> One note for the archives, if you are presented a choice between a OHCI or a UHCI controller, go for the OHCI. It has a "cleaner" interface, handles more of the logic in the silicon, and due to this provides faster transfers.

I'd disagree. UHCI has tons of advantages, not the least of which is that it was there first and is widely available. If OHCI hadn't been done we'd have _one_ nice good USB controller implementation instead of fighting stupid and unnecessary battles that shouldn't have existed in the first place.

For example, the UHCI root hub can be controlled without DMA, which makes it a lot cheaper on the system. When a UHCI system is unconnected and idle, it doesn't waste cycles on extra memory traffic the way OHCI does. UHCI also requires fewer transistors, and is the more common by far simply because Intel is good at getting their chipsets out.

Basically, the advantages of OHCI are not worth the differentiation, and are not always advantages at all. Many people think that it is "good" that the root hub looks more like a regular hub, but that's just wrong. Especially with faster speeds, the memory pressure of the USB controller is going to be noticeable, and it would be much preferable if the root directory of the USB tree would be separated out (and cached in the controller) by the root hub. The UHCI approach of making the root a bit special should be taken _further_, and not seen as a mistake.

I hope EHCI makes it all moot. Some way or another.

Linus
Re: Freeze on FPU exception with Athlon
On Sat, 18 Nov 2000, Markus Schoder wrote:
> Your test program is indeed sufficient to trigger the freeze. Unfortunately the patch does not make a difference :(

Ok. This may in fact be an Athlon CPU bug. But before we contact anybody from AMD, I'd really need to know what the result from the irq13 disabling and the non-3dnow thing is.

Considering that Udo reports no lockup at all with the same test-program even with an Athlon and 3dnow, it looks like it's either irq13 (and a motherboard routing issue: sane modern motherboards shouldn't even route the external FERR at _all_ any more), or something stepping-specific on your Athlon. It doesn't sound kernel-related per se.

Let's hope it's irq13. If so, it will be easy to fix (tentative fix: any CPU that reports a built-in FPU just doesn't get irq13 enabled at all).

Current working theory:

 - Athlons do FERR wrong. They drive FERR externally when the unmasked exception happens, rather than when the next FP instruction actually detects the exception. This means that the external FERR irq13 actually happens _before_ the internal exception 16, which is wrong.

 - Linux has seen exception 16 working, so it ignores irq13 and assumes that it's some real external device (which does happen - sometimes SCI is wired to irq13).

 - irq13 is not only wired on the motherboard (which was right in 1989, but is not right in 2000), but is marked level-triggered (which probably wasn't right even in 1989). So when the irq13 happens, it _keeps_ on happening, and we never get an exception 16 at all.

The reason 2.2.x works on your machine might be that the early bootup test for FP exceptions will have done something to mask the fpu exception just by luck. I forget the exact details of the test - it got removed in later kernels because it made it really nasty to handle XMM faults correctly.

Does anybody have any better ideas?

Linus
Re: Freeze on FPU exception with Athlon
On Sat, 18 Nov 2000, adrian wrote: On Sat, 18 Nov 2000, Linus Torvalds wrote: There's almost certainly more than that. I'd love to have a report on my asm-only version, but even so I suspect it also requires the 3dnow stuff, I tried all three versions, and no freezes. I forgot to mention the tests were run on a model 2 Athlon (original slot K7, .18 micron). The kernel is compiled with 3dnow support. Apparently it isn't the stepping, as we have Athlon model 4's both showing it and not showing it. The motherboard seems to be the only real difference here, which is why I like the irq13 explanation more and more. I've been wanting to get rid of irq13 anyway (some boards wire up USB and/or ACPI to irq13 and the fact that the FPU has claimed it makes those machines unhappy), so if the solution is to only check for irq13 on old i386 and i486sx machines and just leave it alone for newer CPU's, I won't complain. Markus, can you make the irq13 test the first thing - don't worry about 3dnow as that seems to not be a deciding factor.. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] semaphore fairness patch against test11-pre6
On Sun, 19 Nov 2000, Andrew Morton wrote:
> Has anyone tried it on SMP? I get fairly repeatable instances of immortal `D'-state processes with this patch.

Too bad. I really thought it should be safe to do.

> The patch isn't right - it allows `sleepers' to increase without bound. But it's a boolean!

It's not a boolean. It's really a "bias count". It happens to get only the values 0 and 1 simply because the logic is that we always account for all the other people when any process goes to sleep, so "sleepers" only ever counts the one process that went to sleep last.

But the algorithm itself should allow for other values. In fact, I think that you'll find that it works fine if you switch to non-exclusive wait-queues, and the only reason you see the repeatable D states is exactly the case where we didn't "take" the semaphore even though we were awake, and that basically makes us an exclusive process that didn't react to an exclusive wakeup.

(Think of it this way: with the "inside" patch, the process does

	tsk->state = TASK_INTERRUPTIBLE;

twice, even though there was only one semaphore that woke it up: we basically "lost" a wakeup event, not because "sleepers" cannot be 2, but because we didn't pick up the wakeup that we might have gotten.)

Instead of the "goto inside", how about just doing it without the "double sleep", and doing something like

	tsk->state = TASK_INTERRUPTIBLE;
	add_wait_queue_exclusive(&sem->wait, &wait);

	spin_lock_irq(&semaphore_lock);
	sem->sleepers ++;
+	if (sem->sleepers > 1) {
+		spin_unlock_irq(&semaphore_lock);
+		schedule();
+		spin_lock_irq(&semaphore_lock);
+	}
	for (;;) {

The only difference between the above and the "goto inside" variant is really that the above sets "tsk->state = TASK_INTERRUPTIBLE;" just once per loop, not twice as the "inside" case did. So if we happened to get an exclusive wakeup at just the right point, we won't go to sleep again and miss it.

But these things are very subtle. The current semaphore algorithm was basically perfected over a week of some serious thinking. The fairness change should similarly get a _lot_ of attention. It's way too easy to miss things.

Does the above work for you even in SMP?

Linus
Re: [PATCH] pcmcia event thread. (fwd)
On Sat, 18 Nov 2000, David Ford wrote:
> Linus Torvalds wrote:
> > Can you (you've probably done this before, but anyway) enable DEBUG in arch/i386/kernel/pci-i386.h? I wonder if the kernel for some strange reason doesn't find your router, even though "dump_pirq" obviously does.. If there's something wrong with the checksumming for example..
>
> ..building now.

Actually, try this patch first. It adds the PCI_DEVICE_ID_INTEL_82371MX router type, and also makes the PCI router search fall back more gracefully on the device it actually found if there is not an exact match on the "compatible router" entry...

It should make Linux find and accept the chip you have. Knock wood.

Linus

--- v2.4.0-test10/linux/arch/i386/kernel/pci-irq.c	Tue Oct 31 12:42:26 2000
+++ linux/arch/i386/kernel/pci-irq.c	Sat Nov 18 21:11:19 2000
@@ -283,12 +297,19 @@
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371FB_0, pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0, pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371AB_0, pirq_piix_get, pirq_piix_set },
+	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371MX, pirq_piix_get, pirq_piix_set },
 	{ "PIIX", PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82443MX_0, pirq_piix_get, pirq_piix_set },
+
 	{ "ALI", PCI_VENDOR_ID_AL, PCI_DEVICE_ID_AL_M1533, pirq_ali_get, pirq_ali_set },
+
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_0, pirq_via_get, pirq_via_set },
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C596, pirq_via_get, pirq_via_set },
 	{ "VIA", PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C686, pirq_via_get, pirq_via_set },
+
 	{ "OPTI", PCI_VENDOR_ID_OPTI, PCI_DEVICE_ID_OPTI_82C700, pirq_opti_get, pirq_opti_set },
+
+
 	{ "NatSemi", PCI_VENDOR_ID_CYRIX, PCI_DEVICE_ID_CYRIX_5520, pirq_cyrix_get, pirq_cyrix_set },
+
 	{ "default", 0, 0, NULL, NULL }
 };
@@ -298,7 +319,6 @@
 static void __init pirq_find_router(void)
 {
 	struct irq_routing_table *rt = pirq_table;
-	u16 rvendor, rdevice;
 	struct irq_router *r;
 
 #ifdef CONFIG_PCI_BIOS
@@ -308,32 +328,31 @@
 		return;
 	}
 #endif
-	if (!(pirq_router_dev = pci_find_slot(rt->rtr_bus, rt->rtr_devfn))) {
+	/* fall back to default router if nothing else found */
+	pirq_router = pirq_routers + sizeof(pirq_routers) / sizeof(pirq_routers[0]) - 1;
+
+	pirq_router_dev = pci_find_slot(rt->rtr_bus, rt->rtr_devfn);
+	if (!pirq_router_dev) {
 		DBG("PCI: Interrupt router not found at %02x:%02x\n", rt->rtr_bus, rt->rtr_devfn);
-		/* fall back to default router */
-		pirq_router = pirq_routers + sizeof(pirq_routers) / sizeof(pirq_routers[0]) - 1;
 		return;
 	}
-	if (rt->rtr_vendor) {
-		rvendor = rt->rtr_vendor;
-		rdevice = rt->rtr_device;
-	} else {
-		/*
-		 * Several BIOSes forget to set the router type. In such cases, we
-		 * use chip vendor/device. This doesn't guarantee us semantics of
-		 * PIRQ values, but was found to work in practice and it's still
-		 * better than not trying.
-		 */
-		DBG("PCI: Guessed interrupt router ID from %s\n", pirq_router_dev->slot_name);
-		rvendor = pirq_router_dev->vendor;
-		rdevice = pirq_router_dev->device;
-	}
-	for(r=pirq_routers; r->vendor; r++)
-		if (r->vendor == rvendor && r->device == rdevice)
+
+	for(r=pirq_routers; r->vendor; r++) {
+		/* Exact match against router table entry? Use it! */
+		if (r->vendor == rt->rtr_vendor && r->device == rt->rtr_device) {
+			pirq_router = r;
 			break;
-	pirq_router = r;
-	printk("PCI: Using IRQ router %s [%04x/%04x] at %s\n", r->name,
-		rvendor, rdevice, pirq_router_dev->slot_name);
+		}
+		/* Match against router device entry? Use it as a fallback */
+		if (r->vendor == pirq_router_dev->vendor && r->device == pirq_router_dev->device) {
+			pirq_router = r;
+		}
+	}
+	printk("PCI: Using IRQ router %s [%04x/%04x] at %s\n",
+		pirq_router->name,
+		pirq_router_dev->vendor,
+		pirq_router_dev->device,
+		pirq_router_dev->slot_name);
 }
 
 static struct irq_info *pirq_get_info(struct pci_dev *dev, int pin)
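The lookup order in the patch above (exact match on the BIOS routing table's "compatible router" ID wins, the ID of the device actually found is the fallback, and the "default" entry is the last resort) can be sketched in user space. This is only an illustration under assumptions: `struct router_entry`, `routers[]` and `find_router()` are made-up names, and the two-entry table stands in for the kernel's `pirq_routers[]`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct irq_router (names only). */
struct router_entry {
    const char *name;
    uint16_t vendor, device;
};

/* Terminated by a zero-vendor "default" entry, like pirq_routers[]. */
static const struct router_entry routers[] = {
    { "PIIX",    0x8086, 0x7110 },   /* Intel 82371AB */
    { "VIA",     0x1106, 0x0586 },   /* VIA 82C586 */
    { "default", 0,      0      },
};

/*
 * tbl_*: "compatible router" ID from the BIOS routing table.
 * dev_*: ID of the PCI device actually found at the router's slot.
 */
static const struct router_entry *
find_router(uint16_t tbl_vendor, uint16_t tbl_device,
            uint16_t dev_vendor, uint16_t dev_device)
{
    size_t n = sizeof(routers) / sizeof(routers[0]);
    const struct router_entry *best = &routers[n - 1];  /* default */
    const struct router_entry *r;

    for (r = routers; r->vendor; r++) {
        /* Exact match against the routing table entry: use it. */
        if (r->vendor == tbl_vendor && r->device == tbl_device)
            return r;
        /* Match against the device found: remember it as a fallback. */
        if (r->vendor == dev_vendor && r->device == dev_device)
            best = r;
    }
    return best;
}
```

With the table ID 0x8086/0x1234 reported by dump_pirq earlier in the thread and a device that identifies itself as 0x8086/0x7110, this sketch falls back to the "PIIX" entry instead of the default one.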
Re: [PATCH] semaphore fairness patch against test11-pre6
On Sun, 19 Nov 2000, Andrew Morton wrote:
> I don't see a path where David's patch can cause a lost wakeup in the way you describe.

Basically, if there are two up() calls, they might end up waking up only one process, because the same process goes to sleep twice. That's wrong. It should wake up two processes.

However, thinking about it more, that's obviously possible only for semaphores that are used for more than just mutual exclusion, and basically nobody does that anyway.

> Next step is to move the waitqueue and wakeup operations so they're inside the spinlock. Nope. That doesn't work either.
>
> Next step is to throw away the semaphore_lock and use the sem->wait lock instead. That _does_ work. This is probably just a fluke - it synchronises the waker with the sleepers and we get lucky.

Yes, especially on a two-cpu machine that kind of synchronization can basically end up hiding real bugs.

I'll think about this some more. One thing I noticed is that the "wake_up(&sem->wait);" at the end of __down() is kind of bogus: we don't actually want to wake anybody up at that point at all, it's just that if we don't wake anybody up we'll end up having "sem = 0, sleeper = 0", and when we unlock the semaphore the "__up()" logic won't trigger, and we won't ever wake anybody up. That's just incredibly bogus.

Instead of the "wake_up()" at the end of __down(), we should have something like this at the end of __down() instead:

		... for-loop ...
		}
		tsk->state = TASK_RUNNING;
		remove_wait_queue(&sem->wait, &wait);

		/* If there are others, mark the semaphore active */
		if (wait_queue_active(&sem->wait)) {
			atomic_dec(&sem->count);
			sem->sleepers = 1;
		}
		spin_unlock_irq(&semaphore_lock);
	}

which would avoid an unnecessary reschedule, and cause the wakeup to happen at the proper point, namely "__up()" when we release the semaphore.

I suspect this may be part of the trouble with the "sleepers" count playing: we had these magic rules that I know I thought about when the code was written, but that aren't really part of the "real" rules.

Linus
Re: Linux 2.4.0-test11
On Sun, 19 Nov 2000, Rich Baum wrote: The patch is in the v2.3 directory. You may want to move it to the v2.4 directory so people can find it easier. Oops. Thanks. Done. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: i386 cleanups
On Tue, 17 Apr 2001, Pavel Machek wrote:
> These are tiny cleanups you might like. sizes are "logically" long.

No.

Sizes are not "logical". They are whatever you decide they are, ie it's purely a compiler convention.

At least earlier, size_t was defined as "unsigned int" in user mode, and doing anything else would make gcc complain about clashes with its compiled-in __builtin_size_t that it uses for the builtin prototypes (ie if you had a declaration for "void *memcpy(void *dest, const void *src, size_t n);" and your size_t didn't match the gcc builtin_size_t, you'd get a "redefined with different arguments" warning or something).

Linus
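The point that size_t is whatever the compiler says it is can be checked at compile time. A small sketch, assuming GCC or Clang: `__SIZE_TYPE__` and `__builtin_types_compatible_p` are extensions of those compilers, and `compiler_size_t` and `size_t_is_unsigned()` are names made up for this example.

```c
#include <stddef.h>

/* __SIZE_TYPE__ is the GCC/Clang predefined macro naming the type the
 * compiler itself uses for sizeof results and builtin prototypes. */
typedef __SIZE_TYPE__ compiler_size_t;

/* The library's size_t has to agree with the compiler's own idea of it. */
_Static_assert(__builtin_types_compatible_p(size_t, compiler_size_t),
               "size_t from <stddef.h> must match the compiler's builtin type");

/* Whatever its width on a given target, size_t is an unsigned type. */
static int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}
```

Nothing here says whether that type is "unsigned int" or "unsigned long" on a given target; it only shows that the header has to follow the compiler, not the other way around.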
Re: light weight user level semaphores
[ Cc'd to linux-kernel, to get feedback etc. I've already talked this over with some people a long time ago, but more people might get interested ]

On Tue, 17 Apr 2001, Mike Kravetz wrote:
> In the near future, I should have some time to begin working on a prototype implementation. One thing that I don't remember too clearly is a reference you made to the System V semaphore implementation. I'm pretty sure you indicated any new light weight implementation should not be based on the System V APIs. Is this correct, or did I remember incorrectly?

It's correct.

I don't see any way the kernel can do the SysV semantics for "cleanup" for a semaphore when a process dies in an uncontrolled manner (or do it fast enough even when it can use at_exit() etc). The whole point of fast semaphores would be to avoid the kernel entry entirely for the non-contention case, which basically means that the kernel doesn't even _know_ who holds the semaphore at any given moment. So the kernel cannot do the cleanups on process exit that are part of the SysV semantics.

My personal absolute favourite "fast semaphore" implementation is as follows. First the user interface, just to make it clear that the implementation is very far from the interface:

	/*
	 * a fast semaphore is a 128-byte opaque thing,
	 * aligned on a 128-byte boundary. This is partly
	 * to minimize false sharing in the L1 (we assume
	 * that 128-byte cache-lines are going to be fairly
	 * common), but also to allow the kernel to hide
	 * data there
	 */
	struct fast_semaphore {
		unsigned int opaque[32];
	} __attribute__((aligned(128)));

	struct fast_semaphore *FS_create(char *ID);
	int FS_down(struct fast_semaphore *, unsigned long timeout);
	void FS_up(struct fast_semaphore *);

would basically be the interface. People would not need to know what the implementation is like. Add to taste (ie make rw-semaphores, etc), but the above is a kind of "fairly minimal thing". So "trydown()" would just be a FS_down() with a zero timeout, for example.

Anyway, the implementation would be roughly:

 - FS_create is responsible for allocating a shared memory region at "FS_create()" time. This is what the ID is there for: an "anonymous" semaphore would have an ID of NULL, and could only be used by threads or across a fork(): it would basically be done with a MAP_ANON | MAP_SHARED, and the pointer returned would just be a pointer to that memory.

   So FS_create() starts out by allocating the backing store for the semaphore. This can basically be done in user space, although the kernel does need to get involved for the second part of it, which is to (a) allocate a kernel "backing store" thing that contains the waiters and the wait-queues for other processes and (b) fill in the opaque 128-byte area with the initial count AND the magic to make it fly. More on the magic later.

   So the second part of FS_create needs a new system call.

 - FS_down() and FS_up() would be two parts: the fast case (no contention), very similar to what the Linux kernel itself uses. And the slow case (contention), which ends up being a system call. You'd have something like this on x86 in user space:

	extern void FS_down(struct fast_semaphore *fs, unsigned long timeout) __attribute__((regparm(3)));

	/* Four-instruction fast-path: the call plus these ones */
	FS_down:
		lock ; decl (%edx)
		js FS_down_contention
		ret
	FS_down_contention:
		movl $FS_down_contention_syscall,%eax
		int $0x80
		ret

   (Note: the regparm(3) thing makes the arguments be passed in %edx and %ecx - check me on details in which order, and realize that they will show up as arguments to the system call too because the x86 system call interface is already register-based)

   FS_up() does the same - see how the kernel already knows to avoid doing the wakeup if there has been no contention, and has a fast-path that never goes out-of-line (ie the kernel semaphore out-of-line case is the user-level system call case).

So now we get to the "subtle" part. Getting contention right.
The above causes us to get to the kernel when we have contention, and the kernel gets only a pointer to user space. In particular, it gets a pointer to memory that it cannot trust, and from that _untrusted_ pointer it needs to quickly get to the _trusted_ part, ie the part that only the kernel itself controls (the stuff with the wait-queues etc). This is where subtlety is needed. The speed concerns are paramount: I am convinced that the non-contention case is the important one, but at the same time we can't allow contention to be _too_ costly either. The system call is fairly cheap (and already acts as a first-level back-off, so that's ok), but we can't afford to
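As an aside on the fast path described in this message: a minimal portable sketch, written with C11 atomics instead of the x86 assembly, and with the contention system calls replaced by counting stubs because no such system calls exist. All names here (`struct fast_sem`, `fs_down`, `fs_up`, `fs_down_contended`, `fs_up_contended`, `slow_path_calls`) are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical user-space view of the semaphore: only the count matters here. */
struct fast_sem {
    atomic_int count;   /* > 0: free slots; < 0 after a contended down */
};

/* Counts slow-path entries so the behaviour can be observed. */
static int slow_path_calls;

/* Stand-ins for the contention system calls, which are not real. */
static void fs_down_contended(struct fast_sem *s) { (void)s; slow_path_calls++; }
static void fs_up_contended(struct fast_sem *s)   { (void)s; slow_path_calls++; }

static void fs_init(struct fast_sem *s, int initial)
{
    atomic_init(&s->count, initial);
}

static void fs_down(struct fast_sem *s)
{
    /* Like "lock ; decl ; js": the new value is negative iff the old one was <= 0. */
    if (atomic_fetch_sub(&s->count, 1) <= 0)
        fs_down_contended(s);
}

static void fs_up(struct fast_sem *s)
{
    /* An old value below zero means somebody took the slow path and may be waiting. */
    if (atomic_fetch_add(&s->count, 1) < 0)
        fs_up_contended(s);
}
```

An uncontended fs_down()/fs_up() pair on a semaphore initialised to 1 never reaches either stub, which is the property the whole design is built around.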
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Alon Ziv wrote:
> * the userspace struct was just a signed count and a file handle.

The main reason I wanted to avoid a filehandle is just because it's another name space that people already use, and that people know what the semantics are for (ie "open()" is _defined_ to return the "lowest available file descriptor", and people depend on that).

So if you use a file handle, you'd need to do magic - open it, and then use dup2() to move it up high, or something. Which has its own set of problems: just _how_ high would you move it? Would it potentially disturb an application that opens thousands of files, and knows that they get consecutive file descriptors? Which is _legal_ and well-defined in UNIX.

However, I'm not married to the secure hash version - you could certainly use another name-space, and something more akin to file descriptors. You should be aware of issues like the above, though. Maybe it would be ok to say "if you use fast semaphores, they use file descriptors and you should no longer depend on consecutive fd's".

But note how that might make it really nasty for things like libraries: can libraries use fast semaphores behind the back of the user? They might well want to use the semaphores exactly for things like memory allocator locking etc. But libc certainly can't use fd's behind people's backs.

So personally, I actually think that you must _not_ use file descriptors. But that doesn't mean that you couldn't have a more "file-descriptor-like" approach.

Side note: the design _should_ allow for "lazy initialization". In particular, it should be ok for FS_create() to not actually do a system call at all, but just initialize the count and set an "uninitialized" flag. And then the actual initialization would be done at "FS_down()" time, and only if contention happens.

Why? Note that there are many cases where contention simply _cannot_ happen. The classic one is a thread-safe library that is used both by threaded applications and by single-threaded ones, where the single-threaded one would never actually trigger contention.

For these kinds of reasons it would actually be best to try to abstract the interfaces (notably the system call interface) as much as possible, so that you can change the implementation inside the kernel without having to recompile applications that use it. So the sanest implementation might be one where

 - FS_create is a system call that just gets a 128-byte area and an ID.

 - the contention cases are plain system calls with no user-mode part to them at all.

This allows people to modify the behaviour of the semaphores later, _without_ having any real coupling between user-mode expectations and kernel implementation.

For example, if the user-mode library actually does a physical "open()" or plays games with file descriptors itself, we will -always- be stuck with the fd approach, and we can never fix it. But if you have opaque system calls, you might start out with a system call that internally just does the equivalent of the "open a file descriptor and hide it in the semaphore", and later on the thing can be changed to do whatever else without the user program ever even realizing..

Linus
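The "lazy initialization" side note can be sketched as follows. This is an illustration under assumptions, not a proposed implementation: creation touches only user memory, and a one-time setup step (a stub here, standing in for a registration system call that does not exist) runs the first time a down operation hits contention. `struct lazy_sem`, `lazy_create`, `lazy_down`, `lazy_kernel_init` and `kernel_inits` are hypothetical names.

```c
#include <stdatomic.h>

struct lazy_sem {
    atomic_int count;
    atomic_int kernel_ready;   /* 0 until the hypothetical kernel object exists */
};

/* How many times the stubbed "system call" really ran. */
static atomic_int kernel_inits;

/* Stub for the hypothetical registration system call. */
static void lazy_kernel_init(struct lazy_sem *s)
{
    int expected = 0;
    /* Only the first caller to flip the flag performs the setup. */
    if (atomic_compare_exchange_strong(&s->kernel_ready, &expected, 1))
        atomic_fetch_add(&kernel_inits, 1);
}

/* Creation is pure user-space work: no kernel entry at all. */
static void lazy_create(struct lazy_sem *s, int initial)
{
    atomic_init(&s->count, initial);
    atomic_init(&s->kernel_ready, 0);
}

/* Returns 1 if the semaphore was taken without contention, 0 otherwise. */
static int lazy_down(struct lazy_sem *s)
{
    if (atomic_fetch_sub(&s->count, 1) > 0)
        return 1;
    lazy_kernel_init(s);   /* the first contention pays for the setup */
    /* A real implementation would block in the kernel at this point. */
    return 0;
}
```

A single-threaded user of a thread-safe library would only ever take the first branch of lazy_down(), so the setup cost is never paid.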
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Abramo Bagnara wrote: [ Using file descriptors ] This would also permit: - to have poll() - to use mmap() to obtain the userspace area It would become something very near to sacred Unix dogmas ;-) No, this is NOT what the UNIX dogmas are all about. When UNIX says "everything is a file", it really means that "everything is a stream of bytes". Things like magic operations on file desciptors are _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic semantics of file descriptors is NOT Unix dogma at all, it is a horrible corruption of the original UNIX cleanlyness. Please don't excuse "semaphore file descriptors" with the "everything is a file" mantra. It is not at ALL applicable. The "everything is a file" mantra is to make pipe etc meaningful - processes don't have to worry about whether the fd they have is from a file open, a pipe() system call, opening a special block device, or a socket()+connect() thing. They can just read and write. THAT is what UNIX is all about. And this is obviously NOT true of a "magic file descriptors for semaphores". You can't pass it off as stdin to another process and expect anything useful from it unless the other process _knows_ it is a special semaphore thing and does mmap magic or something. The greatness of UNIX comes from "everything is a stream of bytes". That's something that almost nobody got right before UNIX. Remember VMS structured files? Did anybody ever realize what an absolutely _idiotic_ crock the NT "CopyFile()" thing is for the same reason? Don't confuse that with "everything should be a file descriptor". The two have nothing to do with each other. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Alan Cox wrote: can libraries use fast semaphores behind the back of the user? They might well want to use the semaphores exactly for things like memory allocator locking etc. But libc certainly cant use fd's behind peoples backs. libc is entitled to, and most definitely does exactly that. Take a look at things like gethostent, getpwent etc etc. Ehh.. I will bet you $10 USD that if libc allocates the next file descriptor on the first "malloc()" in user space (in order to use the semaphores for mm protection), programs _will_ break. You want to take the bet? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Alexander Viro wrote: Ehh... Non-lazy variant is just read() and write() as down_failed() and up_wakeup() Lazy... How about Looks good to me. Anybody want to try this out and test some benchmarks? There may be problems with large numbers of semaphores, but hopefully that won't be an issue. And the ability to select/poll on these things might come in handy for various implementation issues (ie locks with timeouts etc). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Children first in fork
In article 9bn3sr$fer$[EMAIL PROTECTED], Wichert Akkerman [EMAIL PROTECTED] wrote: What you can do is what strace does: insert a loop instruction after the fork or clone call and remove that when the call returns. You're probably even better off just intercepting the fork, turning it into a clone, and setting the CLONE_PTRACE option. Which (together with tracing the parent, which you will obviously be doing already in order to do all this in the first place) will nicely cause the child to get an automatic SIGSTOP _and_ be already traced. Not that I've tested it myself. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On 19 Apr 2001, Ulrich Drepper wrote: Linus Torvalds [EMAIL PROTECTED] writes: Looks good to me. Anybody want to try this out and test some benchmarks? I fail to see how this works across processes. It's up to FS_create() to create whatever shared mapping is needed. For threads, you don't need anything special. For fork()'d helper stuff, you'd use MAP_ANON | MAP_SHARED. For execve(), you need shm shared memory or MAP_SHARED on a file. It all depends on your needs. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Ingo Oeser wrote: Are you sure, you can implement SMP-safe, atomic operations (which you need for all up()/down() in user space) WITHOUT using privileged instructions on ALL archs Linux supports? Why do you care? Sure, there are broken architectures out there. They'd need system calls. They'd be slow. That's THEIR problem. No sane architecture has this limitation. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On Thu, 19 Apr 2001, Ingo Oeser wrote: On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote: No, this is NOT what the UNIX dogmas are all about. When UNIX says "everything is a file", it really means that "everything is a stream of bytes". Things like magic operations on file desciptors are _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic semantics of file descriptors is NOT Unix dogma at all, it is a horrible corruption of the original UNIX cleanlyness. Right. And on semaphores, this stream is exactly 0 bytes long. This is perfectly normal and can be handled by all applications I'm aware of. It's perfectly normal, but it does NOT conform to the idea "everything is a file". The fact that there are other ugly examples (ioctls and special files) does not mean that adding a new one is a good idea. When people say "everything is a file", they mean that it can be _used_ as a file, not that it can passably return a valid error code. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: light weight user level semaphores
On 19 Apr 2001, Ulrich Drepper wrote: Linus Torvalds [EMAIL PROTECTED] writes: I fail to see how this works across processes. It's up to FS_create() to create whatever shared mapping is needed. No, the point is that FS_create is *not* the one creating the shared mapping. The user is explicitly doing this her/himself. No. Who creates the shared mapping is _irrelevant_, because it ends up being entirely a function of what the chosen interface is. For example, quote often you want semaphores for threading purposes only, and then you don't need a shared mapping at all. So you'd use the proper interfaces for that, and for that, your "thread_semaphore()" function would just do a malloc() and initialize the memory to zero. Doing a mmap or something like that would just be stupid, because you're protecting only one VM space anyway. In other cases, you may need to have process-wide semaphores, and you'd use "process_semaphore(char *ID)" or something, which actually does a mmap() on a shared file. Or you'd have "fork_semaphore()" that creates a semaphore that is valid across forks, not not valid across execve's and cannot be passed around. So normally the user does NOT create the shared mapping himself. Normally you'd just use the "proper interface" for your needs, nothing more. Sure, you can have the option of saying "I've created this shared memory region, please make it use the generic semaphore engine code", but quite frankly I think that is a BAD IDEA. Why? Because it won't work portably across architectures anyway. You don't know what the requirements of the architecture are, so it should be done by a nice "semaphore library". NOT by the user. Remember: these semaphores are NOT a new SysV bogosity. These semaphores are a new interface, with sane performance and sane design. And you can have multiple external interfaces to the same "semaphore engine". I'm not interested in re-creating the idiocies of Sys IPC. 
Linus
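The split between one semaphore engine and several front-end constructors can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct fast_sem`, its single `count` field, and the use of a plain file path as the ID are hypothetical, and the engine's actual down/up operations (which would be architecture-specific) are left out. Only the names `thread_semaphore`, `process_semaphore` and `fork_semaphore` come from the message above.

```c
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS and ftruncate() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical engine state; a real one would be architecture-specific. */
struct fast_sem {
    int count;
};

/* Threads share one address space, so zeroed heap memory is enough. */
struct fast_sem *thread_semaphore(void)
{
    return calloc(1, sizeof(struct fast_sem));
}

/* Valid across fork(): an anonymous shared mapping that children inherit. */
struct fast_sem *fork_semaphore(void)
{
    void *mem = mmap(NULL, sizeof(struct fast_sem), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return mem == MAP_FAILED ? NULL : mem;
}

/* Shared between unrelated processes that agree on the same file. */
struct fast_sem *process_semaphore(const char *id)
{
    int fd = open(id, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct fast_sem)) < 0) {
        close(fd);
        return NULL;
    }
    void *mem = mmap(NULL, sizeof(struct fast_sem), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);    /* the mapping stays valid after the descriptor is closed */
    return mem == MAP_FAILED ? NULL : mem;
}
```

All three constructors hand back the same structure, so a single set of engine routines could operate on any of them; the caller never builds the mapping by hand.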
Re: Children first in fork
On Fri, 20 Apr 2001, Mark Kettenis wrote:
> I believe the 2.2.x behaviour was pretty much useless,

No. 2.2.x is not useless, it is apparently _buggy_ in this regard. Some of the fixes in the 2.3.x timeframe seem to not have made it into 2.2.x.

Linus
Re: light weight user level semaphores
In article [EMAIL PROTECTED], Olaf Titz [EMAIL PROTECTED] wrote:
> > Ehh.. I will bet you $10 USD that if libc allocates the next file descriptor on the first "malloc()" in user space (in order to use the semaphores for mm protection), programs _will_ break.
>
> Of course, but this is a result from sloppy coding.

ABSOLUTELY NOT! This is guaranteed behaviour of UNIX. You get file handles in order, or you don't get them at all. Sure, some library functions are allowed to use up file handles. But most sure as hell are NOT.

> In general, open() can just return anything and about the only case where you can even think of ignoring its result is this:
>
>	close(0); close(1); close(2);
>	open("/dev/null", O_RDWR);
>	dup(0);
>	dup(0);

Which is quite common to do. Imagine a server that starts up another process, which does exactly something like the above: the _usual_ execve() case looks something like

	pid = fork();
	if (!pid) {
		close(0);
		close(1);
		dup(pipe[0]);	/* input pipe */
		dup(pipe[1]);	/* output pipe */
		execve("child");
		exit(1);
	}

The above is absolutely _standard_ behaviour. It's required to work. And btw, it's _still_ required to work even if there happens to be a "malloc()" in between the close() and the dup() calls.

Trust me. You're arguing for clearly broken behaviour. malloc() and friends MUST NOT open file descriptors. It _will_ break programs that rely on traditional and documented features.

Linus
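The guarantee being relied on here is the POSIX rule that open() and dup() return the lowest-numbered descriptor that is not currently open. A minimal sketch of why a hidden open() in between breaks the idiom is shown below; both helper names are hypothetical, and the extra open() merely stands in for a library routine that quietly takes a descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Close descriptor 0 and immediately open /dev/null. Because open()
 * returns the lowest free descriptor, the result is 0 as long as
 * nothing else opened a file in between.
 */
int reopen_stdin_from_devnull(void)
{
    close(0);
    return open("/dev/null", O_RDWR);
}

/*
 * Same sequence, but with an intervening open() playing the part of a
 * misbehaving library call. That call now owns descriptor 0, and the
 * caller's own open() lands on some higher number instead.
 * The stray descriptor is reported through *stolen.
 */
int reopen_stdin_with_interloper(int *stolen)
{
    close(0);
    *stolen = open("/dev/null", O_RDONLY);   /* the hidden allocation */
    return open("/dev/null", O_RDWR);        /* no longer descriptor 0 */
}
```

A caller that ignores the return value in the second case would go on to treat descriptor 0 as its own, which is the kind of breakage the message is warning about.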
Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha process table hang]]]
On Fri, 20 Apr 2001, Andrea Arcangeli wrote:
> While dropping the list_empty check to speed up the fast path I faced the same complexity of the 2.4.4pre4 lib/rwsem.c and so before reinventing the wheel I read how the problem was solved in 2.4.4pre4.

I would suggest the following:

 - the generic semaphores should use the lock that already exists in the wait-queue as the semaphore spinlock.

 - the generic semaphores should _not_ drop the lock. Right now it drops the semaphore lock when it goes into the slow path, only to re-acquire it. This is due to bad interfacing with the generic slow-path routines.

   I suspect that this lock-drop is why Andrea sees problems with the generic semaphores. The changes to "count" and "sleeper" aren't actually atomic, because we don't hold the lock over them all. And re-using the lock means that we don't need the two levels of spinlocking for adding ourselves to the wait queue. Easily done by just moving the locking _out_ of the wait-queue helper functions, no?

 - the generic semaphores are entirely out-of-line, and are just declared universally as regular FASTCALL() functions.

The fast-path x86 code looks ok to me. The debugging stuff makes it less readable than it should be, I suspect, and is probably not worth it at this stage. The users of rw-semaphores are so well-defined (and so well debugged) that the debugging code only makes the code harder to follow right now.

Comments? Andrea? Your patches have looked ok, but I absolutely refuse to see the non-inlined fast-path for reasonable x86 hardware..

Linus
Re: Fix for SMP deadlock in autofs4
On Fri, 20 Apr 2001, Jeremy Fitzhardinge wrote:
> This is a fix for a potential deadlock in autofs4's expire routine.

It's wrong.

I don't think we should be able to do a mntput() _either_ inside the spinlock. The filesystem should not "know" that mntput is safe. For this reason I don't think "dput_locked()" is the right answer either.

Why are we doing the mntget/dget at all? We hold the spinlock, so we know they are not going away. Not doing the mntget/dget means that we (a) run faster and (b) don't have the bug, because we don't need to put the damn things.

Comments?

Linus
Re: Fix for SMP deadlock in autofs4
On Fri, 20 Apr 2001, Jeremy Fitzhardinge wrote:
> I kept the dget/put out of caution and ignorance, but they're clearly problematic. I'm happy to drop them if holding dcache_lock is enough to keep the tree stable while I traverse it.

How does this patch look to you people? It's untested, but looks fairly obvious. It removes the increment, and changes autofs4_expire() to properly bump the count of the returned dentry (and callers will dput() it when done). This may be unnecessarily careful, but it's the RightThing(tm) to do.

Jeremy, would you mind verifying that this WorksForYou(tm)?

Linus

diff -u --recursive --new-file pre5/linux/fs/autofs4/expire.c linux/fs/autofs4/expire.c
--- pre5/linux/fs/autofs4/expire.c	Mon Oct 23 21:57:38 2000
+++ linux/fs/autofs4/expire.c	Fri Apr 20 22:57:51 2001
@@ -98,8 +98,6 @@
 		 top, count));
 	this_parent = top;
-	count--;/* top is passed in after being dgot */
-
 	if (is_autofs4_dentry(top)) {
 		count--;
 		DPRINTK(("is_tree_busy: autofs; count=%d\n", count));
@@ -168,8 +166,6 @@
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
 	struct list_head *tmp;
-	struct dentry *d;
-	struct vfsmount *p;
 	if (!sbi->exp_timeout || !root)
 		return NULL;
@@ -208,23 +204,17 @@
 			   attempts if expire fails the first time */
 			ino->last_used = now;
 		}
-		p = mntget(mnt);
-		d = dget_locked(dentry);
-
-		if (!is_tree_busy(p, d)) {
+		if (!is_tree_busy(mnt, dentry)) {
 			DPRINTK(("autofs_expire: returning %p %.*s\n",
 				 dentry, (int)dentry->d_name.len, dentry->d_name.name));
 			/* Start from here next time */
 			list_del(&root->d_subdirs);
 			list_add(&root->d_subdirs, &dentry->d_child);
+			dget(dentry);
 			spin_unlock(&dcache_lock);
-			dput(d);
-			mntput(p);
 			return dentry;
 		}
-		dput(d);
-		mntput(p);
 	}
 	spin_unlock(&dcache_lock);
@@ -251,6 +241,7 @@
 	pkt.len = dentry->d_name.len;
 	memcpy(pkt.name, dentry->d_name.name, pkt.len);
 	pkt.name[pkt.len] = '\0';
+	dput(dentry);
 	if ( copy_to_user(pkt_p, &pkt, sizeof(struct autofs_packet_expire)) )
 		return -EFAULT;
@@ -278,6 +269,7 @@
 		de_info->flags |= AUTOFS_INF_EXPIRING;
 		ret = autofs4_wait(sbi, &dentry->d_name, NFY_EXPIRE);
 		de_info->flags &= ~AUTOFS_INF_EXPIRING;
+		dput(dentry);
 	}
 	return ret;
Re: [andrea@suse.de: Re: generic rwsem [Re: Alpha process table hang]]
On Fri, 20 Apr 2001, David Howells wrote:
> The file should only be used for the 80386 and maybe early 80486's where CMPXCHG doesn't work properly, everything above that can use the XADD implementation.

Why are those not using the generic files? The generic code is obviously more maintainable.

> But if you want it totally non-inline, then that can be done. However, whilst developing it, I did notice that that slowed things down, hence why I wanted it kept in line.

I want to keep the _fast_ case in-line.

I do not care at ALL about the stupid spinlock version. That should be the _fallback_, and it should be out-of-line. It is always going to be the slowest implementation, modulo bugs in architecture-specific code.

For i386 and i486, there is no reason to try to maintain a complex fast case. The machines are unquestionably going away - we should strive to not burden them unnecessarily, but we should _not_ try to save two cycles.

In short:

 - the only case that _really_ matters for performance is the uncontended read-lock for "reasonable" machines. An i386 no longer counts as reasonable, and designing for it would be silly. And the write-lock case is much less compelling.

 - We should avoid any inlines where the inline code is 2* the out-of-line code. Icache issues can overcome any cycle gains, and do not show up well in benchmarks (benchmarks tend to have very hot icaches). Note that this is less important for the out-of-line code in another segment that doesn't get brought into the icache at all for the non-contention case, but that should still be taken _somewhat_ into account if only because of kernel size issues.

Both of the above rules imply that the generic spin-lock implementation should be out-of-line.

> (1) asm-i386/rwsem-spin.h is wrong, and can probably be replaced with the generic spinlock implementation without inconveniencing people much. (though someone has commented that they'd want this to be inline as cycles are precious on the slow 80386).

Icache is also precious on the 386, which has no L2 in 99% of all cases. Make it out-of-line.

> (2) "fix up linux/rwsem-spinlock.h": do you want the whole generic spinlock implementation made non-inline then?

Yes. People who care about performance _will_ have architecture-specific inlines on architectures where they make sense (ie 99% of them).

Linus
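The inline-fast-path, out-of-line-slow-path split argued for above might look roughly like this in portable C11. This is a user-space sketch, not the kernel's rw-semaphore: the `rwsem_sketch` names, the `WRITER_BIAS` value and the spinning slow path are all assumptions made for illustration (a real slow path would queue the task and sleep).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define WRITER_BIAS (-0x10000)   /* hypothetical; keeps count negative while a writer holds */

struct rwsem_sketch {
    atomic_int count;            /* readers add 1; a writer adds WRITER_BIAS */
};

/* Slow path: an ordinary function, so its body is emitted only once. */
void rwsem_sketch_down_read_slow(struct rwsem_sketch *sem);

/* Fast path: one atomic add and a sign test, small enough to inline. */
static inline void rwsem_sketch_down_read(struct rwsem_sketch *sem)
{
    if (atomic_fetch_add_explicit(&sem->count, 1, memory_order_acquire) < 0)
        rwsem_sketch_down_read_slow(sem);
}

static inline void rwsem_sketch_up_read(struct rwsem_sketch *sem)
{
    atomic_fetch_sub_explicit(&sem->count, 1, memory_order_release);
}

/* Writers are the less interesting case; only a trylock is sketched. */
static inline bool rwsem_sketch_down_write_trylock(struct rwsem_sketch *sem)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&sem->count, &expected, WRITER_BIAS);
}

static inline void rwsem_sketch_up_write(struct rwsem_sketch *sem)
{
    atomic_fetch_sub_explicit(&sem->count, WRITER_BIAS, memory_order_release);
}

/* Contended case: back out the optimistic add, then retry until no writer. */
void rwsem_sketch_down_read_slow(struct rwsem_sketch *sem)
{
    atomic_fetch_sub(&sem->count, 1);
    for (;;) {
        int seen = atomic_load(&sem->count);
        if (seen >= 0 &&
            atomic_compare_exchange_weak(&sem->count, &seen, seen + 1))
            return;
    }
}
```

The uncontended read lock costs a single atomic instruction at each call site, while the contended code lives in one place and stays out of the instruction cache until it is needed.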
Re: x86 rwsem in 2.4.4pre[234] are still buggy [was Re: rwsem benchmarks [Re: generic rwsem [Re: Alpha process table hang]]]
On Sat, 21 Apr 2001, Russell King wrote:
> Erm, spin_lock()? What if up_read or up_write gets called from interrupt context (is this allowed)?

Currently that is not allowed.

We allow it for regular semaphores, but not for rw-semaphores.

We may some day have to revisit that issue, but I suspect we won't have much reason to.

Linus
Re: try_to_swap_out() deactivating pages w. count 2
In article [EMAIL PROTECTED], Rik van Riel [EMAIL PROTECTED] wrote:
> What I _am_ worried about is the fact that we do this to pages with a really high page age. These things are in active use and cannot be swapped out any time soon, yet we do claim swap space for it ...

Ehh... And if we didn't do that, then how could they ever become less active?

We should _absolutely_ do the swap space reclaiming without looking at the page count. If we don't, you will never free those pages, and I have a trivial exploit for you that will basically mlock all pages in memory.

try_to_swap_out() _absolutely_ does the right thing.

Also note how it will need to allocate the swap space backing store only once.

Linus
Re: try_to_swap_out() deactivating pages w. count 2
On Sat, 21 Apr 2001, Rik van Riel wrote:
> > We should _absolutely_ do the swap space reclaiming without looking at the page count.
>
> page->age != page->count

It's all the same thing.

The page age and count are used to decide when the page actually gets thrown _out_ of memory. That's a decision that is based on the _physical_ page attributes.

But try_to_swap_out() is based on the attribute on this particular virtual mapping of the page. If this particular virtual mapping does not have the "accessed" bit set, then try_to_swap_out() should get rid of that virtual mapping.

It should absolutely not use the global page characteristics (either global usage count or global age) in making that decision. Because those do not matter - they have absolutely no meaning for this virtual mapping of the page.

Put another way: if process A is a heavy user of a page, and process B just touched it once and will never touch it again, what do you think should happen?

Answer: the page should be dropped from process B. It's a cheap thing to do (we can get it back if necessary without any IO), and it means that if we end up having to actually swap out the page eventually, we will not be confused by "noise" in the page count from a mapping that hasn't been active for a long time.

Linus
Re: [patch] swap-speedup-2.4.3-A1, massive swapping speedup
On Mon, 23 Apr 2001, Jonathan Morton wrote:
> > There seems to be one more reason, take a look at the function read_swap_cache_async() in swap_state.c, around line 240:
> >
> >	/*
> >	 * Add it to the swap cache and read its contents.
> >	 */
> >	lock_page(new_page);
> >	add_to_swap_cache(new_page, entry);
> >	rw_swap_page(READ, new_page, wait);
> >	return new_page;
> >
> > Here we add an empty page to the swap cache and use the page lock to protect people from reading this non-up-to-date page.
>
> How about reversing the order of the calls - ie. add the page to the cache only when it's been filled? That would fix the race.

No. The page cache is used as the IO synchronization point, both for swapping and for regular IO. You have to add the page in _first_, because otherwise you may end up doing multiple IO's from different pages.

The proper fix is to do the equivalent of this on all the lookup paths that want a valid page:

	if (!PageUptodate(page)) {
		lock_page(page);
		if (PageUptodate(page)) {
			unlock_page(page);
			return 0;
		}
		rw_swap_page(page, 0);
		wait_on_page(page);
		if (!PageUptodate(page))
			return -EIO;
	}
	return 0;

This is the whole point of the uptodate flag, and for all I know we may already do all of this (it's certainly the normal setup).

Note how we do NOT block on write-backs in the above: the page will be up-to-date from the very beginning (it had _better_ be, it's hard to write back a swap page that isn't up-to-date ;).

The above is how all the file paths work.

Linus
Re: [patch] swap-speedup-2.4.3-A2
On Mon, 23 Apr 2001, Ingo Molnar wrote:
> you are right - i thought about that issue too and assumed it works like the pagecache (which first reads the page without hashing it, then tries to add the result to the pagecache and throws away the page if anyone else finished it already), but that was incorrect.

The above is NOT how the page cache works. Or if some part of the page cache works that way, then it is a BUG. You must NEVER allow multiple outstanding reads from the same location - that implies that you're doing something wrong, and the system is doing too much IO.

The way _all_ parts of the page cache should work is:

Create new page:
 - look up page. If found, return it
 - allocate new page.
 - look up page again, in case somebody else added it while we allocated it.
 - add the page atomically with the lookup if the lookup failed, otherwise just free the page without doing anything.
 - return the looked-up / allocated page.

return up-to-date page:
 - call the above to get a page cache page.
 - if uptodate, return
 - lock_page()
 - if now uptodate (ie somebody else filled it and held the lock), unlock and return.
 - start the IO
 - wait on IO by waiting on the page (modulo other work that you could do in the background).
 - if the page is still not up-to-date after we tried to read it, we got an IO error. Return error.

The above is how it is always meant to work. The above works for both new allocations and for old. It works even if an earlier read had failed (due to wrong permissions for example - think about NFS page caches where some people may be unable to actually fill a page, so that you need to re-try on failure).

The above is how the regular read/write paths work (modulo bugs). And it's also how the swap cache should work.

Linus
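The two procedures above can be modelled in user space with a small fixed-size cache. Everything in this sketch is an assumption made for illustration: the `struct page` here is a toy, a pthread mutex stands in for the page lock, the cache is a plain array, and the read is synchronous, so "start the IO" and "wait on the page" collapse into one call to a caller-supplied `read_fn`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

struct page {
    unsigned long index;
    atomic_bool uptodate;
    pthread_mutex_t lock;     /* stands in for the page lock */
    int reads_issued;         /* how many times IO was started */
};

typedef bool (*read_fn_t)(struct page *page);

static struct page *cache[CACHE_SLOTS];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* "Create new page": look up, allocate, look up again, add atomically. */
struct page *find_or_create_page(unsigned long index)
{
    if (index >= CACHE_SLOTS)
        return NULL;
    pthread_mutex_lock(&cache_lock);
    struct page *page = cache[index];
    pthread_mutex_unlock(&cache_lock);
    if (page)
        return page;

    /* Allocate without holding the cache lock. */
    struct page *fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;
    fresh->index = index;
    pthread_mutex_init(&fresh->lock, NULL);

    /* The second lookup and the insertion share one lock hold. */
    pthread_mutex_lock(&cache_lock);
    page = cache[index];
    if (!page) {
        cache[index] = fresh;
        page = fresh;
        fresh = NULL;
    }
    pthread_mutex_unlock(&cache_lock);

    if (fresh) {              /* somebody else added it first */
        pthread_mutex_destroy(&fresh->lock);
        free(fresh);
    }
    return page;
}

/* "Return up-to-date page": 0 on success, -1 on allocation or IO error. */
int read_uptodate_page(unsigned long index, read_fn_t read_fn, struct page **out)
{
    struct page *page = find_or_create_page(index);
    if (!page)
        return -1;
    if (!atomic_load(&page->uptodate)) {
        pthread_mutex_lock(&page->lock);
        if (!atomic_load(&page->uptodate)) {   /* nobody filled it meanwhile */
            page->reads_issued++;
            atomic_store(&page->uptodate, read_fn(page));
        }
        pthread_mutex_unlock(&page->lock);
        if (!atomic_load(&page->uptodate))
            return -1;        /* IO error; a later caller will retry */
    }
    *out = page;
    return 0;
}

/* Two stand-in readers for trying the sketch out. */
bool fake_read_ok(struct page *page)   { (void)page; return true; }
bool fake_read_fail(struct page *page) { (void)page; return false; }
```

Because the page is inserted before any read starts, every caller for the same index meets on the same page lock, so at most one read is outstanding at a time, and a failed read leaves the page in place for the next caller to retry.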
Re: [PATCH] Longstanding elf fix (2.4.3 fix)
On 23 Apr 2001, Eric W. Biederman wrote:
> ptrace is protected by the big kernel lock, but exec isn't so that doesn't help.
>
> Hmm. ptrace does require that the process be stopped in all cases

Right. Ptrace definitely cannot access a process at arbitrary times. In fact, it is very serialized indeed, in that it can only access a process at signal points, ie effectively when it is returning to user space.

With threads, of course, that doesn't help us. But with threads, the other threads could have caused the same page faults, so ptrace() isn't actually adding any new cases in that sense.

I'd be a lot more worried about /proc accesses.

execve() doesn't really need the mm semaphore, but on the other hand it would be cleaner to get it, and it won't really hurt (there can not be any real contention on it anyway - the only contention might come through /proc, and I haven't looked at what that might imply).

load-library should definitely get it. I thought it did already, but..

Did you have a patch? Maybe I missed it.

Linus
Re: [PATCH] rw_semaphores, optimisations try #3
On Mon, 23 Apr 2001, D.W.Howells wrote:
> Linus, you suggested that the generic list handling stuff would be faster (2 unconditional stores) than mine (1 unconditional store and 1 conditional store and branch to jump round it). You are both right and wrong. The generic code does two stores per _process_ woken up (list_del) mine does the 1 or 2 stores per _batch_ of processes woken up. So the generic way is better when the queue is an even mixture of readers or writers and my way is better when there are far greater numbers of waiting readers. However, that said, there is not much in it either way, so I've reverted it to the generic list stuff.

Note that the generic list structure already has support for batching. It only does it for multiple adds right now (see the list_splice merging code), but there is nothing to stop people from doing it for multiple deletions too. The code is something like

	static inline void list_remove_between(x,y)
	{
		x->next = y;
		y->prev = x;
	}

and notice how it's still just two unconditional stores for _any_ number of deleted entries.

Anyway, I've already applied your #2, how about a patch relative to that?

Linus
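A self-contained version of the batch removal is sketched below. The `struct list_head` here is a minimal stand-in modelled on the kernel's circular list rather than the kernel header itself, and `demo_batch_remove` is a hypothetical driver added only to exercise it.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Unlink every entry strictly between x and y. Only x and y are
 * written: two unconditional stores, however many entries go away.
 * The removed entries themselves are never read or written.
 */
static void list_remove_between(struct list_head *x, struct list_head *y)
{
    x->next = y;
    y->prev = x;
}

static size_t list_length(const struct list_head *head)
{
    size_t n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Queue five waiters, drop the first three in one batch, count the rest. */
size_t demo_batch_remove(void)
{
    struct list_head head, waiter[5];

    list_init(&head);
    for (int i = 0; i < 5; i++)
        list_add_tail(&waiter[i], &head);

    list_remove_between(&head, &waiter[3]);   /* removes waiter[0..2] */
    return list_length(&head);
}
```

Since the removed entries are never touched, it does not matter what happens to them afterwards; only the two surviving neighbours need to remain valid.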
Re: [PATCH] rw_semaphores, optimisations try #3
On Tue, 24 Apr 2001, David Howells wrote:
> Yes but the struct rwsem_waiter batch would have to be entirely deleted from the list before any of them are woken, otherwise the waking processes may destroy their rwsem_waiter blocks before they are dequeued (this destruction is not guarded by a spinlock).

Look again.

Yes, they may destroy the list, but nobody cares.

Why?

 - nobody will look up the list because we do have the spinlock at this point, so a destroyed list doesn't actually _matter_ to anybody

   You were actually depending on this earlier, although maybe not on purpose.

 - list_remove_between() doesn't care about the integrity of the entries it destroys. It only uses, and only changes, the entries that are still on the list.

Subtlety is fine. It might warrant a comment, though.

Linus
Re: rwsem benchmark [was Re: [PATCH] rw_semaphores, optimisations try #3]
On Tue, 24 Apr 2001, Andrea Arcangeli wrote:
> Again it's not a performance issue, the +a (sem) is a correctness issue because the slow path will clobber it. There must be a performance issue too, otherwise our read up/down fastpaths are the same. Which clearly they're not. I guess I'm faster because I avoid the pipeline stall using +m (sem->count) that is written as a constant, that was definitely intentional idea.

Guys. You're arguing over stalls that are (a) compiler-dependent and (b) in code that doesn't happen _anywhere_ except in the specific benchmark you're using.

Get over it.

 - The benchmark may use constant addresses. None of the kernel does. The benchmark is fairly meaningless in this regard.

 - the stalls will almost certainly depend on the code around the thing, and will also depend on the compiler version. If you're down to haggling about issues like that, then there is no real difference between the code.

So calm down guys. And improving the benchmark might not be a bad idea.

Linus
Re: BUG: Global FPU corruption in 2.2
In article [EMAIL PROTECTED], Christian Ehrhardt [EMAIL PROTECTED] wrote:
> 1.) If I'm not mistaken switch_to changes current->flags without atomic operations and without any locks and sys_ptrace changes child->flags only protected by the big kernel lock.

ptrace only operates on processes that are stopped. So there are no locking issues - we've synchronized on a much higher level than a spinlock or semaphore.

That said, it does look like 2.2.x has a real bug, and maybe the ptrace task stopping synchronization is broken..

Linus
Re: BUG: Global FPU corruption in 2.2
[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has been fixed already and there's something else going on. Worth a look ]

In article [EMAIL PROTECTED], Victor Zandy [EMAIL PROTECTED] wrote:
> Someone else here traced the process flags of a FP-intensive program on a machine before and after it is put in the faulty FPU state. He periodically sampled /proc/pid/stat while the program was running. He found that PF_USEDFPU was always set before the machine was broken. After he found that it was set about 70% of the time.

[ Looks closer at the ptrace synchronization ]

Ahh.. This actually _does_ look like a race on current->flags: PTRACE_ATTACH will do a

	child->flags |= PF_PTRACED;

without waiting for the child to have stopped.

(Aside: thinking more about the stopping logic - I'm not actually sure the ptrace synchronization is complete wrt scheduling, as there will be a window when the process has set the task state to TASK_STOPPED but hasn't actually yet scheduled away. Oh, well).

All other ptrace operations (not counting killing the child) will check that the child is quiescent. But PTRACE_ATTACH will not, as we're just setting up the stopping.

In 2.4.x, this bug doesn't happen because flags was split up into current->ptrace and current->flags. Exactly because of locking concerns.

Linus
Re: [patch] swap-speedup-2.4.3-B3
On Tue, 24 Apr 2001, Ingo Molnar wrote:
> the latest swap-speedup patch can be found at:

Please don't add more of those horrible wait arguments.

Make two different versions of a function instead. It's going to clean up and simplify the code, and there really isn't any reason to do what you're doing.

You should split up the logic differently: if you want to wait for the page, then DO so:

	page = lookup_swap_cache(..);
	if (page) {
		wait_for_swap_cache_valid(page);
		.. use page ..
	}

Note how much more readable and UNDERSTANDABLE the above is, compared to

	page = lookup_swap_cache(..., 1);
	if (page) {
		...

and note also how splitting up the waiting will

 - simplify the swap cache lookup function, making it faster for people who do _NOT_ want to wait.

 - make it easier to statically check the correctness of programs by just eye-balling them (Hey, he's calling 'wait' with the spinlock held).

 - more easily moving the wait around, allowing for more concurrency.

Basically, I don't want to mix synchronous and asynchronous interfaces. Everything should be asynchronous by default, and waiting should be explicit.

Linus
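The same separation, a lookup that never waits plus a distinct and explicit wait, can be sketched in user space with a mutex and condition variable. All names here (`cache_entry`, `lookup_entry`, `wait_for_entry_valid`, `mark_entry_valid`) are hypothetical and chosen only to mirror the shape of the interface being argued for; this is not the swap cache code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct cache_entry {
    int key;
    bool valid;                    /* protected by lock */
    pthread_mutex_t lock;
    pthread_cond_t became_valid;
};

/* Asynchronous half: finds the entry or not, and never sleeps on IO. */
struct cache_entry *lookup_entry(struct cache_entry *table, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].key == key)
            return &table[i];
    return NULL;
}

/* Synchronous half: the only place a caller can block, and it says so. */
void wait_for_entry_valid(struct cache_entry *entry)
{
    pthread_mutex_lock(&entry->lock);
    while (!entry->valid)
        pthread_cond_wait(&entry->became_valid, &entry->lock);
    pthread_mutex_unlock(&entry->lock);
}

/* Called by whoever finishes filling the entry. */
void mark_entry_valid(struct cache_entry *entry)
{
    pthread_mutex_lock(&entry->lock);
    entry->valid = true;
    pthread_cond_broadcast(&entry->became_valid);
    pthread_mutex_unlock(&entry->lock);
}
```

A caller that does not care about validity just uses `lookup_entry()`; one that does adds the `wait_for_entry_valid()` call, which makes a wait made while holding some other lock easy to spot when reading the code.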
Re: orinoco_cs IrDA
On Tue, 24 Apr 2001, Jean Tourrilhes wrote:
> I've got a question... I would like where to send my driver patches...

Probably both me and Alan.

[ General rules follow. Too few people seem to have seen them before ]

Most importantly, when sending patches to me:

 - specify clearly that you really want to see them in the standard kernel, and why. I occasionally get patches that just say this is a good idea. I don't apply them. Especially if they are cc'd to somebody else too, in which case I pretty much assume that it's a RFC, not a real patch.

 - do NOT send patches in attachments. Send one patch per mail, in clear-text under your message, so that I can easily see the patch and decide then-and-there whether it looks ok. And if it doesn't look ok, and I do a reply, the patch gets included in the reply so that I can point out which part of the patch I dislike.

   Don't worry about sending me five emails. That's FINE. I much prefer seeing five consecutive emails from the same person with five distinct subject lines and five distinct patches, than seeing one email with five attachments to it.

 - if your email system is broken, and you want to send patches as attachments to avoid whitespace damage, then please FIX YOUR EMAIL SYSTEM INSTEAD.

 - Don't point to web-sites. If I have to move the mouse outside my email xterm to work on the email, your email just got ignored.

 - Make your patches one sub-directory under the source tree you're working on. In short, your patches should look something like

	--- clean/fs/inode.c ...
	+++ linux/fs/inode.c ..
	@@ -179,7 +179,7 @@
	...

   so that I can (regardless of where my source tree is) apply them with patch -p1 from my linux top directory. Then I can just do a

	cd v2.4/linux
	patch -p1 < ~/multiple-emails-with-multiple-accepted-patches

   and not have to worry about three patches being based on /usr/src/linux, while two others not having a path at all and being individual filenames in linux/drivers/net.

 - and finally: re-send. If I had laser-eye surgery the day you sent the patches, I won't have applied them. If I took a day off and spent it with the kids at the pool instead, I won't have applied them. If I decided that this weekend I'm not going to read email for a change, I won't have applied them.

   And when I come back to work a day or two later, I will have several hundred other emails to work through. I never go backwards in my emails.
Re: [patch] swap-speedup-2.4.3-B3 (fwd)
On Thu, 26 Apr 2001, Mike Galbraith wrote:
> 2.4.4.pre7.virgin
> real    11m33.589s
> user    7m57.790s
> sys     0m38.730s
>
> 2.4.4.pre7.sillyness
> real    9m30.336s
> user    7m55.270s
> sys     0m38.510s

Well, I actually like parts of this. The always swap out current mm one looks rather dangerous, and the lastscan jiffy thing is too ugly for words, but refill_inactive() looks much nicer. There is beauty in simplicity. The page aging in drop_pte feels pretty harsh, though.

Have you looked at free_pte()? I don't like that function, and it might make a difference. There are several small nits with it:

 - it should probably try to deactivate the page. If drop_pte does that when it deactivates a page involuntarily, why not do it for a real we just free'd the page voluntarily?

 - swap-cache pages should probably not just be de-activated, but actively aged down. Right now, they are neither, so we have to work all the way through refill_inactive() and then page_launder() to clear them out. Even though the page may be entirely useless by now (we had a complex special case that caught and short-circuited some of the pages, and maybe it was worth it. But maybe the right thing is to just age them down and naturally deactivate them?)

   After all, we aged them up for references to this virtual mapping, and free_pte() just made it go away. Unlike normal page cache pages, we don't get any advantage from trying to cache the things across multiple VM's.

 - we're dropping the accessed bit on the floor. In the vmscan case the accessed bit would have aged the page up.

On the other hand, to offset some of these, we actually count the page accessed _twice_ sometimes: we count it on lookup, and we count it when we see the accessed bit in vmscan.c. Which results in some pages getting aged up twice for just one access if we go through the vmscan logic, while if we just map and unmap them they get counted just once.

Obviously the page aging logic seems to be making a noticeable difference to you. So looking at page aging logic issues in the bigger picture might be worthwhile - not just staring at the actual swap-out code. The fact is, the swap-out-code cannot get the aging right if the rest of the system ignores it or does it only for some cases.

I _think_ the logic should be something along the lines of: freeing the page amounts to an implied down-aging of the page, but the 'accessed' bit would have aged it up, so the two take each other out. But if so, the free_pte() logic should have something like

	if (page->mapping) {
		if (!pte_young(pte) || PageSwapCache(page))
			age_page_down_ageonly(page);
		if (!page->age)
			deactivate_page(page);
	}

instead of just ignoring these issues completely. Comments?

Linus
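The proposed free_pte() rule can be tried out on a toy model. The structure and helpers below are stand-ins written for this sketch, not kernel code: `struct toy_page` holds only the fields the rule looks at, and halving the age in `age_page_down_ageonly()` is an assumption about how down-aging behaves.

```c
#include <stdbool.h>

/* Toy stand-in for the fields the rule inspects. */
struct toy_page {
    int age;
    bool active;
    bool has_mapping;    /* models page->mapping != NULL */
    bool swap_cache;     /* models PageSwapCache(page) */
};

/* Assumed behaviour: down-aging halves the age. */
void age_page_down_ageonly(struct toy_page *page)
{
    page->age /= 2;
}

void deactivate_page(struct toy_page *page)
{
    page->active = false;
}

/*
 * Freeing a mapping implies one step of down-aging, and a set accessed
 * bit cancels that step, except for swap-cache pages, which are aged
 * down regardless. A page whose age reaches zero is deactivated.
 */
void toy_free_pte(struct toy_page *page, bool pte_young)
{
    if (!page->has_mapping)
        return;
    if (!pte_young || page->swap_cache)
        age_page_down_ageonly(page);
    if (!page->age)
        deactivate_page(page);
}
```

In this model an unreferenced page with age 1 drops to 0 and is deactivated at once, a recently accessed page-cache page keeps its age, and a swap-cache page is aged down even when its accessed bit was set.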