Re: Patch 4/6 randomize the stack pointer
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Alright, I'll bite. Someone told me to bring this up after reading all the complaints about breakage, so again we get back to PaX. I'm more interested in "this patch is bad" than "PaX is better" for this argument, but whatever. Compatibility has been repeatedly mentioned here. Breaking things has been mentioned. Things inside the distro won't break because the distro maintainers mark them; third party vendors should mark them too. If they don't, they STILL won't break, provided the distribution is crafted to do the really ugly things I hate in order to enter ultimate-global-super-compatibility-mode. Last year I started on my master's thesis for computer security. Granted, next semester I'm *hoping* to get my AS in computer science, but I wanted to start writing early. So out of the 18 pages, I'll pull one little bit from section 4 (Deployment) subsection 2 (Executable Space Protections). This'll be a big read. - ---CUT--- 2. Executable Space Protections Executable Space Protections can be deployed on many architectures using PaX. A number of methods of deployment could be used, each with its own trade-off between security and compatibility. The recommended course of action is to allow the administrator to control how protections are applied, either by setting an automatic default method or by being asked where protections should be applied on a case by case basis. Any binary which may function under full restrictions should be set to function under full restrictions automatically, without asking. There may be an option to ask the administrator in every case including those where the greatest security is used by default; but in most cases, the administrator will not want to be bothered unless a security concern is raised. There are three states for restrictions. In the Default state, the restriction is not explicitly enabled or disabled; PaX decides whether to use the restriction based on the Softmode setting.
If the system is in Softmode, PaX does not enable restrictions in the Default state; if the system is not in Softmode, PaX enables restrictions in the Default state. By contrast, restrictions in the Enabled state are enabled under PaX regardless of Softmode, while restrictions in the Disabled state are disabled under PaX regardless of Softmode. Here, the term "compatibility" is used to indicate how much software continues to work. A system with low compatibility will have software that does not run due to security restrictions, while a system with high compatibility will run most if not all software, including third party software. There are four basic methods of PaX flag control, each detailed briefly below. As stated above, the administrator should choose which method to employ. A. Manual Control Manual Control is not recommended as a default. Under Manual Control, all restrictions remain in the Default state on all binaries at installation time. This imposes the most added administrative duty and the least compatibility. B. Selective Disable Selective Disable is the most basic form of control, allowing the implementation to ship with everything working. Under Selective Disable, binaries known to break due to PaX restrictions have those restrictions set to the Disabled state when installed, leaving the rest in the Default state. This relieves most administrative duty and increases compatibility, although third party binaries may not come marked. C. Inheritive Selective Disable Inheritive Selective Disable is similar to Selective Disable, except that libraries are also marked and tabs are kept on these. When software is installed which uses a library, the Disabled features of the executable and each library are masked together to come up with the final mask to apply to the executable.
These masks can later be generated for third party programs with an administrative tool in order to enhance compatibility further, although third party programs and libraries that need markings of their own, beyond those already required by the libraries they use, will still break. D. Selective Enable Selective Enable is the only method leveraging Softmode to enhance compatibility. It is also the only method which will leave third party binaries completely exposed for no reason other than that they are not explicitly packaged with a set of listed restrictions. Under Selective Enable, executable binaries have all restrictions except those known to break them set to Enabled, leaving the rest in the Default state. Third party binaries which come with no markings will have no restrictions in Softmode, and so full compatibility is reached with the maximum justifiable trade-off in the range of executables protected by PaX. The above methods become progressively more compatible, but at the same time less secure. Both the standard and Inheritive variations of the Selective Disable method are about on par in principle;
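The three restriction states and the mask combination described in the thesis excerpt can be sketched in a few lines of C. This is only an illustration under stated assumptions: the enum, the `DEMO_*` flag bits, and both function names are hypothetical and are not PaX's actual interface, and "masked together" is read here as a bitwise OR of the Disabled masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three per-restriction states described in the text. */
enum restriction_state { STATE_DEFAULT, STATE_ENABLED, STATE_DISABLED };

/* Decide whether a restriction is actually applied.
 * Enabled and Disabled ignore Softmode; Default follows it. */
bool restriction_applies(enum restriction_state state, bool softmode)
{
    switch (state) {
    case STATE_ENABLED:
        return true;
    case STATE_DISABLED:
        return false;
    default:
        return !softmode;   /* Default: on unless the system is in Softmode */
    }
}

/* Hypothetical flag bits, one per restriction, for illustration only. */
#define DEMO_NOEXEC   (1u << 0)
#define DEMO_MPROTECT (1u << 1)
#define DEMO_RANDMMAP (1u << 2)

/* Inheritive Selective Disable, read as: the executable ends up with every
 * restriction disabled that it or any of its libraries needs disabled.
 * Treating "masked together" as a bitwise OR is an assumption. */
uint32_t combined_disable_mask(uint32_t exe_mask,
                               const uint32_t *lib_masks, size_t nlibs)
{
    uint32_t mask = exe_mask;

    assert(lib_masks != NULL || nlibs == 0);
    for (size_t i = 0; i < nlibs; i++)
        mask |= lib_masks[i];
    return mask;
}
```

Under this reading, an unmarked third party binary has every restriction in the Default state, so whether it is protected depends only on the Softmode setting.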
Re: [PATCH] OpenBSD Networking-related randomization port
> It adds support for advanced networking-related randomization, in > concrete it adds support for TCP ISNs randomization Er... did you read the existing Linux TCP ISN generation code? Which is quite thoroughly randomized already? I'm not sure how the OpenBSD code is better in any way. (Notice that it uses the same "half_md4_transform" as Linux; you just added another copy.) Is there a design note on how the design was chosen? I don't wish to be *too* discouraging to someone who's *trying* to help, but could you *please* check a little more carefully in future to make sure it's actually an improvement? I fear there's some ignorance of what the TCP ISN does, why it's chosen the way it is, and what the current Linux algorithm is designed to do. So here's a summary of what's going on. But even as a summary, it's pretty long... First, a little background on the selection of the TCP ISN... TCP is designed to work in an environment where packets are delayed. If a packet is delayed enough, TCP will retransmit it. If one of the copies floats around the Internet for long enough and then arrives long after it is expected, this is a "delayed duplicate". TCP connections are between (host, port, host, port) quadruples, and packets that don't match some "current connection" in all four fields will have no effect on the current connection. This is why systems try to avoid re-using source port numbers when making connections to well-known destination ports. However, sometimes the source port number is explicitly specified and must be reused. The problem then arises, how do we avoid having any possible delayed packets from the previous use of this address pair show up during the current connection and confuse the heck out of things by acknowledging data that was never received, or shutting down a connection that's supposed to stay open, or something like that? First of all, protocols assume a maximum packet lifetime in the Internet.
The "Maximum Segment Lifetime" was originally specified as 120 seconds, but many implementations optimize this to 60 or 30 seconds. The longest time that a response can be delayed is 2*MSL - one delay for the packet eliciting the response, and another for the response. In truth, there are few really-hard guarantees on how long a packet can be delayed. IP does have a TTL field, and a requirement that a packet's TTL field be decremented for each hop between routers *or each second of delay within a router*, but that latter portion isn't widely implemented. Still, it is an identified design goal, and is pretty reliable in practice. The solution is twofold: First, refuse to accept packets whose acks aren't in the current transmission window. That is, if the last ack I got was for byte 1000, and I have sent 1100 bytes (numbers 0 through 1099), then if the incoming packet's ack isn't somewhere between 1000 and 1100, it's not relevant. If it's 950, it might be an old ack from the current connection (which doesn't include anything interesting), but in any case it can be safely ignored, and should be. The only remaining issue is, how to choose the first sequence number to use in a connection, the Initial Sequence Number (ISN)? If you start every connection at zero, then you have the risk that packets from an old connection between the same endpoints will show up at a bad time, with in-range sequence numbers, and confuse the current connection. So what you do is, start at a sequence number higher than the last one used in the old connection. Then there can't be any confusion. But this requires remembering the last sequence number used on every connection ever. And there are at least 2^48 addresses allowed to connect to each port on the local machine. At 4 bytes per sequence number, that's a Petabyte of storage... 
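The acknowledgment window test and the storage estimate above can be written out as a short C sketch. The function names are made up for illustration, and `snd_una` / `snd_nxt` are borrowed from RFC 793's SND.UNA and SND.NXT; the comparison uses unsigned subtraction so that it keeps working when sequence numbers wrap around 2^32, which is an implementation detail the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept an incoming ACK only if it lies between the oldest unacknowledged
 * byte (snd_una) and the next byte to be sent (snd_nxt), inclusive.
 * Unsigned subtraction makes the test correct modulo 2^32. */
bool ack_in_window(uint32_t snd_una, uint32_t snd_nxt, uint32_t ack)
{
    return (uint32_t)(ack - snd_una) <= (uint32_t)(snd_nxt - snd_una);
}

/* Storage needed to remember one 4-byte sequence number for every possible
 * remote (address, port) pair: 2^32 addresses * 2^16 ports * 4 bytes. */
uint64_t remember_everything_bytes(void)
{
    const uint64_t peers = (uint64_t)1 << 48;

    return peers * 4;
}

/* The worked example from the text: last ACK was 1000, 1100 bytes sent. */
void ack_window_example(void)
{
    assert(ack_in_window(1000, 1100, 1050));   /* relevant */
    assert(!ack_in_window(1000, 1100, 950));   /* stale, safely ignored */
}
```

The storage figure comes out to 2^50 bytes, which is the petabyte mentioned in the message.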
Well, first of all, after 2*MSL, you can forget about it and use whatever sequence number you like, because you know that there won't be any old packets floating around to crash the party. But still, it can be quite a burden on a busy web server. And you might crash and lose all your notes. Do you want to have to wait 2*MSL before rebooting? So the TCP designers (I'm now on page 27 of RFC 793, if you want to follow along) specified a time of day based ISN. If you use a clock to generate an ISN which counts up faster than your network connection can send data (and thus crank up its sequence numbers), you can be sure that your ISN is always higher than the last one used by an old connection without having to remember it explicitly. RFC 793 specifies a 250,000 bytes/second counting rate. Most implementations since Ethernet used a 1,000,000 bytes/second counting rate, which matches the capabilities of 10base5 and 10base2 quite well, and is easy to get from the gettimeofday() call. Note that there are two risks with this. First, if the connection actually manages to go faster than the ISN clock, the next connection's ISN will be in the middle of the space the
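A clock-driven ISN of the kind described here can be sketched as follows. This is a minimal illustration, not the Linux or OpenBSD generator: the function names and the microsecond time source are assumptions, and no randomization is included. It only shows how a counting rate turns elapsed time into a 32-bit sequence number and how soon that counter wraps.

```c
#include <assert.h>
#include <stdint.h>

#define ISN_RATE_RFC793  250000u   /* counting rate quoted from RFC 793 */
#define ISN_RATE_COMMON 1000000u   /* rate used by most later implementations */

/* Turn a microsecond timestamp into an ISN that advances ticks_per_sec
 * times per second. The result is taken modulo 2^32 by the final cast. */
uint32_t clock_isn(uint64_t usec, uint32_t ticks_per_sec)
{
    uint64_t secs = usec / 1000000u;
    uint64_t rem = usec % 1000000u;

    assert(ticks_per_sec > 0);
    /* Split into whole seconds and remainder to keep the products small. */
    return (uint32_t)(secs * ticks_per_sec + rem * ticks_per_sec / 1000000u);
}

/* Whole seconds until a 32-bit ISN clock running at this rate wraps. */
uint32_t isn_wrap_seconds(uint32_t ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return (uint32_t)((((uint64_t)1) << 32) / ticks_per_sec);
}
```

With the 1,000,000 per second rate the microsecond value from gettimeofday() can be used almost directly, which is the convenience the message refers to.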
Help me understand when the Ethernet header is added to a packet by the eth_header function
Hello,

Can anybody explain to me how the Ethernet header is added to every outgoing packet? I checked eth.c and found eth_header, which is used to add the Ethernet header to an skbuff packet. But does each packet call this function? I think not, as there is a header cache function that caches the Ethernet header entry.

So my main question is: when my machine first contacts another PC on the LAN, does it call eth_header, and when it later needs to send any type of packet to the same machine, is eth_header_cache used instead? Is that right? And if my machine then contacts a machine that has no cached entry, will eth_header be called again to build the Ethernet header for that machine?

In all, when are the functions in eth.c (eth_header, eth_header_cache, eth_header_parse, eth_header_cache_update) called and when are they not? Please kindly help me to identify it.

Thanks in advance.

regards,
linux_lover.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: userspace vs. kernelspace address
On Fri, Jan 28, 2005 at 01:40:51PM -0800, Rock Gordon wrote:
> Hi everybody,
>
> Thanks for your replies.
>
> However I think my copy_to_user and copy_from_user are
> failing since the kernel-mode thread is copying data
> into another process's address space, and I am not
> sure how to do this. Do the get_fs() and set_fs()
> combinations let you do that? If not, then how do I do

My knowledge of kernel threads is limited, but I think it is not possible to access a userspace address from a kernel thread, because kernel threads have no user address space of their own: their task_struct->mm field is NULL. I am not sure whether set_fs and get_fs help in this case.

HTH,
Om
Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature
Peter Williams <[EMAIL PROTECTED]> writes:
>
> If the average usage rate is estimated over longer periods it will be
> lower allowing lower limits to be used. Also if the task's own usage
> rate estimates are used to test the limits then the limit can be lower.
>
> If the default limits can be made sufficiently small then the
> temptation to use this feature by "ordinary" applications will
> disappear.
>
> I'm not an expert but I imagine that the CPU usage rates of most RT
> tasks taken over reasonably long time intervals is quite low and
> therefore the default limits could also be quite low without adversely
> affecting the programs that this mechanism is meant to help.

True for some, but definitely not for all.

When a system was purchased specifically to do some realtime job, it often makes sense to dedicate large chunks of the main processor to realtime number crunching. Mass-produced general-purpose processors have excellent price/performance ratios. There's no good reason not to take advantage of that.

People commonly run heavy Fast Fourier Transform or reverb calculations in realtime threads. They may use up as much of the CPU as the user/owner is willing to allocate. With soft realtime, it's hard to push this reliably beyond about 70-80%. But, those numbers are definitely practical.

--
joq
Re: [PATCH] OpenBSD Networking-related randomization port
Stephen Hemminger <[EMAIL PROTECTED]> writes:
> On Fri, 28 Jan 2005 12:45:17 -0800
> "David S. Miller" <[EMAIL PROTECTED]> wrote:
>
>> On Fri, 28 Jan 2005 21:34:52 +0100
>> Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> wrote:
>>
>> > Attached the new patch following Arjan's recommendations.
>>
>> No SMP protection on the SBOX, better look into that.
>> The locking you'll likely need to add will make this
>> routine serialize many networking operations which is
>> one thing we've been trying to avoid.
>>
>
> per-cpu would be the way to go here.

I don't think so, no. With plain per-CPU counters you risk nearby duplicates, which can cause even easier data corruption (e.g. during IP fragment reassembly, which is already very weak; making it weaker is probably not a good idea).

If you want SMP performance for ipids you can resurrect the old "cookie jar" approach I used in the 2.4 time frame to get rid of inetpeers. The idea was that you have global state, and each CPU would generate some numbers from that state, store them in a private "jar" and use them up, then look at the global state with locking again, etc. This can be tuned by how big the jar is: the bigger the jar, the smaller the sequence space (risky for 16bit ipids), but the better the scalability.

But before doing anything like this I would recommend that someone skilled in cryptography evaluates the security of these functions carefully and sees if it actually has any advantages. I remember that Andrey S. broke some of the "cool" "secure" openbsd IDs easily some years ago.

At least for ipids I'm utterly sceptical. 16 bits are just hopeless.

-Andi
Re: compat ioctl for submitting URB
Christopher Li <[EMAIL PROTECTED]> writes:
> VMware is a big user of the usbdevfs, we translate guest USB
> IO to usbdevfs, by submitting URB. On the x86_64 system, we
> need those compatible ioctl for submitting URBs. For now we
> make a hack to submit it through the vmmon driver. But that
> is very ugly.
>
> I do want this problem get fixed in the linux kernel eventually.
> I have been toying with two different ways to solve it. It seems
> that it is unavoidable to get hands dirty in the usbdevfs internals.
> The first one is just educate the usbdevfs to know about the 32 bit
> URB ioctls. So it don't need to keep around a bounce buffer.

Looks reasonable at first glance. Issues:
- Should use CONFIG_COMPAT, not x86-64 specific symbols
- Why can't you set URB_COMPAT transparently in the emulation layer? Then existing applications would hopefully work without changes, right?

You may also want to preserve the __user casts, otherwise Al Viro and other sparse users will be unhappy.

Thanks for attacking this long standing problem.

-Andi
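What "teaching usbdevfs about the 32-bit URB ioctls" amounts to is translating a structure whose pointer fields arrive as 32-bit integers into the native layout. The sketch below shows that translation in plain userspace C. The `demo_*` structures are simplified, hypothetical stand-ins and not the real usbdevfs URB definition; `demo_compat_ptr()` mimics what the kernel's `compat_ptr()` helper does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit caller hands over: user pointers travel as 32-bit values.
 * Hypothetical, simplified layout for illustration only. */
struct demo_urb32 {
    uint8_t  type;
    uint8_t  endpoint;
    int32_t  status;
    uint32_t flags;
    uint32_t buffer;        /* 32-bit user pointer */
    int32_t  buffer_length;
    uint32_t usercontext;   /* 32-bit user pointer */
};

/* The native layout the rest of the driver works with. */
struct demo_urb {
    uint8_t  type;
    uint8_t  endpoint;
    int32_t  status;
    uint32_t flags;
    void    *buffer;
    int32_t  buffer_length;
    void    *usercontext;
};

/* Widen a 32-bit user pointer value to a native pointer. */
void *demo_compat_ptr(uint32_t uptr)
{
    return (void *)(uintptr_t)uptr;
}

/* Field-by-field translation; only the pointer fields change width. */
void demo_urb_from_32(struct demo_urb *dst, const struct demo_urb32 *src)
{
    assert(dst != NULL && src != NULL);
    dst->type = src->type;
    dst->endpoint = src->endpoint;
    dst->status = src->status;
    dst->flags = src->flags;
    dst->buffer = demo_compat_ptr(src->buffer);
    dst->buffer_length = src->buffer_length;
    dst->usercontext = demo_compat_ptr(src->usercontext);
}
```

Nothing in this translation is specific to x86-64, which is why guarding it with the generic CONFIG_COMPAT symbol, as suggested above, lets every 64-bit architecture with a 32-bit userland share it.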
Re: Patch 4/6 randomize the stack pointer
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Rik van Riel wrote: > On Thu, 27 Jan 2005, John Richard Moser wrote: > >> Arjan van de Ven wrote: > > Is this one any worse? >>> >>> yes. >>> >>> oracle, db2 and similar like to mmap 2Gb or more *in one chunk*. >> >> >> Special case? > > > Absolutely, but ... > >> Can I get this put into perspective? How much more important is "Good" >> randomization versus "not breaking Oracle," which becomes "No >> randomization" > > > 1) quite a lot of Linux users do use Oracle, DB2 or do >scientific calculations - distributions cannot afford >to break those applications, the default has to work >for everybody > So package oracle marked to not use the randomization. > 2) "weaker" randomization (2MB) is still effective if the >stack is non-executable, so the "load a bunch of NOPs" >approach won't work - this is what Fedora and RHEL use > "In some cases, this does nothing, so we'll leverage those cases as an argument for why this should go in, even though we're effectively saying 'please add useless junk to the kernel'" No dear, please, real ASLR has a point, try not to castrate it. > 3) it is not as theoretically strong as what you propose, >but having the "weaker" scheme enabled is definitely >more secure than having the "stronger" scheme disabled >because it breaks applications > *takes the glass pipe away* Well, I'm going to give random constructive criticism on Red Hat as a whole now, so try learning something from it instead of taking it as flamebait. I just ate and feel particularly like talking for no reason about half-relevant topics. I actually just tried to paxtest a fresh Fedora Core 3, unadulterated, that I installed, and it FAILED every test. After a while, spender reminded me about PT_GNU_STACK. It failed everything but the Executable Stack test after execstack -c *. The randomization tests gave 13(heap-etexec), 16(heap-etdyn), 17(stack), and none for main exec (etexec,et_dyn) or shared library randomization.
Also, before you say it, I read, comprehended, and analyzed the source. This was PaXtest 0.9.6, and I did specific traces (after changing body.c to prevent it from forking) to look for mprotect() and mmap() calls and find out what they do (I saw probably glibc getting mmap()ed in, there wasn't anything in the source doing the mmap() calls I saw). There were no dirty tricks to mprotect() a high area of memory, which is something Ingo called foul on in 0.9.5. My point isn't that ES failed (the above discourse was to preempt Ingo calling a technical foul on paxtest again); but that I forgot about PT_GNU_STACK. How many vendors are going to forget about PT_GNU_STACK and its automatic markings and think they're protected? Do they even know/care? "it works so we'll just keep doing what we're doing, if we break the protection it'll adjust to let us" is a pretty good strategy to a lot of people who don't want to be assed with your security crap. Another concern of mine, execstack gives X for PT_GNU_STACK and - for cleared PT_GNU_STACK. With many binaries I get shipped (flash and java plug-ins), there's a ? when I check them, so I clear the flag and they work. Note that I'm referring to the Java PLUG-IN, not the JRE itself; you can have full PaX restrictions on Firefox and have working Java in Firefox, because java_vm is a separate process :) (you have to chpax java itself). Firefox happens to be a high-risk application too IMHO (it's pointed at the net and exposes Gecko bugs for HTML and Javascript parsing, libjpeg and libpng bugs, and God knows what else), and I don't want it accidentally getting an executable stack. Finally, although an NX stack is nice, you should probably take into account IBM's stack smash protector, ProPolice. Any attack that can evade SSP reliably can evade an NX stack; but ProPolice protects from other overflows.
Now I'm sure RH is over there inventing something that detects buffer overflows at compile time and misses or warns about the ones it can't identify:

    if (strlen(a) > 4)
        a[4] = '\0';    /* truncate to 4 chars so a plus its NUL fits in b[5] */
    foo(a);

    void foo(char *a)
    {
        char b[5];
        strcpy(b, a);
    }

This code is safe, but you can't tell from looking at foo(). You don't get a look at every other object being compiled against this one that may call foo() either. So compile time buffer overflow detection is a best-effort at best. ProPolice protects local variables with 0 overhead; passed arguments with a few instructions; and the return pointer and stack frame pointer with a couple instructions. At runtime. Want to impress me? Actually deploy ProPolice instead of showing up 3 years from now waving around your own patch that you wrote that half-implements half of it. If you want "something better," it's GPL, so grab it and start hacking. Anyway, that's my far-far-far offtopic rant for the day. - -- All content of all messages exchanged herein are left in the Public Domain, unless otherwise explicitly
Re: compat ioctl for submitting URB
Christopher Li <[EMAIL PROTECTED]> writes:
> This patch is for the case that running 32 bit application on
> a 64 bit kernel. So far only x86_64 allow you to do that.
>
> I am not aware of other 64bit architecture need the 32bit
> emulation.

A lot of them do. Just use CONFIG_COMPAT instead.

-Andi
Re: [Discuss][i386] Platform SMIs and their interference with tsc based delay calibration
Venkatesh Pallipadi <[EMAIL PROTECTED]> writes:
> +
> +    /*
> +     * If the upper limit and lower limit of the tsc_rate is more than
> +     * 12.5% apart.
> +     */
> +    if (pre_start == 0 || pre_end == 0 ||
> +        (tsc_rate_max - tsc_rate_min) > (tsc_rate_max >> 3)) {
> +        printk(KERN_WARNING "TSC calibration may not be precise. "
> +               "Too many SMIs? "
> +               "Consider running with \"lpj=\" boot option\n");
> +        return 0;
> +    }

I think it would be better to rerun it a few times automatically before giving up. This way it would hopefully work transparently, though more slowly, for most users.

The message is also too obscure to be usable and needs more explanation.

Also, in case the platforms in question support EM64T, x86-64 would need to be changed too :)

-Andi
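The suggestion to rerun the calibration a few times before giving up could look roughly like the following. This is a hedged userspace sketch, not the patch's code: the function names, the callback type, and the choice to return the midpoint of the two bounds are all assumptions, and `demo_attempt()` is a fake measurement that exists only so the retry path can be exercised. The spread test itself mirrors the quoted condition, where `tsc_rate_max >> 3` is one eighth (12.5%) of the upper bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Same test as the quoted hunk: bounds more than 12.5% of rate_max apart. */
bool tsc_spread_too_large(unsigned long rate_min, unsigned long rate_max)
{
    assert(rate_min <= rate_max);
    return (rate_max - rate_min) > (rate_max >> 3);
}

/* One calibration attempt: fills in both bounds, returns false on failure. */
typedef bool (*calibrate_attempt_fn)(unsigned long *rate_min,
                                     unsigned long *rate_max);

/* Retry up to max_tries times; return an estimate, or 0 if every attempt
 * was too noisy. Using the midpoint as the estimate is an assumption. */
unsigned long calibrate_with_retries(calibrate_attempt_fn attempt, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned long lo, hi;

        if (attempt(&lo, &hi) && !tsc_spread_too_large(lo, hi))
            return lo + (hi - lo) / 2;
    }
    return 0;
}

/* Fake measurement for demonstration: the first two attempts are disturbed
 * (as if by SMIs), the third is clean. */
int demo_calls;

bool demo_attempt(unsigned long *rate_min, unsigned long *rate_max)
{
    demo_calls++;
    *rate_min = 1000;
    *rate_max = (demo_calls < 3) ? 1400 : 1040;
    return true;
}
```

Only when every attempt fails would the caller fall back to printing the warning, so most users would see a slightly slower boot instead of a message.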
Re: Compactflash (Sandisk 512) hangs on access
> > I have been trying unsuccessfully over the last 2 weeks to get
> > compactflash working on my Linux system based on mini-ITX (Via CL
> > motherboard, pentium compatible).
> >
> > I use a CF->IDE adapter to access it just like a IDE hard disk. My
> > compactflash is Sandisk SDCFH-512. Linux can detect it. I can even
> > mount it and do a fdisk on it. However, the moment I try to do
> > anything substantial like copy multiple files or copy 1000 blocks
> > using dd, I lose access to it. Linux loses access to it totally. I
> > can't even do a fdisk on it. I get an error like "Unable to open
> > /dev/hdc".

On Thu, 27 Jan 2005 22:07:35 +0100, Willy Tarreau <[EMAIL PROTECTED]> wrote:
> Have you checked that the power connector really provides 5V to the
> IDE-CF adapter ? I had the exact same behaviour 5 years ago with a power
> wire cut. Signal lines were powerful enough to bring power to the cheap
> flash (16 MB), I could even read it, most times. The kernel almost always
> booted from it, and when it turned to mount the ext2 fs R/W, it hanged. I
> finally partially destroyed it this way, and it got several defects which
> could not be cleaned with a simple write or format.
>
> Other than that, I have lots of CF cards on IDE adapters (some on motherboard,
> some hand-made, some bought to serious makers), and never ran into such
> problems since.
>
> Willy

The power connector is fine. I also disabled DMA (following a suggestion made on this newsgroup for a similar error) and now I can't turn it back on.

everest root # hdparm -d1 /dev/hdc

/dev/hdc:
 setting using_dma to 1 (on)
Re: Patch 4/6 randomize the stack pointer
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Ingo Molnar wrote: > * Paulo Marques <[EMAIL PROTECTED]> wrote: > > >>I really shouldn't feed the trolls, but this must be the most silly >>piece of code I saw on this mailing list in a very long time (and >>there have been some good examples over time). > > > yeah. > > >>The stack randomization doesn't prevent some sort of attacks (like >>return into libc, etc.) and given a small randomization it might be >>possible to write an exploit with a long sequence of NOP's and a >>return address somewhere in there (the attacker wouldn't know exactly >>where, but it wouldn't matter anyway). If we are able to write 'N' >>NOP's then we get a 'N'/64k chance that the exploit works. > > > yeah. NOP techniques can always be used to 'chop off bits' from any > randomization, in case the address of the payload is not known. Both > instruction NOPs for shellcode and 'parameter NOPs' ("././././" strings > or "/bin/sh\0/bin/sh\0" strings) can be used. > > but there is no fundamental theoretical difference between a 256 MB > randomization (as PaX uses) and a 2 MB randomization (Fedora) in terms > of characteristics: what is brute-force in one is brute-force in the > other as well, with a factor of overhead difference of 128. (which makes > the attack 128 times longer, but has no real difference to security.) > You said: yeah. NOP techniques can always be used to 'chop off bits' from any randomization, in case the address of the payload is not known. Both instruction NOPs for shellcode and 'parameter NOPs' ("././././" strings or "/bin/sh\0/bin/sh\0" strings) can be used. Bear with me here, I'm out of things I've studied and researched, so now we're going to go into "junk coming out of my head." It's either going to be very painful, or very funny, or both at the same time. No, I don't care that I'm about to look like an ass. You're starting with 64K of randomization, and moving to 2M later. The stack is how big? 4-8M? 
I don't know, I'm guessing; I saw earlier some code that said that the stack was defined as having at least 8M in some header, which "should be enough for most people" so I assume it's almost if not over 2M. Cut off however much data you know is going to be pushed already (which is what we've been calling 'the size of the stack'), compare that with the randomization, if it's bigger than the randomization period, you have chopped off all randomization. If not, you've probably got better than a 50-50. Because the size of a 'bit' grows as your entropy grows, chopping 2 megs off the randomization at 256M is significantly less than 1 bit (128M is 1 bit), while it's about 9 bits when considering 2 megs of randomization. Short version: I've got a better chance of finding an exploit that lets me just knock-off a couple megs of randomization than I do of brute forcing it. I've got a WAY better chance of brute-forcing in one or two tries if I can knock most of the randomization off. > so the attempt of our beloved troll to paint 2 MB of randomization as > 'weak' and 256 MB randomization as 'strong' is i believe misguided: both > are 'weak' in most of the threat models! (and even for threat types > where they might be considered 'strong' (e.g. whether a hole is suitable > to feed a destructive worm), they'll both be considered 'strong'.) > Let's look at GrSecurity's brute force deterrence real quick. I know you don't want to hear it, but maybe you should. The basic idea, and it's an ugly one but you have to forgive people for trying to do stupid shit like LET BROKEN CODE RUN SAFELY, is to detect a segfault (jump into unmapped ram, probably miss due to ASLR) or PaX kill (should also detect a SIGILL) and then flag the highest parent (who is found via magic I won't get into here). When flagged with this particular flag, all fork() calls are queued so that one fork() occurs every 30 seconds.
This is annoying and ugly as shit, but we're trying to do the unspeakable: Make broken, security-hole ridden code safe to run in a hostile environment. Suddenly the 216 second cycle to brute force PaX's ASLR becomes something like 3 weeks! :) This randomization, after accounting for knocking off all the bits we can, may take two or three, maybe ten or twenty tries. This is what, 300-600 oh hell TEN MINUTES. Yes, you did better than 216 seconds. When Brad first tried to bash the concept of his brute force deterrence through my head, I kept poking at the 30 second interval and the idea of making about 200 connections BEFORE slamming the server. The server will wait about a minute or two before timing you out, so this is fine, as it takes 3-4 seconds. He eventually got it through my skull that you can do the first 200 hits; but then every fork() afterwards is QUEUED, not executed in batch every 30 seconds. This makes a difference. It means you get a little boost with huge randomization, but not that much. In your model, however, that "little
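The numbers traded in this exchange are easy to check with a few integer helpers. This is a back-of-the-envelope sketch with made-up function names; it is not a model of PaX or grsecurity. The 2^16 guess count used in the last example is an assumption, chosen only because it lands near the "3 weeks" quoted above at one fork per 30 seconds.

```c
#include <assert.h>
#include <stdint.h>

#define KB ((uint64_t)1024)
#define MB (1024 * KB)

/* How many times larger one randomization range is than another,
 * e.g. 256 MB (PaX) versus 2 MB (Fedora). */
uint64_t range_ratio(uint64_t big_range, uint64_t small_range)
{
    assert(small_range > 0);
    return big_range / small_range;
}

/* Number of sled-sized slots needed to cover a range. With a NOP sled of
 * N bytes in a 64k range, one guess hits with chance N/64k, so roughly
 * 64k/N guesses cover the whole range. */
uint64_t guesses_to_cover(uint64_t range_bytes, uint64_t sled_bytes)
{
    assert(sled_bytes > 0);
    return (range_bytes + sled_bytes - 1) / sled_bytes;   /* round up */
}

/* Wall-clock seconds for a number of guesses when each guess needs a
 * fork() and forks are released one per interval. */
uint64_t throttled_seconds(uint64_t guesses, uint64_t seconds_per_fork)
{
    return guesses * seconds_per_fork;
}
```

For example, `range_ratio(256 * MB, 2 * MB)` gives the factor of 128 mentioned earlier in the thread, and `throttled_seconds(65536, 30)` comes to a little under 23 days.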
Re: compat ioctl for submitting URB
On Fri, Jan 28, 2005 at 08:33:05PM -0500, Christopher Li wrote:
> This patch is for the case that running 32 bit application on
> a 64 bit kernel. So far only x86_64 allow you to do that.
>
> I am not aware of other 64bit architecture need the 32bit
> emulation.

Huh???

a) ppc64 runs ppc32 userland
b) sparc64 runs sparc32 userland (as a matter of fact, very few userland programs are normally built 64bit there - no benefits in doing that for most applications, it only bloats the memory footprint)
c) mips64 runs mips32 userland
d) itanic, IIRC, runs i386 userland
e) s390x runs s390 userland
f) parisc64 runs parisc32 userland

It's the normal situation, not an exception. The only pair I'm not sure about is sh64/sh. AFAICS, the only other supported 64bit platform without 32bit emulation is alpha - and in that case there's no corresponding 32bit processor to emulate.
Re: compat ioctl for submitting URB
Christopher> This patch is for the case that running 32 bit
Christopher> application on a 64 bit kernel. So far only x86_64
Christopher> allow you to do that.

Actually, at least ia64, mips, parisc, ppc64, s390 and sparc64 also support 32-bit applications on a 64-bit kernel. All of those architectures except s390 can use USB. I guess vmware doesn't run on most of those architectures but any solution in the mainline kernel should be generic enough to handle them all.

 - R.
Re: compat ioctl for submiting URB
This patch is for the case of running a 32-bit application on a 64-bit kernel. So far only x86_64 allows you to do that. I am not aware of any other 64-bit architecture that needs the 32-bit emulation.

Chris

On Sat, Jan 29, 2005 at 04:29:51AM +, Gianni Tedesco wrote:
> On Fri, 2005-01-28 at 16:23 -0500, Christopher Li wrote:
> > +#ifdef CONFIG_IA32_EMULATION
> > +
> > +	case USBDEVFS_SUBMITURB32:
> > +		snoop(&dev->dev, "%s: SUBMITURB32\n", __FUNCTION__);
> > +		ret = proc_submiturb_compat(ps, p);
> > +		if (ret >= 0)
> > +			inode->i_mtime = CURRENT_TIME;
> > +		break;
> > +#endif
>
> Why don't other 64bit architectures need this chunk?
>
> --
> // Gianni Tedesco (gianni at scaramanga dot co dot uk)
> lynx --source www.scaramanga.co.uk/scaramanga.asc | gpg --import
>
Re: [Discuss][i386] Platform SMIs and their interferance with tsc based delay calibration
Please don't send emails which contain 500-column lines?

Venkatesh Pallipadi <[EMAIL PROTECTED]> wrote:
>
> Current tsc based delay_calibration can result in significant errors in
> loops_per_jiffy count when the platform events like SMIs (System Management
> Interrupts that are non-maskable) are present.

This seems like an unsolvable problem.

> Solution:
> The patch below makes the calibration routine aware of asynchronous events
> like SMIs. We increase the delay calibration time and also identify any
> significant errors (greater than 12.5%) in the calibration and notify it
> to user. Like to know your comments on this.

I find calibrate_delay_tsc() quite confusing. Are you sure that the variable names are correct?

+	tsc_rate_max = (post_end - pre_start) / DELAY_CALIBRATION_TICKS;
+	tsc_rate_min = (pre_end - post_start) / DELAY_CALIBRATION_TICKS;

that looks strange. I'm sure it all makes sense if one understands the algorithm, but it shouldn't be this hard. Please reissue the patch with adequate comments which describe what the code is doing.

Shouldn't calibrate_delay_tsc() be __devinit? (That may generate warnings from reference_discarded.pl, but they're false positives)

From a maintainability POV it's not good that x86 is no longer using the generic calibrate_delay() code. Can you rework the code so that all architectures must implement arch_calibrate_delay(), then provide stubs for all except x86? After all, other architectures/platforms may have the same problem.
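One possible reading of the two quoted expressions, as a standalone sketch rather than the patch itself: each jiffy edge is only known to lie somewhere between the last TSC read taken before it was noticed and the first read taken after. The measured interval is therefore at most (post_end - pre_start) and at least (pre_end - post_start). The struct and function names below are made up for illustration; the 12.5% threshold is the one stated in the patch description.

```c
#include <stdint.h>

/* TSC reads bracketing one jiffy edge: the edge lies between pre and post. */
struct demo_tsc_bracket {
	uint64_t pre;   /* last read before the edge was noticed */
	uint64_t post;  /* first read after the edge was noticed */
};

/*
 * Return TSC cycles per tick, or 0 if the upper and lower bounds are more
 * than 12.5% apart (for example because an SMI stretched one bracket).
 */
static uint64_t demo_tsc_per_tick(struct demo_tsc_bracket start,
				  struct demo_tsc_bracket end,
				  unsigned int ticks)
{
	uint64_t rate_max, rate_min;

	if (ticks == 0 || start.pre == 0 || end.pre == 0 ||
	    end.pre < start.post)
		return 0;

	rate_max = (end.post - start.pre) / ticks;   /* widest possible span */
	rate_min = (end.pre - start.post) / ticks;   /* narrowest possible span */

	if (rate_max - rate_min > (rate_max >> 3))   /* spread above 1/8 */
		return 0;

	return rate_max;
}
```

With brackets of a few cycles the two bounds nearly coincide; an SMI landing inside a bracket widens it and pushes the bounds apart, which is the condition the patch reports as imprecise.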
Re: [RFC] shared subtrees
On Fri, 28 Jan 2005, Mike Waychison wrote:

-BEGIN PGP SIGNED MESSAGE- Hash: SHA1

Al Viro wrote:

OK, here comes the first draft of proposed semantics for subtree sharing. What we want is being able to propagate events between the parts of mount trees. Below is a description of what I think might be a workable semantics; it does *NOT* describe the data structures I would consider final and there are considerable areas where we still need to figure out the right behaviour.

Okay, I'm not convinced that shared subtrees as proposed will work well with autofs.

OK. I've read the thread but haven't digested it so you'll have to put up with some stupid questions.

The idea discussed off-line was this: When you install an autofs mountpoint, on say /home, a daemon is started to service the requests. As far as the admin is concerned, an fs is mounted in the current namespace, call it namespaceA. The daemon actually runs in its own private namespace: call it namespaceB. namespaceB receives a new autofs filesystem: call it autofsB. autofsB is in its own p-node. namespaceA gets an autofsA on /home as well, and autofsA is 'owned' by autofsB's p-node.

So: autofsB -> autofsB and autofsB -> autofsA

Effectively, namespaceA has a private instance of autofsB in its tree.

The problem is this: Assume /home/mikew is accessed in namespaceA. The daemon running in namespaceB gets the event, and mounts an nfs vfsmount on autofsB. This event is propagated back to autofsA.

Which condition (or action) in the definition implies autofsB -> autofsA

(Problem 1: how do you block access to /home/mikew in namespaceA?)

Next, a CLONE_NS is done in namespaceA, creating namespaceA'. the homedir on /home/mikew is also copied. Now, in namespaceA', what happens when a user umount's /home/mikew? We haven't yet determined how to handle umount event propagation, but it appears likely that it will be *a hard thing to do*.

No, I haven't spent enough time on the RFC to buy into this one.
So I'll just say it looks like something is missing in this argument. Perhaps the latter is namespaceC?

Assuming the nfs umount succeeds, /home/mikew is accessed again in namespaceA'.

namespaceC?

(Problem 2: The daemon in namespaceB will see the event, but it already has something mounted on its version of /home/mikew. How does it 'send' a mountpoint to namespaceB.)

- ---

Shared subtrees may help in some administrative situations, but don't look like the right solution for autofs. Autofs will work with namespaces if the following functionality is added to the kernel: The ability to perform mount(2) operations on a directory fd. This has been discussed before and quickly vetoed, citing that it is a security risk. I still fail to understand how allowing a mount to happen cross-namespace given a dirfd target is any worse than what is already possible given a dirfd. If you don't want someone to play with your namespace, don't give them a dirfd.

Thoughts?

- -- Mike Waychison Sun Microsystems, Inc. 1 (650) 352-5299 voice 1 (416) 202-8336 voice ~~ NOTICE: The opinions expressed in this email are held by me, and may not represent the views of Sun Microsystems, Inc. ~~ -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFB+r1OdQs4kOxk3/MRAmSpAJ96ix25fjze6o7viCq2DCET9J/AlQCfYlC1 CoLKusJXjL+fYxgwggOCW+w= =8bTv -END PGP SIGNATURE-

- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Kernel oops on integrating a module with obj-y option
Hello everyone,

I am using Fedora Core 1 and doing my project in the Linux kernel 2.4.28. In my project, I am intercepting system calls, and I am doing all of this from a module. I installed this module alongside the main kernel and found it working nicely when I used 'modprobe' to load it. Then I changed obj-m of my module to obj-y and compiled my module object file with the core kernel files like fs.o, net.o and kernel.o, so my target kernel binary contains my module. Then I booted my system. Now the kernel sometimes oopses, and sometimes it prompts for a disk check and mounts the file system read-only.

To integrate my module, I created a new subdirectory named 'rsched' under the kernel source directory and created my own makefile for it. The makefile contains the following lines:

obj-y := rsched.o    (previously obj-m := rsched.o)
include $(TOPDIR)/Rules.make

Then I changed the following lines in the top-level makefile:

SUBDIRS := fs net kernel rsched
CORE_FILES := kernel/kernel.o fs/fs.o rsched/rsched.o

How can I rectify this error so that I can integrate my module with the main kernel image?

Thanks in advance and regards,
selva
Re: Possible bug in keyboard.c (2.6.10)
On Fri, Jan 28, 2005 at 11:59:37AM +0100, Vojtech Pavlik wrote:
> I'm very sorry about the locking, but the thing grew up in times of
> kernel 2.0, which didn't require any locking. There are a few possible

Incorrect. You have blocking allocations in critical areas and they required locking all the way back.

> races with device registration/unregistration, and it's on my list to
> fix that, however under normal operation there shouldn't be any need for
> locks, as there are no complex structures built that'd become
> inconsistent.

Um-hm... Vojtech, meet USB mouse; USB mouse, meet Vojtech. Now watch a disconnect and reconnect happening when luser suddenly gets overexcited and jerks the wrong hand a bit too hard while browsing the most profitable sort of website...

> If you find scenarios which will lead to trouble in the event delivery
> system, please tell me, and I'll try to fix that as soon as possible.

See above. Devices appearing and disappearing *are* normal.
Re: [PATCH] Restrict procfs permissions
On Sat, Jan 29, 2005 at 03:45:42AM +0100, Rene Scharfe wrote:
> The patch is inspired by the /proc restriction parts of the GrSecurity
> patch. The main difference is the ability to configure the restrictions
> dynamically. You can change the umask setting by running
>
> # mount -o remount,umask=007 /proc
>
> Testing has been *very* light so far -- it compiles and boots. Patch is
> against 2.6.11-rc2-bk6.
>
> Comments are very welcome.

It leaves already existing inodes with whatever mode they used to have. _IF_ you want to do that sort of thing, do it right - add ->permission() that would apply that umask before checks and, if you want it to be seen in the results of stat(2), add ->getattr() and do the same there.
Re: [PATCH 1/2] pci: Arch hook to determine config space size
On Fri, Jan 28, 2005 at 06:52:34PM +, Christoph Hellwig wrote:
> > +int __attribute__ ((weak)) pcibios_exp_cfg_space(struct pci_dev *dev) {
> > return 1; }
>
> - prototypes belong to headers
> - weak linkage is the perfect way for total obfuscation
>
> please make this a regular arch hook

I agree. Also, when sending PCI related patches, please cc the linux-pci mailing list.

thanks,

greg k-h
Re: compat ioctl for submiting URB
On Fri, 2005-01-28 at 16:23 -0500, Christopher Li wrote:
> +#ifdef CONFIG_IA32_EMULATION
> +
> +	case USBDEVFS_SUBMITURB32:
> +		snoop(&dev->dev, "%s: SUBMITURB32\n", __FUNCTION__);
> +		ret = proc_submiturb_compat(ps, p);
> +		if (ret >= 0)
> +			inode->i_mtime = CURRENT_TIME;
> +		break;
> +#endif

Why don't other 64bit architectures need this chunk?

--
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source www.scaramanga.co.uk/scaramanga.asc | gpg --import
Disabling IRQ #xx, because nobody cared!
I'm running 2.6.10 SMP. I usually use APM, but I decided to try ACPI. On my machine, USB (integrated) and Audio (PCI card) share an IRQ:

           CPU0       CPU1
  0:   19281733   19952671    IO-APIC-edge  timer
  1:      51751      53105    IO-APIC-edge  i8042
  4:    1613559    1503569    IO-APIC-edge  serial
  7:          0          0    IO-APIC-edge  parport0
  8:          2          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 11:     149496     150504   IO-APIC-level  uhci_hcd, uhci_hcd
 12:      54518      50376    IO-APIC-edge  i8042
 14:      63398      63535    IO-APIC-edge  ide0
 15:          1          1    IO-APIC-edge  ide1
169:      11440      11565   IO-APIC-level  ide2
177:     456415     456480   IO-APIC-level  eth0
185:      50307      49693   IO-APIC-level  Ensoniq AudioPCI
NMI:          0          0
LOC:   39235997   39236069
ERR:          1
MIS:          0

After a while, I get

irq 185: nobody cared!
 [] __report_bad_irq+0x22/0x90
 [] note_interrupt+0x58/0x90
 [] __do_IRQ+0x128/0x130
 [] do_IRQ+0x1a/0x30
 [] common_interrupt+0x1a/0x20
 [] default_idle+0x0/0x40
 [] default_idle+0x2a/0x40
 [] cpu_idle+0x40/0x70
handlers:
 [] (snd_audiopci_interrupt+0x0/0xc0 [snd_ens1371])
Disabling IRQ #185

Then, after some more time, I get

irq 11: nobody cared!
 [] __report_bad_irq+0x22/0x90
 [] note_interrupt+0x58/0x90
 [] __do_IRQ+0x128/0x130
 [] do_IRQ+0x1a/0x30
 [] common_interrupt+0x1a/0x20
 [] default_idle+0x0/0x40
 [] default_idle+0x2a/0x40
 [] cpu_idle+0x40/0x70
 [] start_kernel+0x147/0x170
handlers:
 [] (usb_hcd_irq+0x0/0x60)
 [] (usb_hcd_irq+0x0/0x60)

At which point, USB is dead. Do you know if 'acpi' is responsible for this?

--
William Park <[EMAIL PROTECTED]>, Toronto, Canada
Slackware Linux -- because I can type.
USB HID events and Microsoft wheel mouse
Something changed in the Linus BK kernel in the last few days so that I get "drivers/usb/input/hid-input.c: event field not found" in dmesg every time I move my MS Wheel mouse. Any ideas on how to get rid of this?

The events are EV_MISC:
type 4 code 4 value 65585
type 4 code 4 value 65584
type 4 code 4 value 589825

--
Jon Smirl
[EMAIL PROTECTED]
HID warning messages fills the logs
Hi, when running 2.6.11-rc2-bk6 with my USB HID v1.00 Mouse [Microsoft Microsoft Wheel Mouse Optical®] the logs get filled with this message: kernel: drivers/usb/input/hid-input.c: event field not found last message repeated 459 times last message repeated 1157 times Regards Marcel
Impossible to renice threaded NPTL programs on 2.6.10
For those times when a threaded program runs amok, and I still have some hope that it will eventually stop being a pig, but would like to actually use my computer in the meanwhile, the idea of renicing this runaway program to nice 19 comes to mind. Except, it doesn't actually work. Only the main thread seems to get reniced, and the threads created with pthread_create seem to merrily go on with their plundering of CPU cycles. Test code at the end of the mail.

To reproduce this, I start the test program, and observe in top that it is indeed consuming all CPU like it was intended to. Then I renice it in top, to nice 19. The effect is that the '% ni' value in top still stays the same, and these hog threads are still consuming nearly all CPU and not sharing with other nice 19 processes, indicating that they were not reniced to 19.

Tested on Fedora Core 2's kernel 2.6.10-1.9 + procps 3.2.5 from sf.net
Tested on kernel 2.6.10-ck4 + procps 3.2.1

Here is the test case:

#include
#include
#include
#include
#define THREADS 10
void *hog(void*p);
int main(int argc, char** argv) {
	pthread_t *threads;
	int i;
	threads = malloc(sizeof(pthread_t) * THREADS);
	for(i=0;i
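As a possible workaround for the behaviour described above, here is a sketch that walks /proc/<pid>/task and renices each thread individually. It relies on the Linux-specific behaviour that setpriority() with PRIO_PROCESS and a thread ID acts on that single thread; the function name is made up, and this has not been tested against the kernels mentioned in the report.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Apply nice value `prio` to every thread of process `pid`.
 * Returns the number of threads changed, or -1 if the task
 * directory could not be opened.
 */
static int demo_renice_all_threads(pid_t pid, int prio)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int changed = 0;

	snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		if (end == de->d_name || *end != '\0')
			continue;	/* not a number: skips "." and ".." */

		/* On Linux the nice value is per thread, keyed by thread ID. */
		if (setpriority(PRIO_PROCESS, (id_t)tid, prio) == 0)
			changed++;
	}
	closedir(dir);
	return changed;
}
```

Threads created after the walk are not covered by it, so a runaway program that keeps spawning threads would need the call repeated.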
slab BUG in FC devel kernel, x86-64
The kernel in question is based on 2.6.11-rc2-bk4, FWIW. Transcribed by hand. Happened when rsyncing data onto a LVM-on-RAID1, sata_via controller. (root FS is on generic VIA IDE). slab: double free detected in cache 'size-128', objp 81000340bba8. Kernel BUG at slab:2188 invalid operand: [1] CPU 0 Modules linked in: md5 ipv6 parport_pc lp parport sunrpc ipt_REJECT ipt_state ip-contrack iptable_filter ip_tables dm_mod video button battery ac raid1 ohci1394 ieee1394 uhci_hcd ehci_hcd i2c_viapro i2c_core snd_via82xx snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore via_rhine mii floppy ext3 jbd sata_via libata sd_mod scsi_mod Pid: 161, comm: kswapd0 Not tainted 2.6.10-1.1115_FC4 RIP: 0010:[] {free_block+208} RSP: 0018:81003bde9cd8 EFLAGS: 00010092 RAX: 004a RBX: 81000340b000 RCX: 8042e010 RDX: 8042e010 RSI: 0001 RDI: 81003a82a7d0 RBP: 81003bfef640 R08: 8042e010 R09: 81001dafdd78 R10: 0001 R11: 8044bd20 R12: 81000340bba8 R13: 0013 R14: 81000340b028 R15: 0013 FS: 2aaba3a0() GS:8050d880() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 2aaac000 CR3: 0b36d000 CR4: 06e0 Process kswapd0 (pid: 161, threadinfo 81003bde8000, task 81003bd5d070) Stack: 0001 81003bfe9698 00101a285a78 81003bfef640 0010 81003bfe9688 81003bfe9698 0080 801690c1 Call Trace:{cache_flusharray+242} {kfree+156} {destroy_inode+41} {dispose_list+95} {shrink_icache_memory+993} {shrink_slab+188} {balance_pgdat+547} {kswapd+260} {autoremove_wake_function+0} {autoremove_wake_function+0} {autoremove_wake_function+0} {schedule_tail+11} {child_rip+8} {kswapd+0} {child_rip+0} Code: 0f 0b 21 f1 36 80 ff ff ff ff 8c 08 0f b7 43 24 48 89 de 48 RIP {free_block+208} RSP - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 2.4] ata_piix on ich6r in RAID mode
--- Jeff Garzik <[EMAIL PROTECTED]> wrote: > Martins Krikis wrote: > > Without this patch, if the BIOS of an ICH6R box has IDE set to > "RAID" > > mode then ata_piix will not find any SATA disks because it > incorrectly > > tries the legacy mode. With the patch all 4 SATA drives become > visible. > > I don't think it would break any other vendor's SATA, but you can > be > > the judge of that. If so, perhaps we can restrict the test some > more > > by checking vendor/device IDs. > > > --- linux-2.4.29/drivers/scsi/libata-core.c 2005-01-28 > 12:07:56.0 -0500 > > +++ linux-2.4.29-iswraid/drivers/scsi/libata-core.c 2005-01-28 > 12:14:43.0 -0500 > > @@ -3605,6 +3605,9 @@ int ata_pci_init_one (struct pci_dev *pd > > legacy_mode = (1 << 3); > > } > > > > + if ((pdev->class >> 8) == PCI_CLASS_STORAGE_RAID) > > + legacy_mode = 0; > > + > > /* FIXME... */ > > if ((!legacy_mode) && (n_ports > 1)) { > > printk(KERN_ERR "ata: BUG: native mode, n_ports > 1\n"); > > > hmm. Maybe "!= PCI_CLASS_STORAGE_IDE" instead? Yes, that's much better. No need to even read the programming IF byte unless the class code identifies it as an IDE controller. > Overall, however, I am worried about your report of the driver's > behavior based on that BIOS's configuration. The driver follows the > PCI > IDE standard (previously SFF 8038i), where a register indicates > whether > its in legacy or native mode. As it see it, either > a) the driver logic for reading that register is wrong, or > b) BIOS incorrectly configuring the device, or > c) that register is only applicable for PCI_CLASS_STORAGE_IDE > devices. > > Comments either way? I'd say "c". I don't have the spec, but my PCI course-book seems to imply so. I could send a new patch but I can't verify it just yet---the board decided to stop booting... Martins __ Do you Yahoo!? Yahoo! Mail - Easier than ever with enhanced search. Learn more. 
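For reference, a standalone sketch of the check being discussed in this thread, following the suggestion to test for "!= PCI_CLASS_STORAGE_IDE". The constants mirror the usual PCI class codes (0x0101 for IDE, 0x0104 for RAID) and the native-mode bits of the IDE programming interface byte (bit 0 for the primary channel, bit 2 for the secondary). The function name is hypothetical and this is not the libata code.

```c
#include <stdint.h>

#define DEMO_CLASS_STORAGE_IDE  0x0101u  /* base class 0x01, subclass 0x01 */
#define DEMO_CLASS_STORAGE_RAID 0x0104u  /* base class 0x01, subclass 0x04 */

/*
 * class24 is the 24-bit class code: base class, subclass, prog-IF.
 * Returns 1 if the controller should be driven in legacy mode.
 */
static int demo_wants_legacy_mode(uint32_t class24)
{
	uint8_t prog_if = class24 & 0xffu;
	uint8_t native_both = (1u << 0) | (1u << 2);

	/* The mode bits are only meaningful for IDE-class devices. */
	if ((class24 >> 8) != DEMO_CLASS_STORAGE_IDE)
		return 0;

	/* Legacy unless both channels report native mode. */
	return (prog_if & native_both) != native_both;
}
```

Under this reading, a controller that the BIOS presents with the RAID class code never takes the legacy path, whatever its programming interface byte says.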
[Discuss][i386] Platform SMIs and their interferance with tsc based delay calibration
Issue:
Current tsc based delay_calibration can result in significant errors in loops_per_jiffy count when platform events like SMIs (System Management Interrupts, which are non-maskable) are present. This could lead to a potential kernel panic(). This issue is becoming more visible with the 2.6 kernel (as the default HZ is 1000) and on platforms with higher SMI handling latencies. During boot, SMIs are mostly used by the BIOS (for things like legacy keyboard emulation).

Description:
The pseudocode for the current delay calibration with tsc based delay looks like
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is not accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine loops_per_jiffy estimate

Consider the following cases

Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shortened delays, and the kernel can panic() during boot (mostly at IOAPIC timer initialization in timer_irq_works(), as we don't get enough timer interrupts in a specified interval).

Case 2: If SMIs happen between (3) and (4) above, then we can end up with a loops_per_jiffy value that is too high. And with current i386 code, a too high lpj value (greater than 17M) can result in an overflow in delay.c:__const_udelay(), again resulting in a shorter delay and panic().

Solution:
The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time and also identify any significant errors (greater than 12.5%) in the calibration and notify the user. I would like to know your comments on this.
Thanks, Venki Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> --- linux-2.6.10/./arch/i386/kernel/timers/timer_tsc.c.org 2005-01-05 16:06:52.0 -0800 +++ linux-2.6.10/./arch/i386/kernel/timers/timer_tsc.c 2005-01-19 12:38:20.0 -0800 @@ -552,6 +552,7 @@ static struct timer_opts timer_tsc = { .get_offset = get_offset_tsc, .monotonic_clock = monotonic_clock_tsc, .delay = delay_tsc, + .calibrate_delay = calibrate_delay_tsc, }; struct init_timer_opts __initdata timer_tsc_init = { --- linux-2.6.10/./arch/i386/kernel/timers/common.c.org 2005-01-11 17:51:28.0 -0800 +++ linux-2.6.10/./arch/i386/kernel/timers/common.c 2005-01-19 12:38:20.0 -0800 @@ -158,3 +158,49 @@ void __init init_cpu_khz(void) } } } + +unsigned long calibrate_delay_tsc(void) +{ + unsigned long pre_start, start, post_start; + unsigned long pre_end, end, post_end; + unsigned long start_jiffies; + unsigned long tsc_rate_min, tsc_rate_max; + + if (!cpu_has_tsc) + return 0; + +#define DELAY_CALIBRATION_TICKS((HZ < 100) ? 1 : (HZ/100)) + pre_start = 0; + rdtscl(start); + start_jiffies = jiffies; + while (jiffies <= (start_jiffies + 1)) { + pre_start = start; + rdtscl(start); + } + rdtscl(post_start); + pre_end = 0; + end = post_start; + while (jiffies <= (start_jiffies + 1 + DELAY_CALIBRATION_TICKS)) { + pre_end = end; + rdtscl(end); + } + rdtscl(post_end); + + tsc_rate_max = (post_end - pre_start) / DELAY_CALIBRATION_TICKS; + tsc_rate_min = (pre_end - post_start) / DELAY_CALIBRATION_TICKS; + + /* +* If the upper limit and lower limit of the tsc_rate is more than +* 12.5% apart. +*/ + if (pre_start == 0 || pre_end == 0 || + (tsc_rate_max - tsc_rate_min) > (tsc_rate_max >> 3)) { + printk(KERN_WARNING "TSC calibration may not be precise. " + "Too many SMIs? 
" + "Consider running with \"lpj=\" boot option\n"); + return 0; + } + + return tsc_rate_max; +} + --- linux-2.6.10/./arch/i386/kernel/timers/timer_hpet.c.org 2005-01-11 17:52:31.0 -0800 +++ linux-2.6.10/./arch/i386/kernel/timers/timer_hpet.c 2005-01-19 12:38:20.0 -0800 @@ -183,6 +183,7 @@ static struct timer_opts timer_hpet = { .get_offset = get_offset_hpet, .monotonic_clock = monotonic_clock_hpet, .delay =delay_hpet, + .calibrate_delay = calibrate_delay_tsc, }; struct init_timer_opts __initdata timer_hpet_init = { --- linux-2.6.10/./arch/i386/kernel/timers/timer_pm.c.org 2005-01-11 17:55:55.0 -0800 +++ linux-2.6.10/./arch/i386/kernel/timers/timer_pm.c 2005-01-19 12:38:20.0 -0800 @@ -246,6 +246,7 @@ static struct timer_opts timer_pmtmr = { .get_offset = get_offset_pmtmr, .monotonic_clock= monotonic_clock_pmtmr, .delay = delay_pmtmr, +
Re: Patch 4/6 randomize the stack pointer
On Thu, 27 Jan 2005, John Richard Moser wrote:

Arjan van de Ven wrote:

Is this one any worse?

yes. oracle, db2 and similar like to mmap 2Gb or more *in one chunk*. Special case? Absolutely, but ...

Can I get this put into perspective? How much more important is "Good" randomization versus "not breaking Oracle," which becomes "No randomization"

1) quite a lot of Linux users do use Oracle, DB2 or do scientific calculations - distributions cannot afford to break those applications, the default has to work for everybody

2) "weaker" randomization (2MB) is still effective if the stack is non-executable, so the "load a bunch of NOPs" approach won't work - this is what Fedora and RHEL use

3) it is not as theoretically strong as what you propose, but having the "weaker" scheme enabled is definitely more secure than having the "stronger" scheme disabled because it breaks applications

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
[PATCH] Restrict procfs permissions
Hi all, this patch adds a umask option to the proc filesystem. It can be used to restrict the permission of users to view each others process information. E.g. on a multi-user shell server one could use a setting of umask=077 to allow all users to view info about their own processes, only. It should prevent "command line snooping" and generally increases privacy on the server. Top and ps can cope with such restrictions, they simply are quiet about files they cannot access. The umask option affects permissions of the numerical directories in /proc, only (the process info). And root can see everything, of course, even with a umask setting of 0777. Default umask is 0, i.e. unchanged permissions. The patch is inspired by the /proc restriction parts of the GrSecurity patch. The main difference is the ability to configure the restrictions dynamically. You can change the umask setting by running # mount -o remount,umask=007 /proc Testing has been *very* light so far -- it compiles and boots. Patch is against 2.6.11-rc2-bk6. Comments are very welcome. Thanks, Rene diff -rup linux-2.6.11-rc2-bk6/fs/proc/base.c l/fs/proc/base.c --- linux-2.6.11-rc2-bk6/fs/proc/base.c 2005-01-28 23:42:44.0 + +++ l/fs/proc/base.c2005-01-28 23:58:38.0 + @@ -1222,7 +1222,7 @@ static struct dentry *proc_pident_lookup goto out; ei = PROC_I(inode); - inode->i_mode = p->mode; + inode->i_mode = p->mode & ~proc_umask; /* * Yes, it does not scale. And it should not. Don't add * new entries into /proc// without very good reasons. 
@@ -1537,7 +1537,7 @@ struct dentry *proc_pid_lookup(struct in put_task_struct(task); goto out; } - inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; + inode->i_mode = (S_IFDIR|S_IRUGO|S_IXUGO) & ~proc_umask; inode->i_op = _tgid_base_inode_operations; inode->i_fop = _tgid_base_operations; inode->i_nlink = 3; @@ -1592,7 +1592,7 @@ static struct dentry *proc_task_lookup(s if (!inode) goto out_drop_task; - inode->i_mode = S_IFDIR|S_IRUGO|S_IXUGO; + inode->i_mode = (S_IFDIR|S_IRUGO|S_IXUGO) & ~proc_umask; inode->i_op = _tid_base_inode_operations; inode->i_fop = _tid_base_operations; inode->i_nlink = 3; diff -rup linux-2.6.11-rc2-bk6/fs/proc/inode.c l/fs/proc/inode.c --- linux-2.6.11-rc2-bk6/fs/proc/inode.c2005-01-28 23:42:44.0 + +++ l/fs/proc/inode.c 2005-01-28 23:56:11.0 + @@ -22,6 +22,8 @@ extern void free_proc_entry(struct proc_dir_entry *); +umode_t proc_umask = 0; + static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de) { if (de) @@ -127,9 +129,14 @@ int __init proc_init_inodecache(void) return 0; } +static int parse_options(char *, uid_t *, gid_t *); static int proc_remount(struct super_block *sb, int *flags, char *data) { + uid_t dummy_uid; + gid_t dummy_gid; + *flags |= MS_NODIRATIME; + parse_options(data, _uid, _gid); return 0; } @@ -144,12 +151,13 @@ static struct super_operations proc_sops }; enum { - Opt_uid, Opt_gid, Opt_err + Opt_uid, Opt_gid, Opt_umask, Opt_err }; static match_table_t tokens = { {Opt_uid, "uid=%u"}, {Opt_gid, "gid=%u"}, + {Opt_umask, "umask=%o"}, {Opt_err, NULL} }; @@ -181,6 +189,11 @@ static int parse_options(char *options,u return 0; *gid = option; break; + case Opt_umask: + if (match_octal(args, )) + return 0; + proc_umask = option; + break; default: return 0; } diff -rup linux-2.6.11-rc2-bk6/fs/proc/internal.h l/fs/proc/internal.h --- linux-2.6.11-rc2-bk6/fs/proc/internal.h 2005-01-28 23:42:44.0 + +++ l/fs/proc/internal.h2005-01-28 23:58:29.0 + @@ -16,6 +16,8 @@ struct vmalloc_info { unsigned long largest_chunk; }; +extern 
umode_t proc_umask; + #ifdef CONFIG_MMU #define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START) extern void get_vmalloc_info(struct vmalloc_info *vmi);
Re: [RFC][PATCH] add driver matching priorities
On Friday 28 January 2005 19:11, Al Viro wrote:
> On Fri, Jan 28, 2005 at 06:23:26PM -0500, Dmitry Torokhov wrote:
> > On Friday 28 January 2005 17:30, Adam Belay wrote:
> > > Of course this patch is not going to be effective alone. We also need
> > > to change the init order. If a driver is registered early but isn't the
> > > best available, it will be bound to the device prematurely. This would
> > > be a problem for cardbus (yenta) bridges.
> > >
> > > I think we may have to load all in kernel drivers first, and then begin
> > > matching them to hardware. Do you agree? If so, I'd be happy to make a
> > > patch for that too.
> > >
> >
> > I disagree. The driver core should automatically unbind generic driver
> > from a device when native driver gets loaded. I think the only change is
> > that we can no longer skip devices that are bound to a driver and match
> > them all over again when a new driver is loaded.
>
> And what happens if we've already got the object busy?
>

Mark it as dead and release structures when the holder lets it go. With hotplug pretty much everywhere, more and more systems can handle it. Plus one could argue that if an object needs a special driver to function properly, it is unlikely to be busy before the native driver is loaded.

Also, one can still do what Adam proposes by pre-loading native drivers in cases when it is required, while still supporting a more flexible default scheme.

--
Dmitry
Re: 2.6.10 USB devices generate descriptor read error?
Known one - it's non-fatal. All your devices should work fine. If you want, you can try loading usbcore.ko with the module parameter old_scheme_first=y and see if it goes away.

Parag

Jeff Wiegley wrote:

Is anybody else having a similar problem as the following... My USB keydrives used to work fine in 2.6.9. Since I upgraded to 2.6.10 now they just generate a device descriptor read error. Specifically:

/var/log/kern.log.0:Jan 26 18:18:18 mail kernel: usb 4-2.1: device descriptor read/64, error -32

Also I noticed that a new Sigmatel based USB IRDA device also produces similar messages...

/var/log/kern.log:Jan 27 12:31:19 mail kernel: usb 2-2: device descriptor read/64, error -71

Is this a known problem or is it just me? I noticed that the precompiled debian 2.6.10 kernel works with at least the usb flash drive ok. But my compiled version produces the above. But I don't think I changed any relevant kernel config items from 2.6.9 to 2.6.10 and I've compiled lots of USB enabled kernels before so I'd like to think I'm not an idiot but maybe I missed a new option or something.

Please help,

- Jeff
2.6.10 USB devices generate descriptor read error?
Is anybody else having a similar problem as the following... My USB keydrives used to work fine in 2.6.9. Since I upgraded to 2.6.10 now they just generate a device descriptor read error. Specifically: /var/log/kern.log.0:Jan 26 18:18:18 mail kernel: usb 4-2.1: device descriptor read/64, error -32 Also I noticed that a new Sigmatel based USB IRDA device also produces similar messages... /var/log/kern.log:Jan 27 12:31:19 mail kernel: usb 2-2: device descriptor read/64, error -71 Is this a known problem or is it just me? I noticed that the precompiled debian 2.6.10 kernel works with at least the usb flash drive ok. But my compiled version produces the above. But I don't think I changed any relevant kernel config items from 2.6.9 to 2.6.10 and I've compiled lots of USB enabled kernels before so I'd like to think I'm not an idiot but maybe I missed a new option or something. Please help, - Jeff
Re: Why does the kernel need a gig of VM?
On Fri, Jan 28, 2005 at 03:06:15PM -0500, John Richard Moser wrote: > Can someone give me a layout of what exactly is up there? I got the > basic idea > > K 4G > A 3G > A 2G > A 1G > > App has 3G, kernel has 1G at the top of VM on x86 (dunno about x86_64). > > So what's the layout of that top 1G? What's it all used for? Is there > some obscene restriction of 1G of shared memory or something that gets > mapped up there? By default, the bottom 1G of physical memory is mapped into the 1G of KVA. (If you have less than 1G, it's all mapped.) Thus, the TLB remains valid across the user/kernel switch, which makes system calls much faster. The 4G/4G patches (google for the lwn.net overview) change this, introducing a TLB flush on every syscall. Better for some things because you get more VA space, worse for most things because it's slower. (But it's "lots better for a few" versus "a little worse for everybody", so the tradeoff is often worthwhile.) [1] So the answer to your question is, "What's up there? Memory. All of it." (Until you get to highmem.) [1] The 4G/4G patch's *primary* goal is to increase the amount of KVA available to allow more "struct page" entries without exhausting lowmem. Trying to manage 32GB or 64GB of physical memory with only 896MB of lowmem is very difficult. It has the additional advantage of allowing userland to mmap almost 4GB of stuff (as compared to almost 3GB without 4G/4G) which can be a nice win for database-type apps. -andy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
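The layout described in the reply above can be modelled in a few lines of user-space C. This is only a sketch, not kernel code: the names are made up for illustration, the 0xC0000000 base is assumed from the default 3G/1G split, and the 896MB lowmem size is taken from the footnote in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for the default i386 3G/1G split (illustrative names). */
#define MODEL_PAGE_OFFSET  0xC0000000u            /* kernel VA starts at 3G */
#define MODEL_LOWMEM_BYTES (896u * 1024 * 1024)   /* directly mapped RAM */

/* Nonzero if a physical address lies in the directly mapped region. */
static int model_is_lowmem(uint64_t phys)
{
    return phys < MODEL_LOWMEM_BYTES;
}

/* Lowmem is mapped at a fixed offset, so translation is one addition. */
static uint32_t model_phys_to_virt(uint32_t phys)
{
    assert(model_is_lowmem(phys));
    return phys + MODEL_PAGE_OFFSET;
}

/* The reverse direction is the matching subtraction. */
static uint32_t model_virt_to_phys(uint32_t virt)
{
    assert(virt >= MODEL_PAGE_OFFSET);
    return virt - MODEL_PAGE_OFFSET;
}
```

Because the mapping is a constant offset that is present in every address space, no page-table switch is needed on a system call, which is the speed argument made above. Physical memory beyond the lowmem limit (highmem) has no such fixed address in this model.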
Re: [PATCH, 2.4] fix an oops in ata_to_sense_error
Martins Krikis wrote: Jeff, This fixes an occasional oops in the libata-scsi code. will apply, thanks.
Re: [PATCH, 2.4] fix an oops in ata_to_sense_error
Martins Krikis wrote: Jeff, This fixes an occasional oops in the libata-scsi code. Martins Krikis --- linux-2.4.29/drivers/scsi/libata-scsi.c 2005-01-28 12:07:56.0 -0500 +++ linux-2.4.29-iswraid/drivers/scsi/libata-scsi.c 2005-01-28 12:14:43.0 -0500 BTW, don't forget your signed-off-by line when submitting emails... http://linux.yyz.us/patch-format.html Jeff
Re: [RFC PATCH 2.4] ata_piix on ich6r in RAID mode
Martins Krikis wrote: Without this patch, if the BIOS of an ICH6R box has IDE set to "RAID" mode then ata_piix will not find any SATA disks because it incorrectly tries the legacy mode. With the patch all 4 SATA drives become visible. I don't think it would break any other vendor's SATA, but you can be the judge of that. If so, perhaps we can restrict the test some more by checking vendor/device IDs. --- linux-2.4.29/drivers/scsi/libata-core.c 2005-01-28 12:07:56.0 -0500 +++ linux-2.4.29-iswraid/drivers/scsi/libata-core.c 2005-01-28 12:14:43.0 -0500 @@ -3605,6 +3605,9 @@ int ata_pci_init_one (struct pci_dev *pd legacy_mode = (1 << 3); } + if ((pdev->class >> 8) == PCI_CLASS_STORAGE_RAID) + legacy_mode = 0; + /* FIXME... */ if ((!legacy_mode) && (n_ports > 1)) { printk(KERN_ERR "ata: BUG: native mode, n_ports > 1\n"); hmm. Maybe "!= PCI_CLASS_STORAGE_IDE" instead? Overall, however, I am worried about your report of the driver's behavior based on that BIOS's configuration. The driver follows the PCI IDE standard (previously SFF 8038i), where a register indicates whether it's in legacy or native mode. As I see it, either a) the driver logic for reading that register is wrong, or b) the BIOS is incorrectly configuring the device, or c) that register is only applicable for PCI_CLASS_STORAGE_IDE devices. Comments either way? Jeff
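Possibility (c) in the reply above can be sketched as a standalone decoder. This is a hedged illustration, not the libata code: the function name and the returned mask layout are invented here, the class values 0x0101 (IDE) and 0x0104 (RAID) are the usual PCI storage class codes, and the programming-interface bit positions (bit 0 for the primary channel, bit 2 for the secondary) are my reading of the PCI IDE convention.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CLASS_STORAGE_IDE  0x0101u
#define MODEL_CLASS_STORAGE_RAID 0x0104u

/*
 * class_reg is the 24-bit class register: base class, sub-class, prog-if.
 * Returns a hypothetical mask: bit 0 = primary channel in legacy mode,
 * bit 1 = secondary channel in legacy mode.
 */
static unsigned model_legacy_mask(uint32_t class_reg)
{
    uint8_t prog_if = class_reg & 0xff;
    unsigned mask = 0;

    /* Only interpret prog-if for IDE-class devices (option c). */
    if ((class_reg >> 8) != MODEL_CLASS_STORAGE_IDE)
        return 0;

    if (!(prog_if & (1u << 0)))   /* primary channel not native */
        mask |= 1u << 0;
    if (!(prog_if & (1u << 2)))   /* secondary channel not native */
        mask |= 1u << 1;
    return mask;
}
```

Written this way, a controller that the BIOS has switched to RAID class is never treated as legacy, which matches the "!= PCI_CLASS_STORAGE_IDE" variant suggested in the reply rather than the narrower "== PCI_CLASS_STORAGE_RAID" test in the patch.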
[Patch] invalidate range of pages after direct IO write
After a direct IO write only invalidate the pages that the write intersected. invalidate_inode_pages2_range(mapping, pgoff start, pgoff end) is added and called from generic_file_direct_IO(). This doesn't break some subtle agreement with some other part of the code, does it? While we're in there, invalidate_inode_pages2() was calling unmap_mapping_range() with the wrong convention in the single page case. It was providing the byte offset of the final page rather than the length of the hole being unmapped. This is also fixed. This was lightly tested with a 10k op fsx run with O_DIRECT on a 16MB file in ext3 on a junky old IDE drive. Totaling vmstat columns of blocks read and written during the runs shows that read traffic drops significantly. The run time seems to have gone down a little. Two runs before the patch gave the following user/real/sys times and total blocks in and out: 0m28.029s 0m20.093s 0m3.166s 16673 125107 0m27.949s 0m20.068s 0m3.227s 18426 126094 and after the patch: 0m26.775s 0m19.996s 0m3.060s 3505 124982 0m26.856s 0m19.935s 0m3.052s 3505 125279 Signed-off-by: Zach Brown <[EMAIL PROTECTED]> --- include/linux/fs.h |2 ++ mm/filemap.c |5 - mm/truncate.c | 52 ++-- 3 files changed, 44 insertions(+), 15 deletions(-) Index: 2.6-mm-odirinv/include/linux/fs.h === --- 2.6-mm-odirinv.orig/include/linux/fs.h 2005-01-28 14:14:19.0 -0800 +++ 2.6-mm-odirinv/include/linux/fs.h 2005-01-28 14:14:35.0 -0800 @@ -1369,6 +1369,8 @@ invalidate_inode_pages(inode->i_mapping); } extern int invalidate_inode_pages2(struct address_space *mapping); +extern int invalidate_inode_pages2_range(struct address_space *mapping, +pgoff_t start, pgoff_t end); extern int write_inode_now(struct inode *, int); extern int filemap_fdatawrite(struct address_space *); extern int filemap_flush(struct address_space *); Index: 2.6-mm-odirinv/mm/filemap.c === --- 2.6-mm-odirinv.orig/mm/filemap.c2005-01-28 13:32:06.0 -0800 +++ 2.6-mm-odirinv/mm/filemap.c 2005-01-28 14:21:04.0 -0800 @@ -2325,7 
+2325,10 @@ retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs); if (rw == WRITE && mapping->nrpages) { - int err = invalidate_inode_pages2(mapping); + pgoff_t end = (offset + iov_length(iov, nr_segs) - 1) + >> PAGE_CACHE_SHIFT; + int err = invalidate_inode_pages2_range(mapping, + offset >> PAGE_CACHE_SHIFT, end); if (err) retval = err; } Index: 2.6-mm-odirinv/mm/truncate.c === --- 2.6-mm-odirinv.orig/mm/truncate.c 2005-01-28 13:32:06.0 -0800 +++ 2.6-mm-odirinv/mm/truncate.c2005-01-28 17:03:09.783939857 -0800 @@ -99,7 +99,7 @@ } /** - * truncate_inode_pages - truncate range of pages specified by start and + * truncate_inode_pages_range - truncate range of pages specified by start and * end byte offsets * @mapping: mapping to truncate * @lstart: offset from which to truncate @@ -279,28 +279,38 @@ EXPORT_SYMBOL(invalidate_inode_pages); /** - * invalidate_inode_pages2 - remove all pages from an address_space + * invalidate_inode_pages2_range - remove range of pages from an address_space * @mapping - the address_space + * @start: the page offset 'from' which to invalidate + * @end: the page offset 'to' which to invalidate (inclusive) * * Any pages which are found to be mapped into pagetables are unmapped prior to * invalidation. * * Returns -EIO if any pages could not be invalidated. */ -int invalidate_inode_pages2(struct address_space *mapping) +int invalidate_inode_pages2_range(struct address_space *mapping, + pgoff_t start, pgoff_t end) { struct pagevec pvec; - pgoff_t next = 0; + pgoff_t next; int i; int ret = 0; - int did_full_unmap = 0; + int did_range_unmap = 0; pagevec_init(&pvec, 0); - while (!ret && pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) { + next = start; + while (next <= end && + !ret && pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) { for (i = 0; !ret && i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; int was_dirty; + if (page->index > end) { + next = page->index; + break; + } +
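The page-range arithmetic in the generic_file_direct_IO() hunk above can be checked in isolation. The following is a user-space sketch under stated assumptions: 4KB pages (shift of 12), and a struct and function name that are made up for this example. It mirrors the patch's computation of an inclusive end index from the last byte written.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12   /* assumed 4KB pages */

/* Inclusive range of page indexes touched by a write. */
struct model_page_range {
    unsigned long start;
    unsigned long end;
};

/* Map a byte offset and length to the pages the write intersects. */
static struct model_page_range
model_write_to_pages(unsigned long long offset, size_t len)
{
    struct model_page_range r;

    assert(len > 0);   /* a zero-length write touches no page */
    r.start = (unsigned long)(offset >> MODEL_PAGE_SHIFT);
    /* "- 1" selects the last byte written, so 'end' is inclusive. */
    r.end = (unsigned long)((offset + len - 1) >> MODEL_PAGE_SHIFT);
    return r;
}
```

For instance, a 2-byte write at offset 4095 straddles a page boundary and yields pages 0 through 1, while a full 4096-byte write at offset 0 stays within page 0.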
[ANNOUNCE] "iswraid" (ICHxR ataraid sub-driver) for 2.4.29
Version 0.1.5 of the Intel Software RAID driver (iswraid) is now available for the 2.4 series kernels at http://prdownloads.sourceforge.net/iswraid/2.4.29-iswraid.patch.gz?download It is an ataraid "subdriver" but uses the SCSI subsystem to find the RAID member disks. It depends on the libata library, particularly on either the ata_piix or the ahci driver, that enable the Serial ATA capabilities in ICH5/ICH6/ICH7 chipsets. More information is available at the project's home page at http://iswraid.sourceforge.net/. Driver documentation is included in Documentation/iswraid.txt, which is part of the patch. The license is GPL. The changes WRT version 0.1.4.3 are the following: * Resource deallocation bug fixed for failed initializations. * Read IO resubmission to mirror bug fixed. * RAID1E (covers 4-disk RAID10) code added. * More aggressive marking disks as bad in metadata. * Claiming disks for RAID "feature" removed. * Option defaults now customizable from the build configuration. * iswraid_never_fail "feature" watered down into iswraid_resist_failing. * iswraid_halt_degraded now prevents degraded volumes from being registered. * Debug printouts more customizable. * Some code cleanup and optimization. * Documentation changes. Please consider this driver for inclusion in the 2.4 kernel tree. Martins Krikis Storage Components Division Intel Massachusetts P.S. I've CC-d directly to the potential reviewers suggested a few months ago by Marcelo. I'll appreciate any feedback you (and others) can provide.
Re: I need a hardware wizard... I have been beating my head on the wall..
Hi Paulo! Your patch generated the following: Jan 28 19:11:51 linux kernel: vsc_sata int status: 0083 Jan 28 19:11:51 linux last message repeated 19 times Jan 28 19:11:51 linux kernel: irq 7: nobody cared! Jan 28 19:11:51 linux kernel: [] __report_bad_irq+0x22/0x90 Jan 28 19:11:51 linux kernel: [] note_interrupt+0x58/0x90 Jan 28 19:11:51 linux kernel: [] __do_IRQ+0xd8/0xe0 . . . . Thanks for helping me... I hope this is useful info Dave Sims On Fri, 28 Jan 2005, Paulo Marques wrote: > David Sims wrote: > > On Thu, 27 Jan 2005, Jeff Garzik wrote: > >>David Sims wrote: > >> > >>>[...] > >>> You can insert the module in a running kernel and after barking as > >>>follows (once for each disk attached) it runs just fine. > >> > >>Basically nobody has ever had hardware to test sata_vsc with that > >>hardware. We should probably remove the PCI ID until an engineer can > >>fix it... > > > > Hi again, > > > > I am willing to make this hardware available to any engineer that wants > > to help me solve this problem and I will do whatever I can to make it > > an easy job... Please help me... > > Well, I don't consider myself a hardware wizard, but at least I'm an > engineer, so I decided to give it a go :) > > It seems that the driver is not acknowledging the interrupt from the > controller. It would be nice to know what kind of interrupt is > triggering this. > > Could you run the attached patch and show the output from dmesg? > > -- > Paulo Marques - www.grupopie.com > > All that is necessary for the triumph of evil is that good men do nothing. > Edmund Burke (1729 - 1797) > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
compat ioctl for submiting URB
Hi, The compat ioctl for submitting URBs from a 32-bit application on an x86_64 system is missing. For people who need to refresh their memory, please read the big comment after do_usbdevfs_bulk in fs/compat_ioctl.c. VMware is a big user of usbdevfs: we translate guest USB IO to usbdevfs by submitting URBs. On x86_64 systems, we need those compat ioctls for submitting URBs. For now we use a hack to submit them through the vmmon driver, but that is very ugly. I do want this problem to get fixed in the Linux kernel eventually. I have been toying with two different ways to solve it. It seems unavoidable to get our hands dirty in the usbdevfs internals. The first is simply to teach usbdevfs about the 32-bit URB ioctls, so it doesn't need to keep a bounce buffer around. The second idea is to have a bounce buffer, and let the usbdevfs internals know about this bounce buffer and free it when the async structure is destroyed (except for reap). I attach a patch that implements just the first approach. Any comments are welcome. Chris Index: linux-2.5/include/linux/compat_ioctl.h === --- linux-2.5.orig/include/linux/compat_ioctl.h 2005-01-26 17:23:57.0 -0800 +++ linux-2.5/include/linux/compat_ioctl.h 2005-01-28 16:35:14.0 -0800 @@ -692,6 +692,7 @@ COMPATIBLE_IOCTL(USBDEVFS_CONNECTINFO) COMPATIBLE_IOCTL(USBDEVFS_HUB_PORTINFO) COMPATIBLE_IOCTL(USBDEVFS_RESET) +COMPATIBLE_IOCTL(USBDEVFS_SUBMITURB32) COMPATIBLE_IOCTL(USBDEVFS_CLEAR_HALT) /* MTD */ COMPATIBLE_IOCTL(MEMGETINFO) Index: linux-2.5/include/linux/usbdevice_fs.h === --- linux-2.5.orig/include/linux/usbdevice_fs.h 2005-01-25 12:08:02.0 -0800 +++ linux-2.5/include/linux/usbdevice_fs.h 2005-01-28 16:35:14.0 -0800 @@ -32,6 +32,7 @@ #define _LINUX_USBDEVICE_FS_H #include +#include /* - */ @@ -123,6 +124,22 @@ char port [127];/* e.g. 
port 3 connects to device 27 */ }; +struct usbdevfs_urb32 { + unsigned char type; + unsigned char endpoint; + compat_int_t status; + compat_uint_t flags; + compat_caddr_t buffer; + compat_int_t buffer_length; + compat_int_t actual_length; + compat_int_t start_frame; + compat_int_t number_of_packets; + compat_int_t error_count; + compat_uint_t signr; + compat_caddr_t usercontext; /* unused */ + struct usbdevfs_iso_packet_desc iso_frame_desc[0]; +}; + #define USBDEVFS_CONTROL _IOWR('U', 0, struct usbdevfs_ctrltransfer) #define USBDEVFS_BULK _IOWR('U', 2, struct usbdevfs_bulktransfer) #define USBDEVFS_RESETEP _IOR('U', 3, unsigned int) @@ -130,6 +147,7 @@ #define USBDEVFS_SETCONFIGURATION _IOR('U', 5, unsigned int) #define USBDEVFS_GETDRIVER _IOW('U', 8, struct usbdevfs_getdriver) #define USBDEVFS_SUBMITURB _IOR('U', 10, struct usbdevfs_urb) +#define USBDEVFS_SUBMITURB32 _IOR('U', 10, struct usbdevfs_urb32) #define USBDEVFS_DISCARDURB_IO('U', 11) #define USBDEVFS_REAPURB _IOW('U', 12, void *) #define USBDEVFS_REAPURBNDELAY _IOW('U', 13, void *) @@ -143,5 +161,4 @@ #define USBDEVFS_CLEAR_HALT_IOR('U', 21, unsigned int) #define USBDEVFS_DISCONNECT_IO('U', 22) #define USBDEVFS_CONNECT _IO('U', 23) - #endif /* _LINUX_USBDEVICE_FS_H */ Index: linux-2.5/include/linux/usb.h === --- linux-2.5.orig/include/linux/usb.h 2005-01-25 12:07:54.0 -0800 +++ linux-2.5/include/linux/usb.h 2005-01-28 16:35:14.0 -0800 @@ -608,6 +608,7 @@ #define URB_NO_FSBR0x0020 /* UHCI-specific */ #define URB_ZERO_PACKET0x0040 /* Finish bulk OUTs with short packet */ #define URB_NO_INTERRUPT 0x0080 /* HINT: no non-error interrupt needed */ +#define URB_COMPAT 0x0100 /* compat mode */ struct usb_iso_packet_descriptor { unsigned int offset; Index: linux-2.5/fs/compat_ioctl.c === --- linux-2.5.orig/fs/compat_ioctl.c2005-01-25 12:08:12.0 -0800 +++ linux-2.5/fs/compat_ioctl.c 2005-01-28 16:35:14.0 -0800 @@ -2570,228 +2570,19 @@ return sys_ioctl(fd, USBDEVFS_BULK, (unsigned long)p); } -/* This needs more work 
before we can enable it. Unfortunately - * because of the fancy asynchronous way URB status/error is written - * back to userspace, we'll need to fiddle with USB devio internals - * and/or reimplement entirely the frontend of it ourselves. -DaveM - * - * The issue is: - * - * When an URB is submitted via usbdevicefs it is put onto an - * asynchronous queue. When the URB completes, it may be reaped - * via another ioctl. During this
[patch 1/1] fix syscallN() macro errno value checking for i386
From: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> Cc: David Howells <[EMAIL PROTECTED]> The errno values which are visible to userspace are actually in the range -1 to -129, not just down to -128 (): this value was added: #define EKEYREJECTED 129 /* Key was rejected by service */ And this would break uClibc (from what I heard). This is just a quick fix, because putting a macro inside errno.h instead of having it copied in two places would probably be nicer. However, I've heard from D. Howells that it wasn't accepted, so this is the solution for now. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]> --- linux-2.6.11-paolo/include/asm-i386/unistd.h |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff -puN include/asm-i386/unistd.h~fix-syscall-macro include/asm-i386/unistd.h --- linux-2.6.11/include/asm-i386/unistd.h~fix-syscall-macro2005-01-29 00:42:48.0 +0100 +++ linux-2.6.11-paolo/include/asm-i386/unistd.h2005-01-29 00:44:51.0 +0100 @@ -298,12 +298,12 @@ #define NR_syscalls 289 /* - * user-visible error numbers are in the range -1 - -128: see - * + * user-visible error numbers are in the range -1 - -129: see + * (currently it includes ) */ #define __syscall_return(type, res) \ do { \ - if ((unsigned long)(res) >= (unsigned long)(-(128 + 1))) { \ + if ((unsigned long)(res) >= (unsigned long)(-(129 + 1))) { \ errno = -(res); \ res = -1; \ } \ _
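The comparison in __syscall_return relies on unsigned wraparound: small negative return values become very large unsigned numbers, so a single compare separates error codes from ordinary results. Below is a user-space model of the patched check. The function name and the out-parameter (used instead of the real errno) are assumptions made for this sketch; the threshold expression is copied from the patch.

```c
#include <assert.h>

#define MODEL_MAX_ERRNO 129L   /* EKEYREJECTED, per the patch description */

/*
 * Model of the patched __syscall_return(): raw results at or above
 * (unsigned long)(-(129 + 1)) are treated as negated errno values.
 */
static long model_syscall_return(long res, int *err_out)
{
    if ((unsigned long)res >= (unsigned long)(-(MODEL_MAX_ERRNO + 1))) {
        *err_out = (int)-res;   /* e.g. -2 becomes errno 2 */
        return -1;
    }
    return res;                 /* success, value passed through unchanged */
}
```

With the old threshold of -(128 + 1), a raw result of -129 would have fallen through as an ordinary return value, which is the case the patch addresses. Values well outside the window, such as -4096 in this model, are still passed through unchanged.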
Re: 2.6.11-r1 freezes dual 2.5 GHz PowerMac G5
The patch below works. Thanks. Maurice Volaski writes: > I am running Gentoo with a fresh 2.6.11-r1. I have all the kernel > debugging options turned on. Occasionally, I can get past the boot > process, but half the time it freezes somewhere along the way. If > not, I do get to boot, it doesn't take very long for it to freeze. Did 2.6.10 work Ok? Try the patch below, it fixes 2.6.11-rc1 boot lockups on both my Beige G3 (locks up in ADB driver) and my G4 eMac (locks up in radeonfb). --- linux-2.6.11-rc1/init/main.c.~1~2005-01-15 03:30:25.0 +0100 +++ linux-2.6.11-rc1/init/main.c2005-01-15 03:31:44.0 +0100 @@ -377,7 +377,7 @@ static void noinline rest_init(void) * Re-enable preemption but disable interrupts to make sure * we dont get preempted until we schedule() in cpu_idle(). */ - local_irq_disable(); +// local_irq_disable(); preempt_enable_no_resched(); unlock_kernel(); cpu_idle(); -- Maurice Volaski, [EMAIL PROTECTED] Computing Support, Rose F. Kennedy Center Albert Einstein College of Medicine of Yeshiva University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature
On Fri, 2005-01-28 at 10:11 +0100, Ingo Molnar wrote: > * Jack O'Quin <[EMAIL PROTECTED]> wrote: > > > > thus after a couple of years we'd end up with lots of desktop apps > > > running as SCHED_FIFO, and latency would go down the drain again. > > > > I wonder how Mac OS X and Windows deal with this priority escalation > > problem? Is it real or only theoretical? > > no idea. Anyone with MacOSX/Windows application writing experience? :-| > Here's the description from Apple. (from http://developer.apple.com/documentation/Darwin/Conceptual/KernelProgramming/scheduler/chapter_8_section_4.html): However, according to Stéphane Letz, who ported JACK to OSX, this does NOT describe the reality of the current implementation - it's not a real deadline scheduler. "period" and "constraint" are ignored, RT tasks are scheduled round robin, and the scheduler just uses "computation" as the timeslice. If an RT task repeatedly uses its entire timeslice without blocking, the scheduler can demote the task to SCHED_NORMAL. Audio apps do not normally set these parameters directly, the CoreAudio backend handles it. (quoting Stéphane Letz) > For example in CoreAudio, the computation value is directly related > to the audio buffer size in the following way: >
> buffer size    computation
> 64 frames      500 us
> 128            300 us
> >= 256         100 us
> > The idea is that threads with smaller buffer size will get a larger > computation slice so that there is a chance they can complete their > jobs. Threads with larger buffer size are more interruptible. The > CoreMidi thread (to handle incoming Midi events) also has a > computation value of 500 us. > Other RT threads like Firewire and various system threads computation > value are also carefully chosen. 
(This was from a private mail thread that led to Con's SCHED_ISO patches. If all the participants agree I will post a link to the full thread because it answers many questions that are sure to come up on LKML.) So this system *requires* an app to tell the kernel in advance what its RT constraints are, then revokes isochronous scheduling privileges if the task lied. This would require a new API. Furthermore I suspect that these "System" threads aren't subject to having their RT privileges revoked, and that the GUI gets special treatment, etc. The upshot is while the OSX system works in that environment, it's largely due to Apple controlling the kernel and a lot of userspace. OSX is useful as a model of what a good API for soft realtime support in a desktop OS would look like. But we are a general purpose OS so we certainly need a more general solution. Lee
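The buffer-size table quoted above is small enough to write down as a lookup. This is only an illustration of the quoted numbers, not CoreAudio code: the function name is invented, and the handling of sizes the table does not list (below 64 frames, and between the listed sizes) is an assumption of this sketch.

```c
#include <assert.h>

/*
 * Computation slice in microseconds for a given audio buffer size,
 * following the quoted table: 64 -> 500, 128 -> 300, >= 256 -> 100.
 * Unlisted sizes fall into the next smaller listed bucket (assumption).
 */
static unsigned model_computation_us(unsigned buffer_frames)
{
    assert(buffer_frames > 0);
    if (buffer_frames >= 256)
        return 100;
    if (buffer_frames >= 128)
        return 300;
    return 500;
}
```

The inverse relationship is the point made in the quote: a thread with a small buffer has a tight deadline and gets a longer uninterrupted slice, while a thread with a large buffer can tolerate being preempted sooner.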
Re: PNP and bus association
Adam Belay wrote: Hi Pierre, The platform bus does not show the actual physical relationship either. For x86, ACPI is typically needed to determine this. It would be easy to bind to spawn pnp devices off of an ISA bridge device, attached to the pci bus, but whether it's the actual physical parent would be very difficult to determine without firmware assistance. At the moment the pnp bus is only showing a logical bus relationship. If we were to use ACPI to aid in the generation of the physical device tree, we could put these devices in the correct physical location. So it is correct behaviour that the device shows up under /sys/bus/pnp when found using PNP, and /sys/bus/platform when scanned for? I'm trying to get it to work well with HAL and it would be nice if it could be found in a consistent way. Rgds Pierre - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
2.6.10 ACPI on dell inspiron 8100
I noticed something strange with ACPI and the battery: /proc/acpi/battery/BAT1$ cat info present: yes design capacity: 57420 mWh last full capacity: 57420 mWh battery technology: rechargeable design voltage: 14800 mV design capacity warning: 3000 mWh design capacity low: 1000 mWh capacity granularity 1: 200 mWh capacity granularity 2: 200 mWh model number:LIP8084DLP serial number: 20495 battery type:LION OEM info:Sony Corp. /proc/acpi/battery/BAT1$ cat state present: yes capacity state: ok charging state: charging present rate:unknown remaining capacity: 59040 mWh present voltage: 16716 mV /proc/acpi/battery/BAT1$ Is my laptop messed up or is ACPI not seeing proper values? How can I have 59040 mWh remaining capacity when the full capacity is 57420 mWh? Also the system didn't display the charging light so I know it's not charging. I yanked the battery and I saw this: /proc/acpi/battery/BAT1$ cat state present: yes capacity state: ok charging state: charged present rate:unknown remaining capacity: 0 mWh present voltage: 0 mV /proc/acpi/battery/BAT1$ Under BAT0, info and state show one line, present: no -- Lab tests show that use of micro$oft causes cancer in lab animals
driver model: fix u32 vs. pm_message_t in OSS
Hi! This fixes u32 vs. pm_message_t in OSS. [I tried to go through alsa developers, but Takashi told me they do not have control over sound/oss.] No real code changes, please apply, Pavel (all bugs are mine :-). From: Bernard Blackham <[EMAIL PROTECTED]> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]> --- clean/sound/oss/ali5455.c 2005-01-22 02:48:45.0 +0100 +++ linux/sound/oss/ali5455.c 2005-01-28 19:18:10.0 +0100 @@ -3528,7 +3528,7 @@ } #ifdef CONFIG_PM -static int ali_pm_suspend(struct pci_dev *dev, u32 pm_state) +static int ali_pm_suspend(struct pci_dev *dev, pm_message_t pm_state) { struct ali_card *card = pci_get_drvdata(dev); struct ali_state *state; --- clean/sound/oss/cs4281/cs4281_wrapper-24.c 2005-01-22 02:47:48.0 +0100 +++ linux/sound/oss/cs4281/cs4281_wrapper-24.c 2005-01-28 19:18:10.0 +0100 @@ -27,7 +27,7 @@ #include static int cs4281_resume_null(struct pci_dev *pcidev) { return 0; } -static int cs4281_suspend_null(struct pci_dev *pcidev, u32 state) { return 0; } +static int cs4281_suspend_null(struct pci_dev *pcidev, pm_message_t state) { return 0; } #define free_dmabuf(state, dmabuf) \ pci_free_consistent(state->pcidev, \ --- clean/sound/oss/cs46xx.c2005-01-22 02:49:21.0 +0100 +++ linux/sound/oss/cs46xx.c2005-01-28 19:18:10.0 +0100 @@ -388,7 +388,7 @@ static int cs46xx_powerup(struct cs_card *card, unsigned int type); static int cs461x_powerdown(struct cs_card *card, unsigned int type, int suspendflag); static void cs461x_clear_serial_FIFOs(struct cs_card *card, int type); -static int cs46xx_suspend_tbl(struct pci_dev *pcidev, u32 state); +static int cs46xx_suspend_tbl(struct pci_dev *pcidev, pm_message_t state); static int cs46xx_resume_tbl(struct pci_dev *pcidev); #ifndef CS46XX_ACPI_SUPPORT @@ -5774,7 +5774,7 @@ #endif #if CS46XX_ACPI_SUPPORT -static int cs46xx_suspend_tbl(struct pci_dev *pcidev, u32 state) +static int cs46xx_suspend_tbl(struct pci_dev *pcidev, pm_message_t state) { struct cs_card *s = PCI_GET_DRIVER_DATA(pcidev); CS_DBGOUT(CS_PM 
| CS_FUNCTION, 2, --- clean/sound/oss/cs46xxpm-24.h 2005-01-22 02:48:58.0 +0100 +++ linux/sound/oss/cs46xxpm-24.h 2005-01-28 19:18:10.0 +0100 @@ -36,7 +36,7 @@ * for now (12/22/00) only enable the pm_register PM support. * allow these table entries to be null. */ -static int cs46xx_suspend_tbl(struct pci_dev *pcidev, u32 state); +static int cs46xx_suspend_tbl(struct pci_dev *pcidev, pm_message_t state); static int cs46xx_resume_tbl(struct pci_dev *pcidev); #define cs_pm_register(a, b, c) NULL #define cs_pm_unregister_all(a) --- clean/sound/oss/esssolo1.c 2005-01-22 02:47:15.0 +0100 +++ linux/sound/oss/esssolo1.c 2005-01-28 19:18:10.0 +0100 @@ -2257,7 +2257,7 @@ } static int -solo1_suspend(struct pci_dev *pci_dev, u32 state) { +solo1_suspend(struct pci_dev *pci_dev, pm_message_t state) { struct solo1_state *s = (struct solo1_state*)pci_get_drvdata(pci_dev); if (!s) return 1; --- clean/sound/oss/i810_audio.c2005-01-22 02:48:35.0 +0100 +++ linux/sound/oss/i810_audio.c2005-01-28 19:18:10.0 +0100 @@ -3457,7 +3457,7 @@ } #ifdef CONFIG_PM -static int i810_pm_suspend(struct pci_dev *dev, u32 pm_state) +static int i810_pm_suspend(struct pci_dev *dev, pm_message_t pm_state) { struct i810_card *card = pci_get_drvdata(dev); struct i810_state *state; --- clean/sound/oss/maestro3.c 2005-01-22 02:48:48.0 +0100 +++ linux/sound/oss/maestro3.c 2005-01-28 19:18:10.0 +0100 @@ -375,7 +375,7 @@ * I'm not very good at laying out functions in a file :) */ static int m3_notifier(struct notifier_block *nb, unsigned long event, void *buf); -static int m3_suspend(struct pci_dev *pci_dev, u32 state); +static int m3_suspend(struct pci_dev *pci_dev, pm_message_t state); static void check_suspend(struct m3_card *card); static struct notifier_block m3_reboot_nb = { @@ -2777,12 +2777,12 @@ for(card = devs; card != NULL; card = card->next) { if(!card->in_suspend) -m3_suspend(card->pcidev, 3); /* XXX legal? */ +m3_suspend(card->pcidev, PMSG_SUSPEND); /* XXX legal? 
*/ } return 0; } -static int m3_suspend(struct pci_dev *pci_dev, u32 state) +static int m3_suspend(struct pci_dev *pci_dev, pm_message_t state) { unsigned long flags; int i; --- clean/sound/oss/trident.c 2005-01-22 02:48:35.0 +0100 +++ linux/sound/oss/trident.c 2005-01-28 19:18:10.0 +0100 @@ -487,7 +487,7 @@ static struct trident_channel *ali_alloc_pcm_channel(struct trident_card *card); static void ali_restore_regs(struct trident_card *card); static void ali_save_regs(struct trident_card *card); -static int trident_suspend(struct pci_dev *dev, u32 unused); +static
Re: [RFC][PATCH] add driver matching priorities
On Fri, Jan 28, 2005 at 06:23:26PM -0500, Dmitry Torokhov wrote: > On Friday 28 January 2005 17:30, Adam Belay wrote: > > Of course this patch is not going to be effective alone. We also need > > to change the init order. If a driver is registered early but isn't the > > best available, it will be bound to the device prematurely. This would > > be a problem for cardbus (yenta) bridges. > > > > I think we may have to load all in kernel drivers first, and then begin > > matching them to hardware. Do you agree? If so, I'd be happy to make a > > patch for that too. > > > > I disagree. The driver core should automatically unbind generic driver > from a device when native driver gets loaded. I think the only change is > that we can no longer skip devices that are bound to a driver and match > them all over again when a new driver is loaded. And what happens if we've already got the object busy?
Re: [RFC][PATCH] add driver matching priorities
On Fri, 2005-01-28 at 18:51 -0500, Dmitry Torokhov wrote: > If generic driver binds to a device that it has no idea how to drive > _at all_ then I will argue that the generic driver is broken. It should > not bind to begin with. > In the case of pci bridges, sometimes we can't really tell if we can drive the hardware entirely. It's a classcode match. Generic drivers may support a portion of hardware in a limited fashion. It's not that they have no idea what they're doing with the hardware. It's more a matter of not always doing the best or most complete thing. For some hardware this may work fine. Because we don't support generic drivers in the current driver model, we haven't had a chance to see how well they would work, or where they could be used. Also, consider this. If the pci bridge driver binds to yenta, it will (in theory, it also might explode) enumerate all of the cardbus devices. If then later, it is discovered that there is a better driver for the bridge, all of the bridge's children will have to be torn down. Their drivers will be released, and the devices removed. This might increase the odds of something going wrong. Thanks, Adam
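The "register everything first, then match" ordering discussed in this thread can be sketched as a priority pick over all drivers whose match function accepts a device. This is a toy user-space model, not the driver core: every name, the priority field, and the sample table are hypothetical, and the class codes assume the standard PCI values (base class 0x06 for bridges, 0x0607 for CardBus bridges).

```c
#include <assert.h>
#include <stddef.h>

struct model_driver {
    const char *name;
    int priority;                       /* higher wins (hypothetical field) */
    int (*match)(unsigned class_code);  /* nonzero if the driver can bind */
};

/* Generic driver: matches any PCI bridge by base class only. */
static int model_match_any_bridge(unsigned class_code)
{
    return (class_code >> 8) == 0x06;
}

/* Native driver: matches CardBus bridges specifically. */
static int model_match_cardbus(unsigned class_code)
{
    return class_code == 0x0607;
}

static const struct model_driver model_drivers[] = {
    { "generic-bridge", 1,  model_match_any_bridge },
    { "native-cardbus", 10, model_match_cardbus },
};

/* Pick the highest-priority matching driver, or NULL if none matches. */
static const struct model_driver *
model_best_driver(const struct model_driver *drv, size_t n, unsigned class_code)
{
    const struct model_driver *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!drv[i].match(class_code))
            continue;
        if (best == NULL || drv[i].priority > best->priority)
            best = &drv[i];
    }
    return best;
}
```

If matching only starts once all built-in drivers are registered, the generic driver is never bound to the CardBus bridge in the first place, so the teardown of already-enumerated children described above does not arise for in-kernel drivers. Drivers loaded later as modules would still need the rebind path debated in the thread.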
Re: [PATCH 5/10] UML - compile fixes
On Friday 28 January 2005 23:10, Andrew Morton wrote: > Blaisorblade <[EMAIL PROTECTED]> wrote: > > On Monday 17 January 2005 08:27, Andrew Morton wrote: > > > Jeff Dike <[EMAIL PROTECTED]> wrote: > > > > This fixes some warnings, and changes the system call table so that > > > > it will compile in -linus, where the vperf system calls are not yet > > > > merged. > > > > > > methinks we already fixed this. > > > > > > > Signed-off-by: Jeff Dike <[EMAIL PROTECTED]> > > > > No, incorrect, this is not applied, current bitkeeper snapshots don't > > compile for this reason too. > > > > Jeff, I think you should resend the patch anyway. > > I don't know what this is about. Yes, it was from some days ago... so I guess either I or Jeff will have to resend it... Andrew, when do you plan to release 2.6.11? Jeff, you should send your queued fixes, and also resend this one (indeed, it was not applied). If I find the time I'll select the interesting ones and send them (with a mail to request their prompt merge). > The only UML patch I have pending is > > uml-kconfig_arch-little-cleanup-to-merge-before-2611.patch Ok, please merge it ASAP (as the title suggests). > From: [EMAIL PROTECTED] -- Paolo Giarrusso, aka Blaisorblade Linux registered user n. 292729 http://www.user-mode-linux.org/~blaisorblade - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 4081] New: OpenOffice crashes while starting due to a threading error
On Fri, 28 Jan 2005 18:46:13 -0500 Parag Warudkar <[EMAIL PROTECTED]> wrote: > Lee Revell wrote: > > > > > > >>munmap(0x955838, 8192) = -1 EINVAL (Invalid argument) > >>munmap(0x80d7ff0, 3221222108) = -1 EINVAL (Invalid argument) > >>--- SIGSEGV (Segmentation fault) @ 0 (0) --- > >> > >> > > > >No, it really looks like OO tried to munmap() something incorrectly. > >3,221,222,108 bytes at offset 0x80d7ff0? > > > >Lee > > > > > > > May be that's another OO.o bug which gets triggered by failure to open > /dev/dri? Actually Stephen had OO working fine with earlier kernels, > where possibly /dev/dri/* permissions were appropriate and it was able > to open it - With new kernel the permissions seem to be improper which > is confirmed by strace -- > > open("/dev/dri/card0", O_RDWR) = -1 EACCES (Permission denied) > > Should be filed as a bug with OO.org - it shouldnt segfault due to DRI > permissions.. > > Parag Note: on 2.6.10 /dev/dri/card0 crw-rw-rw- on 2.6.11-rc2 /dev/dri/card0 crw-rw /dev/dri/card1 crw-rw Changing permissions seems to fix (it for startup), will try more and see if udev remembers not to turn them back. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] add driver matching priorities
On Friday 28 January 2005 18:33, Adam Belay wrote: > On Fri, 2005-01-28 at 18:23 -0500, Dmitry Torokhov wrote: > > On Friday 28 January 2005 17:30, Adam Belay wrote: > > > Of course this patch is not going to be effective alone. We also need > > > to change the init order. If a driver is registered early but isn't the > > > best available, it will be bound to the device prematurely. This would > > > be a problem for carbus (yenta) bridges. > > > > > > I think we may have to load all in kernel drivers first, and then begin > > > matching them to hardware. Do you agree? If so, I'd be happy to make a > > > patch for that too. > > > > > > > I disagree. The driver core should automatically unbind generic driver > > from a device when native driver gets loaded. I think the only change is > > that we can no longer skip devices that are bound to a driver and match > > them all over again when a new driver is loaded. > > > > That's another option. My concern is that if a generic driver pokes > around with hardware, it may fail to initialize properly when the actual > driver is loaded. There are other problems too. If the system were to > be suspended while the generic driver was loaded, the restore_state code > may be incorrect, also rendering the device unusable. > If generic driver binds to a device that is has no idea how to drive _at all_ then I will argue that the generic driver is broken. It should not bind to begin with. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] add driver matching priorities
On Fri, 2005-01-28 at 18:23 -0500, Dmitry Torokhov wrote: > On Friday 28 January 2005 17:30, Adam Belay wrote: > > Of course this patch is not going to be effective alone. We also need > > to change the init order. If a driver is registered early but isn't the > > best available, it will be bound to the device prematurely. This would > > be a problem for carbus (yenta) bridges. > > > > I think we may have to load all in kernel drivers first, and then begin > > matching them to hardware. Do you agree? If so, I'd be happy to make a > > patch for that too. > > > > I disagree. The driver core should automatically unbind generic driver > from a device when native driver gets loaded. I think the only change is > that we can no longer skip devices that are bound to a driver and match > them all over again when a new driver is loaded. > That's another option. My concern is that if a generic driver pokes around with hardware, it may fail to initialize properly when the actual driver is loaded. There are other problems too. If the system were to be suspended while the generic driver was loaded, the restore_state code may be incorrect, also rendering the device unusable. I'd like to leave the option of unloading generic driver open. I just think we need to be aware of potential problems it might cause, before deciding to go that direction. Thanks, Adam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
On Fri, Jan 28, 2005 at 08:45:46PM +0100, Ingo Molnar wrote:
> * Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > If you do have a highest interrupt case that causes all activity to
> > block, then rwsems may indeed fit the bill.
> >
> > In the NFS client code we may use rwsems in order to protect stateful
> > operations against the (very infrequently used) server reboot recovery
> > code. The point is that when the server reboots, the server forces us
> > to block *all* requests that involve adding new state (e.g. opening an
> > NFSv4 file, or setting up a lock) while our client and others are
> > re-establishing their existing state on the server.
>
> it seems the most scalable solution for this would be a global flag plus
> per-CPU spinlocks (or per-CPU mutexes) to make this totally scalable and
> still support the requirements of this rare event. An rwsem really
> bounces around on SMP, and it seems very unnecessary in the case you
> described.
>
> possibly this could be formalised as an rwlock/rwlock implementation
> that scales better. brlocks were such an attempt.

From how I understand it, you'll have to have a global structure to denote an exclusive operation and then take some additional cpumask_t representing the set of spinlocks and use it to iterate over them when doing a PI chain operation. Locking each individual parametric typed spinlock might require a raw_spinlock to manipulate list structures, which, added up, is rather heavyweight.

Not only that, you'd have to introduce a notion of it being counted, since it could also be acquired/preempted by another higher-priority thread on that same processor. Not having this semantic would make the thread in that specific circumstance effectively non-preemptable (PI scheduler indeterminacy), where the multiple-readers portion of a real read/write (shared-exclusive) lock would have permitted this.
http://people.lynuxworks.com/~bhuey/rt-share-exclusive-lock/rtsem.tgz.1208

is our attempt at getting real shared-exclusive lock semantics in a blocking lock; it may still be incomplete and buggy. Igor is still working on this, and this is the latest that I have of his work. Getting comments on this approach would be a good thing, as I/we (me/Igor) believed from the start that this approach is correct.

Assuming that this is possible with the current approach, optimizing it to avoid CPU ping-ponging is an important next step.

bill
Re: [PATCH] idle thread preemption fix
* Olaf Hering <[EMAIL PROTECTED]> wrote:

> Whats the purpose of local_irq_disable() here? Locks up my toys in
> atkbd_init or IP hash foo functions.

fix already posted a couple of days ago, see:

--

* Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:

> Hi Ingo !
>
> Could you explain me precisely what is the race you are fixing by
> adding local_irq_disable() to rest_init() ?

it can be bad for the idle task to hold the BKL and to have preemption enabled - in such a situation the scheduler will get confused if an interrupt triggers a forced preemption in that small window. But it's not necessary to keep IRQs disabled after the BKL has been dropped. In fact i think IRQ-disabling doesn't have to be done at all; the patch below ought to solve this scenario equally well, and should solve the PPC side-effects too.

Tested on top of 2.6.11-rc2 on x86 PREEMPT+SMP and PREEMPT+!SMP (which IIRC were the config variants that triggered the original problem), on an SMP and on a UP system.

	Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- linux/init/main.c.orig
+++ linux/init/main.c
@@ -373,14 +373,9 @@ static void noinline rest_init(void)
 {
 	kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
 	numa_default_policy();
-	/*
-	 * Re-enable preemption but disable interrupts to make sure
-	 * we dont get preempted until we schedule() in cpu_idle().
-	 */
-	local_irq_disable();
-	preempt_enable_no_resched();
 	unlock_kernel();
-	cpu_idle();
+	preempt_enable_no_resched();
+	cpu_idle();
 }
 
 /* Check for early params. */
Re: [RFC][PATCH] add driver matching priorities
On Friday 28 January 2005 17:30, Adam Belay wrote:
> Of course this patch is not going to be effective alone. We also need
> to change the init order. If a driver is registered early but isn't the
> best available, it will be bound to the device prematurely. This would
> be a problem for carbus (yenta) bridges.
>
> I think we may have to load all in kernel drivers first, and then begin
> matching them to hardware. Do you agree? If so, I'd be happy to make a
> patch for that too.
>

I disagree. The driver core should automatically unbind the generic driver from a device when a native driver gets loaded. I think the only change is that we can no longer skip devices that are bound to a driver, and must match them all over again when a new driver is loaded.

--
Dmitry
[PATCH] document atkbd.softraw
Document atkbd.softraw (and shorten a few long lines nearby). diff -uprN -X /linux/dontdiff a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt --- a/Documentation/kernel-parameters.txt 2004-12-29 03:39:42.0 +0100 +++ b/Documentation/kernel-parameters.txt 2005-01-29 00:21:07.0 +0100 @@ -222,15 +222,19 @@ running once the system is up. atascsi=[HW,SCSI] Atari SCSI - atkbd.extra=[HW] Enable extra LEDs and keys on IBM RapidAccess, EzKey - and similar keyboards + atkbd.extra=[HW] Enable extra LEDs and keys on IBM RapidAccess, + EzKey and similar keyboards atkbd.reset=[HW] Reset keyboard during initialization atkbd.set= [HW] Select keyboard code set Format: (2 = AT (default) 3 = PS/2) - atkbd.scroll= [HW] Enable scroll wheel on MS Office and similar keyboards + atkbd.scroll= [HW] Enable scroll wheel on MS Office and similar + keyboards + + atkbd.softraw= [HW] Choose between synthetic and real raw mode + Format: (0 = real, 1 = synthetic (default)) atkbd.softrepeat= [HW] Use software keyboard repeat - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Bug 4081] New: OpenOffice crashes while starting due to a threading error
On Fri, 2005-01-28 at 09:31 -0800, Stephen Hemminger wrote:
> Here is the strace output of the part that SEGV's, looks like a DRI issue??

[snip]

> munmap(0x955838, 8192) = -1 EINVAL (Invalid argument)
> munmap(0x80d7ff0, 3221222108) = -1 EINVAL (Invalid argument)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---

No, it really looks like OO tried to munmap() something incorrectly. 3,221,222,108 bytes at offset 0x80d7ff0?

Lee
Re: [Bug 4081] New: OpenOffice crashes while starting due to a threading error
Stephen Hemminger wrote:
> Here is the strace output of the part that SEGV's, looks like a DRI issue??

Yep. If you haven't already, just change the permissions on /dev/dri/card0 to give access to your user id and it should be fine. (The reporter of this bug had to do the same in order to get it working.) Something in the kernel changed as far as DRI goes - I don't know what. And I know why I wasn't affected: the NVIDIA driver, which doesn't use /dev/dri/*.

Trever, though, seems to be having an entirely different problem - one oddity being the continuous -EINTRs that his OO.o gets on startup. Trever - if your problem isn't solved yet, can you run gdb /path/to/ooffice from the command line and then, when it segfaults, post the backtrace?

Parag
PROBLEM: SysV semaphore race vs SIGSTOP
There seems to be a race when SIGSTOP-ing a process waiting for a SysV semaphore. Even if it could not possibly have owned the semaphore when the signal was sent (because the sender of the signal owned it at the time), it still occasionally happens that it both stops execution *and* acquires the semaphore, with a deadlocked application as the result. This is a problem for some of the high-performance stuff I'm working on.

A sample test program exhibiting the problem is available at
http://www.ping.uio.no/~ovehk/sembug.c

For me, it will show "ACQUIRE FAILED!! DEADLOCK!!" almost every time I run it. Occasionally it will run fine; if it does for you, just try again a couple of times.

The kernel I currently use is:

Linux version 2.4.27-1-k7 ([EMAIL PROTECTED]) (gcc version 3.3.5 (Debian 1:3.3.5-2)) #1 Wed Dec 1 20:12:01 JST 2004

and I run it on a uniprocessor system (AMD Athlon, 1.9GHz) with Debian "sid" installed.

I'm not a kernel hacker, but from a quick perusal of the 2.4 code, it didn't seem to me that the semaphore code in the kernel (ipc/sem.c) even tries to handle suspended threads (though I wouldn't know how to do so). The 2.6 semaphore code looked almost the same to me, too, so it might be a problem there as well.

Please Cc me on any questions or comments, since I am too wimpy to subscribe yet.
Re: panic in raid1_end_write_request
Thanks Mark,

On Fri, Jan 28, 2005 at 04:34:01PM -0600, Mark Rustad wrote:
> I used to get these running SuSE SLES 9 and also with a variety of
> kernel.org kernels. The crash was triggered by a media error on a
> RAID1.

Were there any media errors logged? My system does not log any such errors.

> A patch that I got from SuSE fixed it for me. The patch is below
> your message excerpt.

That looks like the "bio clone memory corruption" patch, which is supposed to be in 2.6.10-1.747_FC3smp because that kernel includes 2.6.10-ac10. I was hoping that would solve my problem as well, but it didn't.

--
Norman Gaywood, Systems Administrator
School of Mathematics, Statistics and Computer Science
University of New England, Armidale, NSW 2351, Australia
[EMAIL PROTECTED]                  Phone: +61 (0)2 6773 2412
http://turing.une.edu.au/~norm     Fax:   +61 (0)2 6773 3312

Please avoid sending me Word or PowerPoint attachments.
See http://www.fsf.org/philosophy/no-word-attachments.html
Re: PNP and bus association
Hi Pierre,

The platform bus does not show the actual physical relationship either. For x86, ACPI is typically needed to determine this. It would be easy to spawn pnp devices off of an ISA bridge device attached to the pci bus, but whether it's the actual physical parent would be very difficult to determine without firmware assistance. At the moment the pnp bus only shows a logical bus relationship. If we were to use ACPI to aid in the generation of the physical device tree, we could put these devices in the correct physical location.

Thanks,
Adam

On Thu, Jan 27, 2005 at 10:16:50PM +0100, Pierre Ossman wrote:
> I recently tried out adding PNP support to my driver to remove the
> hassle of finding the correct parameters for it. This, however, causes
> it to show up under the pnp bus, where as it previously was located
> under the platform bus.
>
> Is the idea that PNP devices should only reside on the PNP bus or is
> there some magic available to get the device to appear on several buses?
> It's a bit of a hassle to search in two different places in sysfs
> depending on if PNP is used or not.
>
> Also, the PNP bus doesn't really say that much about where the device is
> physically connected. The other bus types usually give a hint about this.

It's normal for ISA devices to not tell us much about their physical properties.

>
> Rgds
> Pierre
>
Re: multiple neighbour cache tables for AF_INET
In article <[EMAIL PROTECTED]> (at Sat, 29 Jan 2005 09:19:49 +1100), Herbert Xu <[EMAIL PROTECTED]> says:

> IMHO you need to give the user a way to specify which table they want
> to operate on. If they don't specify one, then the current behaviour
> of choosing the first table found is reasonble.

We have dev. Isn't it sufficient?

--yoshfuji
Re: [PATCH] idle thread preemption fix
On Sat, Jan 08, Linux Kernel Mailing List wrote: > ChangeSet 1.2316, 2005/01/08 13:53:41-08:00, [EMAIL PROTECTED] > > [PATCH] idle thread preemption fix > > The early bootup stage is pretty fragile because the idle thread is not > yet > functioning as such and so we need preemption disabled. Whether the > bootup > fails or not seems to depend on timing details so e.g. the presence of > SCHED_SMT makes it go away. > > Disabling preemption explicitly has another advantage: the atomicity > check > in schedule() will catch early-bootup schedule() calls from now on. > > The patch also fixes another preempt-bkl buglet: interrupt-driven > forced-preemption didnt go through preempt_schedule() so it resulted in > auto-dropping of the BKL. Now we go through preempt_schedule() which > properly deals with the BKL. > > Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]> > diff -Nru a/init/main.c b/init/main.c > --- a/init/main.c 2005-01-08 15:18:18 -08:00 > +++ b/init/main.c 2005-01-08 15:18:18 -08:00 > @@ -373,6 +373,12 @@ > { > kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND); > numa_default_policy(); > + /* > + * Re-enable preemption but disable interrupts to make sure > + * we dont get preempted until we schedule() in cpu_idle(). > + */ > + local_irq_disable(); > + preempt_enable_no_resched(); > unlock_kernel(); > cpu_idle(); > } Whats the purpose of local_irq_disable() here? Locks up my toys in atkbd_init or IP hash foo functions. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][PATCH] add driver matching priorities
Hi,

This patch adds initial support for driver matching priorities to the driver model. It is needed for my work on converting the pci bridge driver to use "struct device_driver". It may also be helpful for drivers with more complex matching criteria (or long id lists, as I've seen in many cases).

"match" has been added to "struct device_driver". There are now two steps in the matching process. The first step is a bus specific filter that determines possible driver candidates. The second step is a driver specific match function that verifies if the driver will work with the hardware, and returns a priority code (how well it is able to handle the device). The bus layer could override the driver's match function if necessary (similar to how it passes *probe through its layer and then on to the actual driver).

The current priorities are as follows:

enum {
	MATCH_PRIORITY_FAILURE = 0,
	MATCH_PRIORITY_GENERIC,
	MATCH_PRIORITY_NORMAL,
	MATCH_PRIORITY_VENDOR,
};

Let me know if any of this would need to be changed. For example, the "struct bus_type" match function could return a priority code.

Of course this patch is not going to be effective alone. We also need to change the init order. If a driver is registered early but isn't the best available, it will be bound to the device prematurely. This would be a problem for cardbus (yenta) bridges.

I think we may have to load all in-kernel drivers first, and then begin matching them to hardware. Do you agree? If so, I'd be happy to make a patch for that too.
Thanks, Adam --- a/drivers/base/bus.c2005-01-20 17:37:46.0 -0500 +++ b/drivers/base/bus.c2005-01-28 16:59:00.0 -0500 @@ -286,6 +286,9 @@ if (drv->bus->match && !drv->bus->match(dev, drv)) return -ENODEV; + if (drv->match && !drv->match(dev)) + return -ENODEV; + dev->driver = drv; if (drv->probe) { int error = drv->probe(dev); @@ -299,6 +302,42 @@ return 0; } +/** + * driver_probe_device_priority - attempt to bind device & driver with a + *given match level priority + * @drv: driver. + * @dev: device. + * @priority the match level priority + */ + +static int driver_probe_device_priority(struct device_driver * drv, + struct device * dev, int priority) +{ + int matchp; + + if (drv->bus->match && !drv->bus->match(dev, drv)) + return -ENODEV; + + if (drv->match) { + matchp = drv->match(dev); + } else + matchp = MATCH_PRIORITY_NORMAL; + + if (matchp != priority) + return -ENODEV; + + dev->driver = drv; + if (drv->probe) { + int error = drv->probe(dev); + if (error) { + dev->driver = NULL; + return error; + } + } + + device_bind_driver(dev); + return 0; +} /** * device_attach - try to attach device to a driver. 
@@ -312,17 +351,20 @@
 {
 	struct bus_type * bus = dev->bus;
 	struct list_head * entry;
-	int error;
+	int error, matchp = MATCH_PRIORITY_VENDOR;
 
 	if (dev->driver) {
 		device_bind_driver(dev);
 		return 1;
 	}
 
-	if (bus->match) {
+	if (!bus->match)
+		return 0;
+
+	while (matchp > 0) {
 		list_for_each(entry, &bus->drivers.list) {
 			struct device_driver * drv = to_drv(entry);
-			error = driver_probe_device(drv, dev);
+			error = driver_probe_device_priority(drv, dev, matchp);
 			if (!error)
 				/* success, driver matched */
 				return 1;
@@ -332,6 +374,7 @@
 					"%s: probe of %s failed with error %d\n",
 					drv->name, dev->bus_id, error);
 		}
+		matchp--;
 	}
 
 	return 0;
--- a/include/linux/device.h	2005-01-20 17:37:26.0 -0500
+++ b/include/linux/device.h	2005-01-28 16:40:22.0 -0500
@@ -41,6 +41,13 @@
 	RESUME_ENABLE,
 };
 
+enum {
+	MATCH_PRIORITY_FAILURE = 0,
+	MATCH_PRIORITY_GENERIC,
+	MATCH_PRIORITY_NORMAL,
+	MATCH_PRIORITY_VENDOR,
+};
+
 struct device;
 struct device_driver;
 struct class;
@@ -108,6 +115,7 @@
 
 	struct module 	* owner;
 
+	int	(*match)(struct device * dev);
 	int	(*probe)(struct device * dev);
 	int 	(*remove)	(struct device * dev);
 	void	(*shutdown)	(struct device * dev);
[WATCHDOG] 2.6.11-rc2 watchdog patches
Hi Andrew, please do a bk pull http://linux-watchdog.bkbits.net/linux-2.6-watchdog-mm This will update the following files: drivers/char/watchdog/i8xx_tco.c| 34 +++--- drivers/char/watchdog/ixp2000_wdt.c |2 +- drivers/char/watchdog/ixp4xx_wdt.c |2 +- drivers/char/watchdog/sa1100_wdt.c |2 +- drivers/char/watchdog/scx200_wdt.c |2 +- 5 files changed, 31 insertions(+), 11 deletions(-) through these ChangeSets: <[EMAIL PROTECTED]> (05/01/28 1.1984) [WATCHDOG] i8xx_tco.c-ICH4/6/7-patch Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW chipsets. Also added support for the "undocumented" ICH7. <[EMAIL PROTECTED]> (05/01/28 1.1985) [WATCHDOG] correct sysfs name for watchdog devices While looking for possible candidates for our udev.rules package, I found a few odd ->name properties. /dev/watchdog has minor 130 according to devices.txt. Since all watchdog drivers use the misc_register() call, they will end up in /sys/class/misc/$foo. udev may create the /dev/watchdog node if the driver is loaded. I dont have such a device, so I cant test it. The drivers below provide names with spaces and even with / in it. Not a big deal, but apps may expect /dev/watchdog. Signed-off-by: Olaf Hering <[EMAIL PROTECTED]> Signed-off-by: Wim Van Sebroeck <[EMAIL PROTECTED]> The ChangeSets can also be looked at on: http://linux-watchdog.bkbits.net:8080/linux-2.6-watchdog-mm For completeness, I added the patches below. Greetings, Wim. diff -Nru a/drivers/char/watchdog/i8xx_tco.c b/drivers/char/watchdog/i8xx_tco.c --- a/drivers/char/watchdog/i8xx_tco.c 2005-01-28 23:29:58 +01:00 +++ b/drivers/char/watchdog/i8xx_tco.c 2005-01-28 23:29:58 +01:00 @@ -1,5 +1,5 @@ /* - * i8xx_tco 0.06: TCO timer driver for i8xx chipsets + * i8xx_tco 0.07: TCO timer driver for i8xx chipsets * * (c) Copyright 2000 kernel concepts <[EMAIL PROTECTED]>, All Rights Reserved. 
* http://www.kernelconcepts.de @@ -22,11 +22,22 @@ * * The TCO timer is implemented in the following I/O controller hubs: * (See the intel documentation on http://developer.intel.com.) - * 82801AA & 82801AB chip : document number 290655-003, 290677-004, - * 82801BA & 82801BAM chip : document number 290687-002, 298242-005, - * 82801CA & 82801CAM chip : document number 290716-001, 290718-001, - * 82801DB & 82801E chip : document number 290744-001, 273599-001, - * 82801EB & 82801ER chip : document number 252516-001 + * 82801AA (ICH): document number 290655-003, 290677-014, + * 82801AB (ICHO) : document number 290655-003, 290677-014, + * 82801BA (ICH2) : document number 290687-002, 298242-027, + * 82801BAM (ICH2-M) : document number 290687-002, 298242-027, + * 82801CA (ICH3-S) : document number 290733-003, 290739-013, + * 82801CAM (ICH3-M) : document number 290716-001, 290718-007, + * 82801DB (ICH4) : document number 290744-001, 290745-020, + * 82801DBM (ICH4-M) : document number 252337-001, 252663-005, + * 82801E (C-ICH) : document number 273599-001, 273645-002, + * 82801EB (ICH5) : document number 252516-001, 252517-003, + * 82801ER (ICH5R) : document number 252516-001, 252517-003, + * 82801FB (ICH6) : document number 301473-002, 301474-007, + * 82801FR (ICH6R) : document number 301473-002, 301474-007, + * 82801FBM (ICH6-M) : document number 301473-002, 301474-007, + * 82801FW (ICH6W) : document number 301473-001, 301474-007, + * 82801FRW (ICH6RW) : document number 301473-001, 301474-007 * * 2710 Nils Faerber * Initial Version 0.01 @@ -49,6 +60,9 @@ * 20030921 Wim Van Sebroeck <[EMAIL PROTECTED]> * 0.06 change i810_margin to heartbeat, use module_param, * added notify system support, renamed module to i8xx_tco. + * 20050128 Wim Van Sebroeck <[EMAIL PROTECTED]> + * 0.07 Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW + * chipsets. Also added support for the "undocumented" ICH7 chipset. 
*/ /* @@ -73,7 +87,7 @@ #include "i8xx_tco.h" /* Module and version information */ -#define TCO_VERSION "0.06" +#define TCO_VERSION "0.07" #define TCO_MODULE_NAME "i8xx TCO timer" #define TCO_DRIVER_NAME TCO_MODULE_NAME ", v" TCO_VERSION #define PFX TCO_MODULE_NAME ": " @@ -360,8 +374,14 @@ { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_0, PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_12, PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801DB_0, PCI_ANY_ID, PCI_ANY_ID, }, +
Re: panic in raid1_end_write_request
Norman, I used to get these running SuSE SLES 9 and also with a variety of kernel.org kernels. The crash was triggered by a media error on a RAID1. A patch that I got from SuSE fixed it for me. The patch is below your message excerpt. On Jan 28, 2005, at 3:23 PM, Norman Gaywood wrote: I have a Dell PE2650, Dual Xeon, 1G memory and several software raid1 partitions, ext3. Main duties include NFS, DHCP and samba. A Fedora kernel 2.6.10-1.747_FC3smp which includes 2.6.10-ac10. This system panics frequently, between several hours to several days. It does not seem to be related to load. Hardware and memory tests indicate a good system. Panic messages are similar to: Unable to handle kernel NULL pointer dereference at virtual address 0038 printing eip: f882940f *pde = 379c9001 Oops: [#1] Here is the patch: --- linux-2.6.5/fs/bio.c~ 2004-11-24 12:42:10.532343678 +0100 +++ linux-2.6.5/fs/bio.c2004-11-24 12:46:49.308021403 +0100 @@ -98,12 +98,7 @@ BIO_BUG_ON(pool_idx >= BIOVEC_NR_POOLS); - /* -* cloned bio doesn't own the veclist -*/ - if (!bio_flagged(bio, BIO_CLONED)) - mempool_free(bio->bi_io_vec, bp->pool); - + mempool_free(bio->bi_io_vec, bp->pool); mempool_free(bio, bio_pool); } @@ -212,7 +207,9 @@ */ inline void __bio_clone(struct bio *bio, struct bio *bio_src) { - bio->bi_io_vec = bio_src->bi_io_vec; + request_queue_t *q = bdev_get_queue(bio_src->bi_bdev); + + memcpy(bio->bi_io_vec, bio_src->bi_io_vec, bio_src->bi_max_vecs * sizeof(struct bio_vec)); bio->bi_sector = bio_src->bi_sector; bio->bi_bdev = bio_src->bi_bdev; @@ -224,21 +221,9 @@ * for the clone */ bio->bi_vcnt = bio_src->bi_vcnt; - bio->bi_idx = bio_src->bi_idx; - if (bio_flagged(bio, BIO_SEG_VALID)) { - bio->bi_phys_segments = bio_src->bi_phys_segments; - bio->bi_hw_segments = bio_src->bi_hw_segments; - bio->bi_flags |= (1 << BIO_SEG_VALID); - } bio->bi_size = bio_src->bi_size; - - /* -* cloned bio does not own the bio_vec, so users cannot fiddle with -* it. 
clear bi_max_vecs and clear the BIO_POOL_BITS to make this -* apparent -*/ - bio->bi_max_vecs = 0; - bio->bi_flags &= (BIO_POOL_MASK - 1); + bio_phys_segments(q, bio); + bio_hw_segments(q, bio); } /** @@ -250,7 +235,7 @@ */ struct bio *bio_clone(struct bio *bio, int gfp_mask) { - struct bio *b = bio_alloc(gfp_mask, 0); + struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs); if (b) __bio_clone(b, bio); -- Mark Rustad, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] shared subtrees
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Al Viro wrote: > OK, here comes the first draft of proposed semantics for subtree > sharing. What we want is being able to propagate events between > the parts of mount trees. Below is a description of what I think > might be a workable semantics; it does *NOT* describe the data > structures I would consider final and there are considerable > areas where we still need to figure out the right behaviour. > Okay, I'm not convinced that shared subtrees as proposed will work well with autofs. The idea discussed off-line was this: When you install an autofs mountpoint, on say /home, a daemon is started to service the requests. As far as the admin is concerned, an fs is mounted in the current namespace, call it namespaceA. The daemon actually runs in its own private namespace: call it namespaceB. namespaceB receives a new autofs filesystem: call it autofsB. autofsB is in its own p-node. namespaceA gets an autofsA on /home as well, and autofsA is 'owned' by autofsB's p-node. So: autofsB -> autofsB and autofsB -> autofsA Effectively, namespaceA has a private instance of autofsB in its tree. The problem is this: Assume /home/mikew is accessed in namespaceA. The daemon running in namespaceB gets the event, and mounts an nfs vfsmount on autofsB. This event is propagated back to autofsA. (Problem 1: how do you block access to /home/mikew in namespaceA?) Next, a CLONE_NS is done in namespaceA, creating namespaceA'. The homedir on /home/mikew is also copied. Now, in namespaceA', what happens when a user umounts /home/mikew? We haven't yet determined how to handle umount event propagation, but it appears likely that it will be *a hard thing to do*. Assuming the nfs umount succeeds, /home/mikew is accessed again in namespaceA'. (Problem 2: The daemon in namespaceB will see the event, but it already has something mounted on its version of /home/mikew. How does it 'send' a mountpoint to namespaceB?) 
- --- Shared subtrees may help in some administrative situations, but don't look like the right solution for autofs. Autofs will work with namespaces if the following functionality is added to the kernel: The ability to perform mount(2) operations on a directory fd. This has been discussed before and quickly vetoed, citing that it is a security risk. I still fail to understand how allowing a mount to happen cross-namespace given a dirfd target is any worse than what is already possible given a dirfd. If you don't want someone to play with your namespace, don't give them a dirfd. Thoughts? - -- Mike Waychison Sun Microsystems, Inc. 1 (650) 352-5299 voice 1 (416) 202-8336 voice ~~ NOTICE: The opinions expressed in this email are held by me, and may not represent the views of Sun Microsystems, Inc. ~~ -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFB+r1OdQs4kOxk3/MRAmSpAJ96ix25fjze6o7viCq2DCET9J/AlQCfYlC1 CoLKusJXjL+fYxgwggOCW+w= =8bTv -END PGP SIGNATURE-
Re: [PATCH] relayfs redux, part 2
Tom Zanussi <[EMAIL PROTECTED]> wrote: > > This patch is the result of the latest round of liposuction on relayfs > - the patch size is now 44K, down from 110K and the 200K before that. > I'm posting it as a patch against 2.6.10 rather than -mm in order to > make it easier to review, but will create one for -mm once the changes > have settled down. Actually, I'll drop all the relayfs and ltt patches from -mm. They seem to have done their job ;) When things settle down and the code is ready for a new run, you know where I sit.
Re: kernel oops!
On Fri, 2005-01-28 at 12:28 -0800, Linus Torvalds wrote: > I'm surprised that it makes _that_ much of a difference, but it sounds > like you used to be borderline on CPU usage before, and this just made it > much worse. It's much worse: I had a load of 5 with 250 VPN connections, and now I have a load of 200 with 150 connections. -- ierdnah <[EMAIL PROTECTED]>
Re: multiple neighbour cache tables for AF_INET
Wilfried Weissmann <[EMAIL PROTECTED]> wrote: > > The kernels 2.4.28+ and 2.6.9+ with IPv4 and ATM-CLIP enabled have bugs in > the neighbour cache code. neigh_delete() and neigh_add() only work properly > if one cache table per address family exist. After ATM-CLIP installed a > second cache table for AF_INET, neigh_delete() and neigh_add() only examine > the first table (the ATM-CLIP table if IPv4 and ATM-CLIP are compiled into > the kernel). neigh_dump_info() is also affected if the neigh_dump_table() > call fails. Indeed, this has been the case for a very long time. IMHO you need to give the user a way to specify which table they want to operate on. If they don't specify one, then the current behaviour of choosing the first table found is reasonable. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
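The lookup rule suggested above (use the table the user names, otherwise fall back to the first table registered for the address family) can be sketched in user-space C. This is only an illustration under stated assumptions: `struct neigh_table_sketch`, `pick_table()` and the string identifiers are hypothetical and are not the kernel's `neigh_table` API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a neighbour cache table (not the kernel struct). */
struct neigh_table_sketch {
    int family;                       /* address family, e.g. 2 for AF_INET */
    const char *id;                   /* illustrative table name */
    struct neigh_table_sketch *next;  /* next registered table */
};

/*
 * Return the table for `family`. If `id` is NULL, keep the existing
 * behaviour and return the first table of that family; otherwise
 * return only the table whose name matches.
 */
static struct neigh_table_sketch *
pick_table(struct neigh_table_sketch *head, int family, const char *id)
{
    for (struct neigh_table_sketch *t = head; t != NULL; t = t->next) {
        if (t->family != family)
            continue;
        if (id == NULL || strcmp(t->id, id) == 0)
            return t;
    }
    return NULL;
}
```

With two AF_INET tables registered, a call without a name still returns whichever table comes first, while a call that names the second table reaches it, which is the case the report says is currently impossible.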
Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature
* Peter Williams <[EMAIL PROTECTED]> wrote: > I think part of the problem here is that by comparing each tasks limit > to the runqueue's usage rate (and to some extent using a relatively > short decay period) you're creating the need for the limits to be > quite large i.e. it has to be big enough to be bigger than the > combined usage rates of all the unprivileged real time tasks and also > to handle the short term usage rate peaks of the task. actually, at least for Jackd use, the current average worked out pretty well - setting the limit 5-10% above that of the reported average CPU use gave a result that was equivalent to unrestricted SCHED_FIFO results. Ingo
Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature
Ingo Molnar wrote: * Jack O'Quin <[EMAIL PROTECTED]> wrote: i'm wondering, couldn't Jackd solve this whole issue completely in user-space, via a simple setuid-root wrapper app that does nothing else but validates whether the user is in the 'jackd' group and then keeps a pipe open to the real jackd process which it forks off, deprivileges and exec()s? Then unprivileged jackd could request RT-priority changes via that pipe in a straightforward way. Jack normally gets installed as root/admin anyway, so it's not like this couldn't be done. Perhaps. Until recently, that didn't work because of the longstanding rlimits bug in mlockall(). For scheduling only, it might be possible. Of course, this violates your requirement that the user not be able to lock up the CPU for DoS. The jackd watchdog is not perfect. there is a legitimate fear that if it's made "too easy" to acquire some sort of SCHED_FIFO priority, that an "arms race" would begin between desktop apps, each trying to set themselves to SCHED_FIFO (or SCHED_ISO) and advising users to 'raise the limit if they see delays' - just to get snappier than the rest. thus after a couple of years we'd end up with lots of desktop apps running as SCHED_FIFO, and latency would go down the drain again. (yeah, this feels like going back to the drawing board.) I think part of the problem here is that by comparing each task's limit to the runqueue's usage rate (and to some extent using a relatively short decay period) you're creating the need for the limits to be quite large i.e. it has to be big enough to be bigger than the combined usage rates of all the unprivileged real time tasks and also to handle the short term usage rate peaks of the task. If the average usage rate is estimated over longer periods it will be lower allowing lower limits to be used. Also if the task's own usage rate estimates are used to test the limits then the limit can be lower. 
If the default limits can be made sufficiently small then the temptation to use this feature by "ordinary" applications will disappear. I'm not an expert but I imagine that the CPU usage rates of most RT tasks taken over reasonably long time intervals is quite low and therefore the default limits could also be quite low without adversely affecting the programs that this mechanism is meant to help. The sched_cpustats.[ch] files that are part of my SPA scheduler patches provide a cheap method of estimating per task usage rates. They estimate usage rates for a task over its recent scheduling cycles but could be modified to provide updates every tick for the currently active task for use with this mechanism. Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
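The point about decay periods can be illustrated with a small user-space sketch. It is not the estimator from the RLIMIT_RT_CPU_RATIO patch or from sched_cpustats; `decay_step()` and `peak_usage()` are hypothetical helpers, usage is expressed in per-mille, and the averaging is a plain shift-based exponential decay chosen only for illustration.

```c
/*
 * One tick of a fixed-point exponentially decaying average.
 * `sample` is the usage in this tick (0..1000 per-mille); a larger
 * `shift` means a longer effective averaging period.
 */
static unsigned int decay_step(unsigned int avg, unsigned int sample,
                               unsigned int shift)
{
    return avg - (avg >> shift) + (sample >> shift);
}

/*
 * Simulate a task that runs flat out for `busy` ticks out of every
 * `period` ticks and report the highest estimate ever reached.
 */
static unsigned int peak_usage(unsigned int shift, unsigned int busy,
                               unsigned int period, unsigned int cycles)
{
    unsigned int avg = 0, peak = 0;

    for (unsigned int c = 0; c < cycles; c++) {
        for (unsigned int t = 0; t < period; t++) {
            unsigned int sample = (t < busy) ? 1000 : 0;

            avg = decay_step(avg, sample, shift);
            if (avg > peak)
                peak = avg;
        }
    }
    return peak;
}
```

For a task that is busy 10 ticks in every 100, `peak_usage(2, 10, 100, 20)` climbs close to the full 1000 during each burst, while `peak_usage(6, 10, 100, 20)` stays well below half of that. A limit compared against the longer average can therefore be set much lower without tripping on short bursts, which is the argument made above.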
Re: [PATCH] OpenBSD Networking-related randomization port
On Fri, 28-01-2005 at 21:47 +0100, Arjan van de Ven wrote: > as for obsd_get_random_long().. would it be possible to use the > get_random_int() function from the patches I posted the other day? They > use the existing random.c infrastructure instead of making a copy... As you can see at http://www.kernel.org/pub/linux/kernel/people/arjan/execshield/00-randomize-A0 I don't think there is much point in using that: we can easily maintain the functions in obsd_rand.c, so we wouldn't add more maintenance overhead. I hope you can understand why I want it that way, depending on random.c only for the function exports (which makes it even more independent, as we don't need our own header or a new include entry in each modified file, since most of them already include random.h). Attached you can find the new patch with the indentation fixes. The tests on the patch are the following ones: http://www.osdl.org/plm-cgi/plm?module=patch_info_id=4136 (above one shows that there are no SMP-related issues) http://khack.osdl.org/stp/300417 http://khack.osdl.org/stp/300420 Cheers and thanks for the information, -- Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> [1024D/6F2B2DEC] & [2048g/9AE91A22][http://tuxedo-es.org] diff -Nur linux-2.6.11-rc2/include/linux/random.h linux-2.6.11-rc2.tx1/include/linux/random.h --- linux-2.6.11-rc2/include/linux/random.h 2005-01-26 19:54:17.0 +0100 +++ linux-2.6.11-rc2.tx1/include/linux/random.h 2005-01-28 19:45:31.359923392 +0100 @@ -42,6 +42,12 @@ #ifdef __KERNEL__ +/* OpenBSD Networking-related randomization functions - [EMAIL PROTECTED] */ +extern unsigned long obsd_get_random_long(void); +extern __u16 ip_randomid(void); +extern __u32 ip_randomisn(void); + + extern void rand_initialize_irq(int irq); extern void add_input_randomness(unsigned int type, unsigned int code, diff -Nur linux-2.6.11-rc2/net/ipv4/tcp_ipv4.c linux-2.6.11-rc2.tx1/net/ipv4/tcp_ipv4.c --- linux-2.6.11-rc2/net/ipv4/tcp_ipv4.c 
2005-01-26 19:54:19.0 +0100 +++ linux-2.6.11-rc2.tx1/net/ipv4/tcp_ipv4.c 2005-01-28 22:28:24.991105608 +0100 @@ -539,10 +539,7 @@ static inline __u32 tcp_v4_init_sequence(struct sock *sk, struct sk_buff *skb) { - return secure_tcp_sequence_number(skb->nh.iph->daddr, - skb->nh.iph->saddr, - skb->h.th->dest, - skb->h.th->source); + return ip_randomisn(); } /* called with local bh disabled */ @@ -834,13 +830,9 @@ tp->ext2_header_len = rt->u.dst.header_len; if (!tp->write_seq) - tp->write_seq = secure_tcp_sequence_number(inet->saddr, - inet->daddr, - inet->sport, - usin->sin_port); - - inet->id = tp->write_seq ^ jiffies; + tp->write_seq = ip_randomisn(); + inet->id = htons(ip_randomid()); err = tcp_connect(sk); rt = NULL; if (err) @@ -1566,20 +1555,20 @@ newsk->sk_dst_cache = dst; tcp_v4_setup_caps(newsk, dst); - newtp = tcp_sk(newsk); - newinet = inet_sk(newsk); - newinet->daddr = req->af.v4_req.rmt_addr; - newinet->rcv_saddr= req->af.v4_req.loc_addr; - newinet->saddr = req->af.v4_req.loc_addr; - newinet->opt = req->af.v4_req.opt; - req->af.v4_req.opt= NULL; - newinet->mc_index = tcp_v4_iif(skb); - newinet->mc_ttl = skb->nh.iph->ttl; + newtp = tcp_sk(newsk); + newinet = inet_sk(newsk); + newinet->daddr = req->af.v4_req.rmt_addr; + newinet->rcv_saddr = req->af.v4_req.loc_addr; + newinet->saddr = req->af.v4_req.loc_addr; + newinet->opt = req->af.v4_req.opt; + req->af.v4_req.opt = NULL; + newinet->mc_index = tcp_v4_iif(skb); + newinet->mc_ttl = skb->nh.iph->ttl; newtp->ext_header_len = 0; if (newinet->opt) newtp->ext_header_len = newinet->opt->optlen; newtp->ext2_header_len = dst->header_len; - newinet->id = newtp->write_seq ^ jiffies; + newinet->id = htons(ip_randomid()); tcp_sync_mss(newsk, dst_pmtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); diff -Nur linux-2.6.11-rc2/net/Makefile linux-2.6.11-rc2.tx1/net/Makefile --- linux-2.6.11-rc2/net/Makefile 2005-01-26 19:50:49.0 +0100 +++ linux-2.6.11-rc2.tx1/net/Makefile 2005-01-28 21:01:21.870140688 +0100 @@ -11,6 
+11,7 @@ tmp-$(CONFIG_COMPAT) := compat.o obj-$(CONFIG_NET) += $(tmp-y) +obj-y+= obsd_rand.o # LLC has to be linked before the files in net/802/ obj-$(CONFIG_LLC) += llc/ diff -Nur linux-2.6.11-rc2/net/obsd_rand.c linux-2.6.11-rc2.tx1/net/obsd_rand.c --- linux-2.6.11-rc2/net/obsd_rand.c 1970-01-01 01:00:00.0 +0100 +++ linux-2.6.11-rc2.tx1/net/obsd_rand.c 2005-01-28 17:43:50.0 +0100 @@ -0,0 +1,269 @@ +/* $Id: openbsd-netrand-2.6.11-rc2.patch,v 1.6 2005/01/28 22:10:30 lorenzo Exp $ + * Copyright (c) 2005 Lorenzo Hernandez Garcia-Hierro <[EMAIL PROTECTED]>. + * All rights reserved. + * + * Added some macros and stolen code from random.c, for individual and less + * "invasive" implementation.Also removed the get_random_long() macro definition, + * which is not good if we can
Re: Possible bug in keyboard.c (2.6.10)
On Fri, Jan 28, 2005 at 12:10:05PM +0100, Vojtech Pavlik wrote: > And, btw, raw mode in 2.6 is not badly broken. It works as it is > intended to. If you want the 2.4 behavior on x86, you just need to > specify "atkbd.softraw=0" on the kernel command line. Thanks for pointing that out - I should have read patch-2.6.9 more carefully. I'll add that to the setkeycodes.8 man page. Nevertheless I disagree a bit. "raw mode" is by definition the mode where scan codes are passed unmodified to user space. So before 2.6.9 this was just broken, and since 2.6.9 it is broken by default but there is a boot option to make it work. What is the reason that you do not make this the default? The current default is really messy and confusing, especially when people have to map keys using setkeycodes. Andries BTW, now that I read the corresponding code: if (atkbd_softrepeat) atkbd_softraw = 1; if (!atkbd_softrepeat) { atkbd->dev.rep[REP_DELAY] = 250; atkbd->dev.rep[REP_PERIOD] = 33; } else atkbd_softraw = 1; The "else" part is superfluous.
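The remark about the superfluous "else" can be checked with a small user-space model. This is a sketch only: `struct kbd_cfg` and both functions are hypothetical stand-ins for the driver state quoted above, not code from atkbd.c.

```c
/* Hypothetical model of the four values touched by the quoted snippet. */
struct kbd_cfg {
    int softrepeat;
    int softraw;
    int rep_delay;
    int rep_period;
};

/* Mirrors the quoted logic, including the redundant else branch. */
static void setup_original(struct kbd_cfg *c)
{
    if (c->softrepeat)
        c->softraw = 1;

    if (!c->softrepeat) {
        c->rep_delay = 250;
        c->rep_period = 33;
    } else
        c->softraw = 1;   /* already set by the first if */
}

/* Same behaviour with a single if/else. */
static void setup_simplified(struct kbd_cfg *c)
{
    if (c->softrepeat) {
        c->softraw = 1;
    } else {
        c->rep_delay = 250;
        c->rep_period = 33;
    }
}
```

Both functions leave the structure in the same state for every combination of `softrepeat` and `softraw`, so folding the two tests into one does not change behaviour in this model.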
[WATCHDOG] 2.6.11-rc2 i8xx_tco.c-ICH4/6/7-patch
Hi Linus, Andrew, please do a bk pull http://linux-watchdog.bkbits.net/linux-2.6-watchdog This will update the following files: drivers/char/watchdog/i8xx_tco.c | 34 +++--- 1 files changed, 27 insertions(+), 7 deletions(-) through these ChangeSets: <[EMAIL PROTECTED]> (05/01/28 1.1984) [WATCHDOG] i8xx_tco.c-ICH4/6/7-patch Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW chipsets. Also added support for the "undocumented" ICH7. The ChangeSets can also be looked at on: http://linux-watchdog.bkbits.net:8080/linux-2.6-watchdog For completeness, I added the patches below. Greetings, Wim. diff -Nru a/drivers/char/watchdog/i8xx_tco.c b/drivers/char/watchdog/i8xx_tco.c --- a/drivers/char/watchdog/i8xx_tco.c 2005-01-28 22:51:31 +01:00 +++ b/drivers/char/watchdog/i8xx_tco.c 2005-01-28 22:51:31 +01:00 @@ -1,5 +1,5 @@ /* - * i8xx_tco 0.06: TCO timer driver for i8xx chipsets + * i8xx_tco 0.07: TCO timer driver for i8xx chipsets * * (c) Copyright 2000 kernel concepts <[EMAIL PROTECTED]>, All Rights Reserved. * http://www.kernelconcepts.de @@ -22,11 +22,22 @@ * * The TCO timer is implemented in the following I/O controller hubs: * (See the intel documentation on http://developer.intel.com.) 
- * 82801AA & 82801AB chip : document number 290655-003, 290677-004, - * 82801BA & 82801BAM chip : document number 290687-002, 298242-005, - * 82801CA & 82801CAM chip : document number 290716-001, 290718-001, - * 82801DB & 82801E chip : document number 290744-001, 273599-001, - * 82801EB & 82801ER chip : document number 252516-001 + * 82801AA (ICH): document number 290655-003, 290677-014, + * 82801AB (ICHO) : document number 290655-003, 290677-014, + * 82801BA (ICH2) : document number 290687-002, 298242-027, + * 82801BAM (ICH2-M) : document number 290687-002, 298242-027, + * 82801CA (ICH3-S) : document number 290733-003, 290739-013, + * 82801CAM (ICH3-M) : document number 290716-001, 290718-007, + * 82801DB (ICH4) : document number 290744-001, 290745-020, + * 82801DBM (ICH4-M) : document number 252337-001, 252663-005, + * 82801E (C-ICH) : document number 273599-001, 273645-002, + * 82801EB (ICH5) : document number 252516-001, 252517-003, + * 82801ER (ICH5R) : document number 252516-001, 252517-003, + * 82801FB (ICH6) : document number 301473-002, 301474-007, + * 82801FR (ICH6R) : document number 301473-002, 301474-007, + * 82801FBM (ICH6-M) : document number 301473-002, 301474-007, + * 82801FW (ICH6W) : document number 301473-001, 301474-007, + * 82801FRW (ICH6RW) : document number 301473-001, 301474-007 * * 2710 Nils Faerber * Initial Version 0.01 @@ -49,6 +60,9 @@ * 20030921 Wim Van Sebroeck <[EMAIL PROTECTED]> * 0.06 change i810_margin to heartbeat, use module_param, * added notify system support, renamed module to i8xx_tco. + * 20050128 Wim Van Sebroeck <[EMAIL PROTECTED]> + * 0.07 Added support for the ICH4-M, ICH6, ICH6R, ICH6-M, ICH6W and ICH6RW + * chipsets. Also added support for the "undocumented" ICH7 chipset. 
*/ /* @@ -73,7 +87,7 @@ #include "i8xx_tco.h" /* Module and version information */ -#define TCO_VERSION "0.06" +#define TCO_VERSION "0.07" #define TCO_MODULE_NAME "i8xx TCO timer" #define TCO_DRIVER_NAME TCO_MODULE_NAME ", v" TCO_VERSION #define PFX TCO_MODULE_NAME ": " @@ -360,8 +374,14 @@ { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_0, PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801CA_12, PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801DB_0, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801DB_12, PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801E_0,PCI_ANY_ID, PCI_ANY_ID, }, { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801EB_0, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_0, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_1, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_2, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_0, PCI_ANY_ID, PCI_ANY_ID, }, + { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_1, PCI_ANY_ID, PCI_ANY_ID, }, { 0, }, /* End of list */ }; MODULE_DEVICE_TABLE (pci, i8xx_tco_pci_tbl); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a mess
Re: [2.6.11-rc2] kernel BUG at fs/reiserfs/prints.c:362
On Thu, 2005-01-27 at 17:15 +0300, Vladimir Saveliev wrote: > Earlier reiserfs used to lock_kernel on entering and unlock on exit. The > reason is that reiserfs has no fine grain locking protecting access to > its data structures. > Since that time there could be introduced some minor improvements, > though. No, reiser3 still does not have proper locking. It uses the BKL for everything. This will not be fixed as reiser3 is in maintenance mode. According to Hans "the fix is reiser4". This came up early in the voluntary preemption development process, we found reiser3 to be unusable for low latency audio due to the excessive BKL use disabling preemption all over the place. It would be interesting to test reiser3 with the preemptible BKL enabled. Lee
Re: [PATCH] OpenBSD Networking-related randomization port
On Fri, 28 Jan 2005 13:34:08 -0800 Stephen Hemminger <[EMAIL PROTECTED]> wrote: > per-cpu would be the way to go here. Does the sbox get somehow seeded from use to use? If not, then yes that's the thing to do.
[PATCH 1/1] tpm: insert missing up mutex in an error path
This patch puts in the missing up call on the tpm_mutex on an error condition in the tpm_transmit function. Bug reported by Stefan Berger <[EMAIL PROTECTED]>. This patch also implements a new status function to handle future chip configurations which may generate status differently. Thanks, Kylie Signed-off-by: Kylene Hall <[EMAIL PROTECTED]> --- diff -uprN linux-2.6.10/drivers/char/tpm/tpm_atmel.c linux-2.6.10-tpm/drivers/char/tpm/tpm_atmel.c --- linux-2.6.10/drivers/char/tpm/tpm_atmel.c 2005-01-18 16:42:17.0 -0600 +++ linux-2.6.10-tpm/drivers/char/tpm/tpm_atmel.c 2005-01-21 13:11:11.0 -0600 @@ -112,6 +112,11 @@ static void tpm_atml_cancel(struct tpm_c outb(ATML_STATUS_ABORT, chip->vendor->base + 1); } +static u8 tpm_atml_status(struct tpm_chip *chip) +{ + return inb( chip->vendor->base + 1); +} + static struct file_operations atmel_ops = { .owner = THIS_MODULE, .llseek = no_llseek, @@ -125,6 +130,7 @@ static struct tpm_vendor_specific tpm_at .recv = tpm_atml_recv, .send = tpm_atml_send, .cancel = tpm_atml_cancel, + .status = tpm_atml_status, .req_complete_mask = ATML_STATUS_BUSY | ATML_STATUS_DATA_AVAIL, .req_complete_val = ATML_STATUS_DATA_AVAIL, .base = TPM_ATML_BASE, diff -uprN linux-2.6.10/drivers/char/tpm/tpm.c linux-2.6.10-tpm/drivers/char/tpm/tpm.c --- linux-2.6.10/drivers/char/tpm/tpm.c 2005-01-21 12:53:26.0 -0600 +++ linux-2.6.10-tpm/drivers/char/tpm/tpm.c 2005-01-28 16:28:45.578493680 -0600 @@ -152,6 +151,7 @@ static ssize_t tpm_transmit(struct tpm_c if ((len = chip->vendor->send(chip, (u8 *) buf, count)) < 0) { dev_err(&chip->pci_dev->dev, "tpm_transmit: tpm_send: error %d\n", len); + up(&chip->tpm_mutex); return len; } @@ -165,7 +165,7 @@ static ssize_t tpm_transmit(struct tpm_c up(&chip->timer_manipulation_mutex); do { - u8 status = inb(chip->vendor->base + 1); + u8 status = chip->vendor->status(chip); if ((status & chip->vendor->req_complete_mask) == chip->vendor->req_complete_val) { down(&chip->timer_manipulation_mutex); diff -uprN linux-2.6.10/drivers/char/tpm/tpm.h 
linux-2.6.10-tpm/drivers/char/tpm/tpm.h --- linux-2.6.10/drivers/char/tpm/tpm.h 2005-01-18 16:42:17.0 -0600 +++ linux-2.6.10-tpm/drivers/char/tpm/tpm.h 2005-01-21 13:10:20.0 -0600 @@ -40,6 +40,7 @@ struct tpm_vendor_specific { int (*recv) (struct tpm_chip *, u8 *, size_t); int (*send) (struct tpm_chip *, u8 *, size_t); void (*cancel) (struct tpm_chip *); + u8 (*status) (struct tpm_chip *); struct miscdevice miscdev; }; diff -uprN linux-2.6.10/drivers/char/tpm/tpm_nsc.c linux-2.6.10-tpm/drivers/char/tpm/tpm_nsc.c --- linux-2.6.10/drivers/char/tpm/tpm_nsc.c 2005-01-18 16:42:17.0 -0600 +++ linux-2.6.10-tpm/drivers/char/tpm/tpm_nsc.c 2005-01-21 13:12:27.0 -0600 @@ -219,6 +219,12 @@ static void tpm_nsc_cancel(struct tpm_ch outb(NSC_COMMAND_CANCEL, chip->vendor->base + NSC_COMMAND); } + +static u8 tpm_nsc_status(struct tpm_chip *chip) +{ + return inb(chip->vendor->base + NSC_STATUS); +} + static struct file_operations nsc_ops = { .owner = THIS_MODULE, .llseek = no_llseek, @@ -232,6 +238,7 @@ static struct tpm_vendor_specific tpm_ns .recv = tpm_nsc_recv, .send = tpm_nsc_send, .cancel = tpm_nsc_cancel, + .status = tpm_nsc_status, .req_complete_mask = NSC_STATUS_OBF, .req_complete_val = NSC_STATUS_OBF, .base = TPM_NSC_BASE,
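The class of bug this patch fixes (an early return that leaves a mutex held) can be shown with a user-space model. This is a sketch under stated assumptions: `struct mock_sem`, `mock_down()`, `mock_up()` and `transmit_sketch()` are hypothetical stand-ins and do not reproduce the kernel semaphore API or the real tpm_transmit().

```c
/* Hypothetical counter standing in for a kernel semaphore. */
struct mock_sem {
    int held;   /* number of unmatched down() calls */
};

static void mock_down(struct mock_sem *s) { s->held++; }
static void mock_up(struct mock_sem *s)   { s->held--; }

/*
 * Model of a transmit routine: take the lock, try to send, and make
 * sure every exit path releases the lock again. `send_result` plays
 * the role of the vendor send() return value.
 */
static int transmit_sketch(struct mock_sem *m, int send_result)
{
    mock_down(m);

    if (send_result < 0) {
        /* The release that was missing before the patch. */
        mock_up(m);
        return send_result;
    }

    /* ... wait for completion and read the reply here ... */

    mock_up(m);
    return send_result;
}
```

After either a failing or a successful call the `held` counter is back at zero. Without the release on the error path, the first failed send would leave the counter at one and every later caller would block.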
Re: Why does the kernel need a gig of VM?
On Friday, 28 January 2005 21:42, Josh Boyer wrote: > Because of various reasons. Normal kernel space virtual addresses > usually start at 0xc000, which is where the 3GiB userspace > restriction comes from. > > Then there is the vmalloc virtual address space, which usually starts at > a higher address than a normal kernel address. Along the same lines are > ioremap addresses, etc. > > Poke around in the header files. I bet you'll find lots of reasons. This is probably a FAQ, but anyway: the kernel needs physical memory present and accessible all the time from all contexts. This is mapped into this area. All other RAM is called High Mem and needs to be specifically mapped before it can be used from kernel space. Regards Oliver
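The arithmetic behind the 3 GiB / 1 GiB split can be sketched as follows. The sketch assumes the usual 32-bit x86 layout in which kernel virtual addresses begin at 0xC0000000; `SKETCH_PAGE_OFFSET` and the helper functions are hypothetical names for illustration, not kernel definitions.

```c
#include <stdint.h>

/* Assumed start of kernel virtual addresses (typical 32-bit x86 split). */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/* Addresses at or above the split belong to the kernel's window. */
static int is_kernel_address(uint32_t addr)
{
    return addr >= SKETCH_PAGE_OFFSET;
}

/* Size of the user part of a 32-bit address space, in bytes. */
static uint64_t user_span_bytes(void)
{
    return SKETCH_PAGE_OFFSET;
}

/* Size of the kernel part of a 32-bit address space, in bytes. */
static uint64_t kernel_span_bytes(void)
{
    return (1ull << 32) - SKETCH_PAGE_OFFSET;
}

/* Permanently mapped low memory: virtual = physical + offset. */
static uint32_t lowmem_virt(uint32_t phys)
{
    return phys + SKETCH_PAGE_OFFSET;
}
```

Under this assumption the user span is exactly 3 GiB and the kernel window exactly 1 GiB. Because the vmalloc and ioremap ranges share that window, the directly mapped low memory is somewhat smaller than the full 1 GiB, and physical RAM beyond it is the High Mem described above.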
Re: userspace vs. kernelspace address
Hi everybody, Thanks for your replies. Lemme explain my problem a little bit more. I have a thread that does exactly similar things in kernel-mode and user-mode (depending on how you invoked it; of course, the kernel one is forked using kernel_thread(), and the user one is from pthread_create()). The architecture-dependent stuff is taken care of by extensive use of __KERNEL__ macro testing. This particular thread gets a packet of data, the header of which contains address to where it should be copying the payload associated with that packet. The kernel-mode thread will need to decide how to copy data into another process' address space, so will the user-mode thread. However I think my copy_to_user and copy_from_user are failing since the kernel-mode thread is copying data into another process's address space, and I am not sure how to do this. Do the get_fs() and set_fs() combinations let you do that? If not, then how do I do it? Something like when you invoke the ->write or ->read functions, you need to copy the requisite data into the buffer the application provided you with. Thanks and regards, Rock --- Jan Hudec <[EMAIL PROTECTED]> wrote: > On Fri, Jan 28, 2005 at 01:06:21 +0100, Bernd > Petrovitsch wrote: > > On Thu, 2005-01-27 at 09:14 -0800, Rock Gordon > wrote: > > > If I'm given a particular address, how do I test > > > whether that address is from userspace or from > kernel > > > space? > > > > You don't. > > > > > I need to make these decisions from either > inside a > > > kernel module or a userspace program. The idea > is I > > > use memcpy() in the user-user version, > > > copy_from/to_user in the kernel-kernel version, > and > > > prohibit the others. > > > > You need to know where the address is from and use > the correct function. > > If the interface is defined as taking userland > address, than kernel > function passing a kernel address in is responsible > for calling > set_fs(KERNEL_DS) before and undoing it after. 
That > way the > copy_to/from_user does not complain. > > --- >Jan 'Bulb' Hudec <[EMAIL > PROTECTED]>
Re: [BUG] 2.6.11-rc2 ALSA
On Thu, 2005-01-27 at 08:46 +0100, Jaroslav Kysela wrote: > Fixed the default state of "Headphone Jack Sense" switch on AD1981x > codecs. Setting this on affects the output of some machines (e.g. > Thindpads). You probably meant "Thimkpads". Lee
Re: [PATCH] OpenBSD Networking-related randomization port
On Fri, 28 Jan 2005 12:45:17 -0800 "David S. Miller" <[EMAIL PROTECTED]> wrote: > On Fri, 28 Jan 2005 21:34:52 +0100 > Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> wrote: > > > Attached the new patch following Arjan's recommendations. > > No SMP protection on the SBOX, better look into that. > The locking you'll likely need to add will make this > routine serialize many networking operations which is > one thing we've been trying to avoid. > per-cpu would be the way to go here. -- Stephen Hemminger <[EMAIL PROTECTED]>
[PATCH] Fix compile errors with 2.6.11-rc2
Hi ! When compiling 2.6.11-rc2: ... CC kernel/stop_machine.o In file included from include/linux/sysdev.h:24, from include/linux/cpu.h:22, from include/linux/stop_machine.h:8, from kernel/stop_machine.c:1: include/linux/kobject.h: In function `to_kset': include/linux/kobject.h:116: warning: implicit declaration of function `container_of' include/linux/kobject.h:116: error: parse error before "struct" include/linux/kobject.h:117: warning: no return statement in function returning non-void include/linux/kobject.h: In function `subsys_get': include/linux/kobject.h:224: error: parse error before "struct" include/linux/kobject.h:225: warning: no return statement in function returning non-void make[1]: *** [kernel/stop_machine.o] Error 1 make: *** [kernel] Error 2 Attached patch fixes this. Thanks Manish Lachwani Signed-off-by: Manish Lachwani <[EMAIL PROTECTED]> Index: linux-2.6.11-rc2/include/linux/kobject.h === --- linux-2.6.11-rc2.orig/include/linux/kobject.h +++ linux-2.6.11-rc2/include/linux/kobject.h @@ -23,6 +23,7 @@ #include #include #include +#include #include #define KOBJ_NAME_LEN 20
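The compiler messages above are typical of a macro being used before its definition is visible: `container_of(...)` is then treated as an implicitly declared function, and its `struct` type argument becomes a parse error. A simplified user-space version of the macro shows what `to_kset` relies on. This is a sketch only: the `_sketch` names are hypothetical, and the macro omits the type checking the kernel's own definition performs.

```c
#include <stddef.h>

/*
 * Simplified container_of: given a pointer to a member, step back by
 * the member's offset to reach the enclosing structure.
 */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the kernel's kobject and kset. */
struct kobject_sketch {
    int refcount;
};

struct kset_sketch {
    int id;
    struct kobject_sketch kobj;   /* embedded member */
};

/* Recover the enclosing kset from its embedded kobject. */
static struct kset_sketch *to_kset_sketch(struct kobject_sketch *kobj)
{
    return kobj ? container_of_sketch(kobj, struct kset_sketch, kobj)
                : NULL;
}
```

If the macro definition is removed from this file, a C compiler of that era reports much the same pair of diagnostics as in the log: an implicit declaration warning followed by a parse error before "struct".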
panic in raid1_end_write_request
I have a Dell PE2650, Dual Xeon, 1G memory and several software raid1 partitions, ext3. Main duties include NFS, DHCP and samba. A Fedora kernel 2.6.10-1.747_FC3smp which includes 2.6.10-ac10. This system panics frequently, between several hours to several days. It does not seem to be related to load. Hardware and memory tests indicate a good system. Panic messages are similar to: Unable to handle kernel NULL pointer dereference at virtual address 0038 printing eip: f882940f *pde = 379c9001 Oops: [#1] SMP Modules linked in: iptable_filter ip_tables nfsd exportfs md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd sunrpc microcode dm_mod video button battery ac cfi_probe gen_probe scb2_flash mtdcore chipreg map_funcs tg3 floppy sg ext3 jbd raid1 aic7xxx sd_mod scsi_mod CPU:3 EIP:0060:[]Not tainted VLI EFLAGS: 00010246 (2.6.10-1.747_FC3smp) EIP is at raid1_end_write_request+0x8e/0xb2 [raid1] eax: ebx: f7dda400 ecx: f79e78a0 edx: esi: 0018 edi: f7dd6e00 ebp: f7dda400 esp: c03aef18 ds: 007b es: 007b ss: 0068 Process swapper (pid: 0, threadinfo=c03ae000 task=f7f5fa40) Stack: f7fbd100 1000 f8829381 c01564ce 1000 f7fbd100 c03aef60 c0217b6f f7bcca24 1000 f7bcca24 f7d4b33c f78f4080 0001 f88435ec 0001 e4d10b80 f7bcca24 f78f4080 Call Trace: [] raid1_end_write_request+0x0/0xb2 [raid1] [] bio_endio+0x50/0x55 [] __end_that_request_first+0xea/0x1ab [] scsi_end_request+0x1b/0x9d [scsi_mod] [] scsi_io_completion+0x206/0x40f [scsi_mod] [] __wake_up+0x29/0x3c [] scsi_finish_command+0xad/0xb1 [scsi_mod] [] scsi_softirq+0xb6/0xbe [scsi_mod] [] __do_softirq+0x4c/0xb1 [] do_softirq+0x41/0x48 === [] do_IRQ+0x74/0x7e [] common_interrupt+0x1a/0x20 [] default_idle+0x0/0x2f [] xfrm_sk_policy_lookup+0x2cd/0x355 [] default_idle+0x29/0x2f [] cpu_idle+0x26/0x3b Code: 53 08 89 44 0e 04 89 54 0e 08 f0 ff 0b 0f 94 c0 84 c0 74 0f 8b 43 14 e8 bf 5f a3 c7 89 d8 e8 15 fe ff ff 8b 47 04 8b 1f 8b 04 06 <8b> 48 38 f0 ff 48 48 0f 94 c2 84 d2 74 0d 85 c9 74 09 f0 0f ba <0>Kernel panic - not 
syncing: Fatal exception in interrupt -- Norman Gaywood, Systems Administrator School of Mathematics, Statistics and Computer Science University of New England, Armidale, NSW 2351, Australia [EMAIL PROTECTED] Phone: +61 (0)2 6773 2412 http://turing.une.edu.au/~norm Fax: +61 (0)2 6773 3312 Please avoid sending me Word or PowerPoint attachments. See http://www.fsf.org/philosophy/no-word-attachments.html
Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)
On Fri, 2005-01-28 at 11:18 -0800, Trond Myklebust wrote: > In the NFS client code we may use rwsems in order to protect stateful > operations against the (very infrequently used) server reboot recovery > code. The point is that when the server reboots, the server forces us to > block *all* requests that involve adding new state (e.g. opening an > NFSv4 file, or setting up a lock) while our client and others are > re-establishing their existing state on the server. Hmm, when I was an ISP sysadmin I used to use this all the time. NFS mounts from the BSD/OS clients would start to act up under heavy web server load and the cleanest way to get them to recover was to simulate a reboot on the NetApp. Of course Linux clients were unaffected, they were just along for the ride ;-) Lee
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Stephen C. Tweedie wrote: Hi, On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote: Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr problem fixed? Not sure about how much of -ac went in, but it has the xattr fix. I've had my machine that would crash daily if not hourly stay up for 10 days now. This is with the linux-2.6.10-ac10 kernel. Good to know. Are you using xattrs extensively (eg. for ACLs, SELinux or Samba 4)? --Stephen On the machines that were having problems we really weren't using them for anything. I think I may have been running into the BIO problem that was fixed in 2.6.10-ac10.
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Hi, On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote: > >>Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr > >>problem fixed? > >Not sure about how much of -ac went in, but it has the xattr fix. > I've had my machine that would crash daily if not hourly stay up for 10 > days now. This is with the linux-2.6.10-ac10 kernel. Good to know. Are you using xattrs extensively (eg. for ACLs, SELinux or Samba 4)? --Stephen
Re: Why does the kernel need a gig of VM?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Wow. I'd heard that there was a way to set a 3.5/0.5 GiB split, and that there was a patch that removed the split and isolated the kernel (but that was slow), so I was just curious about all this stuff with people screaming about how tight 4G of VM is vs a half gig or a gig that can be freed up. Josh Boyer wrote: > On Fri, 2005-01-28 at 15:06 -0500, John Richard Moser wrote: > >>-BEGIN PGP SIGNED MESSAGE- >>Hash: SHA1 >> >>Can someone give me a layout of what exactly is up there? I got the >>basic idea >> >>K 4G >>A 3G >>A 2G >>A 1G >> >>App has 3G, kernel has 1G at the top of VM on x86 (dunno about x86_64). >> >>So what's the layout of that top 1G? What's it all used for? Is there >>some obscene restriction of 1G of shared memory or something that gets >>mapped up there? >> >>How much does it need, and why? What, if anything, is variable and >>likely to do more than 10 or 15 megs of variation? > > > Because of various reasons. Normal kernel space virtual addresses > usually start at 0xc0000000, which is where the 3GiB userspace > restriction comes from. > > Then there is the vmalloc virtual address space, which usually starts at > a higher address than a normal kernel address. Along the same lines are > ioremap addresses, etc. > > Poke around in the header files. I bet you'll find lots of reasons. > > josh > > - -- All content of all messages exchanged herein are left in the Public Domain, unless otherwise explicitly stated. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFB+qUdhDd4aOud5P8RAmU8AJ9fRQi4A+yIVaXdv/oWlPIqObROPQCfUgvU KAsRKxYgSTWVecLsZZCvXgE= =v+fM -END PGP SIGNATURE-
Re: [PATCH] relayfs redux, part 2
Tom Zanussi wrote: > diff -urpN -X dontdiff linux-2.6.10/fs/Kconfig linux-2.6.10-cur/fs/Kconfig ... > + This file system is also available as a module ( = code which can be > + inserted in and removed from the running kernel whenever you want). > + The module is called relayfs. If you want to compile it as a > + module, say M here and read . ... This is a real nit, but personally I'd remove the stuff in parens above. It's not relayfs' job to educate users about what a module is. I'll try to give some more substantive feedback next week. Tim
Re: I need a hardware wizard... I have been beating my head on the wall..
David Sims wrote: On Thu, 27 Jan 2005, Jeff Garzik wrote: David Sims wrote: [...] You can insert the module in a running kernel and after barking as follows (once for each disk attached) it runs just fine. Basically nobody has ever had hardware to test sata_vsc with that hardware. We should probably remove the PCI ID until an engineer can fix it... Hi again, I am willing to make this hardware available to any engineer that wants to help me solve this problem and I will do whatever I can to make it an easy job... Please help me... Well, I don't consider myself a hardware wizard, but at least I'm an engineer, so I decided to give it a go :) It seems that the driver is not acknowledging the interrupt from the controller. It would be nice to know what kind of interrupt is triggering this. Could you run the attached patch and show the output from dmesg? -- Paulo Marques - www.grupopie.com All that is necessary for the triumph of evil is that good men do nothing. Edmund Burke (1729 - 1797) --- sata_vsc.c.orig 2005-01-28 12:23:47.0 + +++ sata_vsc.c 2005-01-28 20:51:13.993868526 + @@ -160,12 +160,17 @@ irqreturn_t vsc_sata_interrupt (int irq, struct ata_host_set *host_set = dev_instance; unsigned int i; unsigned int handled = 0; + static int int_count = 0; u32 int_status; spin_lock(&host_set->lock); int_status = readl(host_set->mmio_base + VSC_SATA_INT_STAT_OFFSET); + int_count++; + if (int_count > 1000 && int_count <= 1020) + printk("vsc_sata int status: %08x\n", int_status); + for (i = 0; i < host_set->n_ports; i++) { if (int_status & ((u32) 0xFF << (8 * i))) { struct ata_port *ap;
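The debugging technique in the patch above, reporting only a bounded window of events so that an interrupt storm does not flood the log, can be sketched in plain C. This is a userspace illustration under stated assumptions: should_log_event, simulate_events and the LOG_WINDOW_* constants are hypothetical names, and the unsynchronized static counter is acceptable here only because the sketch is single-threaded (the patch increments its counter while holding host_set->lock).

```c
#include <assert.h>
#include <stdio.h>

#define LOG_WINDOW_START 1000	/* skip the first 1000 events */
#define LOG_WINDOW_END   1020	/* then report the next 20 */

/* Returns 1 if the current event falls inside the logging window. */
static int should_log_event(void)
{
	static int event_count;	/* zero-initialized, persists across calls */

	event_count++;
	return event_count > LOG_WINDOW_START && event_count <= LOG_WINDOW_END;
}

/* Feed n events through the filter and return how many were reported. */
static int simulate_events(int n, unsigned int status)
{
	int logged = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (should_log_event()) {
			printf("int status: %08x\n", status);
			logged++;
		}
	}
	return logged;
}
```

Skipping the first thousand events presumably keeps normal start-up interrupts out of the sample, so the twenty lines that do get printed are more likely to show the status bits of the interrupt that is never acknowledged.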
Re: [PATCH] OpenBSD Networking-related randomization port
On Fri, 28 Jan 2005 21:34:52 +0100 Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> wrote: > Attached the new patch following Arjan's recommendations. No SMP protection on the SBOX, better look into that. The locking you'll likely need to add will make this routine serialize many networking operations which is one thing we've been trying to avoid.
Re: Why does the kernel need a gig of VM?
On Fri, 2005-01-28 at 15:06 -0500, John Richard Moser wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Can someone give me a layout of what exactly is up there? I got the > basic idea > > K 4G > A 3G > A 2G > A 1G > > App has 3G, kernel has 1G at the top of VM on x86 (dunno about x86_64). > > So what's the layout of that top 1G? What's it all used for? Is there > some obscene restriction of 1G of shared memory or something that gets > mapped up there? > > How much does it need, and why? What, if anything, is variable and > likely to do more than 10 or 15 megs of variation? Because of various reasons. Normal kernel space virtual addresses usually start at 0xc0000000, which is where the 3GiB userspace restriction comes from. Then there is the vmalloc virtual address space, which usually starts at a higher address than a normal kernel address. Along the same lines are ioremap addresses, etc. Poke around in the header files. I bet you'll find lots of reasons. josh
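As a rough illustration of the arithmetic behind the reply above: with the conventional i386 3G/1G split, kernel virtual addresses start at 0xC0000000, so user space gets everything below that address and the kernel gets the rest of the 32-bit range. The sketch below only does that arithmetic. PAGE_OFFSET_3G, ADDRESS_SPACE_32BIT and the two helper functions are hypothetical names chosen for the example, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define ADDRESS_SPACE_32BIT (1ULL << 32)	/* 4 GiB of virtual addresses */
#define PAGE_OFFSET_3G      0xC0000000ULL	/* first kernel address, 3G/1G split */
#define GIB                 (1ULL << 30)

/* Bytes of virtual address space available to user space. */
static uint64_t user_space_bytes(uint64_t page_offset)
{
	assert(page_offset <= ADDRESS_SPACE_32BIT);
	return page_offset;
}

/* Bytes left for the kernel: direct mapping, vmalloc, ioremap and so on. */
static uint64_t kernel_space_bytes(uint64_t page_offset)
{
	assert(page_offset <= ADDRESS_SPACE_32BIT);
	return ADDRESS_SPACE_32BIT - page_offset;
}
```

Everything the kernel maps (the direct mapping of physical memory, the vmalloc area, ioremap ranges) has to fit in whatever kernel_space_bytes returns, which is why moving the boundary trades user address space against kernel address space one for one.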
Re: [PATCH] OpenBSD Networking-related randomization port
On Fri, 2005-01-28 at 21:34 +0100, Lorenzo Hernández García-Hierro wrote: > Hi, > > Attached the new patch following Arjan's recommendations. > I'm sorry about not making it "inlined", but my mail agent messes up the > diffs if I do so. > Still waiting for the OSDL STP tests results, they will take a while to > finish. > > Cheers, lots better already! Some more comments (now that the patch got a lot easier to read :) static inline __u32 tcp_v4_init_sequence(struct sock *sk, struct sk_buff *skb) { - return secure_tcp_sequence_number(skb->nh.iph->daddr, - skb->nh.iph->saddr, - skb->h.th->dest, - skb->h.th->source); + + return ip_randomisn(); } is there a reason for the weird indentation? + if (!tp->write_seq) { + tp->write_seq = ip_randomisn(); + } spare { } pair that's not needed, also looks like one tab too many as for obsd_get_random_long().. would it be possible to use the get_random_int() function from the patches I posted the other day? They use the existing random.c infrastructure instead of making a copy... I still don't understand why you need a obsd_rand.c and can't use the normal random.c static inline u32 xprt_alloc_xid(struct rpc_xprt *xprt) { - return xprt->xid++; + /* Return randomized xprt->xid instead of prt->xid++ */ + return (u32) obsd_get_random_long(); + } that cast looks quite redundant...
page fault scalability patch V16 [2/4]: mm counter macros
This patch extracts all the interesting pieces for handling rss and anon_rss into definitions in include/linux/sched.h. All rss operations are performed through the following three macros: get_mm_counter(mm, member) -> Obtain the value of a counter set_mm_counter(mm, member, value) -> Set the value of a counter update_mm_counter(mm, member, value) -> Add a value to a counter The simple definitions provided in this patch should result in no change to the generated code. With this patch it becomes easier to add new counters and it is possible to redefine the method of counter handling (f.e. the page fault scalability patches may want to use atomic operations or split rss). Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.10/include/linux/sched.h === --- linux-2.6.10.orig/include/linux/sched.h 2005-01-28 11:01:51.0 -0800 +++ linux-2.6.10/include/linux/sched.h 2005-01-28 11:02:00.0 -0800 @@ -203,6 +203,10 @@ arch_get_unmapped_area_topdown(struct fi extern void arch_unmap_area(struct vm_area_struct *area); extern void arch_unmap_area_topdown(struct vm_area_struct *area); +#define set_mm_counter(mm, member, value) (mm)->member = (value) +#define get_mm_counter(mm, member) ((mm)->member) +#define update_mm_counter(mm, member, value) (mm)->member += (value) +#define MM_COUNTER_T unsigned long struct mm_struct { struct vm_area_struct * mmap; /* list of VMAs */ @@ -219,7 +223,7 @@ struct mm_struct { atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */ int map_count; /* number of VMAs */ struct rw_semaphore mmap_sem; - spinlock_t page_table_lock; /* Protects page tables, mm->rss, mm->anon_rss */ + spinlock_t page_table_lock; /* Protects page tables and some counters */ struct list_head mmlist; /* List of maybe swapped mm's.
These are globally strung * together off init_mm.mmlist, and are protected @@ -229,9 +233,13 @@ struct mm_struct { unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack; unsigned long arg_start, arg_end, env_start, env_end; - unsigned long rss, anon_rss, total_vm, locked_vm, shared_vm; + unsigned long total_vm, locked_vm, shared_vm; unsigned long exec_vm, stack_vm, reserved_vm, def_flags, nr_ptes; + /* Special counters protected by the page_table_lock */ + MM_COUNTER_T rss; + MM_COUNTER_T anon_rss; + unsigned long saved_auxv[42]; /* for /proc/PID/auxv */ unsigned dumpable:1; Index: linux-2.6.10/mm/memory.c === --- linux-2.6.10.orig/mm/memory.c 2005-01-28 11:01:58.0 -0800 +++ linux-2.6.10/mm/memory.c 2005-01-28 11:02:00.0 -0800 @@ -324,9 +324,9 @@ copy_one_pte(struct mm_struct *dst_mm, pte = pte_mkclean(pte); pte = pte_mkold(pte); get_page(page); - dst_mm->rss++; + update_mm_counter(dst_mm, rss, 1); if (PageAnon(page)) - dst_mm->anon_rss++; + update_mm_counter(dst_mm, anon_rss, 1); set_pte(dst_pte, pte); page_dup_rmap(page); } @@ -528,7 +528,7 @@ static void zap_pte_range(struct mmu_gat if (pte_dirty(pte)) set_page_dirty(page); if (PageAnon(page)) - tlb->mm->anon_rss--; + update_mm_counter(tlb->mm, anon_rss, -1); else if (pte_young(pte)) mark_page_accessed(page); tlb->freed++; @@ -1345,13 +1345,14 @@ static int do_wp_page(struct mm_struct * spin_lock(&mm->page_table_lock); page_table = pte_offset_map(pmd, address); if (likely(pte_same(*page_table, pte))) { - if (PageAnon(old_page)) - mm->anon_rss--; + if (PageAnon(old_page)) + update_mm_counter(mm, anon_rss, -1); if (PageReserved(old_page)) { - ++mm->rss; + update_mm_counter(mm, rss, 1); acct_update_integrals(); update_mem_hiwater(); } else + page_remove_rmap(old_page); break_cow(vma, new_page, address, page_table); lru_cache_add_active(new_page); @@ -1755,7 +1756,7 @@ static int do_swap_page(struct mm_struct if (vm_swap_full()) remove_exclusive_swap_page(page); - 
mm->rss++;
+ update_mm_counter(mm, rss, 1); acct_update_integrals();
page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault
The page fault handler attempts to use the page_table_lock only for short time periods. It repeatedly drops and reacquires the lock. When the lock is reacquired, checks are made to see whether the underlying pte has changed before replacing the pte value. These locations are a good fit for the use of ptep_cmpxchg. The following patch removes the first acquisition of the page_table_lock and uses atomic operations on the page table instead. A section using atomic pte operations is begun with page_table_atomic_start(struct mm_struct *) and ends with page_table_atomic_stop(struct mm_struct *). Both of these become spin_lock(page_table_lock) and spin_unlock(page_table_lock) if atomic page table operations are not configured (CONFIG_ATOMIC_TABLE_OPS undefined). Atomic operations with ptep_xchg and ptep_cmpxchg only work for the lowest layer of the page table. Higher layers may also be populated in an atomic way by defining pmd_test_and_populate() etc. The generic versions of these functions fall back to the page_table_lock (populating higher level page table entries is rare and therefore this is not likely to be performance critical). For ia64 the definitions for higher level atomic operations are included and these may easily be added for other architectures. This patch depends on the ptep_cmpxchg patch being applied first and will only remove the first use of the page_table_lock in the page fault handler. This will allow the following page table operations without acquiring the page_table_lock: 1. Updating of access bits (handle_mm_fault) 2. Anonymous read faults (do_anonymous_page) The page_table_lock is still acquired for creating a new pte for an anonymous write fault and therefore the problems with rss that were addressed by splitting rss into the task structure do not yet occur.
The patch also adds some diagnostic features by counting the number of cmpxchg failures (useful for verification if this patch works right) and the number of faults received that led to no change in the page table. These statistics may be viewed via /proc/meminfo Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.10/mm/memory.c === --- linux-2.6.10.orig/mm/memory.c 2005-01-27 16:27:59.0 -0800 +++ linux-2.6.10/mm/memory.c 2005-01-27 16:28:54.0 -0800 @@ -36,6 +36,8 @@ * ([EMAIL PROTECTED]) * * Aug/Sep 2004 Changed to four level page tables (Andi Kleen) + * Jan 2005 Scalability improvement by reducing the use and the length of time + * the page table lock is held (Christoph Lameter) */ #include @@ -1285,8 +1287,8 @@ static inline void break_cow(struct vm_a * change only once the write actually happens. This avoids a few races, * and potentially makes it more efficient. * - * We hold the mm semaphore and the page_table_lock on entry and exit - * with the page_table_lock released. + * We hold the mm semaphore and have started atomic pte operations, + * exit with pte ops completed. */ static int do_wp_page(struct mm_struct *mm, struct vm_area_struct * vma, unsigned long address, pte_t *page_table, pmd_t *pmd, pte_t pte) @@ -1304,7 +1306,7 @@ static int do_wp_page(struct mm_struct * pte_unmap(page_table); printk(KERN_ERR "do_wp_page: bogus page at address %08lx\n", address); - spin_unlock(&mm->page_table_lock); + page_table_atomic_stop(mm); return VM_FAULT_OOM; } old_page = pfn_to_page(pfn); @@ -1316,21 +1318,27 @@ static int do_wp_page(struct mm_struct * flush_cache_page(vma, address); entry = maybe_mkwrite(pte_mkyoung(pte_mkdirty(pte)), vma); - ptep_set_access_flags(vma, address, page_table, entry, 1); - update_mmu_cache(vma, address, entry); + /* + * If the bits are not updated then another fault + * will be generated with another chance of updating.
+ */ + if (ptep_cmpxchg(page_table, pte, entry)) + update_mmu_cache(vma, address, entry); + else + inc_page_state(cmpxchg_fail_flag_reuse); pte_unmap(page_table); - spin_unlock(&mm->page_table_lock); + page_table_atomic_stop(mm); return VM_FAULT_MINOR; } } pte_unmap(page_table); + page_table_atomic_stop(mm); /* * Ok, we need to copy. Oh, well.. */ if (!PageReserved(old_page)) page_cache_get(old_page); - spin_unlock(&mm->page_table_lock); if (unlikely(anon_vma_prepare(vma)))
Re: [PATCH] OpenBSD Networking-related randomization port
Hi, Attached the new patch following Arjan's recommendations. I'm sorry about not making it "inlined", but my mail agent messes up the diffs if I do so. Still waiting for the OSDL STP tests results, they will take a while to finish. Cheers, -- Lorenzo Hernández García-Hierro <[EMAIL PROTECTED]> [1024D/6F2B2DEC] & [2048g/9AE91A22][http://tuxedo-es.org] diff -Nur linux-2.6.11-rc2/include/linux/random.h linux-2.6.11-rc2.tx1/include/linux/random.h --- linux-2.6.11-rc2/include/linux/random.h 2005-01-26 19:54:17.0 +0100 +++ linux-2.6.11-rc2.tx1/include/linux/random.h 2005-01-28 19:45:31.359923392 +0100 @@ -42,6 +42,12 @@ #ifdef __KERNEL__ +/* OpenBSD Networking-related randomization functions - [EMAIL PROTECTED] */ +extern unsigned long obsd_get_random_long(void); +extern __u16 ip_randomid(void); +extern __u32 ip_randomisn(void); + + extern void rand_initialize_irq(int irq); extern void add_input_randomness(unsigned int type, unsigned int code, diff -Nur linux-2.6.11-rc2/net/ipv4/tcp_ipv4.c linux-2.6.11-rc2.tx1/net/ipv4/tcp_ipv4.c --- linux-2.6.11-rc2/net/ipv4/tcp_ipv4.c 2005-01-26 19:54:19.0 +0100 +++ linux-2.6.11-rc2.tx1/net/ipv4/tcp_ipv4.c 2005-01-28 19:39:48.0 +0100 @@ -539,10 +539,8 @@ static inline __u32 tcp_v4_init_sequence(struct sock *sk, struct sk_buff *skb) { - return secure_tcp_sequence_number(skb->nh.iph->daddr, - skb->nh.iph->saddr, - skb->h.th->dest, - skb->h.th->source); + + return ip_randomisn(); } /* called with local bh disabled */ @@ -833,14 +831,11 @@ tcp_v4_setup_caps(sk, &rt->u.dst); tp->ext2_header_len = rt->u.dst.header_len; - if (!tp->write_seq) - tp->write_seq = secure_tcp_sequence_number(inet->saddr, - inet->daddr, - inet->sport, - usin->sin_port); - - inet->id = tp->write_seq ^ jiffies; - + if (!tp->write_seq) { + tp->write_seq = ip_randomisn(); + } + + inet->id = htons(ip_randomid()); err = tcp_connect(sk); rt = NULL; if (err) @@ -1579,8 +1574,8 @@ if (newinet->opt) newtp->ext_header_len = newinet->opt->optlen; newtp->ext2_header_len =
dst->header_len; - newinet->id = newtp->write_seq ^ jiffies; - + newinet->id = htons(ip_randomid()); + tcp_sync_mss(newsk, dst_pmtu(dst)); newtp->advmss = dst_metric(dst, RTAX_ADVMSS); tcp_initialize_rcv_mss(newsk); diff -Nur linux-2.6.11-rc2/net/Makefile linux-2.6.11-rc2.tx1/net/Makefile --- linux-2.6.11-rc2/net/Makefile 2005-01-26 19:50:49.0 +0100 +++ linux-2.6.11-rc2.tx1/net/Makefile 2005-01-28 21:01:21.870140688 +0100 @@ -11,6 +11,7 @@ tmp-$(CONFIG_COMPAT) := compat.o obj-$(CONFIG_NET) += $(tmp-y) +obj-y+= obsd_rand.o # LLC has to be linked before the files in net/802/ obj-$(CONFIG_LLC) += llc/ diff -Nur linux-2.6.11-rc2/net/obsd_rand.c linux-2.6.11-rc2.tx1/net/obsd_rand.c --- linux-2.6.11-rc2/net/obsd_rand.c 1970-01-01 01:00:00.0 +0100 +++ linux-2.6.11-rc2.tx1/net/obsd_rand.c 2005-01-28 17:43:50.0 +0100 @@ -0,0 +1,269 @@ +/* $Id: openbsd-netrand-2.6.11-rc2.patch,v 1.5 2005/01/28 20:16:21 lorenzo Exp $ + * Copyright (c) 2005 Lorenzo Hernandez Garcia-Hierro <[EMAIL PROTECTED]>. + * All rights reserved. + * + * Added some macros and stolen code from random.c, for individual and less + * "invasive" implementation.Also removed the get_random_long() macro definition, + * which is not good if we can simply call back obsd_get_random_long(). + * + * Copyright (c) 1996, 1997, 2000-2002 Michael Shalayeff. + * + * Version 1.90, last modified 28-Jan-05 + * + * Copyright Theodore Ts'o, 1994, 1995, 1996, 1997, 1998, 1999. + * All rights reserved. + * + * Copyright 1998 Niels Provos <[EMAIL PROTECTED]> + * All rights reserved. + * Theo de Raadt <[EMAIL PROTECTED]> came up with the idea of using + * such a mathematical system to generate more random (yet non-repeating) + * ids to solve the resolver/named problem. But Niels designed the + * actual system based on the constraints. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer, + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
page fault scalability patch V16 [4/4]: Drop page_table_lock in do_anonymous_page
Do not use the page_table_lock in do_anonymous_page. This will significantly increase the parallelism in the page fault handler in SMP systems. The patch also modifies the definitions of _mm_counter functions so that rss and anon_rss become atomic. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.10/mm/memory.c === --- linux-2.6.10.orig/mm/memory.c 2005-01-27 16:39:24.0 -0800 +++ linux-2.6.10/mm/memory.c 2005-01-27 16:39:24.0 -0800 @@ -1839,12 +1839,12 @@ do_anonymous_page(struct mm_struct *mm, vma->vm_page_prot)), vma); - spin_lock(&mm->page_table_lock); + page_table_atomic_start(mm); if (!ptep_cmpxchg(page_table, orig_entry, entry)) { pte_unmap(page_table); page_cache_release(page); - spin_unlock(&mm->page_table_lock); + page_table_atomic_stop(mm); inc_page_state(cmpxchg_fail_anon_write); return VM_FAULT_MINOR; } @@ -1862,7 +1862,7 @@ do_anonymous_page(struct mm_struct *mm, update_mmu_cache(vma, addr, entry); pte_unmap(page_table); - spin_unlock(&mm->page_table_lock); + page_table_atomic_stop(mm); return VM_FAULT_MINOR; } Index: linux-2.6.10/include/linux/sched.h === --- linux-2.6.10.orig/include/linux/sched.h 2005-01-27 16:39:24.0 -0800 +++ linux-2.6.10/include/linux/sched.h 2005-01-27 16:40:24.0 -0800 @@ -203,10 +203,26 @@ arch_get_unmapped_area_topdown(struct fi extern void arch_unmap_area(struct vm_area_struct *area); extern void arch_unmap_area_topdown(struct vm_area_struct *area); +#ifdef CONFIG_ATOMIC_TABLE_OPS +/* + * Atomic page table operations require that the counters are also + * incremented atomically + */ +#define set_mm_counter(mm, member, value) atomic_set(&(mm)->member, value) +#define get_mm_counter(mm, member) ((unsigned long)atomic_read(&(mm)->member)) +#define update_mm_counter(mm, member, value) atomic_add(value, &(mm)->member) +#define MM_COUNTER_T atomic_t + +#else +/* + * No atomic page table operations.
Counters are protected by + * the page table lock + */ #define set_mm_counter(mm, member, value) (mm)->member = (value) #define get_mm_counter(mm, member) ((mm)->member) #define update_mm_counter(mm, member, value) (mm)->member += (value) #define MM_COUNTER_T unsigned long +#endif struct mm_struct { struct vm_area_struct * mmap; /* list of VMAs */
page fault scalability patch V16 [0/4]: redesign overview
Changes from V15->V16 of this patch: Complete Redesign. An introduction to what this patch does and a patch archive can be found on http://oss.sgi.com/projects/page_fault_performance. The archive also has a combined patch. The basic approach in this patchset is the same as used in SGI's 2.4.X based kernels which have been in production use in ProPack 3 for a long time. The patchset is composed of 4 patches (and was tested against 2.6.11-rc2-bk6 on ia64, i386 and x86_64): 1/4: ptep_cmpxchg and ptep_xchg to avoid intermittent zeroing of ptes The current way of synchronizing with the CPU or arch specific interrupts updating page table entries is to first set a pte to zero before writing a new value. This patch uses ptep_xchg and ptep_cmpxchg to avoid writing the zero for certain configurations. The patch introduces CONFIG_ATOMIC_TABLE_OPS that may be enabled as an experimental feature during kernel configuration if the hardware is able to support atomic operations and if an SMP kernel is being configured. A Kconfig update for i386, x86_64 and ia64 has been provided. On i386 this option is restricted to CPUs better than a 486 and non PAE mode (that way all the cmpxchg issues on old i386 CPUs and the problems with 64bit atomic operations on recent i386 CPUs are avoided). If CONFIG_ATOMIC_TABLE_OPS is not set then ptep_xchg and ptep_cmpxchg are realized by falling back to clearing a pte before updating it. The patch does not change the use of mm->page_table_lock and the only performance improvement is the replacement of xchg-with-zero-and-then-write-new-pte-value with an xchg with the new value for SMP on some architectures if CONFIG_ATOMIC_TABLE_OPS is configured. It should not do anything major to VM operations. 2/4: Macros for mm counter manipulation There are various approaches to handling mm counters if the page_table_lock is no longer acquired.
This patch defines macros in include/linux/sched.h to handle these counters and makes sure that these macros are used throughout the kernel to access and manipulate rss and anon_rss. There should be no change to the generated code as a result of this patch. 3/4: Drop the first use of the page_table_lock in handle_mm_fault The patch introduces two new functions: page_table_atomic_start(mm), page_table_atomic_stop(mm) that fall back to the use of the page_table_lock if CONFIG_ATOMIC_TABLE_OPS is not defined. If CONFIG_ATOMIC_TABLE_OPS is defined those functions may be used to prep the CPU for atomic table ops (i386 in PAE mode may f.e. get the MMX register ready for 64bit atomic ops) but are simply empty by default. Two operations may then be performed on the page table without acquiring the page table lock: a) updating access bits in pte b) anonymous read faults installing a mapping to the zero page. All counters are still protected with the page_table_lock thus avoiding any issues there. Some additional statistics are added to /proc/meminfo, including a count of spurious faults with no effect. There is a surprisingly high number of those on ia64 (used to populate the cpu caches with the pte??) 4/4: Drop the use of the page_table_lock in do_anonymous_page The second acquisition of the page_table_lock is removed from do_anonymous_page, which allows the anonymous write fault to proceed without the page_table_lock. The macros for manipulating rss and anon_rss in include/linux/sched.h are changed if CONFIG_ATOMIC_TABLE_OPS is set to use atomic operations for rss and anon_rss (safest solution for now, other solutions may easily be implemented by changing those macros). This patch typically yields significant increases in page fault performance for threaded applications on SMP systems. I have an additional patch that drops the page_table_lock for COW but that raises a lot of other issues. I will post that patch separately and only to linux-mm.