[RFC] kernel/pid.c pid allocation weirdness
Hi.

I'm looking at how alloc_pid() works and can't understand one
(simple/stupid) thing.

It first kmem_cache_alloc()-s a struct pid, then calls alloc_pidmap(),
and at the end it takes the global pidmap_lock to add the new pid to
the hash.

The question is: why does alloc_pidmap() use at least two atomic ops
and potentially loop to find a zero bit in the pidmap? Why not call
alloc_pidmap() under pidmap_lock and find a free pid in the pidmap
without any loops or atomics?

The same goes for free_pid(). Am I missing something?

Thanks,
Pavel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
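To make the question concrete, here is a minimal userspace sketch of the two strategies being compared. It is not the kernel's code: NR_PIDS, the two maps, the toy spinlock and both function names are hypothetical, and C11 atomics stand in for the kernel's bit operations. The first variant claims a bit with an atomic read-modify-write and then updates a free counter (two atomic ops, plus a retry if the bit was lost to a racer); the second takes one lock and scans a plain bitmap.

```c
#include <limits.h>
#include <stdatomic.h>

#define NR_PIDS   128   /* hypothetical, tiny pid space */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS  ((NR_PIDS + WORD_BITS - 1) / WORD_BITS)

/* Variant 1: lock-free, in the spirit of what the mail describes. */
static atomic_ulong atomic_map[NR_WORDS];
static atomic_int nr_free = NR_PIDS;

int alloc_bit_atomic(void)
{
    for (int nr = 0; nr < NR_PIDS; nr++) {
        atomic_ulong *word = &atomic_map[nr / WORD_BITS];
        unsigned long mask = 1UL << (nr % WORD_BITS);

        if (atomic_load(word) & mask)
            continue;                                 /* looks taken, skip */
        if (!(atomic_fetch_or(word, mask) & mask)) {  /* atomic op 1 */
            atomic_fetch_sub(&nr_free, 1);            /* atomic op 2 */
            return nr;
        }
        /* Lost a race for this bit: keep looping. */
    }
    return -1;
}

/* Variant 2: plain bitmap, every access made under one lock. */
static unsigned long locked_map[NR_WORDS];
static int locked_free = NR_PIDS;
static atomic_flag map_lock = ATOMIC_FLAG_INIT;       /* toy spinlock */

int alloc_bit_locked(void)
{
    int found = -1;

    while (atomic_flag_test_and_set_explicit(&map_lock, memory_order_acquire))
        ;                                   /* spin until we own the lock */
    for (int nr = 0; nr < NR_PIDS; nr++) {
        unsigned long mask = 1UL << (nr % WORD_BITS);

        if (!(locked_map[nr / WORD_BITS] & mask)) {
            locked_map[nr / WORD_BITS] |= mask;  /* plain store, no atomics */
            locked_free--;
            found = nr;
            break;
        }
    }
    atomic_flag_clear_explicit(&map_lock, memory_order_release);
    return found;
}
```

In the sketch the locked variant pays for one lock acquisition and nothing else, which is the trade the mail asks about. Whether that wins in the kernel depends on how contended pidmap_lock is, which the sketch does not model.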
[ALSA PATCH] alsa-git merge request
Linus, please pull from the linus branch at:

  master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa.git linus

gitweb interface:

  http://www.kernel.org/git/?p=linux/kernel/git/perex/alsa.git

The GNU patch is available at:

  ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-git-2007-03-14.patch.gz

Additional notes:

  Only fixes / new hardware IDs.

The following files will be updated:

 Documentation/sound/alsa/ALSA-Configuration.txt |    1 +
 include/sound/version.h                         |    2 +-
 sound/pci/ac97/ac97_patch.c                     |   13 ---
 sound/pci/hda/hda_intel.c                       |   13 ++-
 sound/pci/hda/patch_analog.c                    |   41 +-
 sound/pci/hda/patch_realtek.c                   |    1 +
 sound/pci/hda/patch_sigmatel.c                  |    5 +++
 sound/pci/intel8x0.c                            |   10 --
 sound/soc/Kconfig                               |    2 +
 sound/soc/at91/Kconfig                          |    3 +-
 sound/soc/pxa/Kconfig                           |    3 +-
 11 files changed, 76 insertions(+), 18 deletions(-)

The following things were done:

Jaroslav Kysela (1):
      [ALSA] version 1.0.14rc3

Randy Cushman (1):
      [ALSA] ac97 - fix AD shared shared jack control logic

Takashi Iwai (5):
      [ALSA] soc - Fix dependencies in Kconfig files
      [ALSA] hda-intel - Fix codec probe with ATI contorllers
      [ALSA] hda-codec - Fix speaker output on MacPro
      [ALSA] intel8x0 - Fix Oops at kdump crash kernel
      [ALSA] hda-codec - Add model for HP Compaq d5700

Tobin Davis (2):
      [ALSA] hda-codec - Add suppoprt for Asus M2N-SLI motherboard
      [ALSA] hda-codec - more systems for Analog Devices

Tommi Kyntola (1):
      [ALSA] intel8x0 - Fix speaker output after S2RAM
Re: [RFC][PATCH 1/7] Resource counters
Srivatsa Vaddagiri wrote:
> On Tue, Mar 13, 2007 at 06:41:05PM +0300, Pavel Emelianov wrote:
>>> right, but atomic ops have much less impact on most
>>> architectures than locks :)
>> Right. But atomic_add_unless() is slower as it is
>> essentially a loop. See my previous letter in this sub-thread.
>
> If I am not mistaken, you shouldn't loop in normal cases, which means
> it boils down to an atomic_read() + atomic_cmpxchg()
>
>

So does the lock: in the normal case (when it's not heavily contended)
it will boil down to atomic_dec_and_test().

Nevertheless, charging the way this patchset does requires two atomic
ops with atomic_xxx and only one with spin_lock().
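The "essentially a loop" remark can be illustrated with a small userspace sketch. This is not the kernel's implementation: add_unless() is a hypothetical name and C11 atomics stand in for atomic_read() and atomic_cmpxchg(). In the uncontended case the loop body runs once, i.e. one read plus one compare-and-exchange, which is the point made in the quoted text.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Add `a` to *v unless *v equals `u`; returns true if the add happened. */
bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);                 /* the atomic_read() step */

    while (old != u) {
        /*
         * The cmpxchg step: succeeds only if *v still holds `old`.
         * On failure `old` is refreshed with the current value and
         * the loop retries, which is where contention costs extra.
         */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;
}
```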
Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Balbir Singh wrote:
> Nick Piggin wrote:
>> And strangely, this example does not go outside the parameters of
>> what you asked for AFAIKS. In the worst case of one container getting
>> _all_ the shared pages, they will still remain inside their maximum
>> rss limit.
>
> When that does happen and if a container hits its limit, with a LRU
> per-container, if the container is not actually using those pages,
> they'll get thrown out of that container and get mapped into the
> container that is using those pages most frequently.

Exactly. Statistically, first touch will work OK. It may mean some
reclaim inefficiencies in corner cases, but things will tend to
even out.

>> So they might get penalised a bit on reclaim, but maximum rss limits
>> will work fine, and you can (almost) guarantee X amount of memory for
>> a given container, and it will _work_.
>>
>> But I also take back my comments about this being the only design I
>> have seen that gets everything, because the node-per-container idea
>> is a really good one on the surface. And it could mean even less
>> impact on the core VM than this patch. That is also a first-touch
>> scheme.
>
> With the proposed node-per-container, we will need to make massive
> core VM changes to reorganize zones and nodes. We would want to allow
>
> 1. For sharing of nodes
> 2. Resizing nodes
> 3. Maybe more

But a lot of that is happening anyway for other reasons (eg. memory
plug/unplug). And I don't consider node/zone setup to be part of the
"core VM" as such... it is _good_ if we can move extra work into setup
rather than have it in the mm.

That said, I don't think this patch is terribly intrusive either.

> With the node-per-container idea, it will be hard to control page
> cache limits, independent of RSS limits or mlock limits.
>
> NOTE: page cache == unmapped page cache here.

I don't know that it would be particularly harder than any other
first-touch scheme. If one container ends up being charged with too
much pagecache, eventually they'll reclaim a bit of it and the pages
will get charged to more frequent users.

>>> However the messed up accounting that doesn't handle sharing between
>>> groups of processes properly really bugs me. Especially when we have
>>> the infrastructure to do it right.
>>>
>>> Does that make more sense?
>>
>> I think it is simplistic.
>>
>> Sure you could probably use some of the rmap stuff to account shared
>> mapped _user_ pages once for each container that touches them. And
>> this patchset isn't preventing that.
>>
>> But how do you account kernel allocations? How do you account
>> unmapped pagecache?
>>
>> What's the big deal so many accounting people have with just RSS? I'm
>> not a container person, this is an honest question. Because from my
>> POV if you conveniently ignore everything else... you may as well
>> just not do any accounting at all.
>
> We decided to implement accounting and control in phases
>
> 1. RSS control
> 2. unmapped page cache control
> 3. mlock control
> 4. Kernel accounting and limits
>
> This has several advantages
>
> 1. The limits can be individually set and controlled.
> 2. The code is broken down into simpler chunks for review and merging.

But this patch gives the groundwork to handle 1-4, and it is in a small
chunk, and one would be able to apply different limits to different
types of pages with it. Just using rmap to handle 1 does not really
seem like a viable alternative because it fundamentally isn't going to
handle 2 or 4.

I'm not saying that you couldn't _later_ add something that uses rmap
or our current RSS accounting to tweak container-RSS semantics. But
isn't it sensible to lay the groundwork first? Get a clear path to
something that is good (not perfect), but *works*?

--
SUSE Labs, Novell Inc.
Re: [5/6] 2.6.21-rc3: known regressions
On Tuesday 13 of March 2007, Adrian Bunk wrote:
> Subject    : ThinkPad Z60m: usb mouse stops working after suspend to ram
> References : http://lkml.org/lkml/2007/2/21/413
>              http://lkml.org/lkml/2007/2/28/172
> Submitter  : Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
> Caused-By  : Konstantin Karasyov <[EMAIL PROTECTED]>
>              commit 0a6139027f3986162233adc17285151e78b39cac
> Handled-By : Konstantin Karasyov <[EMAIL PROTECTED]>
> Status     : problem is being debugged

It's fixed in git tree. Commit ff24ba74b6d3befbfbafa142582211b5a6095d45

--
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
On Tue, 2007-03-13 at 21:39 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > This is called "pissing in the corners". Don't do it: we don't need to
> > touch that code and I actually prefer the original anyway (explicit is
> > *good*).
> >
> > The habit of extracting cpu number once then using it is an optimization
> > which we should be aiming to get rid of (it simply hurts archs with
> > efficient per-cpu implementations).
>
> No, that was for a reason. I was worried about smp_processor_id() not
> returning valid values between init_gdt and cpu_set_gdt. It's not
> actually a problem, but relying on smp_processor_id() while we're moving
> the foundations it's based on seems fragile.

smp_processor_id() always works, so it's fundamental, not fragile.

However, we *should* remove the arg from cpu_set_gdt, since we have such
faith in smp_processor_id() 8)

Cheers,
Rusty.
Re: Stolen and degraded time and schedulers
Daniel Walker wrote:
> The adjustments that I spoke of above are working regardless of ntp ..
> The stability of the TSC directly affects the clock mult adjustments in
> timekeeping, as does interrupt latency since the clock is essentially
> validated against the timer interrupt.
>

Yep. But the tsc is just an example of a clocksource, and doesn't have
any real bearing on what I'm saying.

> like I said there are other factors so that's not going to exactly model
> cpu speed changes. You could come up with another method, but that would
> likely require another known constant clock.
>

Well, it doesn't need to be a constant clock if it's modelling a
changing rate. And it doesn't need to be an exact model; it just needs
to be better than the current situation.

> sched_clock doesn't measure amounts of cpu work either, it's all about
> timing.
>

Specifically, how much cpu time a process has used. But if the CPU is
running at half speed (or 50% duty cycle), then claiming that the
process got the full amount of time is just an error.

>> Well, lots of cpus have dynamic frequencies. Any scheduler which
>> maintains history will suffer the same problem, even on UP. If
>> processes A and B are supposed to have the same priority and they both
>> execute for 1ms of real time, did they make the same amount of
>> progress? Not if the cpu changed speed in between.
>>
>
> That's true, but given a constant clock (like what sched_clock should
> have) then the accounting is similarly inaccurate. Any connection
> between the scheduler and the TSC frequency changes aren't part of the
> design AFAIK ..
>

Well, my whole argument is that sched_clock /should not/ be a constant
clock. And I'm not quite sure why you keep bringing up the tsc, because
it has no relevance.

    J
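The "half speed" argument can be put as arithmetic. The sketch below is only an illustration of the idea of a clock that tracks work rather than wall time; work_ns() and its parameters are hypothetical names, and nothing here is taken from an actual sched_clock implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper: credit a task with the wall-clock time it ran,
 * scaled by the fraction of full speed the CPU was running at.
 * Assumes elapsed_ns * cur_khz fits in 64 bits, which holds for
 * scheduler-tick-sized intervals.
 */
uint64_t work_ns(uint64_t elapsed_ns, uint32_t cur_khz, uint32_t max_khz)
{
    return elapsed_ns * cur_khz / max_khz;
}
```

With this model, process A running 1ms at 2 GHz on a 2 GHz part is credited 1,000,000 ns, while process B running the same 1ms at 1 GHz is credited 500,000 ns, which is the A/B asymmetry described in the quoted paragraph.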
Re: [PATCH] Introduce load_TLS to the "for" loop.
On Tue, 2007-03-13 at 21:55 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 10:31:27AM -0700, Jeremy Fitzhardinge wrote:
> > Andi Kleen wrote:
> > > On Tue, Mar 13, 2007 at 05:39:36PM +1100, Rusty Russell wrote:
> > >
> > >> GCC (4.1 at least) unrolls it anyway, but I can't believe this code
> > >>
> > >
> > > Are you sure? Normally it doesn't unroll without -funroll-loops which
> > > the kernel does normally not set. Especially not with -Os builds.
> > >
> >
> > Does it matter either way in this case?
>
> It's in the middle of the context switch.

Well, the rest of __switch_to isn't "0PTIM1Z3D!!!" like this. But even
so, that's no excuse for crap code. If it had used memcpy, we wouldn't
be wasting cycles on this discussion.

Rusty.
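For reference, the two shapes being argued about look roughly like the sketch below. This is a userspace illustration, not the kernel's load_TLS(): the function names are hypothetical, descriptors are modelled as plain 64-bit words, and TLS_ENTRIES = 3 and TLS_MIN = 12 are assumed values chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

#define TLS_ENTRIES 3    /* assumed: three TLS slots per thread */
#define TLS_MIN     12   /* assumed index of the first TLS slot in the GDT */

/* The "for" loop form: one descriptor copied per iteration. */
void load_tls_loop(uint64_t *gdt, const uint64_t *tls)
{
    for (int i = 0; i < TLS_ENTRIES; i++)
        gdt[TLS_MIN + i] = tls[i];
}

/* The memcpy form: same effect, the compiler picks the copy strategy. */
void load_tls_memcpy(uint64_t *gdt, const uint64_t *tls)
{
    memcpy(&gdt[TLS_MIN], tls, TLS_ENTRIES * sizeof(*tls));
}
```

Both functions leave the GDT in the same state; the disagreement in the thread is about readability and about what the compiler emits, not about behaviour.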
Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Nick Piggin wrote:
Eric W. Biederman wrote:
Nick Piggin <[EMAIL PROTECTED]> writes:
Eric W. Biederman wrote:

First touch page ownership does not give me anything useful for knowing if I can run my application or not. Because of page sharing my application might run inside the rss limit only because I got lucky and happened to share a lot of pages with another running application. If the next time I run, it isn't running, my application will fail. That is ridiculous.

Let's be practical here, what you're asking is basically impossible. Unless by deterministic you mean that it never enters a non trivial syscall, in which case, you just want to know about maximum RSS of the process (which we already account).

Not per process. I want this on a group of processes, and yes, that is all I want. I just want accounting of the maximum RSS of a group of processes and then the mechanism to limit that maximum rss.

Well don't you just sum up the maximum for each process? Or do you want to only count shared pages inside a container once, or something difficult like that?

I don't want sharing between vservers/VE/containers to affect how many pages I can have mapped into my processes at once.

You seem to want total isolation. You could use virtualization?

No. I don't want the meaning of my rss limit to be affected by what other processes are doing. We have constraints of how many resources the box actually has. But I don't want accounting so sloppy that processes outside my group of processes can artificially lower my rss value, which magically raises my rss limit.

So what are you going to do about all the shared caches and slabs inside the kernel?

It is basically handwaving anyway. The only approach I've seen with a sane (not perfect, but good) way of accounting memory use is this one. If you care to define "proper", then we could discuss that.

I will agree that this patchset is probably in the right general ballpark. But the fact that pages are assigned exactly one owner is pure nonsense. We can do better. That is all I am asking for: someone to at least attempt to actually account for the rss of a group of processes and get the numbers right when we have shared pages, between different groups of processes. We have the data structures to support this with rmap.

Well rmap only supports mapped, userspace pages.

Let me describe the situation where I think the accounting in the patchset goes totally wonky. Gcc as I recall maps the pages it is compiling with mmap. If in a single kernel tree I do:

make -jN O=../compile1 &
make -jN O=../compile2 &

But set it up so that the two compiles are in different rss groups. If I run them concurrently they will use the same files at the same time and most likely because of the first touch rss limit rule even if I have a draconian rss limit the compiles will both be able to complete and finish. However if I run either of them alone, if I use the most draconian rss limit I can that allows both compiles to finish, I won't be able to compile a single kernel tree.

Yeah it is not perfect. Fortunately, there is no perfect solution, so we don't have to be too upset about that. And strangely, this example does not go outside the parameters of what you asked for AFAIKS. In the worst case of one container getting _all_ the shared pages, they will still remain inside their maximum rss limit.

When that does happen and if a container hits its limit, with a LRU per-container, if the container is not actually using those pages, they'll get thrown out of that container and get mapped into the container that is using those pages most frequently.

So they might get penalised a bit on reclaim, but maximum rss limits will work fine, and you can (almost) guarantee X amount of memory for a given container, and it will _work_.

But I also take back my comments about this being the only design I have seen that gets everything, because the node-per-container idea is a really good one on the surface. And it could mean even less impact on the core VM than this patch. That is also a first-touch scheme.

With the proposed node-per-container, we will need to make massive core VM changes to reorganize zones and nodes. We would want to allow

1. For sharing of nodes
2. Resizing nodes
3. Maybe more

With the node-per-container idea, it will be hard to control page cache limits, independent of RSS limits or mlock limits.

NOTE: page cache == unmapped page cache here.

However the messed up accounting that doesn't handle sharing between groups of processes properly really bugs me. Especially when we have the infrastructure to do it right. Does that make more sense?

I think it is simplistic. Sure you could probably use some of the rmap stuff to account shared mapped _user_ pages once for each container that touches them. And this patchset isn't preventing that. But how do you account kernel allocations? How do you account unmapped pagecache? Wh
Re: [PATCH] Introduce load_TLS to the "for" loop.
On Tue, 2007-03-13 at 14:50 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 05:39:36PM +1100, Rusty Russell wrote:
> > GCC (4.1 at least) unrolls it anyway, but I can't believe this code
>
> Are you sure? Normally it doesn't unroll without -funroll-loops which
> the kernel does normally not set. Especially not with -Os builds.

Yep, checked again:

$ gcc --version
gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
...
...
  gcc -Wp,-MD,arch/x86_64/kernel/.process.o.d -nostdinc
  -isystem /usr/lib/gcc/i486-linux-gnu/4.1.2/include -D__KERNEL__
  -Iinclude -include include/linux/autoconf.h -Wall -Wundef
  -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
  -O2 -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe
  -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables
  -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow
  -maccumulate-outgoing-args -fno-omit-frame-pointer
  -fno-optimize-sibling-calls -g -fno-stack-protector
  -Wdeclaration-after-statement -Wno-pointer-sign
  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(process)"
  -D"KBUILD_MODNAME=KBUILD_STR(process)"
  -c -o arch/x86_64/kernel/process.o arch/x86_64/kernel/process.c
...
$ objdump -Dr arch/x86_64/kernel/process.o | less
...
     6be:   48 8b 94 00 00 00 00    mov    0x0(%rax,%rax,1),%rdx
     6c5:   00
                    6c2: R_X86_64_32S       cpu_gdt_descr+0x2
     6c6:   48 8b 83 98 02 00 00    mov    0x298(%rbx),%rax
     6cd:   48 83 c2 60             add    $0x60,%rdx
     6d1:   48 89 02                mov    %rax,(%rdx)
     6d4:   48 8b 83 a0 02 00 00    mov    0x2a0(%rbx),%rax
     6db:   48 89 42 08             mov    %rax,0x8(%rdx)
     6df:   48 8b 83 a8 02 00 00    mov    0x2a8(%rbx),%rax
     6e6:   48 89 42 10             mov    %rax,0x10(%rdx)

If I turn on CONFIG_OPTIMIZE_FOR_SIZE, it's still unrolled,
interestingly.

Cheers,
Rusty.
Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires
> On Wednesday 14 March 2007, William Lee Irwin III wrote:
> >On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
> >> Now, can someone suggest a patch I can revert that might fix this?
> >> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
> >> building kernels to bisect this till the middle of June at this rate.
> >
> >4 billion patches could be bisected in 34 boots. Between 2.6.20 and
> >2.6.21-rc1 there are only:
> >
> >$ git rev-list --no-merges v2.6.20..v2.6.21-rc1 |wc -l
> >3118
> >
> >patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
> >
> >Of course, this is a little optimistic because it assumes no additional
> >breakage occurring at the various bisection points. In any event,
> >assuming (pessimistically) 10 minutes per build, this is 280 minutes or
> >4 hours and 40 minutes of build time. I estimate the process should
> >complete well before Friday of this week, never mind June.

On Wed, Mar 14, 2007 at 02:09:57AM -0400, Gene Heskett wrote:
> Chuckle, sorry to disappoint you wli, on that 32 cpu Niagara Con was
> calling 'poor equipment', maybe.
> Even using ccache, it's about 15-18 minutes per build, with another 10 to
> edit my build script and construct the kernel tree with the proper
> patches applied. Then a reboot, probably 10 minutes by the time I get
> the nvidia driver installed for the new kernel and get startx'd, then it's
> another 2 hours or a bit less for an amanda run to test it.

2 hours, 48 minutes times 13 boots (see the correction post) is 36 hours,
24 minutes. One attempt a day (24 hours instead of 2 hours, 48 minutes)
yields 2 weeks. So you're still done by April, not June.

-- wli
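The time estimate above is a straight multiplication, and a few lines of C reproduce it. The 168 minutes per attempt (2 hours, 48 minutes) is taken from the message as given; the struct and function names are hypothetical.

```c
struct duration {
    unsigned hours;
    unsigned minutes;
};

/* Total wall time for `boots` bisection attempts of fixed length. */
struct duration bisect_time(unsigned boots, unsigned minutes_per_boot)
{
    unsigned total = boots * minutes_per_boot;
    struct duration d = { total / 60, total % 60 };

    return d;
}
```

For 13 boots at 168 minutes each this gives 2184 minutes, i.e. 36 hours and 24 minutes, matching the figure in the message. At one attempt per day the same 13 boots take 13 days, which is the "2 weeks" estimate.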
Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires
On Wednesday 14 March 2007, William Lee Irwin III wrote:
>On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
>> Now, can someone suggest a patch I can revert that might fix this?
>> The total number of patches between 2.6.20 and 2.6.21-rc1 will have me
>> building kernels to bisect this till the middle of June at this rate.
>
>4 billion patches could be bisected in 34 boots. Between 2.6.20 and
>2.6.21-rc1 there are only:
>
>$ git rev-list --no-merges v2.6.20..v2.6.21-rc1 |wc -l
>3118
>
>patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
>
>Of course, this is a little optimistic because it assumes no additional
>breakage occurring at the various bisection points. In any event,
>assuming (pessimistically) 10 minutes per build, this is 280 minutes or
>4 hours and 40 minutes of build time. I estimate the process should
>complete well before Friday of this week, never mind June.

Chuckle, sorry to disappoint you wli, on that 32 cpu Niagara Con was
calling 'poor equipment', maybe.

Even using ccache, it's about 15-18 minutes per build, with another 10 to
edit my build script and construct the kernel tree with the proper
patches applied. Then a reboot, probably 10 minutes by the time I get
the nvidia driver installed for the new kernel and get startx'd, then
it's another 2 hours or a bit less for an amanda run to test it.

I've posted to the amanda lists too, so they will be aware of it. And
because an ls -lc returns perfectly sane values for the mtimes and sizes,
I suspect the real problem may not necessarily be 100% kernel related.

I have been intermittently ranting because both the tar api stir, and the
return from tar are such a moving target that the developers are having a
hard time staying ahead of the changes to tar; backward compatibility, it
seems, is the furthest thing from the tar maintainers' minds. The most
recent change that I'm aware of is that tar now returns a 1 for success!
What the heck were those guys at gnu.org thinking? Or smoking as the
case may be.

I obviously have a copy of the -rc1 patch in its entirety that I could
peruse, but I'm not sure I would recognize a change that would affect tar
if it bit me, hence the questions here to those who are far more
conversant than I.

As I've said on several occasions William, at my age, the best part I can
play here is the Canary, in the coal mine scene, and something strange in
the air of 2.6.21* just killed me. It's now up to the coroner(s) to
determine the cause, and he has several dozen very very able assistants
monitoring this list in my NSH opinion.

Whatever, either tar, or this particular board in the kernel's
architecture surely needs fixed before 2.6.21 final. I don't even know
if gnu.org has a bugzilla setup, but I'll look around tomorrow night as
I'm tied up now till late tomorrow. If they do, I'll file it.

But, I'm also amazed that no one else has been bitten. Don't any of you
ever make a backup using tar for the pack mule?

Thanks William.

>-- wli

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Do unto others before they undo you.
Re: [3/6] 2.6.21-rc2: known regressions
Hello,

Mathieu Bérard wrote:
> [   15.031823] ata1.00: taskfile_load_raw: (0x1f1-1f7): hex: 10 03 00 00 00 a0 ef

Okay, this is interesting. This is Enable Device-Initiated Interface
Power State Transitions. So, after this command is executed the device
will try to transit to partial/slumber SATA PHY power states at its
discretion, which is all cool and dandy in theory but depending on
controller and drive firmware can cause all sorts of problems.

The NCQ problem you're seeing probably is some side effect of device
initiated link PS. Can't tell whether the controller or the drive's
firmware is the problem without further info.

Due to blacklisting, NCQ won't be turned on for your drive in future
kernels and link PS doesn't seem to cause any problem on non-NCQ, so your
case is taken care of here, but this leaves me a bit worried about what
_GTF feeds us. I don't think we can reliably filter out command TFs as
it might even contain vendor-specific commands, but it might be better to
always log TFs executed for _GTF such that we at least know what's going
on with the drive.

Thanks.

--
tejun
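As a reading aid, the seven logged bytes can be decoded mechanically. The sketch below assumes the bytes are in port order 0x1f1 through 0x1f7 (feature, sector count, LBA low/mid/high, device, command), which is what the log line's "(0x1f1-1f7)" label suggests. The struct, the constant names and enables_dipm() are hypothetical and local to this example; the values are the ATA SET FEATURES opcode (0xef), the "enable SATA feature" subcommand (0x10) and the device-initiated power management feature id (0x03).

```c
#include <stdbool.h>
#include <stdint.h>

/* The seven taskfile registers, in the order they appear in the log. */
struct taskfile {
    uint8_t feature;   /* 0x1f1 */
    uint8_t nsect;     /* 0x1f2 */
    uint8_t lbal;      /* 0x1f3 */
    uint8_t lbam;      /* 0x1f4 */
    uint8_t lbah;      /* 0x1f5 */
    uint8_t device;    /* 0x1f6 */
    uint8_t command;   /* 0x1f7 */
};

/* Local constants for this sketch only. */
#define CMD_SET_FEATURES     0xef
#define SUBCMD_SATA_ENABLE   0x10
#define SATA_FEATURE_DIPM    0x03

/* True if the taskfile asks the drive to enable device-initiated link PM. */
bool enables_dipm(const struct taskfile *tf)
{
    return tf->command == CMD_SET_FEATURES &&
           tf->feature == SUBCMD_SATA_ENABLE &&
           tf->nsect == SATA_FEATURE_DIPM;
}
```

Applied to the logged bytes 10 03 00 00 00 a0 ef, the check returns true, consistent with the reading given in the reply.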
Re: NPTL patch for linux 2.4.28
On Wed, Mar 14, 2007 at 05:49:22AM +0530, Syed Ahemed wrote:
> Hello all.
> I have a tricky problem on hand and a straightforward question.
>
> Tricky problem:
> -
> While debugging a simple multithreaded application using gdb on linux
> 2.4.28, I noticed the thread that has crashed after sigsegv has
> complete information in gdb (both address and function at the
> time of crash). But the other threads that are in wait state
> (executing glibc functions at the time of crash) just have the address
> but not the function name, as shown below.
>
>
> sh-2.05b# ./gdb a.out /mnt/cf/engg_files/core_files/
> a.out.1173437318.core.5312   a.out.1173453940.core.9829
> a.out.1173438125.core.16016  lost+found
> a.out.1173438881.core.18721
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
> library "/lib/libthread_db.so.1".
>
> warning: exec file is newer than core file.
> Core was generated by `./a.out'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libpthread.so.0...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libc.so.6...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /opt/lib/ld-linux.so.2...done.
> Loaded symbols for /opt//lib/ld-linux.so.2
> #0  0x080485df in a (p=0x0) at threadcore.c:34
> 34      threadcore.c: No such file or directory.
>         in threadcore.c
> (gdb) info threads
>   3 process 10993  0x08053840 in ?? ()
>   2 process 1267  0xbf5ff9d0 in ?? ()
> * 1 process 9829  0x080485df in a (p=0x0) at threadcore.c:34
> (gdb) thread 3
> [Switching to thread 3 (process 10993)]#0  0x08053840 in ?? ()
> (gdb) bt
> #0  0x08053840 in ?? ()
> Cannot access memory at address 0x2b
> (gdb)
> #0  0x08053840 in ?? ()
> Cannot access memory at address 0x2b
> (gdb) thread 1
> [Switching to thread 1 (process 9829)]#0  0x080485df in a (p=0x0)
>    at threadcore.c:34
> 34      in threadcore.c
> (gdb) bt
> #0  0x080485df in a (p=0x0) at threadcore.c:34
> #1  0x080485bc in main () at threadcore.c:21
> (gdb) thread 2
> [Switching to thread 2 (process 1267)]#0  0xbf5ff9d0 in ?? ()
> (gdb) bt
> #0  0xbf5ff9d0 in ?? ()
> Cannot access memory at address 0x2b
> (gdb) q
> sh-2.05b#
>
>
> The problem is that with the same glibc and gdb, Redhat 9 linux 2.4.20-8
> does give me complete information for all the threads in the "info
> threads" command.
> Having read about similar problems on various mailing lists, I believe the
> only difference is that redhat 9 has patched its kernel with NPTL or
> debugging support for linux in the kernel.
>
> Wanted to confirm if this is correct.

I really have no idea about this problem.

> My question
> --
>
> Someone would say move to a 2.6 kernel and a different glibc, but with
> custom applications at stake, I can't take that risk as yet. So I
> would want an NPTL patch for the 2.4.28 kernel.
> Where do I get it? Please do respond.

Last time I saw an NPTL patch, it was for something like 2.4.21 patched
with O(1) scheduler. I bet you'll have a hard time merging those together
in 2.4.28.

It is also possible that core dumps don't look the same. I have in mind
some old changes about thread core dumps, but that's too far away to say
anything reliable on the subject. Check that your core files are of the
same sizes between NPTL and no-NPTL kernels.

Alternatively, you could try RHEL3's kernel (2.4.21) which has all those
things and which is still supported.

Regards,
Willy
Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires
On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
>> Now, can someone suggest a patch I can revert that might fix this? The
>> total number of patches between 2.6.20 and 2.6.21-rc1 will have me
>> building kernels to bisect this till the middle of June at this rate.

On Tue, Mar 13, 2007 at 10:07:21PM -0700, William Lee Irwin III wrote:
> 4 billion patches could be bisected in 34 boots. Between 2.6.20 and
> 2.6.21-rc1 there are only:
> $ git rev-list --no-merges v2.6.20..v2.6.21-rc1 |wc -l
> 3118
> patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots.
> Of course, this is a little optimistic because it assumes no additional
> breakage occurring at the various bisection points. In any event,
> assuming (pessimistically) 10 minutes per build, this is 280 minutes or
> 4 hours and 40 minutes of build time. I estimate the process should
> complete well before Friday of this week, never mind June.

33 boots for 4 billion, 13 boots for 3118, ceil(log(n)/log(2))+1 boots
in general, 10 minutes/build gives 130 minutes or 2 hours, 10 minutes
for 13 boots. I have no plausible explanation for these errors, and
don't care to be told of any, either.

-- wli
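The corrected figures follow from the formula ceil(log(n)/log(2))+1 and can be checked with integer arithmetic. The sketch below is only a check of that formula; ceil_log2() and bisect_boots() are hypothetical names, and the loop avoids floating point so the result does not depend on rounding in log().

```c
/* Smallest k such that 2^k >= n (valid for 1 <= n <= 2^63). */
unsigned ceil_log2(unsigned long long n)
{
    unsigned k = 0;
    unsigned long long span = 1;

    while (span < n) {
        span <<= 1;
        k++;
    }
    return k;
}

/* Boots needed per the corrected formula: ceil(log2(n)) + 1. */
unsigned bisect_boots(unsigned long long n)
{
    return ceil_log2(n) + 1;
}
```

For n = 3118 the loop stops at 4096 = 2^12, giving 13 boots; for n = 4 billion it stops at 2^32, giving 33 boots. Both agree with the corrected numbers in the message.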
Re: Linux 2.6.20.3
From: Bill Irwin <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 22:40:18 -0700

> I'm still trying to get on this.

See a response I just gave in this thread, I gave some tips that might
help track down what's going wrong here.
Re: Linux 2.6.20.3
On Tue, Mar 13, 2007 at 05:29:17PM -0700, Nish Aravamudan wrote:
> Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but
> had your small hugetlb patch on top.
> So I went back and patched 2.6.20.1 with your patch, rebooted, got a
> soft lockup. Went back to stock 2.6.20.1 and did not.
> I don't see how your patch (C&P below for reference) could make any
> difference...Especially because no hugepages were in use at the time.
> On patched 2.6.20.1, I was just trying to check if my source tree had
> your patch applied (by `patch -p1 < davem.patch`) and got the
> soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to
> try a clean 2.6.20.3 as well, now.

I'm still trying to get on this.

-- wli
[BUG] reiser4: page lock recursion in reiser4_write_extent
This little code snippet seems to have a page lock recursion, in addition to overall looking particularly fragile to me. It seems to be handling the case where a page needs to be brought uptodate because a partial page write is being done. The page gets locked as many as 3 times, each time checking PageUptodate; however, the two failure cases here go BUG() instead of returning an error.

I'm starting to think that somehow the whole suspect branch just never gets taken, because otherwise I would expect to see bug reports related to -EIO, -ENOMEM, etc. causing this to barf. Either way, it seems there's a lock recursion if another thread races to bring @page uptodate while we're waiting on the first lock_page() call.

---
	page = jnode_page(jnodes[i]);
	if (page_offset(page) < inode->i_size &&
	    !PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
		/*
		 * the above is not optimal for partial write to last
		 * page of file when file size is not at boundary of
		 * page
		 */
		lock_page(page);				<-- takes the lock
		if (!PageUptodate(page)) {			<-- raced with readpage?
			result = readpage_unix_file(NULL, page);	<-- readpage drops lock
			BUG_ON(result != 0);			<-- -ENOMEM?
			/* wait for read completion */
			lock_page(page);
			BUG_ON(!PageUptodate(page));		<-- -EIO?
			unlock_page(page);
		} else
			result = 0;				<-- still have the lock here
	}

	BUG_ON(get_current_context()->trans->atom != NULL);
	fault_in_pages_readable(buf, to_page);
	BUG_ON(get_current_context()->trans->atom != NULL);

	lock_page(page);					<-- BOOM!!!
	if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
---

NATE
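The control flow described above can be modelled in userspace to see which path leaves the lock held. This is only a single-threaded sketch under stated assumptions: struct fake_page, fake_lock_page(), fake_unlock_page(), fake_readpage() and write_extent_flow() are hypothetical stand-ins invented for this illustration, not kernel or reiser4 APIs, and the `raced` flag stands in for another thread bringing the page uptodate while we wait in the first lock_page(). The read is assumed to succeed, so the two BUG_ON() failure cases are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page (not the kernel type). */
struct fake_page {
    bool locked;    /* models PG_locked */
    bool uptodate;  /* models PG_uptodate */
};

/*
 * Models lock_page(). Returns false where the real call would sleep
 * forever because the caller already holds the page lock.
 */
bool fake_lock_page(struct fake_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

void fake_unlock_page(struct fake_page *p)
{
    p->locked = false;
}

/* Models a successful readpage: page becomes uptodate, lock is dropped. */
void fake_readpage(struct fake_page *p)
{
    p->uptodate = true;
    p->locked = false;
}

/*
 * Follows the branch structure of the quoted snippet. Returns true if
 * the final lock_page() can be taken, false if it would self-deadlock.
 */
bool write_extent_flow(struct fake_page *p, bool raced)
{
    if (!p->uptodate) {
        (void)fake_lock_page(p);      /* first lock_page() */
        if (raced)
            p->uptodate = true;       /* someone else finished the read */
        if (!p->uptodate) {
            fake_readpage(p);         /* read completes, lock dropped */
            (void)fake_lock_page(p);  /* wait for read completion */
            fake_unlock_page(p);
        }
        /* else branch: nothing unlocks the page on this path */
    }
    return fake_lock_page(p);         /* the "BOOM" lock_page() */
}
```

In this model the non-racing path ends with the page unlocked before the final lock, while the racing path reaches the final lock_page() with the lock still held, which is the recursion described in the report.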
Re: Linux 2.6.20.3
From: "Nish Aravamudan" <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 17:29:17 -0700

> Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but
> had your small hugetlb patch on top.
>
> So I went back and patched 2.6.20.1 with your patch, rebooted, got a
> soft lockup. Went back to stock 2.6.20.1 and did not.
>
> I don't see how your patch (C&P below for reference) could make any
> difference...Especially because no hugepages were in use at the time.
> On patched 2.6.20.1, I was just trying to check if my source tree had
> your patch applied (by `patch -p1 < davem.patch`) and got the
> soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to
> try a clean 2.6.20.3 as well, now.

We've seen cases in the past where something benign like this triggers a bug because it moves the data/bss/etc. sections around. For example, if the used parts of the kernel image end up extending into another page, this can influence the bootup memory detection logic.

The softlockup in your first trace shows it spinning in the journaling code. Perhaps what is happening is that it is looping endlessly over some data structure which is in a corrupted state. This could happen if the bootup code erroneously frees up pages it should not have.

I would recommend figuring out exactly what the journaling code is stuck on, then try to trace the life of that page of memory from early bootup until it is allocated.

I hope this can help you figure out this bug as I can't reproduce it here at all, and I did merge the hugetlb fix into Linus's tree already (and that is the right thing to do as we certainly have some unrelated bug here).

On the flip side, I bet removing some kernel config option might make the heisenbug go away if you're just eager to test the hugetlb patch :-)
Re: Regression between 2.6.20 and 2.6.21-rc1: NCQ problem with ahci and Hitachi drive
> Yes It works with acpi=off (2.6.21-rc1):
> Please notice that IRQ is changed from 19 with ACPI to 11 without.

Please verify the problem still exists in the latest 2.6.21 git. If yes, please file a bug here:
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI

For 2.6.20.stable, please attach
the complete output from dmesg -s64000
output from acpidump
output from lspci -vv
and paste a copy of /proc/interrupts

For 2.6.21.broken, please attach as much of the dmesg as you can capture, and the /proc/interrupts if you can get that far.

thanks,
-Len
[PATCH 16/18] kconfig for oprofile
Merge the oprofile configs.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/oprofile/Kconfig b/arch/x86_64/oprofile/Kconfig
deleted file mode 100644
index d8a8408..000
--- a/arch/x86_64/oprofile/Kconfig
+++ /dev/null
@@ -1,17 +0,0 @@
-config PROFILING
-	bool "Profiling support (EXPERIMENTAL)"
-	help
-	  Say Y here to enable the extended profiling support mechanisms used
-	  by profilers such as OProfile.
-
-
-config OPROFILE
-	tristate "OProfile system profiling (EXPERIMENTAL)"
-	depends on PROFILING
-	help
-	  OProfile is a profiling system capable of profiling the
-	  whole system, include the kernel, kernel modules, libraries,
-	  and applications.
-
-	  If unsure, say N.
-
--
[PATCH 09/18] create x86/kernel/cpu/mcheck/Makefile
Create the Makefile in the common hold and adjust the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
new file mode 100644
index 000..6e7cb4c
--- /dev/null
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -0,0 +1 @@
+obj-y = therm_throt.o
\ No newline at end of file
diff --git a/arch/i386/kernel/cpu/mcheck/Makefile b/arch/i386/kernel/cpu/mcheck/Makefile
index f1ebe1c..30808f3 100644
--- a/arch/i386/kernel/cpu/mcheck/Makefile
+++ b/arch/i386/kernel/cpu/mcheck/Makefile
@@ -1,2 +1,2 @@
-obj-y = mce.o k7.o p4.o p5.o p6.o winchip.o therm_throt.o
+obj-y = mce.o k7.o p4.o p5.o p6.o winchip.o
 obj-$(CONFIG_X86_MCE_NONFATAL) += non-fatal.o
--
[PATCH 15/18] create x86/mm/Makefile
Create the Makefile in the common hold and adjust the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
new file mode 100644
index 000..1b6e922
--- /dev/null
+++ b/arch/x86/mm/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
diff --git a/arch/i386/mm/Makefile b/arch/i386/mm/Makefile
index 80908b5..0cb01e6 100644
--- a/arch/i386/mm/Makefile
+++ b/arch/i386/mm/Makefile
@@ -5,6 +5,5 @@
 obj-y := init.o pgtable.o fault.o ioremap.o extable.o pageattr.o mmap.o
 obj-$(CONFIG_NUMA) += discontig.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_HIGHMEM) += highmem.o
 obj-$(CONFIG_BOOT_IOREMAP) += boot_ioremap.o
diff --git a/arch/x86_64/mm/Makefile b/arch/x86_64/mm/Makefile
index d25ac86..4beaed8 100644
--- a/arch/x86_64/mm/Makefile
+++ b/arch/x86_64/mm/Makefile
@@ -3,9 +3,6 @@
 #
 obj-y := init.o fault.o ioremap.o extable.o pageattr.o mmap.o
-obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NUMA) += numa.o
 obj-$(CONFIG_K8_NUMA) += k8topology.o
 obj-$(CONFIG_ACPI_NUMA) += srat.o
-
-hugetlbpage-y = ../../i386/mm/hugetlbpage.o
--
[PATCH 14/18] rm include pointer to i386 msr-on-cpu.c file
Remove the C file with just the include that points to the i386 msr-on-cpu.c file.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/lib/msr-on-cpu.c b/arch/x86_64/lib/msr-on-cpu.c
deleted file mode 100644
index 47e0ec4..000
--- a/arch/x86_64/lib/msr-on-cpu.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../i386/lib/msr-on-cpu.c"
--
[PATCH 03/18] acpi Makefile updates
Create arch/x86/kernel/acpi/Makefile, and remove the associated stuff from the i386 and x86_64 Makefiles.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
new file mode 100644
index 000..f4aa6dc
--- /dev/null
+++ b/arch/x86/kernel/acpi/Makefile
@@ -0,0 +1,5 @@
+obj-$(CONFIG_ACPI) += boot.o
+
+ifneq ($(CONFIG_ACPI_PROCESSOR),)
+obj-y += processor.o cstate.o
+endif
diff --git a/arch/x86_64/kernel/acpi/Makefile b/arch/x86_64/kernel/acpi/Makefile
index 080b996..eb4bc11 100644
--- a/arch/x86_64/kernel/acpi/Makefile
+++ b/arch/x86_64/kernel/acpi/Makefile
@@ -1,9 +1,2 @@
-obj-y := boot.o
-boot-y := ../../../i386/kernel/acpi/boot.o
 obj-$(CONFIG_ACPI_SLEEP) += sleep.o wakeup.o
-ifneq ($(CONFIG_ACPI_PROCESSOR),)
-obj-y += processor.o
-processor-y:= ../../../i386/kernel/acpi/processor.o ../../../i386/kernel/acpi/cstate.o
-endif
-
--
[PATCH 07/18] mv kernel/cpu/cpufreq/speedstep-lib.c
Move kernel/cpu/cpufreq/speedstep-lib.c to the common hold. Also has the slight change to reference speedstep-lib.h that is being moved to include/asm-i386.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.c b/arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/speedstep-lib.c
rename to arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
index d59277c..ff4482b 100644
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.c
+++ b/arch/x86/kernel/cpu/cpufreq/speedstep-lib.c
@@ -17,7 +17,7 @@
 #include
 #include
-#include "speedstep-lib.h"
+#include
 #define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "speedstep-lib", msg)
--
[PATCH 05/18] mv kernel/cpu/cpufreq/p4-clockmod.c
Move kernel/cpu/cpufreq/p4-clockmod.c to the common hold. Also has the slight change to reference speedstep-lib.h that is being moved to include/asm-i386.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
rename to arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
index 4786fed..ac5d5a1 100644
--- a/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c
+++ b/arch/x86/kernel/cpu/cpufreq/p4-clockmod.c
@@ -33,7 +33,7 @@
 #include
 #include
-#include "speedstep-lib.h"
+#include
 #define PFX	"p4-clockmod: "
 #define dprintk(msg...) cpufreq_debug_printk(CPUFREQ_DEBUG_DRIVER, "p4-clockmod", msg)
--
[PATCH 08/18] create x86/kernel/cpu/Makefile
Create the Makefile in the common hold and adjust the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
new file mode 100644
index 000..bf4ae59
--- /dev/null
+++ b/arch/x86/kernel/cpu/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_MTRR) += mtrr/
+obj-y += intel_cacheinfo.o
+obj-$(CONFIG_X86_MCE) += mcheck/
+obj-$(CONFIG_CPU_FREQ) += cpufreq/
diff --git a/arch/i386/kernel/cpu/Makefile b/arch/i386/kernel/cpu/Makefile
index 010aecf..e484d74 100644
--- a/arch/i386/kernel/cpu/Makefile
+++ b/arch/i386/kernel/cpu/Makefile
@@ -8,12 +8,11 @@ obj-y += amd.o
 obj-y += cyrix.o
 obj-y += centaur.o
 obj-y += transmeta.o
-obj-y += intel.o intel_cacheinfo.o
+obj-y += intel.o
 obj-y += rise.o
 obj-y += nexgen.o
 obj-y += umc.o
 obj-$(CONFIG_X86_MCE) += mcheck/
-obj-$(CONFIG_MTRR) += mtrr/
 obj-$(CONFIG_CPU_FREQ) += cpufreq/
--
[PATCH 18/18] Straight file moves
Here's a list of files that were moved from either i386 or x86_64 over to the arch/x86 directory. Since I now used the git-diff -M option (thanks Linus!), and to spare LKML a lot of patches, I put all the renames that were unmodified (strictly renamed) into this file, with one exception. I put the moving of the speedstep-lib.h file in its own file to allow for discussion on that, (ok Chris).

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
similarity index 100%
rename from arch/i386/kernel/acpi/boot.c
rename to arch/x86/kernel/acpi/boot.c
diff --git a/arch/i386/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
similarity index 100%
rename from arch/i386/kernel/acpi/cstate.c
rename to arch/x86/kernel/acpi/cstate.c
diff --git a/arch/i386/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
similarity index 100%
rename from arch/i386/kernel/acpi/processor.c
rename to arch/x86/kernel/acpi/processor.c
diff --git a/arch/i386/kernel/alternative.c b/arch/x86/kernel/alternative.c
similarity index 100%
rename from arch/i386/kernel/alternative.c
rename to arch/x86/kernel/alternative.c
diff --git a/arch/i386/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
similarity index 100%
rename from arch/i386/kernel/bootflag.c
rename to arch/x86/kernel/bootflag.c
diff --git a/arch/i386/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
similarity index 100%
rename from arch/i386/kernel/cpu/intel_cacheinfo.c
rename to arch/x86/kernel/cpu/intel_cacheinfo.c
diff --git a/arch/i386/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
similarity index 100%
rename from arch/i386/kernel/cpu/mcheck/therm_throt.c
rename to arch/x86/kernel/cpu/mcheck/therm_throt.c
diff --git a/arch/i386/kernel/cpu/mtrr/Makefile b/arch/x86/kernel/cpu/mtrr/Makefile
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/Makefile
rename to arch/x86/kernel/cpu/mtrr/Makefile
diff --git a/arch/i386/kernel/cpu/mtrr/amd.c b/arch/x86/kernel/cpu/mtrr/amd.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/amd.c
rename to arch/x86/kernel/cpu/mtrr/amd.c
diff --git a/arch/i386/kernel/cpu/mtrr/centaur.c b/arch/x86/kernel/cpu/mtrr/centaur.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/centaur.c
rename to arch/x86/kernel/cpu/mtrr/centaur.c
diff --git a/arch/i386/kernel/cpu/mtrr/cyrix.c b/arch/x86/kernel/cpu/mtrr/cyrix.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/cyrix.c
rename to arch/x86/kernel/cpu/mtrr/cyrix.c
diff --git a/arch/i386/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/generic.c
rename to arch/x86/kernel/cpu/mtrr/generic.c
diff --git a/arch/i386/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/if.c
rename to arch/x86/kernel/cpu/mtrr/if.c
diff --git a/arch/i386/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/main.c
rename to arch/x86/kernel/cpu/mtrr/main.c
diff --git a/arch/i386/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/mtrr.h
rename to arch/x86/kernel/cpu/mtrr/mtrr.h
diff --git a/arch/i386/kernel/cpu/mtrr/state.c b/arch/x86/kernel/cpu/mtrr/state.c
similarity index 100%
rename from arch/i386/kernel/cpu/mtrr/state.c
rename to arch/x86/kernel/cpu/mtrr/state.c
diff --git a/arch/i386/kernel/cpuid.c b/arch/x86/kernel/cpuid.c
similarity index 100%
rename from arch/i386/kernel/cpuid.c
rename to arch/x86/kernel/cpuid.c
diff --git a/arch/x86_64/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
similarity index 100%
rename from arch/x86_64/kernel/early_printk.c
rename to arch/x86/kernel/early_printk.c
diff --git a/arch/i386/kernel/i8237.c b/arch/x86/kernel/i8237.c
similarity index 100%
rename from arch/i386/kernel/i8237.c
rename to arch/x86/kernel/i8237.c
diff --git a/arch/x86_64/kernel/k8.c b/arch/x86/kernel/k8.c
similarity index 100%
rename from arch/x86_64/kernel/k8.c
rename to arch/x86/kernel/k8.c
diff --git a/arch/i386/kernel/microcode.c b/arch/x86/kernel/microcode.c
similarity index 100%
rename from arch/i386/kernel/microcode.c
rename to arch/x86/kernel/microcode.c
diff --git a/arch/i386/kernel/msr.c b/arch/x86/kernel/msr.c
similarity index 100%
rename from arch/i386/kernel/msr.c
rename to arch/x86/kernel/msr.c
diff --git a/arch/i386/kernel/pcspeaker.c b/arch/x86/kernel/pcspeaker.c
similarity index 100%
rename from arch/i386/kernel/pcspeaker.c
rename to arch/x86/kernel/pcspeaker.c
diff --git a/arch/i386/kernel/quirks.c b/arch/x86/kernel/quirks.c
similarity index 100%
rename from arch/i386/kernel/quir
[PATCH 01/18] toplevel Kconfig changes
Create a toplevel Kconfig for arch/x86 and update the i386 and x86_64 Kconfigs as well.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
new file mode 100644
index 000..18223ff
--- /dev/null
+++ b/arch/x86/Kconfig
@@ -0,0 +1,4 @@
+
+
+source arch/x86/oprofile/Kconfig
+
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 56eb14c..3a2e117 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -738,7 +738,7 @@ source fs/Kconfig
 menu "Instrumentation Support"
 	depends on EXPERIMENTAL
-source "arch/x86_64/oprofile/Kconfig"
+source "arch/x86/Kconfig"
 config KPROBES
 	bool "Kprobes (EXPERIMENTAL)"
--
[PATCH 00/18] Make common x86 arch area for i386 and x86_64 - Take 2
[Hopefully fixed email client to make it to the list this time]
[This series has changed by using git-diff -M]

Recently I've been doing some work that will affect both the i386 and x86_64 architectures. So there will be common code for both, as well as code that will be unique to the specific arch. So I was looking into a way to do this cleanly, and found that there is no clean way to share code between x86_64 and i386. What we have currently is a bunch of hacks. It seems that people can't make up their minds about what to do.

We have hack 1.
===============
Reference code from i386 in the x86_64 Makefiles.

Examples:

therm_throt-y += ../../i386/kernel/cpu/mcheck/therm_throt.o
bootflag-y+= ../../i386/kernel/bootflag.o

[tabs screwed up, because the above can't be consistent on that either]

We have hack 2.
===============
Reference code from x86_64 in the i386 Makefiles.

Examples:

k8-y += ../../x86_64/kernel/k8.o
stacktrace-y+= ../../x86_64/kernel/stacktrace.o

[again the tabs too are messed up]
[--ok I'm sure I mess up the tabs too in my code--]

Now my favorite hacks!

We have hack 3.
===============
Make a sole file with just an include pointer to the i386 code.

[EMAIL PROTECTED]:~/work/git/linus.git$ cat arch/x86_64/lib/msr-on-cpu.c
#include "../../i386/lib/msr-on-cpu.c"
[EMAIL PROTECTED]:~/work/git/linus.git$

We have hack 4.
===============
Make a sole file with just an include pointer to the x86_64 code.

[EMAIL PROTECTED]:~/work/git/linus.git$ cat arch/i386/kernel/early_printk.c

#include "../../x86_64/kernel/early_printk.c"
[EMAIL PROTECTED]:~/work/git/linus.git$

So I spent last night hacking up something to try to make a common ground for all code that is shared between x86_64 and i386. I called this

arch/x86

Seems appropriate, but I really don't care what it's called. One thing about this name is that typing arch/x86 doesn't tab complete x86_64 anymore. But if you can think of something better, I'd be happy to apply it.

So the following set of patches moves common code into the arch/x86 area and updates the i386 and x86_64 files accordingly.

I separated the patches into files that hold just Makefile changes, Kconfig changes, and the actual moves of files. The moves are now represented in their own patch, one big rename patch, using the git-diff -M format. So the moves are simply renames, with the slight exception of files that include the speedstep-lib.h file. This file was moved from the arch/i386/kernel/cpu/cpufreq directory and put into the include/asm-i386 directory. This was due to the fact that some of the moved files included it, and some files that were not moved also included it. Instead of using the #include "../../x86/" hack again, I just simply moved it to the global i386 include directory. Only the arch/x86 files will use the include/asm-i386 change. To make this work, the move patches for the files that include this header also carry the #include change needed to locate it.

With this change of having a single place that holds the files shared by x86_64 and i386, it becomes obvious which files are being shared. This way we don't have to worry about someone changing a file in either x86_64 or i386 and having it break the other arch, because they didn't realize it was being shared.

Note: I left out all the shared pci code. It seems that this code is placed specially in the Makefiles for linking order or what not, and I don't want to spend the time sorting that out without knowing if these changes are acceptable or not.

-- Steve

PS. Sorry for the spam. I need to figure out how to tame quilt mail!
--
[PATCH 11/18] rm include pointer to x86_64 early_printk.c
Remove the C file with just the include that points to the x86_64 early_printk.c file.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/early_printk.c b/arch/i386/kernel/early_printk.c
deleted file mode 100644
index 92f812b..000
--- a/arch/i386/kernel/early_printk.c
+++ /dev/null
@@ -1,2 +0,0 @@
-
-#include "../../x86_64/kernel/early_printk.c"
--
[PATCH 06/18] mv kernel/cpu/cpufreq/speedstep-lib.h
OK, this one is a little different.

Move arch/i386/kernel/cpu/cpufreq/speedstep-lib.h to include/asm-i386/.

This file is used by files in arch/i386/kernel/cpu/cpufreq that are not moved. So we move it into a more global area, to keep the includes from going a bit crazy.

Note, the moved files that include this file will have the change to locate it. So it's not just a straight copy.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-lib.h b/include/asm-i386/speedstep-lib.h
similarity index 100%
rename from arch/i386/kernel/cpu/cpufreq/speedstep-lib.h
rename to include/asm-i386/speedstep-lib.h
--
[PATCH 10/18] make the kernel Makefile
Create the arch/x86/kernel/Makefile and change the i386 and x86_64 Makefiles accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
new file mode 100644
index 000..e2300ad
--- /dev/null
+++ b/arch/x86/kernel/Makefile
@@ -0,0 +1,18 @@
+
+obj-y += bootflag.o topology.o quirks.o i8237.o alternative.o \
+	pcspeaker.o
+
+obj-$(CONFIG_STACKTRACE) += stacktrace.o
+
+obj-y += cpu/
+obj-$(CONFIG_X86_MSR) += msr.o
+obj-$(CONFIG_MICROCODE)+= microcode.o
+obj-$(CONFIG_X86_CPUID)+= cpuid.o
+obj-$(CONFIG_ACPI) += acpi/
+obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+
+ifeq ($(CONFIG_X86_VOYAGER), )
+obj-$(CONFIG_SMP) += tsc_sync.o
+endif
+
+obj-$(CONFIG_K8_NB)+= k8.o
diff --git a/arch/i386/kernel/Makefile b/arch/i386/kernel/Makefile
index 4ae3dcf..bea2137 100644
--- a/arch/i386/kernel/Makefile
+++ b/arch/i386/kernel/Makefile
@@ -6,19 +6,15 @@ extra-y := head.o init_task.o vmlinux.lds
 obj-y := process.o signal.o entry.o traps.o irq.o \
 	ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
-	pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
-	quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+	pci-dma.o i386_ksyms.o i387.o e820.o\
+	i8253.o tsc.o
-obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += cpu/
 obj-y += acpi/
 obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o
 obj-$(CONFIG_MCA) += mca.o
-obj-$(CONFIG_X86_MSR) += msr.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
-obj-$(CONFIG_MICROCODE)+= microcode.o
 obj-$(CONFIG_APM) += apm.o
-obj-$(CONFIG_X86_SMP) += smp.o smpboot.o tsc_sync.o
+obj-$(CONFIG_X86_SMP) += smp.o smpboot.o
 obj-$(CONFIG_X86_TRAMPOLINE) += trampoline.o
 obj-$(CONFIG_X86_MPPARSE) += mpparse.o
 obj-$(CONFIG_X86_LOCAL_APIC) += apic.o nmi.o
@@ -35,13 +31,10 @@ obj-$(CONFIG_ACPI_SRAT) += srat.o
 obj-$(CONFIG_EFI) += efi.o efi_stub.o
 obj-$(CONFIG_DOUBLEFAULT) += doublefault.o
 obj-$(CONFIG_VM86) += vm86.o
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_HPET_TIMER) += hpet.o
-obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_VMI) += vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
-obj-y += pcspeaker.o
 EXTRA_AFLAGS := -traditional
@@ -82,7 +75,3 @@ SYSCFLAGS_vsyscall-syms.o = -r
 $(obj)/vsyscall-syms.o: $(src)/vsyscall.lds \
 		$(obj)/vsyscall-sysenter.o $(obj)/vsyscall-note.o FORCE
 	$(call if_changed,syscall)
-
-k8-y += ../../x86_64/kernel/k8.o
-stacktrace-y += ../../x86_64/kernel/stacktrace.o
-
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index bb47e86..3f10fe0 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -7,19 +7,14 @@ EXTRA_AFLAGS := -traditional
 obj-y := process.o signal.o entry.o traps.o irq.o \
 	ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
 	x8664_ksyms.o i387.o syscall.o vsyscall.o \
-	setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
-	pci-dma.o pci-nommu.o alternative.o hpet.o tsc.o
+	setup64.o e820.o reboot.o \
+	pci-dma.o pci-nommu.o hpet.o tsc.o
-obj-$(CONFIG_STACKTRACE) += stacktrace.o
-obj-$(CONFIG_X86_MCE) += mce.o therm_throt.o
+obj-$(CONFIG_X86_MCE) += mce.o
 obj-$(CONFIG_X86_MCE_INTEL)+= mce_intel.o
 obj-$(CONFIG_X86_MCE_AMD) += mce_amd.o
-obj-$(CONFIG_MTRR) += ../../i386/kernel/cpu/mtrr/
 obj-$(CONFIG_ACPI) += acpi/
-obj-$(CONFIG_X86_MSR) += msr.o
-obj-$(CONFIG_MICROCODE)+= microcode.o
-obj-$(CONFIG_X86_CPUID)+= cpuid.o
-obj-$(CONFIG_SMP) += smp.o smpboot.o trampoline.o tsc_sync.o
+obj-$(CONFIG_SMP) += smp.o smpboot.o trampoline.o
 obj-y += apic.o nmi.o
 obj-y += io_apic.o mpparse.o \
 	genapic.o genapic_cluster.o genapic_flat.o
@@ -27,34 +22,15 @@ obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o crash.o
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_PM) += suspend.o
 obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend_asm.o
-obj-$(CONFIG_CPU_FREQ) += cpufreq/
-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_IOMMU)+= pci-gart.o aperture.o
[PATCH 12/18] rm include pointer to x86_64 tsc_sync.c
Remove the C file with just the include that points to the x86_64 tsc_sync.c file.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/i386/kernel/tsc_sync.c b/arch/i386/kernel/tsc_sync.c
deleted file mode 100644
index 1242462..000
--- a/arch/i386/kernel/tsc_sync.c
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../x86_64/kernel/tsc_sync.c"
--
[PATCH 17/18] create x86/oprofile/Makefile
Create the Makefile in the common hold and adjust the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86_64/oprofile/Makefile b/arch/x86_64/oprofile/Makefile
deleted file mode 100644
index 6be3268..000
--- a/arch/x86_64/oprofile/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-#
-# oprofile for x86-64.
-# Just reuse the one from i386.
-#
-
-obj-$(CONFIG_OPROFILE) += oprofile.o
-
-DRIVER_OBJS = $(addprefix ../../../drivers/oprofile/, \
-        oprof.o cpu_buffer.o buffer_sync.o \
-        event_buffer.o oprofile_files.o \
-        oprofilefs.o oprofile_stats.o \
-        timer_int.o )
-
-OPROFILE-y := init.o backtrace.o
-OPROFILE-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_athlon.o op_model_p4.o \
-        op_model_ppro.o
-OPROFILE-$(CONFIG_X86_IO_APIC)+= nmi_timer_int.o
-
-oprofile-y = $(DRIVER_OBJS) $(addprefix ../../i386/oprofile/, $(OPROFILE-y))
--
[PATCH 13/18] create x86/lib/Makefile
Create the Makefile in the common hold and adjust the i386 and x86_64 code accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
new file mode 100644
index 000..d683d55
--- /dev/null
+++ b/arch/x86/lib/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_SMP) += msr-on-cpu.o
diff --git a/arch/i386/lib/Makefile b/arch/i386/lib/Makefile
index 22d8ac5..304f754 100644
--- a/arch/i386/lib/Makefile
+++ b/arch/i386/lib/Makefile
@@ -8,4 +8,3 @@ lib-y = checksum.o delay.o usercopy.o getuser.o putuser.o memcpy.o strstr.o \
 lib-$(CONFIG_X86_USE_3DNOW) += mmx.o
-obj-$(CONFIG_SMP) += msr-on-cpu.o
diff --git a/arch/x86_64/lib/Makefile b/arch/x86_64/lib/Makefile
index c943271..8d5f835 100644
--- a/arch/x86_64/lib/Makefile
+++ b/arch/x86_64/lib/Makefile
@@ -5,7 +5,6 @@ CFLAGS_csum-partial.o := -funroll-loops
 obj-y := io.o iomap_copy.o
-obj-$(CONFIG_SMP) += msr-on-cpu.o
 lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
         usercopy.o getuser.o putuser.o \
--
[PATCH 04/18] make the cpu/cpufreq/Makefile
Create the arch/x86/kernel/cpu/cpufreq/Makefile and update the i386 and x86_64 accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/cpu/cpufreq/Makefile b/arch/x86/kernel/cpu/cpufreq/Makefile
new file mode 100644
index 000..51b32fe
--- /dev/null
+++ b/arch/x86/kernel/cpu/cpufreq/Makefile
@@ -0,0 +1,6 @@
+
+obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
+obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
+obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
+obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
+obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
diff --git a/arch/x86_64/kernel/cpufreq/Makefile b/arch/x86_64/kernel/cpufreq/Makefile
deleted file mode 100644
index 753ce1d..000
--- a/arch/x86_64/kernel/cpufreq/Makefile
+++ /dev/null
@@ -1,17 +0,0 @@
-#
-# Reuse the i386 cpufreq drivers
-#
-
-SRCDIR := ../../../i386/kernel/cpu/cpufreq
-
-obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB) += speedstep-lib.o
-
-powernow-k8-objs := ${SRCDIR}/powernow-k8.o
-speedstep-centrino-objs := ${SRCDIR}/speedstep-centrino.o
-acpi-cpufreq-objs := ${SRCDIR}/acpi-cpufreq.o
-p4-clockmod-objs := ${SRCDIR}/p4-clockmod.o
-speedstep-lib-objs := ${SRCDIR}/speedstep-lib.o
diff --git a/arch/i386/kernel/cpu/cpufreq/Makefile b/arch/i386/kernel/cpu/cpufreq/Makefile
index 560f776..49c4ca4 100644
--- a/arch/i386/kernel/cpu/cpufreq/Makefile
+++ b/arch/i386/kernel/cpu/cpufreq/Makefile
@@ -1,6 +1,6 @@
+# See also arch/x86/kernel/cpu/cpufreq/Makefile
 obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o
 obj-$(CONFIG_X86_POWERNOW_K7) += powernow-k7.o
-obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o
 obj-$(CONFIG_X86_LONGHAUL) += longhaul.o
 obj-$(CONFIG_X86_E_POWERSAVER) += e_powersaver.o
 obj-$(CONFIG_ELAN_CPUFREQ) += elanfreq.o
@@ -8,9 +8,5 @@ obj-$(CONFIG_SC520_CPUFREQ) += sc520_freq.o
 obj-$(CONFIG_X86_LONGRUN) += longrun.o
 obj-$(CONFIG_X86_GX_SUSPMOD) += gx-suspmod.o
 obj-$(CONFIG_X86_SPEEDSTEP_ICH)+= speedstep-ich.o
-obj-$(CONFIG_X86_SPEEDSTEP_LIB)+= speedstep-lib.o
 obj-$(CONFIG_X86_SPEEDSTEP_SMI)+= speedstep-smi.o
-obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o
-obj-$(CONFIG_X86_SPEEDSTEP_CENTRINO) += speedstep-centrino.o
-obj-$(CONFIG_X86_P4_CLOCKMOD) += p4-clockmod.o
 obj-$(CONFIG_X86_CPUFREQ_NFORCE2) += cpufreq-nforce2.o
--
[PATCH 02/18] x86 Makefile changes
Create the arch/x86/Makefile and modify the i386 and x86_64 Makefiles accordingly.

Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
Cc: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Cc: Rusty Russell <[EMAIL PROTECTED]>
Cc: Chris Wright <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Jeremy Fitzhardinge <[EMAIL PROTECTED]>

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
new file mode 100644
index 000..97407b2
--- /dev/null
+++ b/arch/x86/Makefile
@@ -0,0 +1,3 @@
+
+
+drivers-$(CONFIG_OPROFILE) += arch/x86/oprofile/
diff --git a/arch/i386/Makefile b/arch/i386/Makefile
index bd28f9f..78e59ef 100644
--- a/arch/i386/Makefile
+++ b/arch/i386/Makefile
@@ -98,15 +98,17 @@ mflags-y += -Iinclude/asm-i386/mach-default
 head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
-libs-y += arch/i386/lib/
+libs-y += arch/i386/lib/ \
+          arch/x86/lib/
 core-y += arch/i386/kernel/ \
+          arch/x86/kernel/ \
           arch/i386/mm/ \
+          arch/x86/mm/ \
           arch/i386/$(mcore-y)/ \
           arch/i386/crypto/
 drivers-$(CONFIG_MATH_EMULATION) += arch/i386/math-emu/
 drivers-$(CONFIG_PCI) += arch/i386/pci/
 # must be linked after kernel/
-drivers-$(CONFIG_OPROFILE) += arch/i386/oprofile/
 drivers-$(CONFIG_PM) += arch/i386/power/
 CFLAGS += $(mflags-y)
diff --git a/arch/x86_64/Makefile b/arch/x86_64/Makefile
index 2941a91..150942b 100644
--- a/arch/x86_64/Makefile
+++ b/arch/x86_64/Makefile
@@ -79,11 +79,15 @@ head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kern
 libs-y += arch/x86_64/lib/
 core-y += arch/x86_64/kernel/ \
+          arch/x86/kernel/ \
           arch/x86_64/mm/ \
-          arch/x86_64/crypto/
+          arch/x86/mm/ \
+          arch/x86_64/crypto/ \
+          arch/x86/lib/
 core-$(CONFIG_IA32_EMULATION) += arch/x86_64/ia32/
 drivers-$(CONFIG_PCI) += arch/x86_64/pci/
-drivers-$(CONFIG_OPROFILE) += arch/x86_64/oprofile/
+
+include arch/x86/Makefile
 boot := arch/x86_64/boot
--
Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires
On Tue, Mar 13, 2007 at 11:31:53PM -0400, Gene Heskett wrote:
> Now, can someone suggest a patch I can revert that might fix this? The
> total number of patches between 2.6.20 and 2.6.21-rc1 will have me
> building kernels to bisect this till the middle of June at this rate.

4 billion patches could be bisected in 34 boots. Between 2.6.20 and 2.6.21-rc1 there are only:

$ git rev-list --no-merges v2.6.20..v2.6.21-rc1 |wc -l
3118

patches, requiring 14 boots. In general ceil(log(n)/log(2))+2 boots. Of course, this is a little optimistic because it assumes no additional breakage occurring at the various bisection points.

In any event, assuming (pessimistically) 10 minutes per build, this is 280 minutes or 4 hours and 40 minutes of build time. I estimate the process should complete well before Friday of this week, never mind June.

-- wli
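As a rough check of the formula above, here is a minimal Python sketch. It is not part of the original mail: the names bisect_boots and build_minutes are made up for illustration, and one build per boot is an assumption. It evaluates ceil(log(n)/log(2))+2 with integer arithmetic.

```python
def bisect_boots(n_commits: int, extra: int = 2) -> int:
    """Boots needed to bisect n_commits, per ceil(log(n)/log(2)) + 2.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    floating-point rounding near powers of two.
    """
    if n_commits < 1:
        raise ValueError("need at least one commit")
    return (n_commits - 1).bit_length() + extra


def build_minutes(boots: int, minutes_per_build: int) -> int:
    """Total build time, assuming one build per boot (an assumption)."""
    return boots * minutes_per_build


if __name__ == "__main__":
    for n in (3118, 4_000_000_000):
        print(n, "commits ->", bisect_boots(n), "boots")
```

For 3118 commits this gives 14 boots and for 4 billion it gives 34, matching the figures quoted. Total build time then scales linearly with the per-build time: 14 builds come to 140 minutes at 10 minutes each, or 280 minutes at 20 minutes each.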
Ooops with suspend to RAM
Hi all, With latest GIT tree I am getting the following oops when I try to suspend to RAM: BUG: unable to handle kernel NULL pointer dereference at virtual address 0094 printing eip: c0222af4 *pde = Oops: [#1] PREEMPT Modules linked in: i915 drm snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device usbhid eth1394 ipw2200 ieee80211 ieee80211_crypt snd_hda_intel snd_hda_codec snd_pcm snd_timer snd snd_page_alloc tifm_7xx1 tifm_core i2c_i801 i2c_core ehci_hcd uhci_hcd ohci1394 ieee1394 pcmcia usbcore yenta_socket rsrc_nonstatic pcmcia_core sony_laptop backlight CPU:0 EIP:0060:[]Not tainted VLI EFLAGS: 00010246 (2.6.21-rc3 #12) EIP is at class_device_remove_attrs+0xa/0x30 eax: f7cb5b18 ebx: ecx: f8bde010 edx: esi: edi: f7cb5b18 ebp: esp: d93e7e1c ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process modprobe (pid: 12200, ti=d93e6000 task=e5770a50 task.ti=d93e6000) Stack: f7cb5b18 f7cb5b20 c0222bc3 f7cb5990 f7cb5b18 f7cb59c4 f8bcdc0f c0222bfb f7cb5990 f8bcdbf6 f8bd3275 04e2c100 000f 03c3 f8dcf05f f7e3e000 f8bcdc17 c0220567 f7e3e0a4 Call Trace: [] class_device_del+0xa9/0xd9 [] __nodemgr_remove_host_dev+0x0/0xb [ieee1394] [] class_device_unregister+0x8/0x10 [] nodemgr_remove_ne+0x61/0x7a [ieee1394] [] ether1394_mac_addr+0x0/0x12 [eth1394] [] __nodemgr_remove_host_dev+0x8/0xb [ieee1394] [] device_for_each_child+0x1a/0x3c [] nodemgr_remove_host+0x30/0x90 [ieee1394] [] __unregister_host+0x1a/0xac [ieee1394] [] flush_cpu_workqueue+0x98/0xb7 [] highlevel_remove_host+0x21/0x42 [ieee1394] [] hpsb_remove_host+0x37/0x58 [ieee1394] [] ohci1394_pci_remove+0x47/0x1ec [ohci1394] [] sysfs_hash_and_remove+0xfa/0x111 [] pci_device_remove+0x16/0x35 [] __device_release_driver+0x6e/0x8b [] driver_detach+0x99/0xda [] bus_remove_driver+0x57/0x75 [] driver_unregister+0x8/0x13 [] pci_unregister_driver+0xc/0x67 [] sys_delete_module+0x15c/0x19d [] remove_vma+0x31/0x36 [] do_munmap+0x19b/0x1b4 [] sysenter_past_esp+0x5f/0x85 [] packet_notifier+0xf3/0x157 === 
Code: ff c3 85 c0 74 08 83 c0 08 e9 83 6d f6 ff b8 ea ff ff ff c3 85 c0 74 08 83 c0 08 e9 4c 51 f6 ff c3 57 89 c7 56 53 8b 70 44 31 db <83> be 94 00 00 00 00 75 09 eb 17 89 f8 e8 d7 ff ff ff 89 da 83
EIP: [] class_device_remove_attrs+0xa/0x30 SS:ESP 0068:d93e7e1c

Checking Google I see a similar oops was reported long ago: http://lkml.org/lkml/2006/11/16/147 . Any ideas/patches to test? Please CC me in your replies.

Thanks.

--
Happiness in intelligent people is the rarest thing I know. (Ernest Hemingway)
Ismail Donmez ismail (at) pardus.org.tr
GPG Fingerprint: 7ACD 5836 7827 5598 D721 DF0D 1A9D 257A 5B88 F54C
Pardus Linux / KDE developer
Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
Rusty Russell wrote:
> This is called "pissing in the corners". Don't do it: we don't need to
> touch that code and I actually prefer the original anyway (explicit is
> *good*).
>
> The habit of extracting cpu number once then using it is an optimization
> which we should be aiming to get rid of (it simply hurts archs with
> efficient per-cpu implementations).

No, that was for a reason. I was worried about smp_processor_id() not returning valid values between init_gdt and cpu_set_gdt. It's not actually a problem, but relying on smp_processor_id() while we're moving the foundations it's based on seems fragile.

J
Re: Stolen and degraded time and schedulers
Dan Hecht wrote:
> With your previous definition of work time, would it be that:
>
> monotonic_time == work_time + stolen_time ??

(By monotonic time, I presume you mean monotonic real time.) Yes, I suppose you could, but I don't think that's terribly useful. I think work_time is probably most naturally measured in cpu clock cycles rather than an actual time unit. You could convert it to ns, but I don't see the point.

I know it's a term in general use, but I don't think the term "stolen time" is all that useful, particularly when we're talking about a more general notion of cpu work contributing to the progress of process execution. In the cpufreq case, time isn't "stolen" per se.

(I guess I don't like the term stolen time because you don't refer to time spent on other processes as being stolen from your process: it's just processor time being distributed.)

> i.e. would you be defining stolen_time to include the time lost to
> processes due to the cpu running at a lower frequency? How does this
> play into the other potential users, besides sched_clock(), of stolen
> time? We should make sure that the abstraction introduced here makes
> sense in those places too.

Be specific. What other uses are there?

> For example, the stuff that happens in update_process_times(). I
> think we'd want to account the stolen time to cpustat->steal.

I guess we could do something for that. Would we account non-full-speed cpus to it? Maybe? How is cpustat->steal used? How does it get out to usermode?

> Also we'd probably want to account for stolen time with regards to
> task_running_tick(). (Though, in the latter case, maybe we first have
> to move the scheduler away from assuming HZ rate decrementing of
> p->time_slice to get this right. i.e. remove the tick based assumption
> from the scheduler, and then maybe stolen time falls in more naturally
> when accounting time slices).

I think the important part is that sched_clock() be used to actually compute how much time each process gets. The fact that a time quantum gets stolen is less important. Or do you mean something else?

> I guess taking your cpufreq as an example of work_time progressing
> slower than monotonic_time (and assuming that the remaining time is
> what you would call stolen), then e.g. top would report 50% of your
> cpu stolen when your cpu is running at 1/2 max rate.

Yes. In the same way that clock modulation gates the cpu clock, the hypervisor effectively gates the clock by giving time to other vcpus.

> And p->time_slice would decrement at 1/2 the rate it normally did when
> running at 1/2 speed. Is this the right thing to do? If so, then I
> agree it makes sense to model hypervisor stolen time in terms of your
> "work time".

Yes, that's my thought.

> But, if not, then maybe the amount of work you can get done during a
> period of time that is not stolen and the stolen time itself are
> really two different notions, and shouldn't be confused. I can see
> arguments both ways.

It seems to me like a nice opportunity to solve two problems with one mechanism.

J
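The relation discussed in this exchange, monotonic_time == work_time + stolen_time, can be sketched in a few lines of Python. This is only an illustrative model, not kernel code: the helper names and the 2 GHz maximum frequency are assumptions, and work time is taken to be the cycles actually executed, converted to nanoseconds at full speed.

```python
NSEC_PER_SEC = 1_000_000_000


def work_time_ns(cycles: int, max_hz: int) -> int:
    """Executed cycles expressed as nanoseconds of full-speed execution."""
    return cycles * NSEC_PER_SEC // max_hz


def stolen_time_ns(monotonic_ns: int, cycles: int, max_hz: int) -> int:
    """Whatever part of the monotonic interval was not work time."""
    return monotonic_ns - work_time_ns(cycles, max_hz)


def stolen_percent(monotonic_ns: int, cycles: int, max_hz: int) -> int:
    """Share of the interval that a tool like top would show as stolen."""
    return 100 * stolen_time_ns(monotonic_ns, cycles, max_hz) // monotonic_ns


if __name__ == "__main__":
    max_hz = 2_000_000_000  # hypothetical 2 GHz part
    # One second of wall-clock time at half the maximum frequency.
    half_speed_cycles = 1_000_000_000
    print(stolen_percent(NSEC_PER_SEC, half_speed_cycles, max_hz), "% stolen")
```

Under this model a cpu clocked at half its maximum rate and a vcpu that only gets scheduled half of the time execute the same number of cycles per second, so both show the same stolen share. That equivalence is what the thread is debating.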
Re: 2.6.21-rc3-mm2 (oops in move_freepages)
FYI, I'm seeing the following oops with 2.6.21-rc3-mm1 (and -mm2) on the HP rx2600 and an Intel Tiger (both ia64 boxes). I haven't investigated this other than to determine that it does not occur with 2.6.21-rc3 or 2.6.20-rc3-mm1, and the instruction at move_freepages+0x10 is a load of the value pointed to by the third argument (end_page). Linux version 2.6.21-rc3-mm1 ([EMAIL PROTECTED]) (gcc version 4.0.3 (Debian 4.0.3-1)) #2 SMP Tue Mar 13 16:16:22 MST 2007 ... mptbase: Initiating ioc0 bringup ioc0: 53C1030: Capabilities={Initiator,Target} scsi0 : ioc0: LSI53C1030, FwRev=01032300h, Ports=1, MaxQ=255, IRQ=53 scsi 0:0:0:0: Direct-Access HP 36.4G ST336706LC HP04 PQ: 0 ANSI: 2 target0:0:0: Beginning Domain Validation target0:0:0: Ending Domain Validation target0:0:0: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 63) Unable to handle kernel paging request at virtual address a0007fffc758 swapper[1]: Oops 8813272891392 [1] Modules linked in: Pid: 1, CPU 1, comm: swapper psr : 1010085a2010 ifs : 840b ip : []Not tainted ip is at move_freepages+0x10/0x340 unat: pfs : 030a rsc : 0003 rnat: bsps: pr : a581 ldrs: ccv : fpsr: 0009804c8a74433f csd : ssd : b0 : a001000fea10 b6 : a001005ad980 b7 : a001bb20 f6 : 1003e f7 : 1003ed37a6f4de9bd37a7 f8 : 1003e f9 : 1003e000194a0cb8b f10 : 1003e70658ddf530a940d f11 : 1003e r1 : a00100dfe280 r2 : r3 : r8 : r9 : 4000 r10 : r11 : 0002 r12 : e0406004fc10 r13 : e04060048000 r14 : 0001 r15 : 0001c000 r16 : 0004 r17 : 0400 r18 : a0007fffc720 r19 : 4000 r20 : 4000 r21 : 2fff4000 r22 : d000 r23 : 5fff4000 r24 : 2fffa000 r25 : 6318 r26 : 0318c000 r27 : 00c5c000 r28 : 000c4000 r29 : 0008 r30 : 0003fc00 r31 : 000e Call Trace: [] show_stack+0x40/0xa0 sp=e0406004f7c0 bsp=e040600496f8 [] show_regs+0x880/0x8a0 sp=e0406004f990 bsp=e040600496a0 [] die+0x1c0/0x2c0 sp=e0406004f990 bsp=e04060049658 [] ia64_do_page_fault+0x820/0x9c0 sp=e0406004f9b0 bsp=e04060049608 [] ia64_leave_kernel+0x0/0x270 sp=e0406004fa40 bsp=e04060049608 [] 
move_freepages+0x10/0x340 sp=e0406004fc10 bsp=e040600495a8 [] move_freepages_block+0x110/0x140 sp=e0406004fc10 bsp=e04060049578 [] __rmqueue+0x4e0/0x7e0 sp=e0406004fc10 bsp=e04060049518 [] rmqueue_bulk+0x50/0x120 sp=e0406004fc10 bsp=e040600494d0 [] get_page_from_freelist+0x460/0xd40 sp=e0406004fc10 bsp=e04060049420 [] __alloc_pages+0xa0/0x580 sp=e0406004fc10 bsp=e040600493a8 [] kmem_getpages+0x150/0x3a0 sp=e0406004fc20 bsp=e04060049370 [] cache_grow+0x1e0/0x640 sp=e0406004fc30 bsp=e04060049308 [] cache_alloc_refill+0x490/0x580 sp=e0406004fc30 bsp=e040600492a0 [] kmem_cache_alloc+0x120/0x1e0 sp=e0406004fc30 bsp=e04060049270 [] sd_revalidate_disk+0x90/0x1c20 sp=e0406004fc30 bsp=e040600491f0 [] sd_probe+0x6c0/0x7c0 sp=e0406004fc70 bsp=e04060049198 [] driver_probe_device+0x230/0x360 sp=e0406004fc80 bsp=e04060049160 [] __device_attach+0x30/0x60 sp=e0406004fc80 bsp=e04060049138 [] bus_for_each_drv+0x80/0x120 sp=e0406004fc80 bsp=e04060049100 [] device_attach+0x190/0x200 sp=e0406004fca0 bsp=e040600490c8 [] bus_attach_device+0x80/0x160 sp=e0406004fca0 bsp=e04060049090 [] device_add+0x940/0xf60 sp=e0406004fca0 bsp=e04060049028 [] scsi_sysfs_add_sdev+0x60/0x520 sp=e0406004fca0 bsp=e04060048fd8 [] scsi_probe_and_add_lun+0x1000/0x1200 sp=e0406004fca0 bsp=e04060048f68 [] __scsi_scan_target+0x150/0xae0 sp=e0406004fcd0 bsp=e04060048f10 [] scsi_scan_channel+0x60/0xe0
Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
Eric W. Biederman wrote: Nick Piggin <[EMAIL PROTECTED]> writes: Eric W. Biederman wrote: First touch page ownership does not guarantee give me anything useful for knowing if I can run my application or not. Because of page sharing my application might run inside the rss limit only because I got lucky and happened to share a lot of pages with another running application. If the next I run and it isn't running my application will fail. That is ridiculous. Let's be practical here, what you're asking is basically impossible. Unless by deterministic you mean that it never enters the a non trivial syscall, in which case, you just want to know about maximum RSS of the process, which we already account). Not per process I want this on a group of processes, and yes that is all I want just. I just want accounting of the maximum RSS of a group of processes and then the mechanism to limit that maximum rss. Well don't you just sum up the maximum for each process? Or do you want to only count shared pages inside a container once, or something difficult like that? I don't want sharing between vservers/VE/containers to affect how many pages I can have mapped into my processes at once. You seem to want total isolation. You could use virtualization? No. I don't want the meaning of my rss limit to be affected by what other processes are doing. We have constraints of how many resources the box actually has. But I don't want accounting so sloppy that processes outside my group of processes can artificially lower my rss value, which magically raises my rss limit. So what are you going to do about all the shared caches and slabs inside the kernel? It is basically handwaving anyway. The only approach I've seen with a sane (not perfect, but good) way of accounting memory use is this one. If you care to define "proper", then we could discuss that. I will agree that this patchset is probably in the right general ballpark. 
But the fact that pages are assigned exactly one owner is pure non-sense. We can do better. That is all I am asking for someone to at least attempt to actually account for the rss of a group of processes and get the numbers right when we have shared pages, between different groups of processes. We have the data structures to support this with rmap. Well rmap only supports mapped, userspace pages. Let me describe the situation where I think the accounting in the patchset goes totally wonky. Gcc as I recall maps the pages it is compiling with mmap. If in a single kernel tree I do: make -jN O=../compile1 & make -jN O=../compile2 & But set it up so that the two compiles are in different rss groups. If I run the concurrently they will use the same files at the same time and most likely because of the first touch rss limit rule even if I have a draconian rss limit the compiles will both be able to complete and finish. However if I run either of them alone if I use the most draconian rss limit I can that allows both compiles to finish I won't be able to compile a single kernel tree. Yeah it is not perfect. Fortunately, there is no perfect solution, so we don't have to be too upset about that. And strangely, this example does not go outside the parameters of what you asked for AFAIKS. In the worst case of one container getting _all_ the shared pages, they will still remain inside their maximum rss limit. So they might get penalised a bit on reclaim, but maximum rss limits will work fine, and you can (almost) guarantee X amount of memory for a given container, and it will _work_. But I also take back my comments about this being the only design I have seen that gets everything, because the node-per-container idea is a really good one on the surface. And it could mean even less impact on the core VM than this patch. That is also a first-touch scheme. However the messed up accounting that doesn't handle sharing between groups of processes properly really bugs me. 
Especially when we have the infrastructure to do it right. Does that make more sense? I think it is simplistic. Sure you could probably use some of the rmap stuff to account shared mapped _user_ pages once for each container that touches them. And this patchset isn't preventing that. But how do you account kernel allocations? How do you account unmapped pagecache? What's the big deal so many accounting people have with just RSS? I'm not a container person, this is an honest question. Because from my POV if you conveniently ignore everything else... you may as well just not do any accounting at all. -- SUSE Labs, Novell Inc.
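To make the two accounting schemes in this thread concrete, here is a toy Python model. It is purely illustrative: the function names and page labels are invented, and real kernel accounting is far more involved. It contrasts first-touch ownership, where each page is charged only to the first group that touches it, with charging every group for each page it maps, using the two-compile example from the message.

```python
from typing import Dict, Hashable, List, Tuple

Touch = Tuple[str, Hashable]  # (group name, page identifier), in time order


def first_touch_rss(touches: List[Touch]) -> Dict[str, int]:
    """Charge each page to the first group that touched it."""
    owner: Dict[Hashable, str] = {}
    charge: Dict[str, int] = {}
    for group, page in touches:
        charge.setdefault(group, 0)
        if page not in owner:
            owner[page] = group
            charge[group] += 1
    return charge


def per_group_rss(touches: List[Touch]) -> Dict[str, int]:
    """Charge every group once for each distinct page it maps."""
    mapped: Dict[str, set] = {}
    for group, page in touches:
        mapped.setdefault(group, set()).add(page)
    return {group: len(pages) for group, pages in mapped.items()}


if __name__ == "__main__":
    # Two compiles in different groups read the same 100 source pages
    # and each write 10 private pages.
    touches: List[Touch] = []
    touches += [("compile1", ("src", i)) for i in range(100)]
    touches += [("compile1", ("obj1", i)) for i in range(10)]
    touches += [("compile2", ("src", i)) for i in range(100)]
    touches += [("compile2", ("obj2", i)) for i in range(10)]
    print("first touch:", first_touch_rss(touches))
    print("per group:  ", per_group_rss(touches))
```

In this toy run first-touch accounting charges compile1 for 110 pages and compile2 for only 10, while per-group accounting charges each for 110. That gap is the "got lucky" effect described above: under first touch, the charge to one group depends on what another group happened to touch first.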
Re: New thread RDSL, post-2.6.20 kernels and amanda (tar) miss-fires
On Tuesday 13 March 2007, Gene Heskett wrote: >On Tuesday 13 March 2007, Gene Heskett wrote: >>Greetings; >>Someone suggested a fresh thread for this. >> >>I now have my scripts more or less under control, and I can report that >>kernel-2.6.20.1 with no other patches does not exhibit the undesirable >>behaviour where tar thinks its all new, even when told to do a level 2 >> on a directory tree that hasn't been touched in months to update >> anything. >> >>Next up, 2.6.20.2, plain and with the latest RDSL-0.30 patch. > >And amanda/tar worked normally for 2.6.20.2 plain. > >Next up, 2.6.21-rc1 if it will build here. It built, it booted, and its busted big time. First, with an amdump running in the background, the machine is so close to unusable that I considered rebooting, but I needed the data to show the problem. I am losing the keyboard and mouse for a minute or more at a time but the keystrokes seem to be being registered so it eventually catches up. Disk i/o seems to be the killer according to gkrellm. But to give one an idea of the fits this is giving tar, I'll snip a line or 2 from an amstatus report here: coyote:/GenesAmandaHelper-0.6 1 planner: [dumps way too big, 138200 KB, must skip incremental dumps] Huh? 138.2GB? A 'du -h .' in that dir says 766megs. coyote:/root 1 4426m wait for dumping du -h says 5.0GB so that's ballpark, but its also a level 1, so maybe 20 megs is actually new since 15:57 this afternoon local. kmails final maildir is in that dir. This goes on for much of the amstatus report, very few of the reported sizes are close to sane. Now, can someone suggest a patch I can revert that might fix this? The total number of patches between 2.6.20 and 2.6.21-rc1 will have me building kernels to bisect this till the middle of June at this rate. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Is a tattoo real, like a curb or a battleship? 
Or are we suffering in Safeway?
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
[EMAIL PROTECTED] wrote:
> On Mon, 12 Mar 2007 17:38:38 BST, Kasper Sandberg said:
>> with latest xorg, xlib will be using xcb internally,
>
> Out of curiosity, when is this "latest" Xorg going to escape to distros,

Already is .. Xorg 7.2+ libx11 build with xcb enabled..

> and is it far enough along that beta testers can gather usable numbers?
Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)
> Yes, the code could be reworked by moving some of the data from the CPU > hw-breakpoint info into the thread's info. I'll see how much simpler it > ends up being. I don't quite understand that characterization of the kind of change I'm advocating. If the common case path in context switch has really anything at all more than the example I gave, something is wrong. > It isn't quite that easy. Even though the number of user breakpoints may > not have changed, their identities may have. So the unlikely case has to > encompass two possibilities: the number of installable user breakpoints > has changed, or any user breakpoints have been registered or unregistered. Why does it matter? When a new user breakpoint was made the highest-priority one, it ought to update tdr[0..3] right then before the registration call returns. It seems fine to me for it to make an uninstalled callback right away rather than at the thread's next switch-in. But even if you wanted to delay it, you could just set active_dr7 to zero or something so that the unlikely case triggers. > > For the masks to work as I described, you need to use the same enable bit > > (or both) for kernel and user allocations. It really doesn't matter which > > one you use, since all of Linux is "local" for the sense of the dr7 enable > > bits (i.e. you should just use DR_GLOBAL_ENABLE). > > This shouldn't be necessary. So long as DR_GLOBAL_ENABLE always belongs > to the kernel's part of DR7 and DR_LOCAL_ENABLE always belongs to the > thread's part there will be no interference between them. The plan I suggested relies on setting want_dr7 with the enable bits that do include the ones the kernel uses (for contested slots). Of course it works as well to use either bit for this, as long as you're consistent. But as I've said at least twice already, there is no actual meaning whatsoever to choosing one enable bit over the other. 
It's just confusing and misleading to have the code make special efforts to set one rather than the other for different cases. You talk about them as if they meant something, which keeps making me wonder if you're confused. Since the hardware doesn't care which bit you set, you could overload them to record a bit and a half of information there if really wanted to, but you're not even doing that, unless I'm confused. > Maybe. I always had in the back of my mind the possibility that there > might be a user I/O breakpoint set. It could be triggered by an interrupt > handler even in the SIGKILL case. But since we're not supporting I/O > breakpoints now, that's a moot point. How would that happen? This would mean that some user process has been allowed to enable ioperm for some io port that kernel drivers also send to from interrupt handlers. Can that ever happen? > Actually the code _doesn't_ already know what's there; the chbi area > doesn't include any storage for the kernel DR7 value. I figured it was at > least as easy to read it from the CPU register as to read it from memory. > But maybe that's not true; according to my ancient processor manual, moves > to/from debug registers take many more clock cycles than moves to/from > memory. The purpose of the chbi area is to optimize this path. Make it store whatever precomputed values are most convenient for the hot paths. This path doesn't need num_kbps, it needs kdr7. So precompute that and do that one load, instead of a load of chbi->num_bkps we don't otherwise need plus a load from kdr7_masks that can be avoided altogether on hot paths. I don't really know about the slowness of reading debug registers, though I would guess it is slower than most common operations. But regardless, you can avoid it because kdr7 is something you need anyway, so you're not replacing it with a load but letting a load you already had kill two birds. > No. 
If a debugger has removed some user breakpoints since the last time > the thread ran, the chbi->bps[] entries could still be present. Likewise > if the previously-running task had more breakpoints than the current one. I don't really get why user breakpoints would be in chbi->bps at all. When a debug trap hits, you can check kdr7 or whatnot to see if it was a kernel allocation, and otherwise look in current->thbi->bps to find it. > I don't like using DR_LEN_1, because it would force asm/debugreg.h to be > #included by any user of hw_breakpoint. The raw numerical value should do > just as well. Agreed. (I just used DR_LEN_1 as shorthand and was not hot on including asm/debugreg.h in asm/hw_breakpoint.h in the actual version.) > > On powerpc, the address breakpoint is always for an 8-byte address range. > > So there's no way to trap on accesses to a particular byte within a > string? There's no way to tell which of the 8 bytes were accessed, AFAIK. It's the same as LEN8 on x86_64 or LEN[42] on i386: some byte in there was accessed. > Better yet, if type is HW_BREAKPOINT_TYPE_EXECUTE then just ignore the > caller's
Re: [RFC] [Patch 1/1] IBAC Patch
On Thu, Mar 08, 2007 at 05:58:16PM -0500, Mimi Zohar wrote: > This is a request for comments for a new Integrity Based Access > Control(IBAC) LSM module which bases access control decisions > on the new integrity framework services. Thanks Mimi, nice to see an example of how the integrity framework ought to be used. > (Hopefully this will help clarify the interaction between an LSM > module and LIM module.) Is this module intended to clarify an interface, or be useful in and of itself? > Index: linux-2.6.21-rc3-mm2/security/ibac/Makefile > === > --- /dev/null > +++ linux-2.6.21-rc3-mm2/security/ibac/Makefile > @@ -0,0 +1,6 @@ > +# > +# Makefile for building IBAC > +# > + > +obj-$(CONFIG_SECURITY_IBAC) += ibac.o > +ibac-y := ibac_main.o > Index: linux-2.6.21-rc3-mm2/security/ibac/ibac_main.c > === > --- /dev/null > +++ linux-2.6.21-rc3-mm2/security/ibac/ibac_main.c > @@ -0,0 +1,126 @@ > +/* > + * Integrity Based Access Control (IBAC) > + * > + * Copyright (C) 2007 IBM Corporation > + * Author: Mimi Zohar <[EMAIL PROTECTED]> > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, version 2 of the License. > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#ifdef CONFIG_SECURITY_IBAC_BOOTPARAM > +int ibac_enabled = CONFIG_SECURITY_IBAC_BOOTPARAM_VALUE; > + > +static int __init ibac_enabled_setup(char *str) > +{ > + ibac_enabled = simple_strtol(str, NULL, 0); > + return 1; > +} > + > +__setup("ibac=", ibac_enabled_setup); > +#else > +int ibac_enabled = 0; > +#endif If the command line option isn't enabled, how will ibac_enabled ever be set to '1'? Have I overlooked or forgotten some helper routine elsewhere? 
> +static unsigned int integrity_enforce = 0; > +static int __init integrity_enforce_setup(char *str) > +{ > + integrity_enforce = simple_strtol(str, NULL, 0); > + return 1; > +} > + > +__setup("ibac_enforce=", integrity_enforce_setup); > + > +#define XATTR_NAME "security.evm.hash" Is this name unique to this IBAC module? Or should it be kept in sync with the integrity framework? > +static inline int is_kernel_thread(struct task_struct *tsk) > +{ > + return (!tsk->mm) ? 1 : 0; > +} > + > +static int ibac_bprm_check_security(struct linux_binprm *bprm) > +{ > + struct dentry *dentry = bprm->file->f_dentry; > + int xattr_len; > + char *xattr_value = NULL; > + int rc, status; > + > + rc = integrity_verify_metadata(dentry, XATTR_NAME, > +&xattr_value, &xattr_len, &status); > + if (rc < 0 && rc == -EOPNOTSUPP) { > + kfree(xattr_value); > + return 0; > + } > + > + if (rc < 0) { > + printk(KERN_INFO "verify_metadata %s failed " > +"(rc: %d - status: %d)\n", bprm->filename, rc, status); > + if (!integrity_enforce) > + rc = 0; > + goto out; > + } > + if (status != INTEGRITY_PASS) { /* FAIL | NO_LABEL */ > + if (!is_kernel_thread(current)) { Please remind me why kernel threads are exempt? > + printk(KERN_INFO "verify_metadata %s " > +"(Integrity status: FAIL)\n", bprm->filename); Integrity status may be FAIL or NO_LABEL at this point -- would it be more useful to report the whole truth? > + if (integrity_enforce) { > + rc = -EACCES; > + goto out; > + } > + } > + } > + > + rc = integrity_verify_data(dentry, &status); > + if (rc < 0) { > + printk(KERN_INFO "%s verify_data failed " > +"(rc: %d - status: %d)\n", bprm->filename, rc, status); > + if (!integrity_enforce) > + rc = 0; > + goto out; > + } > + if (status != INTEGRITY_PASS) { > + if (!is_kernel_thread(current)) { Please remind me why kernel threads are exempt? > + printk(KERN_INFO "verify_data %s " > +"(Integrity status: FAIL)\n", bprm->filename); Same question about FAIL vs NO_LABEL.. 
(Would NO_LABEL be caught by a failing verify_metadata above?) > + if (integrity_enforce) { > + rc = -EACCES; > + goto out; > + } > + } > + } > + > + kfree(xattr_value); > + > + /* measure all integrity level executables */ > + integrity_measure(dentry, bprm->filename, MAY_EXEC); > + return 0; If integrity_measure() fails (can it fail?) is allowing the exec still the right approach? (I seem to recall that "measuring
Re: [PATCH 8/8] Convert PDA into the percpu section
On Tue, 2007-03-13 at 10:15 -0700, Jeremy Fitzhardinge wrote: > Rusty Russell wrote: > > + pack_descriptor((u32 *)&gdt[GDT_ENTRY_PERCPU].a, > > + (u32 *)&gdt[GDT_ENTRY_PERCPU].b, > > + __per_cpu_offset[cpu], 0xF, > > 0x80 | DESCTYPE_S | 0x2, 0); /* present read-write data > > segment */ > > > > Why testing with qemu is not enough. Indeed 8(. Thanks! Rusty.
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mon, 12 Mar 2007 17:38:38 BST, Kasper Sandberg said: > with latest xorg, xlib will be using xcb internally, Out of curiosity, when is this "latest" Xorg going to escape to distros, and is it far enough along that beta testers can gather usable numbers?
Re: [PATCH 0/8] x86 boot, pda and gdt cleanups
On Tue, 2007-03-13 at 13:48 -0700, Jeremy Fitzhardinge wrote: > Rusty Russell wrote: > > Hi all, > > > > The GDT stuff on x86 is a little more complex than it need be, but > > playing with boot code is always dangerous. These compile and boot on > > UP and SMP for me, but Andrew should let them cook in -mm for a while. > > > Hi Rusty, > > This is my rough hacking patch I needed to get things into a Xen-shape > state. Looks good. Just one thing: > void __devinit native_smp_prepare_boot_cpu(void) > { > - cpu_set(smp_processor_id(), cpu_online_map); > - cpu_set(smp_processor_id(), cpu_callout_map); > - cpu_set(smp_processor_id(), cpu_present_map); > - cpu_set(smp_processor_id(), cpu_possible_map); > - per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; > + int cpu = smp_processor_id(); > + > + cpu_set(cpu, cpu_online_map); > + cpu_set(cpu, cpu_callout_map); > + cpu_set(cpu, cpu_present_map); > + cpu_set(cpu, cpu_possible_map); > + per_cpu(cpu_state, cpu) = CPU_ONLINE; > > /* Set up %fs to point to our per-CPU area now it's allocated */ > - init_gdt(smp_processor_id(), &init_task); > - cpu_set_gdt(smp_processor_id()); > + init_gdt(cpu, &init_task); > + cpu_set_gdt(cpu); > } This is called "pissing in the corners". Don't do it: we don't need to touch that code and I actually prefer the original anyway (explicit is *good*). The habit of extracting the cpu number once and then reusing it is an optimization which we should be aiming to get rid of (it simply hurts archs with efficient per-cpu implementations). Cheers, Rusty.
Re: irda rmmod lockdep trace.
From: Samuel Ortiz <[EMAIL PROTECTED]> Date: Wed, 14 Mar 2007 02:50:03 +0200 > On Mon, Mar 12, 2007 at 04:49:21PM -0700, David Miller wrote: > > I would strongly caution against adding any run-time overhead just to > > cure a false lockdep warning. Even adding a new function argument > > is too much IMHO. > > > > Make the cost show up for lockdep only, perhaps by putting each > > hashbin lock into a separate locking class? > Does that look better to you: Yes, it does. :)
Re: _proxy_pda still makes linking modules fail
On Tue, 2007-03-13 at 16:57 +0100, Andi Kleen wrote: > On Tue, Mar 13, 2007 at 05:23:52PM +1100, Rusty Russell wrote: > > In particular, it's been put in GCC 4.1 for > > CONFIG_CC_STACKPROTECTOR, which assumes %gs:40 will give the stack > > canary. > > Yes that was always ugly, but I don't know a better way. Well, "%gs:__gcc_stack_protector" would have been better. We could have defined __gcc_stack_protector as an absolute symbol (0x40) at the moment, and made it a real per-cpu var later. > > For the record: the PDA should never have existed, that's what percpu > > vars were supposed to be for. Something went wrong here 8( > > PDA predates per cpu. Indeed, but I should have converted it over back in 2003 (?) when the per-cpu stuff went in 8( > > The ideal solution has always been to use __thread, but no architecture > > has yet managed it (I tried for i386, and it quickly caused unbearable > > I tried it too, but __thread is hopeless for kernel code > > > pain). On x86-64 that uses "%fs" on x86-64, not "%gs" as the kernel > > does, but I might try that if I feel particularly masochistic soon... > > Then swapgs wouldn't work anymore (there is no swapfs) Good point. Thanks, Rusty.
Re: Stolen and degraded time and schedulers
On Tue, 2007-03-13 at 14:59 -0700, Jeremy Fitzhardinge wrote: > Daniel Walker wrote: > > The frequency tracking you mention is done to some extent inside the > > timekeeping adjustment functions, but I'm not sure it's totally accurate > > for non-timekeeping, and it also tracks things like interrupt latency. > > Tracking frequency changes where it's important to get it right > > shouldn't be done I think .. > > > > If you want accurate time accounting, don't use the TSC . > > > > I'm not sure I follow you here. Clocksources have the means to adjust > the rate of time progression, mostly to warp the time for things like > ntp. The stability or otherwise of the tsc is irrelevant. The adjustments that I spoke of above work regardless of ntp .. The stability of the TSC directly affects the clock mult adjustments in timekeeping, as does interrupt latency, since the clock is essentially validated against the timer interrupt. > If you had a clocksource which was explicitly using the rate at which a > CPU does work as a timebase, then using the same warping mechanism would > allow you to model CPU speed changes. Like I said, there are other factors, so that's not going to exactly model cpu speed changes. You could come up with another method, but that would likely require another known constant clock. > > The sched_clock interface is basically a stripped down clocksource.. > > I've implemented sched_clock as a clocksource in the past .. > > > > Yes, that works. But a clocksource is strictly about measuring the > progression of real time, and so doesn't generally measure how much work > a CPU has done. sched_clock doesn't measure amounts of cpu work either, it's all about timing. > >> We currently have a sched_clock interface in paravirt_ops to deal with > >> the hypervisor aspect. It only occurred to me this morning that cpufreq > >> presents exactly the same problem to the rest of the kernel, and so > >> there's room for a more general solution. 
> >> > > Are there other architectures which have this per-cpu clock frequency > > changing issue? I worked with several other architectures beyond just > > x86 and haven't seen this issue .. > > Well, lots of cpus have dynamic frequencies. Any scheduler which > maintains history will suffer the same problem, even on UP. If > processes A and B are supposed to have the same priority and they both > execute for 1ms of real time, did they make the same amount of > progress? Not if the cpu changed speed in between. That's true, but given a constant clock (like what sched_clock should have) then the accounting is similarly inaccurate. Any connection between the scheduler and the TSC frequency changes isn't part of the design AFAIK .. > And any system which commonly runs virtualized (s390, power, etc) will > need to deal with the notion of stolen time. I haven't followed the "stolen time" discussion, but from just a brief look at your first email I'd say don't mess with the clocks .. The clocks should always reflect the time accurately .. That's the point of the clocks, and when the TSC, or any other clock, changes frequency it sucks.. I haven't thought it through completely, but you might be able to solve the issue by adding a value to each jiffy in the scheduler or altering the scheduler to extend the number of jiffies a task gets depending on the virtual speed of the cpu.. Daniel
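For readers unfamiliar with the "clock mult" mentioned in this exchange: a clocksource converts raw counter cycles to nanoseconds with a multiply and a shift. What follows is a minimal userspace sketch of that arithmetic, not the kernel's code. The helper names hz_to_mult and cyc_to_ns and the shift value of 22 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: pick mult so that (cycles * mult) >> shift
 * approximates nanoseconds for a counter ticking at hz.
 * The result only fits in 32 bits for reasonably fast counters.
 */
uint32_t hz_to_mult(uint64_t hz, unsigned int shift)
{
	assert(hz != 0 && shift < 32);
	return (uint32_t)((NSEC_PER_SEC << shift) / hz);
}

/* Hypothetical helper: convert a cycle delta to nanoseconds. */
uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

With these definitions, 1000 cycles of a 1 GHz counter convert to 1000 ns, and 1000 cycles of a 500 MHz counter convert to 2000 ns. If the counter silently drops from 1 GHz to 500 MHz while the 1 GHz mult is still in use, those same 1000 cycles are reported as 1000 ns although 2000 ns have passed, which is the kind of error a frequency-changing TSC introduces.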
Re: Suspend to RAM fault in VT when resuming
Pavel Machek wrote: > Hi! > >> I've chased one of the 'Suspend to RAM' resume problems to a specific >> line in drivers/char/vt.c, see attached 2.6.21-rc3 diff with > > Has suspend/resume ever worked on that hardware? > >> TRACE_RESUME() instrumentation. The macro scr_writew resolves to '*addr >> = val', which appears to be causing the problem. I've verified that the >> pointer is not NULL, but don't know if it's really valid. It's pretty >> tough to tell what is happening, but on a Dell XPS it just hangs. A Dell >> Precision blinks the keyboard lights. > > It is possible that video is not initialized at that point, and that > hardware goes seriously unhappy when you access non-existing vga. Does > it resume ok when you completely disable video support? > > > Pavel > > Resume works on the Dell XPS with the Ubuntu Edgy release Ubuntu-2.6.17-10.25 (2.6.17 plus a zillion fixes). Ubuntu's git tree is rsync://rsync.kernel.org/pub/scm/linux/kernel/git/bcollins/ubuntu-2.6.git. Ubuntu-2.6.17-8.21 is the first version where resume works, but there are a boatload of changes in ACPI and SW suspend between that and the previous tag Ubuntu-2.6.17-7.20. I don't know what made SW suspend work, but even knowing that won't tell me what broke it again. I've been avoiding the bisect process because it is quite time consuming on my slow machine, there is much branch weirdness, and I'm not that good with git. I thought if I narrowed the failure down to a small chunk of code in 2.6.21-rc3, then the answer might be obvious. No such luck, huh? This crash behaves like the video memory space has become unmapped. Is that possible? rtg -- Tim Gardner [EMAIL PROTECTED]
Re: Linux 2.6.20.3
Many changes since 2.6.20.3-rc1? On 3/14/07, Greg KH <[EMAIL PROTECTED]> wrote: > We (the -stable team) are announcing the release of the 2.6.20.3 kernel. It contains a number of bugfixes and all 2.6.20 users are recommended to upgrade. The diffstat and short summary of the fixes are below.
Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD
On Tue, 13 Mar 2007, Roland McGrath wrote: > > Ok, fine. But PATH_MAX is a real constant that has some meaning in the > kernel. It's perfectly correct to use PATH_MAX as a constant on a system > like Linux that defines it and means what it says. Conversely, OPEN_MAX > has no useful relationship with anything the kernel is doing at all. Sure. I'm just saying that some people may use OPEN_MAX the way I know people use PATH_MAX - whether it's what you're supposed to or not. I do agree that PATH_MAX is much more appropriate to be used that way, and is more likely to have "real" meaning, I just worry. Linus
Re: [QUICKLIST 0/4] Arch independent quicklists V2
On Tue, Mar 13, 2007 at 04:47:56AM -0800, Andrew Morton wrote: > I'm trying to remember why we ever would have needed to zero out the > pagetable pages if we're taking down the whole mm? Maybe it's > because "oh, the arch wants to put this page into a quicklist to > recycle it", which is all rather circular. > It would be interesting to look at a) leave the page full of random > garbage if we're releasing the whole mm and b) return it straight to > the page allocator. We never did need to modify ptes on exit() or other pagetable prunings (not that they were ever done outside exit() before 2.6.x). The only subtlety is that pruning on munmap() needs a TLB flush for the TLB itself to drop the references to the pages referred to by the PTE's on pruning in the presence of hardware pagetable walkers (in the exit() case there are no user execution contexts left to potentially utilize the dead translations so it's less important). That's handled by tlb_remove_page() and shouldn't need any updates across such a change. I believe the zeroing on teardown was largely a result of idiom vs. any particular need. Essentially using ptep_get_and_clear() to handle the non-pruning munmap() case in a manner unified with other pagetable teardowns. Also likely is 2.4.x legacy from when that and possibly earlier kernels maintained arch-private quicklists for pagetables. There are furthermore distinctions to make between fork() and execve(). fork() stomps over the entire process address space copying pagetables en masse. After execve() a process incrementally faults in PTE's one at a time. It should be clear that if case analyses are of interest at all, fork() will want cache-hot pages (cache-preloaded pages?) where such are largely wasted on incremental faults after execve(). The copy operations in fork() should probably also be examined in the context of shared pagetables at some point. 
-- wli
Re: _proxy_pda still makes linking modules fail
On Tue, 2007-03-13 at 08:31 -0700, Jeremy Fitzhardinge wrote: > Paul Mackerras wrote: > > There is a fundamental problem with using __thread, which is that gcc > > assumes that the addresses of __thread variables are constant within > > one thread, and that therefore it can cache the result of address > > calculations. However, with preempt, threads in the kernel can't rely > > on staying on one cpu, and therefore the addresses of per-cpu > > variables can change. There appears to be no way to tell gcc to drop > > all cached __thread variable address calculations at a given point > > (e.g. when enabling or disabling preemption). That is basically why I > > gave up on using __thread for per-cpu variables on powerpc. [ Thanks for the enlightenment, Paul ] > Doesn't that fall under the general class of "you have to be pinned to a > particular cpu in order to meaningfully use per-cpu variables"? No, it makes assumptions about the *address* of a per-cpu variable not changing, even across barriers. > In principle gcc could CSE the value of smp_processor_id() across a cpu > change in the same way. No, this is why preempt_enable and the like are memory barriers. Rusty.
Re: SMP performance degradation with sysbench
On 3/13/07, Eric Dumazet <[EMAIL PROTECTED]> wrote: Nish Aravamudan wrote: > On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote: >> >> Hi Nick, >> >> > Anyway, I'll keep experimenting. If anyone from MySQL wants to help >> look >> > at this, send me a mail (eg. especially with the sched_setscheduler >> issue, >> > you might be able to do something better). >> >> I took a look at this today and figured I'd document it: >> >> http://ozlabs.org/~anton/linux/sysbench/ >> >> Bottom line: it looks like issues in the glibc malloc library, replacing >> it with the google malloc library fixes the negative scaling: >> >> # apt-get install libgoogle-perftools0 >> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld > > Quick datapoint, still collecting data and trying to verify it's > always the case: on my 8-way Xeon, I'm actually seeing *much* worse > performance with libtcmalloc.so compared to mainline. Am generating > graphs and such still, but maybe someone else with x86_64 hardware > could try the google PRELOAD and see if it helps/hurts (to rule out > tester stupidity)? I wish I had an 8-way test platform :) Anyway, could you post some oprofile results ? Hopefully soon -- want to still make sure I'm not doing something dumb. Am also hoping to get some of the gdb backtraces like Anton had. Thanks, Nish
Re: [RFC/PATCH 00/59] Make common x86 arch area for i386 and x86_64
On Tue, 2007-03-13 at 14:45 -0700, Chris Wright wrote: > what about asm-x86/ dir? the asm/ symlink would still point to relevant > arch, but the file there could be simply #include ? Would it be acceptable to have an include/asm-x86/ dir with one file? Of course it will open the door to merge current code and share it there too. -- Steve
Re: [RFC/PATCH 00/59] Make common x86 arch area for i386 and x86_64
On Tue, 2007-03-13 at 14:39 -0700, Linus Torvalds wrote: > > On Tue, 13 Mar 2007, Steven Rostedt wrote: > > > > What we have currently is a bunch of hacks. Seems that people can't make > > up their mind to what to do. > > I don't mind the patches, but I'd be a lot happier if it also was a stated > intention to actually make it be buildable as "x86", the same way that the > separate 32-bit and 64-bit POWER architectures were merged into just one > architecture that could be built either way. That's actually a larger goal, but for the immediate future, I figure this would be a good first step. Start out by stating what's similar, and then build off of this for something bigger. But in the mean time, we can have a staging ground for work that's for both i386 and x86_64 archs. And for those that know these systems in a more intimate way (Andi :) they can work off of this to make that monster. -- Steve
Re: SMP performance degradation with sysbench
Nish Aravamudan wrote: On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote: Hi Nick, > Anyway, I'll keep experimenting. If anyone from MySQL wants to help look > at this, send me a mail (eg. especially with the sched_setscheduler issue, > you might be able to do something better). I took a look at this today and figured I'd document it: http://ozlabs.org/~anton/linux/sysbench/ Bottom line: it looks like issues in the glibc malloc library, replacing it with the google malloc library fixes the negative scaling: # apt-get install libgoogle-perftools0 # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld Quick datapoint, still collecting data and trying to verify it's always the case: on my 8-way Xeon, I'm actually seeing *much* worse performance with libtcmalloc.so compared to mainline. Am generating graphs and such still, but maybe someone else with x86_64 hardware could try the google PRELOAD and see if it helps/hurts (to rule out tester stupidity)? I wish I had an 8-way test platform :) Anyway, could you post some oprofile results ?
Re: [PATCH 1/2] avoid OPEN_MAX in SCM_MAX_FD
> I'd actually prefer this as part of the "remove OPEN_MAX" patch. Ok. (But now you're going to argue with me about "remove OPEN_MAX", and you haven't said you have any problem with changing SCM_MAX_FD, so why make it wait?) > That said, it actually worries me that you should call "_SC_OPEN_MAX". [...] > For example, I know perfectly well that I should use _SC_PATH_MAX, but a > *lot* of code simply doesn't care. In git, I used PATH_MAX, and the reason [...] Ok, fine. But PATH_MAX is a real constant that has some meaning in the kernel. It's perfectly correct to use PATH_MAX as a constant on a system like Linux that defines it and means what it says. Conversely, OPEN_MAX has no useful relationship with anything the kernel is doing at all. > So, what's the likelihood that this will break some old programs? I > realize that modern distributions don't put the kernel headers in their > user-visible includes any more, but the breakage is most likely exactly > for old programs and older distributions. Well, I don't know for sure. It doesn't seem all that likely to me (not like PATH_MAX), as there has been getdtablesize() since before there was OPEN_MAX by that name (not to mention before there was Linux). If things use OPEN_MAX as a constant for arrays, they're already broken unless they call setrlimit to constrain themselves. Getting things fixed has to start somewhere. Thanks, Roland
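As an illustration of the alternative to a compile-time OPEN_MAX array, here is a minimal userspace sketch, assuming a POSIX system. It asks for the descriptor limit at run time and sizes a table from the answer. The helper names fd_table_limit and alloc_fd_table are hypothetical; getrlimit(RLIMIT_NOFILE) and sysconf(_SC_OPEN_MAX) are the standard calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: current soft limit on open descriptors, or -1. */
long fd_table_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
		return (long)rl.rlim_cur;

	/* Fall back to sysconf(); it returns -1 when there is no fixed limit. */
	return sysconf(_SC_OPEN_MAX);
}

/*
 * Hypothetical helper: allocate one zeroed flag byte per possible
 * descriptor. Stores the table size in *nfds; returns NULL on failure.
 */
unsigned char *alloc_fd_table(long *nfds)
{
	long n;

	assert(nfds != NULL);
	n = fd_table_limit();
	if (n <= 0)
		return NULL;
	*nfds = n;
	return calloc((size_t)n, sizeof(unsigned char));
}
```

The limit can change over the life of a process (setrlimit), so a table sized this way is only valid for the limit observed at allocation time; code that raises its limit later would need to reallocate.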
Re: [RFC/PATCH 06/59] mv kernel/acpi/processor.c
On Tue, 2007-03-13 at 14:32 -0700, Linus Torvalds wrote: > > On Tue, 13 Mar 2007, Steven Rostedt wrote: > > > > Move kernel/acpi/processor.c to the common hold. > > Please use > > git diff -M OK, thanks! I'm still quite a git newbie. I'll update all the move patches. It may take a bit of hand work. What I really did to do this patch series was to make all my changes in git. But the changes were not smooth from change set to change set. So I did one big git-diff, and then used good old midnight commander (mc) to parse the patches. And then pulled them into quilt to comment and send them. > > for things like this. > > In fact, even if you weren't a git user, I'd ask you to *become* one just > because I think that it's a *lot* more productive if people actually see > renames as renames, and will see what - if anything - changed when > renaming. > > The "-M" flag isn't the default, simply because it generates patches that > cannot be applied with regular "patch", but for something like this, I > think it's practically imperative. The old kind of "remove file" + "add > file" patch just isn't acceptable when there are very viable alternatives. I wish I knew this before breaking it up. But I'm sure I can do another big patch and automate these updates :) -- Steve
Re: irda rmmod lockdep trace.
On Mon, Mar 12, 2007 at 04:49:21PM -0700, David Miller wrote: > I would strongly caution against adding any run-time overhead just to > cure a false lockdep warning. Even adding a new function argument > is too much IMHO. > > Make the cost show up for lockdep only, perhaps by putting each > hashbin lock into a separate locking class? Does that look better to you:

diff --git a/include/net/irda/irqueue.h b/include/net/irda/irqueue.h
index 335b0ac..67cb434 100644
--- a/include/net/irda/irqueue.h
+++ b/include/net/irda/irqueue.h
@@ -71,6 +71,7 @@ typedef struct hashbin_t {
 	int        hb_size;
 	spinlock_t hb_spinlock;		/* HB_LOCK - Can be used by the user */
+	struct lock_class_key hb_lock_key;
 
 	irda_queue_t* hb_queue[HASHBIN_SIZE] IRDA_ALIGN;
 
 	irda_queue_t* hb_current;
diff --git a/net/irda/irqueue.c b/net/irda/irqueue.c
index 9266233..c72ecee 100644
--- a/net/irda/irqueue.c
+++ b/net/irda/irqueue.c
@@ -370,6 +370,8 @@ hashbin_t *hashbin_new(int type)
 	/* Make sure all spinlock's are unlocked */
 	if ( hashbin->hb_type & HB_LOCK ) {
 		spin_lock_init(&hashbin->hb_spinlock);
+		lockdep_set_class(&hashbin->hb_spinlock,
+				  &hashbin->hb_lock_key);
 	}
 
 	return hashbin;
Re: Stolen and degraded time and schedulers
On 03/13/2007 02:59 PM, Jeremy Fitzhardinge wrote: Daniel Walker wrote: The frequency tracking you mention is done to some extent inside the timekeeping adjustment functions, but I'm not sure it's totally accurate for non-timekeeping, and it also tracks things like interrupt latency. Tracking frequency changes where it's important to get it right shouldn't be done I think .. If you want accurate time accounting, don't use the TSC . I'm not sure I follow you here. Clocksources have the means to adjust the rate of time progression, mostly to warp the time for things like ntp. The stability or otherwise of the tsc is irrelevant. If you had a clocksource which was explicitly using the rate at which a CPU does work as a timebase, then using the same warping mechanism would allow you to model CPU speed changes. The sched_clock interface is basically a stripped down clocksource.. I've implemented sched_clock as a clocksource in the past .. Yes, that works. But a clocksource is strictly about measuring the progression of real time, and so doesn't generally measure how much work a CPU has done. We currently have a sched_clock interface in paravirt_ops to deal with the hypervisor aspect. It only occurred to me this morning that cpufreq presents exactly the same problem to the rest of the kernel, and so there's room for a more general solution. Are there other architecture which have this per-cpu clock frequency changing issue? I worked with several other architectures beyond just x86 and haven't seen this issue .. Well, lots of cpus have dynamic frequencies. Any scheduler which maintains history will suffer the same problem, even on UP. If processes A and B are supposed to have the same priority and they both execute for 1ms of real time, did they make the same amount of progress? Not if the cpu changed speed in between. And any system which commonly runs virtualized (s390, power, etc) will need to deal with the notion of stolen time. 
With your previous definition of work time, would it be that:

  monotonic_time == work_time + stolen_time ??

i.e. would you be defining stolen_time to include the time lost to processes due to the cpu running at a lower frequency?

How does this play into the other potential users, besides sched_clock(), of stolen time? We should make sure that the abstraction introduced here makes sense in those places too. For example, the stuff that happens in update_process_times(). I think we'd want to account the stolen time to cpustat->steal. Also we'd probably want to account for stolen time with regards to task_running_tick(). (Though, in the latter case, maybe we first have to move the scheduler away from assuming HZ rate decrementing of p->time_slice to get this right. i.e. remove the tick based assumption from the scheduler, and then maybe stolen time falls in more naturally when accounting time slices).

I guess taking your cpufreq as an example of work_time progressing slower than monotonic_time (and assuming that the remaining time is what you would call stolen), then e.g. top would report 50% of your cpu stolen when your cpu is running at 1/2 max rate. And p->time_slice would decrement at 1/2 the rate it normally did when running at 1/2 speed. Is this the right thing to do? If so, then I agree it makes sense to model hypervisor stolen time in terms of your "work time". But, if not, then maybe the amount of work you can get done during a period of time that is not stolen and the stolen time itself are really two different notions, and shouldn't be confused. I can see arguments both ways.

Dan

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
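The identity asked about above can be made concrete with a small sketch. It assumes, purely for illustration, that work time scales linearly with the ratio of current to maximum CPU frequency and that everything else is counted as stolen. The names work_time_ns and stolen_time_ns are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: work time accumulated over an interval of
 * monotonic time, assuming progress scales linearly with frequency.
 * The multiplication can overflow for very long intervals; a real
 * implementation would need a wider intermediate or a shift/multiply.
 */
static uint64_t work_time_ns(uint64_t monotonic_ns,
			     uint32_t cur_khz, uint32_t max_khz)
{
	assert(max_khz != 0 && cur_khz <= max_khz);
	return monotonic_ns * cur_khz / max_khz;
}

/* Whatever is not work time is, under this model, stolen time. */
static uint64_t stolen_time_ns(uint64_t monotonic_ns,
			       uint32_t cur_khz, uint32_t max_khz)
{
	return monotonic_ns - work_time_ns(monotonic_ns, cur_khz, max_khz);
}
```

With a CPU at half of its maximum frequency, 1 ms of monotonic time splits into 0.5 ms of work time and 0.5 ms of stolen time under this model, which is the "top reports 50% stolen" case from the message.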
Re: [QUICKLIST 0/6] Arch independent quicklists V1
On Mon, Mar 12, 2007 at 03:51:57PM -0700, David Miller wrote:
> Someone with some extreme patience could do the sparc 32-bit port too,
> in fact it's lacking the cached PGD update logic that x86 et al. have
> so it would even end up being a bug fix :-) This lack is why sparc32
> pre-initializes the vmalloc/module area PGDs with static page tables
> at boot time, FWIW.

I'll spare everyone the details and let code if/when it appears stand in for promises on the sparc32 front.

-- wli
Re: SMP performance degradation with sysbench
On 3/12/07, Anton Blanchard <[EMAIL PROTECTED]> wrote:
> Hi Nick,
>
> > Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
> > at this, send me a mail (eg. especially with the sched_setscheduler issue,
> > you might be able to do something better).
>
> I took a look at this today and figured I'd document it:
>
> http://ozlabs.org/~anton/linux/sysbench/
>
> Bottom line: it looks like issues in the glibc malloc library, replacing
> it with the google malloc library fixes the negative scaling:
>
> # apt-get install libgoogle-perftools0
> # LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Quick datapoint, still collecting data and trying to verify it's always the case: on my 8-way Xeon, I'm actually seeing *much* worse performance with libtcmalloc.so compared to mainline. Am generating graphs and such still, but maybe someone else with x86_64 hardware could try the google PRELOAD and see if it helps/hurts (to rule out tester stupidity)?

Thanks,
Nish
Re: Linux 2.6.20.3
On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote: On 3/13/07, David Miller <[EMAIL PROTECTED]> wrote: > From: "Nish Aravamudan" <[EMAIL PROTECTED]> > Date: Tue, 13 Mar 2007 14:58:24 -0700 > > > On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote: > > > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote: > > > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel. > > > > It contains a number of bugfixes and all 2.6.20 users are recommended to > > > > upgrade. > > > > > > > > The diffstat and short summary of the fixes are below. > > > > > > > > I'll also be replying to this message with a copy of the patch between > > > > 2.6.20.2 and 2.6.20.3. > > > > > > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get: > > > > err, duh -- this is a Sun Ultra 60, debian testing install. > > Figure out if 2.6.20.2 does it too, then please try to git bisect > it down further. Yep, that's the plan, just wanted to make folks aware. > I took a quick look and the two sparc64 commits between 2.6.20.1 > and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld > fix which is for a driver for hardware your ultra60 doesn't have. :) > > There is a decent amount of raid and nfs fixes in here, do you > use either? Neither. > Another commit that might be relevant is: > > commit 530b09160744a12450fdacb2b78779c9830a29c8 > Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]> > Date: Thu Mar 1 19:02:55 2007 -0500 > > tty_io: fix race in master pty close/slave pty close path > > Hmmm... > > Please let us know if you can narrow it down further. Building 2.6.20.2 right now, will let you know. Ok, truly bizarre, I found that I was not running stock 2.6.20.3, but had your small hugetlb patch on top. So I went back and patched 2.6.20.1 with your patch, rebooted, got a soft lockup. Went back to stock 2.6.20.1 and did not. I don't see how your patch (C&P below for reference) could make any difference...Especially because no hugepages were in use at the time. 
On patched 2.6.20.1, I was just trying to check if my source tree had your patch applied (by `patch -p1 < davem.patch`) and got the soft-lockup I saw in 2.6.20.3 with the patch applied. I am going to try a clean 2.6.20.3 as well, now.

diff --git a/arch/sparc64/mm/hugetlbpage.c b/arch/sparc64/mm/hugetlbpage.c
index 33fd0b2..00677b5 100644
--- a/arch/sparc64/mm/hugetlbpage.c
+++ b/arch/sparc64/mm/hugetlbpage.c
@@ -248,6 +248,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
         if (!pte_present(*ptep) && pte_present(entry))
                 mm->context.huge_pte_count++;
 
+        addr &= HPAGE_MASK;
         for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) {
                 set_pte_at(mm, addr, ptep, entry);
                 ptep++;
@@ -266,6 +267,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
         if (pte_present(entry))
                 mm->context.huge_pte_count--;
 
+        addr &= HPAGE_MASK;
+
         for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) {
                 pte_clear(mm, addr, ptep);
                 addr += PAGE_SIZE;
NPTL patch for linux 2.4.28
Hello all. I have a tricky problem on hand and a straightforward question.

Tricky problem: While debugging a simple multithreaded application using gdb on Linux 2.4.28, I noticed that the thread that crashed with SIGSEGV has complete information in gdb (both the address and the function at the time of the crash). But the other threads, which are in a wait state (executing glibc functions at the time of the crash), show just the address and not the function name, as shown below.

sh-2.05b# ./gdb a.out /mnt/cf/engg_files/core_files/
a.out.1173437318.core.5312 a.out.1173453940.core.9829 a.out.1173438125.core.16016 lost+found a.out.1173438881.core.18721
Re: sys_write() racy for multi-threaded append?
In case anyone cares, this is a snippet of my work-in-progress read_write.c illustrating how I might handle f_pos. Can anyone point me to data showing whether it's worth avoiding the spinlock when the "struct file" is not shared between threads? (In my world, correctness comes before code-bumming as long as the algorithm scales properly, and there are a fair number of corner cases to think through -- although one might be able to piggy-back on the logic in fget_light.)

Cheers,
- Michael

/*
 * Synchronization of f_pos is not for the purpose of serializing writes
 * to the same file descriptor from multiple threads. It is solely to
 * protect against corruption of the f_pos field leading to a severe
 * violation of its semantics, such as:
 *  - a user-visible negative value on a file type which POSIX forbids
 *    ever to have a negative offset; or
 *  - an unexpected jump from (say) (2^32 - small) to (2^33 - small),
 *    due to an interrupt between the two 32-bit write instructions
 *    needed to write out an loff_t on some architectures, leading to
 *    a delayed overwrite of half of the f_pos value written by another
 *    thread. (Applicable to SMP and CONFIG_PREEMPT kernels.)
 *
 * Three tiers of protection on f_pos may be needed in order to trade off
 * between performance and least surprise:
 *
 *  1. All f_pos accesses must go through accessors that protect against
 *     problems with atomic 64-bit writes on some platforms. These
 *     accessors are only atomic with respect to one another.
 *
 *  2. Those few accesses that cannot handle transient negative values of
 *     f_pos must be protected from a race in some llseek implementations
 *     (including generic_file_llseek). Correct application code should
 *     never encounter this race, and the syscall use cases that are
 *     vulnerable to it are relatively infrequent. This is a job for an
 *     rwlock, although the sense is inverted (readers need exclusive
 *     access to a "stalled pipeline", while writers only need to be able
 *     to fix things up after the fact in the event of an exception).
 *
 *  3. Applications that cannot handle transient overshoot on f_pos, under
 *     conditions where several threads are writing to the same open file
 *     concurrently and one of them experiences a short write, can be
 *     protected from themselves by an rwsem around vfs_write(v) calls.
 *     (The same applies to multi-threaded reads, mutatis mutandis.)
 *     When CONFIG_WOMBAT (waste of memory, brain, and time -- thanks,
 *     Bodo!) is enabled, this per-struct-file rwsem is taken as necessary.
 */

#define file_pos_local_acquire(file, flags) \
        spin_lock_irqsave(file->f_pos_lock, flags)

#define file_pos_local_release(file, flags) \
        spin_unlock_irqrestore(file->f_pos_lock, flags)

#define file_pos_excl_acquire(file, flags) \
        do { \
                write_lock_irqsave(file->f_pos_rwlock, flags); \
                spin_lock(file->f_pos_lock); \
        } while (0)

#define file_pos_excl_release(file, flags) \
        do { \
                spin_unlock(file->f_pos_lock); \
                write_unlock_irqrestore(file->f_pos_rwlock, flags); \
        } while (0)

#define file_pos_nonexcl_acquire(file, flags) \
        do { \
                read_lock_irqsave(file->f_pos_rwlock, flags); \
                spin_lock(file->f_pos_lock); \
        } while (0)

#define file_pos_nonexcl_release(file, flags) \
        do { \
                spin_unlock(file->f_pos_lock); \
                read_unlock_irqrestore(file->f_pos_rwlock, flags); \
        } while (0)

/*
 * Accessors for f_pos (the file descriptor "position" for seekable file
 * types, also of interest as a bytes read/written counter on non-seekable
 * file types such as pipes and FIFOs). The f_pos field of struct file
 * should be accessed exclusively through these functions, so that the
 * changes needed to interlock these accesses atomically are localized to
 * the accessor functions.
 *
 * file_pos_write is defined to return the old file position so that it
 * can be restored by the caller if appropriate. (Note that it is not
 * necessarily guaranteed that restoring the old position will not clobber
 * a value written by another thread; see below.) file_pos_adjust is also
 * defined to return the old file position because it is more often needed
 * immediately by the caller; the new position can always be obtained by
 * adding the value passed into the "pos" parameter to file_pos_adjust.
 */

/*
 * Architectures on which an aligned 64-bit read/write is atomic can omit
 * l
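The jump from (2^32 - small) to (2^33 - small) mentioned in the first comment can be reproduced on paper with a user-space model. This is only a sketch: it assumes a 64-bit position stored as two separately written 32-bit halves, and the names split_pos, split_pos_value and mix_halves are hypothetical, not part of the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit position stored as two 32-bit halves, each written separately. */
struct split_pos {
	uint32_t lo;
	uint32_t hi;
};

/* Reassemble the value a reader would observe. */
static uint64_t split_pos_value(struct split_pos p)
{
	return ((uint64_t)p.hi << 32) | p.lo;
}

/*
 * Model the end state of the race: the high half comes from one
 * writer's value, the low half from another writer's value.
 */
static struct split_pos mix_halves(uint64_t hi_from, uint64_t lo_from)
{
	struct split_pos p;

	p.hi = (uint32_t)(hi_from >> 32);
	p.lo = (uint32_t)lo_from;
	return p;
}
```

Taking one value just below 2^32 (0xFFFFF000) and another just above it (0x100001000), the mixed result is 0x1FFFFF000, that is 2^33 - 4096, which neither writer ever intended to store.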
Re: [QUICKLIST 0/4] Arch independent quicklists V2
Andrew Morton writes:

> Plus, we can get in a situation where we take a cache-cold, known-zero page
> from the pte quicklist when there is a cache-hot, non-zero page sitting in
> the page allocator. I suspect that zeroing the cache-hot page would take a
> similar amount of time to a single miss against the cache-cold page.

That is certainly the case on powerpc.

> I'm not saying that I _know_ that the quicklists are pointless, but I don't
> think it's established that they are pointful.

I don't see much point to them. For powerpc, I would rather grab an arbitrary page and zero it than get a page off a quicklist.

> Maybe, dunno. It was apparently a win on powerpc many years ago. I had a

My recollection was that it wasn't a win, but it was a long time ago...

Paul.
Re: Summary of resource management discussion
On Tue, Mar 13, 2007 at 11:28:20PM +0530, Srivatsa Vaddagiri wrote:
> On Tue, Mar 13, 2007 at 05:24:59PM +0100, Herbert Poetzl wrote:
> > what about identifying different resource categories and
> > handling them according to the typical usage pattern?
> >
> > like the following:
> >
> > - cpu and scheduler related accounting/limits
> > - memory related accounting/limits
> > - network related accounting/limits
> > - generic/file system related accounting/limits
> >
> > I don't worry too much about having the generic/file stuff
> > attached to the nsproxy, but the cpu/sched stuff might be
> > better off being directly reachable from the task
>
> I think we should experiment with both combinations (a direct pointer
> to cpu_limit structure from task_struct and an indirect pointer), get
> some numbers and then decide. Or do you have results already with
> respect to that?

nope, no numbers for that, but I appreciate some testing and probably can do some testing in this regard too (although I want to get some testing done for the resource sharing between guests first)

> > > 3. How are cpusets related to vserver/containers?
> > >
> > > Should it be possible to, lets say, create exclusive cpusets and
> > > attach containers to different cpusets?
> >
> > that is what Linux-VServer does atm, i.e. you can put
> > an entire guest into a specific cpu set
>
> Interesting. What abt /dev/cpuset view?

host only for now

best,
Herbert

> Is that same for all containers or do you restrict that view
> to the containers cpuset only?
>
> --
> Regards,
> vatsa
Re: _proxy_pda still makes linking modules fail
Jeremy Fitzhardinge writes:

> Or do you mean that if you have:
>
> preempt_disable();
> use_my_percpu++;
> preempt_enable();
> // switch cpus
> preempt_disable();
> use_my_percpu++;
> preempt_enable();
>
> then it will still use the old pointer to use_my_percpu?

Yes. It can, and sometimes does. There's no way (that I know of) to tell gcc "all my __thread variables might have moved to a different address".

> In principle gcc could CSE the value of smp_processor_id() across a cpu
> change in the same way.

There it's easier to make gcc do what we want, because we can use a barrier or a volatile. The difference is that smp_processor_id() is ultimately the value of something, not the address of something. We can tell gcc "values might have changed" but have no way to say "addresses might have changed".

Paul.
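The value-versus-address distinction can be modelled in user space. The following is only a sketch, not kernel code: current_cpu stands in for smp_processor_id(), counters[] for a per-cpu variable, and migrate_to() for a CPU switch; all of these names are hypothetical. The barrier() macro uses the GCC/Clang inline-asm memory clobber, which forces values to be reloaded but does nothing about an address that has already been computed and kept in a local.

```c
#include <assert.h>

#define NR_CPUS 2

/* Compiler barrier: "memory may have changed", so values get reloaded. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int counters[NR_CPUS];	/* stand-in for a per-cpu variable */
static int current_cpu;		/* stand-in for smp_processor_id() */

/* Address of "this CPU's" counter, derived from the current CPU value. */
static int *this_cpu_counter(void)
{
	return &counters[current_cpu];
}

/* Pretend the task was moved to another CPU. */
static void migrate_to(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
	barrier();
}

/* Problem pattern: the address is computed once and reused after the move. */
static void bump_twice_stale(void)
{
	int *p = this_cpu_counter();

	(*p)++;
	migrate_to(1);
	(*p)++;		/* still the old CPU's slot */
}

/* Recomputing the address after the move hits the right slot. */
static void bump_twice_fresh(void)
{
	(*this_cpu_counter())++;
	migrate_to(1);
	(*this_cpu_counter())++;
}
```

In the stale variant both increments land in slot 0 even though the barrier made the new CPU number visible. With __thread variables the compiler performs the equivalent of that caching on its own, which is the case the message says cannot be switched off.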
Re: [RFC] Heads up on sys_fallocate()
On Tue, Mar 06, 2007 at 10:46:56AM -0600, Eric Sandeen wrote:
> Ulrich Drepper wrote:
> > Christoph Hellwig wrote:
> >> fallocate with the whence argument and flags is already quite complicated,
> >> I'd rather have another call for placement decisions, that would
> >> be called on an fd to do placement decisions for any further allocations
> >> (prealloc, write, etc)
> >
> > Yes, posix_fallocate shouldn't be made more complicated. But I don't
> > understand why requesting linear layout of the blocks should be an
> > option. It's always an advantage if the blocks requested this way are
> > linear on disk. So, the kernel should always do its best to make this
> > happen, without needing an additional option.
> >
>
> Agreed on both points. The hints would be for things like start block,
> or speculative EOF preallocation, not contiguity, which I think should
> always be the goal.

ISTR having had this discussion before ;)

About guided preallocation for defrag:

http://marc.info/?t=11624785951&r=1&w=2

e.g.: The sorts of policies we need for effective use of preallocation:

http://marc.info/?l=linux-fsdevel&m=116184475308164&w=2
http://marc.info/?l=linux-fsdevel&m=116278169519095&w=2

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
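For readers who have not used the existing interface the thread keeps comparing against, here is a minimal user-space sketch of posix_fallocate(). It shows only the plain POSIX call, which takes a file descriptor, an offset and a length and has no placement hints; it says nothing about how the proposed sys_fallocate() would look. The helper name preallocate_and_size and the temporary-file handling are assumptions made for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file at `path`, preallocate
 * `len` bytes from offset 0, and return the resulting file size.
 * Returns -1 on any failure. The scratch file is removed again.
 */
static long long preallocate_and_size(const char *path, off_t len)
{
	struct stat st;
	long long size = -1;
	int err;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* posix_fallocate() returns an error number directly, not -1/errno. */
	err = posix_fallocate(fd, 0, len);
	if (err == 0 && fstat(fd, &st) == 0)
		size = (long long)st.st_size;

	close(fd);
	unlink(path);
	return size;
}
```

After a successful call the file size covers the requested range, so later writes into it should not fail for lack of space. Whether the blocks end up contiguous on disk is left entirely to the filesystem, which is the point Ulrich and Eric agree on above.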
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
> a previous discussion that said 4 was the default...I don't see
> why. nice uses +10 by default on all linux distro...So I suspect
> that if Mike just used "nice lame" instead of "nice +5 lame", he
> would have got what he wanted.

tcsh, and probably csh, has a builtin 'nice' with default +4. So

  tcsh% nice ps -l

will show a process with nice +4. If you tell it not to use the builtin,

  tcsh% \nice ps -l

then it uses /usr/bin/nice and you get +10. bash doesn't have a nice builtin, so it always uses /usr/bin/nice and you get +10 by default.

-Sanjoy

`Not all those who wander are lost.' (J.R.R. Tolkien)
Re: sys_write() racy for multi-threaded append?
On 3/13/07, Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> Michael, please stop spreading this utter bullshit _now_. You're so full
> of half-knowledge that it's not funny anymore, and you try to insult
> people knowing a few magnitudes more than you left and right.

Thank you Christoph for that informative response to my comments. I take it that you consider read_write.c to be code of the highest quality and maintainability. If you have something specific in mind when you write "utter bullshit" and "half-knowledge", I'd love to hear it.

Now, for those who still care to respond as if improving the kernel were a goal that you and I can share, a question: When generic_file_llseek needs the inode in order to retrieve the current file size, it goes through f_mapping (the pagecache entry?) rather than through f_path.dentry (the dentry cache?). All other inode retrievals in read_write.c go through f_path.dentry. Why? Or is this a question that can only be asked on linux-fsdevel?

Cheers,
- Michael
[PATCH 1/2] wistron_btns: Add acerhk laptop database
This patch adds all the "tm_new" laptops information that is in acerhk to wistron_btns. That's about 25 more laptops. Obviously, I couldn't try them all. I've just tried the Aspire 3020. For this reason, I've also added a printk which asks the users of those laptops to confirm to me whether it works (or not). Surprisingly, the dmi information could be found on google for a majority of the laptops, so it might not work so badly.

The information about which laptop has which led is also imported, however for now it doesn't do anything. It's just in case someone adds led support later, in order to avoid hunting information in the acerhk for a second time.

Eric

From: Eric Piel <[EMAIL PROTECTED]>

wistron_btns: Add acerhk laptop database

acerhk already supports a lot of laptops. Let's import its database so that everyone can benefit from the work of Olaf Tauber. Only the "tm_new" laptops were imported. "tm_old" laptops could be possible but require more testing, and probably only a few of those laptops are still alive. "dritek" laptops should probably be imported into a different driver. Also compress the keymaps by fitting each entry on an int. Most of the dmi matching was written based on google searches, so it's rather prone to errors. That's why I'm asking people to confirm it works.
This adds the following hardware: Acer TravelMate 370 Acer TravelMate 380 Acer TravelMate C300 Acer TravelMate C100 Acer TravelMate C110 Acer TravelMate 250 Acer TravelMate 350 Acer TravelMate 620 Acer TravelMate 630 Acer TravelMate 220 Acer TravelMate 230 Acer TravelMate 260 Acer TravelMate 280 Acer TravelMate 360 Acer TravelMate 2100 Acer TravelMate 2410 Acer Aspire 1500 Acer Aspire 1600 Acer Aspire 3020 Acer Aspire 5020 Medion MD 2900 Medion MD 40100 Medion MD 95400 Medion MD 96500 Fujitsu Siemens Amilo 7820 Signed-off-by: Eric Piel <[EMAIL PROTECTED]> --- linux-2.6.21/drivers/input/misc/wistron_btns.c~tm610 2007-03-10 01:41:23.0 +0100 +++ linux-2.6.21/drivers/input/misc/wistron_btns.c 2007-03-12 00:54:54.0 +0100 @@ -233,11 +233,15 @@ static void bios_set_state(u8 subsys, in struct key_entry { char type; /* See KE_* below */ u8 code; - unsigned keycode; /* For KE_KEY */ + u16 keycode; /* For KE_KEY */ }; enum { KE_END, KE_KEY, KE_WIFI, KE_BLUETOOTH }; +#define FE_MAIL_LED 0x01 +#define FE_WIFI_LED 0x02 +#define FE_UNTESTED 0x80 + static const struct key_entry *keymap; /* = NULL; Current key map */ static int have_wifi; static int have_bluetooth; @@ -288,7 +292,16 @@ static struct key_entry keymap_wistron_m { KE_KEY, 0x13, KEY_PROG3 }, { KE_KEY, 0x31, KEY_MAIL }, { KE_KEY, 0x36, KEY_WWW }, - { KE_END, 0 } + { KE_END, FE_MAIL_LED } +}; + +static struct key_entry keymap_wistron_md40100[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x02, KEY_CONFIG }, + { KE_KEY, 0x31, KEY_MAIL }, + { KE_KEY, 0x36, KEY_WWW }, + { KE_KEY, 0x37, KEY_SCREEN }, /* Display on/off */ + { KE_END, FE_MAIL_LED | FE_WIFI_LED | FE_UNTESTED } }; static struct key_entry keymap_wistron_ms2141[] = { @@ -305,23 +318,163 @@ static struct key_entry keymap_wistron_m }; static struct key_entry keymap_acer_aspire_1500[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x03, KEY_POWER }, { KE_KEY, 0x11, KEY_PROG1 }, { KE_KEY, 0x12, KEY_PROG2 }, { KE_WIFI, 0x30, 0 }, { KE_KEY, 0x31, KEY_MAIL }, { KE_KEY, 
0x36, KEY_WWW }, + { KE_KEY, 0x49, KEY_CONFIG }, { KE_BLUETOOTH, 0x44, 0 }, - { KE_END, 0 } + { KE_END, FE_UNTESTED } +}; + +static struct key_entry keymap_acer_aspire_1600[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x03, KEY_POWER }, + { KE_KEY, 0x08, KEY_MUTE }, + { KE_KEY, 0x11, KEY_PROG1 }, + { KE_KEY, 0x12, KEY_PROG2 }, + { KE_KEY, 0x13, KEY_PROG3 }, + { KE_KEY, 0x31, KEY_MAIL }, + { KE_KEY, 0x36, KEY_WWW }, + { KE_KEY, 0x49, KEY_CONFIG }, + { KE_WIFI, 0x30, 0 }, + { KE_BLUETOOTH, 0x44, 0 }, + { KE_END, FE_MAIL_LED | FE_UNTESTED } +}; + +/* 3020 has been tested */ +static struct key_entry keymap_acer_aspire_5020[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x03, KEY_POWER }, + { KE_KEY, 0x05, KEY_MEDIA }, /* Display switch */ + { KE_KEY, 0x11, KEY_PROG1 }, + { KE_KEY, 0x12, KEY_PROG2 }, + { KE_KEY, 0x31, KEY_MAIL }, + { KE_KEY, 0x36, KEY_WWW }, + { KE_KEY, 0x6a, KEY_CONFIG }, + { KE_WIFI, 0x30, 0 }, + { KE_BLUETOOTH, 0x44, 0 }, + { KE_END, FE_MAIL_LED | FE_UNTESTED } +}; + +static struct key_entry keymap_acer_travelmate_2410[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x6d, KEY_POWER }, + { KE_KEY, 0x11, KEY_PROG1 }, + { KE_KEY, 0x12, KEY_PROG2 }, + { KE_KEY, 0x31, KEY_MAIL }, + { KE_KEY, 0x36, KEY_WWW }, + { KE_KEY, 0x6a, KEY_CONFIG }, + { KE_WIFI, 0x30, 0 }, + { KE_BLUETOOTH, 0x44, 0 }, + { KE_END, FE_MAIL_LED | FE_UNTESTED } +}; + +static struct key_entry keymap_acer_travelmate_110[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x02, KEY_CONFIG }, + { KE_KEY, 0x03, KEY_POWER }, + { KE_KEY, 0x08, KEY_MUTE }, + { KE_KEY, 0x11, KEY_PROG1 }, + { KE_KEY, 0x12, KEY_PROG2 }, + { KE_KEY, 0x20, KEY_VOLUM
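Two details of the patch above can be checked in isolation: narrowing keycode to 16 bits makes each entry fit in one int, and the KE_END terminator carries the FE_* flags in its otherwise unused code field. The sketch below mirrors the struct and flag values from the diff; the helper keymap_features, the key_entry_wide comparison struct and the DEMO_* keycodes are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { KE_END, KE_KEY, KE_WIFI, KE_BLUETOOTH };

#define FE_MAIL_LED 0x01
#define FE_WIFI_LED 0x02
#define FE_UNTESTED 0x80

/* Placeholder keycodes for the demo map; not the real input.h values. */
#define DEMO_KEY_MAIL 1
#define DEMO_KEY_WWW  2

/* Layout before the patch: the unsigned keycode forces padding. */
struct key_entry_wide {
	char type;
	uint8_t code;
	unsigned keycode;
};

/* Layout after the patch: type + code + 16-bit keycode. */
struct key_entry {
	char type;		/* See KE_* above */
	uint8_t code;
	uint16_t keycode;	/* For KE_KEY */
};

/* Hypothetical helper: read the FE_* flags stored in the terminator. */
static unsigned keymap_features(const struct key_entry *map)
{
	assert(map != NULL);
	while (map->type != KE_END)
		map++;
	return map->code;
}

static const struct key_entry demo_map[] = {
	{ KE_KEY, 0x31, DEMO_KEY_MAIL },
	{ KE_KEY, 0x36, DEMO_KEY_WWW },
	{ KE_END, FE_MAIL_LED | FE_UNTESTED, 0 },
};
```

On common ABIs the narrowed entry is 4 bytes against 8 for the wide one, which halves the size of every keymap table in the driver.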
RSDL development plans
On Wednesday 14 March 2007 07:58, Con Kolivas wrote: > On Wednesday 14 March 2007 03:03, Con Kolivas wrote: > > On Wednesday 14 March 2007 02:31, Con Kolivas wrote: > > > On Monday 12 March 2007 22:26, Al Boldi wrote: > > > > I think, it should be possible to spread this max expiration latency > > > > across the rotation, should it not? > > > > > > Can you try the attached patch please Al and Mike? It "dithers" the > > > priority bitmap which tends to fluctuate the latency a lot more but in > > > a cyclical fashion. This tends to make the max latency bound to a > > > smaller value and should make it possible to run -nice tasks without > > > killing the latency of the non niced tasks. Eg you could possibly run X > > > nice -10 at a guess like we used to in 2.4 days. It's not essential of > > > course, but is a workaround for Mike's testcase. > > > > Oops, one tiny fix. This is a respin of the patch, sorry. > Bah with a bit more sleep under my belt it became clear that I forgot to > update the expired array in any proper way so this change almost breaks > stuff at the moment in the shape it's in. Please disregard this change for > now apart from interest in how I'm tackling the nice issue. The rsdl patches queued up so far are stable and boot fine and are reasonably performant on many architectures so I'm quite happy for them to get a run in -mm. The changes planned will (as you may have seen on this email thread) decrease average latencies across all nice levels, and make differential nice levels run better together. This will allow -nice to be used without significant latency harm to not niced tasks (as there is presently in rsdl and mainline). The change required on top of the patch earlier in this email is to make the dynamic bitmap reflect where the tasks will actually be on an array swap. However, I must inform people that I have to arrest the RSDL development for at least this week. 
I have a new and fairly serious neck problem that is being exacerbated badly by sitting in front of the computer for any extended period. I suspect the inner workings of RSDL currently are not well understood yet by anyone else well enough to hack on it. I'm not at all opposed to someone taking up the code at the moment and making the necessary changes I've mentioned above in the meantime though if they can get their head around it.

-- -ck
Re: /proc/kallsyms race vs module unload
On Tue, Mar 13, 2007 at 06:49:50PM +, Paulo Marques wrote:
> Alexey Dobriyan wrote:
> >[...]
> >What happens is that module_get_kallsym() drops module_mutex,
> >returns "struct module *", module unloaded, "struct module *"
> >used.
>
> The only use for the "struct module *" is to display the name of the
> module.

Ehh?

> This can be solved by adding a "char mod_name[MODULE_NAME_LEN];" field
> to "kallsym_iter" and copy the name of the module over, while still
> holding module_mutex. It would be slightly slower, but safer.

        iter->owner = module_get_kallsym(iter->pos - kallsyms_num_syms,
                                         &iter->value, &iter->type,
                                         iter->name, sizeof(iter->name));
        if (iter->owner == NULL)
                return 0;

        /* Label it "global" if it is exported, "local" if not exported. */
        iter->type = is_exported(iter->name, iter->owner)
                                             ^^^
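The copy-under-lock idea Paulo suggests can be sketched in user space with a pthread mutex standing in for module_mutex. This is an illustration only: kallsym_iter_sketch, load_module_sketch, unload_module_sketch, iter_fill and the MODULE_NAME_LEN value are hypothetical. As the quoted code shows, the real iterator also hands iter->owner to is_exported(), so copying the name alone would not close the race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MODULE_NAME_LEN 64	/* illustrative size, not the kernel's value */

struct module {
	char name[MODULE_NAME_LEN];
};

/* Iterator that keeps its own copy of the name instead of a pointer. */
struct kallsym_iter_sketch {
	char mod_name[MODULE_NAME_LEN];
};

static pthread_mutex_t module_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct module *loaded;	/* NULL when no module is loaded */

static void load_module_sketch(const char *name)
{
	struct module *m = calloc(1, sizeof(*m));

	assert(m != NULL && name != NULL);
	strncpy(m->name, name, sizeof(m->name) - 1);

	pthread_mutex_lock(&module_mutex);
	free(loaded);
	loaded = m;
	pthread_mutex_unlock(&module_mutex);
}

static void unload_module_sketch(void)
{
	pthread_mutex_lock(&module_mutex);
	free(loaded);
	loaded = NULL;
	pthread_mutex_unlock(&module_mutex);
}

/* Copy everything needed while the lock is held; returns 1 if found. */
static int iter_fill(struct kallsym_iter_sketch *it)
{
	int found = 0;

	pthread_mutex_lock(&module_mutex);
	if (loaded != NULL) {
		strncpy(it->mod_name, loaded->name, sizeof(it->mod_name) - 1);
		it->mod_name[sizeof(it->mod_name) - 1] = '\0';
		found = 1;
	}
	pthread_mutex_unlock(&module_mutex);
	return found;
}
```

Because the iterator owns a private copy, the name stays readable after the module is gone. Any other use of the module pointer outside the lock would need the same treatment, or a reference held on the module.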
[PATCH 2/2] wistron_btns: Generic keymap
This patch adds a generic map. That is, a keymap that should output the correct keycodes for most laptops. This is simply based on the observation of all those keymaps already gathered, as most of the wistron codes are always mapped to the same keycode. Hopefully, this way users which have a non-supported laptop will have a quick and dirty way to use the multimedia keys. Eric From: Eric Piel <[EMAIL PROTECTED]> wistron_btns: Generic keymap It turns out that the mapping of the wistron code is always the same, the main difference being some keys which may not exist and leds which might not be present. Therefore it's possible to write a generic keymap which would allow the use of unknown keyboard. The user can select it specifying the parameter "keymap=generic". Signed-off-by: Eric Piel <[EMAIL PROTECTED]> --- linux-2.6.21/drivers/input/misc/wistron_btns.c 2007-03-12 00:53:51.0 +0100 +++ linux-2.6.21/drivers/input/misc/wistron_btns.c~full 2007-03-12 00:39:26.0 +0100 @@ -58,7 +58,7 @@ MODULE_PARM_DESC(force, "Load even if co static char *keymap_name; /* = NULL; */ module_param_named(keymap, keymap_name, charp, 0); -MODULE_PARM_DESC(keymap, "Keymap name, if it can't be autodetected"); +MODULE_PARM_DESC(keymap, "Keymap name, if it can't be autodetected [generic, 1557/MS2141]"); static struct platform_device *wistron_device; @@ -562,6 +562,42 @@ static struct key_entry keymap_wistron_m { KE_END, FE_UNTESTED } }; +static struct key_entry keymap_wistron_generic[] = { + { KE_KEY, 0x01, KEY_HELP }, + { KE_KEY, 0x02, KEY_CONFIG }, + { KE_KEY, 0x03, KEY_POWER }, + { KE_KEY, 0x05, KEY_MEDIA }, /* Display switch */ + { KE_KEY, 0x06, KEY_SCREEN }, /* Display on/off */ + { KE_KEY, 0x08, KEY_MUTE }, + { KE_KEY, 0x11, KEY_PROG1 }, + { KE_KEY, 0x12, KEY_PROG2 }, + { KE_KEY, 0x13, KEY_PROG3 }, + { KE_KEY, 0x14, KEY_MAIL }, + { KE_KEY, 0x15, KEY_WWW }, + { KE_KEY, 0x20, KEY_VOLUMEUP }, + { KE_KEY, 0x21, KEY_VOLUMEDOWN }, + { KE_KEY, 0x22, KEY_REWIND }, + { KE_KEY, 0x23, KEY_FORWARD }, + 
{ KE_KEY, 0x24, KEY_PLAYPAUSE }, + { KE_KEY, 0x25, KEY_STOPCD }, + { KE_KEY, 0x31, KEY_MAIL }, + { KE_KEY, 0x36, KEY_WWW }, + { KE_KEY, 0x37, KEY_SCREEN }, /* Display on/off */ + { KE_KEY, 0x40, KEY_WLAN }, + { KE_KEY, 0x49, KEY_CONFIG }, + { KE_KEY, 0x4a, KEY_CLOSE }, /* lid close */ + { KE_KEY, 0x4b, KEY_OPEN }, /* lid open */ + { KE_KEY, 0x6a, KEY_CONFIG }, + { KE_KEY, 0x6d, KEY_POWER }, + { KE_KEY, 0x71, KEY_STOPCD }, + { KE_KEY, 0x72, KEY_PLAYPAUSE }, + { KE_KEY, 0x74, KEY_REWIND }, + { KE_KEY, 0x78, KEY_FORWARD }, + { KE_WIFI, 0x30, 0 }, + { KE_BLUETOOTH, 0x44, 0 }, + { KE_END, 0 } +}; + /* * If your machine is not here (which is currently rather likely), please send * a list of buttons and their key codes (reported when loading this module @@ -880,15 +916,17 @@ static struct dmi_system_id dmi_ids[] __ static int __init select_keymap(void) { + dmi_check_system(dmi_ids); if (keymap_name != NULL) { if (strcmp (keymap_name, "1557/MS2141") == 0) keymap = keymap_wistron_ms2141; + else if (strcmp (keymap_name, "generic") == 0) + keymap = keymap_wistron_generic; else { printk(KERN_ERR "wistron_btns: Keymap unknown\n"); return -EINVAL; } } - dmi_check_system(dmi_ids); if (keymap == NULL) { if (!force) { printk(KERN_ERR "wistron_btns: System unknown\n");
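The reordering in select_keymap() above (DMI detection first, then the keymap= parameter) means an explicit parameter overrides autodetection. A table-driven sketch of that precedence is shown below. It is not the driver's code: select_keymap_sketch, the keymap_id enum and the named_keymaps table are hypothetical, and the result of DMI matching is passed in as an argument instead of being looked up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum keymap_id { KEYMAP_NONE, KEYMAP_MS2141, KEYMAP_GENERIC, KEYMAP_DMI };

/* Names accepted by the hypothetical keymap= parameter. */
static const struct {
	const char *name;
	enum keymap_id id;
} named_keymaps[] = {
	{ "1557/MS2141", KEYMAP_MS2141 },
	{ "generic",     KEYMAP_GENERIC },
};

/*
 * dmi_match is whatever autodetection found (KEYMAP_NONE if nothing).
 * Returns 0 on success, -1 for an unknown name (the driver returns
 * -EINVAL in that case).
 */
static int select_keymap_sketch(const char *keymap_name,
				enum keymap_id dmi_match,
				enum keymap_id *out)
{
	size_t i;

	assert(out != NULL);

	*out = dmi_match;		/* autodetection runs first ... */
	if (keymap_name == NULL)
		return 0;

	/* ... and an explicit name, if recognised, overrides it. */
	for (i = 0; i < sizeof(named_keymaps) / sizeof(named_keymaps[0]); i++) {
		if (strcmp(keymap_name, named_keymaps[i].name) == 0) {
			*out = named_keymaps[i].id;
			return 0;
		}
	}
	return -1;
}
```

A caller would still have to handle the KEYMAP_NONE result, which corresponds to the "System unknown" path in the driver unless force is set.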
[PATCH 0/2] wistron_btns: More keymaps
Hello,

As a sequel to my patch "Wistron button support for TravelMate 610" of last week, here is a bigger addition of keymaps for the wistron_btns. Patch 1 adds all the database of acerhk which fits this driver (about 25 more laptops). Patch 2 adds a generic map that should fit most users but has the disadvantage of not being automatic.

Dmitry, I've tried to make them against your tree. Still, if they don't apply cleanly, just tell me and I'll try harder!

See you,
Eric
Re: Linux 2.6.20.3
On 3/13/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Nish Aravamudan" <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 14:58:24 -0700
>
> > On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> > > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > > > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > > > upgrade.
> > > >
> > > > The diffstat and short summary of the fixes are below.
> > > >
> > > > I'll also be replying to this message with a copy of the patch between
> > > > 2.6.20.2 and 2.6.20.3.
> > >
> > > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:
> >
> > err, duh -- this is a Sun Ultra 60, debian testing install.
>
> Figure out if 2.6.20.2 does it too, then please try to git bisect it
> down further.

Yep, that's the plan, just wanted to make folks aware.

> I took a quick look and the two sparc64 commits between 2.6.20.1
> and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld
> fix which is for a driver for hardware your ultra60 doesn't have. :)
>
> There is a decent amount of raid and nfs fixes in here, do you
> use either?

Neither.

> Another commit that might be relevant is:
>
> commit 530b09160744a12450fdacb2b78779c9830a29c8
> Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]>
> Date: Thu Mar 1 19:02:55 2007 -0500
>
>     tty_io: fix race in master pty close/slave pty close path
>
> Hmmm...
>
> Please let us know if you can narrow it down further.

Building 2.6.20.2 right now, will let you know.

Thanks,
Nish
Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux
On Tue, Mar 13, 2007 at 10:38:43PM +0100, Rodolfo Giometti wrote:
> here my new patch for PPS support in Linux.
>
> I tried to follow your suggestions as much as possible! Please let me
> know if this new version could be more acceptable.

I have tried out 3.0.0-rc2 which seems to work pretty well so far (when combined with the patches to the jsm driver I just posted). It took some work to get ntp's refclock_nmea to work though, since the patch that is linked to from the linuxpps page seems out of date.

Here is the patch that seems to be working for me, although I am still testing it. Given you know the linuxpps code better perhaps you can see if it looks sane to you.

--- ntpd/refclock_nmea.c.ori 2007-03-13 18:38:01.0 -0400
+++ ntpd/refclock_nmea.c 2007-03-13 18:44:47.0 -0400
@@ -79,6 +79,7 @@
 #define RANGEGATE	50	/* range gate (ns) */
 #define LENNMEA		75	/* min timecode length */
+#define LENPPS		PPS_MAX_NAME_LEN

 /*
  * Tables to compute the ddd of year form icky dd/mm timecode. Viva la
@@ -99,6 +100,7 @@
 	pps_params_t pps_params; /* pps parameters */
 	pps_info_t pps_info;	/* last pps data */
 	pps_handle_t handle;	/* pps handlebars */
+	int handle_created;	/* pps handle created flag */
 #endif /* HAVE_PPSAPI */
 };

@@ -147,6 +149,11 @@
 	register struct nmeaunit *up;
 	struct refclockproc *pp;
 	int fd;
+#ifdef PPS_HAVE_FINDPATH
+	char id[LENPPS] = "",
+	     path[LENPPS],
+	     mylink[LENPPS] = "";	/* just a default device */
+#endif /* PPS_HAVE_FINDPATH */
 	char device[20];

 	/*
@@ -201,7 +208,20 @@
 #else
 		return (0);
 #endif
-	}
+	} else {
+		struct serial_struct ss;
+		if (ioctl(fd, TIOCGSERIAL, &ss) < 0 ||
+		    (
+		     ss.flags |= ASYNC_HARDPPS_CD,
+		     ioctl(fd, TIOCSSERIAL, &ss)) < 0) {
+			msyslog(LOG_NOTICE, "refclock_nmea: TIOCSSERIAL fd %d, %m", fd);
+			msyslog(LOG_NOTICE,
+				"refclock_nmea: optional PPS processing not available");
+		} else {
+			msyslog(LOG_INFO,
+				"refclock_nmea: PPS detection on");
+		}
+	}

 	/*
 	 * Allocate and initialize unit structure
@@ -238,12 +258,26 @@
 	 * Start the PPSAPI interface if it is there. Default to use
 	 * the assert edge and do not enable the kernel hardpps.
 	 */
+#ifdef PPS_HAVE_FINDPATH
+	/* Get the PPS source's real name */
+	//time_pps_readlink(mylink, LENPPS, path, LENPPS);
+	time_pps_readlink(device, LENPPS, path, LENPPS);
+
+	/* Try to find the source */
+	fd = time_pps_findpath(path, LENPPS, id, LENPPS);
+	if (fd < 0) {
+		msyslog(LOG_ERR, "refclock_nmea: cannot find PPS path \"%s\" in the system", path);
+		return (0);
+	}
+	msyslog(LOG_INFO, "refclock_nmea: found PPS source \"%s\" at id #%d on \"%s\"", path, fd, id);
+#endif /* PPS_HAVE_FINDPATH */
 	if (time_pps_create(fd, &up->handle) < 0) {
-		up->handle = 0;
+		up->handle_created = 0;
 		msyslog(LOG_ERR, "refclock_nmea: time_pps_create failed: %m");
 		return (1);
 	}
+	up->handle_created = ~0;
 	return(nmea_ppsapi(peer, 0, 0));
 #else
 	return (1);
@@ -265,8 +299,10 @@
 	pp = peer->procptr;
 	up = (struct nmeaunit *)pp->unitptr;
 #ifdef HAVE_PPSAPI
-	if (up->handle != 0)
+	if (up->handle_created) {
 		time_pps_destroy(up->handle);
+		up->handle_created = 0;
+	}
 #endif /* HAVE_PPSAPI */
 	io_closeclock(&pp->io);
 	free(up);
@@ -374,7 +410,7 @@
 	/*
 	 * Convert the timespec nanoseconds field to ntp l_fp units.
 	 */
-	if (up->handle == 0)
+	if (!up->handle_created)
 		return (0);
 	timeout.tv_sec = 0;
 	timeout.tv_nsec = 0;

--
Len Sorensen
Re: [5/6] 2.6.21-rc3: known regressions
Hi,

On Tue, Mar 13, 2007 at 04:56:24PM +0100, Tomáš Janoušek wrote:
> On Tue, Mar 13, 2007 at 04:51:39PM +0100, [EMAIL PROTECTED] wrote:
> > Can you please try to compile without nohz and without hrtimers and try it
> > again?
>
> A colleague told me to try this yesterday and if I remember correctly, it did
> not help. I may try it again because I'm not sure whether it wasn't some

Ok, this was bullshit. Turning nohz and hrtimers off really does solve the issue with having to press keys. Seems like yesterday's check was for the other issue and I just pressed the keys automatically, remembering that I had to.

Sorry,
--
Tomáš Janoušek, a.k.a. Liskni_si, http://work.lisk.in/
Re: module.h and moduleparam.h: more header file pedantry
On Wed, 14 Mar 2007, Alexey Dobriyan wrote:

> On Mon, Mar 12, 2007 at 12:59:20PM -0400, Robert P. J. Day wrote:
> > to my surprise, i learned only today that module.h includes
> > moduleparam.h, which flies in the face of all of the documentation
> > i've ever read which was adamant that i *had* to include moduleparam.h
> > if i was using parameters. i'm guessing this comes as a surprise to
> > the 400+ header files which include both unnecessarily.
> >
> > so ... in a perfect world, should a module source file that doesn't
> > use parameters *at all* need to include moduleparam.h?
>
> Probably not.
>
> > as it stands
> > now, yes, it does, given some ugly inter-dependencies between the two
> > files.
> >
> > so, at the very least, programmers can stop including moduleparam.h,
> > unless there's a cleaner way to do all that.
>
> Regardless of what you do: cross-compile test!
>
> After aforementioned removal and adding "struct kernel_param;"
>
> + akmk arm-assabet -k
>   CHK include/linux/version.h
> make[2]: `include/asm-arm/mach-types.h' is up to date.
>   Using /home/linux/linux-irq-flags-t as source for kernel
>   GEN /home/linux/build/arm-assabet/Makefile
>   CHK include/linux/utsrelease.h
>   CHK include/linux/compile.h
>   CC arch/arm/nwfpe/fpmodule.o
> arch/arm/nwfpe/fpmodule.c:179: error: syntax error before string constant
> arch/arm/nwfpe/fpmodule.c:179: warning: type defaults to `int' in declaration of `__MODULE_INFO'
> arch/arm/nwfpe/fpmodule.c:179: warning: function declaration isn't a prototype
> arch/arm/nwfpe/fpmodule.c:179: warning: data definition has no type or storage class

oh, i've already been by that and figured out what's going on. i'm going to summarize this on the KJ wiki. it's really quite the mess.

rday
--
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page
jsm driver fix for linuxpps support
The jsm driver doesn't currently use the uart_handle_*_change helper functions, which are the obvious place for things like linuxpps to tie into (which it now does of course), and as a result the jsm driver cannot be used with linuxpps or anything else that ties into the serial_core helper functions.

This patch adds calls to these helper functions whenever the value they manage changes. The actual storage of the state is not modified, since the jsm driver caches the current settings (the 8250 driver reads them every time a user asks for the state) and only updates them whenever they change.

Signed-off-by: Len Sorensen <[EMAIL PROTECTED]>

--- a/drivers/serial/jsm/jsm_neo.c 2007-03-01 10:31:28.0 -0500
+++ b/drivers/serial/jsm/jsm_neo.c 2007-03-01 10:18:16.0 -0500
@@ -592,8 +592,13 @@
 		return;

 	/* Scrub off lower bits. They signify delta's, which I don't care about */
-	msignals &= 0xf0;
+	/* Keep DDCD and DDSR though */
+	msignals &= 0xf8;

+	if (msignals & UART_MSR_DDCD)
+		uart_handle_dcd_change(&ch->uart_port, msignals & UART_MSR_DCD);
+	if (msignals & UART_MSR_DDSR)
+		uart_handle_cts_change(&ch->uart_port, msignals & UART_MSR_CTS);
 	if (msignals & UART_MSR_DCD)
 		ch->ch_mistat |= UART_MSR_DCD;
 	else
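For anyone checking the mask in the hunk above, here is a small sketch of which modem status bits survive a given mask. It assumes the jsm hardware follows the standard 16550 MSR bit layout (the values below are the ones in include/linux/serial_reg.h); scrub_msr() is a hypothetical helper for illustration, not driver code. Under that assumption, 0xf8 keeps DDCD (0x08) but still clears DDSR (0x02) and DCTS (0x01), which may be worth a second look.

```c
/* Standard 16550 modem status register bits, as in
 * include/linux/serial_reg.h. Whether the jsm hardware uses exactly
 * this layout is an assumption of this sketch. */
#define UART_MSR_DCD	0x80	/* data carrier detect */
#define UART_MSR_RI	0x40	/* ring indicator */
#define UART_MSR_DSR	0x20	/* data set ready */
#define UART_MSR_CTS	0x10	/* clear to send */
#define UART_MSR_DDCD	0x08	/* delta DCD */
#define UART_MSR_TERI	0x04	/* trailing edge ring indicator */
#define UART_MSR_DDSR	0x02	/* delta DSR */
#define UART_MSR_DCTS	0x01	/* delta CTS */

/* Hypothetical helper: apply a scrub mask to a raw MSR value. */
static unsigned int scrub_msr(unsigned int msignals, unsigned int mask)
{
	return msignals & mask;
}
```

With every bit set on input, scrub_msr(0xff, 0xf8) still has UART_MSR_DDCD set but no longer has UART_MSR_DDSR or UART_MSR_DCTS set.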
Small fixes for jsm driver
The jsm driver fails when you try to use the TIOCSSERIAL ioctl. The reason is that the driver never sets uart_port.uartclk, causing the data received using TIOCGSERIAL to not match the internal state of the driver. This patch fixes the problem by setting uartclk to the value used by the serial_core (16 times the baud base).

Signed-off-by: Len Sorensen <[EMAIL PROTECTED]>

--- a/drivers/serial/jsm/jsm_tty.c 2007-03-13 15:53:39.0 -0400
+++ b/drivers/serial/jsm/jsm_tty.c 2007-03-13 15:55:15.0 -0400
@@ -471,6 +471,7 @@
 			continue;

 		brd->channels[i]->uart_port.irq = brd->irq;
+		brd->channels[i]->uart_port.uartclk = 14745600;
 		brd->channels[i]->uart_port.type = PORT_JSM;
 		brd->channels[i]->uart_port.iotype = UPIO_MEM;
 		brd->channels[i]->uart_port.membase = brd->re_map_membase;
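As a quick check of the constant in the patch: serial_core reports baud_base as uartclk / 16, so a uartclk of 14745600 corresponds to a baud base of 921600. A minimal sketch of that arithmetic follows; the two helper names are hypothetical and exist only to restate the "16 times the baud base" relationship from the description.

```c
/* Hypothetical helpers, not driver code: they only restate the
 * relationship between uartclk and baud_base described above. */
static unsigned int uartclk_from_baud_base(unsigned int baud_base)
{
	return baud_base * 16;
}

static unsigned int baud_base_from_uartclk(unsigned int uartclk)
{
	return uartclk / 16;
}
```

If uartclk is left at 0, the baud base reported by TIOCGSERIAL is 0 as well, so writing the same structure back with TIOCSSERIAL looks like a request to change the clock.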
Re: [QUICKLIST 0/4] Arch independent quicklists V2
>>>>> "Jeremy" == Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

Jeremy> And do the same in pte pages for actual mapped pages? Or do
Jeremy> you think they would be too densely populated for it to be
Jeremy> worthwhile?

We've been doing some measurements on how densely clumped ptes are. On 32-bit platforms, they're pretty dense. On IA64, quite a bit sparser, depending on the workload of course. I think that's mostly because of the larger pagesize on IA64 -- with 64k pages, you don't need very many to map a small object.

I'm hoping IanW can give more details.
Re: Linux 2.6.20.3
From: "Nish Aravamudan" <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 14:58:24 -0700

> On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> > On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > > upgrade.
> > >
> > > The diffstat and short summary of the fixes are below.
> > >
> > > I'll also be replying to this message with a copy of the patch between
> > > 2.6.20.2 and 2.6.20.3.
> >
> > Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:
>
> err, duh -- this is a Sun Ultra 60, debian testing install.

Figure out if 2.6.20.2 does it too, then please try to git bisect it down further.

I took a quick look and the two sparc64 commits between 2.6.20.1 and 2.6.20.2 are benign, a fix for E450 interrupts and a kenvctrld fix which is for a driver for hardware your ultra60 doesn't have. :)

There is a decent amount of raid and nfs fixes in here, do you use either?

Another commit that might be relevant is:

commit 530b09160744a12450fdacb2b78779c9830a29c8
Author: Aristeu Sergio Rozanski Filho <[EMAIL PROTECTED]>
Date: Thu Mar 1 19:02:55 2007 -0500

    tty_io: fix race in master pty close/slave pty close path

Hmmm...

Please let us know if you can narrow it down further.
FF layer restrictions [Was: [PATCH 1/1] Input: add sensable phantom driver]
Why did you remove all Cced people? Anyway, I filtered some of them out.

johann deneux wrote:
> You are right, the direction in ff_effect is meant to be an angle.
> A dirty solution would be to use the 16 bits as two 8-bit angles. Or

That would be a problem as I need 3x 16 bits.

> maybe we should change the API. I don't think there are many
> applications using force feedback yet, so maybe that should be ok?
> If we change the API, we should remove the assumption that a device has at
> most two axes to render effects. We could for instance have a magnitude
> argument for each axis which is capable of rendering effects.
> That might be necessary even for more common gaming devices like racing
> wheels: One can think pedals could also be capable of force feedback some
> day, not just the steering wheel.

I can do that, but in that case I need to know how people (especially the input folks) want me to do it...

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
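To make the two options in this exchange concrete, here is a rough sketch. It assumes only what the thread states, namely that the direction field is 16 bits wide; pack_two_angles(), the accessors and struct ff_axes_sketch are hypothetical names made up for illustration and are not part of the input layer.

```c
#include <stdint.h>

/* "Dirty solution": squeeze two 8-bit angles into one 16-bit
 * direction value. Hypothetical helpers, not an existing API. */
static uint16_t pack_two_angles(uint8_t first, uint8_t second)
{
	return (uint16_t)((first << 8) | second);
}

static uint8_t first_angle(uint16_t direction)
{
	return (uint8_t)(direction >> 8);
}

static uint8_t second_angle(uint16_t direction)
{
	return (uint8_t)(direction & 0xff);
}

/* API-change idea: one signed magnitude per force-capable axis.
 * The axis count of 3 matches the "3x 16 bits" requirement. */
#define FF_SKETCH_MAX_AXES 3

struct ff_axes_sketch {
	int16_t magnitude[FF_SKETCH_MAX_AXES];
};
```

The packed form loses resolution (256 steps per angle instead of 65536) and still has no room for a third 16-bit value, which is why the per-axis layout comes up at all.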
[ANNOUNCE] iproute2 2.6.20-070313
This is an experimental update to the iproute2 command set. The version number includes the kernel version to denote what features are supported. The same source should build on older systems, but obviously the newer kernel features won't be available. As much as possible, this package tries to be source compatible across releases.

It can be downloaded from:
  http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.20-070313.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

For more info on iproute2 see:
  http://linux-net.osdl.org/index.php/Iproute2

Changes:

Jamal Hadi Salim:
  update rest to use nl_mgrp
  nl_mgrp to crap if base multicast groups exceeded
  Old bug on tc

Mike Frysinger:
  do not ignore build failures in subdirs of iproute2

Noriaki TAKAMIYA:
  enabled to manipulate the flags of IFA_F_HOMEADDRESS or IFA_F_NODAD from ip.

Patrick McHardy:
  tbf: fix latency printing
  Use tc_calc_xmittime() where appropriate
  Introduce tc_calc_xmitsize and use where appropriate
  Introduce TIME_UNITS_PER_SEC to represent internal clock resolution
  Replace "usec" by "time" in function names
  Add sprint_ticks() function and use in CBQ
  Handle different kernel clock resolutions
  Increase internal clock resolution to nsec

Stephen Hemminger:
  netem use read/write for changes
  fix tc-pfifo and tc-bfifo man pages
  iptables library fix
  TC bfifo man page
  Use kernel headers from 2.6.20.y

Thomas Hisch:
  Fixes use of uninitialized string
Re: [patch 3/8] per backing_dev dirty and writeback page accounting
On Tue, Mar 13, 2007 at 09:21:59AM +0100, Miklos Szeredi wrote:
> > > read request
> > > sys_write
> > > mutex_lock(i_mutex)
> > > ...
> > > balance_dirty_pages
> > > submit write requests
> > > loop ... write requests completed ... dirty still over limit ...
> > > ... loop forever
> >
> > Hmmm - the situation in balance_dirty_pages() after an attempt
> > to writeback_inodes(&wbc) that has written nothing because there
> > is nothing to write would be:
> >
> > wbc->nr_write == write_chunk &&
> > wbc->pages_skipped == 0 &&
> > wbc->encountered_congestion == 0 &&
> > !bdi_congested(wbc->bdi)
> >
> > What happens if you make that an exit condition to the loop?
>
> That's almost right. The only problem is that even if there's no
> congestion, the device queue can be holding a great amount of yet
> unwritten pages. So exiting on this condition would mean, that
> dirty+writeback could go way over the threshold.

Only if the queue depth is not bound. Queue depths are bound and so the distance we can go over the threshold is limited. This is the fundamental principle on which the throttling is based.

Hence, if the queue is not full, then we will have either written dirty pages to it (i.e. wbc->nr_write != write_chunk so we will throttle or continue normally if write_chunk was written) or we have no more dirty pages left.

Having no dirty pages left on the bdi and it not being congested means we effectively have a clean, idle bdi. We should not be trying to throttle writeback here - we can't do anything to improve the situation by continuing to try to do writeback on this bdi, so we may as well give up and let the writer continue. Once we have dirty pages on the bdi, we'll get throttled appropriately.

The point I'm making here is that if the bdi is not congested, any pages dirtied on that bdi can be cleaned _quickly_ and so writing more pages to it isn't a big deal even if we are over the global dirty threshold.

Remember, the global dirty threshold is not really a hard limit - it's a threshold at which we change behaviour. Throttling idle bdi's does not contribute usefully to reducing the number of dirty pages in the system; all it really does is deny service to devices that could otherwise be doing useful work.

> How much this would be a problem? I don't know, I guess it depends on
> many things: how many queues, how many requests per queue, how many
> bytes per request.

Right, and most ppl don't have enough devices in their system for this to be a problem. Even those of us that do have enough devices for this to potentially be a problem usually have enough RAM in the machine so that it is not a problem.

> > Or alternatively, adding another bit to the wbc structure to
> > say "there was nothing to do" and setting that if we find
> > list_empty(&sb->s_dirty) when trying to flush dirty inodes."
> >
> > [ FWIW, this may also solve another problem of fast block devices
> > being throttled incorrectly when a slow block dev is consuming
> > all the dirty pages... ]
>
> There may be a patch floating around, which I think basically does
> this, but only as long as the dirty+writeback are over a soft limit,
> but under the hard limit.
>
> When over the the hard limit, balance_dirty_pages still loops until
> dirty+writeback go below the threshold.

The difference between the two methods is that if there is any hard limit that results in balance_dirty_pages looping then you have a potential deadlock. Hence the soft+hard limits will reduce the occurrence but not remove the deadlock. Breaking out of the loop when there is nothing to do simply means we'll reenter again with something to do very shortly (and *then* throttle) if the process continues to write.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
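The exit condition quoted in this mail can be written as a single predicate. The sketch below is only an illustration: struct wbc_sketch is a hypothetical stand-in carrying the four values named in the mail, with field names copied from it, and is not the kernel's struct writeback_control; the congestion state is passed in as a plain flag instead of calling bdi_congested().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values named in the mail above.
 * The real writeback_control has more members. */
struct wbc_sketch {
	long nr_write;			/* write budget left after the pass */
	long pages_skipped;		/* pages the pass had to skip */
	bool encountered_congestion;	/* pass ran into a congested queue */
	bool bdi_congested;		/* stands in for bdi_congested(wbc->bdi) */
};

/* True when a writeback pass wrote nothing because there was nothing
 * to write and the device is not congested, i.e. a clean, idle bdi. */
static bool bdi_clean_and_idle(const struct wbc_sketch *wbc, long write_chunk)
{
	return wbc->nr_write == write_chunk &&
	       wbc->pages_skipped == 0 &&
	       !wbc->encountered_congestion &&
	       !wbc->bdi_congested;
}
```

A caller looping in the style of balance_dirty_pages() would break out when this returns true, rather than waiting for the global count to fall, and would be throttled again on re-entry once the bdi has dirty pages.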
Re: 2.6.21-rc3-mm1 RSDL results
Con Kolivas wrote:
> On Wednesday 14 March 2007 05:21, Mark Lord wrote:
>> Con Kolivas wrote:
>>> Can you try the new version of RSDL. Assuming it doesn't oops on you it
>>> has some accounting bugfixes which may have been biting you.
>>
>> Retesting today with 2.6.21-rc3-git7 + 2.6.21-rc3-sched-rsdl-0.30.patch.
>>
>> Still not pleasant to use the GUI with a kernel build (-j1 or -j2)
>> happening unless the build is manually "nice'd".
>>
>> Also, accounting looks weird in top(1). With a 100% busy machine,
>> top will show something like this:
>>
>> top - 14:20:11 up 10:22,  1 user,  load average: 2.65, 2.80, 2.18
>> Tasks: 134 total,   4 running, 128 sleeping,   0 stopped,   2 zombie
>> Cpu(s): 68.7% us,  6.7% sy, 24.7% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
>> Mem:   2076964k total,  2002560k used,    74404k free,   148924k buffers
>> Swap:  2409740k total,      244k used,  2409496k free,  1448876k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  1824 root      36  10 11748 7244 1936 R  4.0  0.3   0:00.12 cc1
>>  1845 root      31   0  8080 5272 1412 R  1.7  0.3   0:00.05 cc1
>>  4139 root      20   0  176m  35m 6860 S  1.3  1.7  18:59.35 Xorg
>> 29381 root      20   0 33712  16m  12m R  1.0  0.8   0:27.24 konsole
>>     3 root      20   0     0    0    0 S  0.3  0.0   0:00.49 events/0
>>  1529 root      20   0  2556 1460  752 S  0.3  0.1   0:00.05 make
>> 14623 root      20   0  2200 1144  860 R  0.3  0.1   0:00.89 top
>>     1 root      20   0  1568  532  464 S  0.0  0.0   0:00.22 init
>>     2 root      39  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/0
>>     4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khelper
>>     5 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthread
>>
>> Mmm.. I wonder where all of that 100% CPU went to..
>> the busiest tasks are only showing up as 4.0% and 1.7%
>> (when in fact they are using near 100%).
>
> Nothing ever looks like it stays running for very long. That would be enough
> to account for this sort of top picture.

Sorry, I just don't buy that one. This was a 2-second sampling interval in top. top(1) is a program that has to work, so if this scheduler breaks it like this, then we need to understand and fix top(1) or the scheduler.

> What HZ are you running? Do you usually run two makes at different nice
> levels?

This was HZ=1000, with NO_HZ. And, no, not normally different nice levels. Here I was just trying to keep the machine usable while building a couple of things.

Keep at it. Someday this might be good enough for mainline, but right now the stock scheduler beats it for my desktop (notebook) loads.

Cheers
Re: module.h and moduleparam.h: more header file pedantry
On Mon, Mar 12, 2007 at 12:59:20PM -0400, Robert P. J. Day wrote:
> to my surprise, i learned only today that module.h includes
> moduleparam.h, which flies in the face of all of the documentation
> i've ever read which was adamant that i *had* to include moduleparam.h
> if i was using parameters. i'm guessing this comes as a surprise to
> the 400+ header files which include both unnecessarily.
>
> so ... in a perfect world, should a module source file that doesn't
> use parameters *at all* need to include moduleparam.h?

Probably not.

> as it stands
> now, yes, it does, given some ugly inter-dependencies between the two
> files.
>
> so, at the very least, programmers can stop including moduleparam.h,
> unless there's a cleaner way to do all that.

Regardless of what you do: cross-compile test!

After aforementioned removal and adding "struct kernel_param;"

+ akmk arm-assabet -k
  CHK include/linux/version.h
make[2]: `include/asm-arm/mach-types.h' is up to date.
  Using /home/linux/linux-irq-flags-t as source for kernel
  GEN /home/linux/build/arm-assabet/Makefile
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CC arch/arm/nwfpe/fpmodule.o
arch/arm/nwfpe/fpmodule.c:179: error: syntax error before string constant
arch/arm/nwfpe/fpmodule.c:179: warning: type defaults to `int' in declaration of `__MODULE_INFO'
arch/arm/nwfpe/fpmodule.c:179: warning: function declaration isn't a prototype
arch/arm/nwfpe/fpmodule.c:179: warning: data definition has no type or storage class
Re: Stolen and degraded time and schedulers
Daniel Walker wrote:
> The frequency tracking you mention is done to some extent inside the
> timekeeping adjustment functions, but I'm not sure it's totally accurate
> for non-timekeeping, and it also tracks things like interrupt latency.
> Tracking frequency changes where it's important to get it right
> shouldn't be done I think ..
>
> If you want accurate time accounting, don't use the TSC .
>

I'm not sure I follow you here. Clocksources have the means to adjust the rate of time progression, mostly to warp the time for things like ntp. The stability or otherwise of the tsc is irrelevant. If you had a clocksource which was explicitly using the rate at which a CPU does work as a timebase, then using the same warping mechanism would allow you to model CPU speed changes.

> The sched_clock interface is basically a stripped down clocksource..
> I've implemented sched_clock as a clocksource in the past ..
>

Yes, that works. But a clocksource is strictly about measuring the progression of real time, and so doesn't generally measure how much work a CPU has done.

>> We currently have a sched_clock interface in paravirt_ops to deal with
>> the hypervisor aspect. It only occurred to me this morning that cpufreq
>> presents exactly the same problem to the rest of the kernel, and so
>> there's room for a more general solution.
>>
>
> Are there other architecture which have this per-cpu clock frequency
> changing issue? I worked with several other architectures beyond just
> x86 and haven't seen this issue ..

Well, lots of cpus have dynamic frequencies. Any scheduler which maintains history will suffer the same problem, even on UP. If processes A and B are supposed to have the same priority and they both execute for 1ms of real time, did they make the same amount of progress? Not if the cpu changed speed in between. And any system which commonly runs virtualized (s390, power, etc) will need to deal with the notion of stolen time.

    J
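The "1ms for A, 1ms for B" question above can be put in numbers. This is a sketch under assumed figures only: the 2000 MHz and 1000 MHz clock speeds are invented for the example, and cycles_done() and scaled_usecs() are hypothetical helpers, not kernel interfaces.

```c
/* Work done in a time slice: elapsed microseconds times clock in MHz
 * gives CPU cycles. Hypothetical helper for illustration. */
static unsigned long long cycles_done(unsigned long long usecs,
				      unsigned int mhz)
{
	return usecs * mhz;
}

/* The same slice expressed as time at a reference frequency, which is
 * roughly what a work-based timebase would report to the scheduler. */
static unsigned long long scaled_usecs(unsigned long long usecs,
				       unsigned int cur_mhz,
				       unsigned int ref_mhz)
{
	return usecs * cur_mhz / ref_mhz;
}
```

If A runs its 1000 us at an assumed 2000 MHz and B runs its 1000 us after the cpu drops to 1000 MHz, A gets 2,000,000 cycles and B only 1,000,000; scaled to the 2000 MHz reference, B's slice counts as 500 us rather than 1000 us.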
Re: Linux 2.6.20.3
On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> It contains a number of bugfixes and all 2.6.20 users are recommended to
> upgrade.
>
> The diffstat and short summary of the fixes are below.
>
> I'll also be replying to this message with a copy of the patch between
> 2.6.20.2 and 2.6.20.3.

Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:

[ 199.361347] BUG: soft lockup detected on CPU#2!
smp_percpu_timer_interrupt+0xd4/0x180
tl0_irq14+0x1c/0x20
journal_add_journal_head+0x2c/0x1e0
journal_write_metadata_buffer+0x480/0x500
journal_commit_transaction+0xc38/0x1040
kjournald+0xc0/0x1e0
kthread+0xb0/0xc0
kernel_thread+0x38/0x60
keventd_create_kthread+0x20/0xa0

shortly after the serial console prompts for login.

Thanks,
Nish
Re: Linux 2.6.20.3
On 3/13/07, Nish Aravamudan <[EMAIL PROTECTED]> wrote:
> On 3/13/07, Greg KH <[EMAIL PROTECTED]> wrote:
> > We (the -stable team) are announcing the release of the 2.6.20.3 kernel.
> > It contains a number of bugfixes and all 2.6.20 users are recommended to
> > upgrade.
> >
> > The diffstat and short summary of the fixes are below.
> >
> > I'll also be replying to this message with a copy of the patch between
> > 2.6.20.2 and 2.6.20.3.
>
> Compared to 2.6.20.1 (will try 2.6.20.2 as well), I now get:

err, duh -- this is a Sun Ultra 60, debian testing install.

Thanks,
Nish