Re: -mm merge plans for 2.6.23
On Wed, 25 Jul 2007, Nick Piggin wrote: Eric St-Laurent wrote: On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote: > It certainly doesn't run for me ever. Always kind of a "that's not the > point" comment but I just keep wondering whenever I see anyone complain > about updatedb why the _hell_ they are running it in the first place. If > anyone who never uses "locate" for anything simply disable updatedb, the > problem will for a large part be solved. > > This not just meant as a cheap comment; while I can think of a few > similar loads even on the desktop (scanning a browser cache, a media > player indexing a large amount of media files, ...) I've never heard of > problems _other_ than updatedb. So just junk that crap and be happy. >From my POV there's two different problems discussed recently: - updatedb type of workloads that add tons of inodes and dentries in the slab caches which of course use the pagecache. - streaming large files (read or copying) that fill the pagecache with useless used-once data swap prefetch fix the first case, drop-behind fix the second case. OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix the updatedb problem very well, because if updatedb has caused swapout then it has filled memory, and swap prefetch doesn't run unless there is free memory (not to mention that updatedb would have paged out other files as well). And drop behind doesn't fix your usual problem where you are downloading from a server, because that is use-once write(2) data which is the problem. And this readahead-based drop behind also doesn't help if data you were reading happened to be a sequence of small files, or otherwise not in good readahead order. Not to say that neither fix some problems, but for such conceptually big changes, it should take a little more effort than a constructed test case and no consideration of the alternatives to get it merged. 
well, there appears to be a fairly large group of people who have subjective opinions that it helps them. but those were dismissed because they aren't measurements. so now the measurements of the constructed test case aren't acceptable. what sort of test case would be acceptable? David Lang - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: -mm merge plans for 2.6.23
On Wed, 25 Jul 2007, Rene Herman wrote:
> On 07/25/2007 07:12 AM, [EMAIL PROTECTED] wrote:
>> On Wed, 25 Jul 2007, Rene Herman wrote:
>>> It certainly doesn't run for me ever. Always kind of a "that's not the
>>> point" comment but I just keep wondering whenever I see anyone complain
>>> about updatedb why the _hell_ they are running it in the first place. If
>>> anyone who never uses "locate" for anything simply disable updatedb, the
>>> problem will for a large part be solved.
>>>
>>> This not just meant as a cheap comment; while I can think of a few
>>> similar loads even on the desktop (scanning a browser cache, a media
>>> player indexing a large amount of media files, ...) I've never heard of
>>> problems _other_ than updatedb. So just junk that crap and be happy.
>>
>> but if you do use locate then the alternative becomes sitting around and
>> waiting for find to complete on a regular basis.
>
> Yes, but what's locate's usage scenario? I've never, ever wanted to use it.
> When do you know the name of something but not where it's located, other
> than situations which "which" wouldn't cover and after just having
> installed/unpacked something meaning locate doesn't know about it yet either?

which only finds executables that are in the path.

I commonly use locate to find config files (or sample config files) for packages that were installed at some point in the past with fairly default configs and now I want to go and tweak them. so I start reading documentation and then need to find out where $distro moved the files to this release (I commonly am working on machines with over a half dozen different distro releases, and none of them RedHat)

David Lang
Re: 2.6.23-rc1 regression: mm: fix fault vs invalidate race for linear mappings
Dave Airlie wrote: Is this with a binary-only module? We saw an issue with that in SLES9 where the module is returning a locked page from its nopage handler when it isn't really supposed to. It might be fixed in latest drivers, have you tried them? Doesn't sound like it he mentions radeon drm module which is open... OK. Well from the task trace, X is getting stuck on a locked page. And as it is never getting unlocked, I'd be almost positive it comes from a driver's nopage. I know some ATI driver did that in the past. Would the radeon drm module do anything similar? -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Print utsname on Oops on all architectures
On Thu, 5 Jul 2007 18:52:27 -0700 (PDT) Joshua Wise <[EMAIL PROTECTED]> wrote:

> Background:
> This patch is a follow-on to "Info dump on Oops or panic()" [1].
>
> On some architectures, the kernel printed some information on the running
> kernel, but not on all architectures. The information printed was generally
> the version and build number, but it was not located in a consistent place,
> and some architectures did not print it at all.
>
> Description:
> This patch uses the already-existing die_chain to print utsname information
> on Oops. This patch also removes the architecture-specific utsname
> printers. To avoid crashing the system further (and hence not printing the
> Oops) in the case where the system is so hopelessly smashed that utsname
> might be destroyed, we vsprintf the utsname data into a static buffer
> first, and then just print that on crash.
>
> Testing:
> I wrote a module that does a *(int*)0 = 0; and observed that I got my
> utsname data printed.
>
> Potential impact:
> This adds another line to the Oops output, causing the first few lines to
> potentially scroll off the screen. This also adds a few more pointer
> dereferences in the Oops path, because it adds to the die_chain notifier
> chain, reducing the likelihood that the Oops will be printed if there is
> very bad memory corruption.

There are strange happenings due to this patch on i386: Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 hdc: max request size: 128KiB hdc: 156355584 sectors (80054 MB) w/1819KiB Cache, CHS=65535/16/63, UDMA(33) hdc: cache flushes supported hdc:<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 hdc1 hdc2 hdc3 hdc4 <<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 
22:34:40 PDT 2007 i686 hdc5<0>Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 hdc6 > Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 initcall 0xc052a060: idedisk_init+0x0/0x10() returned 0. initcall 0xc052a060 ran for 38 msecs: idedisk_init+0x0/0x10() Calling initcall 0xc052a070: ide_cdrom_init+0x0/0x10() Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 Linux 2.6.23-rc1-mm1 #7 SMP Tue Jul 24 22:34:40 PDT 2007 i686 initcall 0xc052a070: ide_cdrom_init+0x0/0x10() returned 0. initcall 0xc052a070 ran for 3 msecs: ide_cdrom_init+0x0/0x10() Calling initcall 0xc052a080: idetape_init+0x0/0x90() initcall 0xc052a080: idetape_init+0x0/0x90() returned 0. initcall 0xc052a080 ran for 0 msecs: idetape_init+0x0/0x90() Calling initcall 0xc052a110: idefloppy_init+0x0/0x20() ide-floppy driver 0.99.newide initcall 0xc052a110: idefloppy_init+0x0/0x20()
Re: -mm merge plans for 2.6.23
Eric St-Laurent wrote: On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote: It certainly doesn't run for me ever. Always kind of a "that's not the point" comment but I just keep wondering whenever I see anyone complain about updatedb why the _hell_ they are running it in the first place. If anyone who never uses "locate" for anything simply disable updatedb, the problem will for a large part be solved. This not just meant as a cheap comment; while I can think of a few similar loads even on the desktop (scanning a browser cache, a media player indexing a large amount of media files, ...) I've never heard of problems _other_ than updatedb. So just junk that crap and be happy. From my POV there's two different problems discussed recently: - updatedb type of workloads that add tons of inodes and dentries in the slab caches which of course use the pagecache. - streaming large files (read or copying) that fill the pagecache with useless used-once data swap prefetch fix the first case, drop-behind fix the second case. OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix the updatedb problem very well, because if updatedb has caused swapout then it has filled memory, and swap prefetch doesn't run unless there is free memory (not to mention that updatedb would have paged out other files as well). And drop behind doesn't fix your usual problem where you are downloading from a server, because that is use-once write(2) data which is the problem. And this readahead-based drop behind also doesn't help if data you were reading happened to be a sequence of small files, or otherwise not in good readahead order. Not to say that neither fix some problems, but for such conceptually big changes, it should take a little more effort than a constructed test case and no consideration of the alternatives to get it merged. -- SUSE Labs, Novell Inc. 
Re: -mm merge plans for 2.6.23
On 07/25/2007 07:12 AM, [EMAIL PROTECTED] wrote:
> On Wed, 25 Jul 2007, Rene Herman wrote:
>> It certainly doesn't run for me ever. Always kind of a "that's not the
>> point" comment but I just keep wondering whenever I see anyone complain
>> about updatedb why the _hell_ they are running it in the first place. If
>> anyone who never uses "locate" for anything simply disable updatedb, the
>> problem will for a large part be solved.
>>
>> This not just meant as a cheap comment; while I can think of a few
>> similar loads even on the desktop (scanning a browser cache, a media
>> player indexing a large amount of media files, ...) I've never heard of
>> problems _other_ than updatedb. So just junk that crap and be happy.
>
> but if you do use locate then the alternative becomes sitting around and
> waiting for find to complete on a regular basis.

Yes, but what's locate's usage scenario? I've never, ever wanted to use it. When do you know the name of something but not where it's located, other than situations which "which" wouldn't cover and after just having installed/unpacked something meaning locate doesn't know about it yet either?

Rene.
Re: -mm merge plans for 2.6.23
On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:
> It certainly doesn't run for me ever. Always kind of a "that's not the
> point" comment but I just keep wondering whenever I see anyone complain
> about updatedb why the _hell_ they are running it in the first place. If
> anyone who never uses "locate" for anything simply disable updatedb, the
> problem will for a large part be solved.
>
> This not just meant as a cheap comment; while I can think of a few similar
> loads even on the desktop (scanning a browser cache, a media player indexing
> a large amount of media files, ...) I've never heard of problems _other_
> than updatedb. So just junk that crap and be happy.

From my POV there's two different problems discussed recently:

- updatedb type of workloads that add tons of inodes and dentries in the slab caches which of course use the pagecache.
- streaming large files (read or copying) that fill the pagecache with useless used-once data

swap prefetch fixes the first case, drop-behind fixes the second case.

Both have the same symptoms but the cause is different.

Personally updatedb doesn't really hurt me. But I don't have that many files on my desktop. I've tried the swap prefetch patch in the past and it was not so noticeable for me. (I don't doubt it's helpful for others)

But every time I read or copy a large file around (usually from a server) the slowdown is noticeable for some moments.

I just wanted to point this out, if it wasn't clear enough for everyone. I hope both problems get fixed.

Best regards,

- Eric
Re: [RFC] fs/super.c: Why alloc_super use a static variable default_op?
On 7/25/07, Al Viro <[EMAIL PROTECTED]> wrote: On Wed, Jul 25, 2007 at 12:29:17PM +0800, rae l wrote: > But is it valuable? Compared to a waste of sizeof(struct super_block) > bytes memory. It's less that struct super_block, actually. > When some code want to refer fs_type->s_op, it almost always want to > refer some function pointer in s_op with fs_type->s_op->***, but all > pointers in default_op are all NULLs, what about this scenario? Yes, and? You still need one test instead of two. Which gets you more than 21 words used by that sucker, only in .text instead of .bss. > and if you do grep s_op in the source code, you will found nowhere > will want to test s_op or dependent on s_op not NULL. What? fs/inode.c: if (sb->s_op->alloc_inode) inode = sb->s_op->alloc_inode(sb); else inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL); and the same goes everywhere else. Of course we don't check for sb->s_op not being NULL - that's exactly why we are safe skipping such tests. Oh, Thank you. But there are also many other subsystems will do fs/dcache.c: void dput(struct dentry *dentry) if (dentry->d_op && dentry->d_op->d_delete) { Do you think it's worth optimizing it with a static d_op filled? we can add a static variable to d_alloc and set its initial d_op to this static variable? struct dentry *d_alloc(struct dentry * parent, const struct qstr *name) -- Denis Cheng Linux Application Developer "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
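[Editor's illustration] The "one test instead of two" point in the message above can be shown with a small userspace sketch. This is not the code in fs/super.c: the structures are cut down to a single hook, and `init_super`, `fs_ops`, `generic_inode` and both `alloc_inode_*` helpers are hypothetical names made up for the example. The assumption being illustrated is only that an ops pointer which always points at some (possibly all-NULL) table lets callers skip the NULL check on the pointer itself.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode { int from_fs; };
struct super_block;

struct super_operations {
	struct inode *(*alloc_inode)(struct super_block *sb);
};

struct super_block {
	const struct super_operations *s_op;
};

/* All hooks NULL: plays the role of the static default_op. */
static const struct super_operations default_op = { .alloc_inode = NULL };

static struct inode generic_inode = { 0 };
static struct inode fs_inode = { 1 };

static struct inode *fs_alloc_inode(struct super_block *sb)
{
	(void)sb;
	return &fs_inode;
}

/* A filesystem that does provide its own hook. */
static const struct super_operations fs_ops = { .alloc_inode = fs_alloc_inode };

/* Every superblock starts out pointing at default_op, never at NULL. */
static void init_super(struct super_block *sb)
{
	sb->s_op = &default_op;
}

/* One test: s_op itself is known to be non-NULL. */
static struct inode *alloc_inode_one_test(struct super_block *sb)
{
	if (sb->s_op->alloc_inode)
		return sb->s_op->alloc_inode(sb);
	return &generic_inode;
}

/* Two tests: the dput()-style pattern needed when the ops pointer may be NULL. */
static struct inode *alloc_inode_two_tests(struct super_block *sb)
{
	if (sb->s_op && sb->s_op->alloc_inode)
		return sb->s_op->alloc_inode(sb);
	return &generic_inode;
}
```

Both helpers return the same result for any superblock set up through `init_super()`; the difference is only the extra compare-and-branch at every call site, which is the .text cost being weighed against the size of the static table.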
Re: [PATCH 0/3] readahead drop behind and size adjustment
Eric St-Laurent wrote: On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote: I don't like this kind of conditional information going from something like readahead into page reclaim. Unless it is for readahead _specific_ data such as "I got these all wrong, so you can reclaim them" (which this isn't). But I don't like it as a use-once thing. The VM should be able to get that right. Question: How work the use-once code in the current kernel? Is there any? It doesn't quite work for me... What *I* think is supposed to happen is that newly read in pages get put on the inactive list, and unless they get accessed again before being reclaimed, they are allowed to fall off the end of the list without disturbing active data too much. I think there is a missing piece here, that we used to ease the reclaim pressure off the active list when the inactive list grows relatively much larger than it (which could indicate a lot of use-once pages in the system). Andrew got rid of that logic for some reason which I don't know, but I can't see that use-once would be terribly effective today (so your results don't surprise me too much). I think I've been banned from touching vmscan.c, but if you're keen to try a patch, I might be convinced to come out of retirement :) See my previous email today, I've done a small test case to demonstrate the problem and the effectiveness of Peter's patch. The only piece missing is the copy case (read once + write once). Regardless of how it's implemented, I think a similar mechanism must be added. This is a long standing issue. In the end, I think it's a pagecache resources allocation problem. the VM lacks fair-share limits between processes. The kernel doesn't have enough information to make the right decisions. You can refine or use more advanced page reclaim, but some fair-share splitting (like the CPU scheduler) between the processes must be present. Of course some process should have large or unlimited VM limits, like databases. 
Maybe the "containers" patchset and memory controller can help. With some specific configuration and/or a userspace daemon to adjust the limits on the fly. Independently, the basic large file streaming read (or copy) once cases should not trash the pagecache. Can we agree on that? One man's trash is another's treasure: some people will want the files to remain in cache because they'll use them again (copy it somewhere else, or start editing it after being copied or whatever). But yeah, we can probably do better at the sequential read/write case. I say, let's add some code to fix the problem. If we hear about any regression in some workloads, we can add a tunable to limit or disable its effects, _if_ a better compromised solution cannot be found. Sure, but let's figure out the workloads and look at all the alternatives first. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
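[Editor's illustration] For the streaming read-once case discussed above, an application can already approximate drop-behind from userspace with `posix_fadvise()`. The sketch below is not the readahead patch under discussion; it only shows the existing POSIX hint. `stream_once` is a hypothetical helper name, and the 64 KiB chunk size is an arbitrary assumption. The hints are advisory, so the kernel may ignore them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a file sequentially and, after each chunk, hint that the pages
 * just consumed will not be needed again.
 * Returns the number of bytes read, or -1 on error.
 */
static long long stream_once(const char *path)
{
	char buf[64 * 1024];	/* arbitrary chunk size for the example */
	long long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/* Hint that access will be sequential. */
	(void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		/* Advisory only: ask for the range just read to be dropped. */
		(void)posix_fadvise(fd, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
		total += n;
	}

	close(fd);
	return n < 0 ? -1 : total;
}
```

This covers only the read side. For the copy case (read once + write once) the written pages are dirty, so the same hint would only be useful after they have been written back, for example following `fdatasync()`.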
Re: [PATCH] x86_64 tce section mismatch
On Tue, Jul 24, 2007 at 02:17:02PM -0700, Randy Dunlap wrote: > From: Randy Dunlap <[EMAIL PROTECTED]> > > Fix section mismatch warnings: > these functions are called only from __init functions. > > WARNING: vmlinux.o(.text+0x1861c): Section mismatch: reference to > .init.text:free_bootmem (between 'free_tce_table' and 'build_tce_table') > WARNING: vmlinux.o(.text+0x187e5): Section mismatch: reference to > .init.text:__alloc_bootmem_low (between 'alloc_tce_table' and > 'kretprobe_trampoline_holder') > > Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> At some point in time we will need to support hotplug with IOMMU translation enabled, in which case they'll be called when hotplug happens as well, but in the mean time Signed-off-by: Muli Ben-Yehuda <[EMAIL PROTECTED]> I'll push it with the next Calgary update unless Andi picks it up first. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: -mm merge plans for 2.6.23
On Wed, 25 Jul 2007, Rene Herman wrote:
> On 07/25/2007 06:06 AM, Nick Piggin wrote:
>> Ray Lee wrote:
>>> Anyway, my point is that I worry that tuning for an unusual and
>>> infrequent workload (which updatedb certainly is), is the wrong way to
>>> go.
>>
>> Well it runs every day or so for every desktop Linux user, and it has
>> similarities with other workloads.
>
> It certainly doesn't run for me ever. Always kind of a "that's not the
> point" comment but I just keep wondering whenever I see anyone complain
> about updatedb why the _hell_ they are running it in the first place. If
> anyone who never uses "locate" for anything simply disable updatedb, the
> problem will for a large part be solved.
>
> This not just meant as a cheap comment; while I can think of a few similar
> loads even on the desktop (scanning a browser cache, a media player indexing
> a large amount of media files, ...) I've never heard of problems _other_
> than updatedb. So just junk that crap and be happy.

but if you do use locate then the alternative becomes sitting around and waiting for find to complete on a regular basis.

David Lang
Re: [PATCH][RFC] getting rid of stupid loop in BUG()
Keith Owens wrote: > Trent Piepho (on Tue, 24 Jul 2007 19:31:36 -0700 (PDT)) wrote: >> Adding __builtin_trap after the >> asm might be an ok fix. It will emit a spurious int 6, but that won't even >> be >> reached since the asm doesn't return, and it probably be less extra code than >> the loop. > > int 6 is a two byte instruction, the loop generates jmp with an 8 bit > offset, also two bytes. No change in code size. > INT 6 is #UD, so the __builtin_trap() replaces the ud2a as well as the loop. How far back was __builtin_trap() supported? -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
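[Editor's illustration] The effect of `__builtin_trap()` discussed above can be tried in userspace. This is a sketch, not the kernel's BUG() macro: `MY_BUG`, `checked_div` and `trap_in_child` are hypothetical names. It assumes a GCC-compatible compiler and a POSIX system. Because the builtin is known to the compiler not to return, the function below needs neither a trailing `for (;;);` loop nor a dummy return value after the trap.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for BUG(); not the kernel macro. */
#define MY_BUG() __builtin_trap()

static int checked_div(int a, int b)
{
	if (b != 0)
		return a / b;
	MY_BUG();
	/* No return and no endless loop needed: the trap does not return. */
}

/*
 * Run checked_div() in a child process.
 * Returns 1 if the child was killed by a signal, 0 if it exited
 * normally, -1 if fork() or waitpid() failed.
 */
static int trap_in_child(int a, int b)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		volatile int r = checked_div(a, b);
		(void)r;
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) ? 1 : 0;
}
```

Which signal the child receives depends on what the compiler emits for the trap on the target, so the helper only checks that the child died by a signal at all.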
Re: [BUG] firewire: mass-storage i/o-problems
On Tue, Jul 24, 2007 at 09:56:59PM +0200, Stefan Richter wrote: > Manuel Lauss wrote: > > Actually, copying data to the disk while playing/seeking through a moviefile > > which is also located on it is already enough. Forget the NFS thing... > > > > Afterwards the firewire_sbp2 module has to be rmmod-ed and modprobed again > > or it will continue to throw errors even for single reads. > > > > I hope this helps tracking it down... > > I tried this and similar tests on my main PC (PCIe based) and on an > Athlon/KM266 PC, with 1394b and 1394a hardware. Nothing happened, > except for a single "status write for unknown orb", followed by command > abort from which the disk immediately recovered. I did many tests and > it didn't happen again. I.e. it's probable that the supposed bug > happens here too, but very rarely. I tried 2.6.23 in the meantime, it's *MUCH* harder to trigger; in fact I had to skip through movies for ~10 minutes to get the orb timeout. The disk was inaccessible for a few seconds then recovered fine. > Could you (and everyone else who has repeated I/O errors with the new > drivers, but not with the old drivers) test the attached patches, one > patch at a time? They apply to 2.6.22. Will do. Thanks, Manuel Lauss - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
On Tue, 2007-07-24 at 11:25 -0700, Linus Torvalds wrote: > > On Tue, 24 Jul 2007, Andrew Morton wrote: > > > > I guess this was the bug: > > Looks very likely to me. Mike, Alexey, does this fix things for you? I don't have very much runtime on it yet, but yes, it seems to have. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: allow individual core dump methods to be unlimited when sending to a pipe
SuperH allmodconfig broke:

fs/binfmt_flat.c:83: warning: initialization from incompatible pointer type
fs/binfmt_flat.c:94: error: conflicting types for 'flat_core_dump'
fs/binfmt_flat.c:78: error: previous declaration of 'flat_core_dump' was here
fs/binfmt_flat.c:94: error: conflicting types for 'flat_core_dump'
fs/binfmt_flat.c:78: error: previous declaration of 'flat_core_dump' was here
fs/binfmt_flat.c: In function `decompress_exec':
fs/binfmt_flat.c:293: warning: label `out' defined but not used
fs/binfmt_flat.c: In function `load_flat_file':
fs/binfmt_flat.c:462: warning: unsigned int format, long int arg (arg 3)
fs/binfmt_flat.c:462: warning: unsigned int format, long int arg (arg 4)
fs/binfmt_flat.c:518: warning: comparison of distinct pointer types lacks a cast
fs/binfmt_flat.c:549: warning: passing arg 1 of `ksize' makes pointer from integer without a cast
fs/binfmt_flat.c:601: warning: passing arg 1 of `ksize' makes pointer from integer without a cast
fs/binfmt_flat.c: At top level:
fs/binfmt_flat.c:78: warning: 'flat_core_dump' used but never defined
fs/binfmt_flat.c:94: warning: 'flat_core_dump' defined but not used
Re: [PATCH] Fix corruption of memmap on IA64 SPARSEMEM when mem_section is not a power of 2
On Tue, 13 Mar 2007 10:42:02 + [EMAIL PROTECTED] (Mel Gorman) wrote:

> There are problems in the use of SPARSEMEM and pageblock flags that cause
> problems on ia64.
>
> The first part of the problem is that units are incorrect in
> SECTION_BLOCKFLAGS_BITS computation. This results in a map_section's
> section_mem_map being treated as part of a bitmap which isn't good. This
> was evident with an invalid virtual address when mem_init attempted to free
> bootmem pages while relinquishing control from the bootmem allocator.
>
> The second part of the problem occurs because the pageblock flags bitmap is
> located with the mem_section. The SECTIONS_PER_ROOT computation using
> sizeof (mem_section) may not be a power of 2 depending on the size of the
> bitmap. This renders masks and other such things not power of 2 base. This
> issue was seen with SPARSEMEM_EXTREME on ia64. This patch moves the bitmap
> outside of mem_section and uses a pointer instead in the mem_section. The
> bitmaps are allocated when the section is being initialised.
>
> Note that sparse_early_usemap_alloc() does not use alloc_remap() like
> sparse_early_mem_map_alloc(). The allocation required for the bitmap on x86,
> the only architecture that uses alloc_remap is typically smaller than a cache
> line. alloc_remap() pads out allocations to the cache size which would be
> a needless waste.
>
> Credit to Bob Picco for identifying the original problem and effecting a
> fix for the SECTION_BLOCKFLAGS_BITS calculation. Credit to Andy Whitcroft
> for devising the best way of allocating the bitmaps only when required for
> the section.

SuperH allmodconfig blew up:

mm/sparse.c: In function `sparse_init':
mm/sparse.c:482: error: implicit declaration of function `sparse_early_usemap_alloc'
mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
mm/sparse.c: In function `sparse_add_one_section':
mm/sparse.c:553: error: implicit declaration of function `__kmalloc_section_usemap'
mm/sparse.c:553: warning: assignment makes pointer from integer without a cast
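[Editor's illustration] The quoted patch description says that a SECTIONS_PER_ROOT value which is not a power of 2 "renders masks and other such things not power of 2 base". A small arithmetic sketch shows why that matters. The numbers are assumptions chosen for the example, not the real ia64 values: a 4096-byte root, and entry sizes of 16 and 24 bytes standing in for a mem_section without and with an embedded bitmap. All function names are hypothetical.

```c
#include <stddef.h>

#define ROOT_BYTES 4096u	/* hypothetical root size for the example */

/* Number of entries that fit into one root for a given entry size. */
static unsigned int per_root(size_t entry_size)
{
	return (unsigned int)(ROOT_BYTES / entry_size);
}

/* Index within a root via the mask shortcut: only valid if n is a power of 2. */
static unsigned int index_by_mask(unsigned int nr, unsigned int n)
{
	return nr & (n - 1);
}

/* Index within a root via a real modulo: valid for any n. */
static unsigned int index_by_mod(unsigned int nr, unsigned int n)
{
	return nr % n;
}

static int is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

With 16-byte entries there are 256 per root and the mask agrees with the modulo (section 300 maps to index 44 either way). With 24-byte entries there are 170 per root, the mask becomes 169, and section 6 is mapped to index 0 by the mask but to index 6 by the modulo, so two different sections end up on the same slot. Keeping the structure size a power of 2, as the patch does by moving the bitmap behind a pointer, avoids this.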
Re: -mm merge plans for 2.6.23
Rene Herman wrote: On 07/25/2007 06:06 AM, Nick Piggin wrote: Ray Lee wrote: Anyway, my point is that I worry that tuning for an unusual and infrequent workload (which updatedb certainly is), is the wrong way to go. Well it runs every day or so for every desktop Linux user, and it has similarities with other workloads. It certainly doesn't run for me ever. Always kind of a "that's not the point" comment but I just keep wondering whenever I see anyone complain about updatedb why the _hell_ they are running it in the first place. If anyone who never uses "locate" for anything simply disable updatedb, the problem will for a large part be solved. This not just meant as a cheap comment; while I can think of a few similar loads even on the desktop (scanning a browser cache, a media player indexing a large amount of media files, ...) I've never heard of problems _other_ than updatedb. So just junk that crap and be happy. OK fair point, but the counter point that there are real patterns that just use-once a lot of metadata (ls, for example. grep even.) -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: -mm merge plans for 2.6.23
On 07/25/2007 06:06 AM, Nick Piggin wrote: Ray Lee wrote: Anyway, my point is that I worry that tuning for an unusual and infrequent workload (which updatedb certainly is), is the wrong way to go. Well it runs every day or so for every desktop Linux user, and it has similarities with other workloads. It certainly doesn't run for me ever. Always kind of a "that's not the point" comment but I just keep wondering whenever I see anyone complain about updatedb why the _hell_ they are running it in the first place. If anyone who never uses "locate" for anything simply disable updatedb, the problem will for a large part be solved. This not just meant as a cheap comment; while I can think of a few similar loads even on the desktop (scanning a browser cache, a media player indexing a large amount of media files, ...) I've never heard of problems _other_ than updatedb. So just junk that crap and be happy. Rene. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/8] i386: bitops: Kill volatile-casting of memory addresses
Linus Torvalds wrote: On Tue, 24 Jul 2007, Benjamin Herrenschmidt wrote: Besides, as Nick pointed out, it prevents some valid optimizations. No it doesn't. Not the ones on the functions that just do an inline asm. The only valid optimization it might break is for "constant_test_bit()", which isn't even using inline asm. The constant case is probably most used (at least for page flags), and is most important for me. constant_test_bit may not be using inline asm, but the volatile pointer target means that it reloads the value and can't do much optimisation over it. BTW. once volatile goes away, i386 really should start using the C versions of __set_bit and __clear_bit as well IMO. (at least for the constant bitnr case), so gcc can potentially optimise better. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
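To make the constant_test_bit() point above concrete, here is a minimal userspace sketch. It is not the i386 implementation; the names test_bit_volatile, test_bit_plain and count_flags_* are invented for this illustration. With a volatile-qualified target the compiler has to perform one load per call, while the plain version leaves it free to load the word once and test several bits from a register.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Volatile-qualified target: every call must re-read memory. */
static inline int test_bit_volatile(unsigned int nr,
				    const volatile unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Plain target: the compiler may keep the word in a register. */
static inline int test_bit_plain(unsigned int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Three tests of the same word: three separate loads are required here... */
int count_flags_volatile(const unsigned long *flags)
{
	return test_bit_volatile(0, flags) + test_bit_volatile(1, flags) +
	       test_bit_volatile(2, flags);
}

/* ...while here the compiler is allowed to merge them into a single load. */
int count_flags_plain(const unsigned long *flags)
{
	return test_bit_plain(0, flags) + test_bit_plain(1, flags) +
	       test_bit_plain(2, flags);
}
```

Comparing the output of gcc -O2 -S for the two count_flags_* functions should show the difference in the number of loads, though the exact code depends on compiler version and target.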
[patch] oom: print points as unsigned long
In badness(), the automatic variable 'points' is unsigned long. Print it as such.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 mm/oom_kill.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -156,7 +156,7 @@ unsigned long badness(struct task_struct *p, unsigned long uptime)
 	}
 
 #ifdef DEBUG
-	printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
+	printk(KERN_DEBUG "OOMkill: task %d (%s) got %lu points\n",
 	p->pid, p->comm, points);
 #endif
 	return points;
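A small userspace stand-in for the change above, using snprintf() instead of printk(). The helper name format_points is hypothetical and exists only for this sketch; the point is simply that an unsigned long argument needs the %lu conversion.

```c
#include <stdio.h>

/*
 * Userspace stand-in for the printk() call in the patch: 'points' is
 * unsigned long, so the matching conversion is %lu. With %d the argument
 * type does not match the format and gcc's -Wformat warns about it.
 */
int format_points(char *buf, size_t len, int pid, const char *comm,
		  unsigned long points)
{
	return snprintf(buf, len, "OOMkill: task %d (%s) got %lu points\n",
			pid, comm, points);
}
```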
Re: -mm merge plans for 2.6.23
On Tue, 24 Jul 2007, Ray Lee wrote: On 7/23/07, Nick Piggin <[EMAIL PROTECTED]> wrote: Ray Lee wrote: Looking at your past email, you have a 1GB desktop system and your overnight updatedb run is causing stuff to get swapped out such that swap prefetch makes it significantly better. This is really intriguing to me, and I would hope we can start by making this particular workload "not suck" without swap prefetch (and hopefully make it even better than it currently is with swap prefetch because we'll try not to evict useful file backed pages as well). updatedb is an annoying case, because one would hope that there would be a better way to deal with that highly specific workload. It's also pretty stat dominant, which puts it roughly in the same category as a git diff. (They differ in that updatedb does a lot of open()s and getdents on directories, git merely does a ton of lstat()s instead.) Anyway, my point is that I worry that tuning for an unusual and infrequent workload (which updatedb certainly is), is the wrong way to go. updatedb pushing out program data may be able to be improved on with drop behind or similar. however another scenerio that causes a similar problem is when a user is busy useing one of the big memory hogs and then switches to another (think switching between openoffice and firefox) After that we can look at other problems that swap prefetch helps with, or think of some ways to measure your "whole day" scenario. So when/if you have time, I can cook up a list of things to monitor and possibly a patch to add some instrumentation over this updatedb run. That would be appreciated. Don't spend huge amounts of time on it, okay? Point me the right direction, and we'll see how far I can run with it. 
you could make a synthetic test by writing a memory hog that allocates 3/4 of your ram then pauses waiting for input and then randomly accesses the memory for a while (say randomly accessing 2x # of pages allocated) and then pausing again before repeating run two of these, alternating which one is running at any one time. time how long it takes to do the random accesses. the difference in this time should be a fair example of how much it would impact the user. by the way, I've also seen comments on the Postgres performance mailing list about how slow linux is compared to other OS's in pulling data back in that's been pushed out to swap (not a factor on dedicated database machines, but a big factor on multi-purpose machines) Anyway, I realise swap prefetching has some situations where it will fundamentally outperform even the page replacement oracle. This is why I haven't asked for it to be dropped: it isn't a bad idea at all. However, if we can improve basic page reclaim where it is obviously lacking, that is always preferable. eg: being a highly speculative operation, swap prefetch is not great for power efficiency -- but we still want laptop users to have a good experience as well, right? Absolutely. Disk I/O is the enemy, and the best I/O is one you never had to do in the first place. almost always true, however there is some amount of I/O that is free with todays drives (remember, they read the entire track into ram and then give you the sectors on the track that you asked for). and if you have a raid array this is even more true. if you read one sector in from a raid5 array you have done all the same I/O that you would have to do to read in the entire stripe, but I don't believe that the current system will keep it all around if it exceeds the readahead limit. so in many cases readahead may end up being significantly cheaper then you expect. 
David Lang
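The synthetic memory-hog test described in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the fixed 4096-byte page size and the xorshift generator are all choices made for this sketch, not anything posted in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define HOG_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Allocate npages pages and write to each one so they are really backed. */
unsigned char *hog_alloc(size_t npages)
{
	unsigned char *buf = malloc(npages * HOG_PAGE_SIZE);

	if (buf)
		memset(buf, 1, npages * HOG_PAGE_SIZE);
	return buf;
}

/*
 * Touch 'touches' randomly chosen pages and return the elapsed wall time
 * in seconds. The message suggests touches = 2 * npages. A small xorshift
 * generator keeps the access pattern reproducible for a given seed.
 */
double hog_random_access(const unsigned char *buf, size_t npages,
			 size_t touches, uint32_t seed,
			 unsigned long *checksum)
{
	struct timespec t0, t1;
	uint32_t x = seed ? seed : 1;
	unsigned long sum = 0;
	size_t i;

	if (npages == 0)
		return 0.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < touches; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		sum += buf[(size_t)(x % npages) * HOG_PAGE_SIZE];
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (checksum)
		*checksum = sum;	/* keeps the reads from being optimised away */
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A driver would size npages to about 3/4 of RAM, wait on getchar() between rounds, and print the time returned by hog_random_access(); running two such processes alternately, as the message proposes, then gives the swap-in cost as the difference between rounds.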
Re: 2.6.23-rc1 regression: mm: fix fault vs invalidate race for linear mappings
> Is this with a binary-only module? We saw an issue with that in SLES9
> where the module is returning a locked page from its nopage handler
> when it isn't really supposed to. It might be fixed in latest drivers,
> have you tried them?

Doesn't sound like it; he mentions the radeon drm module, which is open...

Dave.
Re: [RFC] fs/super.c: Why alloc_super use a static variable default_op?
On Wed, Jul 25, 2007 at 12:29:17PM +0800, rae l wrote:
> But is it valuable? Compared to a waste of sizeof(struct super_block)
> bytes memory.

It's less than struct super_block, actually.

> When some code want to refer fs_type->s_op, it almost always want to
> refer some function pointer in s_op with fs_type->s_op->***, but all
> pointers in default_op are all NULLs, what about this scenario?

Yes, and? You still need one test instead of two. Which gets you more than 21 words used by that sucker, only in .text instead of .bss.

> and if you do grep s_op in the source code, you will found nowhere
> will want to test s_op or dependent on s_op not NULL.

What?

fs/inode.c:
	if (sb->s_op->alloc_inode)
		inode = sb->s_op->alloc_inode(sb);
	else
		inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);

and the same goes everywhere else. Of course we don't check for sb->s_op not being NULL - that's exactly why we are safe skipping such tests.
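The "one test instead of two" argument can be shown with a stripped-down sketch. The struct and function names below (sb, sb_ops, sb_init, sb_alloc_inode, fancy_op) are simplified stand-ins invented for this illustration, not the kernel's definitions.

```c
#include <stddef.h>

struct sb;

struct sb_ops {
	int (*alloc_inode)(struct sb *sb);
};

struct sb {
	const struct sb_ops *s_op;
};

/* All-zero table: every hook is NULL, but the table pointer itself never is. */
static const struct sb_ops default_op = { NULL };

/* Every new object starts out pointing at the all-zero table. */
void sb_init(struct sb *sb)
{
	sb->s_op = &default_op;
}

/* Callers need a single test (on the hook), not two (table, then hook). */
int sb_alloc_inode(struct sb *sb)
{
	if (sb->s_op->alloc_inode)
		return sb->s_op->alloc_inode(sb);
	return 0;	/* generic fallback path */
}

/* A filesystem that supplies its own hook simply swaps the table. */
static int fancy_alloc_inode(struct sb *sb)
{
	(void)sb;
	return 1;
}

const struct sb_ops fancy_op = { .alloc_inode = fancy_alloc_inode };
```

Without default_op, every call site like sb_alloc_inode() would need an extra "sb->s_op &&" check, which is the per-call-site code cost the reply weighs against one small static table.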
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:
> I don't like this kind of conditional information going from something
> like readahead into page reclaim. Unless it is for readahead _specific_
> data such as "I got these all wrong, so you can reclaim them" (which
> this isn't).
>
> But I don't like it as a use-once thing. The VM should be able to get
> that right.
>

Question: how does the use-once code work in the current kernel? Is there any? It doesn't quite work for me...

See my previous email today: I've done a small test case to demonstrate the problem and the effectiveness of Peter's patch. The only piece missing is the copy case (read once + write once).

Regardless of how it's implemented, I think a similar mechanism must be added. This is a long-standing issue.

In the end, I think it's a pagecache resource allocation problem: the VM lacks fair-share limits between processes. The kernel doesn't have enough information to make the right decisions.

You can refine page reclaim or use a more advanced algorithm, but some fair-share splitting between processes (like the CPU scheduler does) must be present. Of course some processes, like databases, should have large or unlimited VM limits.

Maybe the "containers" patchset and memory controller can help, with some specific configuration and/or a userspace daemon to adjust the limits on the fly.

Independently, the basic large-file streaming read (or copy) once cases should not trash the pagecache. Can we agree on that?

I say, let's add some code to fix the problem. If we hear about any regression in some workloads, we can add a tunable to limit or disable its effects, _if_ a better compromise cannot be found.

Surely it's possible to have an acceptable solution.

Best regards,

- Eric
Re: [RFC] fs/super.c: Why alloc_super use a static variable default_op?
On 7/25/07, Al Viro <[EMAIL PROTECTED]> wrote: On Wed, Jul 25, 2007 at 11:48:35AM +0800, rae l wrote: > Why alloc_super use a static variable default_op? > the static struct super_operations default_op is just all zeros, and > just referenced as the initial value of a new allocated super_block, > what does it for? So that we would not have to care about ->s_op *ever* being NULL. But is it valuable? Compared to a waste of sizeof(struct super_block) bytes memory. When some code want to refer fs_type->s_op, it almost always want to refer some function pointer in s_op with fs_type->s_op->***, but all pointers in default_op are all NULLs, what about this scenario? and if you do grep s_op in the source code, you will found nowhere will want to test s_op or dependent on s_op not NULL. So my opinion is to remove default_ops, just keep new allocated s_op NULL. -- Denis Cheng Linux Application Developer "One of my most productive days was throwing away 1000 lines of code." - Ken Thompson. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console
On 7/24/07, Bjorn Helgaas <[EMAIL PROTECTED]> wrote: On Tuesday 24 July 2007 02:33:05 pm Yinghai Lu wrote: > I have a system that has the same problem, and it turns out that FW > missed PNP0501 is DSDT for uart. and add that it into DSDT works well. Is this FW that has been shipped? Can you give any more details, like DMI info and a copy of the DSDT? We can't expect users to upgrade their firmware or use a custom DSDT. The system is not shipped yet. Normally PNP0501 is coming with superio section in DSDT. So i think late BIOS if have acpi there, that should be there already. Problem is that some new design may get rid of superio, but SB could have extra uart for serial port. at that case BIOS may not have that PNP0501... I don't think revert is reasonable. or we can make legacy_serial.force=1 is default at this point. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] fs/super.c: Why alloc_super use a static variable default_op?
On Wed, Jul 25, 2007 at 11:48:35AM +0800, rae l wrote:
> Why alloc_super use a static variable default_op?
> the static struct super_operations default_op is just all zeros, and
> just referenced as the initial value of a new allocated super_block,
> what does it for?

So that we would not have to care about ->s_op *ever* being NULL.
Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
Benjamin Herrenschmidt wrote: On Tue, 2007-07-24 at 17:55 -0400, Trond Myklebust wrote: If you want to use bitops as spinlocks you should rather be using . That also does the right thing w.r.t. pre-emption and sparse locking annotations. Heh, I didn't know about those... A bit annoying that I can't override them in the arch, I might be able to save a barrier or two here. Our I guess the test_and_set_bit_lock / clear_bit_unlock will allow you to override them in a way. The big performance problem I see on my powerpc system is not the bit spinlocks (open-coded or not), but the bit sleep locks. Anyway, I'll finally send out the lock bitops patches again today... -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
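As a rough userspace analogue of the lock-style bitops mentioned above (test_and_set_bit_lock / clear_bit_unlock), the following sketch uses C11 atomics. It is not the kernel implementation; the names bit_trylock, bit_lock and bit_unlock are made up here. It only illustrates why such operations can get away with acquire and release ordering rather than a full barrier on each side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take the lock bit 'nr' in *word; true means we now hold it. */
static inline bool bit_trylock(atomic_ulong *word, unsigned int nr)
{
	unsigned long mask = 1UL << nr;

	/* Acquire ordering only: no full barrier is needed to take a lock. */
	return !(atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask);
}

/* Spin until the bit is ours. */
static inline void bit_lock(atomic_ulong *word, unsigned int nr)
{
	while (!bit_trylock(word, nr))
		;	/* busy-wait */
}

/* Release ordering only: earlier stores are visible before the bit clears. */
static inline void bit_unlock(atomic_ulong *word, unsigned int nr)
{
	atomic_fetch_and_explicit(word, ~(1UL << nr), memory_order_release);
}
```

On architectures with weak memory ordering, the acquire/release forms typically compile to cheaper barriers than sequentially consistent operations, which is the kind of saving hinted at in the message.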
Re: -mm merge plans for 2.6.23
Ray Lee wrote: On 7/23/07, Nick Piggin <[EMAIL PROTECTED]> wrote: Also a random day at the desktop, it is quite a broad scope and pretty well impossible to analyse. It is pretty broad, but that's also what swap prefetch is targetting. As for hard to analyze, I'm not sure I agree. One can black-box test this stuff with only a few controls. e.g., if I use the same apps each day (mercurial, firefox, xorg, gcc), and the total I/O wait time consistently goes down on a swap prefetch kernel (normalized by some control statistic, such as application CPU time or total I/O, or something), then that's a useful measurement. I'm not saying that we can't try to tackle that problem, but first of all you have a really nice narrow problem where updatedb seems to be causing the kernel to completely do the wrong thing. So we start on that. If we can first try looking at some specific problems that are easily identified. Always easier, true. Let's start with "My mouse jerks around under memory load." A Google Summer of Code student working on X.Org claims that mlocking the mouse handling routines gives a smooth cursor under load ([1]). It's surprising that the kernel would swap that out in the first place. [1] http://vignatti.wordpress.com/2007/07/06/xorg-input-thread-summary-or-something/ OK, I'm not sure what the point is though. Under heavy memory load, things are going to get swapped out... and swap prefetch isn't going to help there (at least, not during the memory load). There are also other issues like whether the CPU scheduler is at fault, etc. Interactive workloads are always the hardest to work out. updatedb is a walk in the park by comparison. Looking at your past email, you have a 1GB desktop system and your overnight updatedb run is causing stuff to get swapped out such that swap prefetch makes it significantly better. 
This is really intriguing to me, and I would hope we can start by making this particular workload "not suck" without swap prefetch (and hopefully make it even better than it currently is with swap prefetch because we'll try not to evict useful file backed pages as well). updatedb is an annoying case, because one would hope that there would be a better way to deal with that highly specific workload. It's also pretty stat dominant, which puts it roughly in the same category as a git diff. (They differ in that updatedb does a lot of open()s and getdents on directories, git merely does a ton of lstat()s instead.) Yeah, and I suspect we might be able to do better use-once of inode and dentry caches. It isn't really highly specific: lots of things tend to just scan over a few files once -- updatedb just scans a lot so the problem becomes more noticable. Anyway, my point is that I worry that tuning for an unusual and infrequent workload (which updatedb certainly is), is the wrong way to go. Well it runs every day or so for every desktop Linux user, and it has similarities with other workloads. We don't want to optimise it at the expense of other things, but it _really_ should not be pushing a 1-2GB desktop into swap, I don't think. After that we can look at other problems that swap prefetch helps with, or think of some ways to measure your "whole day" scenario. So when/if you have time, I can cook up a list of things to monitor and possibly a patch to add some instrumentation over this updatedb run. That would be appreciated. Don't spend huge amounts of time on it, okay? Point me the right direction, and we'll see how far I can run with it. I guess /proc/meminfo, /proc/zoneinfo, /proc/vmstat, /proc/slabinfo before and after the updatedb run with the latest kernel would be a first step. top and vmstat output during the run wouldn't hurt either. Thanks, Nick -- SUSE Labs, Novell Inc. 
Re: [PATCH 1/3] readahead: drop behind
On Sat, 2007-21-07 at 23:00 +0200, Peter Zijlstra wrote: > Use the read-ahead code to provide hints to page reclaim. > > This patch has the potential to solve the streaming-IO trashes my > desktop problem. > > It tries to aggressively reclaim pages that were loaded in a strong > sequential pattern and have been consumed. Thereby limiting the damage > to the current resident set. > > Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> (sorry for the delay) Ok, I've done some tests with your patches, I came up with a test program that should approximate my use case. It simply mmap() and scan (read) a 375M file which represent the usual used memory on my desktop system. This data is frequently used, and should stay cached as much as possible in preference over the "used once" data read in the page cache when copying large files. I don't claim that the test program is perfect or even correct, I'm open for suggestions. Test system: - Linux x86_64 2.6.23-rc1 - 1G of RAM - I use the basic drop behind and sysctl patches. The readahead size patch is _not_ included. 
Setting up: dd if=/dev/zero of=/tmp/375M_file bs=1M count=375 dd if=/dev/zero of=/tmp/5G_file bs=1M count=5120 Tests with stock kernel (drop behind disabled): echo 0 >/proc/sys/vm/drop_behind Base test: sync; echo 1 >/proc/sys/vm/drop_caches time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file 1st execution: 0m7.146s 2nd execution: 0m1.119s 3rd execution: 0m1.109s 4th execution: 0m1.105s Reading a large file test: sync; echo 1 >/proc/sys/vm/drop_caches time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file cp /tmp/5G_file /dev/null time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file 1st execution: 0m7.224s 2nd execution: 0m1.114s 3rd execution: 0m7.178s <<< Much slower 4th execution: 0m1.115s Copying (read+write) a large file test: sync; echo 1 >/proc/sys/vm/drop_caches time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file cp /tmp/5G_file /tmp/copy_of_5G_file time ./large_app_load_simul /tmp/375M_file time ./large_app_load_simul /tmp/375M_file rm /tmp/copy_of_5G_file 1st execution: 0m7.203s 2nd execution: 0m1.147s 3rd execution: 0m7.238s <<< Much slower 4th execution: 0m1.129s Tests with drop behind enabled: echo 1 >/proc/sys/vm/drop_behind Base test: [same tests as above] 1st execution: 0m7.206s 2nd execution: 0m1.110s 3rd execution: 0m1.102s 4th execution: 0m1.106s Reading a large file test: [same tests as above] 1st execution: 0m7.197s 2nd execution: 0m1.116s 3rd execution: 0m1.114s <<< Great!!! 4th execution: 0m1.111s Copying (read+write) a large file test: [same tests as above] 1st execution: 0m7.186s 2nd execution: 0m1.111s 3rd execution: 0m7.339s <<< Not fixed 4th execution: 0m1.121s Conclusion: - The drop-behind patch works and really prevents the page cache content from being fulled with useless read-once data. 
- It doesn't help the copy (read+write) case. This should also be fixed, as it's a common workload. Tested-By: Eric St-Laurent ([EMAIL PROTECTED]) Best regards, - Eric (*) Test program and batch file are attached. diff -urN linux-2.6/include/linux/swap.h linux-2.6-drop-behind/include/linux/swap.h --- linux-2.6/include/linux/swap.h 2007-07-21 18:26:00.0 -0400 +++ linux-2.6-drop-behind/include/linux/swap.h 2007-07-22 16:22:48.0 -0400 @@ -180,6 +180,7 @@ /* linux/mm/swap.c */ extern void FASTCALL(lru_cache_add(struct page *)); extern void FASTCALL(lru_cache_add_active(struct page *)); +extern void FASTCALL(lru_demote(struct page *)); extern void FASTCALL(activate_page(struct page *)); extern void FASTCALL(mark_page_accessed(struct page *)); extern void lru_add_drain(void); diff -urN linux-2.6/kernel/sysctl.c linux-2.6-drop-behind/kernel/sysctl.c --- linux-2.6/kernel/sysctl.c 2007-07-21 18:26:01.0 -0400 +++ linux-2.6-drop-behind/kernel/sysctl.c 2007-07-22 16:20:27.0 -0400 @@ -163,6 +163,7 @@ extern int prove_locking; extern int lock_stat; +extern int sysctl_dropbehind; /* The default sysctl tables: */ @@ -1048,6 +1049,14 @@ .extra1 = , }, #endif + { + .ctl_name = CTL_UNNUMBERED, + .procname = "drop_behind", + .data = _dropbehind, + .maxlen = sizeof(sysctl_dropbehind), + .mode = 0644, + .proc_handler = _dointvec, + }, /* * NOTE: do not add new entries to this table unless you have read * Documentation/sysctl/ctl_unnumbered.txt diff -urN linux-2.6/mm/readahead.c linux-2.6-drop-behind/mm/readahead.c --- linux-2.6/mm/readahead.c 2007-07-21 18:26:01.0 -0400 +++ linux-2.6-drop-behind/mm/readahead.c 2007-07-22 16:41:47.0 -0400 @@ -15,6 +15,7 @@ #include #include #include +#include void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page) { @@ -429,6 +430,8 @@ }
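The attached test program is not reproduced in this archive. As a purely hypothetical stand-in for what the message describes (mmap() a file and read through it), something along these lines would do; the function name scan_file_mmap and the one-byte-per-page access pattern are assumptions of this sketch, not the author's code.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the file behind 'fd' read-only and read one byte from every page,
 * which forces each page to be faulted in (from the page cache if it is
 * still there, from disk otherwise).
 * Returns the sum of the bytes read, or -1 on error or for an empty file.
 */
long scan_file_mmap(int fd)
{
	struct stat st;
	unsigned char *map;
	long page = sysconf(_SC_PAGESIZE);
	long sum = 0;
	off_t off;

	if (page <= 0 || fstat(fd, &st) < 0 || st.st_size == 0)
		return -1;

	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	for (off = 0; off < st.st_size; off += page)
		sum += map[off];

	munmap(map, (size_t)st.st_size);
	return sum;
}
```

Timing a call to this function before and after a large streaming read or copy would show whether the file's pages survived in the page cache, which is what the timings above measure.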
[RFC] fs/super.c: Why alloc_super use a static variable default_op?
Why does alloc_super use a static variable default_op? The static struct super_operations default_op is all zeros and is only referenced as the initial value of a newly allocated super_block; what is it for?

Filesystem-dependent code such as ext2_fill_super fills in this field eventually, and after checking carefully, it seems no filesystem needs an all-zero default_op. You can check every use of s_op with this command in the kernel source tree:

$ grep -RInw s_op fs/

/**
 *	alloc_super	-	create new superblock
 *	@type:	filesystem type superblock should belong to
 *
 *	Allocates and initializes a new super_block. alloc_super()
 *	returns a pointer new superblock or %NULL if allocation had failed.
 */
static struct super_block *alloc_super(struct file_system_type *type)
{
	struct super_block *s = kzalloc(sizeof(struct super_block), GFP_USER);
	static struct super_operations default_op;

	if (s) {
		...
		s->s_op = &default_op;
		s->s_time_gran = 10;
	}
out:
	return s;
}

-- 
Denis Cheng
Linux Application Developer

"One of my most productive days was throwing away 1000 lines of code." - Ken Thompson.
Re: 2.6.23-rc1 regression: mm: fix fault vs invalidate race for linear mappings
Bret Towe wrote: for a while in -git I've had an issue that on boot when gdm loads the screen stays black using ctrl-f1 doesn't return to a console and killing X doesn't help any ssh'ing into the box does work top only shows 100% io-wait dmesg shows nothing odd the work around I have is at the moment is to just move the radeon drm module out of the way so it doesn't load on boot and X works just fine like that I did some bisecting which took a few days and tracked it down to commit d00806b183152af6d24f46f0c33f14162ca1262a its way to complex for me to revert it on top of -rc1 to verify that's the issue tho I keep forgetting to get a trace of what its waiting on when I'm in that kernel I assume that would be of use and Ill get that later the box this is happening on is a g4 mac mini the built in card is a radeon 9200 I'm not seeing any issues on an amd64 box with radeon card it's also a 9600 tho Is this with a binary-only module? We saw an issue with that in SLES9 where the module is returning a locked page from its nopage handler when it isn't really supposed to. It might be fixed in latest drivers, have you tried them? Thanks, Nick -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: CONFIG_PAGE_SHIFT (aka software PAGE_SIZE)
On Wed, Jul 18, 2007 at 06:32:22AM -0700, William Lee Irwin III wrote: >> Actually I'd worked on what was called MPSS (Multiple Page Size Support) >> before I ever started on pgcl. Some large portion of the pgcl proposal >> as I presented it internally was to reduce the order of large page >> allocations and provide a promotion and demotion mechanism enabling >> different processes to have different sized translations for the same >> large page, and hence no out-of-context pagetable/TLB updates during >> promotion and demotion, essentially by making the TLB translation to >> page relation M:N. ISTR describing this in a KS presentation for which >> IIRC you were present. But that's neither here nor there. On Tue, Jul 24, 2007 at 09:44:18PM +0200, Andrea Arcangeli wrote: > Well the whole difference between you back then and SGI now, is that > your stuff wasn't being pushed to be merged very hard (it was proposed > but IIRC more as research topic, like the large PAGE_SIZE also fallen > into that same research area). See now the emails from SGI fs folks > about variable order page size, they want it merged badly instead. Neither were research topics, but I'm tired of correcting the history of my failures. I've got enough ongoing failures as things stand. On Tue, Jul 24, 2007 at 09:44:18PM +0200, Andrea Arcangeli wrote: > My whole point is that the single moment the variable order page size > isn't pure research anymore like MPSS, the CONFIG_PAGE_SHIFT isn't > research anymore either, like the tail packing in pagecache with > kmalloc also isn't research anymore. There was never any research involved in the page clustering per se. It was supposed to be a generally advantageous thing that Linus had at least once explicitly approved of that just so happened to relieve mem_map[] pressure on 64GB i386, the side effect intended to attract corporate patronage. 
That last fact was not only demonstrable, it was used in the first ever public demonstration of a 64GB i386 machine running Linux, which I personally carried out. Beyond active hindrances and lacks of cooperation, a "competing solution" with distro backing appeared that removed the last vestige of corporate patronage from the project. It ended up bitrotting faster than I could singlehandedly do all the maintenance, testing, and coding work on it while also trying to get anything else done. MPSS was not as well-developed at the time the hugetlb "solution" killed it, but is not terribly dissimilar in how it came into being, developed, and then died, apart from less active hindrance. The one and only aspect in which any research was involved was a proposal, never accepted or pursued, to investigate how larger base page sizes implemented via page clustering mitigated external fragmentation for the purposes of MPSS and also how certain techniques borrowed from page clustering could reduce the frequency of and performance penalties associated with demotion in MPSS. The proposal has never been publicly circulated, though some of its content was described in the KS presentation as "future directions" or similar. On Tue, Jul 24, 2007 at 09:44:18PM +0200, Andrea Arcangeli wrote: > About the fs deciding the size of the pagecache granularity I totally > dislike that design, there's no reason why the fs should control that, [...] This is all valid commentary, though I don't have any particular response to it. In any event, I've never been involved in a research project, though I would've liked to have been. The emphasis in all cases was enabling specific functionality in production, using techniques whose viability had furthermore already been demonstrated elsewhere, by others. 
In both instances, insurmountable nontechnical obstacles were present, which remain in place and effectively limit the scale and scope of any sort of project I can personally lead with any sort of likelihood of mainline acceptance. Where I am limited, you are not. Good luck to you. -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RFC] getting rid of stupid loop in BUG()
Trent Piepho (on Tue, 24 Jul 2007 19:31:36 -0700 (PDT)) wrote:
>Adding __builtin_trap after the
>asm might be an ok fix. It will emit a spurious int 6, but that won't even be
>reached since the asm doesn't return, and it would probably be less extra code
>than the loop.

int 6 is a two byte instruction, the loop generates jmp with an 8 bit offset, also two bytes. No change in code size.
Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console
On Tuesday 24 July 2007 02:33:05 pm Yinghai Lu wrote:
> I have a system that has the same problem, and it turns out that FW
> missed PNP0501 is DSDT for uart. and add that it into DSDT works well.

Is this FW that has been shipped? Can you give any more details, like DMI info and a copy of the DSDT? We can't expect users to upgrade their firmware or use a custom DSDT.

Bjorn
Re: [PATCH RFC] extent mapped page cache
On Tue, Jul 24, 2007 at 07:25:09PM -0400, Chris Mason wrote: > On Tue, 24 Jul 2007 23:25:43 +0200 > Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > The tree is a critical part of the patch, but it is also the easiest to > rip out and replace. Basically the code stores a range by inserting > an object at an index corresponding to the end of the range. > > Then it does searches by looking forward from the start of the range. > More or less any tree that can search and return the first key >= > than the requested key will work. > > So, I'd be happy to rip out the tree and replace with something else. > Going completely lockless will be tricky, its something that will deep > thought once the rest of the interface is sane. Just having the other tree and managing it is what makes me a little less positive of this approach, especially using it to store pagecache state when we already have the pagecache tree. Having another tree to store block state I think is a good idea as I said in the fsblock thread with Dave, but I haven't clicked as to why it is a big advantage to use it to manage pagecache state. (and I can see some possible disadvantages in locking and tree manipulation overhead). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
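The indexing scheme described in the quoted text (store each range under the key of its end, then search forward from the start of the wanted range for the first key >= it) can be sketched with a sorted array and a binary search. This is an illustration only: the patch uses a tree, and the struct and function names below are invented for the example.

```c
#include <stddef.h>

struct extent {
	unsigned long start;
	unsigned long end;	/* inclusive; also the key the index is sorted by */
};

/*
 * Return the first extent whose key (end) is >= 'key', or NULL.
 * 'ext' must be sorted by end and hold non-overlapping ranges.
 */
const struct extent *first_end_at_or_after(const struct extent *ext, size_t n,
					    unsigned long key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ext[mid].end < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? &ext[lo] : NULL;
}

/* Find an extent overlapping [start, end] by searching forward from start. */
const struct extent *lookup_range(const struct extent *ext, size_t n,
				  unsigned long start, unsigned long end)
{
	const struct extent *e = first_end_at_or_after(ext, n, start);

	/* Every earlier extent ends before 'start', so only 'e' can overlap first. */
	return (e && e->start <= end) ? e : NULL;
}
```

Any ordered structure offering this "first key >= x" lookup could stand in for the array, which matches the remark that the tree itself is the easiest part to replace.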
Re: [PATCH][RFC] getting rid of stupid loop in BUG()
On Tue, 24 Jul 2007, Al Viro wrote:
> AFAICS, the patch below should do it for i386; instead of
> using a dummy loop to tell gcc that this sucker never returns,
> we do
> static void __always_inline __noreturn __BUG(const char *file, int line);
> containing the actual asm we want to insert and define BUG() as
> __BUG(__FILE__, __LINE__). It looks safe, but I don't claim enough
> experience with gcc __asm__ potential nastiness, so...

Sounds like it doesn't work:

http://gcc.gnu.org/ml/gcc/2007-02/msg00107.html

  [The] programmer won't get optimization he wants as after inlining
  this attribute information becomes completely lost. What about
  __builtin_trap? It results in int 6 that might not be applicable,
  but adding some control over it to i386 backend is definitely an
  option.

  Honza

It seems like if __BUG() is not inlined, you get the bogus "noreturn
does return" warning. If it is inlined, then you lose the noreturn
attribute and unreachable code paths aren't eliminated.

Adding __builtin_trap after the asm might be an ok fix. It will emit a
spurious int 6, but that won't even be reached since the asm doesn't
return, and it will probably be less extra code than the loop.
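
The __builtin_trap idea can be tried outside the kernel. The sketch
below is a userspace illustration, not the patch under discussion:
MY_BUG() and sign_index() are hypothetical names, and an fprintf()
stands in for the asm that the real BUG() would emit. It relies on the
GCC/Clang builtin __builtin_trap(), which the compiler treats as never
returning.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for BUG(): report the location, then trap.
 * Because the compiler knows __builtin_trap() does not return, no
 * dummy loop is needed to mark the end of the macro as unreachable.
 */
#define MY_BUG()							\
	do {								\
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);	\
		__builtin_trap();					\
	} while (0)

/* Map an int to 0 (negative), 1 (zero) or 2 (positive). */
static int sign_index(int v)
{
	if (v < 0)
		return 0;
	if (v == 0)
		return 1;
	if (v > 0)
		return 2;
	/*
	 * Never reached. No "return" is needed after MY_BUG(): the
	 * compiler sees that control cannot fall off the end here.
	 */
	MY_BUG();
}
```

The trade-off mentioned in the thread still applies: in the kernel the
trap instruction would sit after asm that already never returns, so it
is dead code that exists only to inform the compiler.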
Re: [PATCH] hwmon: Add missing __devexit tags in various drivers
Hi Jean:

* Jean Delvare <[EMAIL PROTECTED]> [2007-07-22 12:09:48 +0200]:
> On Sun, 22 Jul 2007 00:30:56 +0200, Gabriel C wrote:
> > I noticed these warnings on current git:
> >
> > drivers/hwmon/pc87360.c:1082: warning: 'pc87360_remove' defined but not used
> > drivers/hwmon/sis5595.c:580: warning: 'sis5595_remove' defined but not used
> > drivers/hwmon/smsc47m1.c:608: warning: 'smsc47m1_remove' defined but not used
> > drivers/hwmon/via686a.c:648: warning: 'via686a_remove' defined but not used
> > drivers/hwmon/vt8231.c:755: warning: 'vt8231_remove' defined but not used
>
> Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
> ---
>  drivers/hwmon/it87.c     |    2 +-
>  drivers/hwmon/pc87360.c  |    2 +-
>  drivers/hwmon/sis5595.c  |    2 +-
>  drivers/hwmon/smsc47m1.c |    2 +-
>  drivers/hwmon/via686a.c  |    2 +-
>  drivers/hwmon/vt8231.c   |    4 ++--
>  drivers/hwmon/w83627hf.c |    2 +-
>  7 files changed, 8 insertions(+), 8 deletions(-)
>
> --- linux-2.6.23-pre.orig/drivers/hwmon/it87.c	2007-07-22 11:51:47.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/it87.c	2007-07-22 11:56:48.0 +0200
> @@ -252,7 +252,7 @@ struct it87_data {
>
>
>  static int it87_probe(struct platform_device *pdev);
> -static int it87_remove(struct platform_device *pdev);
> +static int __devexit it87_remove(struct platform_device *pdev);
>
>  static int it87_read_value(struct it87_data *data, u8 reg);
>  static void it87_write_value(struct it87_data *data, u8 reg, u8 value);
> --- linux-2.6.23-pre.orig/drivers/hwmon/pc87360.c	2007-07-22 09:54:08.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/pc87360.c	2007-07-22 11:56:48.0 +0200
> @@ -220,7 +220,7 @@ struct pc87360_data {
>   */
>
>  static int pc87360_probe(struct platform_device *pdev);
> -static int pc87360_remove(struct platform_device *pdev);
> +static int __devexit pc87360_remove(struct platform_device *pdev);
>
>  static int pc87360_read_value(struct pc87360_data *data, u8 ldi, u8 bank,
>  			      u8 reg);
> --- linux-2.6.23-pre.orig/drivers/hwmon/sis5595.c	2007-07-22 09:54:08.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/sis5595.c	2007-07-22 11:56:48.0 +0200
> @@ -187,7 +187,7 @@ struct sis5595_data {
>  static struct pci_dev *s_bridge;	/* pointer to the (only) sis5595 */
>
>  static int sis5595_probe(struct platform_device *pdev);
> -static int sis5595_remove(struct platform_device *pdev);
> +static int __devexit sis5595_remove(struct platform_device *pdev);
>
>  static int sis5595_read_value(struct sis5595_data *data, u8 reg);
>  static void sis5595_write_value(struct sis5595_data *data, u8 reg, u8 value);
> --- linux-2.6.23-pre.orig/drivers/hwmon/smsc47m1.c	2007-07-22 09:54:08.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/smsc47m1.c	2007-07-22 11:56:48.0 +0200
> @@ -134,7 +134,7 @@ struct smsc47m1_sio_data {
>
>
>  static int smsc47m1_probe(struct platform_device *pdev);
> -static int smsc47m1_remove(struct platform_device *pdev);
> +static int __devexit smsc47m1_remove(struct platform_device *pdev);
>  static struct smsc47m1_data *smsc47m1_update_device(struct device *dev,
>  						    int init);
>
> --- linux-2.6.23-pre.orig/drivers/hwmon/via686a.c	2007-07-22 09:54:08.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/via686a.c	2007-07-22 11:56:48.0 +0200
> @@ -314,7 +314,7 @@ struct via686a_data {
>  static struct pci_dev *s_bridge;	/* pointer to the (only) via686a */
>
>  static int via686a_probe(struct platform_device *pdev);
> -static int via686a_remove(struct platform_device *pdev);
> +static int __devexit via686a_remove(struct platform_device *pdev);
>
>  static inline int via686a_read_value(struct via686a_data *data, u8 reg)
>  {
> --- linux-2.6.23-pre.orig/drivers/hwmon/vt8231.c	2007-07-22 09:54:08.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/vt8231.c	2007-07-22 11:56:48.0 +0200
> @@ -167,7 +167,7 @@ struct vt8231_data {
>
>  static struct pci_dev *s_bridge;
>  static int vt8231_probe(struct platform_device *pdev);
> -static int vt8231_remove(struct platform_device *pdev);
> +static int __devexit vt8231_remove(struct platform_device *pdev);
>  static struct vt8231_data *vt8231_update_device(struct device *dev);
>  static void vt8231_init_device(struct vt8231_data *data);
>
> @@ -751,7 +751,7 @@ exit_release:
>  	return err;
>  }
>
> -static int vt8231_remove(struct platform_device *pdev)
> +static int __devexit vt8231_remove(struct platform_device *pdev)
>  {
>  	struct vt8231_data *data = platform_get_drvdata(pdev);
>  	int i;
> --- linux-2.6.23-pre.orig/drivers/hwmon/w83627hf.c	2007-07-22 11:51:49.0 +0200
> +++ linux-2.6.23-pre/drivers/hwmon/w83627hf.c	2007-07-22 11:56:48.0 +0200
> @@ -384,7 +384,7 @@ struct w83627hf_sio_data {
>
>
>  static int w83627hf_probe(struct platform_device *pdev);
>
Re: [PATCH][07/37] Clean up duplicate includes in drivers/hwmon/
Hi Jesper:

* Jesper Juhl <[EMAIL PROTECTED]> [2007-07-21 17:02:01 +0200]:
> Hi,
>
> This patch cleans up duplicate includes in
> drivers/hwmon/
>
>
> Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> ---
>
> diff --git a/drivers/hwmon/ams/ams-core.c b/drivers/hwmon/ams/ams-core.c
> index 6db9737..a112a03 100644
> --- a/drivers/hwmon/ams/ams-core.c
> +++ b/drivers/hwmon/ams/ams-core.c
> @@ -23,7 +23,6 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>

Applied to hwmon-2.6.git/testing, thanks.

-- 
Mark M. Hoffman
[EMAIL PROTECTED]
Re: v2.6.22.1-rt5
On Tuesday 24 July 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> The above stanza still needs some tlc. I built a 2.6.22.1-rt6 (rt5
>> wouldn't build) using the same old config that a make oldconfig didn't
>> fuss about, but the reboot never completed, see the attached, heavily
>> smunched camera shot of the panic.
>>
>> Kinda looks like hda/sda confusion, with rt3 (this boot), its hda*,
>> what is it now? fstab or kernel config error?
>
>yeah, as long as your filesystems are created with a proper label, all
>that you need to do is to change all 'hda' to 'sda' in the new kernel's
>/etc/grub.conf entry. (or enable the old IDE code in the .config, under
>CONFIG_IDE)
>
> Ingo

I believe it is on:

[EMAIL PROTECTED] linux-2.6.22.1-rt6]# grep CONFIG_IDE .config
CONFIG_IDE=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_PROC_FS=y
CONFIG_IDE_GENERIC=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_IDE_ARM is not set
# CONFIG_IDE_CHIPSETS is not set
# CONFIG_IDEDMA_IVB is not set

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Computer programmers never die, they just get lost in the processing.
Re: [rtc-linux] Re: rtc-ds1307.c: array overrun
On Sun, 22 Jul 2007 18:17:17 -0700
David Brownell <[EMAIL PROTECTED]> wrote:

>
> On Sunday 22 July 2007, Adrian Bunk wrote:
> > The Coverity checker spotted the following array overrun
> > in drivers/rtc/rtc-ds1307.c:
>
> Typo -- thanks, fix is attached.
>
> CUT HERE
> Fix a typo turned up by a Coverity check: referring to the wrong register,
> which could cause problems restarting DS1338 RTCs after their oscillator
> halted. (For example, if the backup battery died.)
>
> Signed-off-by: David Brownell <[EMAIL PROTECTED]>

Acked-by: Alessandro Zummo <[EMAIL PROTECTED]>

-- 
Best regards,
Alessandro Zummo,
Tower Technologies - Torino, Italy
http://www.towertech.it
Re: [rtc-linux] [PATCH] s3c2410: fixup after arch moves
On Tue, 24 Jul 2007 13:40:04 +0100
Ben Dooks <[EMAIL PROTECTED]> wrote:

>
> Fixup the changes from moving around the arch
> support for s3c24xx based systems.
>
> Signed-off-by: Ben Dooks <[EMAIL PROTECTED]>

Acked-by: Alessandro Zummo <[EMAIL PROTECTED]>

-- 
Best regards,
Alessandro Zummo,
Tower Technologies - Torino, Italy
http://www.towertech.it
Re: [PATCH] Fix arch/i386/kernel/nmi.c - 'unknown_nmi_panic_callback' declared 'static' but never defined warning
On Sun, 22 Jul 2007 21:20:38 +0200 Gabriel C <[EMAIL PROTECTED]> wrote:

> I get this warning when CONFIG_SYSCTL is not set :
>
> ...
>
> arch/i386/kernel/nmi.c:52: warning: 'unknown_nmi_panic_callback' declared 'static' but never defined
>
> ...
>
> Signed-off-by: Gabriel Craciunescu <[EMAIL PROTECTED]>
>
> ---
>
> diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c
> index 03b7f55..cf11121 100644
> --- a/arch/i386/kernel/nmi.c
> +++ b/arch/i386/kernel/nmi.c
> @@ -49,8 +49,9 @@ static unsigned int nmi_hz = HZ;
>  static DEFINE_PER_CPU(short, wd_enabled);
>
>  /* local prototypes */
> +#ifdef CONFIG_SYSCTL
>  static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
> -
> +#endif
>  static int endflag __initdata = 0;

guys, please take a closer look at the code which you're changing?

We can obviously move do_nmi_callback() down to above
__trigger_all_cpu_backtrace() and then do away with this declaration
altogether.
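
The reordering suggested above can be shown with a small standalone
sketch. It is not the nmi.c code: HAVE_SYSCTL stands in for
CONFIG_SYSCTL, and panic_enabled, panic_callback() and do_callback()
are hypothetical names. The point is only that a static function
defined before its sole caller needs no forward declaration, so there
is nothing left to be "declared 'static' but never defined" when the
option is off.

```c
/* Hypothetical stand-in for CONFIG_SYSCTL; remove to test the other case. */
#define HAVE_SYSCTL 1

#ifdef HAVE_SYSCTL
static int panic_enabled = 1;

/*
 * Defined ahead of its only caller, so no separate prototype exists
 * that could be left dangling when HAVE_SYSCTL is not defined.
 */
static int panic_callback(int cpu)
{
	return cpu + 100;
}
#endif

/* Caller placed after the callback definition. */
static int do_callback(int cpu)
{
#ifdef HAVE_SYSCTL
	if (panic_enabled)
		return panic_callback(cpu);
#endif
	(void)cpu;	/* unused when HAVE_SYSCTL is not defined */
	return 0;
}
```

Compared with wrapping the prototype in its own #ifdef, this keeps the
conditional code in one place instead of two.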
Re: v2.6.22.1-rt5
On Tuesday 24 July 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> The above stanza still needs some tlc. I built a 2.6.22.1-rt6 (rt5
>> wouldn't build) using the same old config that a make oldconfig didn't
>> fuss about, but the reboot never completed, see the attached, heavily
>> smunched camera shot of the panic.
>>
>> Kinda looks like hda/sda confusion, with rt3 (this boot), its hda*,
>> what is it now? fstab or kernel config error?
>
>yeah, as long as your filesystems are created with a proper label, all
>that you need to do is to change all 'hda' to 'sda' in the new kernel's
>/etc/grub.conf entry. (or enable the old IDE code in the .config, under
>CONFIG_IDE)
>
> Ingo

Changing the "root (hd0,0)" to (sd0,0) failed. Grub can't parse the
(sd0,0).

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
WHERE CAN THE MATTER BE
	Oh, dear, where can the matter be
	When it's converted to energy?
	There is a slight loss of parity.
	Johnny's so long at the fair.
Re: [ck] Re: -mm merge plans for 2.6.23
From: "Matthew Hawkins" <[EMAIL PROTECTED]>
Date: Wed, 25 Jul 2007 11:26:57 +1000

> On 7/24/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > The other consideration here is, as Nick points out, are the problems which
> > people see this patch solving for them solveable in other, better ways?
> > IOW, is this patch fixing up preexisting deficiencies post-facto?
>
> So let me get this straight - you don't want to merge swap prefetch
> which exists now and solves issues many people are seeing, and has
> been tested more than a gazillion other bits & pieces that do get
> merged - because it could be possible that in the future some other
> patch, which doesn't yet exist and nobody is working on, may solve the
> problem better?

I have to generally agree that the objections to the swap prefetch
patches have been conjecture and in general wasting time and
frustrating people.

There is a point at which it might be wise to just step back and let
the river run its course and see what happens. Initially, it's good
to play games of "what if", but after several months it's not a
productive thing and slows down progress for no good reason.

If a better mechanism gets implemented, great! We can easily
replace the swap prefetch stuff at such time. But until then swap
prefetch is what we have and it's sat long enough in -mm with no major
problems to merge it.
[PATCH] powerpc: Pegasos keyboard detection
As of 2.6.22 the kernel doesn't recognize the i8042 keyboard/mouse
controller on the PegasosPPC. This is because of a feature/bug in the OF
device tree: the "device_type" attribute is an empty string instead of
"8042" as the kernel expects. This patch (against 2.6.22.1) adds a
secondary detection which looks for a device whose *name* is "8042" if
there is no device whose *type* is "8042".

Signed-off-by: Alan Curry <[EMAIL PROTECTED]>

--- arch/powerpc/kernel/setup-common.c.orig	2007-07-24 19:04:17.0 -0500
+++ arch/powerpc/kernel/setup-common.c	2007-07-24 19:06:36.0 -0500
@@ -487,6 +487,10 @@ int check_legacy_ioport(unsigned long ba
 	switch(base_port) {
 	case I8042_DATA_REG:
 		np = of_find_node_by_type(NULL, "8042");
+		/* Pegasos has no device_type on its 8042 node, look for the
+		 * name instead */
+		if (!np)
+			np = of_find_node_by_name(NULL, "8042");
 		break;
 	case FDC_BASE: /* FDC1 */
 		np = of_find_node_by_type(NULL, "fdc");
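
The type-then-name fallback in the patch can be exercised in isolation
with a toy device table. This is a minimal userspace sketch, assuming a
flat array in place of the OF device tree; struct node, find_by_type(),
find_by_name(), find_i8042() and both sample tables are hypothetical
and do not reproduce the kernel's of_find_node_by_*() API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened view of a device-tree node. */
struct node {
	const char *name;
	const char *type;	/* empty string when device_type is missing */
};

/* Pegasos-like table: the 8042 node has a name but an empty type. */
static const struct node pegasos_like[] = {
	{ "8042", "" },
	{ "fdc",  "fdc" },
};

/* Table where the controller is identified by type, as the kernel expects. */
static const struct node conventional[] = {
	{ "keyboard-controller", "8042" },
};

static const struct node *
find_by_type(const struct node *tbl, size_t n, const char *type)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].type, type) == 0)
			return &tbl[i];
	return NULL;
}

static const struct node *
find_by_name(const struct node *tbl, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/* Prefer a match on type; fall back to the name only if that fails. */
static const struct node *find_i8042(const struct node *tbl, size_t n)
{
	const struct node *np = find_by_type(tbl, n, "8042");

	if (!np)
		np = find_by_name(tbl, n, "8042");
	return np;
}
```

Trying the type first keeps behaviour unchanged on machines whose
device tree is already correct; the name lookup only runs when the
first search comes back empty.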
Re: v2.6.22.1-rt5
On Tuesday 24 July 2007, Ingo Molnar wrote:
>* Gene Heskett <[EMAIL PROTECTED]> wrote:
>> The above stanza still needs some tlc. I built a 2.6.22.1-rt6 (rt5
>> wouldn't build) using the same old config that a make oldconfig didn't
>> fuss about, but the reboot never completed, see the attached, heavily
>> smunched camera shot of the panic.
>>
>> Kinda looks like hda/sda confusion, with rt3 (this boot), its hda*,
>> what is it now? fstab or kernel config error?
>
>yeah, as long as your filesystems are created with a proper label, all
>that you need to do is to change all 'hda' to 'sda' in the new kernel's
>/etc/grub.conf entry. (or enable the old IDE code in the .config, under
>CONFIG_IDE)
>
> Ingo

Damn, I didn't say it clearly enough, the / is on an LVM volume. /boot
is on /dev/hda1, aka (hd0,0) in the first line. Since the msg pointed
at 0,0, I'll switch that line to "root (sd0,0)" just for grins.

I take it that was an auto-conversion? I did nothing to confirm any
changes when I ran a make oldconfig, using the 2.6.22.1-rt3 .config as
the src config.

Thanks.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
But I was there and I saw what you did,
I saw it with my own two eyes.
So you can wipe off that grin;
I know where you've been--
It's all been a pack of lies!
Re: [ck] Re: -mm merge plans for 2.6.23
On 7/24/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> The other consideration here is, as Nick points out, are the problems which
> people see this patch solving for them solveable in other, better ways?
> IOW, is this patch fixing up preexisting deficiencies post-facto?

So let me get this straight - you don't want to merge swap prefetch
which exists now and solves issues many people are seeing, and has
been tested more than a gazillion other bits & pieces that do get
merged - because it could be possible that in the future some other
patch, which doesn't yet exist and nobody is working on, may solve the
problem better?

You know what, just release Linux 0.02 as 2.6.23 because, using your
logic, everything that was merged since October 5, 1991 could be
replaced by something better. Perhaps. So there's obviously no point
having it there in the first place & there'll be untold savings in
storage costs and compilation time for the kernel tree, also bandwidth
for the mirror sites etc. in the mean time while we wait for the magic
pixies to come and deliver the one true piece of code that cannot be
improved upon.

> Well. The above, plus there's always a lot of stuff happening in MM land,
> and I haven't seen much in the way of enthusiasm from the usual MM
> developers.

I haven't seen much in the way of enthusiasm from developers, period.
People are tired of maintaining patches for years that never get
merged into mainline because of totally bullshit reasons (usually
amounting to NIH syndrome)

-- 
Matt
Re: [patch 1/3] ps3: Disk Storage Driver
On Wed, 25 Jul 2007 11:09:21 +1000 Paul Mackerras <[EMAIL PROTECTED]> wrote:

> Also, I prefer the style where the ? and : operators have a space
> after them but not before them, rather than a space either side.

Could I point out that your likes and dislikes are immaterial?

The whole point here is to get kernel code looking consistent. That
means that basically everyone ends up doing things which they'd prefer
not to do. That certainly applies to me.

The idea is that the benefit of making things consistent exceeds the
costs of some individuals adopting styles which they are less used to.

So telling people what you do and don't like is simply irrelevant,
except for when it is used as an input in determining what the standard
kernel style is to be. (And that is largely determined by observing
what we have now).

And sure, major subsystems can and do go off and do their own thing -
ia64 for example has done a lot of that, pretty consistently. The
world hasn't ended as a result.
Re: Time Problems with 2.6.23-rc1-gf695baf2
On Wednesday 25 July 2007, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Wednesday 25 July 2007, Michal Piotrowski wrote:
> > Hi,
> >
> > On 24/07/07, Eric Sesterhenn / Snakebyte <[EMAIL PROTECTED]> wrote:
> > > hi,
> > >
> > > seems like the clock got screwed or something similar. During bootup the
> > > computer hangs (no response on keyboard leds when pressing caps lock),
> > > the only way to make sure it resumes booting is by repeatedly pressing
> > > the power switch,
> >
> > :)
> >
> > > see second 13 to 510, after pressing it about ten
> > > times, it continues booting.
> >
> > Probing IDE interface...
> >
> > [ 13.867939] VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci:00:04.1
> > [ 13.868062] ide0: BM-DMA at 0xd800-0xd807, BIOS settings: hda:DMA, hdb:pio
> > [ 13.868268] ide1: BM-DMA at 0xd808-0xd80f, BIOS settings: hdc:DMA, hdd:DMA
> > [ 13.868574] Probing IDE interface ide0...
> > [ 387.279576] Clocksource tsc unstable (delta = 370195339890 ns)
> > [ 496.200082] hda: ST340823A, ATA DISK drive
> > [ 510.264511] hda: selected mode 0x44
> > [ 510.264826] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> >
> > Could you please try to revert these commits
>
> It doesn't seem like an IDE bug at all, rather seems to be some issue
> related to the recent "lack of the proper clocksource fallback" bug...

or ACPI

> > > [ 13.506890] ACPI Exception (processor_throttling-0084): AE_NOT_FOUND, Evaluating _PTC [20070126]
> > > [ 13.507101] ACPI Exception (processor_throttling-0147): AE_NOT_FOUND, Evaluating _TSS [20070126]

Bart
Re: [PATCH] update checkpatch.pl to version 0.08
On Tue, Jul 24, 2007 at 03:32:59PM -0500, jschopp wrote:
>>> Yep I think the consensus is we need a
>>> "--i-don't-agree-just-check-things-which-will-get-me-rejected-out-of-hand"
>>> option of some sort which will restrict output to the real errors.
>> No, the default should be to show only the real errors.
>
> CodingStyle violations are real errors.
>
> If we have agreed that code should look a certain way, and there is a patch
> that doesn't look that way, that is an error. Maybe not a runtime error,
> but a readability error. A reviewability error. A maintainability error.
> A big waste of everybody's time.
>
> I personally don't care if code is indented with 2 spaces, 4 spaces, or a
> tab. What I do care about is that all the code is indented consistently so
> we don't waste an ounce of our energy reading code/patches and thinking
> about indentation or even worse spending our time arguing over it on
> mailing lists when there are better things to argue about.
>
> Back when I wrote the early versions of this script I didn't write it
> because I'm anal retentive about CodingStyle. I wrote it for the exact
> opposite reason. I was tired of seeing email on mailing lists reviewing
> patches saying there was indentation with spaces instead of tabs, or
> trailing whitespace, or { on the wrong line. It was a waste of the
> reviewers' time, it was a waste of the developers' time, it was a waste of
> the time of everybody on the mailing lists. We should spend all that
> energy arguing over the merits of what the code does.

A relatively small number of common codingstyle mistakes accounts for
most of these cases.

> So let's argue over the CodingStyle once and be done with the argument
> instead of having the argument every day on the mailing lists forever. We
> end up with more time to argue over much more interesting subjects and we
> end up with consistent code that is easy to read, review, and maintain.

It's also important to note that there are slightly different
codingstyles in different parts of the kernel, and you won't get people
to agree on one.

A common codingstyle is important, but unifying the last bits is simply
not worth the hassle.

There are more important things than exploiting the corner cases of
codingstyle, e.g. could you teach checkpatch.pl to give exactly two
errors for the following code?

  while (a); for (b = 0; b < 50; b++); for (c = 0; c < sizeof(struct module); c++) d = e;

cu
Adrian

-- 
       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: [patch 1/3] ps3: Disk Storage Driver
Andy Whitcroft writes:

> Ok, this is something we need to decide on. Currently we only ask for
> consistent spacing on all the mathematic operators. This is mostly as
> we do see a large number of non-spaced uses in defines and the like.
>
> I am happy to expand these tests so they are always spaced on both sides
> style if that is the preference.

It depends very much on the context - on the precedence and relative
importance of one operator with respect to other operators and the
statement as a whole. In general I prefer spaces around binary
operators, but there are situations where not putting spaces around
some operators can enhance the readability of the statement as a whole.

If checkpatch.pl starts whinging about operators without spaces that
will just be yet another reason not to use it IMHO.

Also, I prefer the style where the ? and : operators have a space
after them but not before them, rather than a space either side.

Paul.
Re: Patches for REALLY TINY 386 kernels
On 7/24/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 24, 2007 at 01:50:35PM -0700, Yinghai Lu wrote:
> > On 7/24/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
> >> Andi Kleen wrote:
> >> >> Some people are putting Linux kernels in the "BIOS" (i.e. ROM chip) when
> >> >> using LinuxBIOS (www.linuxbios.org). It _does_ make a lot of difference
> >> >> there how big the kernel is. At the moment you can't do that with
> >> >> anything smaller than a 1 MB chip. But if people could use 512 KB chips
> >> >> because the kernel is small enough that would sure be a great thing.
> >> >>
> >> >
> >> > I'm sure it would be possible to save a lot of text size. But I don't
> >> > think removing the relatively small CPUID code is the right way.
> >> > That is just a big maintenance issue for little gain.
> >> >
> >> Well - anyone compiling linux for BIOS usage is targeting
> >> a single machine. So an ability to target a single machine is useful,
> >> i.e. run the CPUID at compile-time, put the answer in a constant/macro,
> >> let the optimizer prune the alternatives. :-)
> >
> > we are using AMD64 + LinuxBIOS + Kernel (without acpi) + kexec to load
> > final kernel.
> > So we can use drivers in kernel for any media (SCSI, SATA, IB,...),
> > not like EFI need every driver re-porting. and We could use KVM in
> > kernel to load other OS if needed.
> >
> > The problem is Kernel is getting bigger and bigger. and old Tiny
> > kernel is stopping at 2.6.18...
> >...
>
> Please send:
> - the .config for the last kernel small enough
> - your size limit
> - your gcc version
>
> and I'll look at this.

http://www.linuxbios.org/Tyan_S2892_Build_Tutorial
http://www.linuxbios.org/pipermail/linuxbios/2006-October/016558.html

YH
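
Helge's suggestion quoted above (record the CPUID answers at build
time, put them in constants, and let the optimizer prune the
alternatives) might look roughly like the sketch below. Everything in
it is an assumption for illustration: TARGET_HAS_SSE2,
TARGET_HAS_3DNOW, enum copy_impl and pick_copy_impl() are hypothetical
names, and the feature values are made up rather than taken from any
real board.

```c
/*
 * Hypothetical build-time answers for a single target machine. In the
 * scheme described in the thread these would be generated by running
 * CPUID once for the target instead of probing at every boot.
 */
#define TARGET_HAS_SSE2		1
#define TARGET_HAS_3DNOW	0

enum copy_impl { COPY_GENERIC, COPY_SSE2, COPY_3DNOW };

/*
 * Because the conditions are compile-time constants, an optimizing
 * compiler can drop the branches that can never be taken, together
 * with any code reachable only through them.
 */
static enum copy_impl pick_copy_impl(void)
{
	if (TARGET_HAS_SSE2)
		return COPY_SSE2;
	if (TARGET_HAS_3DNOW)
		return COPY_3DNOW;
	return COPY_GENERIC;
}
```

The cost, as Andi's objection in the quote suggests, is that every
runtime check converted this way becomes one more build variant to
maintain.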
Re: [linux-pm] Power Management framework proposal
On Wed, 25 Jul 2007, Jerome Glisse wrote:

> On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > On Tue, 24 Jul 2007, Jerome Glisse wrote:
> >
> > > On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > > On Mon, 23 Jul 2007, Igor Stoppa wrote:
> > > > > again, HAL / OHM / Mobilin
> > > >
> > > > I was trying to define the lower level interfaces that these
> > > > tools need. today they can only know what is possible by reading
> > > > the source code for each driver and implementing the
> > > > driver-specific interfaces necessary to set things, I was
> > > > proposing a common interface that tools like this could use
> > > > instead of requiring all the driver-specific knowledge.
> > > >
> > > > in a nutshell (and I know this is probably not detailed enough
> > > > to be acceptable)
> > > >
> > > > 1. the software needs to know what the interconnects and
> > > >    dependencies between devices are (supposedly this is provided
> > > >    via sysfs)
> > > >
> > > > 2. the software needs to know what type of device this is (again,
> > > >    supposedly this is provided via sysfs)
> > > >
> > > > 3. the software needs to know what modes exist for a driver/piece
> > > >    of hardware. to make any decisions this information needs to
> > > >    provide some information about the capability of the mode and
> > > >    the power consumed in that mode. in addition there will need
> > > >    to be flags to indicate any special restrictions of a mode
> > > >
> > > > 4. the software needs to know the cost of switching from any mode
> > > >    to any other mode. since some transitions will interact with
> > > >    other devices there will need to be flags to indicate such
> > > >    requirements for specific transitions.
> > > >
> > > > 5. the software needs to be able to find out what mode a device
> > > >    is in.
> > > >
> > > > 6. the software needs to be able to tell the driver to switch to
> > > >    a different mode (I think it would be a very good thing if
> > > >    going to a particular mode was always the same command, no
> > > >    matter what mode it is currently in)
> > > >
> > > > 7. the software needs to figure out the desire of the user.
> > > >
> > > > my proposal was addressing items #3-#6. it isn't trying to decide
> > > > what to do, simply to allow the software that _is_ trying to
> > > > decide what to do a way to find out what it can do.
> > > >
> > > > David Lang
> > >
> > > I believe a central place where user can set/change hw state to
> > > save power or to increase computational power is definitely a goal
> > > to pursue. But i truly think that the OHM approach is the best one
> > > ie using plugins so that one can make a plugin specific for each
> > > device. The point is that i believe there is no way to do an
> > > abstract interface for this and trying to do so will end up doing
> > > ugly code and any interface would fail to encompass all possible
> > > tweak that might exist for all devices.
> >
> > will each plugin have its own interface? or will you have one
> > interface to access the plugins and then the plugins do things behind
> > the scenes?
> >
> > I'll bet that the API for the plugins is common, and if so then it
> > could be similar to the API that I suggested.
>
> I take here ohm as a reference (this come from my limited understanding
> of this daemon so there might be inaccuracy) driver export through HAL
> there power management tuning capacity, Then an ohm plugin would use
> HAL to give a higher view of this capacity and also manage policy,
> preference, permission, ... Last consumer in power management food
> chain would be an user interface which will communicate with ohm (and
> with all ohm plugin) so desktop writer (gnome, kde, ...) can write
> some kind of power management center where each ohm plugin can have
> its own panel. So in the end the user got one place to do all its
> power management which is the goal i think you are trying to aim.

no. I am talking about the interface to the drivers that things like
HAL would use

> > > For instance on graphics card you could do the following (maybe
> > > more):
> > > -change GPU clock
> > > -change memory clock
> > > -disable part of engine
> > > -disable unit
> > > i truly don't think you can make a common interface for all this,
> > > more over there might be constraint on how you can change things
> > > (GPU & memory clock might need to follow a given ratio). So you
> > > definitely need knowledge in the user space program to handle this.
> >
> > sure you can, just enumerate all the options the driver writer wants
> > to offer as options. yes this could be a lengthy list, so what?
>
> My point was that your interface by trying to fit square pegs into
> round hole will fail to expose all subtlety of each device which might
> in the end bring to wrong power management decision. So i believe we
> can't sum up power management to list of mode whose attribute are
> power consumption & capacity.

it's possible (which is part of the reason I started the thread), but
so far there
Re: [patch 2.6.23-rc1] dma_free_coherent() needs irqs enabled (sigh)
On Tuesday 24 July 2007, Russell King wrote:
> > > I think you got the year wrong:
> > >
> > > 5edf71ae (Russell King 2005-11-25 15:52:51 + 364) WARN_ON(irqs_disabled());
> > >
> > > which is due to this commit:
> > >
> > > [ARM] Do not call flush_tlb_kernel_range() with IRQs disabled.
> >
> > This little "to do" list item has been sitting in my mailbox way
> > too long then. Certainly since it was fair to say "last year"! ;)
>
> Are you intentionally not reading what I said?

Hardly. Go back and read what *I* wrote!

It just took a while to notice that behavioral change, since I don't normally run the relevant regression tests using lockdep. It was sometime in the first half of 2006, ergo "since it was fair to say 'last year'".

A bunch of piecemeal workarounds followed; and recently they were all replaced with a more fundamental fix. This doc patch was the tail end of the process of recovering from that change ... and the warnings are there to keep other folk from seeing the same issue in other contexts.

- Dave
Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids
Pavel Emelianov [EMAIL PROTECTED] wrote:
| Make alloc_pid() initialize pid_numbers and hash them
| into the hashtable, not the struct pid itself.
|
| Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
|
| ---
|
|  pid.c | 47 +--
|  1 files changed, 33 insertions(+), 14 deletions(-)
|
| --- ./kernel/pid.c.ve12 2007-07-05 11:06:41.0 +0400
| +++ ./kernel/pid.c 2007-07-05 11:08:23.0 +0400
| @@ -28,8 +28,10 @@
|  #include
|  #include
|  #include
| +#include
|
| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| +#define pid_hashfn(nr, ns) \
| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
|  static struct hlist_head *pid_hash;
|  static int pidhash_shift;
|  struct pid init_struct_pid = INIT_STRUCT_PID;
| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
|  	if (!pid)
|  		return;
|
| -	ns = pid->numbers[0].ns;
| +	ns = pid->numbers[pid->level].ns;
|  	if ((atomic_read(&pid->count) == 1) ||
|  	     atomic_dec_and_test(&pid->count))
|  		kmem_cache_free(ns->pid_cachep, pid);
| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
|  fastcall void free_pid(struct pid *pid)
|  {
|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
| +	int i;
|  	unsigned long flags;
|
|  	spin_lock_irqsave(&pidmap_lock, flags);
| -	hlist_del_rcu(&pid->pid_chain);
| +	for (i = 0; i <= pid->level; i++)
| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
|  	spin_unlock_irqrestore(&pidmap_lock, flags);
|
| -	free_pidmap(&init_pid_ns, pid->nr);
| +	for (i = 0; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| +
|  	call_rcu(&pid->rcu, delayed_put_pid);
|  }
|
| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
|  {
|  	struct pid *pid;
|  	enum pid_type type;
| -	int nr = -1;
| +	struct pid_namespace *ns;
| +	int i, nr;
|
| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
|  	if (!pid)
|  		goto out;
|
| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
| -	if (nr < 0)
| -		goto out_free;
| +	ns = pid_ns;
| +	for (i = pid_ns->level; i >= 0; i--) {
| +		nr = alloc_pidmap(ns);
| +		if (nr < 0)
| +			goto out_free;

If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1 and fails when i=2, we would try to free_pidmap() even from pid->pid_number[2].pid_ns. This would incorrectly a) drop the reference count on that pid namespace, and b) increment pidmap->nr_free.

Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns before calling free_pidmap() below?

|
| +		pid->numbers[i].nr = nr;
| +		pid->numbers[i].ns = ns;
| +		ns = ns->parent;
| +	}
| +
| +	pid->level = pid_ns->level;
|  	atomic_set(&pid->count, 1);
| -	pid->nr = nr;
|  	for (type = 0; type < PIDTYPE_MAX; ++type)
|  		INIT_HLIST_HEAD(&pid->tasks[type]);
|
|  	spin_lock_irq(&pidmap_lock);
| -	hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
| +	for (i = pid->level; i >= 0; i--)
| +		hlist_add_head_rcu(&pid->numbers[i].pid_chain,
| +				&pid_hash[pid_hashfn(pid->numbers[i].nr,
| +					pid->numbers[i].ns)]);
|  	spin_unlock_irq(&pidmap_lock);
| -
|  out:
|  	return pid;
|
|  out_free:
| -	kmem_cache_free(init_pid_ns.pid_cachep, pid);
| +	for (i++; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);

i.e. pid->level and some of the pid->numbers[] entries may not be initialized here, right?
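The review concern about the error path can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions, not the patch author's code: `struct ns`, `struct pid_model`, `alloc_nr()` and `free_nr()` are hypothetical stand-ins for the pid namespace, `struct pid`, `alloc_pidmap()` and `free_pidmap()`. The point it tries to show is that the unwind loop can be bounded by the leaf namespace's level, which is known at failure time, instead of by a `level` field that has not been written yet.

```c
#include <stddef.h>

#define MAX_LEVEL 4

/* Hypothetical stand-in for a pid namespace. */
struct ns {
	int level;          /* 0 for the root namespace */
	int next_nr;        /* next number to hand out */
	int limit;          /* allocation fails once next_nr reaches limit */
	int in_use;         /* numbers currently allocated */
	struct ns *parent;  /* NULL for the root namespace */
};

struct number {
	int nr;
	struct ns *ns;
};

/* Hypothetical stand-in for struct pid with one number per level. */
struct pid_model {
	int level;
	struct number numbers[MAX_LEVEL + 1];
};

/* Stand-in for alloc_pidmap(): returns a number, or -1 when exhausted. */
static int alloc_nr(struct ns *ns)
{
	if (ns->next_nr >= ns->limit)
		return -1;
	ns->in_use++;
	return ns->next_nr++;
}

/* Stand-in for free_pidmap(). */
static void free_nr(struct ns *ns, int nr)
{
	(void)nr;
	ns->in_use--;
}

/*
 * Allocate one number per level, from the leaf namespace up to the root.
 * If allocation fails at level i, only levels i+1..leaf->level have been
 * filled in, so only those are released. The loop bound is leaf->level
 * because pid->level is assigned only after the whole loop succeeds.
 */
static int alloc_pid_model(struct pid_model *pid, struct ns *leaf)
{
	struct ns *ns = leaf;
	int i, nr;

	for (i = leaf->level; i >= 0; i--) {
		nr = alloc_nr(ns);
		if (nr < 0)
			goto out_free;
		pid->numbers[i].nr = nr;
		pid->numbers[i].ns = ns;
		ns = ns->parent;
	}
	pid->level = leaf->level;
	return 0;

out_free:
	for (i++; i <= leaf->level; i++)
		free_nr(pid->numbers[i].ns, pid->numbers[i].nr);
	return -1;
}
```

With this shape, zeroing the structure and NULL-checking each slot is not needed in the model, though that alternative (as suggested in the review) would work as well.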
Re: crash with 2.6.22.1 crash:ll_rw_blk.c blk_remove_plug()
On 7/23/07, Jens Axboe <[EMAIL PROTECTED]> wrote: On Sun, Jul 22 2007, Satyam Sharma wrote: > Hi Walter, > > Thanks for reporting this. > > On 7/22/07, walter harms <[EMAIL PROTECTED]> wrote: >> hello all, >> on my asus notebook tm620 there is a crash with 2.6.22 and 2.6.21 > > Did this happen when you were resuming from a suspend-to-ram/disk? > [ I ask because I see swsusp in the trace below, linux-pm added to Cc: ] > >> >> Using IPI Shortcut mode >> WARNING: at block/ll_rw_blk.c:1575 blk_remove_plug() >> [] blk_remove_plug+0x36/0x5a >> [] __generic_unplug_device+0x14/0x1f >> [] __make_request+0x39b/0x49c >> [] generic_make_request+0x228/0x255 >> [] submit_bio+0xa5/0xac >> [] mempool_alloc+0x37/0xae >> [] submit+0xc2/0x11d >> [] bio_read_page+0x24/0x27 >> [] swsusp_check+0x4f/0xaf >> [] software_resume+0x5f/0x108 >> [] kernel_init+0xb0/0x212 >> [] ret_from_fork+0x6/0x1c >> [] kernel_init+0x0/0x212 >> [] kernel_init+0x0/0x212 >> [] kernel_thread_helper+0x7/0x10 >> === > > Surprising, that's a WARN_ON(!irqs_disabled()) but IRQs are disabled > alright on that codepath. OTOH, __make_request() is heavily goto-driven, > uses the non-save/restore variants of spin_lock_irq, and does not even > balance locks / unlocks for some error paths ... gaah. __make_request() must be called from process context, hence spin_lock_irq() is perfectly already and the fastest way to go. And of course the locking is balanced! So please save your 'gaah's for code you actually took the time to try and understand. You're right, I didn't really look at that code for long (it even explicitly comments about what's going with the locking in there!) sorry about that. [ Off-topic: BTW does every call to __make_request() end up in blk_remove_plug()? 
Since you're explicitly making the assumption that it *must* be called from process context (and hence the use of the non-save/restore variants), you could consider putting a WARN_ON(irqs_disabled()) over there, and perhaps a WARN_ON (!spin_is_locked(queue_lock)) in blk_remove_plug() instead, and other such similar functions that currently have the !irqs_disabled check. This way you'd effectively cover _both_ the assertions, and in appropriate places -- just a suggestion. ] But it does look like unbalanced irq disable/enable calls. I'd guess in the suspend/resume path. Obviously something more esoteric, since this is the first such report for 2.6.22, so like some not-very-used driver for instance. Now that I do look at the codepath, it does seem surprising irqs were not disabled there. There are a bunch of calls to _other_ functions between the spin_lock_irq and the blk_remove_plug via __generic_unplug_device that would also have complained about !irqs_disabled. Walter, does this happen reproducibly? Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.20->2.6.21 - networking dies after random time
On Tue, 2007-07-24 at 22:04 +0200, Ingo Molnar wrote:
> Marcin, could you try the patch below too? [without having any other
> patch applied.] It basically turns the critical section into an irqs-off
> critical section and thus checks whether your problem is related to that
> particular area of code.

I read back on this thread and I think the problem is somewhere else:

Delayed disable relies on the ability to re-trigger the interrupt in the case that a real interrupt happens after the software disable was set. In this case we actually disable the interrupt on the hardware level _after_ it occurred.

On enable_irq, we need to re-trigger the interrupt. On i386 this relies on a hardware resend mechanism (send_IPI_self()).

Actually we only need the resend for edge type interrupts. Level type interrupts come back once enable_irq() re-enables the interrupt line.

I assume that the interrupt in question is level triggered because it is shared and above the legacy irqs 0-15:

17: 12 IO-APIC-fasteoi eth1, eth0

Looking into the IO_APIC code, the resend via send_IPI_self() happens unconditionally. So the resend is done for level and edge interrupts. This makes the problem more mysterious.

The code in question, lib8390.c, does:

disable_irq();
fiddle_with_the_network_card_hardware()
enable_irq();

The fiddle_with_the_network_card_hardware() might cause interrupts, which are cleared in the same code path again.

Marcin found that when he disables the irq line on the hardware level (removing the delayed disable) the card is kept alive.

So the difference is that we can get a resend on enable_irq when an interrupt happens while we are in the disabled region. No idea how this affects the network card, as the code there must be able to handle interrupts that do not originate from the card, due to interrupt sharing.

Marcin, can you please try the patch below? It's just a debugging aid to gather some more data about that problem. If the patch fixes the problem, then we should try to disable the resend mechanism for non-edge type irq lines on the irq_chip level (i.e. the IOAPIC code).

Thanks,

tglx

--- linux-2.6.orig/kernel/irq/resend.c
+++ linux-2.6/kernel/irq/resend.c
@@ -62,6 +62,15 @@ void check_irq_resend(struct irq_desc *desc, unsigned int irq)
 	 */
 	desc->chip->enable(irq);
 
+	/*
+	 * Temporary hack to figure out more about the problem, which
+	 * is causing the ancient network cards to die.
+	 */
+	if (desc->handle_irq != handle_edge_irq) {
+		printk(KERN_DEBUG "Skip resend for irq %u\n", irq);
+		return;
+	}
+
 	if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
 		desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
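The delayed-disable and resend behaviour described in this message can be modelled in a few lines of userspace C. This is a simplified sketch, not the genirq implementation: `struct irq_model` and the `model_*()` functions are hypothetical names, and the `resend_level` parameter is an assumption added to contrast the behaviour described above (resend for every trigger type) with the debugging patch (resend for edge only).

```c
#include <stdbool.h>

enum trigger { TRIGGER_EDGE, TRIGGER_LEVEL };

/* Hypothetical model of one interrupt line. */
struct irq_model {
	enum trigger trigger;
	bool disabled;  /* software flag set by disable_irq() (delayed disable) */
	bool masked;    /* line is really masked at the hardware level */
	bool pending;   /* an interrupt arrived while software-disabled */
	int resends;    /* software resends issued on enable */
	int handled;    /* interrupts delivered to the handler */
};

/* Delayed disable: only mark the line, do not touch the hardware yet. */
static void model_disable_irq(struct irq_model *d)
{
	d->disabled = true;
}

/* The device raises the line. */
static void model_hw_interrupt(struct irq_model *d)
{
	if (d->masked)
		return;                 /* hardware does not deliver it */
	if (d->disabled) {
		d->pending = true;      /* remember it for enable time */
		d->masked = true;       /* now mask for real, after the fact */
		return;
	}
	d->handled++;
}

/*
 * Re-enable the line. An edge interrupt that fired while disabled is lost
 * unless it is resent. A level interrupt reasserts by itself once the line
 * is unmasked, so resending it is only done when resend_level is true.
 */
static void model_enable_irq(struct irq_model *d, bool resend_level)
{
	d->disabled = false;
	d->masked = false;
	if (!d->pending)
		return;
	d->pending = false;
	if (d->trigger == TRIGGER_EDGE || resend_level)
		d->resends++;
}
```

Calling `model_enable_irq(&d, true)` corresponds to the unconditional resend, `model_enable_irq(&d, false)` to the behaviour with the debugging patch applied.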
Re: sysfs/udev broken in 2.6.23-rc1 [input, i2c, ...] (Was: sysfs/udev broken in latest git?)
On 7/24/07, Simon Arlott <[EMAIL PROTECTED]> wrote:
On 24/07/07 17:34, Kay Sievers wrote:
> On 7/24/07, Simon Arlott <[EMAIL PROTECTED]> wrote:
>> On 24/07/07 13:54, Cornelia Huck wrote:
>> > On Tue, 24 Jul 2007 11:20:02 +0200,
>> > "Kay Sievers" <[EMAIL PROTECTED]> wrote:
>> >
>> >> It looks fine to me. "device" links must never point to anything else
>> >> than a bus device.

While it's still true, for input we have special rules because the "stacked class devices" existed only there. At least for SYSFS_DEPRECATED, all input devices should have a "device" symlink pointing to the bus-device.

>> > Hm, but then
>> > 1. The patch sneaks this check in (the old code only checked for
>> >    dev->parent)
>> > 2. The code is rather inconsistent now, since none of the other code
>> >    paths check for dev->parent->bus...

Yeah, that's true.

>> Removing the dev->parent->bus check fixes it:

Yes, let's remove the check, I will check now if we possibly need to fix more than this or only the block-device patch.

Thanks,
Kay
Re: [PATCH] add __GFP_ZERO to GFP_LEVEL_MASK
On Tue, 24 Jul 2007 16:58:51 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Tue, 24 Jul 2007, Andrew Morton wrote:
>
> > __GFP_COMP I'm not so sure about.
> > drivers/char/drm/drm_pci.c:drm_pci_alloc() (and other places like infiniband)
> > pass it into dma_alloc_coherent() which some architectures implement via
> > slab. umm,
> > arch/arm/mm/consistent.c is one such.
>
> Is drm_pci_alloc really right in setting __GFP_COMP?

I don't see what's special about that dma_alloc_coherent() call.

> dma_alloc_coherent does not set __GFP_COMP for other higher order allocs
> and expects to be able to operate on the page structs independently. That
> is not the case for a compound page.
>
> Creates a really interesting case for SLAB. Slab did not use __GFP_COMP in
> order to be able to allow the use of page->private (No longer an issue since
> the 2.6.22 cleanups and avoiding the use of page->private for the compound
> head).
>
> Now the __GFP_COMP flag is passed through for any higher order page alloc
> (such as a kmalloc allocation > PAGE_SIZE). Then we may have allocated one
> slab that is a compound page among other higher order pages allocated
> without __GFP_COMP. May have caused rare and strange failures in 2.6.21
> and earlier because of the concurrent page->private use in compound head
> pages and arch pages.
>
> SLUB will always use __GFP_COMP so the pages are consistent regardless of
> whether __GFP_COMP is passed in or not.
>
> The strange scenarios come about by expecting a page allocation when
> sometimes we just substitute a slab alloc.
>
> We could filter __GFP_COMP out to avoid the BUG()? Or deal with it on a
> case by case basis?

Fix callers, I'd suggest. There are a number of fishy-looking open-coded usages of __GFP_COMP around the place.

It's a bit sad that some architectures are using slab for dma_alloc_coherent() while others go to alloc_pages().
Re: [PATCH] add __GFP_ZERO to GFP_LEVEL_MASK
On Tue, 24 Jul 2007, Andrew Morton wrote:
> __GFP_COMP I'm not so sure about.
> drivers/char/drm/drm_pci.c:drm_pci_alloc() (and other places like infiniband)
> pass it into dma_alloc_coherent() which some architectures implement via
> slab. umm,
> arch/arm/mm/consistent.c is one such.

Is drm_pci_alloc really right in setting __GFP_COMP?

dma_alloc_coherent does not set __GFP_COMP for other higher order allocs and expects to be able to operate on the page structs independently. That is not the case for a compound page.

Creates a really interesting case for SLAB. Slab did not use __GFP_COMP in order to be able to allow the use of page->private (No longer an issue since the 2.6.22 cleanups and avoiding the use of page->private for the compound head).

Now the __GFP_COMP flag is passed through for any higher order page alloc (such as a kmalloc allocation > PAGE_SIZE). Then we may have allocated one slab that is a compound page among other higher order pages allocated without __GFP_COMP. May have caused rare and strange failures in 2.6.21 and earlier because of the concurrent page->private use in compound head pages and arch pages.

SLUB will always use __GFP_COMP so the pages are consistent regardless of whether __GFP_COMP is passed in or not.

The strange scenarios come about by expecting a page allocation when sometimes we just substitute a slab alloc.

We could filter __GFP_COMP out to avoid the BUG()? Or deal with it on a case by case basis?
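The "filter it out" option raised at the end of this message amounts to masking one bit before the allocator's flag check. The sketch below is purely illustrative: the `MODEL_*` bit values and both helper functions are hypothetical and do not correspond to the real GFP flag encoding or to any slab allocator function.

```c
/* Hypothetical flag bits for illustration; NOT the kernel's real values. */
#define MODEL_GFP_WAIT 0x01u
#define MODEL_GFP_ZERO 0x02u
#define MODEL_GFP_COMP 0x04u

/* Flags the model slab allocator is prepared to accept. */
#define MODEL_SLAB_ALLOWED (MODEL_GFP_WAIT | MODEL_GFP_ZERO)

/* Returns nonzero when no unexpected flag is set (the BUG() check passes). */
static int model_slab_flags_ok(unsigned int flags)
{
	return (flags & ~MODEL_SLAB_ALLOWED) == 0;
}

/* Option 1 from the message: silently drop the compound-page flag. */
static unsigned int model_filter_comp(unsigned int flags)
{
	return flags & ~MODEL_GFP_COMP;
}
```

The other option, handling it case by case, would leave the check strict and change each caller that passes the flag.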
Re: Regression in serial console on ia64 after 2.6.22
IA64

Subject         : Regression in serial console on ia64 after 2.6.22
References      : http://marc.info/?l=linux-ia64&m=118483645914066&w=2
Last known good : ?
Submitter       : Horms <[EMAIL PROTECTED]>
Caused-By       : Yinghai Lu <[EMAIL PROTECTED]>
                  commit 18a8bd949d6adb311ea816125ff65050df1f3f6e
Handled-By      : ?
Status          : unknown

please test this patch.

YH

[PATCH] ia64: move machvec_init before parse_early_param

So ia64_mv is initialized before early console

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/ia64/kernel/machvec.c b/arch/ia64/kernel/machvec.c
index 13df337..a94feaa 100644
--- a/arch/ia64/kernel/machvec.c
+++ b/arch/ia64/kernel/machvec.c
@@ -14,12 +14,6 @@ struct ia64_machine_vector ia64_mv;
 EXPORT_SYMBOL(ia64_mv);
 
 static __initdata const char *mvec_name;
-static __init int setup_mvec(char *s)
-{
-	mvec_name = s;
-	return 0;
-}
-early_param("machvec", setup_mvec);
 
 static struct ia64_machine_vector * __init
 lookup_machvec (const char *name)
@@ -42,6 +36,10 @@ machvec_init (const char *name)
 
 	if (!name)
 		name = mvec_name ? mvec_name : acpi_get_sysname();
+
+	if (!mvec_name)
+		mvec_name = name;
+
 	mv = lookup_machvec(name);
 	if (!mv)
 		panic("generic kernel failed to find machine vector for"
diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index cf06fe7..b06d7b7 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -481,6 +481,9 @@ int __init reserve_elfcorehdr(unsigned long *start, unsigned long *end)
 void __init
 setup_arch (char **cmdline_p)
 {
+#ifdef CONFIG_IA64_GENERIC
+	char *mvstr;
+#endif
 	unw_init();
 
 	ia64_patch_vtop((u64) __start___vtop_patchlist, (u64) __end___vtop_patchlist);
@@ -491,12 +494,15 @@ setup_arch (char **cmdline_p)
 	efi_init();
 	io_port_init();
 
-	parse_early_param();
-
 #ifdef CONFIG_IA64_GENERIC
-	machvec_init(NULL);
+	mvstr = strstr(*cmdline_p, "machvec=");
+	if (mvstr)
+		mvstr = strchr(mvstr, '=') + 1;
+	machvec_init(mvstr);
 #endif
 
+	parse_early_param();
+
 	if (early_console_setup(*cmdline_p) == 0)
 		mark_bsp_online();
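One detail of the hand-rolled "machvec=" lookup in the patch is that the pointer it passes on refers to the rest of the command line, not to a string that ends at the option's value. As a userspace illustration only (the helper name `find_machvec()` and its buffer-based interface are hypothetical, not something the patch or the kernel provides), extracting just the value could look roughly like this:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of the first "machvec=" option found in cmdline into buf
 * (at most buflen - 1 characters, always NUL-terminated).
 * Returns buf on success, or NULL if the option is absent or buflen is 0.
 */
static char *find_machvec(const char *cmdline, char *buf, size_t buflen)
{
	static const char key[] = "machvec=";
	const char *p = cmdline;
	size_t len;

	if (buflen == 0)
		return NULL;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept a match that starts a word. */
		if (p == cmdline || p[-1] == ' ')
			break;
		p += sizeof(key) - 1;
	}
	if (!p)
		return NULL;

	p += sizeof(key) - 1;          /* skip past "machvec=" */
	len = strcspn(p, " ");         /* value ends at the next space */
	if (len >= buflen)
		len = buflen - 1;
	memcpy(buf, p, len);
	buf[len] = '\0';
	return buf;
}
```

Whether the lookup code downstream needs an exactly terminated name is not stated in the message; the sketch simply shows the stricter variant.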
Re: Thinkpad ACPI
On Tue, 24 Jul 2007 17:19:17 -0500, YOSHIFUJI Hideaki / 吉藤英明 wrote:
> Linux 2.6.23-rc1 fails to power off my ThinkPad T42. Git-bisect told me
> that the following commit is to blame, and by reverting that commit, it
> works appropriately.

I have noted the same behavior on a Thinkpad 600X.

On the topic of Thinkpad ACPI, the following has occurred twice (out of over one hundred boots) in the last month during boot, once with a 2.6.21.5 kernel and once with 2.6.22.1.

Jul 23 06:51:31 celestial kernel: [ 204.882991] ACPI: Core revision 20070126
Jul 23 06:51:31 celestial kernel: [ 204.947102] ACPI: setting ELCR to 0a00 (from 0800)
Jul 23 06:51:31 celestial kernel: [ 208.791654] ACPI Error (hwacpi-0142): Hardware did not change modes [20070126]
Jul 23 06:51:31 celestial kernel: [ 208.792229] ACPI Error (evxfevnt-0086): Could not transition to ACPI mode [20070126]
Jul 23 06:51:31 celestial kernel: [ 208.792806] ACPI Warning (utxface-0139): AcpiEnable failed [20070126]
Jul 23 06:51:31 celestial kernel: [ 208.793299] ACPI: Unable to enable ACPI
Re: Power Management framework proposal
On Tue, 2007-07-24 at 16:02 -0700, [EMAIL PROTECTED] wrote:
> what requirements are needed? (I'm sure that there are others, but
> hopefully it's possible to avoid requirements like 'the clock speed for
> device A must be >X to allow device B to operate in mode Y')

I had an idea a while ago, which might still be in the pm list archives, of exposing constraints as opaque bitmaps.

The bits have a defined meaning for a given bus, but are opaque to the core. The devices, however, provide tables indicating to the core their list of power states (with names) and their requirements in terms of parent states (using such bitmasks).

Thus, the core can resolve the dependency requirements without having to know about the actual meaning of the states of the various busses.

Ben.
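The opaque-bitmap idea can be sketched as a small table-driven check. Everything below is an assumption made for illustration: the structure and function names are hypothetical, and the `provides` field (the bits a state offers to its children) is one possible way to model what the message calls "parent states". The core-side check only compares bits and never interprets them.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry in a device's power-state table. */
struct pm_state {
	const char *name;
	uint32_t provides;         /* bits this state offers to children */
	uint32_t requires_parent;  /* bits the parent's state must offer */
};

/* Hypothetical device node: a state table plus a link to its parent. */
struct pm_device {
	const char *name;
	const struct pm_state *states;
	size_t nr_states;
	size_t current;            /* index into states[] */
	struct pm_device *parent;  /* NULL for a root device */
};

/*
 * Core-side check: may dev enter states[idx] given its parent's current
 * state? The bits are bus-defined; the core only tests them.
 */
static int pm_state_allowed(const struct pm_device *dev, size_t idx)
{
	uint32_t need, have;

	if (idx >= dev->nr_states)
		return 0;
	need = dev->states[idx].requires_parent;
	if (!dev->parent)
		return need == 0;
	have = dev->parent->states[dev->parent->current].provides;
	return (have & need) == need;
}

/* Switch state if the parent constraint is met; -1 otherwise. */
static int pm_set_state(struct pm_device *dev, size_t idx)
{
	if (!pm_state_allowed(dev, idx))
		return -1;
	dev->current = idx;
	return 0;
}
```

The sketch checks only the child-to-parent direction. A fuller version would also have to refuse a parent transition that breaks a constraint of one of its children.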
Re: [RFC] scheduler: improve SMP fairness in CFS
Chris Snook wrote:
> A fraction of *each* CPU, or a fraction of *total* CPU? Per-cpu
> granularity doesn't make anything more fair.

Well, our current solution uses per-cpu weights, because our vendor couldn't get the load balancer working accurately enough. Having per-cpu weights and cpu affinity gives acceptable results for the case where we're currently using it.

If the load balancer is good enough, per-system weights would be fine. It would have to play nicely with affinity though, in the case where it makes sense to lock tasks to particular cpus.

> If I have two threads with the same priority, and two CPUs, the scheduler
> will put one on each CPU, and they'll run happily without any migration or
> balancing.

Sure. Now add a third thread. How often do you migrate? Put another way, over what time quantum do we ensure fairness?

Chris
Re: 2.6.23-rc1 sky2 boot crash in sky2_mac_intr
Hi Florian,

On 24/07/07, Florian Lohoff <[EMAIL PROTECTED]> wrote:
On Tue, Jul 24, 2007 at 09:50:08AM +0100, Stephen Hemminger wrote:
> The problem is related to power management. The PHY has a number of PCI configuration
> registers for power control, and the function of these changes based on the version and
> revision of the chip. The driver does work on older versions of the EC-U, in
> Fujitsu laptops, it is just the new rev that is broken.
>
> The driver should probably fail smarter (by not loading) if the PHY isn't powered
> up correctly, but that doesn't help your problem.
>
> The vendor has provided me with documentation on many versions
> of the chip, but I don't have docs on the latest revision differences of the EC Ultra,
> so a proper solution is not easily available. The best method for resolving this would
> be to first try the vendor driver version of sk98lin and see if that fixes it. If so,
> then it is easy to change sky2, to match the phy setup in the vendor driver.
> Another possibility is to look for places in the sky2 driver
> that compare version/revision.
>
> The most likely bits that need to change are in PCI registers: 0x80, 0x84 and 0x88
> You could also load the windows driver and dump PCI config space (with lspci from
> cygwin), and see what the settings are there.
>
> I am away from my office for a month, and therefore away from any sky2
> hardware for testing.

I'll try the above and keep you posted. The crash itself seems to be a 2.6.23-rc1 regression though. I never experienced this with 2.6.22-rc5 which i was running before.

Can you try to figure out what is causing this crash and then use git-bisect?

Regards,
Michal

--
LOG http://www.stardust.webpages.pl/log/
Re: [PATCH RFC] extent mapped page cache
On Tue, 24 Jul 2007 23:25:43 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-07-24 at 16:13 -0400, Trond Myklebust wrote:
> > On Tue, 2007-07-24 at 16:00 -0400, Chris Mason wrote:
> > > On Tue, 10 Jul 2007 17:03:26 -0400
> > > Chris Mason <[EMAIL PROTECTED]> wrote:
> > >
> > > > This patch aims to demonstrate one way to replace buffer heads
> > > > with a few extent trees. Buffer heads provide a few different
> > > > features:
> > > >
> > > > 1) Mapping of logical file offset to blocks on disk
> > > > 2) Recording state (dirty, locked etc)
> > > > 3) Providing a mechanism to access sub-page sized blocks.
> > > >
> > > > This patch covers #1 and #2, I'll start on #3 a little later
> > > > next week.
> > > >
> > > Well, almost. I decided to try out an rbtree instead of the
> > > radix, which turned out to be much faster. Even though
> > > individual operations are slower, the rbtree was able to do many
> > > fewer ops to accomplish the same thing, especially for merging
> > > extents together. It also uses much less ram.
> >
> > The problem with an rbtree is that you can't use it together with
> > RCU to do lockless lookups. You can probably modify it to allocate
> > nodes dynamically (like the radix tree does) and thus make it
> > RCU-compatible, but then you risk losing the two main benefits that
> > you list above.

The tree is a critical part of the patch, but it is also the easiest to rip out and replace. Basically the code stores a range by inserting an object at an index corresponding to the end of the range.

Then it does searches by looking forward from the start of the range. More or less any tree that can search and return the first key >= the requested key will work.

So, I'd be happy to rip out the tree and replace it with something else. Going completely lockless will be tricky; it's something that will need deep thought once the rest of the interface is sane.

-chris
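The lookup scheme described in this message (key each range by its end, then search for the first key >= the requested offset) does not depend on the tree type. Here is a minimal userspace sketch that uses a sorted array and a binary search in place of the rbtree; `struct extent` and both function names are hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* A byte range [start, end], inclusive; end is the lookup key. */
struct extent {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the first extent whose end is >= offset, or NULL if there is none.
 * The array must be sorted by end and the extents must not overlap.
 */
static const struct extent *extent_first_ending_at_or_after(
	const struct extent *ext, size_t n, unsigned long offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ext[mid].end < offset)
			lo = mid + 1;   /* ends before offset: look right */
		else
			hi = mid;       /* candidate: keep looking left */
	}
	return lo < n ? &ext[lo] : NULL;
}

/* Return the extent containing offset, or NULL if offset is in a hole. */
static const struct extent *extent_lookup(
	const struct extent *ext, size_t n, unsigned long offset)
{
	const struct extent *e = extent_first_ending_at_or_after(ext, n, offset);

	return (e && e->start <= offset) ? e : NULL;
}
```

Because the search returns the first range ending at or after the offset, a single lookup either finds the covering extent or the next extent after a hole, which is what makes merging and range walks cheap in this layout.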
Re: [RFC] scheduler: improve SMP fairness in CFS
On Tue, Jul 24, 2007 at 05:22:47PM -0400, Chris Snook wrote:
> Bill Huey (hui) wrote:
> Well, you need enough CPU time to meet your deadlines. You need
> pre-allocated memory, or to be able to guarantee that you can allocate
> memory fast enough to meet your deadlines. This principle extends to any
> other shared resource, such as disk or network. I'm being vague because
> it's open-ended. If a medical device fails to meet realtime guarantees
> because the battery fails, the patient's family isn't going to care how
> correct the software is. Realtime engineering is hard.
...
> Actually, it's worse than merely an open problem. A clairvoyant fair
> scheduler with perfect future knowledge can underperform a heuristic fair
> scheduler, because the heuristic scheduler can guess the future incorrectly
> resulting in unfair but higher-throughput behavior. This is a perfect
> example of why we only try to be as fair as is beneficial.

I'm glad we agree on the above points. :)

It might be that there needs to be another, stiffer policy than what goes into SCHED_OTHER, in that we also need a SCHED_ISO or something that has stricter rebalancing semantics for -rt applications, sort of a super SCHED_RR. That's definitely needed and I don't see how the current CFS implementation can deal with this properly even with numerical running averages, etc... at this time.

SCHED_FIFO is another issue, but this is actually more complicated than just per-cpu run queues in that it calls for a global priority analysis. I don't see how CFS can deal with SCHED_FIFO efficiently without moving to a single run queue. This is kind of a complicated problem with a significant set of trade-offs to take into account (cpu binding, etc..)

>> Tong's previous trio patch is an attempt at resolving this using a generic
>> grouping mechanism and some constructive discussion should come of it.
>
> Sure, but it seems to me to be largely orthogonal to this patch.

It's based on the same kinds of ideas that he's been experimenting with in Trio. I can't name a single other engineer who has posted to lkml recently and has quite the depth of experience in this area that he has. It would be nice to facilitate/incorporate some of his ideas, or get him to work on something to this end that's suitable for inclusion in some tree somewhere.

bill
Re: [linux-pm] Power Management framework proposal
On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
On Tue, 24 Jul 2007, Jerome Glisse wrote:
> On 7/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> On Mon, 23 Jul 2007, Igor Stoppa wrote:
>> > again, HAL / OHM / Mobilin
>>
>> I was trying to define the lower level interfaces that these tools need.
>> today they can only know what is possible by reading the source code for
>> each driver and implementing the driver-specific interfaces necessary to
>> set things, I was proposing a common interface that tools like this could
>> use instead of requiring all the driver-specific knowledge.
>>
>> in a nutshell (and I know this is probably not detailed enough to be
>> acceptable)
>>
>> 1. the software needs to know what the interconnects and dependencies
>> between devices are (supposedly this is provided via sysfs)
>>
>> 2. the software needs to know what type of device this is (again,
>> supposedly this is provided via sysfs)
>>
>> 3. the software needs to know what modes exist for a driver/piece of
>> hardware. to make any decisions this information needs to provide some
>> information about the capability of the mode and the power consumed in
>> that mode. in addition there will need to be flags to indicate any
>> special restrictions of a mode
>>
>> 4. the software needs to know the cost of switching from any mode to any
>> other mode. since some transitions will interact with other devices
>> there will need to be flags to indicate such requirements for specific
>> transitions.
>>
>> 5. the software needs to be able to find out what mode a device is in.
>>
>> 6. the software needs to be able to tell the driver to switch to a
>> different mode (I think it would be a very good thing if going to a
>> particular mode was always the same command, no matter what mode it is
>> currently in)
>>
>> 7. the software needs to figure out the desire of the user.
>>
>> my proposal was addressing items #3-#6. it isn't trying to decide what to
>> do, simply to allow the software that _is_ trying to decide what to do a
>> way to find out what it can do.
>>
>> David Lang
>
> I believe a central place where user can set/change hw state to save
> power or to increase computational power is definitely a goal to pursue.
> But i truly think that the OHM approach is the best one ie using plugins
> so that one can make a plugin specific for each device. The point is that
> i believe there is no way to do an abstract interface for this and trying to
> do so will end up doing ugly code and any interface would fail to encompass
> all possible tweak that might exist for all devices.

will each plugin have its own interface? or will you have one interface to access the plugins and then the plugins do things behind the scenes?

I'll bet that the API for the plugins is common, and if so then it could be similar to the API that I suggested.

I take ohm as a reference here (this comes from my limited understanding of this daemon, so there might be inaccuracies): drivers export their power management tuning capabilities through HAL. An ohm plugin would then use HAL to give a higher-level view of these capabilities and also manage policy, preferences, permissions, ... The last consumer in the power management food chain would be a user interface which communicates with ohm (and with all ohm plugins), so desktop writers (gnome, kde, ...) can write some kind of power management center where each ohm plugin can have its own panel. So in the end the user gets one place to do all of their power management, which is the goal i think you are aiming for.

> For instance on graphics card you could do the following (maybe more):
> -change GPU clock
> -change memory clock
> -disable part of engine
> -disable unit
> i truly don't think you can make a common interface for all this, more
> over there might be constraint on how you can change things (GPU &
> memory clock might need to follow a given ratio). So you definitely
> need knowledge in the user space program to handle this.

sure you can, just enumerate all the options the driver writer wants to offer as options. yes this could be a lengthy list, so what?

My point was that your interface, by trying to fit square pegs into round holes, will fail to expose all the subtleties of each device, which might in the end lead to wrong power management decisions. So i believe we can't reduce power management to a list of modes whose attributes are power consumption & capacity. And there is no way to design an abstraction, given that all the hw we will have to deal with is too different and does not follow any standard (besides ACPI there are other ways to save power: brightness, gpu/memory clock, pll, ...), so i don't see how one might give a common view of things which are fundamentally different in how they affect consumption (same end result with many different paths leading to it).

best,
Jerome Glisse
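For readers trying to picture items #3-#6 of the proposal quoted above, here is a minimal userspace sketch in C. It is only an illustration under stated assumptions: every name in it (pm_mode, pm_device, pm_get_mode, pm_set_mode, pm_pick_mode, PM_FLAG_SUSPENDED) is hypothetical and does not correspond to any existing kernel or HAL/OHM interface, and the pm_pick_mode() policy helper belongs to the deciding software rather than to the proposed interface itself.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none come from an existing kernel API. */
#define PM_FLAG_SUSPENDED 0x1u  /* device does no useful work in this mode */

/* Item 3: one entry per mode, with capability, power draw and flags. */
struct pm_mode {
    const char *name;
    unsigned capability;  /* relative performance, arbitrary units */
    unsigned power_mw;    /* power consumed while in this mode */
    unsigned flags;       /* special restrictions of the mode */
};

struct pm_device {
    const struct pm_mode *modes;
    size_t n_modes;
    size_t current;               /* item 5: index of the current mode */
    const unsigned *switch_cost;  /* item 4: n_modes x n_modes, row = from, column = to */
};

/* Item 5: report which mode the device is in. */
static size_t pm_get_mode(const struct pm_device *dev)
{
    return dev->current;
}

/*
 * Item 6: the same call is used whatever the current mode is.
 * Returns the transition cost that was paid, or -1 for an unknown mode.
 */
static long pm_set_mode(struct pm_device *dev, size_t target)
{
    long cost;

    assert(dev->current < dev->n_modes);
    if (target >= dev->n_modes)
        return -1;
    cost = dev->switch_cost[dev->current * dev->n_modes + target];
    dev->current = target;
    return cost;
}

/*
 * Example policy (not part of the proposed interface): pick the mode with
 * the lowest power draw whose capability is at least min_capability.
 * Returns n_modes when no mode qualifies.
 */
static size_t pm_pick_mode(const struct pm_device *dev, unsigned min_capability)
{
    size_t best = dev->n_modes;

    for (size_t i = 0; i < dev->n_modes; i++) {
        if (dev->modes[i].capability < min_capability)
            continue;
        if (best == dev->n_modes ||
            dev->modes[i].power_mw < dev->modes[best].power_mw)
            best = i;
    }
    return best;
}
```

Constraints such as "GPU and memory clock must follow a given ratio" do not fit this flat table directly; under this sketch a driver would have to enumerate only the valid combinations as separate modes, which is exactly the point being argued in the thread.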
Re: [PATCH] add __GFP_ZERO to GFP_LEVEL_MASK
On Tue, 24 Jul 2007 16:00:32 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 24 Jul 2007, Andrew Morton wrote:
>
> > I think I'll duck this for now. Otherwise I have a suspicion that I'll
> > be the first person to run it and I'm too old for such excitement.
>
> I always had the suspicion that you have some magical script
> which will immediately tell you that a patch is not working ;-)

sort of a defensive crouch.

> Works fine on x86_64 (on top of the ctor cleanup patchset) and passes the
> kernel build test but then there may be creatively designed drivers and
> such that pass these flags to the slab allocators which will now BUG.

__GFP_COLD looks OK.

__GFP_COMP I'm not so sure about. drivers/char/drm/drm_pci.c:drm_pci_alloc() (and other places like infiniband) pass it into dma_alloc_coherent() which some architectures implement via slab. umm, arch/arm/mm/consistent.c is one such.

__GFP_MOVABLE looks OK.
Re: Is PIE randomization breaking klibc binaries?
On 07-07-24 15:45 H. Peter Anvin wrote:

> Chuck Ebbert wrote:
> >
> >Okay, I tested with Fedora on x86_64 and it worked there too.
> >(Not that that proves much.)
> >
> >Did you capture any of the error messages, like the address
> >of the segfault?
> >
>
> FWIW, on x86-64, this should show up in dmesg.
>
> -hpa

Guys, this is at boot time and most of the binaries don't work. However at the end busybox is called and then there is a shell, where I can call the binaries and force the segmentation violation. Pencil and paper work usually. But right now I don't have the broken kernel anymore and it's 1 am here. Wait for tomorrow.

-- 
Uli Kunitz
Re: [DRIVER SUBMISSION] DRBD wants to go mainline
Hi Lars,

On 7/24/07, Lars Ellenberg <[EMAIL PROTECTED]> wrote:
On Mon, Jul 23, 2007 at 07:10:58PM +0530, Satyam Sharma wrote:
> On 7/23/07, Lars Ellenberg <[EMAIL PROTECTED]> wrote:
> >On Sun, Jul 22, 2007 at 09:32:02PM -0400, Kyle Moffett wrote:
> >[...]
> >> Don't use signals between kernel threads, use proper primitives like
> >> notifiers and waitqueues, which means you should also probably switch away
> >> from kernel_thread() to the kthread_*() APIs. Also you should fix this
> >> FIXME or remove it if it no longer applies:-D.
> >
> >right.
> >but how do I tell a network thread in tcp_recvmsg to stop early,
> >without using signals?
>
> Yup, kthreads API cannot handle (properly stop) kernel threads that want
> to sleep on possibly-blocking-forever-till-signalled-functions such as
> tcp_recvmsg or skb_recv_datagram etc etc.
>
> There are two workarounds:
> 1. Use sk_rcvtimeo and related while-continue logic
> 2. force_sig(SIGKILL) to your kernel thread just before kthread_stop
> (note that you don't need to allow / write code to handle / etc signals
> in your kthread code -- force_sig will work automatically)

this is not only at stop time. for example our "drbd_asender" thread does receive as well as send,

That's normal -- in fact it would've been surprising if your kthread only did recvs but no sends! But where does the "send" come into the picture over here -- a send won't block forever, so I don't foresee any issues whatsoever w.r.t. kthreads conversion for that.

[ BTW I hope you're *not* using any signals-based interface for your kernel thread _at all_. Kthreads disallow (ignore) all signals by default, as they should, and you really shouldn't need to write any logic to handle or do-certain-things-on-seeing a signal in a well designed kernel thread. ]

and the sending latency is crucial to performance, while the recv will not timeout for the next few seconds.

Again, I don't see what sending latency has to do with a kernel_thread to kthread conversion. Or with signals, for that matter. Anyway, as Kyle Moffett mentioned elsewhere, you could probably look at other examples (say cifs_demultiplexer_thread() in fs/cifs/connect.c).

[ I didn't really want to give that example, because I get a nervous breakdown when looking at that code myself, and would actively like to save other fellow developers from a similar fate. To know what I'm talking about, set your xterm to display 40 rows, and then look at the line numbers 3139-3218 in that file, especially 3190-3212. Yes, what you see there is a map of Sulawesi [1] subliminally hidden in Linux kernel code :-) ]

Anyway, cifs_demultiplexer_thread() is just your normal kthread that:

(1) Ignores all signals
(2) Calls perma-blocking-till-signalled functions such as tcp_recvmsg (via kernel_recvmsg)
(3) Calls send-to-socket kind of functions

Hence, it could get into trouble when the umount(2) code wants to stop it with kthread_stop() and it happens to be blocked in tcp_recvmsg() with noblock = 0 (hence sk_rcvtimeo == MAX_SCHEDULE_TIMEOUT), thus would handle the wake_up_process() internally, and not break out, hence not check kthread_should_stop() which it should -- all this ensuring that the kthread never gets killed, kthread_stop() hangs, and the umount(2) from userspace never returns ...

But they've solved it as follows (as I suggested earlier):

(1) First, set sock->sk_rcvtimeo to some "magical value" in your code that sets up the socket params after socket->proto_ops->connect(). See ipv4_connect(), f.e. in CIFS they've set it up to 7 seconds. But that's arbitrarily chosen -- this'll ensure your tcp_recvmsg() isn't perma-blocking in the first place, but will unblock/return every 7 secs, and thus get a chance to check kthread_should_stop().

(2) From the code that wants to kill/stop the kthread (module exit, or umount(2) most probably), just ensure you make a call to force_sig() before kthread_stop() on that kthread -- see cifs_umount() in the same file. This'll ensure that even if the kthread is currently sleeping in tcp_recvmsg(), it'll be signalled to break out from there, and thus check kthread_should_stop().

(3) Note that not a single line of code needs to be written extra in the kthread itself for this to work -- nothing to allow / handle signals ...

Just this, should be enough for a smooth conversion to kthreads, IMHO.

> >> +/* THINK maybe we actually want to use the default "event/%s" worker threads
> >> + * or similar in linux 2.6, which uses per cpu data and threads.
> >> + *
> >> + * To be general, this might need a spin_lock member.
> >> + * For now, please use the mdev->req_lock to protect list_head,
> >> + * see drbd_queue_work below.
> >> + */
> >> +struct drbd_work_queue {
> >> + struct list_head q;
> >> + struct semaphore s; /* producers up it, worker down()s it */
> >> + spinlock_t q_lock; /* to protect the list. */
> >> +};
> >>
> >> Umm, how about fixing this
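The receive-timeout workaround described in this message (step (1): bound every receive so the thread regularly gets a chance to notice a stop request) can be illustrated with a small userspace analogue. This is only a sketch under stated assumptions, not the CIFS or DRBD code: a POSIX socket with SO_RCVTIMEO stands in for sk_rcvtimeo, and the hypothetical should_stop() helper and struct stop_ctl stand in for kthread_should_stop(). The force_sig() half of the workaround has no counterpart here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for kthread_should_stop(): stop after N checks. */
struct stop_ctl {
    int polls_left;
};

static bool should_stop(struct stop_ctl *ctl)
{
    return ctl->polls_left-- <= 0;
}

/* Bound every recv() on fd to timeout_ms, analogous to setting sk_rcvtimeo. */
static int set_recv_timeout(int fd, int timeout_ms)
{
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Receive until asked to stop or until the peer closes the connection.
 * Because recv() returns at least once per timeout, the stop condition is
 * re-checked regularly even when no data arrives.
 * Returns the number of timeouts seen, or -1 on an unexpected error.
 */
static int receive_loop(int fd, struct stop_ctl *ctl, long *bytes_seen)
{
    char buf[256];
    int timeouts = 0;

    assert(bytes_seen != NULL);
    *bytes_seen = 0;

    while (!should_stop(ctl)) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);

        if (n > 0) {
            *bytes_seen += n;   /* real code would process the data here */
            continue;
        }
        if (n == 0)
            break;              /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            timeouts++;         /* timeout expired: loop and re-check stop */
            continue;
        }
        if (errno == EINTR)
            continue;
        return -1;
    }
    return timeouts;
}
```

The trade-off is the one discussed in the thread: the timeout value bounds how long a stop request can go unnoticed, so a 7 second timeout means a stop may take up to 7 seconds unless something else (a signal, in the kernel case) wakes the receiver earlier.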
Re: [patch 2.6.23-rc1] dma_free_coherent() needs irqs enabled (sigh)
On Tue, Jul 24, 2007 at 04:08:11PM -0700, David Brownell wrote:
> On Tuesday 24 July 2007, Russell King wrote:
> > On Tue, Jul 24, 2007 at 02:29:05PM -0700, David Brownell wrote:
> > > On at least ARM (and I'm told MIPS too) dma_free_coherent() has a newish
> > > call context requirement: unlike its dma_alloc_coherent() sibling, it
> > > may not be called with IRQs disabled. (This was new behavior on ARM as
> > > of late 2006, caused by ARM SMP updates.)
> >
> > I think you got the year wrong:
> >
> > 5edf71ae (Russell King 2005-11-25 15:52:51 + 364)
> > WARN_ON(irqs_disabled());
> >
> > which is due to this commit:
> >
> > [ARM] Do not call flush_tlb_kernel_range() with IRQs disabled.
>
> This little "to do" list item has been sitting in my mailbox way
> too long then. Certainly since it was fair to say "last year"! ;)

Are you intentionally not reading what I said?

> > Signed-off-by: Russell King <[EMAIL PROTECTED]>
>
> Thanks...

That was part of the commit I quoted, not an endorsement of your patch, though I think it does deserve an:

Acked-by: Russell King <[EMAIL PROTECTED]>

-- 
Russell King
 Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/
 maintainer of:
Re: [patch 2.6.23-rc1] dma_free_coherent() needs irqs enabled (sigh)
On Tuesday 24 July 2007, Russell King wrote:
> On Tue, Jul 24, 2007 at 02:29:05PM -0700, David Brownell wrote:
> > On at least ARM (and I'm told MIPS too) dma_free_coherent() has a newish
> > call context requirement: unlike its dma_alloc_coherent() sibling, it
> > may not be called with IRQs disabled. (This was new behavior on ARM as
> > of late 2006, caused by ARM SMP updates.)
>
> I think you got the year wrong:
>
> 5edf71ae (Russell King 2005-11-25 15:52:51 + 364)
> WARN_ON(irqs_disabled());
>
> which is due to this commit:
>
> [ARM] Do not call flush_tlb_kernel_range() with IRQs disabled.

This little "to do" list item has been sitting in my mailbox way too long then. Certainly since it was fair to say "last year"! ;)

Of course, 2.6.23-rc1 also merges all the gadget API updates I included to cope with this annoyance. So with any luck the issue will now finally have been properly whupped.

> Signed-off-by: Russell King <[EMAIL PROTECTED]>

Thanks...

> > Since it looks like that restriction won't be removed, this patch changes
> > the definition of the API to include that requirement.
>
> The PCI DMA-mapping API had this restriction. For some reason, this
> restriction was not carried forward into the DMA-API. Unfortunately
> the restriction can not be removed without causing the problems
> described in the commit which introduced it.

Right, I noticed that.

- Dave
Re: [PATCH] pata_hpt37x: Fix 2.6.22 clock PLL regression
On Tue, 24 Jul 2007, Alan Cox wrote:
>
> Just one version of Linux ago
> The PLL code broke - oh no!
> But set the right mode
> And fix up the code
> Makes the PLL timing sync go

Alan, I'm getting a bit worried about you.

Linus
Re: Power Management framework proposal
On Wed, 25 Jul 2007, Benjamin Herrenschmidt wrote:
On Tue, 2007-07-24 at 13:14 -0700, [EMAIL PROTECTED] wrote:

I think we need a set of constraints that trickle down the power tree and limit what a given driver can do locally.

what sort of constraints are you thinking of?

A parent power state defines what states children can be in. For example. A way to express those dependencies would be nice. Or alternatively, the power state of all the children defines the power state a parent can go in automatically.

Ok, I see two things here.

1. do you really want to try and propagate things like this from one to the other, or would it be good enough to flag the issue and let the software selecting the modes implement this constraint?

2. how can you standardize the requirements?

at the very least you have

for this mode all children must be off

for this mode all children must be in a mode that includes a 'suspended' flag (this could be made implicit by saying that you must suspend children before parents) and then just flagging the 'suspended, but not off' modes)

what requirements are needed? (I'm sure that there are others, but hopefully it's possible to avoid requirements like 'the clock speed for device A must be >X to allow device B to operate in mode Y')

David Lang
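The two child requirements listed above ("all children must be off", "all children must be in a mode with a 'suspended' flag") can be sketched as a simple check that the mode-selecting software might run before moving a parent. This is a minimal illustration only: the names child_requirement, child_state and parent_mode_allowed are hypothetical, and the sketch assumes that a child which is off also counts as suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mode requirement a parent places on its children. */
enum child_requirement {
    CHILD_ANY,        /* no constraint on the children */
    CHILD_SUSPENDED,  /* every child must be in a 'suspended' mode */
    CHILD_OFF         /* every child must be off */
};

/* Hypothetical summary of the mode a child is currently in. */
struct child_state {
    bool off;         /* child is powered off */
    bool suspended;   /* current mode carries the 'suspended' flag */
};

/*
 * Returns true when a parent mode carrying requirement req may be entered,
 * given the current state of its n children.
 * Assumption: an 'off' child also satisfies the 'suspended' requirement.
 */
static bool parent_mode_allowed(enum child_requirement req,
                                const struct child_state *children, size_t n)
{
    assert(children != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (req == CHILD_OFF && !children[i].off)
            return false;
        if (req == CHILD_SUSPENDED &&
            !(children[i].off || children[i].suspended))
            return false;
    }
    return true;
}
```

Keeping the check in the selecting software rather than in the drivers corresponds to option 1 in the message: the driver only flags the requirement, and the policy code decides what to do about it.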
Re: understanding firmware loader for speedtouch (kernel 2.6.21.5)
Hi Mikie,

> Do you have any news regarding my case of slow transfers via
> Speedtouch USB modem on linux ?

I found my old speedtouch modem and tested here. I got 2.1 Mbaud bulk downspeed, and 3 Mbaud isoc downspeed. This last is half the speed my line supports, so something is wrong [*]. Unfortunately I'm not very motivated to try to find out what, because I don't use this modem myself anymore. It looks like someone needs to do some more reverse engineering work on the windows driver.

Ciao,

Duncan.

[*] I got the same numbers the last time I tested isoc support, but at that time 3 Mbaud was slightly less than the maximum speed of my line, which explains why I didn't realize that there is a problem.
Re: [PATCH] add __GFP_ZERO to GFP_LEVEL_MASK
On Tue, 24 Jul 2007, Andrew Morton wrote:

> I think I'll duck this for now. Otherwise I have a suspicion that I'll
> be the first person to run it and I'm too old for such excitement.

I always had the suspicion that you have some magical script which will immediately tell you that a patch is not working ;-)

Works fine on x86_64 (on top of the ctor cleanup patchset) and passes the kernel build test but then there may be creatively designed drivers and such that pass these flags to the slab allocators which will now BUG.
Re: Patches for REALLY TINY 386 kernels
On Tue, Jul 24, 2007 at 01:50:35PM -0700, Yinghai Lu wrote:
> On 7/24/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
>> Andi Kleen wrote:
>> >> Some people are putting Linux kernels in the "BIOS" (i.e. ROM chip) when
>> >> using LinuxBIOS (www.linuxbios.org). It _does_ make a lot of difference
>> >> there how big the kernel is. At the moment you can't do that with
>> >> anything smaller than a 1 MB chip. But if people could use 512 KB chips
>> >> because the kernel is small enough that would sure be a great thing.
>> >>
>> >
>> > I'm sure it would be possible to save a lot of text size. But I don't
>> > think removing the relatively small CPUID code is the right way.
>> > That is just a big maintenance issue for little gain.
>> >
>> Well - anyone compiling linux for BIOS usage is targeting
>> a single machine. So an ability to target a single machine is useful,
>> i.e. run the CPUID at compile-time, put the answer in a constant/macro,
>> let the optimizer prune the alternatives. :-)
>
> we are using AMD64 + LinuxBIOS + Kernel (without acpi) + kexec to load
> final kernel.
> So we can use drivers in kernel for any media (SCSI, SATA, IB,...),
> not like EFI need every driver re-porting. and We could use KVM in
> kernel to load other OS if needed.
>
> The problem is Kernel is getting bigger and bigger. and old Tiny
> kernel is stopping at 2.6.18...
>...

Please send:
- the .config for the last kernel small enough
- your size limit
- your gcc version

and I'll look at this.

> YH

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: Is PIE randomization breaking klibc binaries?
Chuck Ebbert wrote:

Okay, I tested with Fedora on x86_64 and it worked there too. (Not that that proves much.)

Did you capture any of the error messages, like the address of the segfault?

FWIW, on x86-64, this should show up in dmesg.

-hpa
Re: Patches for REALLY TINY 386 kernels
On Wed, Jul 18, 2007 at 08:55:50AM -0700, H. Peter Anvin wrote:
> Andi Kleen wrote:
> >
> >> Already with these patches I can compile a zImage kernel that is 450kb
> >> large (890kb decompressed)
> >
> > The important part is not how big the vmlinux is, but how much
> > memory is actually used after boot.
> >
> > I expect concentrating some of the dynamic data structures would
> > be more fruitful in fact.
> >
>
> Well, how big the vmlinux file is matters if it doesn't fit in memory
> with enough time to get to the phase where it is dumping the init
> sections. *If that is not the issue*, then axing stuff like CPUID is a
> major lose in terms of code maintainability for zero gain.

Not only that, but the size of the vmlinux matters when you have limited flash memory to put it on. Having packaged single floppy-based firewalls for a few years, I can assure you that even one kB sometimes matters!

> -hpa

Regards,
Willy
Re: Is PIE randomization breaking klibc binaries?
On 07/24/2007 06:00 PM, Ulrich Kunitz wrote:
> On 07-07-24 16:57 Chuck Ebbert wrote:
>
>>> $ strace ./cat
>>> execve("./cat", ["./cat"], [/* 55 vars */]) = -1 ENOENT (No such file or directory)
>>> ...
>
> Chuck, my binaries run always into a segmentation violation. So
> ENOENT is not the issue. (Note that it was on an x86-64.)
>

Okay, I tested with Fedora on x86_64 and it worked there too. (Not that that proves much.)

Did you capture any of the error messages, like the address of the segfault?
Re: [PATCH 1/7] lguest: documentation pt I: Preparation
On Tue, 2007-07-24 at 13:04 +0100, Alan Cox wrote:
> Dear Rusty I think that we know
> Your code has good things to show
> But an unreliable guide
> To the poetic aside
> Would probably steal the show

That and your (slightly dated?) mm documentation were awesome.

But can we stop now? Please?

Rusty.
Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
On Tue, 2007-07-24 at 17:55 -0400, Trond Myklebust wrote:
>
> If you want to use bitops as spinlocks you should rather be using . That
> also does the right thing w.r.t. pre-emption and sparse locking annotations.

Heh, I didn't know about those... A bit annoying that I can't override them in the arch, I might be able to save a barrier or two here. Our test_and_set_bit() contains both barriers for lock and unlock semantics to cope with all kind of abuses, but your bit_spinlock obviously doesn't need that.

Cheers,
Ben.
Re: [PATCH 2/8] dm: Fix workqueue leak for raid5
On 7/24/07, Dmitry Monakhov <[EMAIL PROTECTED]> wrote:
Signed-off-by: Dmitry Monakhov <[EMAIL PROTECTED]>
---
 drivers/md/raid5.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0f30826..79dd2c7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4985,6 +4985,8 @@ static int run(mddev_t *mddev)
 abort:
  if (conf) {
   print_raid5_conf(conf);
+  if (conf->workqueue)
+   destroy_workqueue(conf->workqueue);
   safe_put_page(conf->spare_page);
   kfree(conf->disks);
   kfree(conf->stripe_hashtbl);
--

I assume this patch is against -mm. I will fold it into:

git://lost.foo-projects.org/~dwillia2/git/iop md-for-linus

Thanks,
Dan
Re: [Lksctp-developers] __unsafe() usage
On Tue, 2007-07-24 at 09:05 -0400, Vlad Yasevich wrote:
>
> Please don't remove module_exit point for SCTP. Simply removing the
> __unsafe() call will be sufficient.
>
> The code has recently been cleaned up to allow safe unloading and I am working
> on final cleanups. It currently works correctly with forced unloading.

Thanks Vlad! I think that's everyone...

Cheers,
Rusty.
===
Remove "unsafe" from module struct

Adrian Bunk points out that "unsafe" was used to mark modules touched by the deprecated MOD_INC_USE_COUNT interface, which has long gone. It's time to remove the member from the module structure, as well.

If you want a module which can't unload, don't register an exit function.

(Vlad Yasevich says SCTP is now safe to unload, so just remove the __unsafe there).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
Acked-by: Shannon Nelson <[EMAIL PROTECTED]>
Acked-by: Dan Williams <[EMAIL PROTECTED]>
Cc: Vlad Yasevich <[EMAIL PROTECTED]>

diff -r d7af727512fd drivers/dma/ioatdma.c
--- a/drivers/dma/ioatdma.c Tue Jul 24 08:30:05 2007 +1000
+++ b/drivers/dma/ioatdma.c Tue Jul 24 09:11:11 2007 +1000
@@ -811,18 +811,17 @@ MODULE_AUTHOR("Intel Corporation");

 static int __init ioat_init_module(void)
 {
- /* it's currently unsafe to unload this module */
- /* if forced, worst case is that rmmod hangs */
- __unsafe(THIS_MODULE);
-
  return pci_register_driver(_pci_driver);
 }

 module_init(ioat_init_module);

+/* it's currently unsafe to unload this module */
+#if 0
 static void __exit ioat_exit_module(void)
 {
  pci_unregister_driver(_pci_driver);
 }

 module_exit(ioat_exit_module);
+#endif
diff -r d7af727512fd drivers/dma/iop-adma.c
--- a/drivers/dma/iop-adma.c Tue Jul 24 08:30:05 2007 +1000
+++ b/drivers/dma/iop-adma.c Tue Jul 24 09:11:30 2007 +1000
@@ -1446,21 +1446,20 @@ static struct platform_driver iop_adma_d

 static int __init iop_adma_init (void)
 {
- /* it's currently unsafe to unload this module */
- /* if forced, worst case is that rmmod hangs */
- __unsafe(THIS_MODULE);
-
  return platform_driver_register(_adma_driver);
 }

+/* it's currently unsafe to unload this module */
+#if 0
 static void __exit iop_adma_exit (void)
 {
  platform_driver_unregister(_adma_driver);
  return;
 }
+module_exit(iop_adma_exit);
+#endif

 module_init(iop_adma_init);
-module_exit(iop_adma_exit);

 MODULE_AUTHOR("Intel Corporation");
 MODULE_DESCRIPTION("IOP ADMA Engine Driver");
diff -r d7af727512fd include/linux/module.h
--- a/include/linux/module.h Tue Jul 24 08:30:05 2007 +1000
+++ b/include/linux/module.h Tue Jul 24 09:00:19 2007 +1000
@@ -312,9 +312,6 @@ struct module
  /* Arch-specific module values */
  struct mod_arch_specific arch;

- /* Am I unsafe to unload? */
- int unsafe;
-
  unsigned int taints; /* same bits as kernel:tainted */

 #ifdef CONFIG_GENERIC_BUG
@@ -441,16 +438,6 @@ static inline void __module_get(struct m
  __mod ? __mod->name : "kernel"; \
 })

-#define __unsafe(mod) \
-do { \
- if (mod && !(mod)->unsafe) { \
-  printk(KERN_WARNING \
-         "Module %s cannot be unloaded due to unsafe usage in" \
-         " %s:%u\n", (mod)->name, __FILE__, __LINE__); \
-  (mod)->unsafe = 1; \
- } \
-} while(0)
-
 /* For kallsyms to ask for address resolution. NULL means not found. */
 const char *module_address_lookup(unsigned long addr,
       unsigned long *symbolsize,
@@ -518,8 +505,6 @@ static inline void module_put(struct mod

 #define module_name(mod) "kernel"

-#define __unsafe(mod)
-
 /* For kallsyms to ask for address resolution. NULL means not found. */
 static inline const char *module_address_lookup(unsigned long addr,
       unsigned long *symbolsize,
diff -r d7af727512fd kernel/module.c
--- a/kernel/module.c Tue Jul 24 08:30:05 2007 +1000
+++ b/kernel/module.c Tue Jul 24 09:00:58 2007 +1000
@@ -692,8 +692,7 @@ sys_delete_module(const char __user *nam
  }

  /* If it has an init func, it must have an exit func to unload */
- if ((mod->init != NULL && mod->exit == NULL)
-     || mod->unsafe) {
+ if (mod->init && !mod->exit) {
   forced = try_force_unload(flags);
   if (!forced) {
    /* This module can't be removed */
@@ -739,11 +738,6 @@ static void print_unload_info(struct seq
  list_for_each_entry(use, >modules_which_use_me, list) {
   printed_something = 1;
Re: [2/2] 2.6.23-rc1: known regressions
On 23.07.2007 11:47, Michal Piotrowski wrote:
> Virtualization
>
> Subject : 2.6.22-git17 boot failure (XEN)
> References : http://lkml.org/lkml/2007/7/22/266
> Last known good : ?
> Submitter : Tilman Schmidt <[EMAIL PROTECTED]>
> Caused-By : ?
> Handled-By : ?
> Status : unknown

"Not a regression."

With the help of Jeremy Fitzhardinge, Andi Kleen and Olaf Hering, I was able to isolate the cause. It lies outside the kernel, in the distribution's init script. So the entry can be dropped.

Thanks,
Tilman

-- 
Tilman Schmidt E-Mail: [EMAIL PROTECTED]
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
[2.6.23-rc1 REGRESSION] ThinkPad T42 poweroff failure by "PM: Introduce pm_power_off_prepare"
Hello.

Linux 2.6.23-rc1 fails to power off my ThinkPad T42. Git-bisect told me that the following commit is to blame, and by reverting that commit, it works appropriately.

Regards,

--yoshfuji

bd804eba1c8597cbb7cd5a5f9fe886aae16a079a is first bad commit
commit bd804eba1c8597cbb7cd5a5f9fe886aae16a079a
Author: Rafael J. Wysocki <[EMAIL PROTECTED]>
Date: Thu Jul 19 01:47:40 2007 -0700

    PM: Introduce pm_power_off_prepare

    Introduce the pm_power_off_prepare() callback that can be registered by the interested platforms in analogy with pm_idle() and pm_power_off(), used for preparing the system to power off (needed by ACPI).

    This allows us to drop acpi_sysclass and device_acpi that are only defined in order to register the ACPI power off preparation callback, which is needed by pm_power_off() registered in a much different way.

    Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
    Acked-by: Pavel Machek <[EMAIL PROTECTED]>
    Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
    Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>

:04 04 624870eb14bf9841fa2dca2cf13cc4c9a0479005 af79f843f3383bbecaed84d493926939cf0e1c12 M drivers
:04 04 9b28a21970668ce133916bbe8d8fd4a61bce23d7 80fc84d7982369205dcf94029e3958c90db14bf0 M include
:04 04 9ce5c8b5d3f87c121b2f7bc6e02bc814648a2739 2e2e1468dfa0db9dee5bd204fd3f802a975a6454 M kernel

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: Towards eliminating the freezer
On Tue, 24 Jul 2007, Rafael J. Wysocki wrote:

> > Then device_suspend() can be simplified:
> >
> > int device_suspend(pm_message_t state)
> > {
> >  int error = 0;
> >
> >  might_sleep();
> >  list_for_each_entry_reverse(dev, _locked, power.entry) {
> >   error = suspend_device(dev, state);
> >
> >   if (error) {
> >    printk(KERN_ERR "Could not suspend device %s: "
> >     "error %d%s\n",
> >     kobject_name(>kobj), error,
> >     error == -EAGAIN ? " (please convert to suspend_late)" : "");
> >    break;
> >   }
> >   list_move(>power.entry, _off);
>
> Is that safe with list_for_each_entry_reverse?

No. I guess it'll have to resemble the other code.

> Yes, that looks fine.
>
> So, who's writing the patch? ;-)

I can do it. You haven't made any changes to this part of the code, have you? My work tends to be based on Linus's tree, not -mm.

Something to watch out for: With all the extra locking, we run the risk of blocking the keventd workqueue. This may or may not matter, but to be safe perhaps there should be a new general-purpose workqueue which _expects_ to block (or freeze) during suspends. Any work routine that involves adding or removing a device should go on the new workqueue.

> > Incidentally, what is dpm_mtx for? It doesn't seem to do anything
> > useful. Is it a relic of the former runtime PM support?
>
> I think so. IMO it can be removed.
>
> I also think it would be nicer to have all of the functions in
> drivers/base/power/{main|suspend|resume}.c moved to one file.

Yes, they are all similar enough that there isn't much point keeping them separate.

Alan Stern
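The answer "No" above comes down to a general property of intrusive lists: once the current entry has been moved to another list, its own prev/next pointers refer to the new list, so the iterator can no longer use them to continue. A minimal userspace sketch of one way around this is shown below: remember the predecessor before moving the entry. The list helpers here are simplified re-implementations written for this example, not the kernel's list.h, and suspend_all() and struct device are hypothetical stand-ins; the kernel's list_for_each_entry_safe_reverse() is, as far as I know, the macro that serves the same purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly linked list, written for this example only. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static bool list_empty(const struct list_node *head)
{
    return head->next == head;
}

static void list_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void list_add_head(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical device: 'fail' is the error its suspend would return. */
struct device {
    struct list_node entry;
    int id;
    int fail;
};

static struct device *to_device(struct list_node *n)
{
    return (struct device *)((char *)n - offsetof(struct device, entry));
}

/*
 * Walk 'locked' from tail to head, moving each successfully suspended
 * device to the head of 'off'. The predecessor is saved *before* the move,
 * because afterwards pos->prev points into the 'off' list.
 * Returns 0 on success, or the first failing device's error code.
 */
static int suspend_all(struct list_node *locked, struct list_node *off)
{
    struct list_node *pos = locked->prev;

    assert(locked != off);
    while (pos != locked) {
        struct list_node *prev = pos->prev;  /* saved before pos is moved */
        struct device *dev = to_device(pos);

        if (dev->fail)
            return dev->fail;    /* entries not yet visited stay on 'locked' */
        list_del(pos);
        list_add_head(pos, off); /* same effect as a list_move() to 'off' */
        pos = prev;
    }
    return 0;
}
```

With an unsafe walk, the step `pos = pos->prev` after the move would land on the head of the 'off' list instead of the next device on 'locked', so the loop would neither terminate correctly nor visit the remaining devices.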
Re: [PATCH] add __GFP_ZERO to GFP_LEVEL_MASK
On Tue, 24 Jul 2007 12:36:59 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Tue, 24 Jul 2007, Andrew Morton wrote:
>
> > > __GFP_MOVABLE The movability of a slab is determined by the
> > > options specified at kmem_cache_create time. If this is
> > > specified at kmalloc time then we will have some random
> > > slabs movable and others not.
> >
> > Yes, they seem inappropriate. Especially the first two.
>
> The third one would randomize __GFP_MOVABLE allocs from the page allocator
> since one __GFP_MOVABLE alloc may allocate a slab that is then used for
> !__GFP_MOVABLE allocs.
>
> Maybe something like this? Note that we may get into some churn here
> since slab allocations that pass any of these flags will BUG.
>
>
>
> GFP_LEVEL_MASK: Remove __GFP_COLD, __GFP_COMP and __GFP_MOVABLE
>
> Add an explanation for the GFP_LEVEL_MASK and remove the flags
> that should not be passed through derived allocators.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

I think I'll duck this for now. Otherwise I have a suspicion that I'll be the first person to run it and I'm too old for such excitement.
Re: [patch 2.6.23-rc1] dma_free_coherent() needs irqs enabled (sigh)
On Tue, Jul 24, 2007 at 02:29:05PM -0700, David Brownell wrote:
> On at least ARM (and I'm told MIPS too) dma_free_coherent() has a newish
> call context requirement: unlike its dma_alloc_coherent() sibling, it
> may not be called with IRQs disabled. (This was new behavior on ARM as
> of late 2006, caused by ARM SMP updates.)

I think you got the year wrong:

5edf71ae (Russell King 2005-11-25 15:52:51 + 364) WARN_ON(irqs_disabled());

which is due to this commit:

    [ARM] Do not call flush_tlb_kernel_range() with IRQs disabled.

    We must not call TLB maintainence operations with interrupts disabled,
    otherwise we risk a lockup in the SMP IPI code.

    This means that consistent_free() can not be called from a context with
    IRQs disabled. In addition, we must not hold the lock in consistent_free
    when we call flush_tlb_kernel_range(). However, we must continue to
    prevent consistent_alloc() from re-using the memory region until we've
    finished tearing down the mapping and dealing with the TLB.

    Therefore, leave the vm_region entry in the list, but mark it inactive
    before dropping the lock and starting the tear-down process. After the
    mapping has been torn down, re-acquire the lock and remove the entry
    from the list.

    Signed-off-by: Russell King <[EMAIL PROTECTED]>

> Since it looks like that restriction won't be removed, this patch changes
> the definition of the API to include that requirement.

The PCI DMA-mapping API had this restriction. For some reason, this restriction was not carried forward into the DMA-API.

Unfortunately the restriction can not be removed without causing the problems described in the commit which introduced it. Or alternatively we scrap ARM SMP entirely, which isn't going to happen.

--
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console
> That cannot be a justification for breaking serial port probe that has
> been working for 10+ years.

Agree. With my "nearest thing we have to a serial maintainer" hat on, please revert this, Andrew.

Bjorn - let's discuss putting the right APIs in place so you can busy out serial ports from other drivers when they are a shared resource.

Alan
Re: Is PIE randomization breaking klibc binaries?
On 07-07-24 16:57 Chuck Ebbert wrote:

> > $ strace ./cat
> > execve("./cat", ["./cat"], [/* 55 vars */]) = -1 ENOENT (No such file or
> > directory)
> > ...

Chuck, my binaries always run into a segmentation violation, so ENOENT is not the issue. (Note that this was on x86-64.)

> > $ file cat
> > cat: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically
> > linked (uses shared libs), stripped
> >
> > Funny nobody noticed that before...
> >
> > After installing klibc.so and klibc-.so into /lib everything works:
>
> Program Headers:
>   Type    Offset  VirtAddr    PhysAddr    FileSiz  MemSiz  Flg  Align
>   PHDR    0x34    0x08048034  0x08048034  0xa0     0xa0    R E  0x4
>   INTERP  0xd4    0x080480d4  0x080480d4  0x2a     0x2a    R    0x1
>       [Requesting program interpreter: /lib/klibc-58kBUyV_qhVvkMnaxy8A7N8rLak.so]
>

Yes, these files were present in the initrd.img file. I checked by unpacking the initrd.img file. Note also that I used git-bisect to identify the PIE patch, which requires successful builds. Reverting the patch clearly resolved the issue in the end.

> Ulrich, did your initrd contain the correct .so?

Sure! I have only one klibc-*.so on my box, in /lib. I diffed the file in the unpacked initrd.img against the file in /lib and there was no difference.

I always recreate the initial ramdisk after the kernel rebuild with make install and my own installkernel script, which uses mkinitramfs. The mkinitramfs script ensures that the klibc shared object from /lib and the klibc binaries from /usr/lib/klibc/bin are copied into the initrd image. Usually that works without any issue on x86 and x86-64. PPC can't use make install, but I use mkinitramfs there too, which handles klibc the same way.

> Did you try rebuilding klibc after building the new kernel?

Rebuilding klibc doesn't make sense from my point of view. What would be the point of it?

--
Uli Kunitz
Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console
> - use setserial to make the serial driver forget about ttyS2
>   so an IR driver could claim it, or
>
> - use setserial to change the IRQ to 3 and just use the device
>   in SIR mode, which is 16550-compatible so you can use the
>   serial driver
>
> I didn't express that very clearly in the changelog.

So the actual problem is quite different. Your IR driver for the port should have an interface to tell the serial layer to make it unavailable. End of problem, and you can then use either service without setserial magic.

Alan
Re: 2.6.23-rc1: known regressions with patches
On 24/07/07, Greg KH <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 23, 2007 at 11:47:44AM +0200, Michal Piotrowski wrote:
> > Unclassified
> >
> > Subject         : kobject link failure
> > References      : http://lkml.org/lkml/2007/7/19/495
> > Last known good : ?
>
> This is caused by a patch that happened after 2.6.22 was released, so it
> is a regression.

Yes, I know. "?" is a default value. When someone reports that a bug appeared after, say, 2.6.22-git7, I add that information to "Last known good".

> > Submitter       : Jan Engelhardt <[EMAIL PROTECTED]>
> > Caused-By       : ?
> > Handled-By      : Cornelia Huck <[EMAIL PROTECTED]>
> >                   Greg Kroah-Hartman <[EMAIL PROTECTED]>
> > Patch           : http://lkml.org/lkml/2007/7/20/143
> > Status          : patch available
>
> I'll be sending Cornelia's patch to Linus within the week to fix this.
>
> thanks,
>
> greg k-h

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
On Wed, 2007-07-25 at 07:37 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 11:13 -0700, Linus Torvalds wrote:
> >
> > IOW, if you do a spinlock with the bitops, the locking side should be able
> > to use a "test_and_set_bit()" on its own, but the unlocking side should be
> >
> >     smp_mb__before_clear_bit();
> >     clear_bit();
> >
> > because the ones that don't return a value also don't imply a memory
> > barrier.
>
> Yup. But I much prefer Nick's clear_bit_unlock() :-)
>
> Ben

If you want to use bitops as spinlocks you should rather be using . That also does the right thing w.r.t. pre-emption and sparse locking annotations.

Trond
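The lock/unlock asymmetry quoted above can be imitated in userspace with C11 atomics. This is only an analogy under stated assumptions: atomic_fetch_or with acquire ordering stands in for test_and_set_bit(), and atomic_fetch_and with release ordering stands in for the smp_mb__before_clear_bit(); clear_bit(); pair (the kernel barrier is a full barrier, which is stronger than release). bit_trylock() and bit_unlock() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to take the lock stored in bit `nr` of *word.
 * The read-modify-write returns the old value, so the caller learns whether
 * the bit was already set. Acquire ordering keeps the critical section from
 * being reordered before the lock is taken.
 */
static bool bit_trylock(atomic_ulong *word, unsigned nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask) == 0;
}

/*
 * Drop the lock in bit `nr`.
 * Clearing a bit returns nothing the caller needs, so the ordering has to be
 * requested explicitly: release ordering makes every store done inside the
 * critical section visible before the bit is seen as clear.
 */
static void bit_unlock(atomic_ulong *word, unsigned nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << nr;

    atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}
```

A caller would spin on bit_trylock() until it returns true, touch the protected data, then call bit_unlock() with the same word and bit number. Other bits in the same word are left alone, so one word can carry several independent locks or flags.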
Re: console UTF-8 fixes
Samuel Thibault wrote:
> Hi,
>
> Egmont got some UTF-8 fixes in mainline, Andrew Morton suggested it
> might be a good time to remember about bug 7746 Support for unicode dead
> keys: http://bugzilla.kernel.org/show_bug.cgi?id=7746 :
>
> « Quoting a mail from Vojtech Pavlik:
>
> "Several languages (polish, czech, slovak, ...) use dead keys (keys that
> don't do anything per se, but put an accent on the next letter). And
> now almost everyone is switching to unicode. And Linux kernel doesn't
> support unicode for dead keys. This means trouble."
> (full mail at
> http://www.ussg.iu.edu/hypermail/linux/kernel/0405.3/1387.html)
>
> And indeed, see http://bugs.debian.org/404503
>
> There is a more recent patch proposed on
> http://www.ussg.iu.edu/hypermail/linux/kernel/0503.2/1723.html
>
> Is there any objection to the proposed way? (extending the internal
> type, and add a new ioctl for uploading unicode dead keys). »
>

Makes sense to me.

-hpa
Re: [RFC] scheduler: improve SMP fairness in CFS
Chris Friesen wrote:
> Chris Snook wrote:
>
> > I don't think Chris's scenario has much bearing on your patch. What he
> > wants is to have a task that will always be running, but can't
> > monopolize either CPU. This is useful for certain realtime workloads,
> > but as I've said before, realtime requires explicit resource allocation.
> > I don't think this is very relevant to SCHED_FAIR balancing.
>
> I'm not actually using the scenario I described, it's just sort of a
> worst-case load-balancing thought experiment.
>
> What we want to be able to do is to specify a fraction of each cpu for
> each task group. We don't want to have to affine tasks to particular cpus.

A fraction of *each* CPU, or a fraction of *total* CPU? Per-cpu granularity doesn't make anything more fair. You've got a big bucket of MIPS you want to divide between certain groups, but it shouldn't make a difference which CPUs those MIPS come from, other than the fact that we try to minimize overhead induced by migration.

> This means that the load balancer must be group-aware, and must trigger
> a re-balance (possibly just for a particular group) as soon as the cpu
> allocation for that group is used up on a particular cpu.

If I have two threads with the same priority, and two CPUs, the scheduler will put one on each CPU, and they'll run happily without any migration or balancing. It sounds like you're saying that every X milliseconds, you want both to expire, be forbidden from running on the current CPU for the next X milliseconds, and then be migrated to the other CPU. There's no gain in fairness here, and there's a big drop in performance.

I suggested local fairness as a means to achieve global fairness because it could reduce overhead, and by adding the margin of error at each level in the locality hierarchy, you can get an algorithm which naturally tolerates the level of unfairness beyond which it is impossible to optimize. Strict local fairness for its own sake doesn't accomplish anything that's better than global fairness.

-- Chris
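To put numbers on the "fraction of each CPU versus fraction of total CPU" distinction, here is a small arithmetic sketch, assuming equal-weight tasks and no migration for the local case. The function names local_share() and global_share() are made up for this illustration and do not correspond to anything in the scheduler.

```c
#include <assert.h>

/*
 * Share of a single CPU that one task gets when `tasks_on_cpu` equal-weight
 * tasks sit on that CPU's runqueue and nothing ever migrates.
 */
static double local_share(int tasks_on_cpu)
{
    assert(tasks_on_cpu > 0);
    return 1.0 / tasks_on_cpu;
}

/*
 * Share of one CPU's worth of time each task would get if the total capacity
 * of `ncpus` CPUs were divided evenly among `ntasks` equal-weight tasks.
 * A single task cannot run on more than one CPU at once, hence the cap.
 */
static double global_share(int ncpus, int ntasks)
{
    assert(ncpus > 0 && ntasks > 0);
    double share = (double)ncpus / ntasks;

    return share > 1.0 ? 1.0 : share;
}
```

With two tasks on two CPUs, local_share(1) and global_share(2, 2) are both 1.0, which matches the point above that no migration is needed. With three tasks on two CPUs, a static placement gives local_share(1) = 1.0 to one task and local_share(2) = 0.5 to the other two, while global_share(2, 3) is about 0.667 each; closing that gap is what costs migrations.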
Re: console UTF-8 fixes
Hi,

Egmont got some UTF-8 fixes in mainline, Andrew Morton suggested it might be a good time to remember about bug 7746 Support for unicode dead keys: http://bugzilla.kernel.org/show_bug.cgi?id=7746 :

« Quoting a mail from Vojtech Pavlik:

"Several languages (polish, czech, slovak, ...) use dead keys (keys that don't do anything per se, but put an accent on the next letter). And now almost everyone is switching to unicode. And Linux kernel doesn't support unicode for dead keys. This means trouble."
(full mail at http://www.ussg.iu.edu/hypermail/linux/kernel/0405.3/1387.html)

And indeed, see http://bugs.debian.org/404503

There is a more recent patch proposed on http://www.ussg.iu.edu/hypermail/linux/kernel/0503.2/1723.html

Is there any objection to the proposed way? (extending the internal type, and add a new ioctl for uploading unicode dead keys). »

Samuel
Re: [PATCH 6/8] i386: bitops: Don't mark memory as clobbered unnecessarily
On Tue, 24 Jul 2007, Jeremy Fitzhardinge wrote:
> >
> > But gcc docs also talk about the other things volatile means, including
> > "not significantly moved".
>
> Actually, it doesn't. In fact it goes out of its way to say that "asm
> volatile" statements can be moved quite a bit, with respect to other
> asms, other code, jumps, basic blocks, etc.

Ahh. That's newer. Historically, gcc manuals used to say "may not be deleted or significantly reordered".

So they've weakened what it means, probably exactly because it wasn't well-defined before either.

Linus