Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)
Andrea Arcangeli wrote:
> On Fri, Jul 27, 2007 at 11:43:23PM -0400, Chris Snook wrote:
> > I'm pretty sure the point of posting a patch that triples CFS performance
> > on a certain benchmark and arguably improves the semantics of sched_yield
> > was to improve CFS. You have a point, but it is a point for a different
> > thread. I have taken the liberty of starting this thread for you.
>
> I've no real interest in starting or participating in flamewars
> (especially the ones not backed by hard numbers). So I adjusted the
> subject a bit in the hope the discussion will not degenerate as you
> predicted, hope you don't mind.

Not at all. I clearly misread your tone.

> I'm pretty sure the point of posting that email was to show the
> remaining performance regression with the sched_yield fix applied
> too. Given you considered my post both offtopic and inflammatory, I
> guess you think it's possible and reasonably easy to fix that
> remaining regression without a pluggable scheduler, right? So please
> enlighten us on how you intend to achieve it.

There are four possibilities that are immediately obvious to me:

a) The remaining difference is due mostly to the algorithmic complexity of the rbtree algorithm in CFS. If this is the case, we should be able to vary the test parameters (CPU count, thread count, etc.), graph the results, and see a roughly logarithmic divergence between the schedulers as some parameter(s) vary. If this is the problem, we may be able to fix it with data structure tweaks or optimized base cases, like how quicksort can be optimized by using insertion sort below a certain threshold.

b) The remaining difference is due mostly to how the scheduler handles volanomark. vmstat can give us a comparison of context switches between O(1), CFS, and CFS+patch. If the decrease in throughput correlates with an increase in context switches, we may be able to induce more O(1)-like behavior by charging tasks for context switch overhead.

c) The remaining difference is due mostly to how the scheduler handles something other than volanomark. If context switch count is not the problem, context switch pattern still could be. I doubt we'd see a 40% difference due to cache misses, but it's possible. Fortunately, oprofile can sample based on cache misses, so we can debug this too.

d) The remaining difference is due mostly to some implementation detail in CFS. It's possible there's some constant-factor overhead in CFS that is magnified heavily by the context switching volanomark deliberately induces. If this is the case, oprofile sampling on clock cycles should catch it.

Tim -- Since you're already set up to do this benchmarking, would you mind varying the parameters a bit and collecting vmstat data? If you want to run oprofile too, that wouldn't hurt.

> Also consider the other numbers likely used nptl so they shouldn't be
> affected by sched_yield changes.
>
> > Sure there is. We can run a fully-functional POSIX OS without using any
> > block devices at all. We cannot run a fully-functional POSIX OS without a
> > scheduler. Any feature without which the OS cannot execute userspace code
> > is sufficiently primitive that somewhere there is a device on which it will
> > be impossible to debug if that feature fails to initialize. It is quite
> > reasonable to insist on only having one implementation of such features in
> > any given kernel build.
>
> Sounds like a red-herring to me... There aren't just pluggable I/O
> schedulers in the kernel, there are pluggable packet schedulers too
> (see `tc qdisc`). And both are switchable at runtime (not just at boot
> time).
>
> Can you run your fully-functional POSIX OS without a packet scheduler
> and without an I/O scheduler? I wonder where are you going to
> read/write data without HD and network?

If I'm missing both, I'm pretty screwed, but if either one is functional, I can send something out.

> Also those pluggable things don't increase the risk of crash much, if
> compared to the complexity of the schedulers.
>
> > Whether or not these alternatives belong in the source tree as config-time
> > options is a political question, but preserving boot-time debugging
> > capability is a perfectly reasonable technical motivation.
>
> The scheduler is invoked very late in the boot process (printk and
> serial console, kdb are working for ages when scheduler kicks in), so
> it's fully debuggable (no debugger depends on the scheduler, they run
> inside the nmi handler...), I don't really see your point.

I'm more concerned about embedded systems. These are the same people who want userspace character drivers to control their custom hardware. Having the robot point to where it hurts is a lot more convenient than hooking up a JTAG debugger.

> And even if there would be a subtle bug in the scheduler you'll never
> trigger it at boot with so few tasks and so few context switches.

Sure, but it's the non-subtle bugs that worry me. These are usually related to low-level hardware setup, so they could miss the mainst
Re: request for patches: showing mount options
On Fri, 2007-07-27 at 17:40 +0200, Miklos Szeredi wrote: > > all - fs has options, but doesn't define ->show_options() > > some - fs defines ->show_options(), but some options are not shown > > noopt - fs does not have options > > good - fs shows all options > > patch - I have a patch > > [...] > > > > autofs all > > > > I'm not sure I understand this. > > How does autofs show its options without a ->show_options method? > > It doesn't. The "all" means, all of them need to be added to > ->show_options(), not that all are shown. Oh .. sorry, I wasn't paying enough attention. But now might be a good time to propose the removal of autofs and rename autofs4 to autofs. I would need to provide some way to map autofs4 module load requests to autofs for backward compatibility but haven't thought about that yet. > > I can see now that this is slightly confusing, sorry. > > So the ones that need attention are "all" and "some". The others are > fine in theory. Of course I may have missed something. > > > > autofs4 some > > > > OK, uid and gid aren't shown. > > That should be straightforward to fix. > > What's your time frame for this? > > ASAP ;) > > 2.6.24 would be nice, but it won't be easy... The autofs4 (and, if needed, autofs) should be straightforward. I'll do these. Ian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’
On Fri, 27 Jul 2007 23:14:04 -0700 "Miles Lane" <[EMAIL PROTECTED]> wrote:

> On 7/27/07, Miles Lane <[EMAIL PROTECTED]> wrote:
> > Do you need my .config file?
> >
> > CC mm/sparse.o
> > mm/sparse.c: In function 'sparse_init':
> > mm/sparse.c:482: error: implicit declaration of function
> > 'sparse_early_usemap_alloc'
> > mm/sparse.c:482: warning: assignment makes pointer from integer without a
> > cast
> > make[1]: *** [mm/sparse.o] Error 1
> >
>
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.23-rc1-mm1
> # Fri Jul 27 22:54:36 2007

Whatever it was has gone away in the current -mm lineup, so I guess one of the post-2.6.23-rc1-mm1 patches I merged fixed it.
Re: 2.6.23-rc1-mm1 + hotfixes -- Section mismatches
On Fri, Jul 27, 2007 at 10:16:35PM -0700, Miles Lane wrote:
> MODPOST vmlinux.o
> WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to
> .init.text.1:start_kernel (between 'is386' and 'check_x87')

This one is not fixed - yet.
The rest are fixed in latest -linus.

modpost choked over the added number following the section name.
Like in .init.text.4 below.
                  ^^
> WARNING: vmlinux.o(.data+0x53c0): Section mismatch: reference to
> .init.text.4:native_smp_prepare_boot_cpu (between 'smp_ops' and
> 'call_lock')

	Sam
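As an illustration of the numbered-suffix problem described above, here is a minimal sketch of a section-name check that accepts both the plain name and the name followed by a numeric suffix such as ".4" or ".19". This is not the actual modpost code; section_matches() is a hypothetical helper written only to show the idea.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from scripts/mod/modpost.c.
 * Returns 1 if name is exactly base, or base followed by ".<digits>"
 * (for example ".init.text" and ".init.text.4"), and 0 otherwise.
 */
static int section_matches(const char *name, const char *base)
{
	size_t len = strlen(base);

	if (strncmp(name, base, len) != 0)
		return 0;

	name += len;
	if (*name == '\0')
		return 1;	/* exact match, no suffix */

	/* A suffix must be a dot followed by at least one digit. */
	if (*name != '.' || !isdigit((unsigned char)name[1]))
		return 0;

	/* Everything after the dot must be digits. */
	for (name++; *name; name++)
		if (!isdigit((unsigned char)*name))
			return 0;

	return 1;
}
```

A checker that compares section names with a plain strcmp() against ".init.text" would treat ".init.text.4" as an unrelated section, which is one way such a mismatch warning could appear.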
Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’
On 7/27/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:
> On Fri, Jul 27, 2007 at 11:00:54PM -0700, Miles Lane wrote:
> > Do you need my .config file?
>
> Please always send the .config - it makes reproducing an error and
> verifying a fix much easier.
>
> This list has a 400 kB per email limit, and as long as you don't hit
> this limit you have never sent too much information.
>
> > CC mm/sparse.o
> > mm/sparse.c: In function 'sparse_init':
> > mm/sparse.c:482: error: implicit declaration of function
> > 'sparse_early_usemap_alloc'
> > mm/sparse.c:482: warning: assignment makes pointer from integer without a
> > cast
> > make[1]: *** [mm/sparse.o] Error 1
>
> The .config also tells which kernel you are using.
>
> This doesn't seem to be Linus' tree.
> This seems to be 2.6.23-rc1-mm1?

Rats. I almost always remember to specify the kernel version. Sorry.
Yes, it's Andrew's latest tree, plus hotfixes.

Miles
Re: [PATCH] ia64: fix a few section mismatch warnings
On Fri, Jul 27, 2007 at 03:32:13PM -0700, Luck, Tony wrote:
> -	mca_data = alloc_bootmem(sizeof(struct ia64_mca_cpu)
> -				 * NR_CPUS + KERNEL_STACK_SIZE);
> +	mca_data = mca_bootmem(NR_CPUS + KERNEL_STACK_SIZE);
>
> Oops. You moved the multiply by sizeof(struct ia64_mca_cpu) up into
> the mca_bootmem() function to make it very specific to this use. But
> multiply has higher precedence than addition.

Oh crap - good catch.
Shall I resubmit a corrected patch?

	Sam
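To make the precedence point concrete, here is a small standalone sketch. It assumes, as the review comment says, that the helper multiplies its whole argument by the struct size. The struct layout, the constants and the function names are hypothetical stand-ins and not the real ia64 definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real values come from the ia64 headers. */
#define DEMO_NR_CPUS		4
#define DEMO_KERNEL_STACK_SIZE	32768

/* Placeholder for struct ia64_mca_cpu; only its size matters here. */
struct demo_mca_cpu {
	char pad[1024];
};

/*
 * Size requested by the original expression:
 *   sizeof(...) * NR_CPUS + KERNEL_STACK_SIZE
 * The multiply binds tighter, so the stack size is added once.
 */
static size_t original_size(void)
{
	return sizeof(struct demo_mca_cpu) * DEMO_NR_CPUS
		+ DEMO_KERNEL_STACK_SIZE;
}

/*
 * Size requested when the multiply is moved into a helper that is
 * called with (NR_CPUS + KERNEL_STACK_SIZE): the stack size is now
 * multiplied by the struct size as well.
 */
static size_t helper_size(size_t n)
{
	return sizeof(struct demo_mca_cpu) * n;
}
```

With these stand-in numbers, original_size() asks for 1024 * 4 + 32768 bytes, while helper_size(DEMO_NR_CPUS + DEMO_KERNEL_STACK_SIZE) asks for 1024 * (4 + 32768) bytes, so the changed call would over-allocate by a large factor.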
Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’
On Fri, Jul 27, 2007 at 11:00:54PM -0700, Miles Lane wrote:
> Do you need my .config file?

Please always send the .config - it makes reproducing an error and verifying a fix much easier.

This list has a 400 kB per email limit, and as long as you don't hit this limit you have never sent too much information.

> CC mm/sparse.o
> mm/sparse.c: In function 'sparse_init':
> mm/sparse.c:482: error: implicit declaration of function
> 'sparse_early_usemap_alloc'
> mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
> make[1]: *** [mm/sparse.o] Error 1

The .config also tells which kernel you are using.

This doesn't seem to be Linus' tree.
This seems to be 2.6.23-rc1-mm1?

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
[PATCH] flush icache before set_pte() take 5. [2/2] sync icache dcache for ia64
flush icache for ia64 take4.

This patch is against 2.6.23-rc1.

Changes V4 -> V5:
 - removed sync_icache_dcache from do_wp_page() page reuse case.

Changes v3 -> v4:
 - avoid implementing flush_(i)cache_pages().
 - added sync_icache_dcache() call.
 - change Documentation/cachetlb.txt

The current ia64 kernel flushes the icache via lazy_mmu_prot_update() *after* set_pte(). This is wrong. This patch removes lazy_mmu_prot_update() and adds sync_icache_dcache(). sync_icache_dcache() is called before set_pte() if necessary and synchronizes the icache with the dcache (fc.i instruction).

This patch fixes the SIGILL problem on NFS/ia64.

About Icache-Dcache inconsistency on ia64:
 - When a cache line is modified, Icache and Dcache are purged.
 - When the I-cache misses, it accesses just the lower layer cache (memory). Then, if the lower layer cache is not up-to-date, the I-cache will see old information. To avoid this case, Icache-Dcache synchronization (fc.i) is necessary. (Icache-Dcache synchronization means making the Dcache and the lower layer unified cache (memory) consistent.)

Details:
 - In general, cache flushing macros are used for virtually tagged caches. IA64 has physically tagged caches but doesn't guarantee consistency between Icache and Dcache. So a new macro, sync_icache_dcache(), is added. This is a NO-OP on other archs.
 - sync_icache_dcache() only works if the pte is executable.
 - sync_icache_dcache() must be called before set_pte().
 - A page which is consistent is marked as PG_arch_1.

About changes in generic code:
 - do_wp_page() needs to sync the newly copied page. Here, lazy_mmu_prot_update() was done before set_pte(). This was because someone met SIGILL in Java and a small fix was applied.
 - do_anonymous_page(): newly installed anon pages don't contain any instructions when set_pte() is executed, so icache-dcache synchronization is not necessary.
 - __do_fault() needs to sync the newly-installed page.
 - handle_pte_fault() just changes the access bit, so there is no need to sync.
 - remove_migration_pte() needs to sync the newly-installed page.
 - change_pte_range() needs to sync icache-dcache. When a user writes instructions into the page and modifies the protection to be executable, it should be synced.
 - hugetlb_change_protection(): maybe the cache will be expired... but it is safe to sync Icache before set_pte().
 - page_mkclean_one(): no need to sync icache-dcache. There is no page contents modification, and there is no protection change.

Thanks to Zoltan Menyhart for his advice.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
---
 Documentation/cachetlb.txt    |   11 +++
 arch/ia64/mm/init.c           |    6 ++
 include/asm-generic/pgtable.h |    8
 include/asm-ia64/pgtable.h    |   15 ++-
 mm/hugetlb.c                  |    3 +--
 mm/memory.c                   |    7 ++-
 mm/migrate.c                  |    2 +-
 mm/mprotect.c                 |    2 +-
 mm/rmap.c                     |    1 -
 9 files changed, 28 insertions(+), 27 deletions(-)

Index: linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-generic/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-generic/pgtable.h
@@ -124,14 +124,14 @@ static inline void ptep_set_wrprotect(st
 #define pgd_offset_gate(mm, addr)	pgd_offset(mm, addr)
 #endif
 
-#ifndef __HAVE_ARCH_LAZY_MMU_PROT_UPDATE
-#define lazy_mmu_prot_update(pte)	do { } while (0)
-#endif
-
 #ifndef __HAVE_ARCH_MOVE_PTE
 #define move_pte(pte, prot, old_addr, new_addr)	(pte)
 #endif
 
+#ifndef __HAVE_ARCH_SYNC_ICACHE_DCACHE
+#define sync_icache_dcache(pte)		do {} while (0)
+#endif
+
 /*
  * A facility to provide lazy MMU batching.  This allows PTE updates and
  * page invalidations to be delayed until a call to leave lazy MMU mode
Index: linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
===
--- linux-2.6.23-rc1.test.orig/include/asm-ia64/pgtable.h
+++ linux-2.6.23-rc1.test/include/asm-ia64/pgtable.h
@@ -484,11 +484,17 @@ extern struct page *zero_page_memmap_ptr
 #endif
 
 /*
- * IA-64 doesn't have any external MMU info: the page tables contain all the necessary
- * information.  However, we use this routine to take care of any (delayed) i-cache
- * flushing that may be necessary.
+ * IA-64 doesn't guarantee Icache is consistent with Dcache. For ensure
+ * Icache consistency, we have to synchronize them before setting pte
+ * as an executable pte.
  */
-extern void lazy_mmu_prot_update (pte_t pte);
+extern void __sync_icache_dcache(pte_t pte);
+static inline void sync_icache_dcache(pte_t pte) {
+	if
[PATCH] flush icache before set_pte() take 5. [1/2] cache flush in migration
In migration, a new page should be cache flushed before set_pte() on some archs which have virtually-tagged caches.

V4 -> V5:
 * changed flush_icache_page to flush_cache_page.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
---
 mm/migrate.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.23-rc1.test/mm/migrate.c
===
--- linux-2.6.23-rc1.test.orig/mm/migrate.c
+++ linux-2.6.23-rc1.test/mm/migrate.c
@@ -172,6 +172,7 @@ static void remove_migration_pte(struct
 	pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
 	if (is_write_migration_entry(entry))
 		pte = pte_mkwrite(pte);
+	flush_cache_page(vma, addr, pte_pfn(pte));
 	set_pte_at(mm, addr, ptep, pte);
 
 	if (PageAnon(new))
Re: mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’
On Fri, 27 Jul 2007 23:00:54 -0700 Miles Lane wrote:

> Do you need my .config file?

Ideally, yes. Is this for 2.6.23-rc1-mm1?

> CC mm/sparse.o
> mm/sparse.c: In function 'sparse_init':
> mm/sparse.c:482: error: implicit declaration of function
> 'sparse_early_usemap_alloc'
> mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
> make[1]: *** [mm/sparse.o] Error 1

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
[PATCH] flush icache before set_pte() take 5.
Applied comments on take 4.
Patches are against 2.6.23-rc1.

Changes:
 - changed flush_icache_page() to flush_cache_page() in remove_migration_pte().
 - removed sync_icache_dcache() in the page reuse case of do_wp_page().

Considerations:
 - I can add CONFIG_MONTECITO if necessary. But it will be confusing, I think.

Thanks,
-Kame
mm/sparse.c:482: error: implicit declaration of function ‘sparse_early_usemap_alloc’
Do you need my .config file?

CC mm/sparse.o
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:482: error: implicit declaration of function 'sparse_early_usemap_alloc'
mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
make[1]: *** [mm/sparse.o] Error 1
[PATCH]: Fix procfs compat_ioctl regression.
Alexey reviewed the patch and is fine with this fix.

Please apply, thanks!

[PROCFS]: Fix ioctl regression.

It is important to only provide the compat_ioctl method if the downstream de->proc_fops does too, otherwise this utterly confuses the logic in fs/compat_ioctl.c and we end up doing the wrong thing.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Acked-by: Alexey Dobriyan <[EMAIL PROTECTED]>

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 94e2c1a..a5b0dfd 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -386,6 +386,19 @@ static const struct file_operations proc_reg_file_ops = {
 	.release	= proc_reg_release,
 };
 
+#ifdef CONFIG_COMPAT
+static const struct file_operations proc_reg_file_ops_no_compat = {
+	.llseek		= proc_reg_llseek,
+	.read		= proc_reg_read,
+	.write		= proc_reg_write,
+	.poll		= proc_reg_poll,
+	.unlocked_ioctl	= proc_reg_unlocked_ioctl,
+	.mmap		= proc_reg_mmap,
+	.open		= proc_reg_open,
+	.release	= proc_reg_release,
+};
+#endif
+
 struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
 				struct proc_dir_entry *de)
 {
@@ -413,8 +426,15 @@ struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
 	if (de->proc_iops)
 		inode->i_op = de->proc_iops;
 	if (de->proc_fops) {
-		if (S_ISREG(inode->i_mode))
-			inode->i_fop = &proc_reg_file_ops;
+		if (S_ISREG(inode->i_mode)) {
+#ifdef CONFIG_COMPAT
+			if (!de->proc_fops->compat_ioctl)
+				inode->i_fop =
+					&proc_reg_file_ops_no_compat;
+			else
+#endif
+				inode->i_fop = &proc_reg_file_ops;
+		}
 		else
 			inode->i_fop = de->proc_fops;
 	}
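The selection logic in the patch above can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct demo_fops is a hypothetical, reduced stand-in for struct file_operations, and the wrapper functions are dummies. The point it shows is that the wrapper table only advertises compat_ioctl when the downstream table has one.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct file_operations. */
struct demo_fops {
	long (*unlocked_ioctl)(unsigned int cmd, unsigned long arg);
	long (*compat_ioctl)(unsigned int cmd, unsigned long arg);
};

/* Dummy wrapper methods; real ones would forward to the downstream ops. */
static long wrap_unlocked_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return 0;
}

static long wrap_compat_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return 0;
}

/* Wrapper table that exposes both ioctl entry points. */
static const struct demo_fops wrapper_ops = {
	.unlocked_ioctl	= wrap_unlocked_ioctl,
	.compat_ioctl	= wrap_compat_ioctl,
};

/* Same table, but without a compat_ioctl method. */
static const struct demo_fops wrapper_ops_no_compat = {
	.unlocked_ioctl	= wrap_unlocked_ioctl,
	.compat_ioctl	= NULL,
};

/* Pick the wrapper table that matches what the downstream ops provide. */
static const struct demo_fops *pick_wrapper(const struct demo_fops *downstream)
{
	if (!downstream->compat_ioctl)
		return &wrapper_ops_no_compat;
	return &wrapper_ops;
}
```

Keeping two constant tables, rather than patching one table at run time, lets both stay const, which matches the approach the patch takes with proc_reg_file_ops_no_compat.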
Re: [patch] mm: reduce pagetable-freeing latencies
On Sat, 28 Jul 2007, Benjamin Herrenschmidt wrote:
>
> As I'm sweeping through arch code etc... preparing the ground for the
> proper mmu_gather surgery, I've been thinking about the way to deal with
> that per-cpu page list and finally came up with the idea that the best
> we can do is around the lines of trying to allocate the list via gfp,
> and if that fails, fallback to a (smaller than now) per-cpu. I'm
> reworking the interfaces such that the higher level code doesn't have to
> care whether preemption is enabled or disabled at a given point.

That doesn't sound like the best way to me at all. Using two means of buffering, one with preemption enabled and the other not, seems complex and prone to error (perhaps not while you're working on it, but later on). We do already have that problem (the i_mmap_lock case versus the others), but it's not a complication I'd want to extend.

The onstack array seems fine to me, even if you do end up deciding on an array of one. Is there any evidence that it's a problem getting a page for the freeing (other than in circumstances that are already badly slowed down)? It's obvious that we need a fallback route, but optimizing throughput on that route seems premature.

Hugh
Re: [PATCH] flush cache fixes for ia64 [1/2] migration fix
On Sat, 28 Jul 2007 07:06:09 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Fri, 27 Jul 2007 09:39:16 -0700 (PDT)
> Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> > This will have no effect on x86_64, ia64 and i386. Maybe useful for
> > virtually mapped platforms (parisc)?
> >
> yes.
>
Ahh... but I should notify you that I added sync_icache_dcache() (for ia64) here. I'll post take5.

Thanks,
-Kame
Re: request for patches: showing mount options
> >> Some mount options are never passed to the kernel, and thus can't appear
> >> in /proc/mounts. Examples include user, users, and _netdev for NFS.
> >
> > These options control *who* may mount and *when* to mount. They are
> > not a property of the mount itself and are not added to /etc/mtab.
> >
> > There's a "user=ID" option that is added to /etc/mtab in case of user
> > mounts. This identifies the owner of the mount, so that it can be
> > unmounted by that user. There are patches in -mm that enable the
> > kernel to store this info.
> >
> > Do you have other examples in mind?
>
> [no]quota comes to mind;

These are passed to the kernel.

> also auto,

This controls when a filesystem is mounted, same category as '_netdev'

> [no]owner, [no]group,

These control who can mount the filesystem, same category as 'user' and 'users'

> quiet/loud,

I can't find these in the manual as universal options. Quiet is defined for a couple of filesystems but with different meaning for each of them.

> Aside: It's a confusing artifact of the mount CLI that these options
> control who/when but are passed to the mount command in the same way the
> other options are.

Yes, slightly. Actually most of these options are just ignored on the command line. They only have an effect in /etc/fstab.

The right behavior of mount(8) would probably be to error out on these options, since they make no sense on the command line. But this is not a kernel issue.

Miklos
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On 07/27/2007 10:28 PM, Daniel Hazelton wrote:

> Check the attitude at the door then re-read what I actually said:

Attitude? You wanted attitude dear boy?

> Updatedb or another process that uses the FS heavily runs on a user's
> 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory
> pressure that causes other applications to be swapped to disk. In the
> morning the user has to wait for the system to swap those applications
> back in.
>
> I never said that it was the *program* itself - or *any* specific
> program (I used "Updatedb" because it has been the big name in the
> discussion) - doing the filling of memory. I actually said that the
> problem is that the kernel's caches - VFS and others - will grow
> *WITHOUT* *LIMIT*, filling all available memory.

WHICH SWAP-PREFETCH DOES NOT HELP WITH.
WHICH SWAP-PREFETCH DOES NOT HELP WITH.
WHICH SWAP-PREFETCH DOES NOT HELP WITH.

And now finally get that through your thick skull or shut up, right fucking now.

> You want to know what causes the problem? The current design of the
> caches. They will extend without much limit, to the point of actually
> pushing pages to disk so they can grow even more.

Due to being a generally nice guy, I am going to try _once_ more to try and make you understand. Not twice, once. So pay attention. Right now.

Those caches are NOT causing any problem under discussion. If any caches grow to the point of causing swap-out, they have filled memory and swap-prefetch cannot and will not do anything since it needs free (as in not occupied by caches) memory. As such, people maintaining that swap-prefetch helps their situation are not being hit by caches.

The only way swap-prefetch can (and will) do anything is when something that by itself takes up lots of memory runs and exits. So can we now please finally drop the fucking red herring and start talking about swap-prefetch?
If we accept that some of the people maintaining that swap-prefetch helps them are not in fact deluded -- a bit of a stretch seeing as how not a single one of them is substantiating anything -- we have a number of slightly different possibilities for "something" in the above.

-- 1) It could be an inefficient updatedb. Although he isn't experiencing the problem, Bjoern Steinbrink is posting numbers (w!) that show that at least the GNU version spawns a large memory "sort" process, meaning that on a low-memory box updatedb itself can be what causes the observed problem. While in this situation switching to a different updatedb (slocate, mlocate) obviously makes sense, it's the kind of situation where swap-prefetch will help.

-- 2) It could be something else entirely such as a backup run. I suppose people would know if they were running anything of the sort though and wouldn't blame anything on updatedb. Other than that, it's again the situation where swap-prefetch would help.

-- 3) The something else entirely can also run _after_ updatedb, kicking out the VFS caches and leaving free memory upon exit. I still suppose the same thing as under (2) but this is the only way how updatedb / VFS caches can even be part of any problem, if the _combined_ memory pressure is just enough to make the difference.

The direct problem is still just the "something else entirely" and needs someone affected to tell us what it is.

> I already did. You completely ignored it because I happened to use the
> magic words "updatedb" and "swap prefetch".

No I did not. This thread is about swap-prefetch and you used the magic words VFS caches. I don't give a fryin' fuck if their filling is caused by updatedb or the cat sleeping on the "find /" keys on your keyboard, they're still not causing anything swap-prefetch helps with.

This thread has seen input from a selection of knowledgeable people and Morton was even running benchmarks to look at this supposed VFS cache problem and not finding it.
The only further input this thread needs is someone affected by the supposed problem. Which I of course notice in a followup of yours you are not either -- you're just here to blabber, not to solve anything.

Rene.
2.6.23-rc1-mm1 + hotfixes -- Section mismatches
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to .init.text.1:start_kernel (between 'is386' and 'check_x87')
WARNING: vmlinux.o(.data+0x53c0): Section mismatch: reference to .init.text.4:native_smp_prepare_boot_cpu (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x53c4): Section mismatch: reference to .init.text.4:native_smp_prepare_cpus (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x53cc): Section mismatch: reference to .init.text.4:native_smp_cpus_done (between 'smp_ops' and 'call_lock')
WARNING: vmlinux.o(.data+0x6598): Section mismatch: reference to .init.text.6:machine_specific_memory_setup (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a0): Section mismatch: reference to .init.text.4:native_init_IRQ (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a4): Section mismatch: reference to .init.text.4:hpet_time_init (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x65a8): Section mismatch: reference to .init.text.5:native_pagetable_setup_start (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x65ac): Section mismatch: reference to .init.text.5:native_pagetable_setup_done (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x65b0): Section mismatch: reference to .init.text.4:default_banner (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x6674): Section mismatch: reference to .init.text.4:setup_boot_APIC_clock (between 'paravirt_ops' and 'reserve_ioports')
WARNING: vmlinux.o(.data+0x17840): Section mismatch: reference to .init.text.19:vesafb_probe (between 'vesafb_driver' and 'vesafb_ops')
WARNING: vmlinux.o(.data+0x1ef00): Section mismatch: reference to .init.text.19:hvc_console_setup (between 'hvc_con_driver' and 'vtermnos')
WARNING: vmlinux.o(.data+0x20780): Section mismatch: reference to .init.text.19:serial8250_console_setup (between 'serial8250_console' and 'serial8250_reg')
WARNING: vmlinux.o(.data+0x20784): Section mismatch: reference to .init.text.19:serial8250_console_early_setup (between 'serial8250_console' and 'serial8250_reg')
WARNING: vmlinux.o(.data+0x259cc): Section mismatch: reference to .init.text.19:smsc_ircc_pnp_probe (between 'smsc_ircc_pnp_driver' and '__param_str_ircc_transceiver')
WARNING: vmlinux.o(.data+0x2e5f0): Section mismatch: reference to .init.text.19:pci_eisa_init (between 'pci_eisa_driver' and 'pci_eisa_pci_tbl')
Re: pluggable scheduler thread (was Re: Volanomark slows by 80% under CFS)
On Fri, Jul 27, 2007 at 11:43:23PM -0400, Chris Snook wrote: > I'm pretty sure the point of posting a patch that triples CFS performance > on a certain benchmark and arguably improves the semantics of sched_yield > was to improve CFS. You have a point, but it is a point for a different > thread. I have taken the liberty of starting this thread for you. I've no real interest in starting or participating in flamewars (especially the ones not backed by hard numbers). So I adjusted the subject a bit in the hope the discussion will not degenerate as you predicted, hope you don't mind. I'm pretty sure the point of posting that email was to show the remaining performance regression with the sched_yield fix applied too. Given you considered my post both offtopic and inflammatory, I guess you think it's possible and reasonably easy to fix that remaining regression without a pluggable scheduler, right? So please enlighten us on your intend to achieve it. Also consider the other numbers likely used nptl so they shouldn't be affected by sched_yield changes. > Sure there is. We can run a fully-functional POSIX OS without using any > block devices at all. We cannot run a fully-functional POSIX OS without a > scheduler. Any feature without which the OS cannot execute userspace code > is sufficiently primitive that somewhere there is a device on which it will > be impossible to debug if that feature fails to initialize. It is quite > reasonable to insist on only having one implementation of such features in > any given kernel build. Sounds like a red-herring to me... There aren't just pluggable I/O schedulers in the kernel, there are pluggable packet schedulers too (see `tc qdisc`). And both are switchable at runtime (not just at boot time). Can you run your fully-functional POSIX OS without a packet scheduler and without an I/O scheduler? I wonder where are you going to read/write data without HD and network? 
Also those pluggable things don't increase the risk of a crash much, compared to the complexity of the schedulers.

> Whether or not these alternatives belong in the source tree as config-time options is a political question, but preserving boot-time debugging capability is a perfectly reasonable technical motivation.

The scheduler is invoked very late in the boot process (printk, serial console and kdb have been working for ages when the scheduler kicks in), so it's fully debuggable (no debugger depends on the scheduler, they run inside the nmi handler...), I don't really see your point. And even if there were a subtle bug in the scheduler, you would never trigger it at boot with so few tasks and so few context switches.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
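The runtime switching of I/O schedulers mentioned in this message is visible from userspace: each block device lists its available schedulers in /sys/block/<dev>/queue/scheduler, with the active one in square brackets. A minimal sketch of reading that format follows; the helper name `parse_scheduler_line` and the sample string are illustrative only and do not come from the thread.

```python
import re

def parse_scheduler_line(line):
    """Split a /sys/block/<dev>/queue/scheduler line into (available, active).

    The kernel marks the active elevator with square brackets, for example
    "noop anticipatory deadline [cfq]".
    """
    available = []
    active = None
    for name in line.split():
        match = re.fullmatch(r"\[(.+)\]", name)
        if match:
            active = match.group(1)
            available.append(active)
        else:
            available.append(name)
    return available, active

# Sample text is made up for illustration; on a real box, read the sysfs file.
available, active = parse_scheduler_line("noop anticipatory deadline [cfq]\n")
print(available, active)
```

Writing one of the listed names back to the same file switches the scheduler for that device, which is the runtime pluggability the message refers to.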
IRQ Delivery Problem for MCP65
Hello,

I'm having trouble getting Linux to see any hard drives on an ASUS M2N-X motherboard with an MCP65 (nForce 520) chipset. When the kernel probes the AHCI controllers, it hangs for a minute or so on each one and returns the following:

ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Thanks in advance for any help you can lend,
- Craig
[2.6 patch] SOFTWARE_SUSPEND: handle HOTPLUG_CPU automatically
On Fri, Jul 27, 2007 at 03:57:39PM -0700, Linus Torvalds wrote:
>
> On Sat, 28 Jul 2007, Adrian Bunk wrote:
> >
> > The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so
> > what about something like the patch below?
>
> Yeah, this looks reasonable.
>
> May I suggest another level of indirection, though:
>
> > +config SUSPEND_SMP_POSSIBLE
> > +	bool
> > +	depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> > +	depends on SMP
> > +	default y
>
> How about making this a bit more split up, and do it as
>
>	# SMP suspend is possible on ..
>	config SUSPEND_SMP_POSSIBLE
>		bool
>		depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
>		default y
>
>	# UP suspend is possible on ..
>	config SUSPEND_UP_POSSIBLE
>		bool
>		depends on X86 || PPC64_SWSUSP || FRV || PPC32
>		default y

Sounds good.

>	# Can we suspend?
>	config SUSPEND_POSSIBLE
>		bool
>		depends on (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP)
>		default y

IMHO not required:

config SOFTWARE_SUSPEND
	bool "Software Suspend (Hibernation)"
	depends on PM && SWAP
	depends on SUSPEND_UP_POSSIBLE || SUSPEND_SMP_POSSIBLE

> and then we have just a
>
>	config SOFTWARE_SUSPEND
>		bool "Software Suspend (Hibernation)"
>		depends on PM && SWAP
>		depends on SUSPEND_POSSIBLE
>
>	config SUSPEND_SMP
>		bool
>		depends on SOFTWARE_SUSPEND && SMP
>		select HOTPLUG_CPU
>		default y
>
> and now each of the config options looks pretty simple and describe one thing.
>
> [ For extra bonus points: the SUSPEND_POSSIBLE thing is still pretty complicated, and it might actually be a better idea to make it a per-arch config option, and just make the x86/arch say
>
>	config SUSPEND_POSSIBLE
>		bool
>		depends on !(X86_VOYAGER && SMP)
>		default y

This would give you "trying to assign nonexistent symbol SUSPEND_POSSIBLE" kconfig warnings on architectures without SUSPEND_POSSIBLE.

(And you missed the UP case in your example.)
> instead: since SUSPEND_POSSIBLE is always true on x86 regardless of SMP or not, just not on X86_VOYAGER. Then, each architecture can have its own private rules for whether that architecture has SUSPEND_POSSIBLE or not, so on ppc, it might look like
>
>	config SUSPEND_POSSIBLE
>		bool
>		depends on (PPC64 && (PPC_PSERIES || PPC_PMAC)) || PPC_SWSUSP
>		bool y
>
> or something, but the point is, now the complexity is a per-architecture thing, so other architectures simply don't have to care any more! ]
>
> And the user only ever sees one single question: the one for "SOFTWARE_SUSPEND". All the others would directly flow either from the architecture choice, or from that.
>
> Anybody willing to rewrite it that way?

Patch below.

> Linus

cu
Adrian


<-- snip -->


An implementation detail of the suspend code that is not intuitive for the user is the HOTPLUG_CPU dependency of SOFTWARE_SUSPEND if SMP.

This patch handles the dependency of SOFTWARE_SUSPEND on HOTPLUG_CPU automatically without the user requiring to know about it.

Thanks to Stefan Richter and Linus Torvalds for valuable feedback.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 arch/i386/Kconfig    | 16 +++-
 arch/powerpc/Kconfig | 11 +--
 arch/x86_64/Kconfig  | 19 ---
 kernel/power/Kconfig | 23 +--
 4 files changed, 49 insertions(+), 20 deletions(-)

commit bb14e6721dc4e1a97efbfa5398d6021b321af52d
Author: Adrian Bunk <[EMAIL PROTECTED]>
Date:   Sat Jul 28 06:47:03 2007 +0200

    asdf

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index abb582b..eb00a12 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -903,13 +903,19 @@ config PHYSICAL_ALIGN
 	  Don't change this unless you know what you are doing.
 
-config HOTPLUG_CPU
-	bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
+config HOTPLUG_CPU_POSSIBLE
+	bool
 	depends on SMP && HOTPLUG && EXPERIMENTAL && !X86_VOYAGER
+	select SUSPEND_SMP_POSSIBLE
+	default y
+
+config HOTPLUG_CPU
+	bool "Support for hot-pluggable CPUs (EXPERIMENTAL)" if !SUSPEND_SMP
+	depends on HOTPLUG_CPU_POSSIBLE
+	default y if SUSPEND_SMP
 	---help---
-	  Say Y here to experiment with turning CPUs off and on, and to
-	  enable suspend on SMP systems. CPUs can be controlled through
-	  /sys/devices/system/cpu.
+	  Say Y here to experiment with turning CPUs off and on.
+	  CPUs can be controlled through /sys/devices/system/cpu.
 
 config COMPAT
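Whether the extra SUSPEND_POSSIBLE symbol is needed comes down to where the SMP test lives. The following is a small truth-table sketch of the two `depends on` expressions discussed in this thread. It assumes, as in the first quoted snippet, that SUSPEND_SMP_POSSIBLE itself carries `depends on SMP`, and it further assumes (this is not stated in the mail) that SUSPEND_UP_POSSIBLE carries a matching `depends on !SMP`. The Python names are illustrative only, and this is not how kconfig evaluates symbols internally.

```python
from itertools import product

def linus_suspend_possible(smp, arch_smp_ok, arch_up_ok):
    # depends on (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP)
    return (smp and arch_smp_ok) or (arch_up_ok and not smp)

def adrian_gated(smp, arch_smp_ok, arch_up_ok):
    # Helper symbols carry the SMP test themselves (the !SMP part is assumed).
    suspend_smp_possible = arch_smp_ok and smp
    suspend_up_possible = arch_up_ok and not smp
    # depends on SUSPEND_UP_POSSIBLE || SUSPEND_SMP_POSSIBLE
    return suspend_up_possible or suspend_smp_possible

def adrian_ungated(smp, arch_smp_ok, arch_up_ok):
    # Same final expression, but with helpers that ignore SMP entirely.
    return arch_up_ok or arch_smp_ok

# Each case is (smp, arch_smp_ok, arch_up_ok).
cases = list(product([False, True], repeat=3))
same_when_gated = all(
    linus_suspend_possible(*c) == adrian_gated(*c) for c in cases
)
differing_ungated = [
    c for c in cases if linus_suspend_possible(*c) != adrian_ungated(*c)
]
print(same_when_gated)
print(differing_ungated)
```

Under these assumptions the two forms agree in all eight cases. If the helper symbols did not test SMP themselves, the shorter form would also allow suspend on an SMP build of an architecture that only supports UP suspend, and the reverse.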
Re: serial flow control appears broken
Paul Fulghum wrote:
> So this seems to be a latency issue reading the receive FIFO in the ISR. The current rx FIFO trigger level should be 8 bytes (UART_FCR_R_TRIG_10) which gives the ISR 694usec to get the data at 115200bps. IIRC, in 2.2.X kernels this defaulted to 4 bytes (TRIG_01) which gave a little more time to service the interrupt.
>
> How does the data rate affect the frequency of the overrun errors? Does 57600bps make them go away?

The overrun error message does not occur on every instance of data corruption. (I just became aware of this as I've not been paying so much attention to the error messages as I have been to the corrupt data.) The data gets far more corrupted than the error messages would lead me to believe.

Since the data being sent from the fax modem to the host is identical (same image data) every time, it's easier for me to measure the effect of one bitrate over another by examining the number of missing bytes from the data.

The image has a total of 140465 bytes. Just now I sent it 5 times each at 115200, 57600, 38400, and 19200 bps.

At 115200 bps the number of bytes skipped were: 63, 5, 44, 48, and 2.
At 57600 bps the number of bytes skipped were: 0, 1, 13, 9, and 12.
At 38400 bps the number of bytes skipped were: 858, 0, 0, 0, and 8.
At 19200 bps the number of bytes skipped were: 0, 0, 0, 0, and 0.

Curiously, the session at 38400 bps that skipped 858 bytes coincided, not just in sequence but also in precise timing within the session, with a small but noticeable disk load that I caused by grepping through a hundred session logs. (I can't reproduce it easily, though, because of disk caching.)

And, perhaps this is relevant: the way that I have the fax modem sending the data to the host is by receiving it from another fax modem which is sending it. Thus, the modem on ttyS0 is sending a fax to the modem on ttyS1. Due to the error correction protocol that is performed between the two fax endpoints I can guarantee that the data is correct as it leaves the DCE. I mention this in case there is any limitation to how the 8250 driver performs when two modems are being run simultaneously.

Thanks,

Lee.
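The 694usec figure quoted in this message can be reproduced from the line rate. This is a short sketch, assuming a 16-byte receive FIFO (16550A-style) and 10 bits on the wire per byte (8N1 framing: start bit, 8 data bits, stop bit). Neither assumption is spelled out in the mail, and the function name is illustrative.

```python
def fifo_headroom_usec(baud, trigger_bytes, fifo_bytes=16, bits_per_byte=10):
    """Time from the rx trigger interrupt until the FIFO would overflow."""
    free_slots = fifo_bytes - trigger_bytes
    return free_slots * bits_per_byte * 1_000_000 / baud

for baud in (115200, 57600, 38400, 19200):
    for trigger in (8, 4):
        headroom = fifo_headroom_usec(baud, trigger)
        print(f"{baud:>6} bps, trigger {trigger}: {headroom:7.0f} usec")
```

With these assumptions the 4-byte trigger leaves about 1042 usec at 115200 bps, and halving the bit rate doubles every figure.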
Re: [RFC/RFT 0/5] Input locking patches
Indan Zupancic wrote:
> On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> > Hi everyone,
> >
> > I finally managed to put together some patches implementing locking in input core and main input handles. Please look over them and give them a spin.
>
> Since kernel 2.6.21 or so I was annoyed by a warping mouse, and one kernel version later also by "stuck" keys, causing repeated input at the most inconvenient moments (e.g. when opening a program by pressing F1).
>
> As it happened irregularly and unpredictable it was hard to debug, and I suspected faulty hardware. My cpu was quite hot, but after removing all the dust it seems all right again. Unfortunately that was about the same time I upgraded to 2.6.23-rc1, so all I can say is that the stuck key problem seems to be gone, though not sure thanks to what, but that neither the cleaning nor the upgrade fixed the warping mouse problem.
>
> I'm running with these locking patches for two days now and the mouse doesn't warp any more (it can also have fixed the stuck key problem, not sure). Normally it would warp several times a day, and that didn't happen yet, so I'm tempted to praise your patches.
>
> Sorry for the babbling, just wanted to say that I've tested these patches and that they seem to fix real problems.

Thanks for babbling!

I'm having these same intermittent problems starting around 2.6.21, and wasn't really sure if it was hardware or not, so didn't bother reporting them.

This is what I see sometimes in the logs:
=
PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: PC Speaker as /class/input/input3
input: AT Translated Set 2 keyboard as /class/input/input4
input: ImPS/2 Generic Wheel Mouse as /class/input/input5
psmouse.c: bad data from KBC - bad parity
psmouse.c: Wheel Mouse at isa0060/serio1/input0 lost synchronization, throwing 2 bytes away.
=


Thanks!

--
Al
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
Chris Snook wrote:
> Al Boldi wrote:
> > IMHO, what everybody agrees on, is that swap-prefetch has a positive effect in some cases, and nobody can prove an adverse effect (excluding power consumption). The reason for this positive effect is also crystal clear: It prefetches from swap on idle into free memory, ie: it doesn't force anybody out, and they are the first to be dropped without further swap-out, which sounds really smart.
> >
> > Conclusion: Either prove swap-prefetch is broken, or get this merged quick.
>
> If you can't prove why it helps and doesn't hurt, then it's a hack, by definition.

Ok, slow down: swap-prefetch isn't a hack. It's a kernel-thread that adds swap-prefetch functionality to the kernel.

> With swap prefetch, we're only optimizing the case when the box isn't loaded and there's RAM free, but we're not optimizing the case when the box is heavily loaded and we need for RAM to be free.

Exactly, swap-prefetch is very specific, and that's why it's so successful: It does one thing, and it does that very well.

> I'm inclined to view swap prefetch as a successful scientific experiment, and use that data to inform a more reasoned engineering effort. If we can design something intelligent which happens to behave more or less like swap prefetch does under the circumstances where swap prefetch helps, and does something else smart under the circumstances where swap prefetch makes no discernable difference, it'll be a much bigger improvement.

Well, a swapless OS would really be the ultimate, but that's another thread entirely (see thread: '[RFC] VM: I have a dream...')

Don't mistake swap-prefetch as trying to additionally fix swap-in slowdown, and if it did, then that would be a hack, but it doesn't. Instead, understand that swap-prefetch is viable even if all swapper issues have been solved, because swapping implies pages being swapped in when needed, and swap-prefetch smartly uses idle time to do so.

> Because we cannot prove why the existing patch helps, we cannot say what impact it will have when things like virtualization and solid state drives radically change the coefficients of the equation we have not solved. Providing a sysctl to turn off a misbehaving feature is a poor substitute for doing it right the first time, and leaving it off by default will ensure that it only gets used by the handful of people who know enough to rebuild with the patch anyway.

But we do know why it helps: a proc eats memory, then page-cache, then swaps others out, and then dies to free its memory, and now swap-prefetch comes in if the system is idle. Sounds really smart.

While many people may definitely benefit, others may just want to turn it off. No harm done.


Thanks!

--
Al
How can we make page replacement smarter (was: swap-prefetch)
Chris Snook wrote:
> Resource size has been outpacing processing latency since the dawn of time. Disks get bigger much faster than seek times shrink. Main memory and cache keep growing, while single-threaded processing speed has nearly ground to a halt.
>
> In the old days, it made lots of sense to manage resource allocation in pages and blocks. In the past few years, we started reserving blocks in ext3 automatically because it saves more in seek time than it costs in disk space. Now we're taking preallocation and antifragmentation to the next level with extent-based allocation in ext4.
>
> Well, we're still using bitmap-style allocation for pages, and the prefetch-less swap mechanism adheres to this design as well. Maybe it's time to start thinking about memory in a somewhat more extent-like fashion.
>
> With swap prefetch, we're only optimizing the case when the box isn't loaded and there's RAM free, but we're not optimizing the case when the box is heavily loaded and we need for RAM to be free. This is a complete reversal of sane development priorities. If swap batching is an optimization at all (and we have empirical evidence that it is) then it should also be an optimization to swap out chunks of pages when we need to free memory.
>
> So, how do we go about this grouping? I suggest that if we keep per-VMA reference/fault/dirty statistics, we can tell which logically distinct chunks of memory are being regularly used. This would also allow us to apply different page replacement policies to chunks of memory that are being used in different fashions.
>
> With such statistics, we could then page out VMAs in 2MB chunks when we're under memory pressure, also giving us the option of transparently paging them back in to hugepages when we have the memory free, once anonymous hugepage support is in place.
>
> I'm inclined to view swap prefetch as a successful scientific experiment, and use that data to inform a more reasoned engineering effort. If we can design something intelligent which happens to behave more or less like swap prefetch does under the circumstances where swap prefetch helps, and does something else smart under the circumstances where swap prefetch makes no discernable difference, it'll be a much bigger improvement.
>
> Because we cannot prove why the existing patch helps, we cannot say what impact it will have when things like virtualization and solid state drives radically change the coefficients of the equation we have not solved. Providing a sysctl to turn off a misbehaving feature is a poor substitute for doing it right the first time, and leaving it off by default will ensure that it only gets used by the handful of people who know enough to rebuild with the patch anyway.
>
> Let's talk about how we can make page replacement smarter, so it naturally accomplishes what swap prefetch accomplishes, as part of a design we can reason about.
>
> CC-ing linux-mm, since that's where I think we should take this next.

Good idea, but unless we understand the problems involved, we are bound to repeat them. So my first question would be: Why is swap-in so slow?

As I have posted in other threads, swap-in of consecutive pages suffers a 2x slowdown wrt swap-out, whereas swap-in of random pages suffers over 6x slowdown.

Because it is hard to quantify the expected swap-in speed for random pages, let's first tackle the swap-in of consecutive pages, which should be at least as fast as swap-out. So again, why is swap-in so slow?

Once we understand this problem, we may be able to suggest a smart improvement.


Thanks!

--
Al
Re: [PATCH] add check do_direct_IO() return val
> I tested Andrew's patch and panic was gone but got few ENOTBLK.
> So I tried with Joe's patch , both panic and ENOTBLK are gone now.
> But in Joe's patch if (ret == -ENOTBLK && (rw & WRITE)), dio_cleanup(dio)
> was not getting called because of break. So I moved dio_cleanup just
> after if (ret).

Guru, actually, breaking out of the loop with ENOTBLK will still call dio_cleanup later. If you call it too early, that means put_page() happens too early, which may cause another panic.

Thanks,
Joe
pluggable scheduler flamewar thread (was Re: Volanomark slows by 80% under CFS)
Andrea Arcangeli wrote:
> On Fri, Jul 27, 2007 at 08:31:19PM -0400, Chris Snook wrote:
> > I think Volanomark is being pretty stupid, and deserves to run slowly, but
>
> Indeed, any app doing what volanomark does is pretty inefficient. But this is not the point. I/O schedulers are pluggable to help for inefficient apps too. If apps would be extremely smart they would all use async-io for their reads, and there wouldn't be the need of anticipatory scheduler just for an example.

I'm pretty sure the point of posting a patch that triples CFS performance on a certain benchmark and arguably improves the semantics of sched_yield was to improve CFS. You have a point, but it is a point for a different thread. I have taken the liberty of starting this thread for you.

> The fact is there's no technical explanation for which we're forbidden to be able to choose between CFS and O(1) at least at boot time.

Sure there is. We can run a fully-functional POSIX OS without using any block devices at all. We cannot run a fully-functional POSIX OS without a scheduler. Any feature without which the OS cannot execute userspace code is sufficiently primitive that somewhere there is a device on which it will be impossible to debug if that feature fails to initialize. It is quite reasonable to insist on only having one implementation of such features in any given kernel build.

Whether or not these alternatives belong in the source tree as config-time options is a political question, but preserving boot-time debugging capability is a perfectly reasonable technical motivation.

	-- Chris
Re: -mm merge plans for 2.6.23
Andrew Morton wrote:
[...]
> And userspace can do a much better implementation of this how-to-handle-large-load-shifts problem, because it is really quite complex. The system needs to be monitored to determine what is the "usual"
[...]
> All this would end up needing runtime configurability and tweakability and customisability. All standard fare for userspace stuff - much easier than patching the kernel.

But a patch already exists. Which is easier: (1) apply the patch; or (2) write a new patch?

> So. We can
> a) provide a way for userspace to reload pagecache and
> b) merge maps2 (once it's finished) (pokes mpm)
> and we're done?

Might be, but merging maps2 carries a higher risk, so it should be done in a development branch (er... 2.7, but we don't have it now).
Re: [RFC][Doc] memory hotplug documentaion take 2.
Thanks for your comments. The fixed patch is attached at the end of this mail.

> > +
> > +Note(1): x86_64's has special implementation for memory hotplug.
> > +         This test does not describe it.
>
> text (?)

Oops. Yes.

> > +1.2. Phases of memory hotplug
> > +---
> > +There are 2 phases in Memory Hotplug.
> > +  1) Physical Memory Hotplug phase
> > +  2) Logical Memory Hotplug phase.
> > +
> > +The First phase is to communicate hardware/firmware and make/erase
> > +environment for hotplugged memory. Basically, this phase is necessary
> > +for the purpose (B), but this is good phase for communication between
> > +highly virtulaized environments too.
>
> virtualized

Yes. Fixed.

> > +
> > +When memory is hotplugged, the kernel recognizes new memory, makes new memory
> > +management tables, and makes sysfs files for new memory's operation.
> > +
> > +If firmware supports notification of connection of new memory to OS,
> > +this phase is triggered automatically. ACPI can notify this event. If not,
> > +"probe" operation by system administration works instead of it.
>
> is used instead.

Ah, ok.

> > +(see Section 4.).
> > +
> > +Logical Memory Hotplug phase is to change memory state into
> > +avaiable/unavailable for users. Amount of memory from user's view is
> > +changed by this phase. The kernel makes all memory in it as free pages
> > +when a memory range is into available.
>
> ?? drop "into" ?
> or is a memory range always available? Confusing.

Ok. I didn't know it was confusing. Thanks. I dropped it.

> > +In this document, this phase is described online/offline.
>
>    described as online/offline.

OK.

> > +
> > +Logical Memory Hotplug phase is trigged by write of sysfs file by system
>
>    triggered

Oops. Yes.

> > +administrator. When hot-add case, it must be executed after Physical Hotplug
>
> For the hot-add case,

OK.

> > +phase by hand.
> > +(However, if you writes udev's hotplug scripts for memory hotplug, these
> > + phases can be execute in seamless way.)
> > +
> > +
> > +1.3. Unit of Memory online/offline operation
> > +
> > +Memory hotplug uses SPARSEMEM memory model. SPARSEMEM divides the whole memory
> > +into chunks of the same size. The chunk is called a "section". The size of
> > +a section is architecture dependent. For example, power uses 16MiB, ia64 uses
> > +1GiB. The unit of online/offline operation is "one section". (see Section 3.)
> > +
> > +To know the size of sections, please read this file:
>
>    To determine the size ...

I didn't know "determine" could be used in this sentence. I had remembered it as meaning just "decide", due to my limited English vocabulary. Thanks. I changed it. :-)

> > +- For using remove memory, followings are necessary too
>
> To enable memory removal, the following are also necessary

Ok.

> > +Allow for memory hot remove(CONFIG_MEMORY_HOTREMOVE)
> > +Page Migration (CONFIG_MIGRATION)
> > +
> > +- For ACPI memory hotplug, followings are necessary too
>
> the following are also necessary

Ok.

> > +Now, XXX is defined as start_address_of_section / secion_size.
>
> section_size.

Yes. Thanks.

> > +
> > +For example, assume 1GiB section size. A device for a memory starts from address
>
>    for memory starting at

Ok.

> > +
> > +In general, the firmware (ACPI) which supports memory hotplug defines
> > +memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80,
> > +Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev
> > +script. This will be done in automatically.
>
> drop "in"

Ok.

> > +If firmware supports NUMA-node hotplug, and define object of _HID "ACPI0004",
>
>    defines an object

Ok.

> > +"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI hander
>
> handler

Ah, yes.

Thanks again!

---

This adds a document for memory hotplug to describe "How to use" and "Current status".
---

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>

 Documentation/memory-hotplug.txt | 322 +++
 1 files changed, 322 insertions(+)

Index: makedocument/Documentation/memory-hotplug.txt
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ makedocument/Documentation/memory-hotplug.txt	2007-07-28 11:47:52.0 +0900
@@ -0,0 +1,322 @@
+===
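The rule quoted in the review above, XXX = start_address_of_section / section_size, is plain integer division. This is a small sketch, assuming the 1GiB section size from the quoted example. The start address 0x100000000 and the function name are chosen here for illustration and are not taken from the mail.

```python
GIB = 1 << 30
MIB = 1 << 20

def section_index(start_address, section_size):
    """Return XXX, the index of the section that begins at start_address."""
    if start_address % section_size:
        raise ValueError("start address is not section-aligned")
    return start_address // section_size

# Hypothetical example: a section starting at 4 GiB with 1 GiB sections.
print(section_index(0x100000000, GIB))
# The same address with the 16 MiB section size mentioned for power.
print(section_index(0x100000000, 16 * MIB))
```

The same physical range therefore maps to very different section numbers depending on the architecture's section size, which is why the document tells the reader to check the size first.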
Re: [RFC] scheduler: improve SMP fairness in CFS
Bill Huey (hui) wrote:
> On Fri, Jul 27, 2007 at 07:36:17PM -0400, Chris Snook wrote:
> > I don't think that achieving a constant error bound is always a good thing. We all know that fairness has overhead. If I have 3 threads and 2 processors, and I have a choice between fairly giving each thread 1.0 billion cycles during the next second, or unfairly giving two of them 1.1 billion cycles and giving the other 0.9 billion cycles, then we can have a useful discussion about where we want to draw the line on the fairness/performance tradeoff. On the other hand, if we can give two of them 1.1 billion cycles and still give the other one 1.0 billion cycles, it's madness to waste those 0.2 billion cycles just to avoid user jealousy. The more complex the memory topology of a system, the more "free" cycles you'll get by tolerating short-term unfairness. As a crude heuristic, scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but eventually we should take the boot-time computed migration costs into consideration.
>
> You have to consider the target for this kind of code. There are applications where you need something that falls within a constant error bound. According to the numbers, the current CFS rebalancing logic doesn't achieve that to any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more strict than that.

I've said from the beginning that I think that anyone who desperately needs perfect fairness should be explicitly enforcing it with the aid of realtime priorities. The problem is that configuring and tuning a realtime application is a pain, and people want to be able to approximate this behavior without doing a whole lot of dirty work themselves. I believe that CFS can and should be enhanced to ensure SMP-fairness over potentially short, user-configurable intervals, even for SCHED_OTHER. I do not, however, believe that we should take it to the extreme of wasting CPU cycles on migrations that will not improve performance for *any* task, just to avoid letting some tasks get ahead of others. We should be as fair as possible but no fairer. If we've already made it as fair as possible, we should account for the margin of error and correct for it the next time we rebalance. We should not burn the surplus just to get rid of it.

On a non-NUMA box with single-socket, non-SMT processors, a constant error bound is fine. Once we add SMT, go multi-core, go NUMA, and add inter-chassis interconnects on top of that, we need to multiply this error bound at each stage in the hierarchy, or else we'll end up wasting CPU cycles on migrations that actually hurt the processes they're supposed to be helping, and hurt everyone else even more. I believe we should enforce an error bound that is proportional to migration cost.

> Even the rt overload code (from my memory) is subject to these limitations as well until it's moved to use a single global queue while using CPU binding to turn off that logic. It's the price you pay for accuracy.
>
> > If we allow a little short-term fairness (and I think we should) we can still account for this unfairness and compensate for it (again, with the same tolerance) at the next rebalancing.
>
> Again, it's a function of *when* and depends on that application.
>
> > Adding system calls, while great for research, is not something which is done lightly in the published kernel. If we're going to implement a user interface beyond simply interpreting existing priorities more precisely, it would be nice if this was part of a framework with a broader vision, such as a scheduler economy.
>
> I'm not sure what you mean by scheduler economy, but CFS can and should be extended to handle proportional scheduling which is outside of the traditional Unix priority semantics. Having a new API to get at this is unavoidable if you want it to eventually support -rt oriented applications that have bandwidth semantics.

A scheduler economy is basically a credit scheduler, augmented to allow processes to exchange credits with each other. If you want to get more sophisticated with fairness, you could price CPU time proportional to load on that CPU.

I've been house-hunting lately, so I like to think of it in real estate terms. If you're comfortable with your standard of living and you have enough money, you can rent the apartment in the chic part of town, right next to the subway station. If you want to be more frugal because you're saving for retirement, you can get a place out in the suburbs, but the commute will be more of a pain. If you can't make up your mind and keep moving back and forth, you spend a lot on moving and all your stuff gets dented and scratched.

> All deadline based schedulers have API mechanisms like this to support extended semantics. This is no different.

I had a feeling this patch was originally designed for the O(1) scheduler, and this is why. The old scheduler had expired arrays, so adding a round-expired a
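The log2(NCPUS) heuristic suggested earlier in this message can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the base tolerance value, the function name, and the choice to leave one-CPU and two-CPU systems at the base tolerance. The mail only says that some fairly low tolerance should scale with log2(NCPUS).

```python
import math

def fairness_tolerance(ncpus, base_tolerance_ms=1.0):
    """Allowed short-term unfairness, scaled by log2 of the CPU count."""
    if ncpus < 1:
        raise ValueError("need at least one CPU")
    # max(1, ...) keeps 1-CPU and 2-CPU systems at the base tolerance.
    return base_tolerance_ms * max(1.0, math.log2(ncpus))

for ncpus in (1, 2, 8, 64, 1024):
    print(ncpus, fairness_tolerance(ncpus))
```

The point of the logarithm is that the tolerance grows slowly: going from 8 to 1024 CPUs roughly triples the allowed imbalance rather than multiplying it by 128. A bound proportional to measured migration cost, as proposed in the message, would replace the fixed base value.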
Re: Volanomark slows by 80% under CFS
Tim Chen wrote:
> Ingo,
>
> Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1. Benchmark was run on a 2 socket Core2 machine.
>
> The change in scheduler treatment of sched_yield could play a part in changing Volanomark behavior. In CFS, sched_yield is implemented by dequeueing and requeueing a process. The time a process has spent running probably reduced the cpu time due it by only a bit. The process could get re-queued pretty close to head of the queue, and may get scheduled again pretty quickly if there is still a lot of cpu time due.

I wonder if this explains the 30% drop in top performance seen with the MySQL sysbench benchmark when the scheduler changed to CFS...

See http://people.freebsd.org/~jeff/sysbench.png

--
Politics is the struggle between those who want to make their country the best in the world, and those who believe it already is. Each group calls the other unpatriotic.
Re: Source organization for two drivers sharing coomon code
Thanks for all the answers. The common code is mostly handling the message passing for hardware initialization, rings creation and some ioctls. drivers/message looks like a good place for this code to live. Subbu From: Jan Engelhardt [mailto:[EMAIL PROTECTED] To: Chris Friesen [mailto:[EMAIL PROTECTED] Cc: Subbu Seetharaman [mailto:[EMAIL PROTECTED], linux-kernel@vger.kernel.org Sent: Fri, 27 Jul 2007 12:34:16 -0700 Subject: Re: Source organization for two drivers sharing coomon code On Jul 27 2007 13:12, Chris Friesen wrote: > Jan Engelhardt wrote: >> On Jul 27 2007 10:17, Subbu Seetharaman wrote: >> >> >What is the recommended way for two drivers to share common code ? >> >...The source code for these dirvers will fit under drivers/net and >> >drivers/scsi. But both drivers share some common code. > >> You could create (in total) three modules, e.g. my-common.ko, >> my-net.ko and my-scsi.ko, of which the latter two use functions from the >> first. > > Where would the common code live, in such a case? Would you just pick one of > the two locations at random, or put it in drivers/misc or maybe lib? Perhaps drivers/message - well I can't answer that exactly. As far as the output object files are concerned, it is not relevant, since they will be autoloaded anyway :) Jan -- ___ This message, together with any attachment(s), contains confidential and proprietary information of ServerEngines LLC and is intended only for the designated recipient(s) named above. Any unauthorized review, printing, retention, copying, disclosure or distribution is strictly prohibited. If you are not the intended recipient of this message, please immediately advise the sender by reply email message and delete all copies of this message and any attachment(s). Thank you. 
Re: [PATCH] Documentation: document HFSPlus
On Fri, 27 Jul 2007 21:25:47 -0400 Wyatt Banks wrote:

> From: Wyatt Banks <[EMAIL PROTECTED]>
>
> Documentation: document HFSPlus filesystem and its mount options.
>
> Signed-off-by: Wyatt Banks <[EMAIL PROTECTED]>

Thanks.

> ---
>
> Patched against 2.6.22.1

FYI: Patches should be against the latest -rc or -git (when available),
but it probably doesn't matter in this case.

> diff -uprN linux-2.6.22.1/Documentation/filesystems/hfsplus.txt
> linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
> --- linux-2.6.22.1/Documentation/filesystems/hfsplus.txt 1969-12-31
> 19:00:00.0 -0500
> +++ linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
> 2007-07-27 21:11:10.0 -0400
> @@ -0,0 +1,59 @@
> +
> +Macintosh HFSPlus Filesystem for Linux
> +==
> +
> +HFSPlus is a filesystem first introduced in MacOS 8.1.
> +HFSPlus has several extensions to HFS, including 32 bit allocation

32-bit

> +blocks, 255 character unicode filenames, and file sizes of 2^63 bytes.

255-character

> +
> +
> +Mount options
> +=
> +
> +When mounting an HFSPlus filesystem, the following options are accepted:
> +
> +  creator=, type=
> +        Specifies the creator/type values as shown by the MacOS finder
> +        used for creating new files. Default values: ''.
> +
> +  uid=n, gid=n
> +        Specifies the user/group that owns all files on the filesystem
> +        that have uninitialized permissions structures.
> +        Default: user/group id of the mounting process.
> +
> +  umask=n
> +        Specifies the umask used for files and directories that have
> +        uninitialized permissions structures.
> +        Default: umask of the mounting process.

in octal

> +  session=n
> +        Select the CDROM session to mount as HFSPlus filesystem. Defaults to
> +        leaving that decision to the CDROM driver. This option will fail
> +        with anything but a CDROM as underlying devices.
> +
> +  part=n
> +        Select partition number n from the devices. Does only makes
> +        sense for CDROMS because they can't be partitioned under Linux.

CDROMs or CD-ROMs
and this sentence is confusing to me. Please check it.

> +        For disk devices the generic partition parsing code does this
> +        for us. Defaults to not parsing the partition table at all.
> +
> +  decompose
> +        Decompose file name characters.
> +
> +  nodecompose
> +        Do not decompose file name characters.
> +
> +  force
> +        Used to force write access to volumes that are marked as journalled
> +        or locked. Use at your own risk.
> +
> +  nls=
> +        Encoding to use when presenting file names.
> +
> +
> +References
> +==
> +
> +kernel source:
> +
> +Apple Technote 1150 http://developer.apple.com/technotes/tn/tn1150.html

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [PATCH 12/68] 0 -> NULL, for arch/powerpc
Yoann Padioleau writes:

> When comparing a pointer, it's clearer to compare it to NULL than to 0.

As other people have said, if you're going to spend time on this, testing (!buf) is more idiomatic in the kernel than (buf == NULL).

Paul.
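For anyone following along, here is a minimal userspace sketch of the two spellings being compared. The helper names (buf_len_idiomatic, buf_len_explicit) are made up for illustration and do not come from any patch in this series; both functions behave identically, the only difference is the style of the pointer test.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): length of buf, or 0 if buf is
 * a null pointer. Uses the (!buf) form described above as idiomatic. */
size_t buf_len_idiomatic(const char *buf)
{
	if (!buf)
		return 0;
	return strlen(buf);
}

/* Same logic with the explicit comparison; semantically equivalent. */
size_t buf_len_explicit(const char *buf)
{
	if (buf == NULL)
		return 0;
	return strlen(buf);
}
```

Either form is preferable to comparing a pointer against a bare 0, which is what the patch series set out to remove.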
Re: Linus 2.6.23-rc1
On Sat, 28 Jul 2007, Kasper Sandberg wrote:
>
> Im still not so keen about this, Ingo never did get CFS to match SD in
> smoothness for 3d applications, where my test subjects are quake(s),
> world of warcraft via wine, unreal tournament 2004. And this is despite
> many patches he sent me to try and tweak it.

You realize that different people get different behaviour, don't you? Maybe not.

People who think SD was "perfect" were simply ignoring reality. Sadly, that seemed to include Con too, which was one of the main reasons that I never ended up entertaining the notion of merging SD for very long at all: Con ended up arguing against people who reported problems, rather than trying to work with them.

Andrew also reported an oops in the scheduler when SD was merged into -mm, so there were other issues.

> As far as im concerned, i may be forced to unofficially maintain SD for
> my own systems(allthough lots in the gaming community is bound to be
> interrested, as it does make games lots better)

You know what? You can do whatever you want to. That's kind of the point of open source. Keep people honest by having alternatives.

But the thing is, if you want to do a good job of doing that, here's a big hint: instead of keeping to your isolated world, instead of just talking about your own machine and ignoring other people's machines and issues and instead of just denying that problems may exist, and instead of attacking people who report problems, how about working with them?

That was where the SD patches fell down. They didn't have a maintainer that I could trust to actually care about any other issues than his own.

So here's a hint: if you think that your particular graphics card setup is the only one that matters, it's not going to be very interesting for anybody else.
[ I realize that this comes as a shock to some of the SD people, but I'm told that there was a university group that did some double-blind testing of the different schedulers - old, SD and CFS - and that everybody agreed that both SD and CFS were better than the old, but that there was no significant difference between SD and CFS. You can try asking Thomas Gleixner for more details. ] I'm happy that SD was perfect for you. It wasn't for others, and it had nobody who was even interested in trying to solve those issues. As a long-term maintainer, trust me, I know what matters. And a person who can actually be bothered to follow up on problem reports is a *hell* of a lot more important than one who just argues with reporters. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Problems with framebuffer in 2.6.22-git17
On Sat, 2007-07-28 at 10:14 +0800, Antonino A. Daplas wrote:
> On Sat, 2007-07-28 at 02:06 +0100, Adrian McMenamin wrote:
> > On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > >
> >
> tmp = transp << var->transp.offset | red << var->red.offset |
> green << var->green.offset | blue << var->blue.offset;
>

The above should be:

tmp = regno << var->transp.offset | regno << var->red.offset |
      regno << var->green.offset | regno << var->blue.offset;

Tony
Re: Problems with framebuffer in 2.6.22-git17
On Sat, 2007-07-28 at 02:06 +0100, Adrian McMenamin wrote:
> On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> >
>
> But certainly better at 16bpp
>
> Can mess about with it later to see if I can get the colours right I suppose.
>

You can start with pvr2fb_setcolreg() and pvr2fb_set_pal_entry(). A few things I've noticed:

1. In pvr2fb_setcolreg(), pvr2fb_set_pal_entry() is called for bpp 16 and 32. This means that the palette is modifiable, so FB_VISUAL_TRUECOLOR is probably not the correct visual for this driver; FB_VISUAL_DIRECTCOLOR is more appropriate. So, you either remove the call to set_pal_entry() in setcolreg() or change the visual to FB_VISUAL_DIRECTCOLOR. Of course, with directcolor, the pseudo_palette is now written with tmp as:

tmp = transp << var->transp.offset | red << var->red.offset |
      green << var->green.offset | blue << var->blue.offset;

2. Perhaps the 3rd parameter passed to set_pal_entry() is not correct? Maybe you can try doing it like this for all bpp's, assuming ARGB?

pvr2fb_set_pal_entry(par, regno, transp << 24 | red << 16 | green << 8 | blue);

And if you want to maintain FB_VISUAL_TRUECOLOR format, initialize the palette once on init:

for (i = 0; i < 256; i++)
        pvr2fb_set_pal_entry(par, i, i << 24 | i << 16 | i << 8 | i);

to create a linear color map consistent with truecolor, then remove all other calls to pvr2fb_set_pal_entry().

Tony
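As a rough illustration of the packing discussed above, here is a standalone sketch that builds a pixel value from per-component bit offsets. The types demo_bitfield and demo_var and the table demo_argb8888 are hypothetical stand-ins invented for this example, not the kernel's framebuffer structures, and the ARGB 8-8-8-8 layout is an assumption taken from the "assuming ARGB" remark.

```c
#include <stdint.h>

/* Hypothetical stand-in for an offset/length pair describing one colour
 * component; not the kernel definition. */
struct demo_bitfield {
	uint32_t offset;	/* bit position of the component's lowest bit */
	uint32_t length;	/* width of the component in bits */
};

struct demo_var {
	struct demo_bitfield transp, red, green, blue;
};

/* Assumed layout for the example: ARGB, 8 bits per component. */
static const struct demo_var demo_argb8888 = {
	.transp = { 24, 8 },
	.red    = { 16, 8 },
	.green  = {  8, 8 },
	.blue   = {  0, 8 },
};

/* Shift each component to its own offset and OR the results together. */
uint32_t pack_components(const struct demo_var *var, uint32_t transp,
			 uint32_t red, uint32_t green, uint32_t blue)
{
	return transp << var->transp.offset | red << var->red.offset |
	       green << var->green.offset | blue << var->blue.offset;
}

/* Replicate a register index into every component, which is what the
 * linear map "i << 24 | i << 16 | i << 8 | i" does for ARGB. */
uint32_t pack_regno(const struct demo_var *var, uint32_t regno)
{
	return pack_components(var, regno, regno, regno, regno);
}
```

With the assumed ARGB layout, pack_regno() reproduces the linear colour map from the init loop, and using a distinct offset for each component is what keeps green and blue from landing in the same bits.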
Re: Linus 2.6.23-rc1
(sorry for repost, but there seemed to have been some troubles..) On Sun, 2007-07-22 at 14:04 -0700, Linus Torvalds wrote: > Ok, right on time, two weeks afetr 2.6.22, there's a 2.6.23-rc1 out there. > > And it has a *ton* of changes as usual for the merge window, way too much > for me to be able to post even just the shortlog or diffstat on the > mailing list (but I had many people who wanted to full logs to stay > around, so you'll continue to see those being uploaded to kernel.org). > > Lots of architecture updates (for just about all of them - x86[-64], arm, > alpha, mips, ia64, powerpc, s390, sh, sparc, um..), lots of driver updates > (again, all over - usb, net, dvb, ide, sata, scsi, isdn, infiniband, > firewire, i2c, you name it). > > Filesystems, VM, networking, ACPI, it's all there. And virtualization all > over the place (kvm, lguest, Xen). > > Notable new things might be the merge of the cfs scheduler, and the UIO > driver infrastructure might interest some people. > Im still not so keen about this, Ingo never did get CFS to match SD in smoothness for 3d applications, where my test subjects are quake(s), world of warcraft via wine, unreal tournament 2004. And this is despite many patches he sent me to try and tweak it. As far as im concerned, i may be forced to unofficially maintain SD for my own systems(allthough lots in the gaming community is bound to be interrested, as it does make games lots better) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] i386 relocatable kernel breaks /proc/kcore debugging
Maxim Levitsky <[EMAIL PROTECTED]> writes:

> Hello,
>
> Today I noticed that gdb gets confused when I try to load a vmlinux image.
> gdb 'thinks' that all kernel symbols are below 0x8000 , while they are at
> 0xC000
>
> Turning CONFIG_RELOCATABLE off fixes that, so I assume that is the reason for
> that.
>
> I am using 2.6.23-rc1, although I don't think that older versions are better.
>
> Best regards,
> Maxim Levitsky

Weird. Vivek, could this be related to the problem of problematic core dumps we were seeing earlier?

Eric

> PS:
> This is what gdb says:
>
> (gdb) disassemble sys_open
> Dump of assembler code for function sys_open:
> 0x8026fa60 :Cannot access memory at address 0x8026fa60
>
> While real address of sys_open is:
>
> [EMAIL PROTECTED] linux-2.6]# nm ./.obj/vmlinux | grep sys_open
> .
> c016ea60 T sys_open
>
> Strange, but gdb recognizes the above address directly:
>
> (gdb) disassemble 0xc016ea60
> Dump of assembler code for function sys_open:
> 0xc016ea60 :sub $0x4,%esp
> 0xc016ea63 :mov 0x10(%esp),%eax
> ...
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
Al Boldi wrote: People wrote: I believe the users who say their apps really do get paged back in though, so suspect that's not the case. Stopping the bush-circumference beating, I do not. -ck (and gentoo) have this massive Calimero thing going among their users where people are much less interested in technology than in how the nasty big kernel meanies are keeping them down (*). I think the problem is elsewhere. Users don't say: "My apps get paged back in." They say: "My system is more responsive". They really don't care *why* the reaction to a mouse click that takes three seconds with a mainline kernel is instantaneous with -ck. Nasty big kernel meanies, OTOH, want to understand *why* a patch helps in order to decide whether it is really a good idea to merge it. So you've got a bunch of patches (aka -ck) which visibly improve the overall responsiveness of a desktop system, but apparently no one can conclusively explain why or how they achieve that, and therefore they cannot be merged into mainline. I don't have a solution to that dilemma either. IMHO, what everybody agrees on, is that swap-prefetch has a positive effect in some cases, and nobody can prove an adverse effect (excluding power consumption). The reason for this positive effect is also crystal clear: It prefetches from swap on idle into free memory, ie: it doesn't force anybody out, and they are the first to be dropped without further swap-out, which sounds really smart. Conclusion: Either prove swap-prefetch is broken, or get this merged quick. If you can't prove why it helps and doesn't hurt, then it's a hack, by definition. Behind any performance hack is some fundamental truth that can be exploited to greater effect if we reason about it. So let's reason about it. I'll start. Resource size has been outpacing processing latency since the dawn of time. Disks get bigger much faster than seek times shrink. 
Main memory and cache keep growing, while single-threaded processing speed has nearly ground to a halt. In the old days, it made lots of sense to manage resource allocation in pages and blocks. In the past few years, we started reserving blocks in ext3 automatically because it saves more in seek time than it costs in disk space. Now we're taking preallocation and antifragmentation to the next level with extent-based allocation in ext4. Well, we're still using bitmap-style allocation for pages, and the prefetch-less swap mechanism adheres to this design as well. Maybe it's time to start thinking about memory in a somewhat more extent-like fashion. With swap prefetch, we're only optimizing the case when the box isn't loaded and there's RAM free, but we're not optimizing the case when the box is heavily loaded and we need for RAM to be free. This is a complete reversal of sane development priorities. If swap batching is an optimization at all (and we have empirical evidence that it is) then it should also be an optimization to swap out chunks of pages when we need to free memory. So, how do we go about this grouping? I suggest that if we keep per-VMA reference/fault/dirty statistics, we can tell which logically distinct chunks of memory are being regularly used. This would also us to apply different page replacement policies to chunks of memory that are being used in different fashions. With such statistics, we could then page out VMAs in 2MB chunks when we're under memory pressure, also giving us the option of transparently paging them back in to hugepages when we have the memory free, once anonymous hugepage support is in place. I'm inclined to view swap prefetch as a successful scientific experiment, and use that data to inform a more reasoned engineering effort. 
If we can design something intelligent which happens to behave more or less like swap prefetch does under the circumstances where swap prefetch helps, and does something else smart under the circumstances where swap prefetch makes no discernable difference, it'll be a much bigger improvement. Because we cannot prove why the existing patch helps, we cannot say what impact it will have when things like virtualization and solid state drives radically change the coefficients of the equation we have not solved. Providing a sysctl to turn off a misbehaving feature is a poor substitute for doing it right the first time, and leaving it off by default will ensure that it only gets used by the handful of people who know enough to rebuild with the patch anyway. Let's talk about how we can make page replacement smarter, so it naturally accomplishes what swap prefetch accomplishes, as part of a design we can reason about. CC-ing linux-mm, since that's where I think we should take this next. -- Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: Problems with reading DVD using 2.6.21.5
Manuel Reimer wrote: Hello, today I've tried to install Slackware 12.0 As the installer just "skipped" some install steps, I tried to find the error. The problem seems to be unreadable parts on the DVD: http://pastebin.com/f381e8a88 But the DVD is OK. I've checked the MD5sum directly from disc on the same system using the same DVD drive. dmesg says: http://pastebin.com/f63c5c389 The kernel, used on the Slackware setup disk, uses SMP, but my hardware doesn't support this (get error on dmesg). May this (SMP kernel on non-SMP system) cause such bugs? Is this a known bug? How could code, which breaks DVD access, get into stable 2.6.21.5? I don't think this is a bug, the drive was told to read a sector and returned error SK=03, ASC=02, ASCQ=00 which is "NO SEEK COMPLETE", in other words it couldn't find that sector. Could be that the disc is marginally readable and only sometimes causes read errors. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix DMA on Dreamcast
On 7/26/07, Paul Mundt <[EMAIL PROTECTED]> wrote: > On Thu, Jul 26, 2007 at 02:59:51PM +0200, Peter Bortas wrote: > > On 7/26/07, Marcus Comstedt <[EMAIL PROTECTED]> wrote: > > > "Peter Bortas" <[EMAIL PROTECTED]> writes: > > > > On 7/21/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote: > > > >> On 21/07/07, Peter Bortas <[EMAIL PROTECTED]> wrote: > > > >> > Sidenote: Does Linux handle the Dreamcast DMA errata? > > > >> > > > >> You need to explain what you mean (at least to me!). > > > >> > > > >> If you mean will it degrade gracefully - not without this patch if set > > > >> to the (correct) defconfig. With iffy settings it will. > > > > > > > > If I remember correctly (and that's a big if since I last looked at it > > > > in 2001) some revisions of the Dreamcast hardware would sporadically > > > > lock up if you scheduled a new DMA request to quickly after a previous > > > > one, even if you checked the ready bit. It's worked around by a delay > > > > of X microseconds as recommended by Sega engineers. I don't remember > > > > the value of X, nor where exactly in the flow this workaround should > > > > be applied. > > > > > > > > Adding Marcus in case he has a better memory than me. > > > > > > I don't remeber any such delay. Are you sure you're not thinking > > > about the G2 bus problem (where accesses need to be programatically > > > serialized, whether they are PIO or DMA)? > > > > In that case my memory is worse than I thought. I'll see if I can dig > > up my old notes. > > > We've never hit any problems with the SH DMAC, so it would be interesting > if you had some more information on this. The G2 problems are well known > and documented, and the driver takes care of those issues already. After grep-ing through some 8GiB of archived mails and notes it seems marcus is absolutely correct, I'm thinking of the known G2 problem. 
-- Peter
scripts/mod/file2alias.c cross compile problem
On Fri, Jul 27, 2007 at 04:21:47PM -0700, Luck, Tony wrote:
> > So it seems on ia64 with gcc 3.3.6 there's some 8 byte alignment of the
> > array members?
> >
> > Sam and the ia64 maintainers Cc'ed - they might know better what's going
> > on here.
>
> This ia64 maintainer is baffled ... but I don't see the problem here (perhaps
> because my build machine has gcc 3.4.6).

I found what causes this problem, and it only occurs during cross compilation.

The struct is:

#define ACPI_ID_LEN 9

struct acpi_device_id {
        __u8 id[ACPI_ID_LEN];
        kernel_ulong_t driver_data;
};

When compiling for ia64, this results in:

struct acpi_device_id {
        __u8 id[9];
        uint64_t driver_data;
};

sizeof(struct acpi_device_id) differs between architectures because of the padding after id[]: it is 20 bytes on i386 but 24 bytes on ia64.

scripts/mod/file2alias.c is compiled with HOSTCC and ensures that kernel_ulong_t is correct (in this case uint64_t for ia64), but it can't cope with different padding on different architectures.

> -Tony

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
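The 20 versus 24 byte difference can be reproduced with plain arithmetic, without needing a cross compiler. The sketch below is only an illustration: the function names are invented for this mail, and it assumes the usual ABI rule that a member is placed at the next multiple of its alignment and that the struct size is rounded up to the same alignment (4 bytes for a 64-bit integer on i386, 8 bytes on ia64).

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up_to(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Offset of driver_data: the id[] bytes padded to the alignment of the
 * 64-bit member that follows them. */
size_t driver_data_offset(size_t id_len, size_t align)
{
	return round_up_to(id_len, align);
}

/* Total size of { u8 id[id_len]; u64 driver_data; } when the 64-bit
 * member (and therefore the struct) has the given alignment. */
size_t device_id_size(size_t id_len, size_t align)
{
	return round_up_to(driver_data_offset(id_len, align) + 8, align);
}
```

With id_len = 9, an alignment of 4 puts driver_data at offset 12 for a total of 20 bytes, while an alignment of 8 puts it at offset 16 for a total of 24 bytes. A host-compiled tool that walks an array of these structs with the host's layout will therefore step by the wrong stride when the target's alignment differs.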
Re: [PATCH 06/68] 0 -> NULL, for arch/frv
On 7/27/07, Robin Getz <[EMAIL PROTECTED]> wrote:
> If there is a definite style or semantic preference that everyone should live
> with - does it make sense to put checks in checkpatch.pl to enforce it?

checkpatch.pl does not have enough semantic knowledge to know if the thing being tested is a pointer ... I don't know if the sparse utility would be able to pick it out, as I'm not familiar with what level that thing runs at

-mike
Re: [patch] mm: reduce pagetable-freeing latencies
> > So I'll first do patch #1, which will not fix the problem, but will make
> > the fix easier to fit in, in the meantime, please provide feedback of
> > your preferred solution for avoiding the get/put_cpu of the 2 above,
> > unless you find a good 3rd one.
>
> I too would prefer the former solution. I think preemption notifiers are
> a particular iffy hack.
>
> You could perhaps use C99 variable length arrays to avoid the stack
> waste when not needed, however Andi once told me that generates rather
> dubious code.

As I'm sweeping through arch code etc... preparing the ground for the proper mmu_gather surgery, I've been thinking about the way to deal with that per-cpu page list and finally came up with the idea that the best we can do is along the lines of trying to allocate the list via gfp and, if that fails, falling back to a (smaller than now) per-cpu one.

I'm reworking the interfaces such that the higher level code doesn't have to care whether preemption is enabled or disabled at a given point.

Ben.
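The "try to allocate, else fall back to a small static list" idea can be sketched in userspace as follows. This is not the mmu_gather code and not a proposal for its interface: struct page_batch, batch_init(), batch_release() and FALLBACK_SLOTS are all invented for the example, malloc() stands in for a gfp allocation, and a single static array stands in for the per-cpu list (so the sketch ignores preemption and concurrency entirely).

```c
#include <stdlib.h>

#define FALLBACK_SLOTS 8	/* deliberately small; the size is an assumption */

/* Hypothetical batch of pointers waiting to be freed. */
struct page_batch {
	void **slots;		/* storage for the collected pointers */
	size_t capacity;	/* number of usable entries in slots */
	int on_heap;		/* nonzero if slots came from malloc() */
};

/* Stand-in for the small per-cpu list used when allocation fails. */
static void *fallback_slots[FALLBACK_SLOTS];

/* Try for a list of the wanted size; on failure (or a zero-sized request)
 * use the small fallback list instead, so the caller always gets storage. */
void batch_init(struct page_batch *b, size_t wanted)
{
	b->slots = wanted ? malloc(wanted * sizeof(*b->slots)) : NULL;
	if (b->slots) {
		b->capacity = wanted;
		b->on_heap = 1;
	} else {
		b->slots = fallback_slots;
		b->capacity = FALLBACK_SLOTS;
		b->on_heap = 0;
	}
}

/* Release the list; only heap-backed storage is actually freed. */
void batch_release(struct page_batch *b)
{
	if (b->on_heap)
		free(b->slots);
	b->slots = NULL;
	b->capacity = 0;
}
```

The point of the pattern is that callers only look at capacity and never need to know which path was taken; a smaller capacity just means flushing more often.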
Re: [PATCH 06/68] 0 -> NULL, for arch/frv
On Fri 27 Jul 2007 06:18, Yoann Padioleau pondered: > David Howells <[EMAIL PROTECTED]> writes: > > > Yoann Padioleau <[EMAIL PROTECTED]> wrote: > > > >> When comparing a pointer, it's clearer to compare it to NULL than to > 0. > > > > Can you make them of style: > > > > if (!x) > > Yes I can. I can make another semantic patch later to do that > transformation. But some people may prefer (x == NULL) to (!x) > so I don't know. I think that transformation > some 0 to NULL is less controversial. > > > > > > instead? If there is a definite style or semantic preference that everyone should live with - does it make sense to put checks in checkpatch.pl to enforce it? -Robin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[rfc] direct IO submission and completion scalability issues
We have been looking into the linux kernel direct IO scalability issues with database workloads. Comments and suggestions on our below experiments are welcome.

In the linux kernel, direct IO requests are not batched at the block layer. i.e., as a new request comes in, the request gets directly submitted to the IO controller on the same cpu that the request originates. And the IO completion likely happens on a different cpu which is processing interrupts. This results in cacheline bouncing of some of the hot kernel cachelines (like timers, scsi cmds, slab, sched, etc) and is becoming an important scalability issue as the number of cpus and distance between them increase with multi-core and numa.

In case of the controllers which support RIO/ZIO modes (like some qla2xxx), the IO submission path on each cpu also checks if there are any completed IO commands in the response queue and triggers softirq on the same cpu to process the completed commands. This results in each logical cpu in the system spending some time in softirq processing and this causes contention in spinlocks and other data structures.

Not sure when the IO controllers with multiple request/response queues will be available in the market. In that case we can dedicate each queue pair to a group of cpus(/a node) and be done with this problem.

In the absence of such HW today, we were looking into possible solutions for these problems and did a couple of experiments as part of this.

In the first experiment, we removed the completed IO command processing during IO submission. This will now result in the processing of IO commands only on the cpu receiving interrupts. This will result in more interrupts (as we are not doing any proactive processing) but we wanted to see if this is a win over each cpu doing the softirq processing.
This gave a 1.36% performance improvement on a x86_64 MP system (total 16 logical cpus) and on two node ia64 platform(2 nodes, 8 cores, 16 threads) we got 1.5% improvement [please look at observation #1 below]. Reference patch for this:

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index c5b3c61..357a497 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -414,11 +414,6 @@ qla2x00_start_scsi(srb_t *sp)
        WRT_REG_WORD(ISP_REQ_Q_IN(ha, reg), ha->req_ring_index);
        RD_REG_WORD_RELAXED(ISP_REQ_Q_IN(ha, reg));     /* PCI Posting. */
 
-       /* Manage unprocessed RIO/ZIO commands in response queue. */
-       if (ha->flags.process_response_queue &&
-           ha->response_ring_ptr->signature != RESPONSE_PROCESSED)
-               qla2x00_process_response_queue(ha);
-
        spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
        return (QLA_SUCCESS);
@@ -844,11 +839,6 @@ qla24xx_start_scsi(srb_t *sp)
        WRT_REG_DWORD(&reg->req_q_in, ha->req_ring_index);
        RD_REG_DWORD_RELAXED(&reg->req_q_in);           /* PCI Posting. */
 
-       /* Manage unprocessed RIO/ZIO commands in response queue. */
-       if (ha->flags.process_response_queue &&
-           ha->response_ring_ptr->signature != RESPONSE_PROCESSED)
-               qla24xx_process_response_queue(ha);
-
        spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
        return QLA_SUCCESS;

Observation #1: This experiment puts heavy load on the cpu processing interrupts. As such, equal distribution of task load by the scheduler didn't give expected performance improvement (as cpu's with no interrupts race to idle and migrate some tasks during idle balance, leading to some increase in idle time as well as costs associated with excessive task migration). We tweaked our manual task binding so that cpu's with no interrupts get proportionally more load compared to cpu's which process interrupts and this gave a nice performance boost as mentioned above. Perhaps we need to make the scheduler load balancing aware of the irq load on that cpu.
Second experiment which we did was migrating the IO submission to the IO completion cpu. Instead of submitting the IO on the same cpu where the request arrived, in this experiment the IO submission gets migrated to the cpu that is processing IO completions(interrupt). This will minimize the access to remote cachelines (that happens in timers, slab, scsi layers). The IO submission request is forwarded to the kblockd thread on the cpu receiving the interrupts. As part of this, we also made kblockd thread on each cpu as the highest priority thread, so that IO gets submitted as soon as possible on the interrupt cpu with out any delay. On x86_64 SMP platform with 16 cores, this resulted in 2% performance improvement and 3.3% improvement on two node ia64 platform. Quick and dirty prototype patch(not meant for inclusion) for this io migration experiment is appended to this e-mail. Observation #1 mentioned above is also applicable to this experiment. CPU's processing interrupts will now have to cater IO submission/processing load aswell. Observation #2: This introduces
[PATCH] Documentation: document HFSPlus
From: Wyatt Banks <[EMAIL PROTECTED]>

Documentation: document HFSPlus filesystem and its mount options.

Signed-off-by: Wyatt Banks <[EMAIL PROTECTED]>
---

Patched against 2.6.22.1

diff -uprN linux-2.6.22.1/Documentation/filesystems/hfsplus.txt linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt
--- linux-2.6.22.1/Documentation/filesystems/hfsplus.txt 1969-12-31 19:00:00.0 -0500
+++ linux-2.6.22.1-devel/Documentation/filesystems/hfsplus.txt 2007-07-27 21:11:10.0 -0400
@@ -0,0 +1,59 @@
+
+Macintosh HFSPlus Filesystem for Linux
+==
+
+HFSPlus is a filesystem first introduced in MacOS 8.1.
+HFSPlus has several extensions to HFS, including 32 bit allocation
+blocks, 255 character unicode filenames, and file sizes of 2^63 bytes.
+
+
+Mount options
+=
+
+When mounting an HFSPlus filesystem, the following options are accepted:
+
+  creator=, type=
+        Specifies the creator/type values as shown by the MacOS finder
+        used for creating new files. Default values: ''.
+
+  uid=n, gid=n
+        Specifies the user/group that owns all files on the filesystem
+        that have uninitialized permissions structures.
+        Default: user/group id of the mounting process.
+
+  umask=n
+        Specifies the umask used for files and directories that have
+        uninitialized permissions structures.
+        Default: umask of the mounting process.
+
+  session=n
+        Select the CDROM session to mount as HFSPlus filesystem. Defaults to
+        leaving that decision to the CDROM driver. This option will fail
+        with anything but a CDROM as underlying devices.
+
+  part=n
+        Select partition number n from the devices. Does only makes
+        sense for CDROMS because they can't be partitioned under Linux.
+        For disk devices the generic partition parsing code does this
+        for us. Defaults to not parsing the partition table at all.
+
+  decompose
+        Decompose file name characters.
+
+  nodecompose
+        Do not decompose file name characters.
+
+  force
+        Used to force write access to volumes that are marked as journalled
+        or locked. Use at your own risk.
+
+  nls=
+        Encoding to use when presenting file names.
+
+
+References
+==
+
+kernel source:
+
+Apple Technote 1150 http://developer.apple.com/technotes/tn/tn1150.html
[ANNOUNCE] seekwatcher v0.3 IO graphing and animation
Hello everyone,

I've tossed out seekwatcher v0.3. The major changes are rolling averages to smooth out the seek and throughput graphs, and the ability to generate mpgs of the IO done by a given trace.

Here's a sample of the smoother graphs (creating 20 kernel trees):

http://oss.oracle.com/~mason/seekwatcher/ext3_vs_btrfs_vs_xfs.png

There are details and sample movies of the kernel tree run at:

http://oss.oracle.com/~mason/seekwatcher

-chris
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Friday 27 July 2007 19:29:19 Andi Kleen wrote:
> > Any faults in that reasoning?
>
> GNU sort uses a merge sort with temporary files on disk. Not sure
> how much it keeps in memory during that, but it's probably less
> than 150MB. At some point the dirty limit should kick in and write back the
> data of the temporary files; so it's not quite the same as anonymous
> memory. But it's not that different given.

Yes, this should occur. But how many programs use temporary files like that? From what I can tell FireFox and OpenOffice both keep all their data in memory, only using a single file for some buffering purposes. When they get pushed out by a memory hog (either short term or long term) it takes several seconds for them to be swapped back in. (I'm on a P4-1.3GHz machine with 1G of ram and rarely run more than four programs (Mail Client, XChat, FireFox and a console window) and I've seen this lag in FireFox when switching to it after starting OOo. I've also seen the same sort of lag when exiting OOo. I'll see about getting some numbers about this)

> It would be better to measure than to guess. At least Andrew's measurements
> on 128MB actually didn't show updatedb being really that big a problem.

I agree. As I've said previously, it isn't updatedb itself which causes the problem. It's the way the VFS cache seems to just expand and expand - to the point of evicting pages to make room for itself. However, I may be wrong about that - I haven't actually tested it for myself, just looked at the numbers and other information that has been posted in this thread.

> Perhaps some people have much more files or simply a less efficient
> updatedb implementation?

Yes, it could be the proliferation of files. It could also be some other sort of problem that is exposing a corner-case in the VFS cache or the MM. I, personally, am of the opinion that it is likely the aforementioned corner case for people reporting the "updatedb" problem.
If it is, then swap-prefetch is just papering over the problem. However, I do not have the knowledge and understanding of the subsystems involved to be able to do much more than make a (probably wrong) guess.

> I guess the people who complain here that loudly really need to supply
> some real numbers.

I've seen numerous "real numbers" posted about this. As was said earlier in the thread, "every time numbers are posted they are claimed to be no good". But hey, nobody's perfect :)

Anyway, the discussion seems to be turning to the technical merits of swap-prefetch...

Now, a completely different question:

During the research (and lots of thinking) I've been doing while this thread has been going on I've often wondered why swap prefetch wasn't already in the kernel. The problem of slow swap-in has long been known, and, given current hardware, the optimal solution would be some sort of data prefetch - similar to what is done to speed up normal disk reads. Swap prefetch looks like it does exactly that. The algo could be argued over and/or improved (to suggest ways to do that I'd have to give it more than a 10 minute look) but it does provide a speed-up. This speed increase will probably be enjoyed more by the home users, but the performance increase could also help on enterprise systems.

Now I'll be the first one to admit that there is a trade-off there - it will cause more power to be used because the disks don't get a chance to spin down (or go through a cycle every time the prefetch system starts) but that could, potentially, be alleviated by having "laptop mode" switch it off.

(And no, I'm not claiming that it is perfect - but then, what is when it's first merged into the kernel?)

DRH

--
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
Re: Problems with framebuffer in 2.6.22-git17
On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: > > Is this with commit a66ad56eb2c9644717da4d7f05f971d6786145e3 reverted? > Reapply this commit again, it might (fingers crossed) correct the color > problem. > > As to your display doubling/quadrupling with bpp 24/32, I don't have any > answers (no hardware) though it seems to be a framebuffer pitch/display > width mismatch. > Mostly solved the colour problem at 16bpp (black background but pale blue text - had previously been white). At 32bpp just as before - oversized and yellow. At 24bpp much as before too - all against black but two boot logos in greenish shade and everything doubled up on screen in greenish shade (ie around half of pixels in console text message on left, around half in repeat on right). But certainly better at 16bpp Can mess about with it later to see if I can get the colours right I suppose. Adrian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Volanomark slows by 80% under CFS
On Fri, Jul 27, 2007 at 08:31:19PM -0400, Chris Snook wrote:
> I think Volanomark is being pretty stupid, and deserves to run slowly, but

Indeed, any app doing what volanomark does is pretty inefficient. But this is not the point. I/O schedulers are pluggable to help inefficient apps too. If apps were extremely smart they would all use async-io for their reads, and there wouldn't be any need for the anticipatory scheduler, just as an example.

The fact is there's no technical reason why we should be forbidden from choosing between CFS and O(1), at least at boot time.
Re: [RFC] scheduler: improve SMP fairness in CFS
On Fri, Jul 27, 2007 at 07:36:17PM -0400, Chris Snook wrote: > I don't think that achieving a constant error bound is always a good thing. > We all know that fairness has overhead. If I have 3 threads and 2 > processors, and I have a choice between fairly giving each thread 1.0 > billion cycles during the next second, or unfairly giving two of them 1.1 > billion cycles and giving the other 0.9 billion cycles, then we can have a > useful discussion about where we want to draw the line on the > fairness/performance tradeoff. On the other hand, if we can give two of > them 1.1 billion cycles and still give the other one 1.0 billion cycles, > it's madness to waste those 0.2 billion cycles just to avoid user jealousy. > The more complex the memory topology of a system, the more "free" cycles > you'll get by tolerating short-term unfairness. As a crude heuristic, > scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but > eventually we should take the boot-time computed migration costs into > consideration. You have to consider the target for this kind of code. There are applications where you need something that falls within a constant error bound. According to the numbers, the current CFS rebalancing logic doesn't achieve that to any degree of rigor. So CFS is ok for SCHED_OTHER, but not for anything more strict than that. Even the rt overload code (from my memory) is subject to these limitations as well until it's moved to use a single global queue while using CPU binding to turn off that logic. It's the price you pay for accuracy. > If we allow a little short-term fairness (and I think we should) we can > still account for this unfairness and compensate for it (again, with the > same tolerance) at the next rebalancing. Again, it's a function of *when* and depends on that application. > Adding system calls, while great for research, is not something which is > done lightly in the published kernel. 
> If we're going to implement a user
> interface beyond simply interpreting existing priorities more precisely, it
> would be nice if this was part of a framework with a broader vision, such
> as a scheduler economy.

I'm not sure what you mean by scheduler economy, but CFS can and should be extended to handle proportional scheduling which is outside of the traditional Unix priority semantics. Having a new API to get at this is unavoidable if you want it to eventually support -rt oriented applications that have bandwidth semantics. All deadline based schedulers have API mechanisms like this to support extended semantics. This is no different.

> I had a feeling this patch was originally designed for the O(1) scheduler,
> and this is why. The old scheduler had expired arrays, so adding a
> round-expired array wasn't a radical departure from the design. CFS does
> not have an expired rbtree, so adding one *is* a radical departure from the
> design. I think we can implement DWRR or something very similar without
> using this implementation method. Since we've already got a tree of queued
> tasks, it might be easiest to basically break off one subtree (usually just
> one task, but not necessarily) and migrate it to a less loaded tree
> whenever we can reduce the difference between the load on the two trees by
> at least half. This would prevent both overcorrection and undercorrection.
> The idea of rounds was another implementation detail that bothered me. In
> the old scheduler, quantizing CPU time was a necessary evil. Now that we
> can account for CPU time with nanosecond resolution, doing things on an
> as-needed basis seems more appropriate, and should reduce the need for
> global synchronization.

Well, there's nanosecond resolution with no mechanism that exploits it for rebalancing.
Rebalancing in general is a pain and the code for it is generally orthogonal to the in-core scheduler data structures that are in use, so I don't understand the objection to this argument and the choice of methods. If it gets the job done, then these kinds of choices don't have that much meaning.

> In summary, I think the accounting is sound, but the enforcement is
> sub-optimal for the new scheduler. A revision of the algorithm more
> cognizant of the capabilities and design of the current scheduler would
> seem to be in order.

That would be nice. But the amount of error in Tong's solution is much less than the current CFS logic as was previously tested even without consideration to high resolution clocks. So you have to give some kind of credit for that approach and recognize that current methods in CFS are technically a dead end if there's a need for strict fairness in a more rigorous run category than SCHED_OTHER.

> I've referenced many times my desire to account for CPU/memory hierarchy in
> these patches. At present, I'm not sure we have sufficient infrastructure
> in the kernel to automatically optimize for system topology, but I think
> whatever de
Re: Problems with framebuffer in 2.6.22-git17
On Sat, 2007-07-28 at 01:32 +0100, Adrian McMenamin wrote: > On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: > > On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote: > > > On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: > > > > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote: > > > > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > With the patch reverted and 24bpp, it oopses before freezing (with > > > > > > two > > > > > > odd looking boot logos on the screen): > > > > > > > > > > > Tested this further and it fails on: > > > > > > > > > > rev = fb_readl(par->mmio_base + 0x04); > > > > > > > > Doubtful if this line is the point of failure, this line is executed > > > > only once, on initialization. > > > > > > > > > par->mmio_base is corrupted in some way during the call to > > > register_framebuffer - still investigating how/why. > > > > Possible, par->mmio_base is the last field in struct pvr2fb_par, > > after that is the pseudo_palette. The oops did not manifest when the > > pseudo_palette was written as u16, but oops'ed when written as u32. > > Memory alignment problems? > > > > Try the patch I posted before, might help. > > > Apologies, missed the patch before. > > With the patch applied the Dreamcast no longer crashes or locks with > either 16, 24 or 32 bpp, so that's good. > > With 24bpp everything is doubled up (eg two boot logos on screen) and > about twice (?) the size it should be - though with a black screen. > > With 32 bpp everything is about 4 (?) times the size it should be and > all on a yellow background. > > With 16bpp then everything is on a blue background as before, but is > also the correct size (as before). Is this with commit a66ad56eb2c9644717da4d7f05f971d6786145e3 reverted? Reapply this commit again, it might (fingers crossed) correct the color problem. 
As to your display doubling/quadrupling with bpp 24/32, I don't have any answers (no hardware) though it seems to be a framebuffer pitch/display width mismatch.

Tony
Re: [PATCH 2/3] core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe
On Fri, Jul 27, 2007 at 01:54:19PM -0700, Jeremy Fitzhardinge wrote:
> Neil Horman wrote:
> > + int helper_argc = 0;
> >
> > + helper_argv = argv_split(GFP_KERNEL, corename+1, &helper_argc);
> >
>
> Hm, I suspect most users of argv_split don't really care about argc, so
> it would be useful to change argv_split to take NULL as the argc pointer,
> rather than declare a bunch of unused variables. Interested in throwing
> a patch together?
>
> J

Gladly, I'll take care of it next week.

Regards
Neil
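A userspace sketch of the suggestion above, for illustration only. This is not the kernel's argv_split(): the names split_args() and free_args() are hypothetical, the gfp argument is dropped, and splitting is on plain whitespace. It only shows one way an argc out-parameter can be made optional by accepting NULL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Free a vector returned by split_args(); accepts NULL. */
void free_args(char **argv)
{
    if (!argv)
        return;
    for (char **p = argv; *p; p++)
        free(*p);
    free(argv);
}

/*
 * Split str on whitespace into a NULL-terminated vector of copies.
 * argcp may be NULL when the caller does not care about the count
 * (hypothetical stand-in for the proposed argv_split() behaviour).
 */
char **split_args(const char *str, int *argcp)
{
    assert(str != NULL);

    /* First pass: count the words so the vector can be sized. */
    int count = 0;
    const char *s = str;
    while (*s) {
        while (*s && isspace((unsigned char)*s))
            s++;
        if (!*s)
            break;
        count++;
        while (*s && !isspace((unsigned char)*s))
            s++;
    }

    /* calloc keeps the vector NULL-terminated even if we bail out early. */
    char **argv = calloc((size_t)count + 1, sizeof(*argv));
    if (!argv)
        return NULL;

    /* Second pass: copy each word. */
    int i = 0;
    s = str;
    while (*s) {
        while (*s && isspace((unsigned char)*s))
            s++;
        if (!*s)
            break;
        const char *start = s;
        while (*s && !isspace((unsigned char)*s))
            s++;
        size_t len = (size_t)(s - start);
        argv[i] = malloc(len + 1);
        if (!argv[i]) {
            free_args(argv);
            return NULL;
        }
        memcpy(argv[i], start, len);
        argv[i][len] = '\0';
        i++;
    }

    if (argcp)              /* only report the count if it was asked for */
        *argcp = count;
    return argv;
}
```

A caller that only needs the vector would write split_args(pattern, NULL) and skip declaring a throwaway counter, which is the point of the request.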
Re: Problems with framebuffer in 2.6.22-git17
On 28/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: > On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote: > > On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: > > > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote: > > > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote: > > > > > > > > > With the patch reverted and 24bpp, it oopses before freezing (with two > > > > > odd looking boot logos on the screen): > > > > > > > > > Tested this further and it fails on: > > > > > > > > rev = fb_readl(par->mmio_base + 0x04); > > > > > > Doubtful if this line is the point of failure, this line is executed > > > only once, on initialization. > > > > > > par->mmio_base is corrupted in some way during the call to > > register_framebuffer - still investigating how/why. > > Possible, par->mmio_base is the last field in struct pvr2fb_par, > after that is the pseudo_palette. The oops did not manifest when the > pseudo_palette was written as u16, but oops'ed when written as u32. > Memory alignment problems? > > Try the patch I posted before, might help. > Apologies, missed the patch before. With the patch applied the Dreamcast no longer crashes or locks with either 16, 24 or 32 bpp, so that's good. With 24bpp everything is doubled up (eg two boot logos on screen) and about twice (?) the size it should be - though with a black screen. With 32 bpp everything is about 4 (?) times the size it should be and all on a yellow background. With 16bpp then everything is on a blue background as before, but is also the correct size (as before). So, it's better certainly, but there are still a few issues with the driver, though nothing that takes down the box. So thanks! Adrian - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Volanomark slows by 80% under CFS
Tim Chen wrote: Ingo, Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1. Benchmark was run on a 2 socket Core2 machine. The change in scheduler treatment of sched_yield could play a part in changing Volanomark behavior. In CFS, sched_yield is implemented by dequeueing and requeueing a process . The time a process has spent running probably reduced the the cpu time due it by only a bit. The process could get re-queued pretty close to head of the queue, and may get scheduled again pretty quickly if there is still a lot of cpu time due. It may make sense to queue the yielding process a bit further behind in the queue. I made a slight change by zeroing out wait_runtime (i.e. have the process gives up cpu time due for it to run) for experimentation. Let's put aside gripes that Volanomark should have used a better mechanism to coordinate threads instead sched_yield for a second. Volanomark runs better and is only 40% (instead of 80%) down from old scheduler without CFS. Of course we should not tune for Volanomark and this is reference data. What are your view on how CFS's sched_yield should behave? Regards, Tim The primary purpose of sched_yield is for SCHED_FIFO realtime processes. Where nothing else will run, ever, unless the running thread blocks or yields the CPU. Under CFS, the yielding process will still be leftmost in the rbtree, otherwise it would have already been scheduled out. Zeroing out wait_runtime on sched_yield strikes me as completely appropriate. If the process wanted to sleep a finite duration, it should actually call a sleep function, but sched_yield is essentially saying "I don't have anything else to do right now", so it's hardly fair to claim you've been waiting for your chance when you just gave it up. As for the remaining 40% degradation, if Volanomark is using it for synchronization, the scheduler is probably cycling through threads until it gets to the one that actually wants to do work. 
The O(1) scheduler will do this very quickly, whereas CFS has a bit more overhead. Interactivity boosting may have also helped the old scheduler find the right thread faster.

I think Volanomark is being pretty stupid, and deserves to run slowly, but there are legitimate reasons to want to call sched_yield in a non-SCHED_FIFO process. If I'm performing multiple different calculations on the same set of data in multiple threads, and accessing the shared data in a linear fashion, I'd like to be able to have one thread give the other some CPU time so they can stay at the same point in the stream and improve cache hit rates, but this is only an optimization if I can do it without wasting CPU or gradually nicing myself into oblivion. Having sched_yield zero out wait_runtime seems like an appropriate way to make this use case work to the extent possible. Any user attempting such an optimization should have the good sense to do real work between sched_yield calls, to avoid calling the scheduler in a tight loop.

-- Chris
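A minimal sketch of the pattern described above, assuming a POSIX system: do a chunk of real work, then call sched_yield() once, rather than yielding in a tight loop. sum_with_yield() and its chunk parameter are hypothetical names chosen for illustration; only sched_yield() is a real API here, and what the scheduler does with the yield depends on the kernel.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Sum n ints, yielding the CPU once per chunk of work so that a
 * sibling thread walking the same data can catch up.
 * A chunk of 0 is treated as 1.
 */
long sum_with_yield(const int *data, size_t n, size_t chunk)
{
    assert(data != NULL || n == 0);

    long total = 0;
    size_t i = 0;

    if (chunk == 0)
        chunk = 1;

    while (i < n) {
        /* End of this chunk, clamped to the end of the array. */
        size_t end = (n - i < chunk) ? n : i + chunk;

        for (; i < end; i++)
            total += data[i];   /* the "real work" between yields */

        sched_yield();          /* one yield per chunk, not per element */
    }
    return total;
}
```

Picking a larger chunk means fewer trips into the scheduler; the right size is workload dependent and would have to be measured.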
Re: -mm merge plans for 2.6.23
On Wed, Jul 25, 2007 at 11:50:37PM -0700, Andrew Morton wrote: > On Wed, 25 Jul 2007 23:33:24 -0700 "Ray Lee" <[EMAIL PROTECTED]> wrote: > > > > So. We can > > > > > > a) provide a way for userspace to reload pagecache and > > > > > > b) merge maps2 (once it's finished) (pokes mpm) > > > > > > and we're done? > > > > Eh, dunno. Maybe? > > > > We're assuming we come up with an API for userspace to get > > notifications of evictions (without polling, though poll() would be > > fine -- you know what I mean), and an API for re-victing those things > > on demand. > > I was assuming that polling would work OK. I expect it would. > > > If you think that adding that API and maintaining it is > > simpler/better than including a variation on the above hueristic I > > offered, then yeah, I guess we are. It'll all have that vague > > userspace s2ram odor about it, but I'm sure it could be made to work. > > Actually, I overdesigned the API, I suspect. What we _could_ do is to > provide a way of allowing userspace to say "pretend process A touched page > B": adopt its mm and go touch the page. We in fact already have that: > PTRACE_PEEKTEXT. > > So I suspect this could all be done by polling maps2 and using PEEKTEXT. > The tricky part would be working out when to poll, and when to reestablish. > > A neater implementation than PEEKTEXT would be to make the maps2 files > writeable(!) so as a party trick you could tar 'em up and then, when you > want to reestablish firefox's previous working set, do a untar in > /proc/$(pidof firefox)/ Sick. But thankfully, unnecessary. The pagemaps give you more than just a present bit, which is all we care about here. We simply need to record which pages are mapped, then reference them all back to life.. -- Mathematics is the supreme nostalgia of our time. 
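A rough userspace illustration of "record which pages are mapped, then reference them all back to life", using mincore() on the calling process's own mapping as a stand-in. This is an assumption-laden sketch, not the maps2/pagemap interface discussed in this thread: record_resident() and touch_recorded() are hypothetical helper names, and it works on the caller's own address space rather than another process's.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Record which of the npages pages starting at addr are resident.
 * addr must be page aligned and vec must hold npages bytes.
 * Returns the number of resident pages, or -1 on error.
 */
long record_resident(void *addr, size_t npages, unsigned char *vec)
{
    assert(vec != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || mincore(addr, npages * (size_t)page, vec) != 0)
        return -1;

    long count = 0;
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1)         /* low bit set means the page is resident */
            count++;
    return count;
}

/*
 * Read one byte from every page that was recorded as resident,
 * faulting it back in if it has been evicted since.
 * Returns the number of pages touched, or -1 on error.
 */
long touch_recorded(void *addr, size_t npages, const unsigned char *vec)
{
    assert(vec != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile const unsigned char *p = addr;
    long touched = 0;
    for (size_t i = 0; i < npages; i++) {
        if (vec[i] & 1) {
            (void)p[i * (size_t)page];
            touched++;
        }
    }
    return touched;
}
```

The recorded vector plays the role of the saved working set; when to record and when to replay is the hard part, and is not addressed here.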
Re: -mm merge plans for 2.6.23
On Wed, Jul 25, 2007 at 09:57:17PM -0700, Andrew Morton wrote:
> So. We can
>
> a) provide a way for userspace to reload pagecache and
>
> b) merge maps2 (once it's finished) (pokes mpm)

Consider me poked, despite not being cc:ed.

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH] misannotation in pppol2tp
Al Viro wrote:

Address of auto variable is not a userland pointer. A good thing, too, since if pppol2tp_tunnel_getsockopt() would _really_ get a userland pointer as argument, it would be an instant roothole...

Signed-off-by: Al Viro <[EMAIL PROTECTED]>

Acked-by: James Chapman <[EMAIL PROTECTED]>

Thanks Al.

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.28 01:29:19 +0200, Andi Kleen wrote:
> > Any faults in that reasoning?
>
> GNU sort uses a merge sort with temporary files on disk. Not sure
> how much it keeps in memory during that, but it's probably less
> than 150MB. At some point the dirty limit should kick in and write back the
> data of the temporary files; so it's not quite the same as anonymous memory.
> But it's not that different given.

Hm, does that change anything? The files need to be read at the end (so they go into the cache) and are deleted afterwards (cache gets freed I guess?).

> It would be better to measure than to guess. At least Andrew's measurements
> on 128MB actually didn't show updatedb being really that big a problem.

Here's a before/after memory usage for an updatedb run:

[EMAIL PROTECTED]:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2011       1995         15          0        269        779
-/+ buffers/cache:        946       1064
Swap:         1945          0       1945
[EMAIL PROTECTED]:~# updatedb
[EMAIL PROTECTED]:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2011       1914         96          0        209        746
-/+ buffers/cache:        958       1052
Swap:         1945          0       1944

81MB more unused RAM afterwards.

If anyone can make use of that, here's a snippet from /proc/$PID/smaps of updatedb's sort process, when it was at about its peak memory usage (according to the RSS column in top), which was about 50MB.

2b90ab3c1000-2b90ae4c3000 rw-p 2b90ab3c1000 00:00 0
Size:             50184 kB
Rss:              50184 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:    50184 kB
Referenced:       50184 kB

> Perhaps some people have much more files or simply a less efficient
> updatedb implementation?

sort (GNU coreutils) 5.97
GNU updatedb version 4.2.31

> I guess the people who complain here that loudly really need to supply
> some real numbers.

Just to clarify: I'm not complaining either way, neither about not merging swap prefetch, nor about someone wanting that to be merged. It was rather the "discussion" that caught my attention... Just in case ;-)

Björn
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 01:34, grundig wrote:
> El Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]>
> escribió:
>
>> how do you know there will be other activity? You start the IO and that
>> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
>> submitted in that time you add latency. You cannot predict that IO
>> happening or not happening.
>
> If there hasn't be much IO for some time, it looks quite reasonable to expect
> that there won't be more in the near future.

Good argument.

> As most of heuristics can fail, but
> then this is a feature mostly for desktops, not servers.

Bad argument. It doesn't matter for whom the feature is intended, it matters what it does and whether it does it well or not. In this case, prefetching swap without disturbing anything else.

> There's an old saying that says something like "an open source project starts
> dying when new people can't participate in the project no matter how hard
> they try". It's hard to understand why there's so many people opposing to
> this when other more controversial features are merged much faster, (like, fe.
> the UIO driver framework).

Could people please stop this emotional crap non-argumentation? At best it reduces the chance of swap-prefetch being merged.

Perhaps one of the reasons is that this is core kernel code. And that it isn't a new feature, but a performance improvement with doubtful trade-offs. The problem statement isn't clear either. It seems like a natural enhancement, but is that enough reason to merge it? Maybe, maybe not. But if slow swap-in is the problem, shouldn't that be fixed instead of bypassed?

Yes, there are people that say that it works for them, but of those a lot claim updatedb damage is fixed by it too, while that can't be true. And how many of those people did test swap prefetch stand-alone? The ck kernel has other mm patches too, perhaps those are the real goodies...

And there don't seem to be many people opposing swap prefetch either. A bunch seem in favour of it, and others seem unconvinced. Me, I don't know if it should be merged or not, it solves one very specific workload, and nothing else (swap is used, and memory becomes free which won't be used in the near future). All in all it seems good, but doubtful, and when in doubt, don't merge.

Greetings,

Indan
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, 2007-07-28 at 01:34 +0200, grundig wrote: > El Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]> > escribió: > > > how do you know there will be other activity? You start the IO and that > > basically blacks out the disk for 5 to 10 ms. If the "real" IO gets > > submitted in that time you add latency. You cannot predict that IO > > happening or not happening. > > If there hasn't be much IO for some time, it looks quite reasonable to expect > that there won't be more in the near future. As most of heuristics can fail exactly this was my point: just saying "there are no downsides" isn't true. > There's an old saying that says something like "an open source project starts > dying when new people can't participate in the project no matter how hard > they try". It's hard to understand why there's so many people opposing to > this when other more controversial features are merged much faster, (like, > fe. > the UIO driver framework). I'm not opposing this or cheering for it. I'm opposing blindly saying "there are no downsides". This needs showing with data at minimum, and my reading of this saga seems to suggest data is the bit that is lacking from the start... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [1/2] 2.6.23-rc1: known regressions with patches v2
> Subject         : ia64 build failure from recent diskquota patch
> References      : http://lkml.org/lkml/2007/7/18/407
> Last known good : ?
> Submitter       : Doug Chapman <[EMAIL PROTECTED]>
> Caused-By       : Vasily Tarasov <[EMAIL PROTECTED]>
>                   commit b716395e2b8e450e294537de0c91476ded2f0395
> Handled-By      : Luck, Tony <[EMAIL PROTECTED]>
> Patch1          : http://lkml.org/lkml/2007/7/20/255
> Patch2          : http://lkml.org/lkml/2007/7/20/272
> Status          : patch available

Just sent the "please pull" message to Linus. The fix should show up in his tree soon as commit 7a6c813594c9eb25a9afbcbd30c9865e38ee6f39

-Tony
Volanomark slows by 80% under CFS
Ingo,

Volanomark slows by 80% with CFS scheduler on 2.6.23-rc1. Benchmark was run on a 2 socket Core2 machine.

The change in scheduler treatment of sched_yield could play a part in changing Volanomark behavior. In CFS, sched_yield is implemented by dequeueing and requeueing a process. The time a process has spent running has probably reduced the cpu time due to it by only a bit. The process could get re-queued pretty close to the head of the queue, and may get scheduled again pretty quickly if there is still a lot of cpu time due.

It may make sense to queue the yielding process a bit further behind in the queue. I made a slight change by zeroing out wait_runtime (i.e. having the process give up the cpu time due to it) for experimentation. Let's put aside gripes that Volanomark should have used a better mechanism to coordinate threads instead of sched_yield for a second. Volanomark runs better and is only 40% (instead of 80%) down from the old scheduler without CFS.

Of course we should not tune for Volanomark and this is reference data. What are your views on how CFS's sched_yield should behave?

Regards,
Tim

--- linux-2.6.23-rc1/kernel/sched_fair.c.orig 2007-07-27 09:39:11.0 -0700
+++ linux-2.6.23-rc1/kernel/sched_fair.c 2007-07-27 09:40:41.0 -0700
@@ -841,6 +841,7 @@
   * position within the tree:
   */
  dequeue_entity(cfs_rq, &p->se, 0, now);
+ p->se.wait_runtime = 0;
  enqueue_entity(cfs_rq, &p->se, 0, now);
 }
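A toy model of the effect described above, offered as a sketch only. It is not CFS code: struct toy_task, pick_leftmost() and toy_yield() are hypothetical, the rbtree is replaced by a linear scan, and "leftmost" is simply assumed to be the task with the largest wait_runtime. It shows why a yielding task with plenty of credit is picked again straight away, and how zeroing the credit lets another task run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runnable task; not a kernel structure. */
struct toy_task {
    const char *name;
    long wait_runtime;      /* CPU time "due" to the task, arbitrary units */
};

/*
 * Pick the task the toy scheduler would run next: the one owed the
 * most CPU time. Ties go to the lower index. Requires n > 0.
 */
size_t pick_leftmost(const struct toy_task *t, size_t n)
{
    assert(n > 0);

    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (t[i].wait_runtime > t[best].wait_runtime)
            best = i;
    return best;
}

/*
 * Model a yield: charge the task for the time it just ran and,
 * if zero_credit is set, make it forfeit whatever it is still owed
 * (the analogue of setting wait_runtime to 0 in the patch).
 */
void toy_yield(struct toy_task *t, long ran, int zero_credit)
{
    t->wait_runtime -= ran;
    if (zero_credit)
        t->wait_runtime = 0;
}
```

With tasks owed 100, 40 and 20 units, a plain yield after running 5 units leaves the first task at 95 and still first in line; with zero_credit set it drops to 0 and the task owed 40 units is picked instead.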
Re: UML compile error
Andrew Morton wrote:
> On Sat, 28 Jul 2007 00:46:57 +0200
> Gabriel C <[EMAIL PROTECTED]> wrote:
>
>> UML does not compile on current git head.
>>
>>
>> $ make defconfig ARCH=um
>> [..]
>> $ make ARCH=um
>> scripts/kconfig/conf -s arch/um/Kconfig
>> net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol
>> 'BT_HIDP' refers to undefined symbol 'HID'
>> drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol
>> 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
>> SYMLINK arch/um/include/kern_constants.h
>> CHK arch/um/include/uml-config.h
>> UPD arch/um/include/uml-config.h
>> CC arch/um/sys-i386/user-offsets.s
>> CHK arch/um/include/user_constants.h
>> CHK include/linux/version.h
>> CHK include/linux/utsrelease.h
>> CC arch/um/kernel/asm-offsets.s
>> In file included from include/linux/sched.h:54,
>> from arch/um/include/sysdep/kernel-offsets.h:2,
>> from arch/um/kernel/asm-offsets.c:1:
>> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
>> include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined
>
> I suspect your build setup broke. Try `make mrproper' then
> have another go.
>

Right, this auto build tree broke for some reason. A fresh git tree is fine, sorry for the noise.

Gabriel
Re: [GIT PATCH] ACPI patches for 2.6.23-rc1
Jan Dittmer <[EMAIL PROTECTED]> writes:

> Len Brown wrote:
>> Hi Linus,
>>
>> please pull from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git release
>
> This seems to break ia64 defconfig:
>
> Building modules, stage 2.
> MODPOST 157 modules
> FATAL: drivers/acpi/button: sizeof(struct acpi_device_id)=20 is not a modulo
> of the size of section __mod_acpi_device_table=144.

Are you cross-compiling? The definition of kernel_ulong_t won't work on x86.

Andreas.

--
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
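The numbers in the FATAL message fit Andreas's cross-compile theory. The sketch below is one reading of it, not the actual modpost code: it assumes the table entry is a 16-byte ID string followed by a driver_data word that is 4 bytes as seen by a 32-bit x86 host tool and 8 bytes on the 64-bit ia64 target. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ACPI_ID_LEN 16	/* assumed length of the ID field */

/* Entry as a 32-bit host tool would lay it out (4-byte driver_data). */
struct toy_id_host32 {
	uint8_t  id[TOY_ACPI_ID_LEN];
	uint32_t driver_data;
};

/* Entry as the 64-bit target lays it out (8-byte driver_data). */
struct toy_id_target64 {
	uint8_t  id[TOY_ACPI_ID_LEN];
	uint64_t driver_data;
};

/* Returns 1 if the section holds a whole number of entries. */
static int toy_section_ok(size_t section_size, size_t entry_size)
{
	return entry_size != 0 && section_size % entry_size == 0;
}

static void toy_modpost_demo(void)
{
	/* 144 is not a multiple of 20, so the host-side check fails. */
	assert(!toy_section_ok(144, sizeof(struct toy_id_host32)));

	/* 144 is six entries of 24 bytes, which is what the target emitted. */
	assert(toy_section_ok(144, sizeof(struct toy_id_target64)));
}
```

With these assumed layouts the host tool computes an entry size of 20 while the section was built from 24-byte entries, which reproduces the "20 versus 144" mismatch in the report.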
[PATCH] remove gratuitous space in airo module description
Currently the modinfo looks like:

description: Support for Cisco/Aironet 802.11 wireless ethernet cards. Direct support for ISA/PCI/MPI cards and support for PCMCIA when used with airo_cs.

Arguably, it should be cut at the end of the first sentence. This at least makes it somewhat more legible.

diff -up linux-2.6.22.x86_64/drivers/net/wireless/airo.c.foo linux-2.6.22.x86_64/drivers/net/wireless/airo.c
--- linux-2.6.22.x86_64/drivers/net/wireless/airo.c.foo 2007-07-27 19:03:59.0 -0400
+++ linux-2.6.22.x86_64/drivers/net/wireless/airo.c 2007-07-27 19:04:15.0 -0400
@@ -241,8 +241,8 @@ static int proc_perm = 0644;
 MODULE_AUTHOR("Benjamin Reed");
 MODULE_DESCRIPTION("Support for Cisco/Aironet 802.11 wireless ethernet \
- cards. Direct support for ISA/PCI/MPI cards and support \
- for PCMCIA when used with airo_cs.");
+cards. Direct support for ISA/PCI/MPI cards and support \
+for PCMCIA when used with airo_cs.");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_SUPPORTED_DEVICE("Aironet 4500, 4800 and Cisco 340/350");
 module_param_array(io, int, NULL, 0);

Signed-off-by: Bill Nottingham <[EMAIL PROTECTED]>

Bill
ATA scsi driver misbehavior under kdump capture kernel
I've run into a problem with the ATA SCSI disk driver when running in a kdump dump-capture kernel.

I'm running on a 2-processor x86_64 box. It has 2 SCSI disks, /dev/sda and /dev/sdb.

My kernel is 2.6.22, built to be a dump-capture kernel loaded by kexec. When I boot this kernel by itself, it finds both sda and sdb. But when it is loaded by kexec and booted on a panic, it only finds sda.

Any ideas from those familiar with the ATA driver?

-Cliff Wickman
SGI

I put some printk's into it and get this:

Standalone: [nv_adma_error_handler]
cpw: ata_host_register probe port 1 (error_handler:81348625)
cpw: ata_host_register call ata_port_probe
cpw: ata_host_register call ata_port_schedule
cpw: ata_host_register call ata_port_wait_eh
cpw: ata_port_wait_eh entered
cpw: ata_port_wait_eh, preparing to wait
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
cpw: ata_dev_configure entered
cpw: ata_dev_configure testing class
cpw: ata_dev_configure class is ATA_DEV_ATA
ata2.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
ata2.00: 390721968 sectors, multi 16: LBA48
cpw: ata_dev_configure exiting
cpw: ata_dev_configure entered
cpw: ata_dev_configure testing class
cpw: ata_dev_configure class is ATA_DEV_ATA
cpw: ata_dev_configure exiting
cpw: ata_dev_set_mode printing:
ata2.00: configured for UDMA/133
cpw: ata_port_wait_eh, finished wait
cpw: ata_port_wait_eh exiting
cpw: ata_host_register done with probe port 1

When loaded with kexec and booted on a panic:
cpw: ata_host_register probe port 1 (error_handler:81348625)
cpw: ata_host_register call ata_port_probe
cpw: ata_host_register call ata_port_schedule
cpw: ata_host_register call ata_port_wait_eh
cpw: ata_port_wait_eh entered
cpw: ata_port_wait_eh, preparing to wait
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
cpw: ata_port_wait_eh, finished wait
cpw: ata_port_wait_eh exiting
cpw: ata_host_register done with probe port 1
Re: D-Link DFE-580TX 4 port NIC problems
On Fri, 27 Jul 2007 14:08:09 +0200 Clemens Koller <[EMAIL PROTECTED]> wrote:

> Hi, Mario!
>
> Mario Doering wrote:
> > Hello,
> >
> > are there any news or questions on this issue?
>
> Can you try the latest kernel to see if the same problem
> persists?
> Is there any kernel version where it was working fine?

Hello Clemens,

I have tried different kernels with no success so far. I have not tried a 2.6.22 kernel yet, but I can do so of course. It would then take some time again to wait for the error to arise ;-)

Bye, Mario.
Re: LinuxPPS & spinlocks
Hi, On 7/28/07, Satyam Sharma <[EMAIL PROTECTED]> wrote: > Hi Rodolfo, > > On 7/28/07, Rodolfo Giometti <[EMAIL PROTECTED]> wrote: > > On Fri, Jul 27, 2007 at 01:40:14PM -0600, Chris Friesen wrote: > > > > > > My point is that the lock should be used to protect specific data. Thus, > > > it > > > would be more correct to say, "spinlock foo is taken because > > > pps_register_source() accesses variable bar". > > > > > > That way, if someone else wants to access "bar", they know that they need > > > to take lock "foo". > > > > Ah, ok! I see. :) > > I only glanced through the code, so could be wrong, but I noticed that > the only global / shared data you have in there is a global "pps_source" > array of pps_s structs. That's accessed / modified from the various > syscalls introduced in the API exported to userspace, as well as the > register/unregister/pps_event API exported to in-kernel client subsystems, > yes? So it looks like you need to introduce proper locking for it, simply > type-qualifying it as "volatile" is not enough. > > However, I think you've introduced two locks for it. The syscalls (that > run in process context, obviously) seem to use a pps_mutex and > pps_event() seems to be using the pps_lock spinlock (because that > gets executed from interrupt context) -- and from the looks of it, the > register/unregister functions are using /both/ the mutex and spinlock (!) > > This isn't quite right, (in fact there's nothing to protect pps_event from > racing against a syscall), so you should use *only* the spinlock for > synchronization -- the spin_lock_irqsave/restore() variants, in fact. Take the race between the time_pps_setparams() syscall and a concurrent pps_event() from an interrupt for instance. From sys_time_pps_setparams, the parameters for an existing source are not modified / set atomically, which means a pps_event() called on the same source in between will see invalid parameters ... and bad things will happen. 
> [ Also, have you considered making pps_source a list and not an array? > It'll help you lose a whole lot of MAX_SOURCES, pps_is_allocated, etc > kind of gymnastics in there, and you _can_ return a pointer to the > corresponding pps source struct from the register() function to the in-kernel > users, so that way you get to retain the O(1) access to the corresponding > source when a client calls into pps_event(), similar to how you're using the > array index presently. ] I think the above would be sane and safe -- your driver has pretty simple lifetime rules, and "sources" are only created / destroyed from within kernel, as and when clients call pps_register_source() and pps_unregister_source(). So pps_event() can be called on a given source only between the corresponding register() and unregister() -- which means register() can return us a reference/pointer on the source after allocating / adding it to the list (instead of the fixed array index as it presently is), which remains valid for the entire duration of the source, till unregister() is called, after which we can't be calling pps_event() on the same source anyway. > I also noticed code like (from pps_event): > > + /* Try to grab the lock, if not we prefere loose the event... */ > + if (!spin_trylock(&pps_lock)) > + return; > > which looks worrisome and unnecessary. That spinlock looks to be of > fine enough granularity to me, do you think there'd be any contention > on it? I /think/ you can simply make that a spin_lock(). > > Overall the code looks simple / straightforward enough to me (except for > the parport / uart stuff that I have no clue about), and I'll also read up on > the relevant RFC for this and would hopefully try and give you a more > meaningful review over the weekend. Ok, I've looked through (most of) the RFC and code now, and am only commenting on a design-level for now. Anyway, I didn't like the way you've significantly drifted from the RFC in several ways: 1. 
The RFC mandates no such userspace interface / syscall as the time_pps_cmd() that you've implemented -- it looks, smells, and feels like an ioctl, in fact that's what it is for practical purposes. I'm confused as to why you didn't just go ahead and implement the special-file-and-file-descriptor based approach as advocated / mandated there. [ You've implemented the (optional, as per RFC) time_pps_findsource operation in the kernel using the above "pseudo-ioctl", but that wasn't necessary -- as the RFC itself illustrates, it's something that can easily be done (in fact should be done) completely in userspace itself. ]

2. If you fix the above two issues, you'll notice that you don't need to short-circuit the (RFC-mandated) time_pps_create/destroy(handle) syscalls in the userspace header/library anymore, as you presently are.

Here's how I'd go about designing/implementing this:

* At the time of pps_register_source() -- called by an in-kernel client subsystem that creates a PPS source -- allocate a pps source, gener
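The list-based registration suggested earlier in this message (register() hands back a pointer, pps_event() uses it directly) can be sketched in plain userspace C. This is only a toy model: the toy_* names and the struct layout are made up, and all locking is left out. In the kernel, the list and the per-source fields would need the spin_lock_irqsave/restore() protection discussed above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; none of these are the real LinuxPPS symbols. */
struct toy_pps_source {
	struct toy_pps_source *next;
	int id;
	unsigned long events;	/* number of events seen so far */
};

static struct toy_pps_source *toy_sources;	/* head of the list */

/* Allocate a source, link it in, and hand the pointer to the client. */
static struct toy_pps_source *toy_pps_register(int id)
{
	struct toy_pps_source *src = calloc(1, sizeof(*src));

	if (!src)
		return NULL;
	src->id = id;
	src->next = toy_sources;
	toy_sources = src;
	return src;
}

/* O(1): the client already holds the pointer, so no lookup is needed. */
static void toy_pps_event(struct toy_pps_source *src)
{
	assert(src != NULL);
	src->events++;
}

/* Unlink and free; the pointer must not be used afterwards. */
static void toy_pps_unregister(struct toy_pps_source *src)
{
	struct toy_pps_source **pp;

	for (pp = &toy_sources; *pp; pp = &(*pp)->next) {
		if (*pp == src) {
			*pp = src->next;
			free(src);
			return;
		}
	}
}
```

A client would keep the pointer returned by toy_pps_register() for as long as the source exists and pass it to toy_pps_event() from its interrupt path, which removes the need for a fixed-size array and an "is this slot allocated" check.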
Re: Kernel modules compilation
Thank you very much for your help! Tomorrow I will try! :-) Bye!
Re: [RFC] scheduler: improve SMP fairness in CFS
Tong Li wrote: On Fri, 27 Jul 2007, Chris Snook wrote: Tong Li wrote: I'd like to clarify that I'm not trying to push this particular code to the kernel. I'm a researcher. My intent was to point out that we have a problem in the scheduler and my dwrr algorithm can potentially help fix it. The patch itself was merely a proof-of-concept. I'd be thrilled if the algorithm can be proven useful in the real world. I appreciate the people who have given me comments. Since then, I've revised my algorithm/code. Now it doesn't require global locking but retains strong fairness properties (which I was able to prove mathematically). Thanks for doing this work. Please don't take the implementation criticism as a lack of appreciation for the work. I'd like to see dwrr in the scheduler, but I'm skeptical that re-introducing expired runqueues is the most efficient way to do it. Given the inherently controversial nature of scheduler code, particularly that which attempts to enforce fairness, perhaps a concise design document would help us come to an agreement about what we think the scheduler should do and what tradeoffs we're willing to make to do those things. Do you have a design document we could discuss? -- Chris Thanks for the interest. Attached is a design doc I wrote several months ago (with small modifications). It talks about the two pieces of my design: group scheduling and dwrr. The description was based on the original O(1) scheduler, but as my CFS patch showed, the algorithm is applicable to other underlying schedulers as well. It's interesting that I started working on this in January for the purpose of eventually writing a paper about it. So I knew reasonably well the related research work but was totally unaware that people in the Linux community were also working on similar things. This is good. If you are interested, I'd like to help with the algorithms and theory side of the things. 
tong --- Overview: Trio extends the existing Linux scheduler with support for proportional-share scheduling. It uses a scheduling algorithm, called Distributed Weighted Round-Robin (DWRR), which retains the existing scheduler design as much as possible, and extends it to achieve proportional fairness with O(1) time complexity and a constant error bound, compared to the ideal fair scheduling algorithm. The goal of Trio is not to improve interactive performance; rather, it relies on the existing scheduler for interactivity and extends it to support MP proportional fairness. Trio has two unique features: (1) it enables users to control shares of CPU time for any thread or group of threads (e.g., a process, an application, etc.), and (2) it enables fair sharing of CPU time across multiple CPUs. For example, with ten tasks running on eight CPUs, Trio allows each task to take an equal fraction of the total CPU time. These features enable Trio to complement the existing Linux scheduler to enable greater user flexibility and stronger fairness. Background: Over the years, there has been a lot of criticism that conventional Unix priorities and the nice interface provide insufficient support for users to accurately control CPU shares of different threads or applications. Many have studied scheduling algorithms that achieve proportional fairness. Assuming that each thread has a weight that expresses its desired CPU share, informally, a scheduler is proportionally fair if (1) it is work-conserving, and (2) it allocates CPU time to threads in exact proportion to their weights in any time interval. Ideal proportional fairness is impractical since it requires that all runnable threads be running simultaneously and scheduled with infinitesimally small quanta. In practice, every proportional-share scheduling algorithm approximates the ideal algorithm with the goal of achieving a constant error bound. 
For more theoretical background, please refer to the following papers: I don't think that achieving a constant error bound is always a good thing. We all know that fairness has overhead. If I have 3 threads and 2 processors, and I have a choice between fairly giving each thread 1.0 billion cycles during the next second, or unfairly giving two of them 1.1 billion cycles and giving the other 0.9 billion cycles, then we can have a useful discussion about where we want to draw the line on the fairness/performance tradeoff. On the other hand, if we can give two of them 1.1 billion cycles and still give the other one 1.0 billion cycles, it's madness to waste those 0.2 billion cycles just to avoid user jealousy. The more complex the memory topology of a system, the more "free" cycles you'll get by tolerating short-term unfairness. As a crude heuristic, scaling some fairly low tolerance by log2(NCPUS) seems appropriate, but eventually we should take the boot-time computed migration costs into consideration. [1] A.
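As a rough illustration of the "scale a low tolerance by log2(NCPUS)" heuristic from Chris's reply, here is a minimal sketch. The function names and the choice of units are hypothetical; nothing here is taken from the dwrr patch or from CFS.

```c
#include <assert.h>

/* Integer floor(log2(n)) for n >= 1; returns 0 for n <= 1. */
static unsigned int toy_ilog2(unsigned int n)
{
	unsigned int bits = 0;

	while (n > 1) {
		n >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Hypothetical heuristic: the imbalance we tolerate before forcing
 * fairness grows with log2(NCPUS).  'base' is in whatever unit the
 * scheduler accounts in (cycles, nanoseconds, ...).
 */
static unsigned long toy_unfairness_tolerance(unsigned long base,
					      unsigned int ncpus)
{
	return base * toy_ilog2(ncpus);
}

static void toy_tolerance_demo(void)
{
	assert(toy_unfairness_tolerance(100, 1) == 0);	/* UP: no slack */
	assert(toy_unfairness_tolerance(100, 2) == 100);
	assert(toy_unfairness_tolerance(100, 8) == 300);
}
```

Note that this simple form gives a uniprocessor zero tolerance, and it ignores the boot-time migration costs that the mail says should eventually be taken into account.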
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Fri, 27 Jul 2007 15:06:14 -0700, Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> how do you know there will be other activity? You start the IO and that
> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
> submitted in that time you add latency. You cannot predict that IO
> happening or not happening.

If there hasn't been much IO for some time, it looks quite reasonable to expect that there won't be more in the near future. Like most heuristics, it can fail, but then this is a feature mostly for desktops, not servers.

There's an old saying that goes something like "an open source project starts dying when new people can't participate in the project no matter how hard they try". It's hard to understand why there are so many people opposed to this when other, more controversial features are merged much faster (like, e.g., the UIO driver framework).
[PATCH] lzo: Add some missing casts
Add some casts to the LZO compression algorithm after they were removed during cleanup and shouldn't have been.

Signed-off-by: Richard Purdie <[EMAIL PROTECTED]>
---
This fixes the reported problems for me, I've checked fairly carefully and I can't see any other issues. Edward, could you see if this resolves the problems in your case please?

Index: linux-2.6.22/lib/lzo/lzo1x_compress.c
===================================================================
--- linux-2.6.22.orig/lib/lzo/lzo1x_compress.c
+++ linux-2.6.22/lib/lzo/lzo1x_compress.c
@@ -32,13 +32,13 @@ _lzo1x_1_do_compress(const unsigned char
 	ip += 4;
 	for (;;) {
-		dindex = ((0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
+		dindex = ((size_t)(0x21 * DX3(ip, 5, 5, 6)) >> 5) & D_MASK;
 		m_pos = dict[dindex];
 		if (m_pos < in)
 			goto literal;
-		if (ip == m_pos || (ip - m_pos) > M4_MAX_OFFSET)
+		if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
 			goto literal;
 		m_off = ip - m_pos;
@@ -51,7 +51,7 @@ _lzo1x_1_do_compress(const unsigned char
 		if (m_pos < in)
 			goto literal;
-		if (ip == m_pos || (ip - m_pos) > M4_MAX_OFFSET)
+		if (ip == m_pos || ((size_t)(ip - m_pos) > M4_MAX_OFFSET))
 			goto literal;
 		m_off = ip - m_pos;
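The patch does not say why the casts matter, so the following is only one plausible reading of the offset check. A pointer difference is signed: if m_pos is ahead of ip, the signed comparison sees a negative number and never reports "too far", while the (size_t) cast wraps it to a very large value that does. TOY_MAX_OFFSET is a stand-in for M4_MAX_OFFSET with an assumed value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_OFFSET 0xbfff	/* stand-in for M4_MAX_OFFSET; value assumed */

/* Signed comparison: a negative distance is never "too far". */
static int toy_too_far_signed(const unsigned char *ip,
			      const unsigned char *m_pos)
{
	return (ip - m_pos) > TOY_MAX_OFFSET;
}

/* Unsigned comparison: a negative distance wraps to a huge value. */
static int toy_too_far_unsigned(const unsigned char *ip,
				const unsigned char *m_pos)
{
	return (size_t)(ip - m_pos) > TOY_MAX_OFFSET;
}

static void toy_cast_demo(void)
{
	unsigned char buf[64] = { 0 };

	/* m_pos ahead of ip: only the cast version rejects the match. */
	assert(toy_too_far_signed(buf + 10, buf + 20) == 0);
	assert(toy_too_far_unsigned(buf + 10, buf + 20) == 1);

	/* m_pos behind ip by a small amount: both accept it. */
	assert(toy_too_far_signed(buf + 20, buf + 10) == 0);
	assert(toy_too_far_unsigned(buf + 20, buf + 10) == 0);
}
```

If stale dictionary entries can point past ip, the uncast check would let such a match through, which would be consistent with the problems the patch says it fixes.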
Re: serial flow control appears broken
On Fri, 2007-07-27 at 13:48 -0700, Lee Howard wrote:
> Here's the output:
>
> type: 4
> line: 1
> line: 760
> irq: 3
> flags: 1358954688
> xmit_fifo_size: 16
> custom_divisor: 0
> baud_base: 115200

OK, the FIFO should be enabled.

What is known:
* The error is a hardware FIFO overrun.
  - observed message is in n_tty due to driver setting TTY_OVERRUN
* The RTS/CTS flow control is not involved
  - this is done only by the ldisc in response to buffer levels
  - you verified crtscts is set
  - you did not observe RTS change when 'overflow error' logged
  - you did observe RTS change when application stopped reading

So this seems to be a latency issue reading the receive FIFO in the ISR. The current rx FIFO trigger level should be 8 bytes (UART_FCR_R_TRIG_10), which gives the ISR 694usec to get the data at 115200bps. IIRC, in 2.2.X kernels this defaulted to 4 bytes (TRIG_01), which gave a little more time to service the interrupt.

How does the data rate affect the frequency of the overrun errors? Does 57600bps make them go away?

--
Paul
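The 694usec figure can be reproduced with a little arithmetic. The sketch below assumes a 16-byte receive FIFO (matching xmit_fifo_size above), 10 bits on the wire per byte (8N1 framing), and it ignores the extra byte that the receiver shift register can hold. The function name is made up for this example.

```c
#include <assert.h>

/*
 * Time (in microseconds) the ISR has between the trigger-level
 * interrupt and the FIFO filling up.  Assumes 10 bits on the wire per
 * byte (8N1) and ignores the receiver shift register.
 */
static unsigned long toy_fifo_headroom_usec(unsigned int fifo_size,
					    unsigned int trigger,
					    unsigned long baud)
{
	unsigned long bits = (unsigned long)(fifo_size - trigger) * 10;

	return bits * 1000000UL / baud;
}

static void toy_fifo_demo(void)
{
	/* 8 free bytes at 115200bps: the 694usec quoted in the mail. */
	assert(toy_fifo_headroom_usec(16, 8, 115200) == 694);

	/* A 4-byte trigger leaves 12 free bytes and about 1ms. */
	assert(toy_fifo_headroom_usec(16, 4, 115200) == 1041);

	/* Halving the data rate doubles the time available. */
	assert(toy_fifo_headroom_usec(16, 8, 57600) == 1388);
}
```

Under these assumptions, dropping to 57600bps or lowering the trigger level to 4 bytes both buy the ISR noticeably more time, which is why the answers to Paul's two questions would help confirm or rule out interrupt latency as the cause.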
Re: [RFC/RFT 1/5] Input: implement proper locking in input core
Hi,

Not real feedback, just some nitpicks.

On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> +static int input_defuzz_abs_event(int value, int old_val, int fuzz)
> +{
> +	if (fuzz) {
> +		if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
> +			return value;
>
> -	add_input_randomness(type, code, value);
> +		if (value > old_val - fuzz && value < old_val + fuzz)
> +			return (old_val * 3 + value) / 4;
>
> -	switch (type) {
> +		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
> +			return (old_val + value) / 2;
> +	}

Shouldn't the return values of the second and third case be reversed? In the 2nd check the new value is weighted at 1/4, while in the 3rd case it counts for 1/2, which breaks the "account new value more when it is closer to the old one" logic that I thought I saw here. So to sum up, should the second return be "return (old_val + value * 3) / 4"?

> +/*
> + * Generate software autorepeat event. Note that we take
> + * dev->event_lock here to avoid racing with input_event
> + * which may cause keys get "stuck".
> + */

Hurray. :-)

> -	if (code > SW_MAX || !test_bit(code, dev->swbit) ||
> !!test_bit(code, dev->sw) == value)
> -		return;
> +		if (dev->rep[REP_PERIOD])
> +			mod_timer(&dev->timer, jiffies +
> +				msecs_to_jiffies(dev->rep[REP_PERIOD]));
> +	}

Perhaps use a local var for the "msecs_to_jiffies(dev->rep[REP_PERIOD])" part.

> +static void input_start_autorepeat(struct input_dev *dev, int code)
> +{
> +	if (test_bit(EV_REP, dev->evbit) &&
> +	    dev->rep[REP_PERIOD] && dev->rep[REP_DELAY] &&
> +	    dev->timer.data) {
> +		dev->repeat_key = code;
> +		mod_timer(&dev->timer,
> +			  jiffies + msecs_to_jiffies(dev->rep[REP_DELAY]));
> +	}
> +}

Same here.

> +	case EV_KEY:
> +		if (is_event_supported(code, dev->keybit, KEY_MAX) &&
> +		    !!test_bit(code, dev->key) != value) {

A bit confusing, test_bit() only returns 0 or 1 anyway, doesn't it? So "test_bit(code, dev->key) != value" should be all right. I noticed that the old code did it too, but still.
> -	case EV_MSC:
> +	case EV_SW:
> +		if (is_event_supported(code, dev->swbit, SW_MAX) &&
> +		    !!test_bit(code, dev->sw) != value) {

Same.

> -		break;
> +	case EV_LED:
> +		if (is_event_supported(code, dev->ledbit, LED_MAX) &&
> +		    !!test_bit(code, dev->led) != value) {

And here.

> +void input_inject_event(struct input_handle *handle,
> +			unsigned int type, unsigned int code, int value)
> {
> -	struct input_dev *dev = (void *) data;
> +	struct input_dev *dev = handle->dev;
> +	struct input_handle *grab;
>
> -	if (!test_bit(dev->repeat_key, dev->key))
> -		return;
> +	if (is_event_supported(type, dev->evbit, EV_MAX)) {
> +		spin_lock_irq(&dev->event_lock);
>
> -	input_event(dev, EV_KEY, dev->repeat_key, 2);
> -	input_sync(dev);
> +		grab = rcu_dereference(dev->grab);
> +		if (!grab || grab == handle)
> +			input_handle_event(dev, type, code, value);

'handle' can't be NULL, so can drop the "!grab" check, as checking "grab == handle" should be sufficient.

> +/**
> + * input_open_device - open input device
> + * @handle: handle through which device is being accessed
> + *
> + * This function should be called by input handlers when they
> + * want to start receive events from given input device.
> + */
> int input_open_device(struct input_handle *handle)
> {
> 	struct input_dev *dev = handle->dev;
> -	int err;
> +	int retval;
>
> -	err = mutex_lock_interruptible(&dev->mutex);
> -	if (err)
> -		return err;
> +	retval = mutex_lock_interruptible(&dev->mutex);
> +	if (retval)
> +		return retval;
> +
> +	if (dev->going_away) {
> +		retval = -ENODEV;
> +		goto out;
> +	}
>
> 	handle->open++;
>
> 	if (!dev->users++ && dev->open)

Ugh, not your code, and perhaps it's me, but that looks weird. The ++ hidden in the if check is ugly, and would mean that "users" can be negative, which is strange.

> -		err = dev->open(dev);
> +		retval = dev->open(dev);
>
> -	if (err)
> -		handle->open--;
> +	if (retval && !--handle->open) {

Eek! That -- is hidden well there.
Would it hurt to call synchronize_sched() unconditionally? Something like: if (retval) { handle->open--; It's a rare case anyway. > + /* > + * Make sure we are not delivering any more events > + * through this handle > + */ > + synchronize_
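To put numbers on the defuzz question raised at the top of this review, here is a userspace copy of the hunk as it is quoted there. The final fall-through return is not visible in the quote and is an assumption, as is the toy_ prefix on the name.

```c
#include <assert.h>

/* Userspace copy of the quoted hunk; the last return is an assumption. */
static int toy_defuzz(int value, int old_val, int fuzz)
{
	if (fuzz) {
		if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
			return value;

		/* within +-fuzz: the new value weighs 1/4 */
		if (value > old_val - fuzz && value < old_val + fuzz)
			return (old_val * 3 + value) / 4;

		/* within +-2*fuzz: the new value weighs 1/2 */
		if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
			return (old_val + value) / 2;
	}
	return value;
}

static void toy_defuzz_demo(void)
{
	/* old_val = 100, fuzz = 8 */
	assert(toy_defuzz(102, 100, 8) == 102);	/* inside fuzz/2 */
	assert(toy_defuzz(106, 100, 8) == 101);	/* (300 + 106) / 4 */
	assert(toy_defuzz(112, 100, 8) == 106);	/* (100 + 112) / 2 */
	assert(toy_defuzz(120, 100, 8) == 120);	/* outside 2*fuzz */
}
```

With old_val = 100 and fuzz = 8, a reading of 106 is pulled back to 101 while a reading of 112 only moves back to 106, so the filter damps small deviations more strongly than large ones. Whether that is the intended noise-filter behavior or the reversed weighting the reviewer suspects is exactly the question for the patch author.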
Re: UML compile error
On Sat, 28 Jul 2007 00:46:57 +0200 Gabriel C <[EMAIL PROTECTED]> wrote:

> UML does not compile on current git head.
>
>
> $ make defconfig ARCH=um
> [..]
> $ make ARCH=um
> scripts/kconfig/conf -s arch/um/Kconfig
> net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol
> 'BT_HIDP' refers to undefined symbol 'HID'
> drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol
> 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
> SYMLINK arch/um/include/kern_constants.h
> CHK arch/um/include/uml-config.h
> UPD arch/um/include/uml-config.h
> CC arch/um/sys-i386/user-offsets.s
> CHK arch/um/include/user_constants.h
> CHK include/linux/version.h
> CHK include/linux/utsrelease.h
> CC arch/um/kernel/asm-offsets.s
> In file included from include/linux/sched.h:54,
> from arch/um/include/sysdep/kernel-offsets.h:2,
> from arch/um/kernel/asm-offsets.c:1:
> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined

I suspect your build setup broke. Try `make mrproper' then have another go.
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
> Any faults in that reasoning?

GNU sort uses a merge sort with temporary files on disk. Not sure how much it keeps in memory during that, but it's probably less than 150MB. At some point the dirty limit should kick in and write back the data of the temporary files, so it's not quite the same as anonymous memory. But it's not that different given.

It would be better to measure than to guess. At least Andrew's measurements on 128MB actually didn't show updatedb being really that big a problem. Perhaps some people have many more files or simply a less efficient updatedb implementation? I guess the people who complain here that loudly really need to supply some real numbers.

-Andi
Re: Problems with framebuffer in 2.6.22-git17
On Fri, 2007-07-27 at 23:25 +0100, Adrian McMenamin wrote:
> On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > > odd looking boot logos on the screen):
> > > >
> > > Tested this further and it fails on:
> > >
> > > rev = fb_readl(par->mmio_base + 0x04);
> >
> > Doubtful if this line is the point of failure, this line is executed
> > only once, on initialization.
> >
> par->mmio_base is corrupted in some way during the call to
> register_framebuffer - still investigating how/why.

Possible, par->mmio_base is the last field in struct pvr2fb_par, after that is the pseudo_palette. The oops did not manifest when the pseudo_palette was written as u16, but oops'ed when written as u32. Memory alignment problems?

Try the patch I posted before, might help.

Tony
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On 2007.07.27 20:16:32 +0200, Rene Herman wrote: > On 07/27/2007 07:45 PM, Daniel Hazelton wrote: > >> Updatedb or another process that uses the FS heavily runs on a users >> 256MB P3-800 (when it is idle) and the VFS caches grow, causing memory >> pressure that causes other applications to be swapped to disk. In the >> morning the user has to wait for the system to swap those applications >> back in. >> Questions about it: >> Q) Does swap-prefetch help with this? A) [From all reports I've seen (*)] >> Yes, it does. > > No it does not. If updatedb filled memory to the point of causing swapping > (which noone is reproducing anyway) it HAS FILLED MEMORY and swap-prefetch > hasn't any memory to prefetch into -- updatedb itself doesn't use any > significant memory. > > Here's swap-prefetch's author saying the same: > > http://lkml.org/lkml/2007/2/9/112 > > | It can't help the updatedb scenario. Updatedb leaves the ram full and > | swap prefetch wants to cost as little as possible so it will never > | move anything out of ram in preference for the pages it wants to swap > | back in. > > Now please finally either understand this, or tell us how we're wrong. Con might have been wrong there for boxes with really little memory. My desktop box has not even 300k inodes in use (IIRC someone posted a df -i output showing 1 million inodes in use). Still, the memory footprint of the "sort" process grows up to about 50MB. Assuming that the average filename length stays, that would mean 150MB for the 1 million inode case, just for the "sort" process. Now, sort cannot produce any output before its got all its input, so that RSS usage exists at least as long as the VFS cache is growing due to the ongoing search for files. And then, all that memory that "sort" uses is required, because sort needs to output its results. So if there's memory pressure, the VFS cache is likely to be dropped, because "sort" needs its data, for sorting and producing output. 
And then sort terminates and leaves that whole lot of memory _unused_. The other actions of updatedb only touch the locate db, which is just a few megs (4.5MB here) big so the cache won't grow that much again. OK, so we got about, say, at least 128MB of totally unused memory, maybe even more. If you look at the vmstat output I sent, you see that I had between 90MB and 128MB free, depending on the swappiness setting, with increased inode usage, that could very well scale up. Conclusion: updatedb does _not_ leave the RAM full. And for a box with little memory (say 256MB) it might even be 50% or more memory that is free after updatedb ran. Might that make swap prefetch kick in? Any faults in that reasoning? Thanks, Björn
Re: [PATCH][sas] Fix potential NULL pointer dereference bug in sas_smp_get_phy_events()
On 28/07/07, James Bottomley <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 23:27 +0200, Jesper Juhl wrote:
> > In sas_smp_get_phy_events() we never test if the call to
> > alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run
> > the risk of dereferencing a NULL pointer if it does fail. Far
> > better to test if we got NULL back and in that case return -ENOMEM
> > just as we already do for the other memory allocation in that
> > function.
> > This patch reworks the memory allocation a bit to deal with it
> > (compile tested only).
> >
> >
> > Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> > ---
> >
> > drivers/scsi/libsas/sas_expander.c | 11 +--
> > 1 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/scsi/libsas/sas_expander.c
> > b/drivers/scsi/libsas/sas_expander.c
> > index b500f0c..85f5145 100644
> > --- a/drivers/scsi/libsas/sas_expander.c
> > +++ b/drivers/scsi/libsas/sas_expander.c
> > @@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct
> > asd_sas_port *port,
> > int sas_smp_get_phy_events(struct sas_phy *phy)
> > {
> >     int res;
> > +   u8 *req;
> > +   u8 *resp;
> >     struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
> >     struct domain_device *dev = sas_find_dev_by_rphy(rphy);
> > -   u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
> > -   u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
> >
> > +   resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
>
> Actually, this should be alloc_smp_resp(RPEL_RESP_SIZE);
>
> >     if (!resp)
> >         return -ENOMEM;
> >
> > +   req = alloc_smp_req(RPEL_REQ_SIZE);
> > +   if (!req) {
> > +       res = -ENOMEM;
> > +       goto out;
> > +   }
>
> Just for the sake of being the same as all the rest of the code, the
> sequence should be
>
>     req = alloc_smp_req(xxx_REQ_SIZE);
>     if (!req)
>         return -ENOMEM;
>
>     resp = alloc_smp_resp(xxx_RESP_SIZE);
>     if (!resp) {
>         kfree(req);
>         return -ENOMEM;
>     }
>
> (allocate request then response).
>

Fair enough. It makes the code a bit larger though :

My way, as per the original patch:

   text    data     bss     dec     hex filename
  13820       0       8   13828    3604 drivers/scsi/libsas/sas_expander.o

Your way, as per this patch:

   text    data     bss     dec     hex filename
  13832       0       8   13840    3610 drivers/scsi/libsas/sas_expander.o

I hope this patch is acceptable :


In sas_smp_get_phy_events() we never test if the call to
alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run
the risk of dereferencing a NULL pointer if it does fail. Far
better to test if we got NULL back and in that case return -ENOMEM
just as we already do for the other memory allocation in that
function.
This patch should take care of it (compile tested only).


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 drivers/scsi/libsas/sas_expander.c | 13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index b500f0c..e98d2b9 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port,
 int sas_smp_get_phy_events(struct sas_phy *phy)
 {
     int res;
+    u8 *req;
+    u8 *resp;
     struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
     struct domain_device *dev = sas_find_dev_by_rphy(rphy);
-    u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
-    u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);

-    if (!resp)
+    req = alloc_smp_req(RPEL_REQ_SIZE);
+    if (!req)
         return -ENOMEM;

+    resp = alloc_smp_resp(RPEL_RESP_SIZE);
+    if (!resp) {
+        kfree(req);
+        return -ENOMEM;
+    }
+
     req[1] = SMP_REPORT_PHY_ERR_LOG;
     req[9] = phy->number;


> It looks like disc_resp could use a little love too (it's using the req
> alloc routines).
>

I'll take a look at that later.
-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
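The "allocate request, then response, and free the request if the response fails" sequence from this thread can be tried outside the kernel. The following is a user-space sketch only: `calloc`/`free` stand in for the kernel allocators, and every name in it (`alloc_fn`, `zalloc_ok`, `zalloc_fail`, `release`, `build_request`, `live_buffers`) is hypothetical and exists purely so that an allocation failure can be simulated.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so that failures can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/* Number of buffers currently held through the helpers below. */
int live_buffers;

/* Zeroing allocator that succeeds whenever calloc() does. */
void *zalloc_ok(size_t size)
{
    void *p = calloc(1, size);

    if (p)
        live_buffers++;
    return p;
}

/* Allocator that always fails, to exercise the error paths. */
void *zalloc_fail(size_t size)
{
    (void)size;
    return NULL;
}

void release(void *p)
{
    if (p) {
        live_buffers--;
        free(p);
    }
}

/*
 * Allocate the request first, then the response. If the second
 * allocation fails, the first one is released before returning, so
 * no error path leaks memory or touches a NULL pointer.
 * req_size must be at least 2 because req[1] is written.
 */
int build_request(alloc_fn alloc_req, alloc_fn alloc_resp,
                  size_t req_size, size_t resp_size)
{
    unsigned char *req;
    unsigned char *resp;

    req = alloc_req(req_size);
    if (!req)
        return -ENOMEM;

    resp = alloc_resp(resp_size);
    if (!resp) {
        release(req);
        return -ENOMEM;
    }

    req[1] = 0x11; /* arbitrary placeholder for a function code */

    release(resp);
    release(req);
    return 0;
}
```

Calling `build_request(zalloc_ok, zalloc_fail, 16, 32)` returns `-ENOMEM` and leaves `live_buffers` at 0, which is the property the reordered patch is after.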
Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU
On Sat, Jul 28, 2007 at 12:47:37AM +0200, Stefan Richter wrote:
> Adrian Bunk wrote:
> > The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive,
>
> It's not entirely unintuitive. That option's full name is "Support for
> suspend on SMP and hot-pluggable CPUs".
>
> Only the place where you find the option is unintuitive, as far as its
> first application is concerned. (It lives in the "Processor type and
> features" menu which is OK for the 2nd application of this option.) And
> the variable name of that option is unintuitive because it covers only
> the 2nd application of the option, I suppose for historical reasons.

We can figure out ourselves when HOTPLUG_CPU is required, so there's no
reason to bother the user with it.

> > +config SUSPEND_SMP_POSSIBLE
> > +   bool
> > +   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> > +   depends on SMP
> > +   default y
> > +
> > +config SUSPEND_SMP
> > +   bool
> > +   depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
> > +   select HOTPLUG_CPU
> > +   default y
>
> Yes, that's the price to pay if you want to select something that in
> turn depends on a number of other things.

Yes, but a good user interface is worth it.

> Wait, doesn't HOTPLUG_CPU also depend on EXPERIMENTAL?

Damn, I started thinking about it, and then forgot about it when
finishing the patch.

My thoughts were:
Is HOTPLUG_CPU still an experimental feature, or has it become a
well-tested no longer experimental feature now that it's used on most
recent laptops?

> Stefan Richter

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: Kernel modules compilation
On 28/07/07, shacky <[EMAIL PROTECTED]> wrote:
> > Symbol: USB [=y]
> > Prompt: Support for Host-side USB
> > Defined at drivers/usb/Kconfig:51
> > Depends on: USB_SUPPORT && USB_ARCH_HAS_HCD
> > Location:
> > -> Device Drivers
> > -> USB support (USB_SUPPORT [=y])
>
> Could you tell me how you found them, please?
>

Some of them I just knew from past experience where to find, some of
them are logical (like, obviously 'reiserfs' is found in the Filesystems
submenu), some I searched for using "/" and a few I googled.

> > Hint: In menuconfig, type "/" to search.
>
> Thank you very much!
>

You're welcome.
By the way; if you had taken the time to read the text at the top of
the menuconfig interface you'd have known this already
 - "... Press <Esc><Esc> to exit, <?> for Help, </> for Search."

> > Not really. Your distribution could be loading a ton of modules that
> > you don't really need. 'lsmod' will just show you what is currently
> > loaded, but that that doesn't necessarily mean that all those modules
> > are really needed.
>
> Ok, how I can know what modules are needed and what not? Only knowing
> the hardware of my system?
>

If you know the hardware of the system, the filesystems you use etc
etc, then it should be possible to deduce what modules you need...
Read the help text for each config option related to your modules and
think about whether or not you need it...

> Another question please, what the symbol "---" near a kernel
> configuration entry in menuconfig means? This entry is activated (with
> * or M) or not?
>

It means that the option was automagically selected by some other
option you selected, so you can't disable it unless you first disable
that other option that selected it.

-- 
Jesper Juhl <[EMAIL PROTECTED]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
Re: [PATCH][sas] Fix potential NULL pointer dereference bug in sas_smp_get_phy_events()
On Fri, 2007-07-27 at 23:27 +0200, Jesper Juhl wrote:
> In sas_smp_get_phy_events() we never test if the call to
> alloc_smp_req(RPEL_REQ_SIZE) succeeds or fails. That means we run
> the risk of dereferencing a NULL pointer if it does fail. Far
> better to test if we got NULL back and in that case return -ENOMEM
> just as we already do for the other memory allocation in that
> function.
> This patch reworks the memory allocation a bit to deal with it
> (compile tested only).
>
>
> Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
> ---
>
> drivers/scsi/libsas/sas_expander.c | 11 +--
> 1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c
> b/drivers/scsi/libsas/sas_expander.c
> index b500f0c..85f5145 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -507,14 +507,21 @@ static int sas_dev_present_in_domain(struct
> asd_sas_port *port,
> int sas_smp_get_phy_events(struct sas_phy *phy)
> {
>     int res;
> +   u8 *req;
> +   u8 *resp;
>     struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent);
>     struct domain_device *dev = sas_find_dev_by_rphy(rphy);
> -   u8 *req = alloc_smp_req(RPEL_REQ_SIZE);
> -   u8 *resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);
>
> +   resp = kzalloc(RPEL_RESP_SIZE, GFP_KERNEL);

Actually, this should be alloc_smp_resp(RPEL_RESP_SIZE);

>     if (!resp)
>         return -ENOMEM;
>
> +   req = alloc_smp_req(RPEL_REQ_SIZE);
> +   if (!req) {
> +       res = -ENOMEM;
> +       goto out;
> +   }

Just for the sake of being the same as all the rest of the code, the
sequence should be

    req = alloc_smp_req(xxx_REQ_SIZE);
    if (!req)
        return -ENOMEM;

    resp = alloc_smp_resp(xxx_RESP_SIZE);
    if (!resp) {
        kfree(req);
        return -ENOMEM;
    }

(allocate request then response).

It looks like disc_resp could use a little love too (it's using the req
alloc routines).

James
Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU
On Sat, 28 Jul 2007, Adrian Bunk wrote:
>
> The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so
> what about something like the patch below?

Yeah, this looks reasonable. May I suggest another level of indirection,
though:

> +config SUSPEND_SMP_POSSIBLE
> +   bool
> +   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> +   depends on SMP
> +   default y

How about making this a bit more split up, and do it as

    # SMP suspend is possible on ..
    config SUSPEND_SMP_POSSIBLE
        bool
        depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
        default y

    # UP suspend is possible on ..
    config SUSPEND_UP_POSSIBLE
        bool
        depends on X86 || PPC64_SWSUSP || FRV || PPC32
        default y

    # Can we suspend?
    config SUSPEND_POSSIBLE
        bool
        depends on (SMP && SUSPEND_SMP_POSSIBLE) || (SUSPEND_UP_POSSIBLE && !SMP)
        default y

and then we have just a

    config SOFTWARE_SUSPEND
        bool "Software Suspend (Hibernation)"
        depends on PM && SWAP
        depends on SUSPEND_POSSIBLE

    config SUSPEND_SMP
        bool
        depends on SOFTWARE_SUSPEND && SMP
        select HOTPLUG_CPU
        default y

and now each of the config options looks pretty simple and describe one
thing.

[ For extra bonus points: the SUSPEND_POSSIBLE thing is still pretty
  complicated, and it might actually be a better idea to make it a
  per-arch config option, and just make the x86/arch say

    config SUSPEND_POSSIBLE
        bool
        depends on !(X86_VOYAGER && SMP)
        default y

  instead: since SUSPEND_POSSIBLE is always true on x86 regardless of SMP
  or not, just not on X86_VOYAGER. Then, each architecture can have its
  own private rules for whether that architecture has SUSPEND_POSSIBLE or
  not, so on ppc, it might look like

    config SUSPEND_POSSIBLE
        bool
        depends on (PPC64 && (PPC_PSERIES || PPC_PMAC)) || PPC_SWSUSP
        default y

  or something, but the point is, now the complexity is a per-architecture
  thing, so other architectures simply don't have to care any more! ]

And the user only ever sees one single question: the one for
"SOFTWARE_SUSPEND". All the others would directly flow either from the
architecture choice, or from that.

Anybody willing to rewrite it that way?

        Linus
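The three helper symbols proposed in this message are plain boolean expressions, so their combined effect can be checked with a small truth-function model. This is a sketch in C rather than Kconfig, written only to make the logic testable; the struct `arch_config` and the three functions are illustrative names, not kernel code, and each field simply mirrors the like-named Kconfig symbol.

```c
#include <stdbool.h>

/* One field per Kconfig symbol that appears in the proposed rules. */
struct arch_config {
    bool smp;
    bool x86;
    bool x86_voyager;
    bool ppc64;
    bool ppc_pseries;
    bool ppc_pmac;
    bool ppc64_swsusp;
    bool frv;
    bool ppc32;
};

/* Models "config SUSPEND_SMP_POSSIBLE". */
bool suspend_smp_possible(struct arch_config c)
{
    return (c.x86 && !c.x86_voyager) ||
           (c.ppc64 && (c.ppc_pseries || c.ppc_pmac));
}

/* Models "config SUSPEND_UP_POSSIBLE". */
bool suspend_up_possible(struct arch_config c)
{
    return c.x86 || c.ppc64_swsusp || c.frv || c.ppc32;
}

/* Models "config SUSPEND_POSSIBLE". */
bool suspend_possible(struct arch_config c)
{
    return (c.smp && suspend_smp_possible(c)) ||
           (suspend_up_possible(c) && !c.smp);
}
```

For x86 this model agrees with the per-arch shorthand `!(X86_VOYAGER && SMP)` given in the bracketed note: the only x86 combination for which `suspend_possible()` returns false is Voyager with SMP.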
Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
On Friday 27 July 2007 18:08:44 Mike Galbraith wrote:
> On Fri, 2007-07-27 at 13:45 -0400, Daniel Hazelton wrote:
> > On Friday 27 July 2007 06:25:18 Mike Galbraith wrote:
> > > On Fri, 2007-07-27 at 03:00 -0700, Andrew Morton wrote:
> > > > So hrm. Are we sure that updatedb is the problem? There are quite a
> > > > few heavyweight things which happen in the wee small hours.
> > >
> > > The balance in _my_ world seems just fine. I don't let any of those
> > > system maintenance things run while I'm using the system, and it
> > > doesn't bother me if my working set has to be reconstructed after
> > > heavy-weight maintenance things are allowed to run. I'm not seeing
> > > anything I wouldn't expect to see when running a job the size of
> > > updatedb.
> > >
> > > -Mike
> >
> > Do you realize you've totally missed the point?
>
> Did you notice that I didn't make one disparaging remark about the patch
> or the concept behind it? Did you notice that I took _my time_ to
> test, to actually look at the problem? No, you're too busy running
> your mouth to appreciate the efforts of others.

If you're done being an ass, take note of the fact that I never even said
you were doing that. What I was commenting on was the fact that you (and a
lot of the other developers) seem to keep saying "It doesn't happen here,
so it doesn't matter!" - ie: If I don't see something happening, it
doesn't matter.

>
>
> Do yourself a favor, go dig into the VM source. Read it, understand it
> (not terribly easy), _then_ come back and preach to me.

I've been trying to do that since the thread started. Note that you snipped
where I said (and I'm going to paraphrase myself) "There is another way to
fix this, but I don't have the understanding necessary".

Now, once more, I'm going to ask: What is so terribly wrong with swap
prefetch? Why does it seem that everyone against it says "It's treating a
symptom, so it can't go in"?

Try coming up with an answer that isn't "I don't see the problem on my $10K
system" or similar - try explaining it based on the *technical* merits. Does
it cause the processor cache to get thrashed? Does it create locking
problems?

I stand by my statements, as vitriolic as you and Rene seem to want to get
over it. So far in this thread I have not seen one bit of *technical*
discussion over the merits, just the bits I've simplified and stated before.

> Have a nice day.

I am. You being nasty when somebody gets fed up with a line of BS doesn't
stop me from having a nice day. Only thing that could make my life any
better would be to have the questions I've asked answered, rather than
having supposedly intelligent people act like trolls.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.
Re: Kernel modules compilation
> Symbol: USB [=y]
> Prompt: Support for Host-side USB
> Defined at drivers/usb/Kconfig:51
> Depends on: USB_SUPPORT && USB_ARCH_HAS_HCD
> Location:
> -> Device Drivers
> -> USB support (USB_SUPPORT [=y])

Could you tell me how you found them, please?

> Hint: In menuconfig, type "/" to search.

Thank you very much!

> Not really. Your distribution could be loading a ton of modules that
> you don't really need. 'lsmod' will just show you what is currently
> loaded, but that that doesn't necessarily mean that all those modules
> are really needed.

Ok, how I can know what modules are needed and what not? Only knowing
the hardware of my system?

Another question please, what the symbol "---" near a kernel
configuration entry in menuconfig means? This entry is activated (with
* or M) or not?
UML compile error
Hi,

UML does not compile on current git head.

$ make defconfig ARCH=um
[..]
$ make ARCH=um
scripts/kconfig/conf -s arch/um/Kconfig
net/bluetooth/hidp/Kconfig:4:warning: 'select' used by config symbol 'BT_HIDP' refers to undefined symbol 'HID'
drivers/net/wireless/Kconfig:552:warning: 'select' used by config symbol 'RTL8187' refers to undefined symbol 'EEPROM_93CX6'
  SYMLINK arch/um/include/kern_constants.h
  CHK     arch/um/include/uml-config.h
  UPD     arch/um/include/uml-config.h
  CC      arch/um/sys-i386/user-offsets.s
  CHK     arch/um/include/user_constants.h
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CC      arch/um/kernel/asm-offsets.s
In file included from include/linux/sched.h:54,
                 from arch/um/include/sysdep/kernel-offsets.h:2,
                 from arch/um/kernel/asm-offsets.c:1:
include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:22:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:24:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:26:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:28:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:30:7: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:33:3: error: #error You lose.
include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if 
include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: warning: "CONFIG_HZ" is not defined include/linux/jiffies.h:225:31: error: division by zero in #if include/linux/jiffies.h:225:46: warning: "SHIFT_HZ" is not defined In file included from arch/um/include/sysdep/kernel-offsets.h:2, from arch/um/kernel/asm-offsets.c:1: include/linux/sched.h: In function 'dequeue_signal_lock': include/linux/sched.h:1501: error: implicit declaration of function 'local_irq_save' include/linux/sched.h:1503: error: implicit declaration of function 'local_irq_restore' In file included f
Re: [2.6 patch] let SUSPEND select HOTPLUG_CPU
Adrian Bunk wrote:
> The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive,

It's not entirely unintuitive. That option's full name is "Support for
suspend on SMP and hot-pluggable CPUs".

Only the place where you find the option is unintuitive, as far as its
first application is concerned. (It lives in the "Processor type and
features" menu which is OK for the 2nd application of this option.) And
the variable name of that option is unintuitive because it covers only
the 2nd application of the option, I suppose for historical reasons.

> +config SUSPEND_SMP_POSSIBLE
> +   bool
> +   depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
> +   depends on SMP
> +   default y
> +
> +config SUSPEND_SMP
> +   bool
> +   depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
> +   select HOTPLUG_CPU
> +   default y

Yes, that's the price to pay if you want to select something that in
turn depends on a number of other things.

Wait, doesn't HOTPLUG_CPU also depend on EXPERIMENTAL?
-- 
Stefan Richter
-=-=-=== -=== ===--
http://arcgraph.de/sr/
Re: 2.6.23-rc1-mm1 - seems OK on Dell Latitude D820, except for tpm_tis
On Friday 27 July 2007 07:28:09 am [EMAIL PROTECTED] wrote:
> Looks like the problematic code is in tpm_tis.c tpm_tis_init() near here:
>
>     for (i = 3; i < 16 && chip->vendor.irq == 0; i++) {
>         iowrite8(i, chip->vendor.iobase +
>                     TPM_INT_VECTOR(chip->vendor.locality));
>         if (request_irq
>             (i, tis_int_probe, IRQF_SHARED,
>              chip->vendor.miscdev.name, chip) != 0) {
>             dev_info(chip->dev,
>                      "Unable to request irq: %d for probe\n",
>                      i);
>             continue;
>         }
>
> This seems to be misbehaving differently for the two different DEBUG_SHIRQ
> cases.
>
> With DEBUG_SHIRQ=n, it starts at IRQ3, gets to at least 8 (where it complains
> it can't request it for probing), and possibly all the way to 15, without ever
> actually selecting and assigning an IRQ (to refresh memories, in that range
> /proc/interrupts only lists:
>
>   8:          0          0   IO-APIC-edge      rtc
>   9:          3          0   IO-APIC-fasteoi   acpi
>  12:         94          0   IO-APIC-edge      i8042
>  14:     148166          0   IO-APIC-edge      libata
>  15:         94          0   IO-APIC-edge      libata
>
> So there's certainly IRQ's available. No idea why it doesn't choose one. But
> since it never chose one, it never gets into the "wait for the IRQ" protected
> by 'if (chip->vendor.irq)' at the end of tpm_tis_send.
>
> With DEBUG_SHIRQ=y, It starts at IRQ3, and assigns it (which seems a good
> thing).
> Unfortunately, this then hits the timeouts in tpm_tis_send.
>
> Anybody got an idea what *should* be happening here?

I don't know why tpm_tis_init() is messing around trying different
IRQs between 3 and 16. That looks suspiciously x86-dependent.

Maybe if you don't have PNP (though I doubt TPMs exist on any
pre-PNPBIOS machines) the "check-IRQ" loop would be necessary.
But you're using the PNP probe, and PNP should just tell you
what IRQ the device is configured for (and whether the IRQ can
be shared -- see 8250_pnp.c for an example).

The BIOS should have configured the TPM IRQ, and if we go and
mess with that IRQ setting without going through the PNP
interface, e.g., the ACPI _SRS method, we're liable to mess
something up. The TPM is often behind a few bridges, and if the
bridge has any IRQ routing configuration, only the BIOS knows how
to keep that in sync with the TPM IRQ configuration.

> Just for the record, I see this in /sys:
>
> % cat /sys/bus/pnp/devices/00:0e/id
> BCM0102
> PNP0c31

What's in /sys/bus/pnp/devices/00:0e/resources?

Bjorn
Re: [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops
On Fri, 27 Jul 2007 22:52:43 +0200
David Lamparter <[EMAIL PROTECTED]> wrote:

> [PATCH] ide: sis5513.c: Add FSC Amilo A1630 PCI subvendor/dev to laptops
>
> Recognise the FSC Amilo A1630's incarnation of a SiS5513 chip as laptop to
> get UDMA100 support.
>
> Signed-off-by: David Lamparter <[EMAIL PROTECTED]>

Looks good to me - I've made a matching update to drivers/ata/pata_sis.c
RE: [PATCH] ia64: fix a few section mismatch warnings
-   mca_data = alloc_bootmem(sizeof(struct ia64_mca_cpu)
-                            * NR_CPUS + KERNEL_STACK_SIZE);
+   mca_data = mca_bootmem(NR_CPUS + KERNEL_STACK_SIZE);

Oops. You moved the multiply by sizeof(struct ia64_mca_cpu) up into
the mca_bootmem() function to make it very specific to this use.
But multiply has higher precedence than addition, so the original
multiplied only NR_CPUS by the struct size, while the new call
multiplies KERNEL_STACK_SIZE as well.

-Tony
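The precedence point is easy to check with ordinary integers. The sketch below assumes, as the message describes, that the helper multiplies its whole argument by the element size; `size_before` and `size_after` are illustrative names, not functions from the ia64 code.

```c
#include <stddef.h>

/* Size as computed by the original call: (elem_size * count) + extra. */
size_t size_before(size_t elem_size, size_t count, size_t extra)
{
    return elem_size * count + extra;
}

/*
 * Size when the multiplication moves into a helper that is handed
 * "count + extra" as a single argument: elem_size * (count + extra).
 */
size_t size_after(size_t elem_size, size_t count, size_t extra)
{
    return elem_size * (count + extra);
}
```

With an element size of 8, a count of 4 and 16 extra bytes, `size_before` yields 48 while `size_after` yields 160, so the two forms are only equivalent when the extra term is zero or the element size is one.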
Re: Problems with framebuffer in 2.6.22-git17
On 27/07/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-07-27 at 21:18 +0100, Adrian McMenamin wrote:
> > On 27/07/07, Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> >
> > > With the patch reverted and 24bpp, it oopses before freezing (with two
> > > odd looking boot logos on the screen):
> > >
> > Tested this further and it fails on:
> >
> > rev = fb_readl(par->mmio_base + 0x04);
>
> Doubtful if this line is the point of failure, this line is executed
> only once, on initialization.

par->mmio_base is corrupted in some way during the call to
register_framebuffer - still investigating how/why.
[2.6 patch] let SUSPEND select HOTPLUG_CPU
On Thu, Jul 26, 2007 at 01:55:18PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 26 Jul 2007, Rafael J. Wysocki wrote:
> >
> > My point is we have ACPI dependent on PM, so if you want ACPI, you end
> > up with all of the STR stuff built in, which is what you don't like (if I
> > understand that correctly). If we have CONFIG_SUSPEND, you'll be able to
> > choose ACPI alone. :-)
>
> Good point.
>
> Anyway, I think the ACPI problem really is as trivial as the following
> three-liner removal fix. If the user doesn't want suspend, ACPI shouldn't
> force it on him.
>
> A nicer fix might be to also make some of the ACPI helper routines depend
> on whether they are needed or not (which in turn will depend on whether
> suspend support has been compiled into the kernel), but quite frankly,
> that's secondary at least for me.
>
> So if we have a few ACPI routines that will never get called (because we
> don't even enable the interfaces that would *cause* them to be called), I
> don't think that's a huge problem. It's a beauty wart, but nobody really
> cares (and it's even something that we could get the compiler to optimize
> away for us if we really cared).
>
> Linus
>
> ---
> Don't force-enable suspend/hibernate support just for ACPI
>
> It's a totally independent decision for the user whether he wants
> suspend and/or hibernation support, and ACPI shouldn't care.
>
> Signed-off-by: Linus Torvalds <[EMAIL PROTECTED]>
> ---
> drivers/acpi/Kconfig |3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 251344c..22b401b 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -11,9 +11,6 @@ menuconfig ACPI
>     depends on PCI
>     depends on PM
>     select PNP
> -   # for sleep
> -   select HOTPLUG_CPU if X86 && SMP
> -   select SUSPEND_SMP if X86 && SMP
>     default y
>     ---help---
>       Advanced Configuration and Power Interface (ACPI) support for

The dependency of SUSPEND_SMP on HOTPLUG_CPU is quite unintuitive, so
what about something like the patch below?

This should address a main issue behind Len's patch.

cu
Adrian


<-- snip -->


An implementation detail of the suspend code that is not intuitive for
the user is the HOTPLUG_CPU dependency of SOFTWARE_SUSPEND if SMP.

This patch changes SOFTWARE_SUSPEND if SMP to select HOTPLUG_CPU instead
of depending on it.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 kernel/power/Kconfig | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -72,9 +72,22 @@ config PM_TRACE
     CAUTION: this option will cause your machine's real-time clock to be
     set to an invalid time after a resume.

+config SUSPEND_SMP_POSSIBLE
+    bool
+    depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES || PPC_PMAC))
+    depends on SMP
+    default y
+
+config SUSPEND_SMP
+    bool
+    depends on SUSPEND_SMP_POSSIBLE && SOFTWARE_SUSPEND
+    select HOTPLUG_CPU
+    default y
+
 config SOFTWARE_SUSPEND
     bool "Software Suspend (Hibernation)"
-    depends on PM && SWAP && (((X86 || PPC64_SWSUSP) && (!SMP || SUSPEND_SMP)) || ((FRV || PPC32) && !SMP))
+    depends on PM && SWAP
+    depends on ((X86 || PPC64_SWSUSP || FRV || PPC32) && !SMP) || SUSPEND_SMP_POSSIBLE
     ---help---
       Enable the suspend to disk (STD) functionality, which is usually
       called "hibernation" in user interfaces. STD checkpoints the
@@ -132,11 +145,6 @@ config PM_STD_PARTITION
       suspended image to. It will simply pick the first available swap
       device.

-config SUSPEND_SMP
-    bool
-    depends on HOTPLUG_CPU && (X86 || PPC64) && PM
-    default y
-
 config APM_EMULATION
     tristate "Advanced Power Management Emulation"
     depends on PM && SYS_SUPPORTS_APM_EMULATION
Re: [RFC/RFT 0/5] Input locking patches
On Tue, July 24, 2007 06:45, Dmitry Torokhov wrote:
> Hi everyone,
>
> I finally managed to put together some patches implementing
> locking in input core and main input handles. Please look
> over them and give them a spin.

Since kernel 2.6.21 or so I have been annoyed by a warping mouse, and one
kernel version later also by "stuck" keys causing repeated input at the
most inconvenient moments (e.g. when opening a program by pressing F1).

As it happened irregularly and unpredictably it was hard to debug, and I
suspected faulty hardware. My CPU was quite hot, but after removing all
the dust it seems all right again. Unfortunately that was about the same
time I upgraded to 2.6.23-rc1, so all I can say is that the stuck key
problem seems to be gone, though I'm not sure thanks to what, and that
neither the cleaning nor the upgrade fixed the warping mouse problem.

I've been running with these locking patches for two days now and the
mouse doesn't warp any more (they may also have fixed the stuck key
problem, not sure). Normally it would warp several times a day, and that
hasn't happened yet, so I'm tempted to praise your patches.

Sorry for the babbling, just wanted to say that I've tested these patches
and that they seem to fix real problems.

Thanks,

Indan
Re: Edgeport UPS Monitoring Problems
On Fri, 27 Jul 2007 13:37:08 -0700 Nick Pasich <[EMAIL PROTECTED]> wrote:

>
> Greg/Peter/Al,

added linux-usb-devel.

> I've been using the edgeport 4 port USB to Serial Converter
> to monitor APC Smart UPS's via apcupsd for quite awhile on
> various Linux boxes.
>
> I just upgraded to Kernel Version 2.6.22.1 from 2.6.20.6 on a
> couple of systems and both the edgeports stopped communicating.
>
> I tried applying various patches, "PATCH 026/149" and "PATCH 082/149"
> and one by Alan Cox.. but they didn't fix the problem.
>
> I copied the 2.6.20.6 edgeport module sources to the new
> 2.6.22.1 tree and everything works again.
>
>   linux/drivers/usb/serial/io_edgeport.c
>   linux/drivers/usb/serial/io_edgeport.h
>   linux/drivers/usb/serial/io_edgeport.mod.c
>   linux/drivers/usb/serial/io_tables.h
>
>
> I thought you guys ought to be aware of this
>

Straightforward regression, most serious.  Thanks for reporting it.
Problems with reading DVD using 2.6.21.5
Hello,

today I tried to install Slackware 12.0. As the installer just "skipped"
some install steps, I tried to find the error. The problem seems to be
unreadable parts on the DVD:
http://pastebin.com/f381e8a88

But the DVD is OK. I've checked the MD5sum directly from the disc on the
same system using the same DVD drive.

dmesg says:
http://pastebin.com/f63c5c389

The kernel used on the Slackware setup disk is an SMP kernel, but my
hardware doesn't support SMP (I get an error in dmesg). Could this (an SMP
kernel on a non-SMP system) cause such bugs? Is this a known bug? How could
code which breaks DVD access get into stable 2.6.21.5?

Thanks very much in advance

CU
Manuel
Re: swap-prefetch: A smart way to make good use of idle resources (was: updatedb)
On Sat, July 28, 2007 00:06, Arjan van de Ven wrote:
> On Fri, 2007-07-27 at 23:51 +0200, Indan Zupanci
>> > also, they take up seek time (5 to 10 msec), so if you were to read
>> > something else at the time you get additional latency.
>>
>> If there's other disk activity swap prefetch shouldn't do much, so this isn't
>> really true.
>
> how do you know there will be other activity? You start the IO and that
> basically blacks out the disk for 5 to 10 ms. If the "real" IO gets
> submitted in that time you add latency. You cannot predict that IO
> happening or not happening.

Ah, in that way. Yes, you're right about that (though NCQ might help then?),
but that's true for all disk activity. Though I think swap prefetch didn't
want to run when there was CPU activity, so that would reduce the chance
that new IO is submitted right at that moment.

I think in practice this isn't worth worrying about; the real issue is the
extra disk activity in the first place.

Greetings,

Indan
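To put rough numbers on the exchange above, here is a back-of-the-envelope sketch in Python. Only the 5 to 10 ms seek window comes from the thread. The rest is assumed for illustration: real requests are modelled as Poisson arrivals, a request that lands inside the window is taken to arrive at a uniformly random moment within it, and the one-request-per-second rate is an arbitrary example value. Neither function reflects how swap prefetch or the block layer actually behaves.

```python
import math


def expected_extra_latency_ms(seek_ms):
    """Average wait for a request that arrives at a uniformly random moment
    while the disk is busy with a prefetch seek lasting seek_ms."""
    return seek_ms / 2.0


def collision_probability(seek_ms, real_io_per_second):
    """Chance that at least one real request arrives during the seek window,
    assuming Poisson arrivals (an assumption, not a claim from the thread)."""
    window_s = seek_ms / 1000.0
    return 1.0 - math.exp(-real_io_per_second * window_s)


# Example: the 5 ms and 10 ms windows quoted above, with a hypothetical
# rate of one real request per second on an otherwise idle system.
for seek_ms in (5.0, 10.0):
    p = collision_probability(seek_ms, real_io_per_second=1.0)
    wait = expected_extra_latency_ms(seek_ms)
    print(f"seek={seek_ms} ms  P(collision)={p:.4f}  mean wait if hit={wait} ms")
```

Under these assumptions both points in the exchange hold at once: a request that does collide pays a few milliseconds, but on a mostly idle disk the chance of a collision per prefetch is small.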
Re: [RFC] scheduler: improve SMP fairness in CFS
On Fri, Jul 27, 2007 at 12:03:28PM -0700, Tong Li wrote:
> Thanks for the interest. Attached is a design doc I wrote several months
> ago (with small modifications). It talks about the two pieces of my design:
> group scheduling and dwrr. The description was based on the original O(1)
> scheduler, but as my CFS patch showed, the algorithm is applicable to other
> underlying schedulers as well. It's interesting that I started working on
> this in January for the purpose of eventually writing a paper about it. So
> I knew reasonably well the related research work but was totally unaware
> that people in the Linux community were also working on similar things.
> This is good. If you are interested, I'd like to help with the algorithms
> and theory side of the things.

Tong,

This is sufficient as an overview of the algorithm but not detailed enough
for it to be a discussable design doc, I believe. You should ask Chris to
see what he means by this.

Some examples of your rebalancing scheme and how your invariant applies
across processor rounds would be helpful for me and possibly others as well.

bill