Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 16:53 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:
> > It's not "offensive" to me, it is a behavioral regression. The
> > situation as we speak is that you can run cpu intensive tasks while
> > watching eye-candy. With RSDL, you can't, you feel the non-interactive
> > load instantly. Doesn't the fact that you're asking me to lower my
> > expectations tell you that I just might have a point?
>
> Yet looking at the mainline scheduler code, nice 5 tasks are also supposed to
> get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75% cpu
> with a fully cpu bound task in the presence of an interactive task.

(One more comment before I go. You can then have the last word this time, promise :)

Because the interactivity logic, which was put there to do precisely this, is doing its job?

> To me
> that means mainline is not living up to my expectations. What you're saying
> is your expectations are based on a false cpu expectation from nice 5. You
> can spin it both ways.

Talk about spin, you turn an example of the current scheduler working properly into a negative attribute, and attempt to discredit me with it.

The floor is yours. No reply will be forthcoming.

-Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [QUICKLIST 0/4] Arch independent quicklists V2
> On Tue, 13 Mar 2007 00:13:25 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
>
> Page table pages have the characteristics that they are typically zero
> or in a known state when they are freed.

Well if they're zero then perhaps they should be released to the page allocator to satisfy the next __GFP_ZERO request. If that request is for a pagetable page, we break even (except we get to remove special-case code). If that __GFP_ZERO allocation was for some application other than for a pagetable, we win.

iow, can we just nuke 'em?

(Will require some work in the page allocator)

(That work will open the path to using the idle thread to prezero pages)
LSM Stacking
Hi All,

Within the security folder in the kernel tree, the 2.6.20 Linux kernel distribution ships a file root_plug.c (written by Greg Kroah-Hartman), which is a classic introduction to Linux Security Modules (LSM). The folder also contains the SELinux folder.

My question is whether the root_plug.c security module is stacked with the SELinux security module or not. If root_plug.c is stacked, where can I find the code that handles the stacking of SELinux and root_plug.c within the kernel?

Further, any pointer to stacking mechanisms in the Linux 2.6.* kernel will be highly appreciated.

Thanking you in advance,
MA
Re: sys_write() racy for multi-threaded append?
From: "Michael K. Edwards" <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 23:25:48 -0800

> Quality means the devices you ship now keep working in the field, and
> the probable cost of later rework if the requirements change does not
> exceed the opportunity cost of over-engineering up front. Economy
> gets a look-in too, and says that it's pointless to delay shipment and
> bloat the application coding for cases that can't happen. If POSIX
> says that any and all writes (except small pipe/FIFO writes, whatever)
> can return a short byte count -- but you know damn well you're writing
> to a device driver that never, ever writes short, and if it did you'd
> miss a timing budget recovering from it anyway -- to hell with POSIX.

You're not even safe over standard output, simply run the program over ssh and you suddenly have socket semantics to deal with.

In the early days the fun game to play was to run programs over rsh to see in what amusing way they would explode. ssh has replaced rsh in this game, but the bugs have largely stayed the same.

Even early versions of tar used to explode on TCP half-closes and whatnot.

In short, if you don't handle short writes, you're writing a program for something other than unix.

We're not changing write() to interlock with other parallel callers or messing with the f_pos semantics in such cases, that's stupid, please cope, kthx.
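For readers following the short-write argument, coping with a short write can be sketched in C roughly as below. This is a minimal illustration and not code from the thread: the helper name write_all() is hypothetical, and retrying on EINTR and treating a zero-byte return as an error are assumptions of this sketch rather than anything the posters prescribed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper name used only for this sketch.
 * It calls write(2) repeatedly until all len bytes have been accepted
 * or a real error occurs. Returns 0 on success, -1 on error (errno set).
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any byte was written: retry */
			return -1;		/* ENOSPC, EPIPE, EAGAIN, ...: caller decides */
		}
		if (n == 0) {
			/* Assumption of this sketch: no progress is treated as an error. */
			errno = EIO;
			return -1;
		}
		p += n;				/* short write: skip the bytes already accepted */
		len -= (size_t)n;
	}
	return 0;
}
```

Note that a loop like this only covers the "retry the remainder" case. For record-structured devices, where a short write means the framing is already broken, the caller would more likely want to treat any short count as fatal instead of retrying.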
Re: sys_write() racy for multi-threaded append?
On 3/12/07, Alan Cox <[EMAIL PROTECTED]> wrote: > Writing to a file from multiple processes is not usually the problem. > Writing to a common "struct file" from multiple threads is. Not normally because POSIX sensibly invented pread/pwrite. Forgot preadv/pwritev but they did the basics and end of problem pread/pwrite address a miniscule fraction of lseek+read(v)/write(v) use cases -- a fraction that someone cared about strongly enough to get into X/Open CAE Spec Issue 5 Version 2 (1997), from which it propagated into UNIX98 and thence into POSIX.2 2001. The fact that no one has bothered to implement preadv/pwritev in the decade since pread/pwrite entered the Single UNIX standard reflects the rarity with which they appear in general code. Life is too short to spend it rewriting application code that uses readv/writev systematically, especially when that code is going to ship inside a widget whose kernel you control. > So what? My products are shipping _now_. That doesn't inspire confidence. Oh, please. Like _your_ employer is the poster child for code quality. The cheap shot is also irrelevant to the point that I was making, which is that sometimes portability simply doesn't matter and the right thing to do is to firm up the semantics of the filesystem primitives from underneath. > even funny. If POSIX mandates stupid shit, and application > programmers don't read that part of the manual anyway (and don't code > on that assumption in practice), to hell with POSIX. On many file Thats funny, you were talking about quality a moment ago. Quality means the devices you ship now keep working in the field, and the probable cost of later rework if the requirements change does not exceed the opportunity cost of over-engineering up front. Economy gets a look-in too, and says that it's pointless to delay shipment and bloat the application coding for cases that can't happen. 
If POSIX says that any and all writes (except small pipe/FIFO writes, whatever) can return a short byte count -- but you know damn well you're writing to a device driver that never, ever writes short, and if it did you'd miss a timing budget recovering from it anyway -- to hell with POSIX. And if you want to build a test jig for this code that uses pipes or dummy files in place of the device driver, that test jig should never, ever write short either. > descriptors, short writes simply can't happen -- and code that There is almost no descriptor this is true for. Any file I/O can and will end up short on disk full or resource limit exceeded or quota exceeded or NFS server exploded or ... Not on a properly engineered widget, it won't. And if it does, and the application isn't coded to cope in some way totally different from an infinite retry loop, then you might as well signal the exception condition using whatever mechanism is appropriate to the API (-EWHATEVER, SIGCRISIS, or block until some other process makes room). And in any case files on disk are the least interesting kind of file descriptor in an embedded scenario -- devices and pipes and pollfds and netlink sockets are far more frequent read/write targets. And on the device side about the only thing with the vaguest guarantees is pipe(). Guaranteed by the standard, sure. Guaranteed by the implementation, as long as you write in the size blocks that the device is expecting? Lots of devices -- ALSA's OSS PCM emulation, most AF_LOCAL and AF_NETLINK sockets, almost any "character" device with a record-structured format. A short write to any of these almost certainly means the framing is screwed and you need to close and reopen the device. Not all of these are exclusively O_APPEND situations, and there's no reason on earth not to thread-safe the f_pos handling so that an application and filesystem/driver can agree on useful lseek() semantics. 
> purports to handle short writes but has never been exercised is > arguably worse than code that simply bombs on short write. So if I > can't shim in an induce-short-writes-randomly-on-purpose mechanism > during development, I don't want short writes in production, period. Easy enough to do and gcov plus dejagnu or similar tools will let you coverage analyse the resulting test set and replay it. Here we agree. Except that I've rarely seen embedded application code that wouldn't explode in my face if I tried it. Databases yes, and the better class of mail and web servers, and relatively mature scripting languages and bytecode interpreters; but the vast majority of working programmers in these latter days do not exercise this level of discipline. > Sure -- until the one code path in a hundred that handles the "short > write" case incorrectly gets traversed in production, after having > gone untested in a development environment that used a different > filesystem that never happened to trigger it. Competent QA and testing people test all the returns in the manual as well as all the returns they can find in the cod
Re: RSDL-mm 0.28
David Schwartz wrote:

There's a substantial performance hit for not yield, so we probably want to investigate alternate semantics for it. It seems reasonable for apps to say "let me not hog the CPU" without completely expiring them. Imagine you're in the front of the line (aka queue) and you spend a moment fumbling for your wallet. The polite thing to do is to let the next guy in front. But with the current sched_yield, you go all the way to the back of the line.

Well... are you advocating we change sched_yield semantics to a gentler form? This is a cinch to implement but I know how Ingo feels about this. It will only encourage more lax coding using sched_yield instead of proper blocking (see huge arguments with the ldap people on this one who insist it's impossible not to use yield).

The basic point of sched_yield is to allow every other process at the same static priority level a chance to use the CPU before you get it back. It is generally an error to use sched_yield to be nice. It's nice to get your work done when the scheduler gives you the CPU, that's why it gave it to you.

It is proper to use sched_yield as an optimization when it is more efficient to allow another process/thread to run than you, for example, when you encounter a task you cannot do efficiently at that time because another thread holds a lock. It's also useful prior to doing something that can most efficiently be done without interruption. So a thread that returns from 'sched_yield' should ideally be given a full timeslice if possible. This may not be sensible if the 'sched_yield' didn't actually yield, but then again, if nothing else wants to run, why not give the only task that does a full slice?

In no case is much of anything guaranteed, of course. (What can you do if there's no other process to yield to?)

Note that processes that call sched_yield should be rewarded for doing so just as processes that block on I/O are, assuming they do in fact wind up giving up the CPU when they would otherwise have had it.

DS

--
SUSE Labs, Novell Inc.
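The "another thread holds a lock" case mentioned above might look like the following in user space. This is only a sketch under stated assumptions: struct yield_lock and its three helpers are hypothetical names invented here, built on C11 atomic_flag and the real POSIX sched_yield() call. It illustrates yielding as an optimization on contention; it is not a recommendation over a proper blocking primitive, which is exactly the objection raised earlier in this message.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal lock type, for illustration only. */
struct yield_lock {
	atomic_flag held;
};

/* Try once; returns true if the lock was taken. */
static bool yield_lock_try(struct yield_lock *l)
{
	/* test_and_set returns the previous value: false means it was free */
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void yield_lock_acquire(struct yield_lock *l)
{
	assert(l != NULL);
	while (!yield_lock_try(l))
		sched_yield();	/* we cannot make progress; let the holder run */
}

static void yield_lock_release(struct yield_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

How long the caller waits before it runs again depends entirely on the scheduler's sched_yield semantics, which is the point under dispute in this thread; nothing in the sketch guarantees a particular behaviour.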
Re: [RFC][PATCH 2/7] RSS controller core
Herbert Poetzl wrote:
> On Mon, Mar 12, 2007 at 12:02:01PM +0300, Pavel Emelianov wrote:
> Maybe you have some ideas how we can decide on this?

We need to work out what the requirements are before we can settle on an implementation.

>>> Linux-VServer (and probably OpenVZ):
>>>
>>> - shared mappings of 'shared' files (binaries
>>>   and libraries) to allow for reduced memory
>>>   footprint when N identical guests are running
>> This is done in current patches.
>
> nice, but the question was about _requirements_
> (so your requirements are?)
>
>>> - virtual 'physical' limit should not cause
>>>   swap out when there are still pages left on
>>>   the host system (but pages of over limit guests
>>>   can be preferred for swapping)
>> So what to do when virtual physical limit is hit?
>> OOM-kill current task?
>
> when the RSS limit is hit, but there _are_ enough
> pages left on the physical system, there is no
> good reason to swap out the page at all
>
> - there is no benefit in doing so (performance
>   wise, that is)
>
> - it actually hurts performance, and could
>   become a separate source for DoS
>
> what should happen instead (in an ideal world :)
> is that the page is considered swapped out for
> the guest (add guest penality for swapout), and

Does the page stay mapped for the container or not? If yes, then what's the use of limits? The container has mapped more pages than the limit allows, yet all the pages are still in memory. Sounds weird.

> when the page would be swapped in again, the guest
> takes a penalty (for the 'virtual' page in) and
> the page is returned to the guest, possibly kicking
> out (again virtually) a different page
>
>>> - accounting and limits have to be consistent
>>>   and should roughly represent the actual used
>>>   memory/swap (modulo optimizations, I can go
>>>   into detail here, if necessary)
>> This is true for current implementation for
>> booth - this patchset ang OpenVZ beancounters.
>>
>> If you sum up the physpages values for all containers
>> you'll get the exact number of RAM pages used.
>
> hmm, including or excluding the host pages?

Depends on whether you account host pages or not.

>>> - OOM handling on a per guest basis, i.e. some
>>>   out of memory condition in guest A must not
>>>   affect guest B
>> This is done in current patches.
>
>> Herbert, did you look at the patches before
>> sending this mail or do you just want to
>> 'take part' in conversation w/o understanding
>> of hat is going on?
>
> again, the question was about requirements, not
> your patches, and yes, I had a look at them _and_
> the OpenVZ implementations ...
>
> best,
> Herbert
>
> PS: hat is going on? :)
>
>>> HTC,
>>> Herbert
>>>

Sigh. Who is running this show? Anyone?

You can actually do a form of overcommitment by allowing multiple containers to share one or more of the zones. Whether that is sufficient or suitable I don't know. That depends on the requirements, and we haven't even discussed those, let alone agreed to them.

___
Containers mailing list
[EMAIL PROTECTED]
https://lists.osdl.org/mailman/listinfo/containers
[QUICKLIST 3/4] Quicklist support for x86_64
Conver x86_64 to using quicklists This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd. A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field. Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86_64/Kconfig |4 ++ arch/x86_64/kernel/process.c |1 arch/x86_64/kernel/smp.c |2 - arch/x86_64/mm/fault.c |5 +- include/asm-x86_64/pgalloc.h | 76 +-- include/asm-x86_64/pgtable.h |3 - mm/Kconfig |5 ++ 7 files changed, 52 insertions(+), 44 deletions(-) Index: linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig === --- linux-2.6.21-rc3-mm2.orig/arch/x86_64/Kconfig 2007-03-12 22:49:20.0 -0700 +++ linux-2.6.21-rc3-mm2/arch/x86_64/Kconfig2007-03-12 22:53:28.0 -0700 @@ -56,6 +56,10 @@ config ZONE_DMA bool default y +config NR_QUICK + int + default 2 + config ISA bool Index: linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-x86_64/pgalloc.h 2007-03-12 22:49:20.0 -0700 +++ linux-2.6.21-rc3-mm2/include/asm-x86_64/pgalloc.h 2007-03-12 22:53:28.0 -0700 @@ -4,6 +4,10 @@ #include #include #include +#include + +#define QUICK_PGD 0/* We preserve special mappings over free */ +#define QUICK_PT 1 /* Other page table pages that are zero on free */ #define pmd_populate_kernel(mm, pmd, pte) \ set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte))) @@ -20,86 +24,77 @@ static inline void pmd_populate(struct m static inline void pmd_free(pmd_t *pmd) { BUG_ON((unsigned long)pmd & (PAGE_SIZE-1)); - free_page((unsigned long)pmd); + quicklist_free(QUICK_PT, NULL, pmd); } static inline pmd_t *pmd_alloc_one (struct mm_struct *mm, unsigned long addr) { - return (pmd_t 
*)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT); + return (pmd_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL); } static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) { - return (pud_t *)get_zeroed_page(GFP_KERNEL|__GFP_REPEAT); + return (pud_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL|__GFP_REPEAT, NULL); } static inline void pud_free (pud_t *pud) { BUG_ON((unsigned long)pud & (PAGE_SIZE-1)); - free_page((unsigned long)pud); + quicklist_free(QUICK_PT, NULL, pud); } -static inline void pgd_list_add(pgd_t *pgd) +static inline void pgd_ctor(void *x) { + unsigned boundary; + pgd_t *pgd = x; struct page *page = virt_to_page(pgd); + /* +* Copy kernel pointers in from init. +*/ + boundary = pgd_index(__PAGE_OFFSET); + memcpy(pgd + boundary, + init_level4_pgt + boundary, + (PTRS_PER_PGD - boundary) * sizeof(pgd_t)); + spin_lock(&pgd_lock); - page->index = (pgoff_t)pgd_list; - if (pgd_list) - pgd_list->private = (unsigned long)&page->index; - pgd_list = page; - page->private = (unsigned long)&pgd_list; + list_add(&page->lru, &pgd_list); spin_unlock(&pgd_lock); } -static inline void pgd_list_del(pgd_t *pgd) +static inline void pgd_dtor(void *x) { - struct page *next, **pprev, *page = virt_to_page(pgd); + pgd_t *pgd = x; + struct page *page = virt_to_page(pgd); spin_lock(&pgd_lock); - next = (struct page *)page->index; - pprev = (struct page **)page->private; - *pprev = next; - if (next) - next->private = (unsigned long)pprev; + list_del(&page->lru); spin_unlock(&pgd_lock); } + static inline pgd_t *pgd_alloc(struct mm_struct *mm) { - unsigned boundary; - pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT); - if (!pgd) - return NULL; - pgd_list_add(pgd); - /* -* Copy kernel pointers in from init. -* Could keep a freelist or slab cache of those because the kernel -* part never changes. 
-*/ - boundary = pgd_index(__PAGE_OFFSET); - memset(pgd, 0, boundary * sizeof(pgd_t)); - memcpy(pgd + boundary, - init_level4_pgt + boundary, - (PTRS_PER_PGD - boundary) * sizeof(pgd_t)); + pgd_t *pgd = (pgd_t *)quicklist_alloc(QUICK_PGD, +GFP_KERNEL|__GFP_REPEAT, pgd_ctor); + return pgd; } static in
[QUICKLIST 0/4] Arch independent quicklists V2
V1->V2
- Add sparc64 patch
- Single i386 and x86_64 patch
- Update attribution
- Update justification
- Update approvals
- Earlier discussion of V1 was at http://marc.info/?l=linux-kernel&m=117357922219342&w=2

This patchset introduces an arch independent framework to handle lists of recently used page table pages. It is necessary for x86_64 and i386 to avoid the special casing of SLUB because these two platforms use fields in the page_struct (page->index and page->private) that SLUB needs (and in fact SLAB also needs page->private if performing debugging!).

There is also the tendency of arches to use page flags to mark page table pages. The slab also uses page flags. Separating page table page allocation into quicklists avoids the danger of conflicts and frees up page flags for SLUB and for the arch code.

Page table pages have the characteristics that they are typically zero or in a known state when they are freed. This is usually exactly the same state as needed after allocation. So it makes sense to build a list of freed page table pages and then consume the pages already in use first. Those pages have already been initialized correctly (thus no need to zero them) and are likely already cached in such a way that the MMU can use them most effectively. Page table pages are used in a sparse way so zeroing them on allocation is not too useful.

Such an implementation already exists for ia64. However, that implementation did not support constructors and destructors as needed by i386 / x86_64. It also only supported a single quicklist. The implementation here has constructor and destructor support as well as the ability for an arch to specify how many quicklists are needed.

Quicklists are defined by an arch defining the necessary number of quicklists in arch//Kconfig. F.e. i386 needs two and thus has

	config NR_QUICK
		int
		default 2

If an arch has requested quicklist support then pages can be allocated from the quicklist (or from the page allocator if the quicklist is empty) via:

	quicklist_alloc(, , )

Page table pages can be freed using:

	quicklist_free(, , )

Pages must have a definite state after allocation and before they are freed. If no constructor is specified then pages will be zeroed on allocation and must be zeroed before they are freed.

If a constructor is used then the constructor will establish a definite page state. F.e. the i386 and x86_64 pgd constructors establish certain mappings.

Constructors and destructors can also be used to track the pages. i386 and x86_64 use a list of pgds in order to be able to dynamically update standard mappings.
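The idea described in this cover letter can be modelled in user space roughly as follows. This is a sketch under loud assumptions, not the patch's implementation: every sketch_* name is hypothetical, calloc() stands in for the page allocator, there is no per-cpu or NUMA handling, and running the destructor only when a page leaves the cache (in the trim helper) is an assumption of this sketch. What it does take from the posted patches is the technique visible in the removed ia64 code: freed pages are chained through their first word, and that word is reset to zero when a page is popped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */
#define SKETCH_NR_QUICK  2	/* mirrors "config NR_QUICK ... default 2" */

/* One cache of freed pages; free pages are linked through their first word. */
struct sketch_quicklist {
	void *head;
	int nr_pages;
};

static struct sketch_quicklist sketch_lists[SKETCH_NR_QUICK];

/* Take a page from list nr, or get a zeroed one from the allocator. */
static void *sketch_quicklist_alloc(int nr, void (*ctor)(void *))
{
	struct sketch_quicklist *q;
	void *page;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	page = q->head;
	if (page) {
		q->head = *(void **)page;	/* pop the cached page */
		*(void **)page = NULL;		/* restore the link word to zero */
		q->nr_pages--;
		return page;			/* already in its known state */
	}
	page = calloc(1, SKETCH_PAGE_SIZE);	/* stand-in for a zeroed page */
	if (page && ctor)
		ctor(page);			/* establish the definite state once */
	return page;
}

/* Put a page back; the caller must have returned it to its known state. */
static void sketch_quicklist_free(int nr, void *page)
{
	struct sketch_quicklist *q;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	*(void **)page = q->head;		/* first word becomes the link */
	q->head = page;
	q->nr_pages++;
}

/* Release cached pages beyond keep, running dtor before each is freed. */
static void sketch_quicklist_trim(int nr, void (*dtor)(void *), int keep)
{
	struct sketch_quicklist *q;

	assert(nr >= 0 && nr < SKETCH_NR_QUICK);
	q = &sketch_lists[nr];
	while (q->nr_pages > keep) {
		void *page = q->head;

		q->head = *(void **)page;
		q->nr_pages--;
		if (dtor)
			dtor(page);
		free(page);
	}
}
```

One consequence of linking through the first word is that a constructor cannot rely on that word surviving a free/alloc cycle; in this sketch it always comes back as zero.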
[QUICKLIST 1/4] Generic quicklist implementation
Abstract quicklist from the OA64 implementation Extract the quicklist implementation for IA64, clean it up and generalize it to allow multiple quicklists and support for constructors and destructors.. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/ia64/Kconfig |4 ++ arch/ia64/mm/contig.c |2 - arch/ia64/mm/discontig.c |2 - arch/ia64/mm/init.c| 51 --- include/asm-ia64/pgalloc.h | 82 - include/linux/quicklist.h | 81 mm/Kconfig |5 ++ mm/Makefile|2 + mm/quicklist.c | 81 9 files changed, 191 insertions(+), 119 deletions(-) Index: linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c === --- linux-2.6.21-rc3-mm2.orig/arch/ia64/mm/init.c 2007-03-12 22:49:21.0 -0700 +++ linux-2.6.21-rc3-mm2/arch/ia64/mm/init.c2007-03-12 22:49:23.0 -0700 @@ -39,9 +39,6 @@ DEFINE_PER_CPU(struct mmu_gather, mmu_gathers); -DEFINE_PER_CPU(unsigned long *, __pgtable_quicklist); -DEFINE_PER_CPU(long, __pgtable_quicklist_size); - extern void ia64_tlb_init (void); unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x1UL; @@ -56,54 +53,6 @@ EXPORT_SYMBOL(vmem_map); struct page *zero_page_memmap_ptr; /* map entry for zero page */ EXPORT_SYMBOL(zero_page_memmap_ptr); -#define MIN_PGT_PAGES 25UL -#define MAX_PGT_FREES_PER_PASS 16L -#define PGT_FRACTION_OF_NODE_MEM 16 - -static inline long -max_pgt_pages(void) -{ - u64 node_free_pages, max_pgt_pages; - -#ifndefCONFIG_NUMA - node_free_pages = nr_free_pages(); -#else - node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES); -#endif - max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM; - max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES); - return max_pgt_pages; -} - -static inline long -min_pages_to_free(void) -{ - long pages_to_free; - - pages_to_free = pgtable_quicklist_size - max_pgt_pages(); - pages_to_free = min(pages_to_free, MAX_PGT_FREES_PER_PASS); - return pages_to_free; -} - -void -check_pgt_cache(void) -{ - long pages_to_free; - - if (unlikely(pgtable_quicklist_size <= MIN_PGT_PAGES)) - return; - - preempt_disable(); - while 
(unlikely((pages_to_free = min_pages_to_free()) > 0)) { - while (pages_to_free--) { - free_page((unsigned long)pgtable_quicklist_alloc()); - } - preempt_enable(); - preempt_disable(); - } - preempt_enable(); -} - void lazy_mmu_prot_update (pte_t pte) { Index: linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-ia64/pgalloc.h2007-03-12 22:49:21.0 -0700 +++ linux-2.6.21-rc3-mm2/include/asm-ia64/pgalloc.h 2007-03-12 22:49:23.0 -0700 @@ -18,71 +18,18 @@ #include #include #include +#include #include -DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist); -#define pgtable_quicklist __ia64_per_cpu_var(__pgtable_quicklist) -DECLARE_PER_CPU(long, __pgtable_quicklist_size); -#define pgtable_quicklist_size __ia64_per_cpu_var(__pgtable_quicklist_size) - -static inline long pgtable_quicklist_total_size(void) -{ - long ql_size = 0; - int cpuid; - - for_each_online_cpu(cpuid) { - ql_size += per_cpu(__pgtable_quicklist_size, cpuid); - } - return ql_size; -} - -static inline void *pgtable_quicklist_alloc(void) -{ - unsigned long *ret = NULL; - - preempt_disable(); - - ret = pgtable_quicklist; - if (likely(ret != NULL)) { - pgtable_quicklist = (unsigned long *)(*ret); - ret[0] = 0; - --pgtable_quicklist_size; - preempt_enable(); - } else { - preempt_enable(); - ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO); - } - - return ret; -} - -static inline void pgtable_quicklist_free(void *pgtable_entry) -{ -#ifdef CONFIG_NUMA - int nid = page_to_nid(virt_to_page(pgtable_entry)); - - if (unlikely(nid != numa_node_id())) { - free_page((unsigned long)pgtable_entry); - return; - } -#endif - - preempt_disable(); - *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist; - pgtable_quicklist = (unsigned long *)pgtable_entry; - ++pgtable_quicklist_size; - preempt_enable(); -} - static inline pgd_t *pgd_alloc(struct mm_struct *mm) { - return pgtable_quicklist_alloc(); + return quicklist_alloc(0, GFP_KERNEL, NULL); } static inline 
void pgd_free(pgd_t
[QUICKLIST 4/4] Quicklist support for sparc64
From: David Miller <[EMAIL PROTECTED]> [QUICKLIST]: Add sparc64 quicklist support. I ported this to sparc64 as per the patch below, tested on UP SunBlade1500 and 24 cpu Niagara T1000. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> --- arch/sparc64/Kconfig |4 arch/sparc64/mm/init.c| 24 arch/sparc64/mm/tsb.c |2 +- include/asm-sparc64/pgalloc.h | 26 ++ 4 files changed, 19 insertions(+), 37 deletions(-) Index: linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig === --- linux-2.6.21-rc3-mm2.orig/arch/sparc64/Kconfig 2007-03-12 22:49:19.0 -0700 +++ linux-2.6.21-rc3-mm2/arch/sparc64/Kconfig 2007-03-12 22:53:30.0 -0700 @@ -26,6 +26,10 @@ config MMU bool default y +config NR_QUICK + int + default 1 + config STACKTRACE_SUPPORT bool default y Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c === --- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/init.c2007-03-12 22:49:19.0 -0700 +++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/init.c 2007-03-12 22:53:30.0 -0700 @@ -176,30 +176,6 @@ unsigned long sparc64_kern_sec_context _ int bigkernel = 0; -struct kmem_cache *pgtable_cache __read_mostly; - -static void zero_ctor(void *addr, struct kmem_cache *cache, unsigned long flags) -{ - clear_page(addr); -} - -extern void tsb_cache_init(void); - -void pgtable_cache_init(void) -{ - pgtable_cache = kmem_cache_create("pgtable_cache", - PAGE_SIZE, PAGE_SIZE, - SLAB_HWCACHE_ALIGN | - SLAB_MUST_HWCACHE_ALIGN, - zero_ctor, - NULL); - if (!pgtable_cache) { - prom_printf("Could not create pgtable_cache\n"); - prom_halt(); - } - tsb_cache_init(); -} - #ifdef CONFIG_DEBUG_DCFLUSH atomic_t dcpage_flushes = ATOMIC_INIT(0); #ifdef CONFIG_SMP Index: linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c === --- linux-2.6.21-rc3-mm2.orig/arch/sparc64/mm/tsb.c 2007-03-12 22:49:19.0 -0700 +++ linux-2.6.21-rc3-mm2/arch/sparc64/mm/tsb.c 2007-03-12 22:53:30.0 -0700 @@ -252,7 +252,7 @@ static const char *tsb_cache_names[8] = "tsb_1MB", }; -void __init tsb_cache_init(void) +void __init pgtable_cache_init(void) { unsigned long i; Index: 
linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h === --- linux-2.6.21-rc3-mm2.orig/include/asm-sparc64/pgalloc.h 2007-03-12 22:49:19.0 -0700 +++ linux-2.6.21-rc3-mm2/include/asm-sparc64/pgalloc.h 2007-03-12 22:53:30.0 -0700 @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -13,52 +14,50 @@ #include /* Page table allocation/freeing. */ -extern struct kmem_cache *pgtable_cache; static inline pgd_t *pgd_alloc(struct mm_struct *mm) { - return kmem_cache_alloc(pgtable_cache, GFP_KERNEL); + return quicklist_alloc(0, GFP_KERNEL, NULL); } static inline void pgd_free(pgd_t *pgd) { - kmem_cache_free(pgtable_cache, pgd); + quicklist_free(0, NULL, pgd); } #define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD) static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr) { - return kmem_cache_alloc(pgtable_cache, - GFP_KERNEL|__GFP_REPEAT); + return quicklist_alloc(0, GFP_KERNEL, NULL); } static inline void pmd_free(pmd_t *pmd) { - kmem_cache_free(pgtable_cache, pmd); + quicklist_free(0, NULL, pmd); } static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) { - return kmem_cache_alloc(pgtable_cache, - GFP_KERNEL|__GFP_REPEAT); + return quicklist_alloc(0, GFP_KERNEL, NULL); } static inline struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address) { - return virt_to_page(pte_alloc_one_kernel(mm, address)); + void *pg = quicklist_alloc(0, GFP_KERNEL, NULL); + return pg ? virt_to_page(pg) : NULL; } static inline void pte_free_kernel(pte_t *pte) { - kmem_cache_free(pgtable_cache, pte); + quicklist_free(0, NULL, pte); } static inline void pte_free(struct page *ptepage) { - pte_free_kernel(page_address(ptepage)); + quicklist_free(0, NULL, page_address(ptepage)); } @@ -66,6 +65,9 @@ static inline void pte_free(struct page #define pmd_populate(MM,PMD,PTE_PAGE) \ pmd_populate_
[QUICKLIST 2/4] Quicklist support for i386
i386: Convert to quicklists

Implement the i386 management of pgd and pmds using quicklists.

The i386 management of page table pages currently uses page sized slabs.
The page state is therefore mainly determined by the slab code. However,
i386 also uses its own fields in the page struct to mark special pages
and to build a list of pgds using the ->private and ->index field (yuck!).
This has been finely tuned to work right with SLAB but SLUB needs more
control over the page struct. Currently the only way for SLUB to support
these slabs is through special casing PAGE_SIZE slabs.

If we use quicklists instead then we can avoid the mess, and also the
overhead of manipulating page sized objects through slab.

It also allows us to use standard list manipulation macros for the
pgd list using page->lru thereby simplifying the code.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/i386/Kconfig          |4 ++
 arch/i386/kernel/process.c |1
 arch/i386/kernel/smp.c     |2 -
 arch/i386/mm/fault.c       |5 +--
 arch/i386/mm/init.c        | 25 -
 arch/i386/mm/pageattr.c    |2 -
 arch/i386/mm/pgtable.c     | 63 +
 include/asm-i386/pgalloc.h |2 -
 include/asm-i386/pgtable.h | 13 +++--
 9 files changed, 39 insertions(+), 78 deletions(-)

Index: linux-2.6.21-rc3-mm2/arch/i386/mm/init.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/init.c	2007-03-12 22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/init.c	2007-03-12 22:53:27.0 -0700
@@ -695,31 +695,6 @@ int remove_memory(u64 start, u64 size)
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif
 
-struct kmem_cache *pgd_cache;
-struct kmem_cache *pmd_cache;
-
-void __init pgtable_cache_init(void)
-{
-	if (PTRS_PER_PMD > 1) {
-		pmd_cache = kmem_cache_create("pmd",
-					PTRS_PER_PMD*sizeof(pmd_t),
-					PTRS_PER_PMD*sizeof(pmd_t),
-					0,
-					pmd_ctor,
-					NULL);
-		if (!pmd_cache)
-			panic("pgtable_cache_init(): cannot create pmd cache");
-	}
-	pgd_cache = kmem_cache_create("pgd",
-				PTRS_PER_PGD*sizeof(pgd_t),
-				PTRS_PER_PGD*sizeof(pgd_t),
-				0,
-				pgd_ctor,
-				PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
-	if (!pgd_cache)
-		panic("pgtable_cache_init(): Cannot create pgd cache");
-}
-
 /*
  * This function cannot be __init, since exceptions don't work in that
  * section. Put this after the callers, so that it cannot be inlined.
Index: linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c
===
--- linux-2.6.21-rc3-mm2.orig/arch/i386/mm/pgtable.c	2007-03-12 22:49:20.0 -0700
+++ linux-2.6.21-rc3-mm2/arch/i386/mm/pgtable.c	2007-03-12 22:53:27.0 -0700
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -181,9 +182,12 @@ void reserve_top_address(unsigned long r
 #endif
 }
 
+#define QUICK_PGD 0
+#define QUICK_PT 1
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+	return (pte_t *)quicklist_alloc(QUICK_PT, GFP_KERNEL, NULL);
 }
 
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
@@ -198,11 +202,6 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-void pmd_ctor(void *pmd, struct kmem_cache *cache, unsigned long flags)
-{
-	memset(pmd, 0, PTRS_PER_PMD*sizeof(pmd_t));
-}
-
 /*
  * List of all pgd's needed for non-PAE so it can invalidate entries
  * in both cached and uncached pgd's; not needed for PAE since the
@@ -211,36 +210,15 @@ void pmd_ctor(void *pmd, struct kmem_cac
  * against pageattr.c; it is the unique case in which a valid change
  * of kernel pagetables can't be lazily synchronized by vmalloc faults.
  * vmalloc faults work because attached pagetables are never freed.
- * The locking scheme was chosen on the basis of manfred's
- * recommendations and having no core impact whatsoever.
  * -- wli
  */
 DEFINE_SPINLOCK(pgd_lock);
-struct page *pgd_list;
-
-static inline void pgd_list_add(pgd_t *pgd)
-{
-	struct page *page = virt_to_page(pgd);
-	page->index = (unsigned long)pgd_list;
-	if (pgd_list)
-		set_page_private(pgd_list, (unsigned long)&page->index);
-	pgd_list = page;
-	set_page_private(page, (unsigned long)&pgd_list);
-}
+LIST_HEAD(pgd_list);
 
-static inline void pgd_list_del(pgd_t *pgd)
-{
-	struct page *next, **pprev, *page = virt_to_page(pgd);
-	next =
Re: [RFC][PATCH 3/7] Data structures changes for RSS accounting
Dave Hansen wrote: > On Mon, 2007-03-12 at 20:19 +0300, Pavel Emelianov wrote: >> Dave Hansen wrote: >>> On Mon, 2007-03-12 at 19:16 +0300, Kirill Korotaev wrote: now VE2 maps the same page. You can't determine whether this page is mapped to this container or another one w/o page->container pointer. >>> Hi Kirill, >>> >>> I thought we can always get from the page to the VMA. rmap provides >>> this to us via page->mapping and the 'struct address_space' or anon_vma. >>> Do we agree on that? >> Not completely. When page is unmapped from the *very last* >> user its *first* toucher may already be dead. So we'll never >> find out who it was. > > OK, but this is assuming that we didn't *un*account for the page when > the last user of the "owning" container stopped using the page. That's exactly what we agreed on during our discussions: When page is get touched it is charged to this container. When page is get touched again by new container it is NOT charged to new container, but keeps holding the old one till it (the page) is completely freed. Nobody worried the fact that a single page can hold container for good. OpenVZ beancounters work the other way (and we proposed this solution when we first sent the patches). We keep track of *all* the containers (i.e. beancounters) holding this page. >>> We can also get from the vma to the mm very easily, via vma->vm_mm, >>> right? >>> >>> We can also get from a task to the container quite easily. >>> >>> So, the only question becomes whether there is a 1:1 relationship >>> between mm_structs and containers. Does each mm_struct belong to one >> No. The question is "how to get a container that touched the >> page first" which is the same as "how to find mm_struct which >> touched the page first". Obviously there's no answer on this >> question unless we hold some direct page->container reference. >> This may be a hash, a direct on-page pointer, or mirrored >> array of pointers. 
> > Or, you keep track of when the last user from the container goes away, > and you effectively account it to another one. We can migrate page to another user but we decided to implement it later after accepting simple accounting. > Are there problems with shifting ownership around like this? > > -- Dave > > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
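For readers following the accounting argument, the first-toucher rule described in the reply above can be sketched in plain C. This is only an illustration under stated assumptions: `struct container_sketch`, `struct page_sketch`, `page_touch()` and `page_untouch()` are hypothetical names, not the structures or functions from the RSS patches, and the direct owner pointer stands in for whichever page->container reference (hash, on-page pointer, mirrored array) ends up being used.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's real structures. */
struct container_sketch {
	long charged_pages;
};

struct page_sketch {
	struct container_sketch *owner;	/* direct page->container reference */
	int users;			/* how many mappings currently exist */
};

/* A container touches the page: only the first toucher is charged. */
static void page_touch(struct page_sketch *pg, struct container_sketch *c)
{
	if (pg->owner == NULL) {
		pg->owner = c;
		c->charged_pages++;
	}
	pg->users++;
}

/* A user unmaps the page: the charge is dropped only when the page is
 * completely freed, even if the first toucher went away long before. */
static void page_untouch(struct page_sketch *pg)
{
	if (--pg->users == 0) {
		pg->owner->charged_pages--;
		pg->owner = NULL;
	}
}
```

Without the `owner` field there would be nothing left to uncharge once the first toucher has exited, which is the point being made about rmap not being sufficient on its own.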
Re: Make sure we populate the initroot filesystem late enough
> Hmm. The crash came back after I booted into Mac OS X and back. It was however > a different crash, I believe it was coming from the USB modules (as it would > keep going when it happened, and get another crash, which tended to scroll > away > too fast for me to capture) but I believe it was still getting down into the > slab code and actually dying there. Have you tried, instead, to apply 38f3323037de22bb0089d08be27be01196e7148b ? (That is revert 39d61db0edb34d60b83c5e0d62d0e906578cc707). I suspect this is the proper fix... Ben. > However, reverting the reversion of > 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying > the following patch: > > diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c > linux-source-2.6.20/arch/powerpc/mm/init_32.c > --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c 2007-02-05 > 05:44:54.0 +1100 > +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c 2007-03-10 > 11:03:56.0 +1100 > @@ -244,7 +244,8 @@ > void free_initrd_mem(unsigned long start, unsigned long end) > { > if (start < end) > - printk ("Freeing initrd memory: %ldk freed\n", (end - start) > >> 10); > + printk ("NOT Freeing initrd memory: %ldk freed\n", (end - > start) >> 10); > + return; > for (; start < end; start += PAGE_SIZE) { > ClearPageReserved(virt_to_page(start)); > init_page_count(virt_to_page(start)); > > which if I recall correctly David Woodhouse posted to this thread, > seems to have fixed it. > > I dunno if it's relevant, but my initrd.img is 13193315 bytes long, > (ie 99 bytes over 12884k) and the above logs: > "NOT Freeing initrd memory: 12888k freed" > which makes sense... > > I of course completely failed to think to check this with the crashing > kernel, if it seems relevant I can roll back to it and get the numbers. 
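The numbers quoted above can be checked with a few lines of C. This is a sketch, assuming 4 KiB pages and assuming that the start..end range handed to free_initrd_mem() is rounded up to a page boundary; the helper names are made up for the illustration.

```c
#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round a byte count up to a whole number of pages. */
static unsigned long round_up_to_page(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Size in KiB, computed the same way as (end - start) >> 10 in the printk. */
static unsigned long bytes_to_kib(unsigned long bytes)
{
	return bytes >> 10;
}
```

With the 13193315-byte image, bytes_to_kib() gives 12884 with 99 bytes left over, and rounding up to the next page first gives 12888, which matches the logged "12888k freed".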
Re: [patch 00/12] Syslets, Threadlets, generic AIO support, v5
Anton Blanchard wrote: Hi Ingo, this is the v5 release of the syslet/threadlet subsystem: http://redhat.com/~mingo/syslet-patches/ Nice! I too went and downloaded patches-v5 for review. First off, one problem I noticed in sys_async_wait: + ah->events_left = min_wait_events - (kernel_ring_idx - user_ring_idx); This completely misses the wraparound case of kernel_ring_idx < user_ring_idx. I wonder if this is causing some of the benchmark problems? (add max_ring_index if kernel < user). I tried to port this to ppc64 but found a few problems: The 64bit powerpc ABI has the concept of a TOC (r2) which is used for per function data. This means this wont work: [deleted] I think we would want to change restore_ip to restore_function, and then create a per arch helper, perhaps: void set_user_context(struct task_struct *task, unsigned long stack, unsigned long function, unsigned long retval); ppc64 could then grab the ip and r2 values from the function descriptor. The other issue involves the syscall table: asmlinkage struct syslet_uatom __user * sys_async_exec(struct syslet_uatom __user *uatom, struct async_head_user __user *ahu) { return __sys_async_exec(uatom, ahu, sys_call_table, NR_syscalls); } This exposes the layout of the syscall table. Unfortunately it wont work on ppc64. In arch/powerpc/kernel/systbl.S: #define COMPAT_SYS(func).llong .sys_##func,.compat_sys_##func Both syscall tables are overlaid. Anton In addition, the entries in the table are not function pointers, they are the actual code targets. So we need a arch helper to invoke the system call. Here is another problem with your compat code. Just telling user space that everything is u64 and having the kernel retrieve pointers and ulong doesn't work, you have to actually copy in u64 values and truncate them down. Your current code is broken on all 32bit big endian kernels. Actually, the check needs to be that the upper 32 bits are 0 or return -EINVAL. 
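The wraparound case raised earlier in this review for sys_async_wait can be sketched as below. This is a guess at the intended fix, not code from the syslet patches: `ring_pending()`, `events_left()` and the `ring_size` parameter are hypothetical, with `ring_size` playing the role of the "max_ring_index" mentioned above.

```c
/* Number of completed events between the user and kernel ring indices,
 * allowing for the kernel index having wrapped around the ring. */
static unsigned long ring_pending(unsigned long kernel_idx,
				  unsigned long user_idx,
				  unsigned long ring_size)
{
	if (kernel_idx >= user_idx)
		return kernel_idx - user_idx;
	/* kernel index has wrapped past the end of the ring */
	return kernel_idx + ring_size - user_idx;
}

/* Events still to wait for; zero or negative means no need to sleep. */
static long events_left(long min_wait_events,
			unsigned long kernel_idx,
			unsigned long user_idx,
			unsigned long ring_size)
{
	return min_wait_events -
	       (long)ring_pending(kernel_idx, user_idx, ring_size);
}
```

With a ring of 8 entries, kernel index 1 and user index 6, three events are pending, where the unadjusted subtraction would come out negative.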
In addition, the compat syscall entry points assume that the arguments have been truncated to compat_ulong values by the syscall entry path, and that they only need to do sign extension (and/or pointer conversion on s390 with its 31 bit pointers). So all compat kernels are broken. The two of these things together makes me think we want two copy functions. At that point we may as well define the struct uatom in terms of ulong and compat_ulong for the compat_uatom. That would lead to two copies of exec_uatom, but the elimination of passing the syscall table as an argument down. The need_resched and signal check could become part of the common next_uatom routine, although it would need to know uatom+1 instead of doing the addition in itself. Other observations: All the logic setting at and async_ready is a bit hard to follow. After some analysis, t->at is only ever set to &t->__at and async_ready is only set to the same at or NULL. Both of these should become flags, and at->task should be converted to container_of. Also, the name at is hard to grep / search for. The stop flags are decoded with a case but are not densely encoded, rather they are one hot. We either need to error on multiple stop bits being set, stop on each possible condition, or encode them densely. There is no check for flags being set that are not recognized. If we ever add a flag for another stop condition this would lead to incorrect execution by the kernel. There are some syscalls that can return -EFAULT but later have force_syscall_noerror. We should create a stop on ERROR and clear the force_noerror flag between syscalls. The umem_add syscall should add force_noerror if the put_user succeeds. In copy_uatom, you call verify_read on the entire uatom. This means that the struct with all user space size has to be within the process limit, which violates your assertion that userspace doesn't need the whole structure. 
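The two flag checks asked for above (error on more than one stop bit, error on unrecognized bits) might look like the following. The flag names and values are placeholders invented for this sketch, not the real syslet flags, and the sketch assumes that stop bits are the only flags that exist.

```c
/* Hypothetical one-hot stop flags; the real values may differ. */
#define SK_STOP_ON_NONZERO	0x1u
#define SK_STOP_ON_ZERO		0x2u
#define SK_STOP_ON_NEGATIVE	0x4u
#define SK_STOP_MASK \
	(SK_STOP_ON_NONZERO | SK_STOP_ON_ZERO | SK_STOP_ON_NEGATIVE)

/* Returns 0 if the flags are acceptable, -1 otherwise. */
static int check_stop_flags(unsigned int flags)
{
	unsigned int stop = flags & SK_STOP_MASK;

	if (flags & ~SK_STOP_MASK)
		return -1;	/* a bit we do not recognize is set */
	if (stop & (stop - 1))
		return -1;	/* more than one stop bit is set */
	return 0;
}
```

Rejecting unknown bits up front means a stop condition added later fails loudly on an old kernel instead of being silently ignored.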
If we add the requirement that the space that would be occupied by the complete atom has to exist, then we can copy the whole struct uatom with copy_from_user and then copy the args with get_user. User space can still pack them more densely, and we can still stop copying on a null arg pointer. Actually, calling access_ok then __get_user can be more expensive on some architectures because they have to verify both start and length on access_ok but can only verify start on get_user because they have unmapped areas between user space and kernel space. This would also mean that we don't check arg_ptr for NULL without verifying that get_user actually worked. The gotos in exec_uatom are just a while loop with a break. sys_umem_add should be in /lib under lib-y in the Makefile. In fact declaring the function weak does not make it a weak syscall implementation on some architectures. Weak syscalls aliases to sys_ni_syscall are needed for when async support is not selected in Kconfig. The Documentation
Re: Removal of multipath cached (was Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.)
On Mon, Mar 12, 2007 at 10:22:36PM -0800, Andrew Morton wrote:
> > On Mon, 12 Mar 2007 13:53:11 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote:
...
> > And there is absolutely no negotiations about this, I've held back on
> > this for nearly 2 years, and nothing has happened, this code is not
> > maintained, nobody cares enough to fix the bugs, and even no
> > distributions enable it because it causes crashes.
>
> Good stuff.
>
> I suggest you put a big printk explaining the above into 2.6.21.
>

Plus the official way: an entry in Documentation/feature-removal-schedule.txt in the next rc-git.

Jarek P.
Re: [PATCH] Fix vmi time header bug
Andrew Morton wrote:
> Really truly? I think we have a _lot_ of declarations which omit the
> section qualifier altogether. How come they don't all break too?

User build was smoking this:

make O=build -j16

This and non-repeatable results make me suspect some kind of build dependency problem, or perhaps a make bug. Still, please apply, as it doesn't hurt.

Zach
[PATCH] Remove unused set_seg_base
The set_seg_base function isn't used anywhere (2.6.21-rc3-git1)

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 0798f7cfc709 include/asm-x86_64/desc.h
--- a/include/asm-x86_64/desc.h	Mon Mar 12 16:56:18 2007 +1100
+++ b/include/asm-x86_64/desc.h	Tue Mar 13 11:39:16 2007 +1100
@@ -107,16 +107,6 @@ static inline void set_ldt_desc(unsigned
 			      DESC_LDT, size * 8 - 1);
 }
 
-static inline void set_seg_base(unsigned cpu, int entry, void *base)
-{
-	struct desc_struct *d = &cpu_gdt(cpu)[entry];
-	u32 addr = (u32)(u64)base;
-	BUG_ON((u64)base >> 32);
-	d->base0 = addr & 0xffff;
-	d->base1 = (addr >> 16) & 0xff;
-	d->base2 = (addr >> 24) & 0xff;
-}
-
 #define LDT_entry_a(info) \
 	((((info)->base_addr & 0x0000ffff) << 16) | ((info)->limit & 0x0ffff))
 /* Don't allow setting of the lm bit. It is useless anyways because
[PATCH] Introduce load_TLS to the "for" loop.
GCC (4.1 at least) unrolls it anyway, but I can't believe this code was ever justifiable. (I've also submitted a patch which cleans up i386, which is even uglier).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r de5618b5e562 include/asm-x86_64/desc.h
--- a/include/asm-x86_64/desc.h	Tue Mar 13 11:41:55 2007 +1100
+++ b/include/asm-x86_64/desc.h	Tue Mar 13 16:09:56 2007 +1100
@@ -135,16 +135,13 @@ static inline void set_ldt_desc(unsigned
 	  (info)->useable == 0 && \
 	  (info)->lm == 0)
 
-#if TLS_SIZE != 24
-# error update this code.
-#endif
-
 static inline void load_TLS(struct thread_struct *t, unsigned int cpu)
 {
+	unsigned int i;
 	u64 *gdt = (u64 *)(cpu_gdt(cpu) + GDT_ENTRY_TLS_MIN);
-	gdt[0] = t->tls_array[0];
-	gdt[1] = t->tls_array[1];
-	gdt[2] = t->tls_array[2];
+
+	for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++)
+		gdt[i] = t->tls_array[i];
 }
 
 /*
Re: CONFIG_REORDER Kconfig help strange sentence.
On Tue, 2007-03-13 at 00:56 +0100, Andi Kleen wrote:
> On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote:
> > OK, this confused me:
> >
> > Function reordering (REORDER) [N/y/?] (NEW) ?
> >
> > This option enables the toolchain to reorder functions for a more
> > optimal TLB usage. If you have pretty much any version of binutils,
> > this can increase your kernel build time by roughly one minute.
> >
> > "If you have pretty much any version of binutils"? Huh?
> >
> > You mean "This will slow your kernel build by about a minute"?
>
> Yes. Lots of sections seem to trigger some quadratic behaviour in ld.
>
> It might be fixed in some unreleased CVS version though (not 100% sure)
>
> -Andi

OK, well here is a patch for the moment.
==
Clarify CONFIG_REORDER explanation

if (1 && X) => if (X).

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r de5618b5e562 arch/x86_64/Kconfig
--- a/arch/x86_64/Kconfig	Tue Mar 13 11:41:55 2007 +1100
+++ b/arch/x86_64/Kconfig	Tue Mar 13 17:27:05 2007 +1100
@@ -632,8 +632,8 @@ config REORDER
 	default n
 	help
 	 This option enables the toolchain to reorder functions for a more
-	 optimal TLB usage. If you have pretty much any version of binutils,
-	 this can increase your kernel build time by roughly one minute.
+	 optimal TLB usage. This will slow your kernel build by
+	 roughly one minute.
 
 config K8_NB
 	def_bool y
Re: [RFC PATCH 1/3] Add ability to keep track of callers of symbol_(get|put)
On Tue, 13 Mar 2007, Rusty Russell wrote: > Hi Trent, > > Patch looks good, just one comment: > > On Mon, 2007-03-12 at 07:07 -0700, Trent Piepho wrote: > > + use = already_uses(a, b); > > + if (!use) { > > + printk(KERN_ERR "module %s trying to un-use a module, %s, which > > " > > + "it is not using", a->name, b->name); > > +return 0; > > + } > > s/return 0/BUG()/. This is potentially quite a nasty bug. Ok, I did that before, I'll change it back. Note that the reference counting isn't perfect when it comes to catching mistakes. The fundamental problem is that when a module is loaded and linked, all the modules that it used symbols from gain a "use". To be symmetric, when a module is unloaded all the modules it used symbols from should lose a "use". Except, there is no record of what modules gained a "use" at link time. Suppose module 1 uses a symbol from module 2. At link time, a module_use that "1 uses 2" is created. Now say 1 does a symbol_put() on something in 2, with no matching get. The "1 uses 2" goes away. When 1 is unloaded, there is no way to tell that "1 uses 2", deleted by the extra put, is missing. If it's wanted, I think I could fix this. I'd have a separate count of static uses vs dynamic uses.From: Trent Piepho <[EMAIL PROTECTED]> Add ability to keep track of callers of symbol_(get|put) When a module uses symbol_get() to increase the ref count of another module, there is no record what module called symbol_get(). A module can show up as having other users, but there is no way to tell who those users are. This adds that ability to symbol_put() and symbol_get(). __symbol_get() and __symbol_put() gain another parameter, which specifies the module that is doing the getting or putting. symbol_put_addr() is renamed to __symbol_put_addr() and has the same parameter added. The module can be NULL, in which case the symbol's owner's refcount is incremented without recording who did it, as was the case before. 
The macros symbol_get(), symbol_put(), and symbol_put_addr() will use THIS_MODULE as the getter/putter and so don't have an extra parameter. A macro symbol_put_user() is added that allows specifying the putting module. The module_use structure that keeps track of one module's use of another gains a count member. The module_use will not go away until the count goes down to zero. The count wasn't necessary before because a module could only use another module once, when the module was linked in, and un-use that module once, when it was unloaded. When a module calls symbol_get() to get a symbol from module that owns the symbol, the ref count of the owning module is _not_ incremented if the getting module was already listed as using the owning module. Rather, the count of that module_use is incremented. When a module is loaded and the kernel module linker is resolving symbols, it will not increment the module_use count for each symbol used, but will just leave it at one. We don't count each symbol resolved, because during module unloading we wouldn't know how many times to decrement the module_use count. When the module is unloaded, the module_use count will only be decremented by one, which should bring it to zero. If it's not zero, then the remaining count is the number of symbol_get()s the module did that were unmatched with a symbol_put(). 
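The counting rule described above can be sketched in userspace C. This is only a model of the behaviour as I read the description, not the patch itself: `struct module_sketch`, `struct use_sketch`, `use_get()` and `use_put()` are hypothetical names, a fixed-size table replaces the kernel's list, and the NULL-user case is left out.

```c
#include <stddef.h>

#define MAX_USES 8	/* arbitrary table size for the sketch */

struct module_sketch {
	const char *name;
	int refcount;
};

struct use_sketch {
	struct module_sketch *user;	/* module doing the get */
	struct module_sketch *owner;	/* module owning the symbol */
	int count;			/* gets not yet matched by a put */
};

static struct use_sketch uses[MAX_USES];

/* Find the live "user uses owner" record, or NULL if there is none. */
static struct use_sketch *find_use(struct module_sketch *user,
				   struct module_sketch *owner)
{
	size_t i;

	for (i = 0; i < MAX_USES; i++)
		if (uses[i].count > 0 && uses[i].user == user &&
		    uses[i].owner == owner)
			return &uses[i];
	return NULL;
}

/* Record a use. The owner's refcount only moves on the first use;
 * later gets just bump the per-use count. Returns -1 if the table is full. */
static int use_get(struct module_sketch *user, struct module_sketch *owner)
{
	struct use_sketch *u = find_use(user, owner);
	size_t i;

	if (u) {
		u->count++;
		return 0;
	}
	for (i = 0; i < MAX_USES; i++) {
		if (uses[i].count == 0) {
			uses[i].user = user;
			uses[i].owner = owner;
			uses[i].count = 1;
			owner->refcount++;
			return 0;
		}
	}
	return -1;
}

/* Drop a use. The record (and the owner's refcount) goes away only when
 * the count reaches zero. Returns -1 if user was not using owner. */
static int use_put(struct module_sketch *user, struct module_sketch *owner)
{
	struct use_sketch *u = find_use(user, owner);

	if (!u)
		return -1;
	if (--u->count == 0)
		owner->refcount--;
	return 0;
}
```

In this model the link-time use is one use_get(), each symbol_get() is another, and whatever count is left when the single unload-time use_put() has run is the number of unmatched symbol_get() calls.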
Signed-off-by: Trent Piepho <[EMAIL PROTECTED]> diff --git a/include/linux/module.h b/include/linux/module.h --- a/include/linux/module.h +++ b/include/linux/module.h @@ -167,9 +167,10 @@ struct notifier_block; #ifdef CONFIG_MODULES /* Get/put a kernel symbol (calls must be symmetric) */ -void *__symbol_get(const char *symbol); +void *__symbol_get(const char *symbol, struct module *user); void *__symbol_get_gpl(const char *symbol); -#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x))) +#define symbol_get(x) ((typeof(&x))(__symbol_get(MODULE_SYMBOL_PREFIX #x, \ + THIS_MODULE))) #ifndef __GENKSYMS__ #ifdef CONFIG_MODVERSIONS @@ -386,9 +387,11 @@ extern void __module_put_and_exit(struct #ifdef CONFIG_MODULE_UNLOAD unsigned int module_refcount(struct module *mod); -void __symbol_put(const char *symbol); -#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x) -void symbol_put_addr(void *addr); +void __symbol_put(const char *symbol, struct module *user); +#define symbol_put(x) __symbol_put(MODULE_SYMBOL_PREFIX #x, THIS_MODULE) +#define symbol_put_user(x,u) __symbol_put(MODULE_SYMBOL_PREFIX #x, (u)) +void __symbol_put_addr(void *addr, struct module *user); +#define symbol_put_addr(x) __symbol_put_addr((x), THIS_MODULE) /* Sometimes we know we already have a refcount, and it's easier not to handle the error case (which only happens with rmmod --wait). */ diff --git a/kernel/module.c b/kernel/module.c --- a/kernel/module.c +++ b/kernel/module.c @@ -516,30 +516,54 @@ struct module_use { struct list
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 17:16 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 17:08, Mike Galbraith wrote:
> > Virtual or physical cores has nothing to do with the interactivity
> > regression I noticed. Two nice 0 tasks which combined used 50% of my
> > box can no longer share that box with two nice 5 tasks and receive the
> > 50% they need to perform. That's it. From there, we wandered off into a
> > discussion on the relative merit and pitfalls of fairness.
>
> And again, with X in its current implementation it is NOT like two nice 0
> tasks at all; it is like one nice 0 task. This is being fixed in the X design
> as we speak.

Shrug. I don't live then, I live now. I have expressed my concerns, and will now switch from talk back to listen mode.

-Mike
Re: _proxy_pda still makes linking modules fail
On Tue, 2007-03-13 at 08:59 +1100, Rusty Russell wrote:
> On Mon, 2007-03-12 at 10:48 +0100, Andi Kleen wrote:
> > > Rusty's pda->per_cpu patch will deal with this once and for all; have
> >
> > Not on x86-64.
>
> Indeed. Perhaps it's time I join the modern world and compile a 64-bit
> kernel...
>
> Will prepare patches,

No, I don't think I will. The PDA concept has gone too far in x86-64 to be undone. In particular, it's been put in GCC 4.1 for CONFIG_CC_STACKPROTECTOR, which assumes %gs:40 will give the stack canary.

For the record: the PDA should never have existed, that's what percpu vars were supposed to be for. Something went wrong here 8(

%gs is best set to the offset of the local cpu's area from the "master" per-cpu area, not set to the local cpu area's address. In the former case, booting with %gs at offset 0 works naturally, in the latter case, hoops need to be jumped through to make it work. See how much nicer the x86 code is post pda->percpu conversion.

So, even if we leave the PDA and place the per-cpu area immediately after it, we still can't use "%gs:var" to access a per-cpu variable: we need to do a subtract, so why bother using the segment reg?

The ideal solution has always been to use __thread, but no architecture has yet managed it (I tried for i386, and it quickly caused unbearable pain). On x86-64 that uses "%fs", not "%gs" as the kernel does, but I might try that if I feel particularly masochistic soon...

In summary, containing the PDA infection to x86-64 is possible, but curing that patient is non-trivial 8)

Rusty.
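The "offset from the master area" scheme argued for in the message above can be modelled in userspace C. This is a rough sketch with made-up names (`areas`, `cpu_offset()`, `per_cpu_var()`); a plain array index stands in for the segment base, and nothing here is taken from the actual pda or percpu code.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH	2
#define AREA_WORDS	8	/* size of one per-cpu area, in longs */

/* The "master" area at the start, followed by one copy per cpu. */
static long areas[(1 + NR_CPUS_SKETCH) * AREA_WORDS];

/* Offset of a cpu's area from the master area. In the scheme described
 * above this is the value the segment register would hold. */
static size_t cpu_offset(int cpu)
{
	return (size_t)(1 + cpu) * AREA_WORDS;
}

/* A variable is named by its slot in the master area; adding the
 * current offset yields that cpu's private copy. */
static long *per_cpu_var(size_t master_slot, size_t offset)
{
	return &areas[master_slot + offset];
}
```

An offset of 0 simply addresses the master copy, which is why early boot works without special handling in this scheme; if the register held the area's address instead, every access would need a subtraction first.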
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mon, 12 Mar 2007, Lee Revell wrote:

> On 3/12/07, David Lang <[EMAIL PROTECTED]> wrote:
> > the problem comes when this isn't enough. if you have several CPU hogs on a
> > system, and they are all around the same priority level, how can the
> > scheduler know which one needs the CPU the most for good interactivity?
> >
> > in some cases you may be able to directly detect that your high-priority
> > process is waiting for another one (tracing pipes and local sockets for
> > example), but what if you are waiting for several of them? (think a
> > multimedia desktop waiting for the sound card, CDRom, hard drive, and video
> > all at once) which one needs the extra CPU the most?
>
> I'm not an expert in this area by any means but after reading this
> thread the OSX solution of simply telling the kernel "I'm the GUI,
> schedule me accordingly" looks increasingly attractive. Why make the
> kernel guess when we can just be explicit?

this can solve the specific problem (and since 'nice' is the natural way to tell the kernel this, it's not even a one-shot solution).

however Linus is right, the real underlying problem is where the user is waiting on a server. if this issue could be solved then a lot of things would benefit.

Con, as a quick hack (probably a bad idea as I'm not a scheduling expert), if a program blocks on another program (via a pipe or socket) could you easily give the rest of the first program's timeslice to the second one, without making it lose its own?

I'm thinking that doing the dumb thing and just throwing a bit more CPU at the thing you are waiting for may work. (assuming that the server process actually does something useful with the extra CPU time it gets)

as far as latencies go, it would be like turning every process on the system into a cpu hog.

David Lang

> Does anyone know of a UNIX-like system that has managed to solve this
> problem without hooking the GUI into the scheduler?
>
> Lee
Re: [PATCH] Fix vmi time header bug
Andrew Morton wrote:
> Really truly? I think we have a _lot_ of declarations which omit the
> section qualifier altogether. How come they don't all break too?

According to the report I have. Perhaps a bogus section qualifier does more damage than an omitted one. I'll get gcc / linker version, but this could be a combination of user error, a strange toolchain, and perhaps a real bug somewhere.

> (ARM (at least) in fact does require the section tagging on the declaration
> as well as the definition, but we've thus far only fixed that in a couple
> of places which were causing breakage).

Yes, I was surprised by this as well, and I'm still skeptical about this being the real cause. Still, this reportedly fixed the problem, and is certainly not a bad thing.

Zach
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 16:53 +1100, Con Kolivas wrote:
> On Tuesday 13 March 2007 16:10, Mike Galbraith wrote:
> > I'm not trying to be pig-headed. I'm of the opinion that fairness is
> > great... until you strictly enforce it wrt interactive tasks.
>
> How about answering my question then since I offered you numerous combinations
> of ways to tackle the problem? The simplest one doesn't even need code, it
> just needs you to alter the nice value that you're already setting.

Hey, you specifically asked me to not choose 5 :) (I mentioned 5 earlier in the thread anyway, so no sense in repeating myself)

-Mike
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tuesday 13 March 2007 17:08, Mike Galbraith wrote:
> Virtual or physical cores has nothing to do with the interactivity
> regression I noticed. Two nice 0 tasks which combined used 50% of my
> box can no longer share that box with two nice 5 tasks and receive the
> 50% they need to perform. That's it. From there, we wandered off into a
> discussion on the relative merit and pitfalls of fairness.

And again, with X in its current implementation it is NOT like two nice 0 tasks at all; it is like one nice 0 task. This is being fixed in the X design as we speak.

> -Mike

--
-ck
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mon, 2007-03-12 at 17:38 -0400, michael chang wrote:
> Perhaps, Mike Galbraith, do you feel that it should be possible to use
> the CPU at 100% for some task and still maintain excellent
> interactivity?

Within reason, yes. Defining "reason" is difficult. As we speak, this is possible to a much greater degree than with RSDL. Before anybody pipes in, yes, I'm very much aware of the down side of the interactivity estimator, I've waged bloody battles with it, and have the t-shirt :)

> That said, I haven't run the test case in particular yet, although I
> will see if I can get the time to do so soon. In any case, I
> personally do have a few qualms about this test case being run on HT
> virtual cores:

Virtual or physical cores has nothing to do with the interactivity regression I noticed. Two nice 0 tasks which combined used 50% of my box can no longer share that box with two nice 5 tasks and receive the 50% they need to perform. That's it. From there, we wandered off into a discussion on the relative merit and pitfalls of fairness.

-Mike
Re: Djprobes questions
Hi Mathieu, Mathieu Desnoyers wrote: > Hi Masami, > > I recently had to add support for inline code patching on i386 to my > marker infrastructure. Clearly, it looks like what is done in djprobes, > with the main difference that I only patch the immediate value of a 2 > bytes "load immediate" instruction. That's interesting. > I think I found a solution to one of the main issues with djprobes : it > currently has to wait for each CPU to hit the probe before being sure > that it's safe to patch the code with something else than an int3. This > is due to PIII errata 49, which says that a CPU much execute a > serializing instruction before executing cross-modified code. Hmm, djprobe already might not wait for each CPU to hit the probe point. It just wait scheduler synchronization instead of that. And after that, it issues cpuid for cache serialization before executing cross-modified code. The most difficult point of the djprobe is that it has to replace "live" instructions. So we must check other processors not to run those instructions carefully. > Here is what I do : While I use a breakpoint to fall in a trap for the > CPUs that hit the site currently being modified, I also send an IPI to > all CPUs so they execute cpuid. Once it returns, I am sure that every > CPU has executed a serializing instruction, which enables me to go on > with the complete code modification, therefore removing the initial > breakpoint. I think its OK. That is the same way which I've done in djprobe. > Here is my code : > > http://ltt.polymtl.ca/cgi-bin/gitweb.cgi?p=linux-2.6-lttng.git;a=blob;f=arch/i386/kernel/marker.c;h=89b06f02f0966685be260d6364a0dd94c3d14456;hb=v2.6.20-lttng > > (Comments are welcome) > > On a second note, looking at the djprobes code triggered some question > in my mind about the safety of using a worker thread to "make sure" > every interrupt context has returned (so there is no IP pointing into > the modified code). 
The following scenario might be possible : an > interrupt handler (or trap handler) reenables interrupts, does irq_exit() > or nmi_exit() (which reenables preemption) but does not do iret yet. My > understanding is that it could be scheduled and have a return IP > pointing to the code that is being modified. Am I right ? Same idea was already discussed. It might work on normal kernel, but, unfortunately, it doesn't work on preemptive kernel. And actually, that idea is same as synchronize_sched(). So, I've used it on normal kernel. In the case of preemptive kernel, currently, I'm using freeze_processes() suggested by Ingo. Anyway, I and Satoshi are developing a static analysis tool to check whether target instructions can be replaced by long jump. I'd like to release djprobe patch against latest kernel after developed it. Best regards, -- Masami HIRAMATSU Linux Technology Center Hitachi, Ltd., Systems Development Laboratory E-mail: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tuesday 13 March 2007 00:53, Con Kolivas wrote: > On Tuesday 13 March 2007 16:10, Mike Galbraith wrote: > > On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote: > > > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > > As soon as your cpu is fully utilized, fairness looses or > > > > interactivity loses. Pick one. > > > > > > That's not true unless you refuse to prioritise your tasks > > > accordingly. Let's take this discussion in a different direction. You > > > already nice your lame processes. Why? You already have the concept > > > that you are prioritising things to normal or background tasks. You > > > say so yourself that lame is a background task. Stating the bleedingly > > > obvious, the unix way of prioritising things is via nice. You already > > > do that. So moving on from that... > > > > Sure. If a user wants to do anything interactive, they can indeed nice > > 19 the rest of their box before they start. > > > > > Your test case you ask "how can I maximise cpu usage". Well you know > > > the answer already. You run two threads. I won't dispute that. > > > > > > The debate seems to be centered on whether two tasks that are niced +5 > > > or to a higher value is background. In my opinion, nice 5 is not > > > background, but relatively less cpu. You already are savvy enough to > > > be using two threads and nicing them. All I ask you to do when using > > > RSDL is to change your expectations slightly and your settings from > > > nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you? > > > > It's not "offensive" to me, it is a behavioral regression. The > > situation as we speak is that you can run cpu intensive tasks while > > watching eye-candy. With RSDL, you can't, you feel the non-interactive > > load instantly. Doesn't the fact that you're asking me to lower my > > expectations tell you that I just might have a point? I do not feel nearly any non-interactive load. See below. 
> > Yet looking at the mainline scheduler code, nice 5 tasks are also supposed > to get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75% > cpu with a fully cpu bound task in the presence of an interactive task. To > me that means mainline is not living up to my expectations. What you're > saying is your expectations are based on a false cpu expectation from nice > 5. You can spin it both ways. It seems to me the only one that lives up to > a defined expectation is to be fair. Anything else is at best vague, and at > worst starvation prone. > > > > Please don't pick 5.none of the above. Please try to work with me on > > > this. > > > > I'm not trying to be pig-headed. I'm of the opinion that fairness is > > great... until you strictly enforce it wrt interactive tasks. > > How about answering my question then since I offered you numerous > combinations of ways to tackle the problem? The simplest one doesn't even > need code, it just needs you to alter the nice value that you're already > setting. Also, just to chime in, I am doing a large project converting over 250GB of FLAC audio to MP3 via lame for my archive conversion. I am using 2.6.20.2-rsdl0.30, and I have 2 processes of flac decoding/lame encoding running simultaneously from a perl script I hacked up on my P-D 830. These processes are both nice'd to 19. I have almost no degredation in latency in my usage of X (which is at nice 0), if that matters at all. Please try what Con is suggesting by adjusting your nice level, and see if that helps you at all. These are just useless arguments, time better spent on coding and fixing real problems, than a flamewar on whether nice 5 is good enough or not. Con's rsdl implements what ingosched was supposed to do, wrt the niceness levels. Perhaps Mike, you are used to the impression ingosched gave you with nice +5, but try something else as Con suggested.. +10, +15, hell, whatever. Is that so hard? 
My 2c, -r -- Rodney "meff" Gordon II -*- [EMAIL PROTECTED] Systems Administrator / Coder Geek -*- Open yourself to OpenSource
Re: SMP performance degradation with sysbench
Anton Blanchard wrote:

Hi Nick,

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
at this, send me a mail (eg. especially with the sched_setscheduler
issue, you might be able to do something better).

I took a look at this today and figured I'd document it:

http://ozlabs.org/~anton/linux/sysbench/

Bottom line: it looks like issues in the glibc malloc library, replacing
it with the google malloc library fixes the negative scaling:

# apt-get install libgoogle-perftools0
# LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Hi Anton, thanks for the report.

glibc certainly has many scalability problems. One of the known problems
is its (ab)use of mmap() to allocate one (yes: one!) page every time you
fopen() a file, and then a munmap() at fclose() time.

mmap()/munmap() should be avoided like hell in multithreaded programs.
Re: [Bugme-new] [Bug 8187] New: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
On Mon, Mar 12, 2007 at 10:19:52PM -0800, Andrew Morton wrote: > > On Mon, 12 Mar 2007 13:30:05 -0700 [EMAIL PROTECTED] wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=8187 > > > >Summary: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801 > > Kernel Version: 2.6.20 > > Status: NEW > > Severity: normal > > Owner: [EMAIL PROTECTED] > > Submitter: [EMAIL PROTECTED] > > > > > > Most recent kernel where this bug did *NOT* occur: > > Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f > > > > Distribution: Slackware 11.0 > > Hardware Environment: HP/Compaq dc5000S (P4, 82801, 82865) > > Software Environment: Xorg 6.9.0 > > Problem Description: > > > > Alan Cox introduced a "PCI: Quirks" patch (git commit > > 368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this > > I82801 platform. Specifically, it causes the PCI initialisation to become > > buggered; Xorg 6.9.0 dumps the following to the console: > > (EE) end of block range 0x177 < begin 0x3f0 > > (EE) end of block range 0x177 < begin 0x3f0 > > (WW) INVALID IO ALLOCATION b: 0x14d0 e: 0x14d7 correcting > > [...] > > Backtrace: > > 0: X(xf86SigHandler+0x8a) [0x8088b2a] > > 1: [0xb7f2b420] > > 2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592] > > 3: X(InitOutput+0xb83) [0x8072713] > > 4: X(main+0x226) [0x80d4496] > > 5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14] > > 6: X [0x806ff61] > > > > Fatal server error: > > Caught signal 11. Server aborting > > > > Steps to reproduce: > > > > Reverting the git commit mentioned above fixes the issue. Apparently, this > > may > > be limited to certain combinations of on-motherboard chipsets, as I haven't > > seen > > many bug reports. Googling shows some people having X11 segfault issues > > with > > 2.6.20 (e.g. freedesktop.org bug #9956) but in most of those cases it's due > > to > > the evdev driver and not PCI initialisation. 
> > > > I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks > > ago > > but have heard nothing, so I'm leaving a bug here instead. > > > > argh. > > Would we break more machines than we fix if we just revert that? I don't know, Alan? thanks, greg k-h - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tuesday 13 March 2007 16:10, Mike Galbraith wrote: > On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote: > > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote: > > > As soon as your cpu is fully utilized, fairness looses or interactivity > > > loses. Pick one. > > > > That's not true unless you refuse to prioritise your tasks > > accordingly. Let's take this discussion in a different direction. You > > already nice your lame processes. Why? You already have the concept > > that you are prioritising things to normal or background tasks. You > > say so yourself that lame is a background task. Stating the bleedingly > > obvious, the unix way of prioritising things is via nice. You already > > do that. So moving on from that... > > Sure. If a user wants to do anything interactive, they can indeed nice > 19 the rest of their box before they start. > > > Your test case you ask "how can I maximise cpu usage". Well you know > > the answer already. You run two threads. I won't dispute that. > > > > The debate seems to be centered on whether two tasks that are niced +5 > > or to a higher value is background. In my opinion, nice 5 is not > > background, but relatively less cpu. You already are savvy enough to > > be using two threads and nicing them. All I ask you to do when using > > RSDL is to change your expectations slightly and your settings from > > nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you? > > It's not "offensive" to me, it is a behavioral regression. The > situation as we speak is that you can run cpu intensive tasks while > watching eye-candy. With RSDL, you can't, you feel the non-interactive > load instantly. Doesn't the fact that you're asking me to lower my > expectations tell you that I just might have a point? 
Yet looking at the mainline scheduler code, nice 5 tasks are also supposed to get 75% cpu compared to nice 0 tasks, however I cannot seem to get 75% cpu with a fully cpu bound task in the presence of an interactive task. To me that means mainline is not living up to my expectations. What you're saying is your expectations are based on a false cpu expectation from nice 5. You can spin it both ways. It seems to me the only one that lives up to a defined expectation is to be fair. Anything else is at best vague, and at worst starvation prone. > > Please don't pick 5.none of the above. Please try to work with me on > > this. > > I'm not trying to be pig-headed. I'm of the opinion that fairness is > great... until you strictly enforce it wrt interactive tasks. How about answering my question then since I offered you numerous combinations of ways to tackle the problem? The simplest one doesn't even need code, it just needs you to alter the nice value that you're already setting. -- -ck - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
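For readers following the 75% figure, here is a minimal sketch of the arithmetic. It assumes the static timeslice scaling of the mainline O(1) scheduler of this era, with constants as recalled from the 2.6.20-era kernel/sched.c and expressed directly in milliseconds. The function names are made up for illustration, and the sketch covers only the static timeslice, not the interactivity bonus the thread is arguing about.

```c
#include <assert.h>

/* Assumed values modelled on 2.6.20-era kernel/sched.c, in milliseconds. */
#define MIN_TIMESLICE_MS 5
#define DEF_TIMESLICE_MS 100
#define MAX_RT_PRIO      100
#define MAX_PRIO         (MAX_RT_PRIO + 40)	/* 140 */
#define MAX_USER_PRIO    40

/* Map a nice value (-20..19) to a static priority (100..139). */
static int nice_to_prio(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return MAX_RT_PRIO + nice + 20;
}

/* Scale a base timeslice by how far prio is from the lowest priority. */
static int scale_prio(int base_ms, int prio)
{
	int slice = base_ms * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2);

	return slice > MIN_TIMESLICE_MS ? slice : MIN_TIMESLICE_MS;
}

/* Hypothetical helper: static timeslice in ms for a given nice level. */
int timeslice_ms(int nice)
{
	int prio = nice_to_prio(nice);

	/* Negative nice levels scale from four times the default slice. */
	if (prio < nice_to_prio(0))
		return scale_prio(DEF_TIMESLICE_MS * 4, prio);
	return scale_prio(DEF_TIMESLICE_MS, prio);
}
```

Under these assumptions timeslice_ms(0) is 100 and timeslice_ms(5) is 75, which is where the "nice 5 gets 75% of nice 0" expectation comes from. Nice 10, 15 and 19 come out at 50, 25 and 5 ms.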
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Tuesday 13 March 2007, Willy Tarreau wrote:
>On Tue, Mar 13, 2007 at 12:04:42AM -0400, Gene Heskett wrote:
>> On Monday 12 March 2007, Nish Aravamudan wrote:
>> >On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>> >> On Monday 12 March 2007, Douglas McNaught wrote:
>> >> >Patrick Mau <[EMAIL PROTECTED]> writes:
>> >> >> Why not temporarily replace "/bin/tar" with a shell script that
>> >> >> does:
>> >> >>
>> >> >> #!/bin/sh
>> >> >> exec strace -f -o output /bin/real.tar $@
>> >> >
>> >> >You beat me to it. :) I've done that before; it's a great
>> >> > suggestion.
>> >> >
>> >> >Except that if you expect 'tar' to be invoked multiple times in a
>> >> > run, you should probably use 'output.$$' for the output filename
>> >> > so things don't get clobbered.
>> >> >
>> >> >-Doug
>> >>
>> >> In my case, Doug, it will get invoked 64 times, amanda does a dummy
>> >> run to get an estimate, calculates what to do based on that output
>> >> which is 32 runs, 1 per disklist entry and I have 32, and then
>> >> reruns tar with the appropriate level options against each
>> >> individual disklist entry.
>> >>
>> >> But I'm puzzled a bit, what does the double $$ do?, or is it buried
>> >> someplace in the bash manpage? It's not something I've stumbled
>> >> over yet.
>> >
>> >buried indeed:
>> >
>> >"Special Parameters:
>> > ...
>> > $ Expands to the process ID of the shell. In a ()
>> > subshell, it expands to the process ID of the current shell,
>> > not the subshell.
>> >"
>>
>> Well, that's clear enough, but what of the double $$ case? Would this
>> then make a PID unique to each invocation until it finally wraps a 16
>> bit value, or will the kernel re-use them because they won't all be
>> running simultaneously, but limited by the number of unique 'spindle'
>> numbers on the system, this to prevent as best as it can, the
>> thrashing of a drive by having tar working on 2 separate (or more)
>> partitions at the same time. In my case 2 are possible, as /var is on
>> a separate drive.
>
>Yes there is a risk of wrapping, but it is very small. You can add the
> command line arguments to the file name if you want, like this :
>
>#!/bin/sh
>exec strace -f -o "output.$$.${*//\//_}" /bin/real.tar $@
>
>It will name the output file "output.<pid>.<args>", replacing slashes
> with underscores. This is very dirty but can help.
>
Excellent Willy, thanks.

>Cheers,
>Willy

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Whatever doesn't succeed in two months and a half in California will
never succeed.
		-- Rev. Henry Durant, founder of the University of California
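To make the naming scheme in the quoted wrapper concrete, here is a small C sketch of what "output.$$.${*//\//_}" expands to, assuming the default IFS (arguments joined with single spaces). Note that the ${*//\//_} substitution is a bash extension rather than POSIX sh, so the one-liner above relies on /bin/sh being bash. The function name build_trace_name is hypothetical and is not part of strace or amanda.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "output.<pid>.<args>" in buf: argv[1..argc-1] joined with single
 * spaces, with every '/' replaced by '_'.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_trace_name(char *buf, size_t len, long pid,
		     int argc, char *const argv[])
{
	int n = snprintf(buf, len, "output.%ld.", pid);
	size_t pos;
	int i;

	if (n < 0 || (size_t)n >= len)
		return -1;
	pos = (size_t)n;

	for (i = 1; i < argc; i++) {
		const char *p;

		if (i > 1) {
			/* Need room for the separator and the final NUL. */
			if (pos + 1 >= len)
				return -1;
			buf[pos++] = ' ';
		}
		for (p = argv[i]; *p; p++) {
			if (pos + 1 >= len)
				return -1;
			buf[pos++] = (*p == '/') ? '_' : *p;
		}
	}
	buf[pos] = '\0';
	return 0;
}
```

With PID 1234 and the arguments -cf /tmp/a.tar /etc, this produces "output.1234.-cf _tmp_a.tar _etc". The spaces survive, which is part of why the trick is described as dirty.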
Re: [PATCH] cleanfile: a script to clean up stealth whitespace
H. Peter Anvin wrote:

Fair enough. It'd be nice to have a clean-up-a-patch version of this. So
it does all these things, except it only changes lines which start with
^+.

It can do everything except kill empty lines at the end of the file; a
patch simply doesn't contain enough information to know if blank lines
are inserted at the end of a file as opposed to in the middle of the
file. It can, of course, be done if the unpatched material is available,
probably by applying the patch and seeing what happens.

Correction: for a context/unified diff it can be done by observing that
there is no context left at the end of the file. It won't work if the
file already has empty space at the end of it, but that's probably good
enough. I'll cook something up.

	-hpa
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
On Tue, Mar 13, 2007 at 10:57:16AM +0530, Gautham R Shenoy wrote:
> CPU_DEAD:
>	thaw_process(p);
>	kthread_stop(p);
>	p = NULL;

This needn't guarantee that the thread will see the stop request before
it exits the kthread_should_stop_freeze() function. There will always be
races ..

So the only safe way for a thread to know whether it is time to exit is:

	while (!kthread_should_stop_freeze()) {
		if (!cpu_online(home_cpu))
			goto wait_to_die;
		...
	}

wait_to_die:
	while (!kthread_should_stop()) {
		/* sleep */
	}

--
Regards,
vatsa
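The loop structure above can be read as a small state machine. The following is a userspace sketch only, not kernel code: the enum and kt_step() are hypothetical names, and the two flags stand in for what kthread_should_stop() and cpu_online(home_cpu) would return at that instant. The property it is meant to show is that the thread never exits on its own; it only exits after a stop request has been seen.

```c
#include <assert.h>
#include <stdbool.h>

enum kt_state { KT_RUNNING, KT_WAIT_TO_DIE, KT_EXITED };

/* One pass through the thread's loop, given its view of the two flags. */
enum kt_state kt_step(enum kt_state s, bool stop_requested, bool cpu_online)
{
	switch (s) {
	case KT_RUNNING:
		if (stop_requested)
			return KT_EXITED;	/* main loop condition fails */
		if (!cpu_online)
			return KT_WAIT_TO_DIE;	/* goto wait_to_die */
		return KT_RUNNING;		/* normal work */
	case KT_WAIT_TO_DIE:
		/* Park until somebody actually asks us to stop. */
		return stop_requested ? KT_EXITED : KT_WAIT_TO_DIE;
	default:
		return KT_EXITED;
	}
}
```

In this model a CPU going offline moves the thread to KT_WAIT_TO_DIE, where it stays however many times it is stepped, until the stop request arrives.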
Re: [PATCH] cleanfile: a script to clean up stealth whitespace
Andrew Morton wrote:

On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

This script cleans up various classes of stealth whitespace. In
particular, it cleans up:

- Whitespace (spaces or tabs) before newline;
- DOS line endings (CR before LF);
- Space before tab (spaces are deleted or converted to tabs);
- Empty lines at end of file.

Fair enough. It'd be nice to have a clean-up-a-patch version of this. So
it does all these things, except it only changes lines which start with
^+.

It can do everything except kill empty lines at the end of the file; a
patch simply doesn't contain enough information to know if blank lines
are inserted at the end of a file as opposed to in the middle of the
file. It can, of course, be done if the unpatched material is available,
probably by applying the patch and seeing what happens.

Let me know if you still want it; I'll whip it up.

	-hpa
Re: [PATCH] Fix vmi time header bug
> On Mon, 12 Mar 2007 14:58:08 -0800 Zachary Amsden <[EMAIL PROTECTED]> wrote: > Some gcc put this function in .init.text because the header didn't > match. For 2.6.21-rc. > > Zach > > > [vmi-devinit-header-fix.patch text/plain (606B)] > > > Index: linux-2.6.21/include/asm-i386/vmi_time.h > === > --- linux-2.6.21.orig/include/asm-i386/vmi_time.h 2007-03-06 > 18:56:03.0 -0800 > +++ linux-2.6.21/include/asm-i386/vmi_time.h 2007-03-12 13:55:16.0 > -0800 > @@ -54,7 +54,7 @@ extern unsigned long vmi_cpu_khz(void); > > #ifdef CONFIG_X86_LOCAL_APIC > extern void __init vmi_timer_setup_boot_alarm(void); > -extern void __init vmi_timer_setup_secondary_alarm(void); > +extern void __devinit vmi_timer_setup_secondary_alarm(void); > extern void apic_vmi_timer_interrupt(void); > #endif Really truly? I think we have a _lot_ of declarations which omit the section qualifier altogether. How come they don't all break too? (ARM (at least) in fact does require the section tagging on the declaration as well as the definition, but we've thus far only fixed that in a couple of places which were causing breakage). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.30 cpu scheduler for mainline kernels
From: Willy Tarreau <[EMAIL PROTECTED]> Date: Tue, 13 Mar 2007 05:32:07 +0100 > On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote: > > On Tuesday 13 March 2007 10:46, David Miller wrote: > > > From: Con Kolivas <[EMAIL PROTECTED]> > > > Date: Mon, 12 Mar 2007 10:58:11 +1100 > > > > > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0. > > > >30.patch > > > > > > FWIW, this boots and seems to work well on sparc64. Tested > > > on UP SunBlade1500 and 24cpu Niagara T1000. > > > > Very nice. Thanks for the feedback and I'm sorry you have to work with such > > lousy hardware. > > BTW, I don't know if you say this as a joke, He was definitely being sarcastic, relax :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
On Sun, Mar 11, 2007 at 06:49:08PM +0100, Rafael J. Wysocki wrote: > On Saturday, 3 March 2007 18:32, Oleg Nesterov wrote: > > On 03/02, Paul E. McKenney wrote: > > > > > > On Sat, Mar 03, 2007 at 02:33:37AM +0300, Oleg Nesterov wrote: > > > > On 03/02, Paul E. McKenney wrote: > > > > > > > > > > One way to embed try_to_freeze() into kthread_should_stop() might be > > > > > as follows: > > > > > > > > > > int kthread_should_stop(void) > > > > > { > > > > > if (kthread_stop_info.k == current) > > > > > return 1; > > > > > try_to_freeze(); > > > > > return 0; > > > > > } > > > > > > > > I think this is dangerous. For example, worker_thread() will probably > > > > need some special actions after return from refrigerator. Also, a kernel > > > > thread may check kthread_should_stop() in the place where > > > > try_to_freeze() > > > > is not safe. > > > > > > > > Perhaps we should introduce a new helper which does this. > > > > > > Good point -- the return value from try_to_freeze() is lost if one uses > > > the above approach. About one third of the calls to try_to_freeze() > > > in 2.6.20 pay attention to the return value. > > > > > > One approach would be to have a kthread_should_stop_nofreeze() for those > > > cases, and let the default be to try to freeze. > > > > I personally think we should do the opposite, add > > kthread_should_stop_check_freeze() > > or something. kthread_should_stop() is like signal_pending(), we can use > > it under spin_lock (and it is probably used this way by some out-of-tree > > driver). The new helper is obviously "might_sleep()". 
> > Something like this, perhaps: > > include/linux/kthread.h |1 + > kernel/kthread.c| 16 > kernel/rcutorture.c |5 ++--- > 3 files changed, 19 insertions(+), 3 deletions(-) > > Index: linux-2.6.21-rc3-mm2/kernel/kthread.c > === > --- linux-2.6.21-rc3-mm2.orig/kernel/kthread.c2007-03-08 > 21:58:48.0 +0100 > +++ linux-2.6.21-rc3-mm2/kernel/kthread.c 2007-03-11 18:32:59.0 > +0100 > @@ -13,6 +13,7 @@ > #include > #include > #include > +#include > #include > > /* > @@ -60,6 +61,21 @@ int kthread_should_stop(void) > } > EXPORT_SYMBOL(kthread_should_stop); > > +/** > + * kthread_should_stop_check_freeze - check if the thread should return now > and > + * if not, check if there is a freezing request pending for it. > + */ > +int kthread_should_stop_check_freeze(void) > +{ > + might_sleep(); > + if (kthread_stop_info.k == current) > + return 1; > + > + try_to_freeze(); > + return 0; > +} > +EXPORT_SYMBOL(kthread_should_stop_check_freeze); I would prefer to have try_to_freeze() followed by the kthread_stop_info.k check. Something like if (try_to_freeze()) /*some barrier ensuring all writes are completed */ if (kthread_stop_info.k == current) return 1; return 0; This would be helpful in situations (atleast for cpu-hotplug) where we want to stop a frozen thread immediately after thawing it. Something like CPU_DEAD: thaw_process(p); kthread_stop(p); p = NULL; Is there a problem with this line of thinking ? thanks and regards gautham. -- Gautham R Shenoy Linux Technology Center IBM India. "Freedom comes with a price tag of responsibility, which is still a bargain, because Freedom is priceless!" - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
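The difference between the two orderings discussed above can be illustrated with a toy model. This is a userspace sketch with hypothetical names, not the patch itself. It assumes, for illustration only, that the stop request is already visible by the time the thread leaves the refrigerator, which the thread notes is not guaranteed, and it leaves out the barrier mentioned in the suggestion.

```c
#include <assert.h>
#include <stdbool.h>

struct kt_model {
	bool freeze_pending;	/* a freeze request is waiting for us */
	bool stop_on_thaw;	/* assumption: stop is visible once thawed */
	bool stop_requested;	/* stands in for kthread_stop_info.k == current */
};

/* Model of try_to_freeze(): returns true if the thread was frozen. */
static bool model_try_to_freeze(struct kt_model *t)
{
	if (!t->freeze_pending)
		return false;
	t->freeze_pending = false;	/* frozen, then thawed */
	if (t->stop_on_thaw)
		t->stop_requested = true;
	return true;
}

/* Ordering in the posted patch: check for stop, then try to freeze. */
int check_then_freeze(struct kt_model *t)
{
	if (t->stop_requested)
		return 1;
	model_try_to_freeze(t);
	return 0;
}

/* Suggested ordering: try to freeze first, then check for stop. */
int freeze_then_check(struct kt_model *t)
{
	model_try_to_freeze(t);
	return t->stop_requested ? 1 : 0;
}
```

In this model, a thread that is frozen and then stopped right after thawing reports "stop" on the same call with freeze_then_check(), whereas check_then_freeze() returns 0 once more and only notices the request on its next call.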
Re: Removal of multipath cached (was Re: [PATCH] [REVISED] net/ipv4/multipath_wrandom.c: check kmalloc() return value.)
> On Mon, 12 Mar 2007 13:53:11 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote:
> From: Jarek Poplawski <[EMAIL PROTECTED]>
> Date: Mon, 12 Mar 2007 12:51:37 +0100
>
> > But until then it'll unnecessarily spoil linux opinion as regards
> > stability and waste time of developers to check error messages.
> > So, maybe it's less evil to check those NULLs where possible and add
> > some WARN_ONs here and there...
>
> It's a crash either way, so zero improvement.
>
> And _THIS_ is my big problem with the multi-path cached code in the
> kernel.
>
> NOBODY wants to step up and fix the code, but people refuse to let it
> get removed from the tree. That is totally unacceptable, so I'm going
> to FIX THIS.
>
> I'm going to FIX IT by saying that if nobody steps up to the plate to
> fix the multipath cached code by 2.6.23 IT IS GONE forever.
>
> And there is absolutely no negotiations about this, I've held back on
> this for nearly 2 years, and nothing has happened, this code is not
> maintained, nobody cares enough to fix the bugs, and even no
> distributions enable it because it causes crashes.

Good stuff. I suggest you put a big printk explaining the above into
2.6.21.
Need help on mach-ep93xx
Hi,

I have one question about mach-ep93xx.

In the EP93xx IRQ handling part in core.c, the 2.6.19.2 kernel and newer
kernels configure the 16 interrupts of ports A and B together. The code
does not use the interrupt capability of port F, which can provide 3
interrupts.

Why is port F not configured for interrupts?

Thanks in advance,
Maxin B. John
Re: [Bugme-new] [Bug 8187] New: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
> On Mon, 12 Mar 2007 13:30:05 -0700 [EMAIL PROTECTED] wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=8187 > >Summary: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801 > Kernel Version: 2.6.20 > Status: NEW > Severity: normal > Owner: [EMAIL PROTECTED] > Submitter: [EMAIL PROTECTED] > > > Most recent kernel where this bug did *NOT* occur: > Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f > > Distribution: Slackware 11.0 > Hardware Environment: HP/Compaq dc5000S (P4, 82801, 82865) > Software Environment: Xorg 6.9.0 > Problem Description: > > Alan Cox introduced a "PCI: Quirks" patch (git commit > 368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this > I82801 platform. Specifically, it causes the PCI initialisation to become > buggered; Xorg 6.9.0 dumps the following to the console: > (EE) end of block range 0x177 < begin 0x3f0 > (EE) end of block range 0x177 < begin 0x3f0 > (WW) INVALID IO ALLOCATION b: 0x14d0 e: 0x14d7 correcting > [...] > Backtrace: > 0: X(xf86SigHandler+0x8a) [0x8088b2a] > 1: [0xb7f2b420] > 2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592] > 3: X(InitOutput+0xb83) [0x8072713] > 4: X(main+0x226) [0x80d4496] > 5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14] > 6: X [0x806ff61] > > Fatal server error: > Caught signal 11. Server aborting > > Steps to reproduce: > > Reverting the git commit mentioned above fixes the issue. Apparently, this > may > be limited to certain combinations of on-motherboard chipsets, as I haven't > seen > many bug reports. Googling shows some people having X11 segfault issues with > 2.6.20 (e.g. freedesktop.org bug #9956) but in most of those cases it's due to > the evdev driver and not PCI initialisation. > > I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks > ago > but have heard nothing, so I'm leaving a bug here instead. > argh. Would we break more machines than we fix if we just revert that? 
Re: [PATCH] cleanfile: a script to clean up stealth whitespace
> On Mon, 12 Mar 2007 12:16:30 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:
> This script cleans up various classes of stealth whitespace. In
> particular, it cleans up:
>
> - Whitespace (spaces or tabs) before newline;
> - DOS line endings (CR before LF);
> - Space before tab (spaces are deleted or converted to tabs);
> - Empty lines at end of file.

Fair enough.

It'd be nice to have a clean-up-a-patch version of this. So it does all
these things, except it only changes lines which start with ^+.
Re: SMP performance degradation with sysbench
Anton Blanchard wrote:

Hi Nick,

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
at this, send me a mail (eg. especially with the sched_setscheduler
issue, you might be able to do something better).

I took a look at this today and figured I'd document it:

http://ozlabs.org/~anton/linux/sysbench/

Bottom line: it looks like issues in the glibc malloc library, replacing
it with the google malloc library fixes the negative scaling:

# apt-get install libgoogle-perftools0
# LD_PRELOAD=/usr/lib/libtcmalloc.so /usr/sbin/mysqld

Hi Anton,

Very cool. Yeah I had come to the conclusion that it wasn't a kernel
issue, and basically was afraid to look into userspace ;)

That bogus setscheduler thing must surely have never worked, though. I
wonder if FreeBSD avoids the scalability issue because it is using
SCHED_RR there, or because it has a decent threaded malloc
implementation.

--
SUSE Labs, Novell Inc.
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Tue, 2007-03-13 at 09:51 +1100, Con Kolivas wrote: > On 13/03/07, Mike Galbraith <[EMAIL PROTECTED]> wrote: > > As soon as your cpu is fully utilized, fairness looses or interactivity > > loses. Pick one. > > That's not true unless you refuse to prioritise your tasks > accordingly. Let's take this discussion in a different direction. You > already nice your lame processes. Why? You already have the concept > that you are prioritising things to normal or background tasks. You > say so yourself that lame is a background task. Stating the bleedingly > obvious, the unix way of prioritising things is via nice. You already > do that. So moving on from that... Sure. If a user wants to do anything interactive, they can indeed nice 19 the rest of their box before they start. > Your test case you ask "how can I maximise cpu usage". Well you know > the answer already. You run two threads. I won't dispute that. > > The debate seems to be centered on whether two tasks that are niced +5 > or to a higher value is background. In my opinion, nice 5 is not > background, but relatively less cpu. You already are savvy enough to > be using two threads and nicing them. All I ask you to do when using > RSDL is to change your expectations slightly and your settings from > nice 5 to nice 10 or 15 or even 19. Why is that so offensive to you? It's not "offensive" to me, it is a behavioral regression. The situation as we speak is that you can run cpu intensive tasks while watching eye-candy. With RSDL, you can't, you feel the non-interactive load instantly. Doesn't the fact that you're asking me to lower my expectations tell you that I just might have a point? > Please don't pick 5.none of the above. Please try to work with me on this. I'm not trying to be pig-headed. I'm of the opinion that fairness is great... until you strictly enforce it wrt interactive tasks. 
-Mike
Re: [RFC][PATCH 2/7] RSS controller core
> On Mon, 12 Mar 2007 23:41:29 +0100 Herbert Poetzl <[EMAIL PROTECTED]> wrote: > On Mon, Mar 12, 2007 at 11:42:59AM -0700, Dave Hansen wrote: > > How about we drill down on these a bit more. > > > > On Mon, 2007-03-12 at 02:00 +0100, Herbert Poetzl wrote: > > > - shared mappings of 'shared' files (binaries > > >and libraries) to allow for reduced memory > > >footprint when N identical guests are running > > > > So, it sounds like this can be phrased as a requirement like: > > > > "Guests must be able to share pages." > > > > Can you give us an idea why this is so? > > sure, one reason for this is that guests tend to > be similar (or almost identical) which results > in quite a lot of 'shared' libraries and executables > which would otherwise get cached for each guest and > would also be mapped for each guest separately nooo. What you're saying there amounts to text replication. There is no proposal here to create duplicated copies of pagecache pages: the VM just doesn't support that (Nick has some protopatches which do this as a possible NUMA optimisation). So these mmapped pages will continue to be shared across all guests. The problem boils down to "which guest(s) get charged for each shared page". A simple and obvious and easy-to-implement answer is "the guest which paged it in". I think we should firstly explain why that is insufficient.
Re: [ck] Re: RSDL v0.30 cpu scheduler for mainline kernels
On 3/13/07, Willy Tarreau <[EMAIL PROTECTED]> wrote: On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote: > On Tuesday 13 March 2007 10:46, David Miller wrote: > > From: Con Kolivas <[EMAIL PROTECTED]> > > Date: Mon, 12 Mar 2007 10:58:11 +1100 > > > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch > > > > FWIW, this boots and seems to work well on sparc64. Tested > > on UP SunBlade1500 and 24cpu Niagara T1000. > > Very nice. Thanks for the feedback and I'm sorry you have to work with such > lousy hardware. BTW, I don't know if you say this as a joke, but those are not necessarily lousy hardware. Sun does lousy hardware when they put Sparcs in PCs (ultra5, ultra10, blade100). But their servers generally are nice with large memory busses and very scalable SMP architectures. I guess Con was kidding. A 24-CPU system can be anything but lousy hardware.
Re: [PATCH 1/2] Fix some coding-style errors in autofs
On Mon, 12 Mar 2007 [EMAIL PROTECTED] wrote: > From: Sukadev Bhattiprolu <[EMAIL PROTECTED]> > Subject: [PATCH 1/2] Fix some coding-style errors in autofs > > Fix coding style errors (extra spaces, long lines) in autofs > and autofs4 files being modified for container/pidspace issues. > > --- > fs/autofs/inode.c | 29 +++ > fs/autofs/root.c | 77 > ++--- > fs/autofs4/inode.c | 16 --- > fs/autofs4/root.c | 18 ++-- > 4 files changed, 70 insertions(+), 70 deletions(-) > > Index: lx26-20-mm2c/fs/autofs/inode.c > === > --- lx26-20-mm2c.orig/fs/autofs/inode.c 2007-02-28 14:48:35.0 > -0800 > +++ lx26-20-mm2c/fs/autofs/inode.c2007-02-28 15:47:09.0 -0800 > @@ -34,12 +34,12 @@ void autofs_kill_sb(struct super_block * > > autofs_hash_nuke(sbi); > - for ( n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++ ) { > - if ( test_bit(n, sbi->symlink_bitmap) ) > + for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) { > + if (test_bit(n, sbi->symlink_bitmap)) > kfree(sbi->symlink[n].data); > } Please do a complete job on the 'for' line by eliminating the space before each semi-colon. -- ~Randy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
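The 'for' line Randy points at still has a space before each semicolon. A rough sketch of a check for that pattern, assuming a POSIX shell and grep; find_space_semicolon is a hypothetical helper, not an existing kernel script, and the pattern is deliberately simple, so it will miss 'for' statements that span several lines:

```sh
#!/bin/sh
# find_space_semicolon [FILE...]
# Print, with line numbers, 'for' statements that have a space directly
# before a semicolon. Reads standard input when no FILE is given.
find_space_semicolon() {
    grep -n 'for *(.* ;' "$@"
}
```

It could be pointed at the files touched by the patch, for example: find_space_semicolon fs/autofs/inode.c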
[PATCH 2/2] Replace pid_t in autofs with struct pid reference
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Subject: [PATCH 2/2] Replace pid_t in autofs with struct pid reference. Make autofs container-friendly by caching struct pid reference rather than pid_t and using pid_nr() to retreive a task's pid_t. ChangeLog: - Fix Eric Biederman's comments - Use find_get_pid() to hold a reference to oz_pgrp and release while unmounting; separate out changes to autofs and autofs4. - Fix Cedric's comments: retain old prototype of parse_options() and move necessary change to its caller. Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Cc: Cedric Le Goater <[EMAIL PROTECTED]> Cc: Dave Hansen <[EMAIL PROTECTED]> Cc: Serge Hallyn <[EMAIL PROTECTED]> Cc: Eric Biederman <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED] Acked-by: Eric W. Biederman <[EMAIL PROTECTED]> --- fs/autofs/autofs_i.h |4 ++-- fs/autofs/inode.c| 20 fs/autofs/root.c |6 -- 3 files changed, 22 insertions(+), 8 deletions(-) Index: lx26-21-rc3-mm2/fs/autofs/autofs_i.h === --- lx26-21-rc3-mm2.orig/fs/autofs/autofs_i.h 2007-03-12 17:12:05.0 -0700 +++ lx26-21-rc3-mm2/fs/autofs/autofs_i.h2007-03-12 17:18:55.0 -0700 @@ -101,7 +101,7 @@ struct autofs_symlink { struct autofs_sb_info { u32 magic; struct file *pipe; - pid_t oz_pgrp; + struct pid *oz_pgrp; int catatonic; struct super_block *sb; unsigned long exp_timeout; @@ -122,7 +122,7 @@ static inline struct autofs_sb_info *aut filesystem without "magic".) 
*/ static inline int autofs_oz_mode(struct autofs_sb_info *sbi) { - return sbi->catatonic || process_group(current) == sbi->oz_pgrp; + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; } /* Hash operations */ Index: lx26-21-rc3-mm2/fs/autofs/inode.c === --- lx26-21-rc3-mm2.orig/fs/autofs/inode.c 2007-03-12 17:18:48.0 -0700 +++ lx26-21-rc3-mm2/fs/autofs/inode.c 2007-03-12 17:18:55.0 -0700 @@ -37,6 +37,8 @@ void autofs_kill_sb(struct super_block * if (!sbi->catatonic) autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */ + put_pid(sbi->oz_pgrp); + autofs_hash_nuke(sbi); for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) { if (test_bit(n, sbi->symlink_bitmap)) @@ -139,6 +141,7 @@ int autofs_fill_super(struct super_block int pipefd; struct autofs_sb_info *sbi; int minproto, maxproto; + pid_t pgid; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); if (!sbi) @@ -150,7 +153,6 @@ int autofs_fill_super(struct super_block sbi->pipe = NULL; sbi->catatonic = 1; sbi->exp_timeout = 0; - sbi->oz_pgrp = process_group(current); autofs_initialize_hash(&sbi->dirhash); sbi->queues = NULL; memset(sbi->symlink_bitmap, 0, sizeof(long)*AUTOFS_SYMLINK_BITMAP_LEN); @@ -171,7 +173,7 @@ int autofs_fill_super(struct super_block /* Can this call block? - WTF cares? s is locked. 
*/ if (parse_options(data, &pipefd, &root_inode->i_uid, - &root_inode->i_gid, &sbi->oz_pgrp, &minproto, + &root_inode->i_gid, &pgid, &minproto, &maxproto)) { printk("autofs: called with bogus options\n"); goto fail_dput; @@ -184,13 +186,21 @@ int autofs_fill_super(struct super_block goto fail_dput; } - DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, sbi->oz_pgrp)); + DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, pgid)); + sbi->oz_pgrp = find_get_pid(pgid); + + if (!sbi->oz_pgrp) { + printk("autofs: could not find process group %d\n", pgid); + goto fail_dput; + } + pipe = fget(pipefd); if (!pipe) { printk("autofs: could not open pipe file descriptor\n"); - goto fail_dput; + goto fail_put_pid; } + if (!pipe->f_op || !pipe->f_op->write) goto fail_fput; sbi->pipe = pipe; @@ -205,6 +215,8 @@ int autofs_fill_super(struct super_block fail_fput: printk("autofs: pipe file descriptor does not contain proper ops\n"); fput(pipe); +fail_put_pid: + put_pid(sbi->oz_pgrp); fail_dput: dput(root); goto fail_free; Index: lx26-21-rc3-mm2/fs/autofs/root.c === --- lx26-21-rc3-mm2.orig/fs/autofs/root.c 2007-03-12 17:18:48.0 -0700 +++ lx26-21-rc3-mm2/fs/autofs/root.c2007-03-12 17:18:55.0 -0700 @@ -213,8 +213,10 @@ static struct dentry *autofs_root_lookup sbi = autofs_sbi(dir->i_sb); oz_mode = autof
[PATCH 1/2] Fix some coding-style errors in autofs
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Subject: [PATCH 1/2] Fix some coding-style errors in autofs Fix coding style errors (extra spaces, long lines) in autofs and autofs4 files being modified for container/pidspace issues. Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Cc: Cedric Le Goater <[EMAIL PROTECTED]> Cc: Dave Hansen <[EMAIL PROTECTED]> Cc: Serge Hallyn <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED] Cc: Eric W. Biederman <[EMAIL PROTECTED]> --- fs/autofs/inode.c | 29 +++ fs/autofs/root.c | 77 ++--- fs/autofs4/inode.c | 16 --- fs/autofs4/root.c | 18 ++-- 4 files changed, 70 insertions(+), 70 deletions(-) Index: lx26-20-mm2c/fs/autofs/inode.c === --- lx26-20-mm2c.orig/fs/autofs/inode.c 2007-02-28 14:48:35.0 -0800 +++ lx26-20-mm2c/fs/autofs/inode.c 2007-02-28 15:47:09.0 -0800 @@ -34,12 +34,12 @@ void autofs_kill_sb(struct super_block * if (!sbi) goto out_kill_sb; - if ( !sbi->catatonic ) + if (!sbi->catatonic) autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */ autofs_hash_nuke(sbi); - for ( n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++ ) { - if ( test_bit(n, sbi->symlink_bitmap) ) + for (n = 0 ; n < AUTOFS_MAX_SYMLINKS ; n++) { + if (test_bit(n, sbi->symlink_bitmap)) kfree(sbi->symlink[n].data); } @@ -69,7 +69,8 @@ static match_table_t autofs_tokens = { {Opt_err, NULL} }; -static int parse_options(char *options, int *pipefd, uid_t *uid, gid_t *gid, pid_t *pgrp, int *minproto, int *maxproto) +static int parse_options(char *options, int *pipefd, uid_t *uid, gid_t *gid, + pid_t *pgrp, int *minproto, int *maxproto) { char *p; substring_t args[MAX_OPT_ARGS]; @@ -140,7 +141,7 @@ int autofs_fill_super(struct super_block int minproto, maxproto; sbi = kzalloc(sizeof(*sbi), GFP_KERNEL); - if ( !sbi ) + if (!sbi) goto fail_unlock; DPRINTK(("autofs: starting up, sbi = %p\n",sbi)); @@ -169,14 +170,16 @@ int autofs_fill_super(struct super_block goto fail_iput; /* Can this call block? - WTF cares? s is locked. 
*/ - if ( parse_options(data,&pipefd,&root_inode->i_uid,&root_inode->i_gid,&sbi->oz_pgrp,&minproto,&maxproto) ) { + if (parse_options(data, &pipefd, &root_inode->i_uid, + &root_inode->i_gid, &sbi->oz_pgrp, &minproto, + &maxproto)) { printk("autofs: called with bogus options\n"); goto fail_dput; } /* Couldn't this be tested earlier? */ - if ( minproto > AUTOFS_PROTO_VERSION || -maxproto < AUTOFS_PROTO_VERSION ) { + if (minproto > AUTOFS_PROTO_VERSION || +maxproto < AUTOFS_PROTO_VERSION) { printk("autofs: kernel does not match daemon version\n"); goto fail_dput; } @@ -184,11 +187,11 @@ int autofs_fill_super(struct super_block DPRINTK(("autofs: pipe fd = %d, pgrp = %u\n", pipefd, sbi->oz_pgrp)); pipe = fget(pipefd); - if ( !pipe ) { + if (!pipe) { printk("autofs: could not open pipe file descriptor\n"); goto fail_dput; } - if ( !pipe->f_op || !pipe->f_op->write ) + if (!pipe->f_op || !pipe->f_op->write) goto fail_fput; sbi->pipe = pipe; sbi->catatonic = 0; @@ -230,7 +233,7 @@ static void autofs_read_inode(struct ino inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME; inode->i_blocks = 0; - if ( ino == AUTOFS_ROOT_INO ) { + if (ino == AUTOFS_ROOT_INO) { inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO | S_IWUSR; inode->i_op = &autofs_root_inode_operations; inode->i_fop = &autofs_root_operations; @@ -241,12 +244,12 @@ static void autofs_read_inode(struct ino inode->i_uid = inode->i_sb->s_root->d_inode->i_uid; inode->i_gid = inode->i_sb->s_root->d_inode->i_gid; - if ( ino >= AUTOFS_FIRST_SYMLINK && ino < AUTOFS_FIRST_DIR_INO ) { + if (ino >= AUTOFS_FIRST_SYMLINK && ino < AUTOFS_FIRST_DIR_INO) { /* Symlink inode - should be in symlink list */ struct autofs_symlink *sl; n = ino - AUTOFS_FIRST_SYMLINK; - if ( n >= AUTOFS_MAX_SYMLINKS || !test_bit(n,sbi->symlink_bitmap)) { + if (n >= AUTOFS_MAX_SYMLINKS || !test_bit(n,sbi->symlink_bitmap)) { printk("autofs: Looking for bad symlink inode %u\n", (unsigned int) ino); return; } Index: lx26-20-mm2c/fs/autofs/root.c =
[PATCH] Kill unused session and group values in rocket driver
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH] Kill unused session and group values in rocket driver

The process_session() and process_group() values are not really used by the driver.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Cc: Eric W. Biederman <[EMAIL PROTECTED]>
---
 drivers/char/rocket.c     |    3 ---
 drivers/char/rocket_int.h |    2 --
 2 files changed, 5 deletions(-)

Index: lx26-20-mm2c/drivers/char/rocket.c
===
--- lx26-20-mm2c.orig/drivers/char/rocket.c 2007-02-28 19:23:00.0 -0800
+++ lx26-20-mm2c/drivers/char/rocket.c 2007-02-28 19:24:41.0 -0800
@@ -1018,9 +1018,6 @@ static int rp_open(struct tty_struct *tt
 	/*
 	 * Info->count is now 1; so it's safe to sleep now.
 	 */
-	info->session = process_session(current);
-	info->pgrp = process_group(current);
-
 	if ((info->flags & ROCKET_INITIALIZED) == 0) {
 		cp = &info->channel;
 		sSetRxTrigger(cp, TRIG_1);
Index: lx26-20-mm2c/drivers/char/rocket_int.h
===
--- lx26-20-mm2c.orig/drivers/char/rocket_int.h 2007-02-28 19:23:00.0 -0800
+++ lx26-20-mm2c/drivers/char/rocket_int.h 2007-02-28 19:24:41.0 -0800
@@ -1156,8 +1156,6 @@ struct r_port {
 	int xmit_head;
 	int xmit_tail;
 	int xmit_cnt;
-	int session;
-	int pgrp;
 	int cd_status;
 	int ignore_status_mask;
 	int read_status_mask;
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Tue, Mar 13, 2007 at 12:04:42AM -0400, Gene Heskett wrote: > On Monday 12 March 2007, Nish Aravamudan wrote: > >On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote: > >> On Monday 12 March 2007, Douglas McNaught wrote: > >> >Patrick Mau <[EMAIL PROTECTED]> writes: > >> >> Why not temporarily replace "/bin/tar" with a shell script that > >> >> does: > >> >> > >> >> #!/bin/sh > >> >> exec strace -f -o output /bin/real.tar $@ > >> > > >> >You beat me to it. :) I've done that before; it's a great > >> > suggestion. > >> > > >> >Except that if you expect 'tar' to be invoked multiple times in a > >> > run, you should probably use 'output.$$' for the output filename so > >> > things don't get clobbered. > >> > > >> >-Doug > >> > >> In my case, Doug, it will get invoked 64 times, amanda does a dummy > >> run to get an estimate, calculates what to do based on that output > >> which is 32 runs, 1 per disklist entry and I have 32, and then reruns > >> tar with the appropriate level options against each individual > >> disklist entry. > >> > >> But I'm puzzled a bit, what does the double $$ do?, or is it buried > >> someplace in the bash manpage? It's not something I've stumbled over > >> yet. > > > >buried indeed: > > > >"Special Parameters: > > ... > > $ Expands to the process ID of the shell. In a () > > subshell, it expands to the process ID of the current shell, not > > the subshell. > >" > > Well, that's clear enough, but what of the double $$ case? Would this > then make a PID unique to each invocation until it finally wraps a 16 > bit value, or will the kernel re-use them because they won't all be > running simultaneously, but limited by the number of unique 'spindle' > numbers on the system, this to prevent as best as it can, the thrashing > of a drive by having tar working on 2 separate (or more) partitions at > the same time. In my case 2 are possible, as /var is on a separate > drive. Yes, there is a risk of wrapping, but it is very small.
You can add the command line arguments to the file name if you want, like this:

#!/bin/sh
exec strace -f -o "output.$$.${*//\//_}" /bin/real.tar $@

It will name the output file "output.<pid>.<args>", replacing slashes with underscores. This is very dirty but can help.

Cheers,
Willy
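One caveat with the wrapper above: the ${*//\//_} substitution is a bash extension, so under a strictly POSIX /bin/sh (dash, for example) it fails with a substitution error. A minimal sketch of a portable variant follows. trace_name is a hypothetical helper, and unlike the original it also maps spaces to underscores; a long argument list can still exceed the file name length limit.

```sh
#!/bin/sh
# trace_name PID [ARGS...]
# Hypothetical helper: build a per-invocation trace file name from a PID
# and the argument list, using only POSIX shell features.
trace_name() {
    pid=$1
    shift
    # "$*" joins the arguments with spaces; tr maps '/' and ' ' to '_'.
    printf 'output.%s.%s\n' "$pid" "$(printf '%s' "$*" | tr '/ ' '__')"
}

# In the wrapper itself one would then run (left commented out so that this
# snippet has no side effects):
#   exec strace -f -o "$(trace_name "$$" "$@")" /bin/real.tar "$@"
```

Quoting "$@" in the exec line keeps arguments containing spaces intact, which the unquoted $@ does not.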
[PATCH 3/5] Use struct pid parameter in copy_process()
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 3/5] Use struct pid parameter in copy_process()

Modify copy_process() to take a struct pid * parameter instead of a pid_t. This simplifies the code a bit and also avoids having to call find_pid() to convert the pid_t to a struct pid.

Changelog:
	- Fixed Badari Pulavarty's comments and passed in &init_struct_pid from fork_idle().
	- Fixed Eric Biederman's comments and simplified this patch and used a new patch to remove the likely(pid) check.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/fork.c |   11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

Index: lx26-21-rc3-mm2/kernel/fork.c
===
--- lx26-21-rc3-mm2.orig/kernel/fork.c 2007-03-12 17:16:39.0 -0700
+++ lx26-21-rc3-mm2/kernel/fork.c 2007-03-12 17:17:48.0 -0700
@@ -966,7 +966,7 @@ static struct task_struct *copy_process(
 					unsigned long stack_size,
 					int __user *parent_tidptr,
 					int __user *child_tidptr,
-					int pid)
+					struct pid *pid)
 {
 	int retval;
 	struct task_struct *p = NULL;
@@ -1033,7 +1033,7 @@ static struct task_struct *copy_process(
 	p->did_exec = 0;
 	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
 	copy_flags(clone_flags, p);
-	p->pid = pid;
+	p->pid = pid_nr(pid);
 	INIT_LIST_HEAD(&p->children);
 	INIT_LIST_HEAD(&p->sibling);
@@ -1265,7 +1265,7 @@ static struct task_struct *copy_process(
 			list_add_tail_rcu(&p->tasks, &init_task.tasks);
 			__get_cpu_var(process_counts)++;
 		}
-		attach_pid(p, PIDTYPE_PID, find_pid(p->pid));
+		attach_pid(p, PIDTYPE_PID, pid);
 		nr_threads++;
 	}
@@ -1336,7 +1336,8 @@ struct task_struct * __cpuinit fork_idle
 	struct task_struct *task;
 	struct pt_regs regs;
-	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL, 0);
+	task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL,
+				&init_struct_pid);
 	if (!IS_ERR(task))
 		init_idle(task, cpu);
@@ -1364,7 +1365,7 @@ long do_fork(unsigned long clone_flags,
 		return -EAGAIN;
 	nr = pid->nr;
-	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, nr);
+	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, pid);
 	/*
 	 * Do this prior waking up the new thread - the thread pointer
 	 * might get invalid after that point, if the thread exits quickly.
[PATCH 2/5] Explicitly set pgid and sid of init process
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 2/5] Explicitly set pgid and sid of init process

Explicitly set pgid and sid of init process to 1.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: Eric Biederman <[EMAIL PROTECTED]>
Cc: Herbert Poetzl <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 init/main.c |    1 +
 1 file changed, 1 insertion(+)

Index: lx26-20-mm2c/init/main.c
===
--- lx26-20-mm2c.orig/init/main.c 2007-02-28 15:49:13.0 -0800
+++ lx26-20-mm2c/init/main.c 2007-02-28 15:49:35.0 -0800
@@ -791,6 +791,7 @@ static int __init init(void * unused)
 	 */
 	init_pid_ns.child_reaper = current;
+	__set_special_pids(1, 1);
 	cad_pid = task_pid(current);
 	smp_prepare_cpus(max_cpus);
[PATCH 4/5] Remove the likely(pid) check in copy_process
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 4/5] Remove the likely(pid) check in copy_process

Now that we pass in a struct pid parameter to copy_process() and even the swapper (pid_t == 0) has a valid struct pid, we no longer need this check.

Changelog:
	Per Eric Biederman's comments, moved this out to a separate patch for easier review.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/fork.c |   34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

Index: lx26-20-mm2c/kernel/fork.c
===
--- lx26-20-mm2c.orig/kernel/fork.c 2007-02-28 15:08:46.0 -0800
+++ lx26-20-mm2c/kernel/fork.c 2007-02-28 15:33:20.0 -0800
@@ -1249,26 +1249,24 @@ static struct task_struct *copy_process(
 		}
 	}
-	if (likely(p->pid)) {
-		add_parent(p);
-		tracehook_init_task(p);
-
-		if (thread_group_leader(p)) {
-			pid_t pgid = process_group(current);
-			pid_t sid = process_session(current);
-
-			p->signal->tty = current->signal->tty;
-			p->signal->pgrp = pgid;
-			set_signal_session(p->signal, process_session(current));
-			attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
-			attach_pid(p, PIDTYPE_SID, find_pid(sid));
+	add_parent(p);
+	tracehook_init_task(p);
-			list_add_tail_rcu(&p->tasks, &init_task.tasks);
-			__get_cpu_var(process_counts)++;
-		}
-		attach_pid(p, PIDTYPE_PID, pid);
-		nr_threads++;
+	if (thread_group_leader(p)) {
+		pid_t pgid = process_group(current);
+		pid_t sid = process_session(current);
+
+		p->signal->tty = current->signal->tty;
+		p->signal->pgrp = pgid;
+		set_signal_session(p->signal, process_session(current));
+		attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
+		attach_pid(p, PIDTYPE_SID, find_pid(sid));
+
+		list_add_tail_rcu(&p->tasks, &init_task.tasks);
+		__get_cpu_var(process_counts)++;
 	}
+	attach_pid(p, PIDTYPE_PID, pid);
+	nr_threads++;
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
[PATCH 5/5] Use task_pgrp() task_session() in copy_process()
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Subject: [PATCH 5/5] Use task_pgrp() task_session() in copy_process().

Use task_pgrp() and task_session() in copy_process(), and avoid find_pid() call when attaching the task to its process group and session.

Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]>
Cc: Cedric Le Goater <[EMAIL PROTECTED]>
Cc: Dave Hansen <[EMAIL PROTECTED]>
Cc: Serge Hallyn <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Acked-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 kernel/fork.c |    9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

Index: lx26-21-rc3-mm2/kernel/fork.c
===
--- lx26-21-rc3-mm2.orig/kernel/fork.c 2007-03-12 17:18:03.0 -0700
+++ lx26-21-rc3-mm2/kernel/fork.c 2007-03-12 17:18:11.0 -0700
@@ -1252,14 +1252,11 @@ static struct task_struct *copy_process(
 	tracehook_init_task(p);
 	if (thread_group_leader(p)) {
-		pid_t pgid = process_group(current);
-		pid_t sid = process_session(current);
-
 		p->signal->tty = current->signal->tty;
-		p->signal->pgrp = pgid;
+		p->signal->pgrp = process_group(current);
 		set_signal_session(p->signal, process_session(current));
-		attach_pid(p, PIDTYPE_PGID, find_pid(pgid));
-		attach_pid(p, PIDTYPE_SID, find_pid(sid));
+		attach_pid(p, PIDTYPE_PGID, task_pgrp(current));
+		attach_pid(p, PIDTYPE_SID, task_session(current));
 		list_add_tail_rcu(&p->tasks, &init_task.tasks);
 		__get_cpu_var(process_counts)++;
[PATCH 1/5] statically initialize struct pid for swapper
From: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Subject: [PATCH 1/5] statically initialize struct pid for swapper Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also. Signed-off-by: Sukadev Bhattiprolu <[EMAIL PROTECTED]> Cc: Cedric Le Goater <[EMAIL PROTECTED]> Cc: Dave Hansen <[EMAIL PROTECTED]> Cc: Serge Hallyn <[EMAIL PROTECTED]> Cc: Eric Biederman <[EMAIL PROTECTED]> Cc: Herbert Poetzl <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Acked-by: Eric W. Biederman <[EMAIL PROTECTED]> --- include/linux/init_task.h | 27 +++ include/linux/pid.h |2 ++ kernel/pid.c |2 ++ 3 files changed, 31 insertions(+) Index: lx26-20-mm2c/include/linux/init_task.h === --- lx26-20-mm2c.orig/include/linux/init_task.h 2007-02-28 15:47:44.0 -0800 +++ lx26-20-mm2c/include/linux/init_task.h 2007-02-28 15:48:07.0 -0800 @@ -96,6 +96,28 @@ extern struct group_info init_groups; #define INIT_PREEMPT_RCU #endif +#define INIT_STRUCT_PID { \ + .count = ATOMIC_INIT(1), \ + .nr = 0,\ + /* Don't put this struct pid in pid_hash */ \ + .pid_chain = { .next = NULL, .pprev = NULL }, \ + .tasks = { \ + { .first = &init_task.pids[PIDTYPE_PID].node }, \ + { .first = &init_task.pids[PIDTYPE_PGID].node },\ + { .first = &init_task.pids[PIDTYPE_SID].node }, \ + }, \ + .rcu= RCU_HEAD_INIT,\ +} + +#define INIT_PID_LINK(type)\ +{ \ + .node = { \ + .next = NULL, \ + .pprev = &init_struct_pid.tasks[type].first,\ + }, \ + .pid = &init_struct_pid,\ +} + /* * INIT_TASK is used to set up the first task table, touch at * your own risk!. 
Base=0, limit=0x1f (=2MB) @@ -145,6 +167,11 @@ extern struct group_info init_groups; .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ .fs_excl= ATOMIC_INIT(0), \ .pi_lock= SPIN_LOCK_UNLOCKED, \ + .pids = { \ + [PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID),\ + [PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID), \ + [PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID),\ + }, \ INIT_TRACE_IRQFLAGS \ INIT_LOCKDEP\ } Index: lx26-20-mm2c/include/linux/pid.h === --- lx26-20-mm2c.orig/include/linux/pid.h 2007-02-28 15:48:07.0 -0800 +++ lx26-20-mm2c/include/linux/pid.h2007-02-28 15:48:07.0 -0800 @@ -51,6 +51,8 @@ struct pid struct rcu_head rcu; }; +extern struct pid init_struct_pid; + struct pid_link { struct hlist_node node; Index: lx26-20-mm2c/kernel/pid.c === --- lx26-20-mm2c.orig/kernel/pid.c 2007-02-28 15:48:07.0 -0800 +++ lx26-20-mm2c/kernel/pid.c 2007-02-28 15:48:07.0 -0800 @@ -27,11 +27,13 @@ #include #include #include +#include #define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift) static struct hlist_head *pid_hash; static int pidhash_shift; static struct kmem_cache *pid_cachep; +struct pid init_struct_pid = INIT_STRUCT_PID; int pid_max = PID_MAX_DEFAULT; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RSDL v0.30 cpu scheduler for mainline kernels
On Tue, Mar 13, 2007 at 02:05:23PM +1100, Con Kolivas wrote: > On Tuesday 13 March 2007 10:46, David Miller wrote: > > From: Con Kolivas <[EMAIL PROTECTED]> > > Date: Mon, 12 Mar 2007 10:58:11 +1100 > > > > > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch > > > > FWIW, this boots and seems to work well on sparc64. Tested > > on UP SunBlade1500 and 24cpu Niagara T1000. > > Very nice. Thanks for the feedback and I'm sorry you have to work with such > lousy hardware. BTW, I don't know if you say this as a joke, but those are not necessarily lousy hardware. Sun does lousy hardware when they put Sparcs in PCs (ultra5, ultra10, blade100). But their servers generally are nice with large memory busses and very scalable SMP architectures. Regards, Willy
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mar 12, 2007, at 11:26:25, Linus Torvalds wrote:
> So "good fairness" really should involve some notion of "work done for others". It's just not very easy to do..

Maybe extend UNIX sockets to add another passable object type vis-a-vis SCM_RIGHTS, except in this case "SCM_CPUTIME". You call SCM_CPUTIME with a time value in monotonic real-time nanoseconds (duration) and a value out of 100 indicating what percentage of your timeslices to give to the process (for the specified duration). The receiving process would be informed of the estimated total number of nanoseconds of timeslice that it will be given based on the priority of the processes. (Maybe it could prioritize requests?).

The X libraries could then properly "pass" CPU time to the X server to help with rendering their requests, and the X server could give priority to tasks which give up more CPU time than is needed to render their data, and penalize those which use more than they give.

Initially even if you don't patch the X server you could at least patch the X clients to give up CPU to the X server to promote interactivity.

Cheers,
Kyle Moffett
2.6.21rc suspend to ram regression on Lenovo X60
I spent considerable time over the last day or so bisecting to find out why an X60 stopped resuming somewhen between 2.6.20 and current -git. (Total lockup, black screen of death).

The bisect log looked like this.

git-bisect start
# bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit user-tokens (or drm_file offsets)
git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
# good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
# good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu support
git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
# bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
# good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
# good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk with calls to pci_no_msi()
git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
# good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix msi_remove_pci_irq_vectors.
git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
# good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more architectures
git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
# good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert "PCI: remove duplicate device id from ata_piix"
git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee

which led me to a final 'bad' commit of 78149df6d565c36675463352d0bfeb02b7a7 which is a merge changeset of lots of PCI bits.

Seeing a couple of MSI changes in there, on a hunch I booted latest tree with pci=nomsi, and it resumed again.

Any ideas how to further debug this? I'll try backing out individual changes from that merge tomorrow.

Dave

--
http://www.codemonkey.org.uk
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Monday 12 March 2007, Nish Aravamudan wrote:
>On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
>> On Monday 12 March 2007, Douglas McNaught wrote:
>> >Patrick Mau <[EMAIL PROTECTED]> writes:
>> >> Why not temporarly replace "/bin/tar" with a shell script that
>> >> does:
>> >>
>> >> #!/bin/sh
>> >> exec strace -f -o output /bin/real.tar $@
>> >
>> >You beat me to it. :) I've done that before; it's a great
>> > suggestion.
>> >
>> >Except that if you expect 'tar' to be invoked multiple times in a
>> > run, you should probably use 'output.$$' for the output filename so
>> > things don't get clobbered.
>> >
>> >-Doug
>>
>> In my case, Doug, it will get invoked 64 times, amanda does a dummy
>> run to get an estimate, calculates what to do based on that output
>> which is 32 runs, 1 per disklist entry and I have 32, and then reruns
>> tar with the appropriate level options against each individual
>> disklist entry.
>>
>> But I'm puzzled a bit, what does the double $$ do?, or it buried
>> someplace in the bash manpage? Its not something I've stumbled over
>> yet.
>
>buried indeed:
>
>"Special Parameters:
> ...
> $ Expands to the process ID of the shell. In a ()
> subshell, it expands to the process ID of the current shell, not
> the subshell.
>"

Well, that's clear enough, but what of the double $$ case? Would this then make a PID unique to each invocation until it finally wraps a 16 bit value, or will the kernel re-use them because they won't all be running simultaneously, but limited by the number of unique 'spindle' numbers on the system, this to prevent as best as it can, the thrashing of a drive by having tar working on 2 separate (or more) partitions at the same time. In my case 2 are possible, as /var is on a separate drive.

>Thanks,
>Nish

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"Say yur prayers, yuh flea-pickin' varmint!"
-- Yosemite Sam
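On the question above: in the quoted manual text the parameter is named "$", so "$$" is simply a reference to it, the same way "$1" references parameter "1". It expands to the shell's process ID, which is why 'output.$$' gives each wrapper invocation its own file. A PID is only guaranteed to be distinct among processes that exist at the same time; once a process has exited and been reaped, its number can be handed out again. A minimal sketch of the same per-process naming in C, where the helper name trace_name is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Format "<base>.<pid>" into out, mirroring the shell's output.$$ idiom.
 * Returns 0 on success, -1 if the result did not fit in outsz bytes.
 */
static int trace_name(char *out, size_t outsz, const char *base, pid_t pid)
{
    int n;

    assert(out != NULL && base != NULL);
    n = snprintf(out, outsz, "%s.%ld", base, (long)pid);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

A caller would pass getpid() as the last argument. If 64 runs over one backup must never collide even after PID reuse, appending a timestamp or opening the file with O_CREAT | O_EXCL would be one way to guard against that; this is a suggestion, not something proposed in the thread.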
Re: [PATCH] kthread_should_stop_check_freeze (was: Re: [PATCH -mm 3/7] Freezer: Remove PF_NOFREEZE from rcutorture thread)
On Mon, Mar 12, 2007 at 05:45:24PM -0500, Anton Blanchard wrote:
> Then please document it _clearly_ with the kthread code somewhere.

Document as well in the kernel_thread() API, as I notice people still use kernel_thread() some places (ex: rtasd.c in powerpc arch)?

> The reason I brought this up is I had no idea we had to put the freezer gunk
> in all kernel thread loops and Ive been writing kernel threads for years.

I noticed that in the Powerpc code (at least for rtas kernel thread) here:

http://lkml.org/lkml/2007/1/9/61

That was not a serious problem perhaps because process freezer was mostly used in software suspend and only those platforms supporting software suspend had to worry about it. But now we intend to use process freezer for CPU hotplug as well, so all platforms wanting to support CPU hotplug better support process freezer!

P.S : I believe kprobes is already using process freezer as well.

--
Regards,
vatsa
Re: RSDL v0.30 cpu scheduler for mainline kernels
On Tuesday 13 March 2007 10:46, David Miller wrote:
> From: Con Kolivas <[EMAIL PROTECTED]>
> Date: Mon, 12 Mar 2007 10:58:11 +1100
>
> > http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc3-sched-rsdl-0.30.patch
>
> FWIW, this boots and seems to work well on sparc64. Tested
> on UP SunBlade1500 and 24cpu Niagara T1000.

Very nice. Thanks for the feedback and I'm sorry you have to work with such lousy hardware.

--
-ck
RE: Question: removal of syscall macros?
2006/12/14, Teunis Peters <[EMAIL PROTECTED]>:
> Now that syscall macros have been pulled from the -mm tree, what method is
> recommended to use syscalls?
> (I've wasted a day grubbing through sources before giving up and copying
> the old syscall macros into one key driver)
>
> _syscall macros are used by:
> ATI driver (no choice. I'm working with laptops)

I have the same problem as you. Do you have any idea how to use the ATI firegl driver with recent kernels?

Thanks in advance.

Regards,
albcamus
Re: Make sure we populate the initroot filesystem late enough
On Mar 12, 2007, at 6:01 PM, Paul TBBle Hampson wrote:

> On Thu, Mar 01, 2007 at 09:30:56AM +0900, Michael Ellerman wrote:
>> On Wed, 2007-02-28 at 10:13 +, David Woodhouse wrote:
>>> On Wed, 2007-02-28 at 07:43 +0100, Benjamin Herrenschmidt wrote:
>>>> I wouldn't be that sure ... I've had problems in the past with PMU
>>>> based cpufreq... looks like flushing all caches and hard-resetting the
>>>> processor on the fly when there can be pending DMAs might be a source
>>>> of trouble... especially on CPUs that don't have working cache flush HW
>>>> assist.
>>>
>>> I've seen it on a PowerMac3,1 (400MHz G4) where we don't have cpufreq.
>>> I've also seen it on the latest 1.5GHz Mac Mini, and on my shinybook.
>>> They all fall over with the latest kernel, although the shinybook only
>>> does so immediately when booted with mem=512M.
>>>
>>> The shinybook does crash later with new kernels though; I don't yet know
>>> why. It could be the same thing, or it could be something different.
>>> That one seemed to appear between Fedora's 2.6.19-1.2913 and
>>> 2.6.19-1.2914 kernels, where we did nothing but turned
>>> CONFIG_SYSFS_DEPRECATED on.
>>>
>>> I don't blame cpufreq. At various times I've been equally convinced that
>>> it was due to CONFIG_KPROBES, and Linus' initrd-moving patch.
>>
>> Is there any pattern to the way it dies? Or is it just randomly dieing
>> somewhere depending on which config options you have enabled?
>>
>> This is starting to sound reminiscent of a bug I chased for a while last
>> year on Power5, but didn't find. It was "fixed" on some machines by
>> disabling CONFIG_KEXEC, and/or other random unrelated CONFIG options.
>> Unfortunately it magically stopped reproducing so I never caught it :/
>
> Hmm. The crash came back after I booted into Mac OS X and back. It was
> however a different crash, I believe it was coming from the USB modules
> (as it would keep going when it happened, and get another crash, which
> tended to scroll away too fast for me to capture) but I believe it was
> still getting down into the slab code and actually dying there.
>
> However, reverting the reversion of
> 8d610dd52dd1da696e199e4b4545f33a2a5de5c6 and instead applying the
> following patch:
>
> diff -ru linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c linux-source-2.6.20/arch/powerpc/mm/init_32.c
> --- linux-source-2.6.20.orig/arch/powerpc/mm/init_32.c 2007-02-05 05:44:54.0 +1100
> +++ linux-source-2.6.20/arch/powerpc/mm/init_32.c 2007-03-10 11:03:56.0 +1100
> @@ -244,7 +244,8 @@
>  void free_initrd_mem(unsigned long start, unsigned long end)
>  {
>  	if (start < end)
> -		printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +		printk ("NOT Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
> +	return;
>  	for (; start < end; start += PAGE_SIZE) {
>  		ClearPageReserved(virt_to_page(start));
>  		init_page_count(virt_to_page(start));
>
> which if I recall correctly David Woodhouse posted to this thread, seems
> to have fixed it.
>
> I dunno if it's relevant, but my initrd.img is 13193315 bytes long, (ie
> 99 bytes over 12884k) and the above logs:
> "NOT Freeing initrd memory: 12888k freed"
> which makes sense...
>
> I of course completely failed to think to check this with the crashing
> kernel, if it seems relevant I can roll back to it and get the numbers.

Have you tried 2.6.20.2, there was a significant bug in get_order() that was deemed to be causing these issues.

- k
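The figures quoted above (13193315 bytes, "99 bytes over 12884k", and a logged 12888k) are consistent with the initrd region being rounded up to whole pages before the (end - start) >> 10 conversion. The sketch below checks that arithmetic. It assumes a 4096-byte PAGE_SIZE and a page-aligned start, neither of which is stated in the thread, and the helper names are hypothetical.

```c
#include <assert.h>

/* Round a byte count up to a whole number of pages (page_size: power of two). */
static unsigned long round_up_to_page(unsigned long bytes, unsigned long page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (bytes + page_size - 1) & ~(page_size - 1);
}

/* Convert bytes to KiB the same way the printk does: a right shift by 10. */
static unsigned long to_kib(unsigned long bytes)
{
    return bytes >> 10;
}
```

With these helpers, to_kib(13193315) is 12884 with 99 bytes left over, and to_kib(round_up_to_page(13193315, 4096)) is 12888: the image occupies 3222 pages of 4 KiB each.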
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On 3/12/07, Gene Heskett <[EMAIL PROTECTED]> wrote:
> On Monday 12 March 2007, Douglas McNaught wrote:
> >Patrick Mau <[EMAIL PROTECTED]> writes:
> >> Why not temporarly replace "/bin/tar" with a shell script that does:
> >>
> >> #!/bin/sh
> >> exec strace -f -o output /bin/real.tar $@
> >
> >You beat me to it. :) I've done that before; it's a great suggestion.
> >
> >Except that if you expect 'tar' to be invoked multiple times in a run,
> >you should probably use 'output.$$' for the output filename so things
> >don't get clobbered.
> >
> >-Doug
>
> In my case, Doug, it will get invoked 64 times, amanda does a dummy run to
> get an estimate, calculates what to do based on that output which is 32
> runs, 1 per disklist entry and I have 32, and then reruns tar with the
> appropriate level options against each individual disklist entry.
>
> But I'm puzzled a bit, what does the double $$ do?, or it buried someplace
> in the bash manpage? Its not something I've stumbled over yet.

buried indeed:

"Special Parameters:
 ...
 $ Expands to the process ID of the shell. In a ()
 subshell, it expands to the process ID of the current shell, not
 the subshell.
"

Thanks,
Nish
Re: Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
On Monday 12 March 2007, Douglas McNaught wrote:
>Patrick Mau <[EMAIL PROTECTED]> writes:
>> Why not temporarly replace "/bin/tar" with a shell script that does:
>>
>> #!/bin/sh
>> exec strace -f -o output /bin/real.tar $@
>
>You beat me to it. :) I've done that before; it's a great suggestion.
>
>Except that if you expect 'tar' to be invoked multiple times in a run,
>you should probably use 'output.$$' for the output filename so things
>don't get clobbered.
>
>-Doug

In my case, Doug, it will get invoked 64 times. Amanda does a dummy run to get an estimate, calculates what to do based on that output, which is 32 runs, 1 per disklist entry and I have 32, and then reruns tar with the appropriate level options against each individual disklist entry.

But I'm puzzled a bit, what does the double $$ do? Or is it buried someplace in the bash manpage? It's not something I've stumbled over yet.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
rugged, adj.:
Too heavy to lift.
Re: Fwd: libata extension
> Why is the access to Control register needed?

To execute soft reset for example.

> > In the perfect case i would like to be able to execute vendor command
> > set (reverse engineered).
>
> Sounds interesting. :-) Could you give some more details on what are you
> going to implement?

Reading/writing service area, uploading, downloading modules, working with flash etc.
Re: [QUICKLIST 0/6] Arch independent quicklists V1
From: David Miller <[EMAIL PROTECTED]>
Date: Mon, 12 Mar 2007 19:26:16 -0700 (PDT)

> From: Paul Mackerras <[EMAIL PROTECTED]>
> Date: Tue, 13 Mar 2007 11:37:32 +1100
>
> > David Miller writes:
> >
> > > I ported this to sparc64 as per the patch below, tested on
> > > UP SunBlade1500 and 24 cpu Niagara T1000.
> >
> > Did you see any performance improvement? We used to have quicklists
> > on ppc, but I remain to be convinced that they actually help.
>
> It shaved about 3 or 4 seconds consistently off of my kernel
> build on Niagara which usually clocks in just over 4 minutes
> on this 24 thread machine.

I want to qualify this with the fact that all the cache false sharing issues are irrelevant in this test because the L2 cache is shared between all of the cpu threads on Niagara.

It was fast just because the quicklists were lighter weight than the SLAB stuff.
Re: [QUICKLIST 0/6] Arch independent quicklists V1
From: Paul Mackerras <[EMAIL PROTECTED]>
Date: Tue, 13 Mar 2007 11:37:32 +1100

> David Miller writes:
>
> > I ported this to sparc64 as per the patch below, tested on
> > UP SunBlade1500 and 24 cpu Niagara T1000.
>
> Did you see any performance improvement? We used to have quicklists
> on ppc, but I remain to be convinced that they actually help.

It shaved about 3 or 4 seconds consistently off of my kernel build on Niagara which usually clocks in just over 4 minutes on this 24 thread machine.

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed. I thought the whole point of the slab allocator was to be
> able to take advantage of that...

He just wants to side-step the issue in SLUB, which arguably is an attempt to simplify SLUB at the expense of functionality.

I don't agree with that, but I'm merely preemptively testing his patches and porting them to sparc64 so it does not break when/if his code is merged in. After being bitten by stuff like this in the past, I've decided to become more proactive :)
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:
> why? you simply enter that specific space and
> use the existing mechanisms (netlink, proc, whatever)
> to retrieve the information with _existing_ tools,

That's assuming that you're using network namespace virtualization, with each group of tasks in a separate namespace. What if you don't want the virtualization overhead, just the accounting?

Paul
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/12/07, David Lang <[EMAIL PROTECTED]> wrote:
> the problem comes when this isn't enough. if you have several CPU hogs on a
> system, and they are all around the same priority level, how can the
> scheduler know which one needs the CPU the most for good interactivity?
>
> in some cases you may be able to directly detect that your high-priority
> process is waiting for another one (tracing pipes and local sockets for
> example), but what if you are waiting for several of them? (think a
> multimedia desktop waiting for the sound card, CDRom, hard drive, and video
> all at once) which one needs the extra CPU the most?

I'm not an expert in this area by any means but after reading this thread the OSX solution of simply telling the kernel "I'm the GUI, schedule me accordingly" looks increasingly attractive. Why make the kernel guess when we can just be explicit?

Does anyone know of a UNIX-like system that has managed to solve this problem without hooking the GUI into the scheduler?

Lee
Re: [RFC][PATCH 2/7] RSS controller core
On Tue, Mar 13, 2007 at 07:27:06AM +0530, Balbir Singh wrote:
> I am not sure what went wrong. Could you please check your mail
> client, cause it seemed to even change email address to smtp.osdl.org
> which bounced back when I wrote to you earlier.

I have a problem doing a group-reply in mutt to Herbert's mails. His email id gets dropped from the To or Cc list. Is that his email setting? Don't know.

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
On Tue, Mar 13, 2007 at 12:31:13AM +0100, Herbert Poetzl wrote:
> just means that the current Linux-VServer behaviour
> is a subset of that, no problem there as long as
> it really _is_ a subset :) we always like to provide
> more features in the future, no problem with that :)

Considering the example Sam quoted, doesn't it make sense to split resource classes (some of them at least) independent of each other? That would also argue for providing multiple hierarchy feature in Paul's patches.

Given that and the mail Serge sent on why nsproxy optimization is useful given numbers, can you reconsider your earlier proposals as below:

- pid_ns and resource parameters should be in a single struct (point 1c, 7b in [1])

- pointers to resource controlling objects should be inserted in task_struct directly (instead of nsproxy indirection) (points 2c in [1])

[1] http://lkml.org/lkml/2007/3/12/138

--
Regards,
vatsa
Re: [RFC][PATCH 2/7] RSS controller core
> hmm, it is very unlikely that this would happen,
> for several reasons ... and indeed, checking the
> thread in my mailbox shows that akpm dropped you ...

But, I got Andrew's email.

> Subject: [RFC][PATCH 2/7] RSS controller core
> From: Pavel Emelianov <[EMAIL PROTECTED]>
> To: Andrew Morton <[EMAIL PROTECTED]>, Paul Menage <[EMAIL PROTECTED]>,
>     Srivatsa Vaddagiri <[EMAIL PROTECTED]>, Balbir Singh <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], Linux Kernel Mailing List
> Date: Tue, 06 Mar 2007 17:55:29 +0300
>
> Subject: Re: [RFC][PATCH 2/7] RSS controller core
> From: Andrew Morton <[EMAIL PROTECTED]>
> To: Pavel Emelianov <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED],
>     Paul Menage <[EMAIL PROTECTED]>, List
> Date: Tue, 6 Mar 2007 14:00:36 -0800
>
> that's the one I 'group' replied to ...
>
> > Could you please not modify the "cc" list.
>
> I never modify the cc unless explicitely asked
> to do so. I wish others would have it that way
> too :)

Thats good to know, but my mailer shows

Andrew Morton <[EMAIL PROTECTED]>
to Pavel Emelianov <[EMAIL PROTECTED]>
cc Paul Menage <[EMAIL PROTECTED]>, Srivatsa Vaddagiri <[EMAIL PROTECTED]>, Balbir Singh <[EMAIL PROTECTED]> (see I am <>), devel@openvz.org, Linux Kernel Mailing List , [EMAIL PROTECTED], Kirill Korotaev <[EMAIL PROTECTED]>
date Mar 7, 2007 3:30 AM
subject Re: [RFC][PATCH 2/7] RSS controller core
mailed-by vger.kernel.org

On Tue, 06 Mar 2007 17:55:29 +0300

and your reply as

Andrew Morton <[EMAIL PROTECTED]>, Pavel Emelianov <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage <[EMAIL PROTECTED]>, List
to Andrew Morton <[EMAIL PROTECTED]>
cc Pavel Emelianov <[EMAIL PROTECTED]>, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Paul Menage <[EMAIL PROTECTED]>, List
date Mar 9, 2007 10:18 PM
subject Re: [RFC][PATCH 2/7] RSS controller core
mailed-by vger.kernel.org

I am not sure what went wrong. Could you please check your mail client, cause it seemed to even change email address to smtp.osdl.org which bounced back when I wrote to you earlier.

> best,
> Herbert

Cheers,
Balbir
Fwd: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)
-- Forwarded message --

Hi,

I have tested on my mac mini g4. 2.6.21-rc2 causes an oops like the one in the post above. With the new 2.6.21-rc3-git7 the kernel loads ok and the penguin pixmap appears, but then it stops, and there are no error messages either.

Regards
dave

2007/3/7, Benjamin Herrenschmidt <[EMAIL PROTECTED]>:
> On Wed, 2007-03-07 at 17:53 +1300, Paul Collins wrote:
> > David Woodhouse <[EMAIL PROTECTED]> writes:
> >
> > > On Tue, 2007-03-06 at 14:53 +1300, Paul Collins wrote:
> > >> In case it's of interest, 2.6.20 has been running fine on my
> > >> PowerBook5,4.
> > >
> > > How much memory? What if you boot with mem=512M or mem=256M?
> >
> > 1GB. Also works fine when booted with those options.
>
> Can you try 2.6.21-rc3 ? We just fixed a nasty bug causing memory
> corruption.
>
> Ben.
>
> ___
> Linuxppc-dev mailing list
> [EMAIL PROTECTED]
> https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] usb-serial regression fix
Jim Radford wrote:
On Mon, Mar 12, 2007 at 05:18:19PM -0700, Greg KH wrote:
On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote:
On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote:
On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote:
On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote:
Oliver Neukum wrote:
Mark Lord wrote:

Okay, from that part (above), the problem is obvious: in that the "MCT U232 converter now disconnected" appears, and then we continue to try and call the driver's method.. Oops!

IMHO shutdown() is using serial->port[] and bombs.

Could you reverse the order here?

Do not NULL serial->port[i] since it is used in ->shutdown(). This wasn't an issue until the order or ->shutdown() and device_unregister was corrected.

 	for (i = 0; i < serial->num_ports; ++i)
 		if (serial->port[i]->dev.parent != NULL) {
 			device_unregister(&serial->port[i]->dev);
-			serial->port[i] = NULL;
 		}

But shouldn't you null it out somewhere? It will be an "empty" pointer at some point in time...

Not as far as I can see. The serial structure that ->port[i] is in gets kfree()ed soon after, in the same function, and nothing in between, other than ->shutdown(), uses ->port[]. I assume it was someone being overly cautious.

So where does the memory get freed -- the structure pointed at by the serial->port[i] thingie ?

It's not a leak, is it?

???
Re: [QUICKLIST 0/6] Arch independent quicklists V1
On Tue, 13 Mar 2007, Paul Mackerras wrote:

> Also, I didn't understand why we have to do quicklists to take
> advantage of the fact that the pages are in a pristine state when they
> are freed. I thought the whole point of the slab allocator was to be
> able to take advantage of that...

It used to be the case that initializing objects ahead of time was better. Today it is better to initialize the objects immediately before they are used. That will move them into the cpu caches and keep them there. Initializing them earlier may cause the cachelines of the object to be evicted from the cpu cache and then those have to be refetched.

The benefit of this approach diminishes the larger objects get and the sparser the access to the cachelines of the object. In the case of page sized objects that are sparsely accessed (the PAGE_SIZE caches covered by quicklists) it makes sense to attempt to avoid having to touch all cachelines of the page on alloc.
Stracing Amanda (was: RSDL for 2.6.21-rc3- 0.29)
Patrick Mau <[EMAIL PROTECTED]> writes:

> Why not temporarly replace "/bin/tar" with a shell script that does:
>
> #!/bin/sh
> exec strace -f -o output /bin/real.tar $@

You beat me to it. :) I've done that before; it's a great suggestion.

Except that if you expect 'tar' to be invoked multiple times in a run, you should probably use 'output.$$' for the output filename so things don't get clobbered.

-Doug
Re: sys_write() racy for multi-threaded append?
> Writing to a file from multiple processes is not usually the problem. > Writing to a common "struct file" from multiple threads is. Not normally because POSIX sensibly invented pread/pwrite. Forgot preadv/pwritev but they did the basics and end of problem. > So what? My products are shipping _now_. That doesn't inspire confidence. > even funny. If POSIX mandates stupid shit, and application > programmers don't read that part of the manual anyway (and don't code > on that assumption in practice), to hell with POSIX. On many file That's funny, you were talking about quality a moment ago. > descriptors, short writes simply can't happen -- and code that There is almost no descriptor this is true for. Any file I/O can and will end up short on disk full or resource limit exceeded or quota exceeded or NFS server exploded or ... And on the device side about the only thing with the vaguest guarantees is pipe(). > purports to handle short writes but has never been exercised is > arguably worse than code that simply bombs on short write. So if I > can't shim in an induce-short-writes-randomly-on-purpose mechanism > during development, I don't want short writes in production, period. Easy enough to do and gcov plus dejagnu or similar tools will let you coverage analyse the resulting test set and replay it. > Sure -- until the one code path in a hundred that handles the "short > write" case incorrectly gets traversed in production, after having > gone untested in a development environment that used a different > filesystem that never happened to trigger it. Competent QA and testing people test all the returns in the manual as well as all the returns they can find in the code. See ptrace(2) if you don't want to do a lot of relinking and strace for some useful worked examples of syscall hooking.
Alan
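For readers following the short-write argument, here is a minimal userspace sketch of the loop an application needs if it accepts that write(2) may return fewer bytes than requested. The helper name write_all() is hypothetical and does not come from anyone in this thread; treating a zero return as ENOSPC is an assumption made only so the loop cannot spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, not an existing libc call.
 * It keeps calling write(2) until all len bytes have been accepted.
 * Returns 0 on success, -1 on error with errno describing the failure.
 */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved: retry */
			return -1;		/* ENOSPC, EDQUOT, EIO, EBADF, ... */
		}
		if (n == 0) {
			/* Assumption: treat "no progress" as out of space. */
			errno = ENOSPC;
			return -1;
		}
		p += n;			/* short write: skip what was accepted */
		len -= (size_t)n;
	}
	return 0;
}
```

The error branch is the part that tends to go untested; passing a deliberately bad descriptor is one cheap way to exercise it without any syscall hooking.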
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote: > On Wednesday 07 March 2007 11:02, Nick Piggin wrote: > > > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with > > > that as well, then I think it might be a good option. > > > > Oh, hmm if you can truncate these things then you still need to > > force unmap so you still need i_mmap_nonlinear. > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which > is > way similar I guess. > > About the restriction to tmpfs, I have just discovered > '[PATCH] mm: tracking shared dirty pages' (commit > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts > with remap_file_pages for file-based mmaps (and that's fully fine, for now). > > Even if UML does not need it, till now if there is a VMA protection and a > page > hasn't been remapped with remap_file_pages, the VMA protection is used (just > because it makes sense). > > However, it is only used when the PTE is first created - we can never change > protections on a VMA - so it vma_wants_writenotify() is true (on all > file-based and on no shmfs based mapping, right?), and we write-protect the > VMA, it will always be write-protected. Yes, I believe that is the case, however I wonder if that is going to be a problem for you to distinguish between write faults for clean writable ptes, and write faults for readonly ptes? > That's no problem for UML, but for any other user (I guess I'll have to > prevent callers from trying such stuff - I started from a pretty generic > patch). > > > But come to think of it, I still don't think nonlinear mappings are > > too bad as they are ;) > > Btw, I really like removing ->populate and merging the common code together. > filemap_populate and shmem_populate are so obnoxiously different that I > already wanted to do that (after merging remap_file_pages() core). 
Yeah they are also frustratingly similar to filemap_nopage and shmem_nopage, and duplicate a lot of the same code ;) > Also, I'm curious. Since my patches are already changing remap_file_pages() > code, should they be absolutely merged after yours? Is there a big clash? I don't think I did a great deal to fremap.c (mainly just removing stuff)...
[PATCH] i386: Simplify smp_call_function*() by using common implementation
Subject: Simplify smp_call_function*() by using common implementation smp_call_function and smp_call_function_single are almost complete duplicates of the same logic. This patch combines them by implementing them in terms of the more general smp_call_function_mask(). Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]> Cc: Stephane Eranian <[EMAIL PROTECTED]> Cc: Andrew Morton <[EMAIL PROTECTED]> Cc: Andi Kleen <[EMAIL PROTECTED]> Cc: "Randy.Dunlap" <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> --- arch/i386/kernel/smp.c | 213 ++-- 1 file changed, 102 insertions(+), 111 deletions(-) === --- a/arch/i386/kernel/smp.c +++ b/arch/i386/kernel/smp.c @@ -515,6 +515,73 @@ void unlock_ipi_call_lock(void) static struct call_data_struct *call_data; + +/** + * smp_call_function_mask(): Run a function on a set of other CPUs. + * @mask: The set of cpus to run on. Must not include the current cpu. + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @wait: If true, wait (atomically) until function has completed on other CPUs. + * + * Returns 0 on success, else a negative status code. Does not return until + * remote CPUs are nearly ready to execute <> or are or have finished. + * + * You must not call this function with disabled interrupts or from a + * hardware interrupt handler or from a bottom half handler. + */ +int smp_call_function_mask(cpumask_t mask, + void (*func)(void *), void *info, + int wait) +{ + struct call_data_struct data; + cpumask_t allbutself; + int cpus; + + /* Can deadlock when called with interrupts disabled */ + WARN_ON(irqs_disabled()); + + /* Holding any lock stops cpus from going down. 
*/ + spin_lock(&call_lock); + + allbutself = cpu_online_map; + cpu_clear(smp_processor_id(), allbutself); + + cpus_and(mask, mask, allbutself); + cpus = cpus_weight(mask); + + if (!cpus) { + spin_unlock(&call_lock); + return 0; + } + + data.func = func; + data.info = info; + atomic_set(&data.started, 0); + data.wait = wait; + if (wait) + atomic_set(&data.finished, 0); + + call_data = &data; + mb(); + + /* Send a message to other CPUs */ + if (cpus_equal(mask, allbutself)) + send_IPI_allbutself(CALL_FUNCTION_VECTOR); + else + send_IPI_mask(mask, CALL_FUNCTION_VECTOR); + + /* Wait for response */ + while (atomic_read(&data.started) != cpus) + cpu_relax(); + + if (wait) + while (atomic_read(&data.finished) != cpus) + cpu_relax(); + spin_unlock(&call_lock); + + return 0; +} + /** * smp_call_function(): Run a function on all other CPUs. * @func: The function to run. This must be fast and non-blocking. @@ -528,48 +595,43 @@ static struct call_data_struct *call_dat * You must not call this function with disabled interrupts or from a * hardware interrupt handler or from a bottom half handler. */ -int smp_call_function (void (*func) (void *info), void *info, int nonatomic, - int wait) -{ - struct call_data_struct data; - int cpus; - - /* Holding any lock stops cpus from going down. 
*/ - spin_lock(&call_lock); - cpus = num_online_cpus() - 1; - if (!cpus) { - spin_unlock(&call_lock); - return 0; - } - - /* Can deadlock when called with interrupts disabled */ - WARN_ON(irqs_disabled()); - - data.func = func; - data.info = info; - atomic_set(&data.started, 0); - data.wait = wait; - if (wait) - atomic_set(&data.finished, 0); - - call_data = &data; - mb(); - - /* Send a message to all other CPUs and wait for them to respond */ - send_IPI_allbutself(CALL_FUNCTION_VECTOR); - - /* Wait for response */ - while (atomic_read(&data.started) != cpus) - cpu_relax(); - - if (wait) - while (atomic_read(&data.finished) != cpus) - cpu_relax(); - spin_unlock(&call_lock); - - return 0; +int smp_call_function(void (*func) (void *info), void *info, int nonatomic, + int wait) +{ + return smp_call_function_mask(cpu_online_map, func, info, wait); } EXPORT_SYMBOL(smp_call_function); + +/* + * smp_call_function_single - Run a function on another CPU + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @nonatomic: Currently unused. + * @wait: If true, wait until function has completed on other CPUs. + * + * Retrurns 0 on success, else a negative
Re: sys_write() racy for multi-threaded append?
On 3/12/07, Bodo Eggert <[EMAIL PROTECTED]> wrote: On Mon, 12 Mar 2007, Michael K. Edwards wrote: > That's fine when you're doing integration test, and should probably be > the default during development. But if the race is first exposed in > the field, or if the developer is trying to concentrate on a different > problem, "spectacular crash and burn" may do more harm than good. > It's easy enough to refactor the f_pos handling in the kernel so that > it all goes through three or four inline accessor functions, at which > point you can choose your trade-off between speed and idiot-proofness > -- at _kernel_ compile time, or (given future hardware that supports > standardized optionally-atomic-based-on-runtime-flag operations) per > process at run-time. CONFIG_WOMBAT Waste memory, brain and time in order to grant an atomic write which is neither guaranteed by the standard nor expected by any sane programmer, just in case some idiot tries to write to one file from multiple processes. Warning: Programs expecting this behaviour are buggy and non-portable. OK, I laughed out loud at this. But I think you're missing my point, which is that there's a time to be hard-core about code quality and there's a time to be hard-core about _product_ quality. Face it, all products containing software more or less suck. This is because most programmers write crap code most of the time. The only way to cope with this, outside the confines of the European defense industry and other niches insulated from economic reality, is to make the production environment gentler on _application_ code than the development environment is. Hence CONFIG_WOMBAT. (I like that name. I'm going to use it in my patch, with your permission. :-) Writing to a file from multiple processes is not usually the problem. Writing to a common "struct file" from multiple threads is. 
99.999% of the time it will work, because you're only writing as far as VFS cache and then bumping f_pos, and your threads are probably on the same processor anyway. 0.001% of the time the second thread will see a stale f_pos and clobber the first write. This is true even on file types that can never return a short write. If you remember to open with O_APPEND so the pos argument to vfs_write is silently ignored, or if the implementation underlying vfs_write effectively ignores the pos argument irrespective of flags, you're OK. If the pos argument isn't ignored, or if you ever look at the result of a relative seek on any fd that maps to that struct file, you're screwed. (Note to the alert reader: yes, this means shell scripts should always use >> rather than > when routing stdout and/or stderr to a file. You're just as vulnerable to interleaving due to stdio buffering issues as you are when stdout and stderr are sent to the tty, and short writes may still be a problem if you are so foolish as to use a filesystem that generates them on anything short of a catastrophic error, but at least you get O_APPEND and sane behavior on ftruncate().) > Frankly, I think that unless application programmers poke at some sort > of magic "I promise to handle short writes correctly" bit, write() > should always return either the full number of bytes requested or an > error code. If you assume that you won't have short writes, your programs may fail on e.g. solaris. There may be reasons for linux to use the same semantics at some time in the future, you never know. So what? My products are shipping _now_. Future kernels are guaranteed to break them anyway because sysfs is a moving target. Solaris is so not in the game for my kind of embedded work, it's not even funny. If POSIX mandates stupid shit, and application programmers don't read that part of the manual anyway (and don't code on that assumption in practice), to hell with POSIX. 
On many file descriptors, short writes simply can't happen -- and code that purports to handle short writes but has never been exercised is arguably worse than code that simply bombs on short write. So if I can't shim in an induce-short-writes-randomly-on-purpose mechanism during development, I don't want short writes in production, period. In my world, GNU/Linux is not a crappy imitation Solaris that you get to pay out the wazoo for to Red Hat (and get no documentation and lousy tech support that doesn't even cover your hardware). It's a full-source-code platform on which you can engineer robust industrial and consumer products, because you can control the freeze and release schedule component-by-component, and you can point fix anything in the system at any time. If, that is, you understand that the source code is not the software, and that you can't retrofit stability and security overnight onto code that was written with no thought of anything but performance. If you assume you *may* have short writes, you have no problem. Sure -- until the one code path in a hundred that handles the "short write" case incorrectly gets traversed in
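The f_pos race described in this message is hard to trigger on demand, so the sketch below does not reproduce it. Instead it simulates a stale offset deterministically by opening the same file through two independent descriptors, each with its own offset starting at 0, and compares the result with and without O_APPEND. The helper name two_writers_size() and the scratch path used with it are assumptions for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: create `path`, open it a second time, write
 * four bytes through each descriptor, and return the resulting file size.
 * extra_flags is either 0 or O_APPEND. The file is removed before
 * returning. Returns -1 on any error.
 */
long two_writers_size(const char *path, int extra_flags)
{
	struct stat st;
	long size = -1;
	int a, b;

	assert(path != NULL);

	a = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
	if (a < 0)
		return -1;
	b = open(path, O_WRONLY | extra_flags);
	if (b < 0) {
		close(a);
		unlink(path);
		return -1;
	}

	/*
	 * Without O_APPEND, descriptor b still believes the file offset is 0
	 * after a has written, so its data lands on top of a's data.
	 * With O_APPEND, every write goes to the current end of file.
	 */
	if (write(a, "aaaa", 4) == 4 && write(b, "bbbb", 4) == 4 &&
	    fstat(a, &st) == 0)
		size = (long)st.st_size;

	close(a);
	close(b);
	unlink(path);
	return size;
}
```

With extra_flags set to 0 the second write overwrites the first and the file ends up 4 bytes long; with O_APPEND both records survive and it ends up 8 bytes long.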
Re: [PATCH] usb-serial regression fix
On Mon, Mar 12, 2007 at 05:18:19PM -0700, Greg KH wrote: > On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote: > > On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote: > > > On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote: > > > > On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote: > > > > > Oliver Neukum wrote: > > > > > > >Mark Lord wrote: > > > > > > > >Okay, from that part (above), the problem is obvious: > > > > > > > >in that the "MCT U232 converter now disconnected" > > > > > > > >appears, and then we continue to try and call the > > > > > > > >driver's method.. Oops! > > > > > >IMHO shutdown() is using serial->port[] and bombs. > > > > > >Could you reverse the order here? > > Do not NULL serial->port[i] since it is used in ->shutdown(). > > This wasn't an issue until the order of ->shutdown() and > > device_unregister was corrected. > > for (i = 0; i < serial->num_ports; ++i) > > if (serial->port[i]->dev.parent != NULL) { > > device_unregister(&serial->port[i]->dev); > > - serial->port[i] = NULL; > > } > But shouldn't you null it out somewhere? It will be an "empty" > pointer at some point in time... Not as far as I can see. The serial structure that ->port[i] is in gets kfree()ed soon after, in the same function, and nothing in between, other than ->shutdown(), uses ->port[]. I assume it was someone being overly cautious. -Jim
Re: /sys/devices/system/cpu/cpuX/online are missing
Giuliano Pochini <[EMAIL PROTECTED]> writes: > I had a look at arch/powerpc/kernel/smp.c but I'm not familiar at all with > those parts of the kernel. See arch/powerpc/kernel/sysfs.c:topology_init. I don't think there is anything to do here. You probably don't have CONFIG_HOTPLUG_CPU enabled. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [QUICKLIST 0/6] Arch independent quicklists V1
David Miller writes: > I ported this to sparc64 as per the patch below, tested on > UP SunBlade1500 and 24 cpu Niagara T1000. Did you see any performance improvement? We used to have quicklists on ppc, but I remain to be convinced that they actually help. Also, I didn't understand why we have to do quicklists to take advantage of the fact that the pages are in a pristine state when they are freed. I thought the whole point of the slab allocator was to be able to take advantage of that... Paul.
Re: [PATCH] usb-serial regression fix
On Mon, Mar 12, 2007 at 03:59:22PM -0700, Jim Radford wrote: > On Mon, Mar 12, 2007 at 03:42:35PM -0700, Jim Radford wrote: > > On Mon, Mar 12, 2007 at 01:33:31PM -0700, Greg KH wrote: > > > On Mon, Mar 12, 2007 at 04:22:22PM -0400, Mark Lord wrote: > > > > Oliver Neukum wrote: > > > > >>Mark Lord wrote: > > > > >>>Okay, from that part (above), the problem is obvious: > > > > >>>in that the "MCT U232 converter now disconnected" appears, > > > > >>>and then we continue to try and call the driver's method.. Oops! > > > > > > >IMHO shutdown() is using serial->port[] and bombs. > > > > >Could you reverse the order here? > > > > > > Yup. Fixed. Tested. Works. > > > > > > This patch fixes the Oops that otherwise occurs whenever > > > > a USB serial adapter is unplugged from a system, as well as > > > > the Oops seen when one is in use before resume (to RAM). > > > > > Argh, no, this change was done to help the ftdi drivers out. > > > > > Look at changeset d9a7ecacac5f8274d2afce09aadcf37bdb42b93a in Linus's > > > tree from Jim Radford: > > > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d9a7ecacac5f8274d2afce09aadcf37bdb42b93a > > > > > > It makes this change because the usb-serial drivers need the port > > > devices when the port_remove() callbacks happen. Otherwise you get an > > > oops that way. > > > > > Jim, can you take a look at this and see if you can figure something > > > out? > > > > The problem is really the > > > >serial->port[i] = NULL; > > > > line after device_unregister() which is used to flag "fake" devices > > that don't need legacy cleanup later in destroy_serial. That > > flagging should be done using a *real* flag, and not by overloading > > the ->port[i] pointer since we require it to be non-NULL in > > ->shutdown() in all drivers that are not converted to new > > ->port_probe()/->port_remove() framework (currently all except ftdi). 
> > > I'll work on a patch to do that, but for now, I think you should apply > > Mark's patch to revert the order change since the FTDI driver no > > longer requires the correct ordering of device_unregister() and > > ->shutdown(). > > Do not NULL serial->port[i] since it is used in ->shutdown(). This > wasn't an issue until the order of ->shutdown() and device_unregister > was corrected. > > Signed-Off: Jim Radford <[EMAIL PROTECTED]> > > diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c > index 8511352..871c9a8 100644 > --- a/drivers/usb/serial/usb-serial.c > +++ b/drivers/usb/serial/usb-serial.c > @@ -145,7 +145,6 @@ static void destroy_serial(struct kref *kref) > for (i = 0; i < serial->num_ports; ++i) > if (serial->port[i]->dev.parent != NULL) { > device_unregister(&serial->port[i]->dev); > - serial->port[i] = NULL; > } But shouldn't you null it out somewhere? It will be an "empty" pointer at some point in time... Mark, does this solve your oops (after you revert your previous patch)? thanks, greg k-h
Re: [ck] Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On 3/12/07, michael chang <[EMAIL PROTECTED]> wrote: Considering the concepts put out by projects such as BOINC and [EMAIL PROTECTED], I wouldn't be thoroughly surprised by this ideology, although I do question the particular way this test case is being run. If Con actually implements SCHED_IDLEPRIO in RSDL, life is good even in that case. This seems to me like he's saying that there has to be a mechanism (outside of nice) that can be used to treat processes that "I" want to be interactive all special-like. It feels like something that would have been said in the design of what the scheduler was in -ck and is currently in vanilla. Exactly. Driving us again toward the fact that different workloads might benefit from different schedulers (eg: RSDL is cool for server loads, previous staircase did an excellent job on desktop, etc) and thus that having a choice of schedulers might be something that would satisfy (some) people... To me, that fundamentally clashes with the design behind RSDL. That said, I could be wrong -- Con appears to have something that could be very promising up his sleeve that could come out sooner or later. Once he's written it, of course. In any case, RSDL seems very promising, for the most part. It certainly is. "Negative" feedback can be a good thing too, as it helps improving it anyway. It's nonetheless true that it's practically impossible to satisfy 100% of use case with a single design, so choices will have to be made. HTH T-Bone -- Thibaut VARENE http://www.parisc-linux.org/~varenet/
Re: [PATCH] Delete superfluous source file "net/wanrouter/af_wanpipe.c".
From: "Robert P. J. Day" <[EMAIL PROTECTED]> Date: Sat, 10 Mar 2007 03:49:52 -0500 (EST) > > Delete the apparently superfluous source file > net/wanrouter/af_wanpipe.c. > > Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]> Applied, thanks Robert. This thing isn't even built in 2.4.x :-) Although there is some ancient reference to the build module in Documentation/networking/wan-router.txt, a heavily out of date document.
Re: [PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr
Hi, --On 9 March 2007 12:55:11 PM +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote: Ed Cashin found a bug in the error handling code for the case where a page allocation fails. Here's the updated version: Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c === --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c 2007-03-08 19:08:38.0 +0100 +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c2007-03-09 08:59:15.0 +0100 + for (i = 0; i < page_count; i++) { + bp->b_pages[i] = alloc_page(GFP_KERNEL); + if (!bp->b_pages[i]) + goto fail_free_mem; + } + bp->b_flags |= _XBF_PAGES; + + error = _xfs_buf_map_pages(bp, XBF_MAPPED); + if (unlikely(error)) { + printk(KERN_WARNING "%s: failed to map pages\n", + __FUNCTION__); goto fail_free_mem; - bp->b_flags |= _XBF_KMEM_ALLOC; + } xfs_buf_unlock(bp); XB_TRACE(bp, "no_daddr", data); return bp; + fail_free_mem: - kmem_free(data, malloc_len); + for ( ; i >= 0; i--) + __free_page(bp->b_pages[i]); fail_free_buf: xfs_buf_free(bp); fail: It looks like you might need: for (i--; i >= 0; i--) (or: for (j = 0; j < i; j++) etc.) Because if the initial alloc_page loop goes to completion then: i == page_count and if alloc_page loop terminates early then bp->b_pages[i] == NULL So we have gone 1 too far in both cases and need to start free'ing back one. Unless I missed something. --Tim
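The off-by-one is easier to see in isolation. Below is a userspace sketch of the same unwind pattern, with malloc() standing in for alloc_page() and a counter tracking outstanding allocations. All names here (fake_alloc_page, fake_free_page, alloc_pages_or_unwind, live_page_count) are hypothetical stand-ins and not XFS code; the fail_at and map_fails parameters exist only to inject the two failure cases mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

static int live_pages;	/* fake pages currently allocated */

/* Stand-in for alloc_page(); returns NULL when asked to fail. */
static void *fake_alloc_page(int fail)
{
	void *p = fail ? NULL : malloc(64);

	if (p)
		live_pages++;
	return p;
}

/* Stand-in for __free_page(); being handed NULL is the bug to avoid. */
static void fake_free_page(void *p)
{
	assert(p != NULL);
	free(p);
	live_pages--;
}

int live_page_count(void)
{
	return live_pages;
}

/*
 * Fill pages[0..count-1]. The allocation at index fail_at fails (pass -1
 * for none); if map_fails is nonzero, the step after the loop fails.
 * Returns 0 on success, -1 after freeing everything that was allocated.
 */
int alloc_pages_or_unwind(void **pages, int count, int fail_at, int map_fails)
{
	int i;

	for (i = 0; i < count; i++) {
		pages[i] = fake_alloc_page(i == fail_at);
		if (!pages[i])
			goto fail_free_mem;
	}
	if (map_fails)
		goto fail_free_mem;	/* here i == count, one past the end */
	return 0;

fail_free_mem:
	/*
	 * Either pages[i] is NULL or i is one past the last valid slot,
	 * so step back one before freeing.
	 */
	for (i--; i >= 0; i--)
		fake_free_page(pages[i]);
	return -1;
}
```

Starting the cleanup loop at i instead of i - 1 would pass a NULL slot to the free routine in the first case and read past the end of the array in the second.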
Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2
On Mon, 12 Mar 2007, Mike Galbraith wrote: On Tue, 2007-03-13 at 07:38 +1100, Con Kolivas wrote: On Tuesday 13 March 2007 07:11, Mike Galbraith wrote: Killing the known corner case starvation scenarios is wonderful, but let's not just pretend that interactive tasks don't have any special requirements. Now you're really making a stretch of things. Where on earth did I say that interactive tasks don't have special requirements? It's a fundamental feature of this scheduler that I go to great pains to get them as low latency as possible and their fair share of cpu despite having a completely fair cpu distribution. As soon as your cpu is fully utilized, fairness looses or interactivity loses. Pick one. correct. the problem is that it's hard (if not impossible) to properly identify what is needed to make a system have good interactivity. in some cases it's a matter of low latency (wake up a process as quickly as you can when whatever it was waiting on is available), but in others it's a matter of allocating the _right_ process enough CPU (X needs enough CPU to do things) where it's a matter of needing low-latency, it's possible to design a scheduler that will do things in a predictable enough way that you know the max latency you have to deal with (and the RSDL seems to do this) the problem comes when this isn't enough. if you have several CPU hogs on a system, and they are all around the same priority level, how can the scheduler know which one needs the CPU the most for good interactivity? in some cases you may be able to directly detect that your high-priority process is waiting for another one (tracing pipes and local sockets for example), but what if you are waiting for several of them? (think a multimedia desktop waiting for the sound card, CDRom, hard drive, and video all at once) which one needs the extra CPU the most? 
Fairness is much easier to enforce (and much easier to understand) the RSDL is concentrating on enforcing fairness, with bounded (and predictable) latencies. if you are willing to tell the system what you consider more important (and how much more important you consider it), then it's much easier to figure out who to give the CPU to. Con is just asking you to do this (and you already do, by doing a nice -5. but it sounds like you want that to mean more than it currently does) David Lang
Re: [git patches] net driver fixes
From: Geert Uytterhoeven <[EMAIL PROTECTED]> Date: Mon, 12 Mar 2007 11:02:43 +0100 (CET) > On Tue, 6 Mar 2007, Jeff Garzik wrote: > > Jay Vosburgh (3): > > bonding: Improve IGMP join processing > > ip_mc_rejoin_group: Kill warning about unused variable `in_dev' when > CONFIG_IP_MULTICAST is not set. > > Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]> Applied, thanks Geert.
Re: CONFIG_REORDER Kconfig help strange sentence.
On Tue, Mar 13, 2007 at 10:18:03AM +1100, Rusty Russell wrote: > OK, this confused me: > > Function reordering (REORDER) [N/y/?] (NEW) ? > > This option enables the toolchain to reorder functions for a more > optimal TLB usage. If you have pretty much any version of binutils, > this can increase your kernel build time by roughly one minute. > > "If you have pretty much any version of binutils"? Huh? > > You mean "This will slow your kernel build by about a minute"? Yes. Lots of sections seem to trigger some quadratic behaviour in ld. It might be fixed in some unreleased CVS version though (not 100% sure) -Andi
Re: [RFC][PATCH 4/7] RSS accounting hooks over the code
On Mon, Mar 12, 2007 at 09:50:08AM -0700, Dave Hansen wrote: > On Mon, 2007-03-12 at 19:23 +0300, Kirill Korotaev wrote: > > > > For these you essentially need per-container page->_mapcount counter, > > otherwise you can't detect whether rss group still has the page > > in question being mapped in its processes' address spaces or not. > What do you mean by this? You can always tell whether a process has a > particular page mapped. Could you explain the issue a bit more. I'm > not sure I get it. OpenVZ wants to account _shared_ pages in a guest different than separate pages, so that the RSS accounted values reflect the actual used RAM instead of the sum of all processes RSS' pages, which for sure is more relevant to the administrator, but IMHO not so terribly important to justify memory consuming structures and sacrifice performance to get it right. YMMV, but maybe we can find a smart solution to the issue too :) best, Herbert > -- Dave
Re: [PATCH 1/1] mm: Inconsistent use of node IDs
Andi Kleen wrote: On Monday 12 March 2007 23:51, Ethan Solomita wrote: This patch corrects inconsistent use of node numbers (variously "nid" or "node") in the presence of fake NUMA. I think it's very consistent -- your patch would make it inconsistent though. It's consistent to call node_online() with a physical node ID when the online node mask is composed of fake nodes? Sorry, but when you ask for NUMA emulation you will get it. I don't see any point in a "half way only for some subsystems I like" NUMA emulation. It's unlikely that your ideas of where it is useful and where is not matches other NUMA emulation user's ideas too. I don't understand your comments. My code is intended to work for all systems. If the system is non-NUMA by nature, then all CPUs map to fake node 0. As an example, on a two chip dual-core AMD opteron system, there are 4 "cpus" where CPUs 0 and 1 are close to the first half of memory, and CPUs 2 and 3 are close to the second half. Without this change CPUs 2 and 3 are mapped to fake node 1. This results in awful performance. With this change, CPUs 2 and 3 are mapped to (roughly) 1/2 the fake node count. Their zonelists[] are ordered to do allocations preferentially from zones that are local to CPUs 2 and 3. Can you tell me the scenario where my code makes things worse? Besides adding such a secondary node space would be likely a huge long term maintenance issue. I can just see it breaking with every non trivial change. I'm adding no data structures to do this. The current code already has get_phys_node. My changes use the existing information about node layout, both the physical and fake, and defines a mapping. The current mapping just takes a physical node and says "it's the fake node too". NACK. I wish you would include some specifics as to why you think what you do. 
You're suggesting we leave in place a system that destroys NUMA locality when using fake numa, and passes around physical node ids as an index into nodes[] which is indexed by fake nodes. My change has no effect without fake numa, and harms no one with fake numa. -- Ethan
Re: irda rmmod lockdep trace.
From: Samuel Ortiz <[EMAIL PROTECTED]> Date: Mon, 12 Mar 2007 02:38:43 +0200 > On Sat, Mar 10, 2007 at 07:43:26PM +0200, Samuel Ortiz wrote: > > Hi Dave, > > > > On Thu, Mar 08, 2007 at 05:54:36PM -0500, Dave Jones wrote: > > > modprobe irda ; rmmod irda in 2.6.21rc3 gets me the spew below.. > > Well it seems that we call __irias_delete_object() from hashbin_delete(). > > Then > > __irias_delete_object() calls itself hashbin_delete() again. We're trying to > > get the lock recursively. > Looking at the code more carefully, this seems to be a false positive: > iriap_cleanup and __irias_delete_object are taking 2 different locks from > 2 different hashbin instances. The locks belong to the same lock class but > they are hierarchically different. We need to tell the validator about it and > the following patch does that. Comments are welcomed as I'm planning to push > it to netdev soon: I would strongly caution against adding any run-time overhead just to cure a false lockdep warning. Even adding a new function argument is too much IMHO. Make the cost show up for lockdep only, perhaps by putting each hashbin lock into a separate locking class?
Re: Asus P5B-VM motherboard: cd drive malfunctions if internal nic in use.
It is a pata cd drive, attached to the JMicron controller. I'll look into whether the usb ports power off on shutdown. Thanks, Phil