Re: [PATCH -mm] x86_64 UP needs smp_call_function_single
On Thu, 30 Nov 2006 08:00:00 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> On Wed, 2006-11-29 at 17:45 -0800, Andrew Morton wrote:
> > No, I think this patch is right - the declaration of the CONFIG_SMP
> > smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP
> > declaration or definition should be there too.
> >
> > It's still buggy though. It should disable local interrupts around
> > the call to match the SMP version. I'll fix that separately.
>
> hm, didnt i send an updated patch for that already? See the patch below,
> from many days ago. I sent it after the tsc-sync-rewrite patch.
> Might have got lost.
> --->
> Subject: x86_64: build fixes
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> x86_64 does not build cleanly on UP:
>
> arch/x86_64/kernel/vsyscall.c: In function 'cpu_vsyscall_notifier':
> arch/x86_64/kernel/vsyscall.c:282: warning: implicit declaration of
> function 'smp_call_function_single'
> arch/x86_64/kernel/vsyscall.c: At top level:
> arch/x86_64/kernel/vsyscall.c:279: warning: 'cpu_vsyscall_notifier'
> defined but not used
>
> this patch fixes it by making smp_call_function_single() globally
> available.
>
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
> include/asm-x86_64/smp.h | 11 ++-
> include/linux/smp.h | 10 +++---
> kernel/sched.c | 19 +++
> 3 files changed, 28 insertions(+), 12 deletions(-)
>
> Index: linux/include/asm-x86_64/smp.h
> ===
> --- linux.orig/include/asm-x86_64/smp.h
> +++ linux/include/asm-x86_64/smp.h
> @@ -115,16 +115,9 @@ static __inline int logical_smp_processo
> }
>
> #ifdef CONFIG_SMP
> -#define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
> +# define cpu_physical_id(cpu)x86_cpu_to_apicid[cpu]
> #else
> -#define cpu_physical_id(cpu) boot_cpu_id
> -static inline int smp_call_function_single(int cpuid, void (*func)
> (void *info),

congratulations-your-first-wordwrapped-patch ;)

> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -1110,6 +1110,25 @@ repeat:
> task_rq_unlock(rq, );
> }
>
> +#ifndef CONFIG_SMP
> +/*
> + * Call a function on a specific CPU (on UP the function gets executed
> + * on the current CPU, immediately):
> + */
> +int smp_call_function_single(int cpuid, void (*func) (void *info), void
> *info,
> + int retry, int wait)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + func(info);
> + local_irq_restore(flags);
> +
> + return 0;
> +}

yes, but a) calling the SMP version with local interrupts disabled is a
bug, so we can use bare local_irq_disable() here and b) only two
architectures call or use this function, so all the others don't want a
copy of it.

So I did:

--- a/include/linux/smp.h~up-smp_call_function_single-should-disable-interrupts
+++ a/include/linux/smp.h
@@ -15,6 +15,7 @@ extern void cpu_idle(void);
 #include
 #include
 #include
+#include
 #include

 /*
@@ -102,8 +103,9 @@ static inline void smp_send_reschedule(i
 static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
 				void *info, int retry, int wait)
 {
-	/* Disable interrupts here? */
+	local_irq_disable();	/* Match the SMP call environment */
 	func(info);
+	local_irq_enable();
 	return 0;
 }
_

which is somewhat unpleasant.

I added a WARN_ON(irqs_disabled()) to the out-of-line SMP version.

btw, does anyone know why the SMP versions of this function use
spin_lock_bh(_lock)?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
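
The difference between the two UP variants discussed above (save/restore
versus bare disable/enable) can be modelled in userspace. What follows is
only a sketch under stated assumptions: irqs_enabled, model_irq_save(),
model_irq_restore() and the call_single_*() helpers are names made up for
this illustration, and a plain boolean stands in for the CPU interrupt flag.
It is not the kernel's local_irq_*() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Model of local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Model of local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Variant 1: save/restore, as in the kernel/sched.c version. */
static int call_single_save_restore(void (*func)(void *info), void *info)
{
	unsigned long flags = model_irq_save();

	func(info);
	model_irq_restore(flags);
	return 0;
}

/* Variant 2: bare disable/enable, as in the linux/smp.h version. */
static int call_single_disable_enable(void (*func)(void *info), void *info)
{
	irqs_enabled = false;
	func(info);
	irqs_enabled = true;	/* unconditional: ignores the caller's state */
	return 0;
}

/* Callback that records whether interrupts were off while it ran. */
static void record_irq_state(void *info)
{
	*(bool *)info = !irqs_enabled;
}

/* Both variants, entered with interrupts already disabled. */
static void demo_irq_variants(void)
{
	bool ran_off = false;

	irqs_enabled = false;
	call_single_save_restore(record_irq_state, &ran_off);
	assert(ran_off && !irqs_enabled);	/* caller's state preserved */

	irqs_enabled = false;
	call_single_disable_enable(record_irq_state, &ran_off);
	assert(ran_off && irqs_enabled);	/* interrupts came back on */

	irqs_enabled = true;
}
```

In this model both variants run the callback with interrupts off. They
differ only when the caller already had interrupts disabled, which is the
case the message above describes as a bug for the SMP version.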
[RFC][PATCH] Mount problem with the GFS2 code
Hi all,

While mounting a gfs2 filesystem, our test team hit a problem and got this
error message:

===
GFS2: fsid=: Trying to join cluster "lock_nolock", "dasde1"
GFS2: fsid=dasde1.0: Joined cluster. Now mounting FS...
GFS2: not a GFS2 filesystem
GFS2: fsid=dasde1.0: can't read superblock: -22
==

On debugging further we found that the problem occurs while reading the
super block (gfs2_read_super) and comparing the magic number in it. When I
replaced the submit_bio() call (present in gfs2_read_super) with sb_getblk()
and ll_rw_block(), the mount operation succeeded.

On further analysis we found that before calling submit_bio(),
bio->bi_sector was set to the "sector" variable. This "sector" variable has
the same value as bh->b_blocknr (the block number). Hence this value needs
to be multiplied by (blocksize >> 9) (9 because the sector size is 2^9; the
same thing happens in ll_rw_block() before it calls submit_bio()).

So I have developed a patch which solves this problem.
Please let me know your comments.

Signed-off-by: Srinivasa DS <[EMAIL PROTECTED]>

 super.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.19-rc6/fs/gfs2/super.c
===
--- linux-2.6.19-rc6.orig/fs/gfs2/super.c
+++ linux-2.6.19-rc6/fs/gfs2/super.c
@@ -199,7 +199,7 @@ struct page *gfs2_read_super(struct supe
 		return NULL;
 	}

-	bio->bi_sector = sector;
+	bio->bi_sector = sector * (sb->s_blocksize >> 9);
 	bio->bi_bdev = sb->s_bdev;
 	bio_add_page(bio, page, PAGE_SIZE, 0);
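
The unit conversion behind the one-line fix can be shown in isolation. This
is a minimal userspace sketch, assuming 512-byte sectors as stated in the
message; block_to_sector() and SECTOR_SHIFT are illustrative names chosen
here, not GFS2 or block-layer identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 2^9 = 512-byte sectors (assumption from the text) */

/* Convert a filesystem block number into a 512-byte sector number. */
static uint64_t block_to_sector(uint64_t block, unsigned int blocksize)
{
	/* blocksize must be a power of two and at least one sector long */
	assert(blocksize >= (1u << SECTOR_SHIFT));
	assert((blocksize & (blocksize - 1)) == 0);

	return block * (blocksize >> SECTOR_SHIFT);
}
```

With a 4096-byte block size each block spans 8 sectors, so block 16 starts
at sector 128. Only with a 512-byte block size do block number and sector
number coincide, which is the one case where the unpatched assignment would
read the right location.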
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
* David Miller <[EMAIL PROTECTED]> wrote:

> > furthermore, the tweak allows the shifting of processing from a
> > prioritized process context into a highest-priority softirq context.
> > (it's not proven that there is any significant /net win/ of
> > performance: all that was proven is that if we shift TCP processing
> > from process context into softirq context then TCP throughput of
> > that otherwise penalized process context increases.)
>
> If we preempt with any packets in the backlog, we send no ACKs and the
> sender cannot send thus the pipe empties. That's the problem, this
> has nothing to do with scheduler priorities or stuff like that IMHO.
> The argument goes that if the reschedule is delayed long enough, the
> ACKs will exceed the round trip time and trigger retransmits which
> will absolutely kill performance.

yes, but i disagree a bit about the characterisation of the problem. The
question in my opinion is: how is TCP processing prioritized for this
particular socket, which is attached to the process context which was
preempted.

normally quite a bit of TCP processing happens in a softirq context (in
fact most of it happens there), and softirq contexts have no fairness
whatsoever - they preempt whatever processing is going on, regardless of
any priority preferences of the user!

what was observed here were the effects of completely throttling TCP
processing for a given socket. I think such throttling can in fact be
desirable: there is a /reason/ why the process context was preempted: in
that load scenario there was 10 times more processing requested from the
CPU than it can possibly service. It's a serious overload situation and
it's the scheduler's task to prioritize between workloads!

normally such kind of "throttling" of the TCP stack for this particular
socket does not happen. Note that there's no performance lost: we dont do
TCP processing because there are /9 other tasks for this CPU to run/, and
the scheduler has a tough choice.

Now i agree that there are more intelligent ways to throttle and less
intelligent ways to throttle, but the notion to allow a given workload
'steal' CPU time from other workloads by allowing it to push its
processing into a softirq is i think unfair. (and this issue is partially
addressed by my softirq threading patches in -rt :-)

	Ingo
Re: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)
I got a chance to build and test this patch set, to see if it behaved
like I expected cpusets to behave, on an ia64 SN2 Altix system.

Two details - otherwise looked good. I continue to like this approach.

The two details are (1) /proc//cpuset not configured by default if
CPUSETS configured, and (2) a locking bug wedging tasks trying to rmdir
a cpuset off the notify_on_release hook.

1) I had to enable CONFIG_PROC_PID_CPUSET. I used the following
   one line change to do this. I am willing to consider, in due
   time, phasing out such legacy cpuset support. But so long as
   it is small stuff that is not getting in anyone's way, I think
   we should take our sweet time about doing so -- as in a year
   or two after marking it deprecated or some such. No sense
   deciding that matter now; keep the current cpuset API working
   throughout any transition to container based cpusets, then
   revisit the question of whether to deprecate and eventually
   remove these kernel API details, later on, after the major
   reconstruction dust settles. In general, we try to avoid
   removing kernel APIs, especially if they are happily being
   used and working and not causing anyone grief.

= begin =

--- 2.6.19-rc5.orig/init/Kconfig	2006-11-29 21:14:48.071114833 -0800
+++ 2.6.19-rc5/init/Kconfig	2006-11-29 22:19:02.015166048 -0800
@@ -268,6 +268,7 @@ config CPUSETS
 config PROC_PID_CPUSET
 	bool "Include legacy /proc//cpuset file"
 	depends on CPUSETS
+	default y if CPUSETS

 config CONTAINER_CPUACCT
 	bool "Simple CPU accounting container subsystem"

= end =

2) I wedged the kernel on the container_lock, doing a removal of a cpuset
   using notify_on_release.

   Right now, that test system has the following two tasks, wedged:

= begin =

F S UID   PID  PPID  C PRI  NI ADDR  SZ WCHAN  STIME TTY      TIME CMD
0 S root 4992    34  0  71  -5 -    380 wait   22:51 ?    00:00:00 /bin/sh /sbin/cpuset_release_agent /cpuset_test_tree
0 D root 4994  4992  0  72  -5 -    200 contai 22:51 ?    00:00:00 rmdir /dev/cpuset//cpuset_test_tree

= end =

   I had a cpuset called /cpuset_test_tree, and some sub-cpusets
   below it. I marked it 'notify_on_release' and then removed all
   tasks from it, and then removed the child cpusets that it had.
   Removing that last child cpuset presumably triggered the above
   callout to /sbin/cpuset_release_agent, which called rmdir.

   That wait address (from /proc/4994/stat) in hex is
   a001000f1060, and my System.map has the two lines:

	a001000f1040 T container_lock
	a001000f1360 T container_manage_unlock

   So it is wedged in container_lock.

   I have subsequently also wedged an 'ls' command trying to scan
   this /dev/cpuset directory, waiting in the kernel routine
   vfs_readdir (not surprising, given that I'm in the middle of
   doing a rmdir on that directory.)

   If you don't immediately see the problem, I can go back and get
   a kernel stack trace or whatever else you need.

   This lockup occurred the first, and thus far only, time that I
   tried to use notify_on_release to rmdir a cpuset. So I presume
   it is an easy failure for me to reproduce.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
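
The step from the wait address to "wedged in container_lock" is a lookup of
the nearest preceding System.map symbol. A minimal sketch of that lookup
follows; struct ksym, lookup_symbol() and example_map are names made up for
illustration, and the table holds only the two System.map lines quoted
above, with the addresses exactly as they appear in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One System.map entry: start address and symbol name. */
struct ksym {
	uint64_t addr;
	const char *name;
};

/*
 * Return the last symbol whose start address is <= addr, or NULL if addr
 * lies below the first entry. The table must be sorted by ascending
 * address, as System.map is.
 */
static const struct ksym *lookup_symbol(const struct ksym *tab, size_t n,
					uint64_t addr)
{
	const struct ksym *best = NULL;

	for (size_t i = 0; i < n && tab[i].addr <= addr; i++)
		best = &tab[i];
	return best;
}

/* The two System.map lines quoted in the message. */
static const struct ksym example_map[] = {
	{ 0xa001000f1040ULL, "container_lock" },
	{ 0xa001000f1360ULL, "container_manage_unlock" },
};

/* Resolve the wait address reported for pid 4994. */
static void demo_wchan_lookup(void)
{
	const struct ksym *s = lookup_symbol(example_map, 2, 0xa001000f1060ULL);

	assert(s != NULL);
	assert(strcmp(s->name, "container_lock") == 0);
	assert(0xa001000f1060ULL - s->addr == 0x20);	/* offset into symbol */
}
```

The wait address falls 0x20 bytes past the start of container_lock and well
before container_manage_unlock, which matches the truncated "contai" in the
WCHAN column.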
Re: [patch/rfc 2.6.19-rc5] arch-neutral GPIO calls
On 11/30/06, pHilipp Zabel <[EMAIL PROTECTED]> wrote:
> Effectively, yes. I counted quite a few implementations in the current
> tree which can trivially (#defines) map to that API.

Or so I thought, sorry.

regards
Philipp

Index: linux-2.6/include/asm-arm/arch-pxa/gpio.h
===
--- /dev/null 1970-01-01 00:00:00.0 +
+++ linux-2.6/include/asm-arm/arch-pxa/gpio.h 2006-11-30 07:39:59.0 +0100
@@ -0,0 +1,67 @@
+/*
+ * linux/include/asm-arm/arch-pxa/gpio.h
+ *
+ * PXA GPIO wrappers for arch-neutral GPIO calls
+ *
+ * Written by Philipp Zabel <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#ifndef __ASM_ARCH_PXA_GPIO_H
+#define __ASM_ARCH_PXA_GPIO_H
+
+#include
+#include
+#include
+
+#include
+
+static inline int gpio_request(unsigned gpio, const char *label)
+{
+	return 0;
+}
+
+static inline void gpio_free(unsigned gpio)
+{
+	return;
+}
+
+static inline int gpio_direction_input(unsigned gpio)
+{
+	if (gpio > PXA_LAST_GPIO)
+		return -EINVAL;
+	pxa_gpio_mode(gpio | GPIO_IN);
+	return 0;
+}
+
+static inline int gpio_direction_output(unsigned gpio)
+{
+	if (gpio > PXA_LAST_GPIO)
+		return -EINVAL;
+	pxa_gpio_mode(gpio | GPIO_OUT);
+	return 0;
+}
+
+#define gpio_get_value(gpio)	(GPLR(gpio) & GPIO_bit(gpio))
+#define gpio_set_value(gpio,value) \
+	((value)? (GPSR(gpio) = GPIO_bit(gpio)):(GPCR(gpio) = GPIO_bit(gpio)))
+
+#define gpio_to_irq(gpio)	IRQ_GPIO(gpio)
+#define irq_to_gpio(irq)	IRQ_TO_GPIO(irq)
+
+
+#endif
[patch 0/3] more buffered write fixes
Sorry, I should give some background.

The following patches attempt to fix the problems people have identified
with the buffered write deadlock patches. Against 2.6.19 + the previous
patchset dropped from -mm.

Comments?
[patch 3/3] fs: fix cont vs deadlock patches
Rework the cont filesystem helpers so that generic_cont_expand does the
actual work of expanding the file. cont_prepare_write then calls this
routine if expanding is needed, and retries. Also solves the problem where
cont_prepare_write would previously hold the target page locked while doing
not-very-nice things like locking other pages.

Means that zero-length prepare/commit_write pairs are no longer needed as an
overloaded directive to extend the file, thus cont should operate better
within the new deadlock-free buffered write code.

Converts fat over to the new cont scheme.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2004,19 +2004,20 @@ int block_read_full_page(struct page *pa
 	return 0;
 }

-/* utility function for filesystems that need to do work on expanding
- * truncates. Uses prepare/commit_write to allow the filesystem to
- * deal with the hole.
+/*
+ * Utility function for filesystems that need to do work on expanding
+ * truncates. For moronic filesystems that do not allow holes in file.
  */
-static int __generic_cont_expand(struct inode *inode, loff_t size,
-				 pgoff_t index, unsigned int offset)
+int generic_cont_expand(struct inode *inode, loff_t size, loff_t *bytes,
+			get_block_t *get_block)
 {
 	struct address_space *mapping = inode->i_mapping;
+	unsigned long blocksize = 1 << inode->i_blkbits;
 	struct page *page;
 	unsigned long limit;
-	int err;
+	int status;

-	err = -EFBIG;
+	status = -EFBIG;
 	limit = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
 	if (limit != RLIM_INFINITY && size > (loff_t)limit) {
 		send_sig(SIGXFSZ, current, 0);
@@ -2025,146 +2026,83 @@ static int __generic_cont_expand(struct
 	if (size > inode->i_sb->s_maxbytes)
 		goto out;

-	err = -ENOMEM;
-	page = grab_cache_page(mapping, index);
-	if (!page)
-		goto out;
-	err = mapping->a_ops->prepare_write(NULL, page, offset, offset);
-	if (err) {
-		/*
-		 * ->prepare_write() may have instantiated a few blocks
-		 * outside i_size. Trim these off again.
-		 */
-		unlock_page(page);
-		page_cache_release(page);
-		vmtruncate(inode, inode->i_size);
-		goto out;
-	}
+	status = 0;

-	err = mapping->a_ops->commit_write(NULL, page, offset, offset);
+	while (*bytes < size) {
+		unsigned int zerofrom;
+		unsigned int zeroto;
+		void *kaddr;
+		pgoff_t pgpos;
+
+		pgpos = *bytes >> PAGE_CACHE_SHIFT;
+		page = grab_cache_page(mapping, pgpos);
+		if (!page) {
+			status = -ENOMEM;
+			break;
+		}
+		/* we might sleep */
+		if (*bytes >> PAGE_CACHE_SHIFT != pgpos)
+			goto unlock;

-	unlock_page(page);
-	page_cache_release(page);
-	if (err > 0)
-		err = 0;
-out:
-	return err;
-}
+		zerofrom = *bytes & ~PAGE_CACHE_MASK;
+		if (zerofrom & (blocksize-1))
+			*bytes = (*bytes + blocksize-1) & (blocksize-1);

-int generic_cont_expand(struct inode *inode, loff_t size)
-{
-	pgoff_t index;
-	unsigned int offset;
+		zeroto = PAGE_CACHE_SIZE;

-	offset = (size & (PAGE_CACHE_SIZE - 1)); /* Within page */
+		status = __block_prepare_write(inode, page, zerofrom,
+						zeroto, get_block);
+		if (status)
+			goto unlock;
+		kaddr = kmap_atomic(page, KM_USER0);
+		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
+		flush_dcache_page(page);
+		kunmap_atomic(kaddr, KM_USER0);
+		status = __block_commit_write(inode, page, zerofrom, zeroto);

-	/* ugh. in prepare/commit_write, if from==to==start of block, we
-	** skip the prepare. make sure we never send an offset for the start
-	** of a block
-	*/
-	if ((offset & (inode->i_sb->s_blocksize - 1)) == 0) {
-		/* caller must handle this extra byte. */
-		offset++;
+unlock:
+		unlock_page(page);
+		page_cache_release(page);
+		if (status) {
+			BUG_ON(status == AOP_TRUNCATED_PAGE);
+			break;
+		}
 	}
-	index = size >> PAGE_CACHE_SHIFT;

-	return __generic_cont_expand(inode, size, index, offset);
-}
-
-int generic_cont_expand_simple(struct inode *inode, loff_t size)
-{
-
[patch 1/3] mm: pagecache write deadlocks zerolength fix
writev with a zero-length segment is a noop, and we shouldn't return EFAULT.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/pagemap.h
===
--- linux-2.6.orig/include/linux/pagemap.h
+++ linux-2.6/include/linux/pagemap.h
@@ -198,6 +198,9 @@ static inline int fault_in_pages_writeab
 {
 	int ret;

+	if (unlikely(size == 0))
+		return 0;
+
 	/*
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
@@ -222,6 +225,9 @@ static inline int fault_in_pages_readabl
 	volatile char c;
 	int ret;

+	if (unlikely(size == 0))
+		return 0;
+
 	ret = __get_user(c, uaddr);
 	if (ret == 0) {
 		const char __user *end = uaddr + size - 1;
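
Why the early return matters can be seen in a userspace model of the
"touch the first and last byte" pattern: with size == 0 there is no byte to
touch, and buf + size - 1 would point in front of the buffer. This is only
a sketch; touch_range() and its "touched" counter are hypothetical names
for illustration, and the real helpers use __get_user()/__put_user() on
user addresses rather than plain loads.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model: read the first and last byte of [buf, buf + size).
 * Returns 0 on success and reports how many bytes were touched.
 */
static int touch_range(const volatile char *buf, size_t size, size_t *touched)
{
	*touched = 0;

	/* A zero-length range is a no-op: never dereference buf at all. */
	if (size == 0)
		return 0;

	(void)buf[0];
	(*touched)++;

	if (size > 1) {
		(void)buf[size - 1];	/* safe: size - 1 cannot wrap here */
		(*touched)++;
	}
	return 0;
}

/* A zero-length segment succeeds even with a pointer that is not valid. */
static void demo_zero_length(void)
{
	size_t touched;

	assert(touch_range(NULL, 0, &touched) == 0);
	assert(touched == 0);
}
```

Without the size check, the zero-length case would probe an address the
caller never asked to be accessed, which is how a harmless empty writev
segment could end up reported as EFAULT.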
Re: [PATCH -rt] 2.6.19-4c6-rt9 build problem
* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> > > thanks, applied. Have you tried to boot the resulting kernel as
> > > well?
> > And with these two changes, it does boot!

great! Applied both of them.

	Ingo
[patch 2/3] mm: pagecache write deadlocks stale holes fix
The prepare_write for a data copy can potentially allocate blocks to fill
holes, so if the page copy fails then the new blocks must be zeroed so that
uninitialised data cannot be exposed by a subsequent read.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1951,7 +1951,14 @@ retry_noprogress:
 								bytes);
 		dec_preempt_count();

-		if (!PageUptodate(page)) {
+		if (unlikely(copied != bytes)) {
+			/*
+			 * Must zero out new buffers here so that we do end
+			 * up properly filling holes rather than leaving stale
+			 * data in them that might be read in future.
+			 */
+			page_zero_new_buffers(page);
+
 			/*
 			 * If the page is not uptodate, we cannot allow a
 			 * partial commit_write because when we unlock the
@@ -1965,10 +1972,10 @@ retry_noprogress:
 			 * Abort the operation entirely with a zero length
 			 * commit_write. Retry. We will enter the
 			 * single-segment path below, which should get the
-			 * filesystem to bring the page uputodate for us next
+			 * filesystem to bring the page uptodate for us next
 			 * time.
 			 */
-			if (unlikely(copied != bytes))
+			if (!PageUptodate(page))
 				copied = 0;
 		}

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1491,6 +1491,39 @@ out:
 }
 EXPORT_SYMBOL(block_invalidatepage);

+void page_zero_new_buffers(struct page *page)
+{
+	unsigned int block_start, block_end;
+	struct buffer_head *head, *bh;
+
+	BUG_ON(!PageLocked(page));
+	if (!page_has_buffers(page))
+		return;
+
+	bh = head = page_buffers(page);
+	block_start = 0;
+	do {
+		block_end = block_start + bh->b_size;
+
+		if (buffer_new(bh)) {
+			void *kaddr;
+
+			if (!PageUptodate(page)) {
+				kaddr = kmap_atomic(page, KM_USER0);
+				memset(kaddr+block_start, 0, bh->b_size);
+				flush_dcache_page(page);
+				kunmap_atomic(kaddr, KM_USER0);
+			}
+			clear_buffer_new(bh);
+			set_buffer_uptodate(bh);
+			mark_buffer_dirty(bh);
+		}
+
+		block_start = block_end;
+		bh = bh->b_this_page;
+	} while (bh != head);
+}
+
 /*
  * We attach and possibly dirty the buffers atomically wrt
  * __set_page_dirty_buffers() via private_lock. try_to_free_buffers
@@ -1784,36 +1817,33 @@ static int __block_prepare_write(struct
 			}
 			continue;
 		}
-		if (buffer_new(bh))
-			clear_buffer_new(bh);
 		if (!buffer_mapped(bh)) {
 			WARN_ON(bh->b_size != blocksize);
 			err = get_block(inode, block, bh, 1);
 			if (err)
 				break;
-			if (buffer_new(bh)) {
-				unmap_underlying_metadata(bh->b_bdev,
-							bh->b_blocknr);
-				if (PageUptodate(page)) {
-					set_buffer_uptodate(bh);
-					continue;
-				}
-				if (block_end > to || block_start < from) {
-					void *kaddr;
-
-					kaddr = kmap_atomic(page, KM_USER0);
-					if (block_end > to)
-						memset(kaddr+to, 0,
-							block_end-to);
-					if (block_start < from)
-						memset(kaddr+block_start,
-							0, from-block_start);
-					flush_dcache_page(page);
-					kunmap_atomic(kaddr, KM_USER0);
-				}
+		}
+		if (buffer_new(bh)) {
+			unmap_underlying_metadata(bh->b_bdev,
redboot partition combind fis / config problem
The FIS directory cannot be parsed correctly when RedBoot is built with
CYGSEM_REDBOOT_FLASH_COMBINED_FIS_AND_CONFIG. Recalculate the real number
of slots from the size of the FIS directory entry.

Signed-off-by: Yoshinori Sato <[EMAIL PROTECTED]>

diff --git a/drivers/mtd/redboot.c b/drivers/mtd/redboot.c
index 5b58523..0204cb9 100644
--- a/drivers/mtd/redboot.c
+++ b/drivers/mtd/redboot.c
@@ -110,6 +110,9 @@ #endif
 				}
 			}
 			break;
+		} else {
+			/* re-calculate of real numslots */
+			numslots = buf[i].size / sizeof(struct fis_image_desc);
 		}
 	}
 	if (i == numslots) {

-- 
Yoshinori Sato <[EMAIL PROTECTED]>
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Thu, 30 Nov 2006 07:47:58 +0100

> furthermore, the tweak allows the shifting of processing from a
> prioritized process context into a highest-priority softirq context.
> (it's not proven that there is any significant /net win/ of performance:
> all that was proven is that if we shift TCP processing from process
> context into softirq context then TCP throughput of that otherwise
> penalized process context increases.)

If we preempt with any packets in the backlog, we send no ACKs and the
sender cannot send thus the pipe empties. That's the problem, this
has nothing to do with scheduler priorities or stuff like that IMHO.
The argument goes that if the reschedule is delayed long enough, the
ACKs will exceed the round trip time and trigger retransmits which
will absolutely kill performance.

The only reason we block input packet processing while we hold this
lock is because we don't want the receive queue changing from
underneath us while we're copying data to userspace.

Furthermore once you preempt in this particular way, no input
packet processing occurs in that socket still, exacerbating the
situation.

Anyways, even if we somehow unlocked the socket and ran the backlog at
preemption points, by hand, since we've thus deferred the whole work
of processing whatever is in the backlog until the preemption point,
we've lost our quantum already, so it's perhaps not legal to do the
deferred processing as the preemption signalling point from a fairness
perspective.

It would be different if we really did the packet processing at the
original moment (where we had to queue to the socket backlog because
it was locked, in softirq) because then we'd return from the softirq
and hit the preemption point earlier or whatever.

Therefore, perhaps the best would be to see if there is a way we can
still allow input packet processing even while running the majority of
TCP's recvmsg(). It won't be easy :)
[PATCH] rtc: ds1743 support
Hello,

The real time clocks ds1742 and ds1743 differ only in the size of the
nvram. This patch changes the existing ds1742 driver to also support the
ds1743. The main change is that the nvram size is determined from the
resource attached to the device.

This patch applies to and has been tested with 2.6.19-rc5 and 2.6.19-rc6.

The patch has benefited from suggestions from Atsushi Nemoto, who is the
author of the ds1742 driver.

Please cc: me on any comments.

Regards,

Signed-off-by: Torsten Rasmussen

---

diff -uprN -X linux-2.6.19-rc5-vanilla/Documentation/dontdiff linux-2.6.19-rc5-vanilla/drivers/rtc/Kconfig linux-2.6.19-rc5/drivers/rtc/Kconfig
--- linux-2.6.19-rc5-vanilla/drivers/rtc/Kconfig	2006-11-08 03:24:20.0 +0100
+++ linux-2.6.19-rc5/drivers/rtc/Kconfig	2006-11-23 11:07:20.157388499 +0100
@@ -154,11 +154,11 @@ config RTC_DRV_DS1672
 	  will be called rtc-ds1672.

 config RTC_DRV_DS1742
-	tristate "Dallas DS1742"
+	tristate "Dallas DS1742/1743"
 	depends on RTC_CLASS
 	help
 	  If you say yes here you get support for the
-	  Dallas DS1742 timekeeping chip.
+	  Dallas DS1742/1743 timekeeping chip.

 	  This driver can also be built as a module. If so, the module
 	  will be called rtc-ds1742.
diff -uprN -X linux-2.6.19-rc5-vanilla/Documentation/dontdiff linux-2.6.19-rc5-vanilla/drivers/rtc/rtc-ds1742.c linux-2.6.19-rc5/drivers/rtc/rtc-ds1742.c
--- linux-2.6.19-rc5-vanilla/drivers/rtc/rtc-ds1742.c	2006-11-08 03:24:20.0 +0100
+++ linux-2.6.19-rc5/drivers/rtc/rtc-ds1742.c	2006-11-23 11:04:19.977903832 +0100
@@ -6,6 +6,10 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2006 Torsten Ertbjerg Rasmussen <[EMAIL PROTECTED]>
+ *  - nvram size determined from resource
+ *  - this ds1742 driver now supports ds1743.
  */

 #include
@@ -17,20 +21,19 @@
 #include
 #include

-#define DRV_VERSION "0.2"
+#define DRV_VERSION "0.3"

-#define RTC_REG_SIZE		0x800
-#define RTC_OFFSET		0x7f8
+#define RTC_SIZE		8

-#define RTC_CONTROL		(RTC_OFFSET + 0)
-#define RTC_CENTURY		(RTC_OFFSET + 0)
-#define RTC_SECONDS		(RTC_OFFSET + 1)
-#define RTC_MINUTES		(RTC_OFFSET + 2)
-#define RTC_HOURS		(RTC_OFFSET + 3)
-#define RTC_DAY			(RTC_OFFSET + 4)
-#define RTC_DATE		(RTC_OFFSET + 5)
-#define RTC_MONTH		(RTC_OFFSET + 6)
-#define RTC_YEAR		(RTC_OFFSET + 7)
+#define RTC_CONTROL		0
+#define RTC_CENTURY		0
+#define RTC_SECONDS		1
+#define RTC_MINUTES		2
+#define RTC_HOURS		3
+#define RTC_DAY			4
+#define RTC_DATE		5
+#define RTC_MONTH		6
+#define RTC_YEAR		7

 #define RTC_CENTURY_MASK	0x3f
 #define RTC_SECONDS_MASK	0x7f
@@ -48,7 +51,10 @@

 struct rtc_plat_data {
 	struct rtc_device *rtc;
-	void __iomem *ioaddr;
+	void __iomem *ioaddr_nvram;
+	void __iomem *ioaddr_rtc;
+	size_t size_nvram;
+	size_t size;
 	unsigned long baseaddr;
 	unsigned long last_jiffies;
 };
@@ -57,7 +63,7 @@ static int ds1742_rtc_set_time(struct de
 {
 	struct platform_device *pdev = to_platform_device(dev);
 	struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-	void __iomem *ioaddr = pdata->ioaddr;
+	void __iomem *ioaddr = pdata->ioaddr_rtc;
 	u8 century;

 	century = BIN2BCD((tm->tm_year + 1900) / 100);
@@ -82,7 +88,7 @@ static int ds1742_rtc_read_time(struct d
 {
 	struct platform_device *pdev = to_platform_device(dev);
 	struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-	void __iomem *ioaddr = pdata->ioaddr;
+	void __iomem *ioaddr = pdata->ioaddr_rtc;
 	unsigned int year, month, day, hour, minute, second, week;
 	unsigned int century;

@@ -127,10 +133,10 @@ static ssize_t ds1742_nvram_read(struct
 	struct platform_device *pdev =
 		to_platform_device(container_of(kobj, struct device, kobj));
 	struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-	void __iomem *ioaddr = pdata->ioaddr;
+	void __iomem *ioaddr = pdata->ioaddr_nvram;
 	ssize_t count;

-	for (count = 0; size > 0 && pos < RTC_OFFSET; count++, size--)
+	for (count = 0; size > 0 && pos < pdata->size_nvram; count++, size--)
 		*buf++ = readb(ioaddr + pos++);
 	return count;
 }
@@ -141,10 +147,10 @@ static ssize_t ds1742_nvram_write(struct
 	struct platform_device *pdev =
 		to_platform_device(container_of(kobj, struct device, kobj));
 	struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
-	void __iomem
Re: [RFC: 2.6 patch] remove the broken MTD_PCMCIA driver
On Tue, Nov 28, 2006 at 10:16:27PM +, David Woodhouse wrote:
> On Sat, 2006-11-18 at 22:40 +0100, Adrian Bunk wrote:
> > The MTD_PCMCIA driver has:
> > - already been marked as BROKEN in 2.6.0 three years ago and
> > - is still marked as BROKEN.
> >
> > Drivers that had been marked as BROKEN for such a long time seem to be
> > unlikely to be revived in the forseeable future.
>
> Actually, there's hardware currently on its way to me, and I plan to fix
> this driver fairly soon.

OK.

> > But if anyone wants to ever revive this driver, the code is still
> > present in the older kernel releases.
>
> I'm unconvinced by that argument in the general case. People don't go
> looking back through git history, do they? Drivers such as this don't
> really do any harm as they are, and they're _much_ easier to find when
> someone does want to fix them up.

If there is an already merged driver that has been marked as broken for a
long time, there are usually two possible cases:
- it is really unused
- patches to fix it are pending or floating around

A patch to remove a driver is usually the best way of finding out which
case a driver belongs to (a good example might be the zr36120 driver that
seems to have found a new maintainer due to my removal patch).

And if there's no reaction, the usefulness of very outdated and usually
non-compiling code is quite questionable.

> dwmw2

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
Re: [PATCH -mm] x86_64 UP needs smp_call_function_single
On Wed, 2006-11-29 at 17:45 -0800, Andrew Morton wrote:
> No, I think this patch is right - the declaration of the CONFIG_SMP
> smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP
> declaration or definition should be there too.
>
> It's still buggy though. It should disable local interrupts around
> the call to match the SMP version. I'll fix that separately.

hm, didnt i send an updated patch for that already? See the patch below,
from many days ago. I sent it after the tsc-sync-rewrite patch.

	Ingo

--->
Subject: x86_64: build fixes
From: Ingo Molnar <[EMAIL PROTECTED]>

x86_64 does not build cleanly on UP:

arch/x86_64/kernel/vsyscall.c: In function 'cpu_vsyscall_notifier':
arch/x86_64/kernel/vsyscall.c:282: warning: implicit declaration of function 'smp_call_function_single'
arch/x86_64/kernel/vsyscall.c: At top level:
arch/x86_64/kernel/vsyscall.c:279: warning: 'cpu_vsyscall_notifier' defined but not used

this patch fixes it by making smp_call_function_single() globally
available.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/asm-x86_64/smp.h | 11 ++-
 include/linux/smp.h | 10 +++---
 kernel/sched.c | 19 +++
 3 files changed, 28 insertions(+), 12 deletions(-)

Index: linux/include/asm-x86_64/smp.h
===
--- linux.orig/include/asm-x86_64/smp.h
+++ linux/include/asm-x86_64/smp.h
@@ -115,16 +115,9 @@ static __inline int logical_smp_processo
 }

 #ifdef CONFIG_SMP
-#define cpu_physical_id(cpu)		x86_cpu_to_apicid[cpu]
+# define cpu_physical_id(cpu)		x86_cpu_to_apicid[cpu]
 #else
-#define cpu_physical_id(cpu)		boot_cpu_id
-static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
-				void *info, int retry, int wait)
-{
-	/* Disable interrupts here? */
-	func(info);
-	return 0;
-}
+# define cpu_physical_id(cpu)		boot_cpu_id
 #endif /* !CONFIG_SMP */
 #endif

Index: linux/include/linux/smp.h
===
--- linux.orig/include/linux/smp.h
+++ linux/include/linux/smp.h
@@ -53,9 +53,6 @@ extern void smp_cpus_done(unsigned int m
  */
 int smp_call_function(void(*func)(void *info), void *info, int retry, int wait);

-int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
-				int retry, int wait);
-
 /*
  * Call a function on all processors
  */
@@ -103,6 +100,13 @@ static inline void smp_send_reschedule(i
 #endif /* !SMP */

 /*
+ * Call a function on a specific CPU (on UP the function gets executed
+ * on the current CPU, immediately):
+ */
+int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
+			     int retry, int wait);
+
+/*
  * smp_processor_id(): get the current CPU ID.
  *
  * if DEBUG_PREEMPT is enabled the we check whether it is
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -1110,6 +1110,25 @@ repeat:
 	task_rq_unlock(rq, );
 }

+#ifndef CONFIG_SMP
+/*
+ * Call a function on a specific CPU (on UP the function gets executed
+ * on the current CPU, immediately):
+ */
+int smp_call_function_single(int cpuid, void (*func) (void *info), void *info,
+			     int retry, int wait)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	func(info);
+	local_irq_restore(flags);
+
+	return 0;
+}
+EXPORT_SYMBOL(smp_call_function_single);
+#endif
+
 /***
  * kick_process - kick a running thread to enter/exit the kernel
  * @p: the to-be-kicked thread
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
* David Miller <[EMAIL PROTECTED]> wrote:

> This is why my suggestion is to preempt_disable() as soon as we grab
> the socket lock, [...]

independently of the issue at hand, in general the explicit use of
preempt_disable() in non-infrastructure code is quite a heavy tool. Its
effects are heavy and global: it disables /all/ preemption (even on
PREEMPT_RT). Furthermore, when preempt_disable() is used for per-CPU
data structures then [unlike for example to a spin-lock] the connection
between the 'data' and the 'lock' is not explicit - causing all kinds of
grief when trying to convert such code to a different preemption model.
(such as PREEMPT_RT :-)

So my plan is to remove all "open-coded" use of preempt_disable() [and
raw use of local_irq_save/restore] from the kernel and replace it with
some facility that connects data and lock. (Note that this will not
result in any actual changes on the instruction level because internally
every such facility still maps to preempt_disable() on non-PREEMPT_RT
kernels, so on non-PREEMPT_RT kernels such code will still be the same
as before.)

	Ingo
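[Editor's illustration, not part of the original mail.] The idea of a
facility that "connects data and lock" can be pictured with a small
userspace analogy. This is only a sketch under stated assumptions: it
uses a POSIX mutex rather than any kernel primitive, and the names
locked_counter, locked_counter_init and locked_counter_add are
hypothetical. It does not show the facility Ingo had in mind, only the
general shape where the lock is declared next to the data it protects
and all access goes through helpers that take it.

```c
#include <pthread.h>

/* Hypothetical type: the lock lives next to the data it protects. */
struct locked_counter {
	pthread_mutex_t lock;	/* protects 'value' and nothing else */
	long value;
};

static void locked_counter_init(struct locked_counter *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->value = 0;
}

/* Add 'delta' under the lock and return the new value. */
static long locked_counter_add(struct locked_counter *c, long delta)
{
	long result;

	pthread_mutex_lock(&c->lock);
	c->value += delta;
	result = c->value;
	pthread_mutex_unlock(&c->lock);

	return result;
}
```

Because callers never touch the lock directly, swapping the locking
primitive for a different one would only change the helpers, which is
the conversion problem the mail describes for open-coded
preempt_disable().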
Re: [patch/rfc 2.6.19-rc5] arch-neutral GPIO calls
Hi, On 11/23/06, David Brownell <[EMAIL PROTECTED]> wrote: On Tuesday 21 November 2006 7:57 am, Bill Gatliff wrote: > Once you're hiding the GPIO number behind an enumeration, you can create > a bitmap with more information than a single integer. That extra > information could be used--- in my implementations, if any ever come > about--- to store routing information. But none of the existing GPIO users do that. The goal wasn't to define a new notion of GPIO; it was collecting the existing ones under a single arch-neutral umbrella. > >It'd also be a big (and needless) disruption to code that's been working > >fine for several years now ... > > ... all of which is using the current GPIO API, you mean? :) Effectively, yes. I counted quite a few implementations in the current tree which can trivially (#defines) map to that API. I tried to do that for pxa, the patch is attached. So what is the state of this discussion, now that 2.6.19 is here? I just submitted an input driver for GPIO buttons to linux-input that we use in the handhelds.org kernel for sa1100, pxa and s3c2410 archs. It needs some ugly #ifdefs currently, but with common GPIO calls they all could go away. regards Philipp - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
* David Miller <[EMAIL PROTECTED]> wrote:

> > yeah, i like this one. If the problem is "too long locked section",
> > then the most natural solution is to "break up the lock", not to
> > "boost the priority of the lock-holding task" (which is what the
> > proposed patch does).
>
> Ingo you're mis-read the problem :-)

yeah, the problem isnt too long locked section but "too much time spent
holding a lock" and hence opening up ourselves to possible negative
side-effects of the scheduler's fairness algorithm when it forces a
preemption of that process context with that lock held (and forcing all
subsequent packets to be backlogged).

but please read my last mail - i think i'm slowly starting to wake up
;-) I dont think there is any real problem: a tweak to the scheduler
that in essence gives TCP-using tasks a preference changes the balance
of workloads. Such an explicit tweak is possible already.

furthermore, the tweak allows the shifting of processing from a
prioritized process context into a highest-priority softirq context.
(it's not proven that there is any significant /net win/ of performance:
all that was proven is that if we shift TCP processing from process
context into softirq context then TCP throughput of that otherwise
penalized process context increases.)

	Ingo
Fix for OpenSUSE kernel bug (was Re: [Opps] Invalid opcode)
S.Çağlar Onur wrote: 05 Kas 2006 Paz 18:40 tarihinde, Andi Kleen şunları yazmıştı: How do you know this? Just guessing, if im not wrong panics occur after SMP alternative switching code done its job. And does it still happen in 2.6.19-rc4? Will try in VmWare and Microsoft Virtual PC and in order to confirm this bug is not our distro specific i downloaded and tried latest OpenSuse also [1] and [2] are screens captured by vmware but exact same panic occurs in Virtual PC as reported to us in [3]. Always the same BUG()? Yes, same bug There is just some rolling Turkish text there. Ah im sorry here is the correct links :( [1] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_opensuse.png [2] http://cekirdek.pardus.org.tr/~caglar/2.6.18/panic_on_pardus.png Cheers I'm proposing this as a fix for your bug. Having tasklets scheduled before softirqd gets to run might be somewhat backwards, but there is nothing I can find wrong about it from a correctness point of view. Better to boot the kernel even when compiled with bug checking on, I think. This bug started becoming apparent in 2.6.18 because of some rework with the CPU hotplug code, but in theory, it exists at least all the way back to 2.6.10, which is as far as I looked backwards in time. Zach It is possible to have tasklets get scheduled before softirqd has had a chance to spawn on all CPUs. This is totally harmless; after success during action CPU_UP_PREPARE, action CPU_ONLINE will be called, which immediately wakes softirqd on the appropriate CPU to process the already pending tasklets. So there is no danger of having a missed wakeup for any tasklets that were already pending. In particular, i386 is affected by this during startup, and is visible when using a very large initrd; during the time it takes for the initrd to be decompressed, a timer IRQ can come in and schedule RCU callbacks. It is also possible that resending of a hardware IRQ via a softirq triggers the same bug. 
Because of different timing conditions, this shows up in all emulators and virtual machines tested, including Xen, VMware, Virtual PC, and Qemu. It is also possible to trigger on native hardware with a large enough initrd, although I don't have a reliable case demonstrating that. Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]> Index: linux-2.6.18/kernel/softirq.c === --- linux-2.6.18.orig/kernel/softirq.c 2006-11-10 14:44:39.0 -0800 +++ linux-2.6.18/kernel/softirq.c 2006-11-29 22:19:36.0 -0800 @@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct switch (action) { case CPU_UP_PREPARE: - BUG_ON(per_cpu(tasklet_vec, hotcpu).list); - BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list); p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu); if (IS_ERR(p)) { printk("ksoftirqd for %i failed\n", hotcpu);
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Thu, Nov 30, 2006 at 09:58:07AM +0530, Gautham R Shenoy wrote:
>
> So can we ignore this circular-dep warning as a false positive?
> Or is there a way to exploit this circular dependency ?
>
> At the moment, I cannot think of way to exploit this circular dependency
> unless we do something like try destroying the created workqueue when the
> cpu is dead, i.e make the cpufreq governors cpu-hotplug-aware.
> (eeks! that doesn't look good)

Ok, I see that we are already doing it :(. So we can end up in a
deadlock.

Here's the culprit callpath:

_cpu_down()
   !
   !-> raw_notifier_call_chain(CPU_LOCK_ACQUIRE)
   !      !
   !      !-> workqueue_cpu_mutex(CPU_LOCK_ACQUIRE) [*]
   !
   !-> raw_notifier_call_chain(CPU_DEAD)
          !
          !-> cpufreq_cpu_callback (CPU_DEAD)
                 !
                 !-> cpufreq_remove_dev
                        !
                        !-> __cpufreq_governor(data, GOVERNOR_STOP)
                               !
                               !-> policy->governor->governor()
                                      !
                                      !-> cpufreq_governor_dbs(GOVERNOR_STOP)
                                             !
                                             !-> destroy_workqueue() [*]

[*] indicates function takes workqueue_mutex.

So a deadlock!

I wasn't able to observe this because I'm running Xeon SMP box on which
you cannot offline cpu0. And cpufreq data is created only for cpu0,
while all other cpus cpufreq_data just point to cpu0's cpufreq_data.

So the mentioned callpath within cpufreq_remove_dev is never reached
during the normal cpu offline cycle.

However, if there are architectures which allow the first-booted-cpu (or
to be precise, the cpu for which cpufreq_data is *actually* created) to
be offlined and we are running Ondemand governor during the offline, we
will see this deadlock.

regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
Re: Bug 7596 - Potential performance bottleneck for Linxu TCP
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > Attached is the detailed description of the problem and one possible
> > solution.
>
> Thanks. The attachment will be too large for the mailing-list servers
> so I uploaded a copy to
> http://userweb.kernel.org/~akpm/Linux-TCP-Bottleneck-Analysis-Report.pdf
>
> From a quick peek it appears that you're getting around 10%
> improvement in TCP throughput, best case.

Wenji, have you tried to renice the receiving task (to say nice -20) and
see how much TCP throughput you get in "background load of 10.0".
(similarly, you could also renice the background load tasks to nice +19
and/or set their scheduling policy to SCHED_BATCH)

as far as i can see, the numbers in the paper and the patch prove the
following two points:

 - a task doing TCP receive with 10 other tasks running on the CPU will
   see lower TCP throughput than if it had the CPU for itself alone.

 - a patch that tweaks the scheduler to give the receiving task more
   timeslices (i.e. raises its nice level in essence) results in ...
   more timeslices, which results in higher receive numbers ...

so the most important thing to check would be, before any scheduler and
TCP code change is considered: if you give the task higher priority
/explicitly/, via nice -20, do the numbers improve? Similarly, if all
the other "background load" tasks are reniced to nice +19 (or their
policy is set to SCHED_BATCH), do you get a similar improvement?

	Ingo
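[Editor's illustration, not part of the original mail.] The renice
experiment suggested above can also be done from inside a test program.
The following is a minimal sketch, assuming a POSIX system; the helper
names renice_self_demo and current_nice_demo are hypothetical. It only
covers the "background load at nice +19" half, because lowering one's
own priority needs no privilege, whereas going to nice -20 normally
requires root (CAP_SYS_NICE on Linux).

```c
#include <errno.h>
#include <sys/resource.h>

/* Set the calling process's nice value; returns 0 on success, else errno. */
static int renice_self_demo(int nice_value)
{
	if (setpriority(PRIO_PROCESS, 0, nice_value) != 0)
		return errno;
	return 0;
}

/* Read back the calling process's nice value (valid range -20..19). */
static int current_nice_demo(void)
{
	/* -1 is a legal return value, so a caller checking for errors
	 * must look at errno, which is cleared here first. */
	errno = 0;
	return getpriority(PRIO_PROCESS, 0);
}
```

A background-load task would call renice_self_demo(19) before entering
its busy loop; the same effect is available from the shell with the
nice and renice utilities.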
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
From: Ingo Molnar <[EMAIL PROTECTED]>
Date: Thu, 30 Nov 2006 07:17:58 +0100

>
> * David Miller <[EMAIL PROTECTED]> wrote:
>
> > We can make explicitl preemption checks in the main loop of
> > tcp_recvmsg(), and release the socket and run the backlog if
> > need_resched() is TRUE.
> >
> > This is the simplest and most elegant solution to this problem.
>
> yeah, i like this one. If the problem is "too long locked section", then
> the most natural solution is to "break up the lock", not to "boost the
> priority of the lock-holding task" (which is what the proposed patch
> does).

Ingo you're mis-read the problem :-)

The issue is that we actually don't hold any locks that prevent
preemption, so we can take preemption points which the TCP code
wasn't designed with in-mind.

Normally, we control the sleep point very carefully in the TCP
sendmsg/recvmsg code, such that when we sleep we drop the socket
lock and process the backlog packets that accumulated while the
socket was locked.

With pre-emption we can't control that properly.

The problem is that we really do need to run the backlog any time
we give up the cpu in the sendmsg/recvmsg path, or things get real
erratic. ACKs don't go out as early as we'd like them to, etc.

It isn't easy to do generically, perhaps, because we can only
drop the socket lock at certain points and we need to do that to
run the backlog.

This is why my suggestion is to preempt_disable() as soon as we
grab the socket lock, and explicitly test need_resched() at places
where it is absolutely safe, like this:

	if (need_resched()) {
		/* Run packet backlog... */
		release_sock(sk);
		schedule();
		lock_sock(sk);
	}

The socket lock is just a by-hand binary semaphore, so it doesn't
block pre-emption. We have to be able to sleep while holding it.
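[Editor's illustration, not part of the original mail.] The effect of
the explicit need_resched() safe point can be seen in a toy model. This
is a single-threaded sketch under stated assumptions, not the TCP code:
struct sock_demo and every *_demo function are hypothetical, one packet
is assumed to arrive per loop iteration, and "need_resched() is true"
is simulated by a fixed interval. What it shows is only the bookkeeping:
packets that arrive while the socket is owned are deferred, and
releasing the socket is what drains them.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for a socket and its backlog. */
struct sock_demo {
	bool owned;	/* "socket lock" held by process context */
	int backlog;	/* packets queued while the lock was held */
	int processed;	/* packets handled so far */
};

static void lock_sock_demo(struct sock_demo *sk)
{
	sk->owned = true;
}

/* Releasing the lock is the point where the backlog gets run. */
static void release_sock_demo(struct sock_demo *sk)
{
	sk->processed += sk->backlog;
	sk->backlog = 0;
	sk->owned = false;
}

/* A packet arriving while the lock is held is deferred to the backlog. */
static void packet_arrives_demo(struct sock_demo *sk)
{
	if (sk->owned)
		sk->backlog++;
	else
		sk->processed++;
}

/*
 * Receive loop: one packet arrives per iteration; every 'resched_every'
 * iterations (must be > 0) we pretend need_resched() is true and take
 * the safe point. Returns the largest backlog observed.
 */
static int recv_loop_demo(struct sock_demo *sk, int iterations,
			  int resched_every)
{
	int max_backlog = 0;

	lock_sock_demo(sk);
	for (int i = 1; i <= iterations; i++) {
		packet_arrives_demo(sk);
		if (sk->backlog > max_backlog)
			max_backlog = sk->backlog;
		if (i % resched_every == 0) {	/* stand-in for need_resched() */
			release_sock_demo(sk);	/* run packet backlog */
			/* schedule() would go here */
			lock_sock_demo(sk);
		}
	}
	release_sock_demo(sk);

	return max_backlog;
}
```

In this model the backlog never grows beyond the interval between safe
points, which is the property the mail is after: the backlog is always
run before the CPU is given up.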
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
* Wenji Wu <[EMAIL PROTECTED]> wrote:

> > That yield() will need to be removed - yield()'s behaviour is truly
> > awful if the system is otherwise busy. What is it there for?
>
> Please read the uploaded paper, which has detailed description.

do you have any URL for that?

	Ingo
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
* David Miller <[EMAIL PROTECTED]> wrote:

> We can make explicitl preemption checks in the main loop of
> tcp_recvmsg(), and release the socket and run the backlog if
> need_resched() is TRUE.
>
> This is the simplest and most elegant solution to this problem.

yeah, i like this one. If the problem is "too long locked section", then
the most natural solution is to "break up the lock", not to "boost the
priority of the lock-holding task" (which is what the proposed patch
does).

[ Also note that "sprinkle the code with preempt_disable()" kind of
  solutions, besides hurting interactivity, are also a pain to resolve
  in something like PREEMPT_RT. (unlike say a spinlock,
  preempt_disable() is quite opaque in what data structure it protects,
  etc., making it hard to convert it to a preemptible primitive) ]

> The one suggested in your patch and paper are way overkill, there is
> no reason to solve a TCP specific problem inside of the generic
> scheduler.

agreed.

What we could also add is a /reverse/ mechanism to the scheduler: a task
could query whether it has just a small amount of time left in its
timeslice, and could in that case voluntarily drop its current lock and
yield, and thus give up its current timeslice and wait for a new, full
timeslice, instead of being forcibly preempted due to lack of timeslices
with a possibly critical lock still held.

But the suggested solution here, to "prolong the running of this task
just a little bit longer" only starts a perpetual arms race between
users of such a facility and other kernel subsystems. (besides not being
adequate anyway, there can always be /so/ long lock-hold times that the
scheduler would have no other option but to preempt the task)

	Ingo
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
On Wed, 2006-11-29 at 17:08 -0800, Andrew Morton wrote: > + if (p->backlog_flag == 0) { > + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) { > + enqueue_task(p, rq->expired); > + if (p->static_prio < rq->best_expired_prio) > + rq->best_expired_prio = p->static_prio; > + } else > + enqueue_task(p, rq->active); > + } else { > + if (expired_starving(rq)) { > + enqueue_task(p,rq->expired); > + if (p->static_prio < rq->best_expired_prio) > + rq->best_expired_prio = p->static_prio; > + } else { > + if (!TASK_INTERACTIVE(p)) > + p->extrarun_flag = 1; > + enqueue_task(p,rq->active); > + } > + } (oh my, doing that to the scheduler upsets my tummy, but that aside...) I don't see how that can really solve anything. "Interactive" tasks starting to use cpu heftily can still preempt and keep the special cased cpu hog off the cpu for ages. It also only takes one task in the expired array to trigger the forced array switch with a fully loaded cpu, and once any task hits the expired array, a stream of wakeups can prevent the switch from completing for as long as you can keep wakeups happening. -Mike - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c (kernel 2.6.18.1)
On 30/11/06, David Chinner <[EMAIL PROTECTED]> wrote: On Wed, Nov 29, 2006 at 10:17:25AM +0100, Jesper Juhl wrote: > On 29/11/06, David Chinner <[EMAIL PROTECTED]> wrote: > >On Tue, Nov 28, 2006 at 04:49:00PM +0100, Jesper Juhl wrote: > >> Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1138 of > >> file fs/xfs/xfs_trans.c. Caller 0x8034b47e > >> > >> Call Trace: > >> [] show_trace+0xb2/0x380 > >> [] dump_stack+0x15/0x20 > >> [] xfs_error_report+0x3c/0x50 > >> [] xfs_trans_cancel+0x6e/0x130 > >> [] xfs_create+0x5ee/0x6a0 > >> [] xfs_vn_mknod+0x156/0x2e0 > >> [] xfs_vn_create+0xb/0x10 > >> [] vfs_create+0x8c/0xd0 > >> [] nfsd_create_v3+0x31a/0x560 > >> [] nfsd3_proc_create+0x148/0x170 > >> [] nfsd_dispatch+0xf9/0x1e0 > >> [] svc_process+0x437/0x6e0 > >> [] nfsd+0x1cd/0x360 > >> [] child_rip+0xa/0x12 > >> xfs_force_shutdown(dm-1,0x8) called from line 1139 of file > >> fs/xfs/xfs_trans.c. Return address = 0x80359daa > > > >We shut down the filesystem because we cancelled a dirty transaction. > >Once we start to dirty the incore objects, we can't roll back to > >an unchanged state if a subsequent fatal error occurs during the > >transaction and we have to abort it. > > > So you are saying that there's nothing I can do to prevent this from > happening in the future? Pretty much - we need to work out what is going wrong and we can't from teh shutdown message above - the error has occurred in a path that doesn't have error report traps in it. Is this reproducable? Not on demand, no. It has happened only this once as far as I know and for unknown reasons. > >If I understand historic occurrences of this correctly, there is > >a possibility that it can be triggered in ENOMEM situations. Was your > >machine running out of memoy when this occurred? > > > Not really. I just checked my monitoring software and, at the time > this happened, the box had ~5.9G RAM free (of 8G total) and no swap > used (but 11G available). Ok. 
Sounds like we need more error reporting points inserted into that code so we dump an error earlier and hence have some hope of working out what went wrong next time. OOC, there weren't any I/O errors reported before this shutdown? No. I looked but found none. Let me know if there's anything I can do to help. -- Jesper Juhl <[EMAIL PROTECTED]> Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html Plain text mails only, please http://www.expita.com/nomime.html - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: PM-Timer clock source is slow. Try something else: How slow? What other source(s)?
john stultz wrote:
> On Wed, 2006-11-29 at 16:56 -0800, Linda Walsh wrote:
>> I recently noticed this message in my bootup that I don't remember
>> from before:
>>
>> PCI: Probing PCI hardware (bus 00)
>> * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
>> * this clock source is slow. Consider trying other clock sources
>
> This basically means that your chipset has a bug which requires the ACPI
> PM timer to be read three times in order to get a valid reading. This
> will cause gettimeofday/clock_gettime to take longer to execute, which
> is what is meant by "slow" (rather than the counter's frequency being
> incorrect).
>
>> How would this affect my clock? It says to try another clock
>> source, what type of clock source would it be suggesting I use?
>> Another chip already in the computer?
>
> Yes.
>
>> It is an Intel 440BX chipset; on an Dell motherboard. Would that be
>> likely to have another chip source that is compensating?

You can change the clock source using "clock=" kernel parameter.
Please refer to Documentation/kernel-parameters.txt file of kernel source.

>> I don't notice a significant clock slowdown, but I'm running NTP, so
>> that could be masking the problem.
>
> Unless you're running performance critical programs that utilize
> gettimeofday/clock_gettime, you probably won't notice anything. Time
> should still function properly. If you are having performance issues,
> you can try using a different clocksource (the TSC is probably safe, but
> not necessarily).
>
> thanks
> -john

Thanks
 Srinivasa DS
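[Editor's illustration, not part of the original mail.] To make the
"read three times" workaround concrete, here is a small simulation. It
is a sketch under stated assumptions and not the kernel's code: the
hardware read is replaced by a hypothetical read_hw_demo() that replays
a fixed sequence containing one glitched value, a reading is accepted
only when three consecutive reads are non-decreasing, and counter
wrap-around is ignored for simplicity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware port read: replays a fixed
 * sequence in which the second value is a glitch. */
static const uint32_t samples_demo[] = { 100, 7, 101, 102, 103, 104 };
static unsigned int next_sample_demo;
static unsigned int reads_done_demo;

static uint32_t read_hw_demo(void)
{
	uint32_t v = samples_demo[next_sample_demo];

	/* Stay on the last sample once the sequence is exhausted. */
	if (next_sample_demo + 1 < sizeof(samples_demo) / sizeof(samples_demo[0]))
		next_sample_demo++;
	reads_done_demo++;

	return v;
}

/*
 * Read three times and retry until the three values are non-decreasing,
 * then return the middle one. Wrap-around is not handled in this sketch.
 */
static uint32_t read_pmtmr_safe_demo(void)
{
	uint32_t v1, v2, v3;

	do {
		v1 = read_hw_demo();
		v2 = read_hw_demo();
		v3 = read_hw_demo();
	} while (!(v1 <= v2 && v2 <= v3));

	return v2;
}
```

Every accepted reading costs at least three hardware reads (six in the
replayed sequence, because the first triple is rejected), which is why
the boot message calls this clock source slow even though the counter
itself runs at the right frequency.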
Re: [PATCH] autofs: fix error code path in autofs_fill_sb()
On Thu, 2006-11-30 at 01:26 +0100, Jiri Kosina wrote: > [PATCH] autofs: fix error code path in autofs_fill_sb() > > When kernel is compiled with old version of autofs (CONFIG_AUTOFS_FS), and > new (observed at least with 5.x.x) automount deamon is started, kernel > correctly reports incompatible version of kernel and userland daemon, but > then screws things up instead of correct handling of the error: > > autofs: kernel does not match daemon version > = > [ BUG: bad unlock balance detected! ] > - > automount/4199 is trying to release lock (>s_umount_key) at: > [] get_sb_nodev+0x76/0xa4 > but there are no more locks to release! > > other info that might help us debug this: > no locks held by automount/4199. > > stack backtrace: > [] dump_trace+0x68/0x1b2 > [] show_trace_log_lvl+0x18/0x2c > [] show_trace+0xf/0x11 > [] dump_stack+0x12/0x14 > [] print_unlock_inbalance_bug+0xe7/0xf3 > [] lock_release+0x8d/0x164 > [] up_write+0x14/0x27 > [] get_sb_nodev+0x76/0xa4 > [] vfs_kern_mount+0x83/0xf6 > [] do_kern_mount+0x2d/0x3e > [] do_mount+0x607/0x67a > [] sys_mount+0x72/0xa4 > [] sysenter_past_esp+0x5f/0x99 > DWARF2 unwinder stuck at sysenter_past_esp+0x5f/0x99 > Leftover inexact backtrace: > === > > and then deadlock comes. > > The problem: autofs_fill_super() returns EINVAL to get_sb_nodev(), but before > that, it calls kill_anon_super() to destroy the superblock which won't be > needed. This is however way too soon to call kill_anon_super(), because > get_sb_nodev() has to perform its own cleanup of the superblock first > (deactivate_super(), etc.). The correct time to call kill_anon_super() is in > the autofs_kill_sb() callback, which is called by deactivate_super() at proper > time, when the superblock is ready to be killed. > > I can see the same faulty codepath also in autofs4. 
This patch solves issues > in > both filesystems in a same way - it postpones the kill_anon_super() until the > proper time is signalized by deactivate_super() calling the kill_sb() > callback. > > Patch against 2.6.19-rc6-mm2. > > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> Acked-by: Ian Kent <[EMAIL PROTECTED]> It looks so obvious now. Updating the comment above would be a good idea also, see attached. > > --- > > fs/autofs/inode.c|4 ++-- > fs/autofs4/inode.c |4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/autofs/inode.c b/fs/autofs/inode.c > index 38ede5c..61e04ab 100644 > --- a/fs/autofs/inode.c > +++ b/fs/autofs/inode.c > @@ -31,7 +31,7 @@ void autofs_kill_sb(struct super_block * >* just exit when we are called from deactivate_super. >*/ > if (!sbi) > - return; > + goto out_kill_sb; > > if ( !sbi->catatonic ) > autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */ > @@ -44,6 +44,7 @@ void autofs_kill_sb(struct super_block * > > kfree(sb->s_fs_info); > > +out_kill_sb: > DPRINTK(("autofs: shutting down\n")); > kill_anon_super(sb); > } > @@ -209,7 +210,6 @@ fail_iput: > fail_free: > kfree(sbi); > s->s_fs_info = NULL; > - kill_anon_super(s); > fail_unlock: > return -EINVAL; > } > diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c > index ce7c0f1..be14200 100644 > --- a/fs/autofs4/inode.c > +++ b/fs/autofs4/inode.c > @@ -155,7 +155,7 @@ void autofs4_kill_sb(struct super_block >* just exit when we are called from deactivate_super. >*/ > if (!sbi) > - return; > + goto out_kill_sb; > > sb->s_fs_info = NULL; > > @@ -167,6 +167,7 @@ void autofs4_kill_sb(struct super_block > > kfree(sbi); > > +out_kill_sb: > DPRINTK("shutting down"); > kill_anon_super(sb); > } > @@ -426,7 +427,6 @@ fail_ino: > fail_free: > kfree(sbi); > s->s_fs_info = NULL; > - kill_anon_super(s); > fail_unlock: > return -EINVAL; > } > > Update descriptive comment also. 
Signed-off-by: Ian Kent <[EMAIL PROTECTED]> --- --- linux-2.6.19-rc5-mm1/fs/autofs4/inode.c.fix-error-in-autofs_fill_sb-comment 2006-11-30 13:05:13.0 +0800 +++ linux-2.6.19-rc5-mm1/fs/autofs4/inode.c 2006-11-30 13:09:27.0 +0800 @@ -152,7 +152,8 @@ void autofs4_kill_sb(struct super_block /* * In the event of a failure in get_sb_nodev the superblock * info is not present so nothing else has been setup, so -* just exit when we are called from deactivate_super. +* just call kill_anon_super when we are called from +* deactivate_super. */ if (!sbi) goto out_kill_sb; --- linux-2.6.19-rc5-mm1/fs/autofs/inode.c.fix-error-in-autofs_fill_sb-comment 2006-11-30 13:05:02.0 +0800 +++ linux-2.6.19-rc5-mm1/fs/autofs/inode.c 2006-11-30 13:09:00.0 +0800 @@
Re: failed 'ljmp' in linear addressing mode
On Tue, Nov 28, 2006 at 05:40:56PM -0800, Jun Sun wrote:
>
> Can you elaborate more why this last ljmp will fail? I thought at this point
> the paging is turned off, and 0x1000- would simply mean a physical
> address - which is a valid physical address in RAM, btw.
>

I finally got it working, even though I don't understand at all. :)

I realized that after paging mode is turned off, 0x1000- is actually
at the same flat 4G code segment as caller code. So I tried to just "call"
and that worked.

Here is the excerpt of the related code in case someone else needs to do
the same:

In arch/i386/kernel/machine_kexec.c:

extern void do_os_switching(void);
void os_switch(void)
{
	void (*foo)(void);

	/* absolutely no irq */
	local_irq_disable();

	/* create identity mapping */
	foo=virt_to_phys(do_os_switching);
	identity_map_page((unsigned long)foo);

	/* jump to the real address */
	load_segments();
	set_gdt(phys_to_virt(0),0);
	set_idt(phys_to_virt(0),0);
	foo();
}

In arch/i386/kernel/acpi/wakeup.S:

	.align 4096
ENTRY(do_os_switching)
	/* JSUN, 0x11 was the boot up value for cr0. */
	movl	$0x11, %eax
	movl	%eax, %cr0

	/* clear cr4 */
	movl	$0, %eax
	movl	%eax, %cr4

	/* clear cr3, flush TLB */
	movl	$0, %eax
	movl	%eax, %cr3

	movl	$0x1000,%eax
	call	*%eax

I have a second Linux kernel loaded at 0x1000-.

Now the only matter remaining is to figure out why the tsc timer stopped
working ... :)

Cheers.

Jun
Re: [RFC] dynsched - different cpu schedulers per cpuset
pj wrote:
> See Paul Menage's most recent patch proposal at:
> http://lkml.org/lkml/2006/11/17/217
> Subject: [PATCH 0/6] Multi-hierarchy Process Containers
> Date:    Fri, 17 Nov 2006 11:11:59 -0800

I'm behind the times. Paul Menage's most recent proposal is at:

  http://lkml.org/lkml/2006/11/23/95
  Subject: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)
  Date:    Thu, 23 Nov 2006 04:08:48 -0800

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Core file size?
Does anyone know what determines the size of a core dump?

I have a process running out of memory (it allocates about 3GB) - but
the size of core varies (between 2-3GB) depending on how much the
process wrote on the allocated memory. Also, the time it takes to write
the core (same size) varies??

I briefly looked at elf_core_dump and get_user_pages() in binfmt_elf.c.
Is there any documentation on this? Or anyone knows how it works?

TIA
Re: mass-storage problems with Archos AV500
David Weinehall wrote: I've got an Archos AV500 here (running the very latest firmware), pretty much acting as a doorstop, since I cannot get it to be recognized properly by Linux. .. [ 118.144000] SCSI device sdb: 58074975 512-byte hdwr sectors (29734 MB) [ 118.144000] sdb: Write Protect is off [ 118.144000] sdb: Mode Sense: 33 00 00 00 [ 118.144000] sdb: assuming drive cache: write through [ 118.144000] sdb: unknown partition table [ 118.452000] sd 4:0:0:0: Attached scsi removable disk sdb [ 118.452000] usb-storage: device scan complete This is with linux-image-2.6.19-7-generic 2.6.19-7.10 from Ubuntu edgy. I get similar results with a home-brew 2.6.18-rc4. Any mass storage quirk needed that might be missing? That all seems normal, other than the unknown partition table, but the device might be all one unpartitioned disk.. at what point is it failing? -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.19 panic on boot -- i386
* David Miller ([EMAIL PROTECTED]) wrote:
> Check [EMAIL PROTECTED]'s inbox, I just sent it in :)

Ooh, nice timing!

thanks,
-chris
[PATCH 3/4] [ATM] Add CPPFLAGS to byteorder.h check.
O= builds produced errors in the shell command because of unfound headers.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 drivers/atm/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/atm/Makefile b/drivers/atm/Makefile
index b5077ce..1b16f81 100644
--- a/drivers/atm/Makefile
+++ b/drivers/atm/Makefile
@@ -41,7 +41,7 @@ ifeq ($(CONFIG_ATM_FORE200E_PCA),y)
 # guess the target endianess to choose the right PCA-200E firmware image
 ifeq ($(CONFIG_ATM_FORE200E_PCA_DEFAULT_FW),y)
 byteorder.h := include$(if $(patsubst $(srctree),,$(objtree)),2)/asm/byteorder.h
-CONFIG_ATM_FORE200E_PCA_FW := $(obj)/pca200e$(if $(shell $(CC) -E -dM $(byteorder.h) | grep ' __LITTLE_ENDIAN '),.bin,_ecd.bin2)
+CONFIG_ATM_FORE200E_PCA_FW := $(obj)/pca200e$(if $(shell $(CC) $(CPPFLAGS) -E -dM $(byteorder.h) | grep ' __LITTLE_ENDIAN '),.bin,_ecd.bin2)
 endif
 endif
--
1.4.1
[PATCH 2/4] [APIC] Allow disabling of UP APIC/IO-APIC by default, with command line option to turn it on.
Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 arch/i386/Kconfig          |   13 +
 arch/i386/kernel/apic.c    |   13 +++--
 arch/i386/kernel/io_apic.c |   10 +-
 include/asm-i386/apic.h    |    6 ++
 include/asm-i386/io_apic.h |    5 +
 5 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index b4a2461..ef2f2db 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -285,6 +285,19 @@ config X86_UP_IOAPIC
 	  to use it. If you say Y here even though your machine doesn't have
 	  an IO-APIC, then the kernel will still run with no slowdown at all.
 
+config X86_UP_APIC_DEFAULT_OFF
+	bool "APIC support on uniprocessors defaults to off"
+	depends on X86_UP_APIC
+	default n
+	help
+	  Some older systems have flaky APICs. Say Y to turn off APIC
+	  support by default, while still allowing it to be enabled by the
+	  "lapic" and "apic" command line options.
+
+	  Usually this is only necessary for distro installer kernels that
+	  must work with everything. Everyone else can safely say N here
+	  and configure APIC support in or out as needed.
+
 config X86_LOCAL_APIC
 	bool
 	depends on X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER) || X86_GENERICARCH
diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c
index 2fd4b7d..2f2eb83 100644
--- a/arch/i386/kernel/apic.c
+++ b/arch/i386/kernel/apic.c
@@ -51,8 +51,9 @@ static cpumask_t timer_bcast_ipi;
 
 /*
  * Knob to control our willingness to enable the local APIC.
+ * -2=default-disable, -1=force-disable, 1=force-enable, 0=automatic
  */
-static int enable_local_apic __initdata = 0; /* -1=force-disable, +1=force-enable */
+static int enable_local_apic __initdata = (X86_APIC_DEFAULT_OFF ? -2 : 0);
 
 static inline void lapic_disable(void)
 {
@@ -801,7 +802,7 @@ static int __init detect_init_APIC (void
 		 * APIC only if "lapic" specified.
 		 */
 		if (enable_local_apic <= 0) {
-			printk("Local APIC disabled by BIOS -- "
+			printk("Local APIC disabled by BIOS (or by default) -- "
 			       "you can enable it with \"lapic\"\n");
 			return -1;
 		}
@@ -1350,6 +1351,14 @@ int __init APIC_init_uniprocessor (void)
 	if (!smp_found_config && !cpu_has_apic)
 		return -1;
 
+	/* If local apic is off due to config_x86_apic_off option, jump
+	 * out here. */
+	if (enable_local_apic < -1) {
+		printk(KERN_INFO "Local APIC disabled by default -- "
+		       "use 'lapic' to enable it.\n");
+		return -1;
+	}
+
 	/*
 	 * Complain if the BIOS pretends there is one.
 	 */
diff --git a/arch/i386/kernel/io_apic.c b/arch/i386/kernel/io_apic.c
index 3b7a63e..0122dba 100644
--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -767,7 +767,7 @@ #endif /* !CONFIG_SMP */
 #define MAX_PIRQS 8
 static int pirq_entries [MAX_PIRQS];
 static int pirqs_enabled;
-int skip_ioapic_setup;
+int skip_ioapic_setup = X86_APIC_DEFAULT_OFF;
 
 static int __init ioapic_setup(char *str)
 {
@@ -2887,3 +2887,11 @@ static int __init parse_noapic(char *arg
 	return 0;
 }
 early_param("noapic", parse_noapic);
+
+static int __init parse_apic(char *arg)
+{
+	/* enable IO-APIC */
+	enable_ioapic_setup();
+	return 0;
+}
+early_param("apic", parse_apic);
diff --git a/include/asm-i386/apic.h b/include/asm-i386/apic.h
index b952957..a06ca3f 100644
--- a/include/asm-i386/apic.h
+++ b/include/asm-i386/apic.h
@@ -71,6 +71,12 @@ # define apic_read_around(x) apic_read(x
 # define apic_write_around(x,y) apic_write_atomic((x),(y))
 #endif
 
+#ifdef CONFIG_X86_UP_APIC_DEFAULT_OFF
+# define X86_APIC_DEFAULT_OFF 1
+#else
+# define X86_APIC_DEFAULT_OFF 0
+#endif
+
 static inline void ack_APIC_irq(void)
 {
 	/*
diff --git a/include/asm-i386/io_apic.h b/include/asm-i386/io_apic.h
index 059a9ff..ddedeec 100644
--- a/include/asm-i386/io_apic.h
+++ b/include/asm-i386/io_apic.h
@@ -126,6 +126,11 @@ static inline void disable_ioapic_setup(
 	skip_ioapic_setup = 1;
 }
 
+static inline void enable_ioapic_setup(void)
+{
+	skip_ioapic_setup = 0;
+}
+
 static inline int ioapic_setup_disabled(void)
 {
 	return skip_ioapic_setup;
--
1.4.1
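For anyone trying to keep the four knob values straight, the local APIC part
of the patch above can be modelled in user space roughly as follows. This is
only a sketch under simplifying assumptions: the enum names and both helper
functions are made up for illustration and are not kernel symbols, and the
model ignores everything else detect_init_APIC() and
APIC_init_uniprocessor() check (CPU type, whether re-enabling actually
succeeds, and so on).

```c
#include <assert.h>

/* Values of the enable_local_apic knob as listed in the patch comment. */
enum lapic_knob {
	LAPIC_DEFAULT_OFF = -2,	/* CONFIG_X86_UP_APIC_DEFAULT_OFF=y, nothing on the command line */
	LAPIC_FORCE_OFF   = -1,	/* force-disable */
	LAPIC_AUTO        =  0,	/* automatic: follow what the BIOS left enabled */
	LAPIC_FORCE_ON    =  1,	/* force-enable ("lapic") */
};

/* Hypothetical helper: initial knob value for a given config choice. */
int lapic_knob_default(int config_default_off)
{
	return config_default_off ? LAPIC_DEFAULT_OFF : LAPIC_AUTO;
}

/*
 * Hypothetical model of the outcome: returns 1 if the kernel would go on
 * to use the local APIC, 0 otherwise. bios_enabled says whether the BIOS
 * left the APIC enabled.
 */
int lapic_would_be_used(int knob, int bios_enabled)
{
	if (knob < 0)		/* -1 and -2 both mean "off"; only the message differs */
		return 0;
	if (!bios_enabled)	/* BIOS turned it off: re-enable only when forced on */
		return knob > 0;
	return 1;
}
```

With this model, a default-off kernel behaves like a force-disabled one until
the knob is set to force-enable, which is the behaviour the Kconfig help text
describes.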
Re: CPUFREQ-CPUHOTPLUG: Possible circular locking dependency
On Wed, Nov 29, 2006 at 01:05:56PM -0800, Andrew Morton wrote:
> On Wed, 29 Nov 2006 20:54:04 +0530
> Gautham R Shenoy <[EMAIL PROTECTED]> wrote:
>
> > Ok, so to cut the long story short,
> > - While changing governor from anything to
> > ondemand, locks are taken in the following order
> >
> > policy->lock ===> dbs_mutex ===> workqueue_mutex.
> >
> > - While offlining a cpu, locks are taken in the following order
> >
> > cpu_add_remove_lock ==> sched_hotcpu_mutex ==> workqueue_mutex ==
> > ==> cache_chain_mutex ==> policy->lock.
>
> What functions are taking all these locks? (ie: the callpath?)

While changing cpufreq governor to ondemand, the locks taken are:

--------------------------------------------------------------------------
lock              function                  file
--------------------------------------------------------------------------
policy->lock      store_scaling_governor    drivers/cpufreq/cpufreq.c
dbs_mutex         cpufreq_governor_dbs      drivers/cpufreq/cpufreq_ondemand.c
workqueue_mutex   __create_workqueue        kernel/workqueue.c
--------------------------------------------------------------------------

The complete callpath would be

store_scaling_governor [*]
|
__cpufreq_set_policy
|
__cpufreq_governor(data, CPUFREQ_GOV_START)
|
policy->governor->governor => cpufreq_governor_dbs(data, CPUFREQ_GOV_START) [*]
|
create_workqueue #defined as __create_workqueue [*]

where [*] = locks taken.

While offlining a cpu, locks are taken in the following order:

--------------------------------------------------------------------------
lock                  function                  file
--------------------------------------------------------------------------
cpu_add_remove_lock   cpu_down                  kernel/cpu.c
sched_hotcpu_mutex    migration_call            kernel/sched.c
workqueue_mutex       workqueue_cpu_callback    kernel/workqueue.c
cache_chain_mutex     cpuup_callback            mm/slab.c
policy->lock          cpufreq_driver_target     drivers/cpufreq/cpufreq.c
--------------------------------------------------------------------------

Please note that in the above,
- sched_hotcpu_mutex, workqueue_mutex, cache_chain_mutex are taken while
  handling CPU_LOCK_ACQUIRE events in the respective subsystems'
  cpu_callback functions.

- policy->lock is taken while handling CPU_DOWN_PREPARE in
  cpufreq_cpu_callback which calls cpufreq_driver_target.

It's perfectly clear that in the cpu offline callpath, cpufreq does not
have to do anything with the workqueue.

So can we ignore this circular-dep warning as a false positive?
Or is there a way to exploit this circular dependency?

At the moment, I cannot think of a way to exploit this circular dependency
unless we do something like try destroying the created workqueue when the
cpu is dead, i.e. make the cpufreq governors cpu-hotplug-aware.
(eeks! that doesn't look good)

I'm working on fixing this. Let me see if I can come up with something.

Thanks and Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"
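To see why the two orderings above produce a circular-dependency report, it
can help to treat each "A is held while B is taken" pair as a directed edge
and look for a cycle. The sketch below is a toy user-space model of that
idea, not lockdep's implementation; the enum, the helper names and the
adjacency matrix are all invented for illustration, and only the two lock
chains are taken from the message.

```c
#include <assert.h>
#include <string.h>

enum lock_id {
	POLICY_LOCK, DBS_MUTEX, WORKQUEUE_MUTEX,
	CPU_ADD_REMOVE_LOCK, SCHED_HOTCPU_MUTEX, CACHE_CHAIN_MUTEX,
	NR_LOCKS
};

/* edge[a][b] != 0 means "b was taken while a was held" */
unsigned char edge[NR_LOCKS][NR_LOCKS];

/* Lock chains as described in the message */
const enum lock_id governor_path[] = {
	POLICY_LOCK, DBS_MUTEX, WORKQUEUE_MUTEX
};
const enum lock_id offline_path[] = {
	CPU_ADD_REMOVE_LOCK, SCHED_HOTCPU_MUTEX, WORKQUEUE_MUTEX,
	CACHE_CHAIN_MUTEX, POLICY_LOCK
};

void reset_graph(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Record every consecutive pair of a chain as a dependency edge. */
void record_chain(const enum lock_id *chain, int n)
{
	for (int i = 0; i + 1 < n; i++)
		edge[chain[i]][chain[i + 1]] = 1;
}

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done */
int lock_dfs(int node, unsigned char *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!edge[node][next])
			continue;
		if (state[next] == 1)	/* back edge: ordering cycle */
			return 1;
		if (state[next] == 0 && lock_dfs(next, state))
			return 1;
	}
	state[node] = 2;
	return 0;
}

int lock_graph_has_cycle(void)
{
	unsigned char state[NR_LOCKS] = { 0 };

	for (int i = 0; i < NR_LOCKS; i++)
		if (state[i] == 0 && lock_dfs(i, state))
			return 1;
	return 0;
}
```

Recording only the governor chain gives no cycle; adding the offline chain
closes the loop policy->lock -> dbs_mutex -> workqueue_mutex ->
cache_chain_mutex -> policy->lock. Whether that cycle can actually deadlock
is the separate question raised in the message; a graph check like this says
nothing about that.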
[PATCH 4/4] [HVCS] Select HVC_CONSOLE if HVCS is enabled.
HVC_CONSOLE provides symbols that HVCS requires.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 drivers/char/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 2af12fc..c94ecdc 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -598,6 +598,7 @@ config HVC_RTAS
 config HVCS
 	tristate "IBM Hypervisor Virtual Console Server support"
 	depends on PPC_PSERIES
+	select HVC_CONSOLE
 	help
 	  Partitionable IBM Power5 ppc64 machines allow hosting of
 	  firmware virtual consoles from one Linux partition by
--
1.4.1
Re: 2.6.19 panic on boot -- i386
From: Chris Wright <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 20:27:59 -0800

> * David Miller ([EMAIL PROTECTED]) wrote:
> > From: Pete Clements <[EMAIL PROTECTED]>
> > Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)
> >
> > > 2.6.19 panics at boot. Good up through rc6-git11.
> > > Hand copied screen below.
> >
> > Here is the fix, which was posted in response to a separate
> > report of this problem here:
>
> looks like 2.6.19.1 material ;-)

Check [EMAIL PROTECTED]'s inbox, I just sent it in :)
Ubuntu patch sync for 2.6.20
This is a set of patches from the Ubuntu tree that seemed suitable for
upstream sync.

[PATCH 1/4] [x86] Add command line option to enable/disable hyper-threading.
[PATCH 2/4] [APIC] Allow disabling of UP APIC/IO-APIC by default, with command line option to turn it on.
[PATCH 3/4] [ATM] Add CPPFLAGS to byteorder.h check.
[PATCH 4/4] [HVCS] Select HVC_CONSOLE if HVCS is enabled.

 arch/i386/Kconfig                   |   13 +
 Documentation/kernel-parameters.txt |    3 +++
 arch/i386/Kconfig                   |    5 +
 arch/i386/kernel/apic.c             |   13 +++--
 arch/i386/kernel/cpu/common.c       |   30 +-
 arch/i386/kernel/io_apic.c          |   10 +-
 drivers/atm/Makefile                |    3 +--
 drivers/char/Kconfig                |    2 +-
 include/asm-i386/apic.h             |    6 ++
 include/asm-i386/io_apic.h          |    6 +-
 10 files changed, 83 insertions(+), 8 deletions(-)
[PATCH 1/4] [x86] Add command line option to enable/disable hyper-threading.
This patch adds a config option to allow disabling hyper-threading by
default, and a kernel command line option to change this default at boot
time.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |    3 +++
 arch/i386/Kconfig                   |    5 +
 arch/i386/kernel/cpu/common.c       |   29 +
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 6747384..2b68d6e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -600,6 +600,9 @@ and is between 256 and 4096 characters.
 	hisax=		[HW,ISDN]
 			See Documentation/isdn/README.HiSax.
 
+	ht=		[HW,IA-32,SMP] Enable or disable hyper-threading.
+			Format:
+
 	hugepages=	[HW,IA-32,IA-64] Maximal number of HugeTLB pages.
 
 	noirqbalance	[IA-32,SMP,KNL] Disable kernel irq balancing
diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index 8ff1c6f..b4a2461 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -1185,6 +1185,11 @@ config X86_HT
 	depends on SMP && !(X86_VISWS || X86_VOYAGER)
 	default y
 
+config X86_HT_DISABLE
+	bool "Disable Hyper-Threading by default"
+	depends on X86_HT
+	default n
+
 config X86_BIOS_REBOOT
 	bool
 	depends on !(X86_VISWS || X86_VOYAGER)
diff --git a/arch/i386/kernel/cpu/common.c b/arch/i386/kernel/cpu/common.c
index d9f3e3c..42d2361 100644
--- a/arch/i386/kernel/cpu/common.c
+++ b/arch/i386/kernel/cpu/common.c
@@ -482,6 +482,29 @@ void __cpuinit identify_cpu(struct cpuin
 }
 
 #ifdef CONFIG_X86_HT
+
+#ifdef CONFIG_X86_HT_DISABLE
+static int disable_ht __cpuinitdata = 1;
+#else
+static int disable_ht __cpuinitdata;
+#endif
+
+static int __init parse_ht(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	if (!memcmp(arg, "on", 2))
+		disable_ht = 0;
+	else if (!memcmp(arg, "off", 3))
+		disable_ht = 1;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+early_param("ht", parse_ht);
+
 void __cpuinit detect_ht(struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;
@@ -492,6 +515,12 @@ void __cpuinit detect_ht(struct cpuinfo_
 	if (!cpu_has(c, X86_FEATURE_HT) || cpu_has(c, X86_FEATURE_CMP_LEGACY))
 		return;
 
+	if (disable_ht) {
+		printk(KERN_INFO "CPU: Hyper-Threading disabled by default. Enable with ht=on\n");
+		smp_num_siblings = 1;
+		return;
+	}
+
 	smp_num_siblings = (ebx & 0xff0000) >> 16;
 
 	if (smp_num_siblings == 1) {
--
1.4.1
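One detail of parse_ht() that may be worth a look: memcmp(arg, "on", 2) only
compares a prefix, so a value such as "only" would be accepted as "on". The
user-space sketch below contrasts a prefix match with an exact match and also
shows the sibling-count extraction from CPUID leaf 1 EBX (bits 23:16). The
function names are hypothetical and nothing here is kernel code; strncmp() is
used for the prefix variant so that the model never reads past the end of a
short string.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Prefix match, modelling the behaviour of the memcmp() calls in the patch. */
int parse_ht_prefix(const char *arg, int *disable_ht)
{
	if (!arg)
		return -EINVAL;
	if (!strncmp(arg, "on", 2))
		*disable_ht = 0;
	else if (!strncmp(arg, "off", 3))
		*disable_ht = 1;
	else
		return -EINVAL;
	return 0;
}

/* Exact match: accepts only "on" and "off". */
int parse_ht_strict(const char *arg, int *disable_ht)
{
	if (!arg)
		return -EINVAL;
	if (!strcmp(arg, "on"))
		*disable_ht = 0;
	else if (!strcmp(arg, "off"))
		*disable_ht = 1;
	else
		return -EINVAL;
	return 0;
}

/*
 * Sibling count as detect_ht() derives it: forced to 1 when HT is
 * disabled, otherwise bits 23:16 of EBX from CPUID leaf 1.
 */
unsigned int ht_siblings(unsigned int ebx, int disable_ht)
{
	if (disable_ht)
		return 1;
	return (ebx & 0xff0000) >> 16;
}
```

Whether the looser prefix match matters in practice is a judgment call; the
strict variant is shown only for comparison.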
Re: 2.6.19 panic on boot -- i386
* David Miller ([EMAIL PROTECTED]) wrote:
> From: Pete Clements <[EMAIL PROTECTED]>
> Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)
>
> > 2.6.19 panics at boot. Good up through rc6-git11.
> > Hand copied screen below.
>
> Here is the fix, which was posted in response to a separate
> report of this problem here:

looks like 2.6.19.1 material ;-)
Re: [RFC] dynsched - different cpu schedulers per cpuset
Felix wrote:
> The cpu<->scheduler mapping is controlled via cpusets. Thus you
> can switch the scheduler for a cpuset containing multiple cpus and
> keep the rest untouched.

I don't have comments on the main focus of this work - schedulers
are not my expertise. I just noticed this lkml post because of my
interest in cpusets.

You should take a look at the work of Paul Menage (added to the cc
list), who is splitting the cpuset code into:
 1) a generic "container" mechanism,
 2) separate CPU and Memory "controllers", and
 3) various other additional "controllers".

See Paul Menage's most recent patch proposal at:

    http://lkml.org/lkml/2006/11/17/217
    Subject: [PATCH 0/6] Multi-hierarchy Process Containers
    Date:    Fri, 17 Nov 2006 11:11:59 -0800

The container mechanism uses a virtual file system derived from the
cpuset code to provide a file system style (hierarchical names and
classic Unix style file and directory permissions) naming of a
partitioning of the tasks on a system.

By partitioning here, I mean a division of the tasks into several
subsets, aka partition elements, which are non-overlapping and
covering. That is, each task is in one and only one of the partition
elements, these partition elements are named by the directories in
the container file system, and the regular files in the container
file system provide per-element attributes.

A kernel facility that can be considered as providing attributes for,
and control of, subsets of tasks is then represented as a controller,
and attached to such a container.

Your dynamic scheduler mechanisms appear (from what I can tell after
a brief glance) to be a candidate for being such a controller.

The upshot of this is that, if your work should proceed and eventually
be considered for inclusion in the kernel (I have --no-- idea if that
would be a good idea, either for the purposes of your student group,
or for the kernel itself) then it would likely (if Menage's work is
accepted) need to be recast as a "controller" in Menage's terms, not
as an extension to cpusets.

If Menage succeeds, that should not actually be that big of a change,
either semantically, or in coding details.

Good luck.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [rfc patch] Re: [patch] PM: suspend/resume debugging should depend on SOFTWARE_SUSPEND
On Wed, 2006-11-29 at 11:49 -0800, Andrew Morton wrote:
> > +#ifdef CONFIG_PM
> > +static int serial_pnp_suspend(struct pnp_dev *dev, pm_message_t state)
> > +{
> > +	long line = (long)pnp_get_drvdata(dev);
>
> Please avoid adding long lines. (heh, I kill me)

Ok. I also changed the place I got it from.

> We'd usually do
>
> #else
> #define serial_pnp_suspend NULL
> #define serial_pnp_resume NULL
>
> here
>
> > +
> > +#endif /* CONFIG_PM */
> > +
> >  static struct pnp_driver serial_pnp_driver = {
> >  	.name = "serial",
> > -	.id_table = pnp_dev_table,
> >  	.probe = serial_pnp_probe,
> >  	.remove = __devexit_p(serial_pnp_remove),
> > +#ifdef CONFIG_PM
> > +	.suspend= serial_pnp_suspend,
> > +	.resume = serial_pnp_resume,
> > +#endif
>
> and hence omit the ifdefs here.

New patch.

Add suspend/resume methods to drivers/serial/8250_pnp.c. Tested on a P4/HT
16550A box, ttyS0 login survives across suspend to ram.

Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>

--- linux-2.6.19-rc6-mm2/drivers/serial/8250_pnp.c.org	2006-11-29 07:14:15.0 +0100
+++ linux-2.6.19-rc6-mm2/drivers/serial/8250_pnp.c	2006-11-29 20:49:33.0 +0100
@@ -459,16 +459,43 @@ serial_pnp_probe(struct pnp_dev *dev, co
 
 static void __devexit serial_pnp_remove(struct pnp_dev *dev)
 {
-	long line = (long)pnp_get_drvdata(dev);
+	int line = (long)pnp_get_drvdata(dev);
 
 	if (line)
 		serial8250_unregister_port(line - 1);
 }
 
+#ifdef CONFIG_PM
+static int serial_pnp_suspend(struct pnp_dev *dev, pm_message_t state)
+{
+	int line = (int)pnp_get_drvdata(dev);
+
+	if (!line)
+		return -ENODEV;
+	serial8250_suspend_port(line - 1);
+	return 0;
+}
+
+static int serial_pnp_resume(struct pnp_dev *dev)
+{
+	int line = (int)pnp_get_drvdata(dev);
+
+	if (!line)
+		return -ENODEV;
+	serial8250_resume_port(line - 1);
+	return 0;
+}
+#else
+#define serial_pnp_suspend NULL
+#define serial_pnp_resume NULL
+#endif /* CONFIG_PM */
+
 static struct pnp_driver serial_pnp_driver = {
 	.name		= "serial",
-	.id_table	= pnp_dev_table,
 	.probe		= serial_pnp_probe,
 	.remove		= __devexit_p(serial_pnp_remove),
+	.suspend	= serial_pnp_suspend,
+	.resume		= serial_pnp_resume,
+	.id_table	= pnp_dev_table,
 };
 
 static int __init serial8250_pnp_init(void)
Re: 2.6.19 panic on boot -- i386
From: Pete Clements <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 22:13:09 -0500 (EST)

> 2.6.19 panics at boot. Good up through rc6-git11.
> Hand copied screen below.

Here is the fix, which was posted in response to a separate
report of this problem here:

commit c28728decc37fe52c8cdf48b3e0c0cf9b0c2fefb
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Wed Nov 29 18:14:47 2006 -0800

    [IPV6] NDISC: Calculate packet length correctly for allocation.

    MAX_HEADER does not include the ipv6 header length in it,
    so we need to add it in explicitly.

    Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 73eb8c3..c42d4c2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -441,7 +441,8 @@ static void ndisc_send_na(struct net_dev
 	struct sk_buff *skb;
 	int err;
 
-	len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr);
+	len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+		sizeof(struct in6_addr);
 
 	/* for anycast or proxy, solicited_addr != src_addr */
 	ifp = ipv6_get_ifaddr(solicited_addr, dev, 1);
@@ -556,7 +557,8 @@ void ndisc_send_ns(struct net_device *de
 	if (err < 0)
 		return;
 
-	len = sizeof(struct icmp6hdr) + sizeof(struct in6_addr);
+	len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+		sizeof(struct in6_addr);
 	send_llinfo = dev->addr_len && !ipv6_addr_any(saddr);
 	if (send_llinfo)
 		len += ndisc_opt_addr_space(dev);
@@ -632,7 +634,7 @@ void ndisc_send_rs(struct net_device *de
 	if (err < 0)
 		return;
 
-	len = sizeof(struct icmp6hdr);
+	len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr);
 	if (dev->addr_len)
 		len += ndisc_opt_addr_space(dev);
 
@@ -1381,7 +1383,8 @@ void ndisc_send_redirect(struct sk_buff
 			 struct in6_addr *target)
 {
 	struct sock *sk = ndisc_socket->sk;
-	int len = sizeof(struct icmp6hdr) + 2 * sizeof(struct in6_addr);
+	int len = sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) +
+		2 * sizeof(struct in6_addr);
 	struct sk_buff *buff;
 	struct icmp6hdr *icmph;
 	struct in6_addr saddr_buf;
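To put numbers on the fix above, here is a small user-space sketch of the
neighbour advertisement length before and after the change. It assumes a
Linux/glibc system and uses the user-space header types struct ip6_hdr and
struct icmp6_hdr, which stand in for the kernel's struct ipv6hdr and struct
icmp6hdr; the two function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <netinet/icmp6.h>

/* Length as computed before the fix: ICMPv6 header plus target address. */
size_t ndisc_na_len_old(void)
{
	return sizeof(struct icmp6_hdr) + sizeof(struct in6_addr);
}

/* Length after the fix: the fixed IPv6 header is accounted for as well. */
size_t ndisc_na_len_fixed(void)
{
	return sizeof(struct ip6_hdr) + sizeof(struct icmp6_hdr) +
	       sizeof(struct in6_addr);
}
```

The difference between the two is exactly the 40-byte fixed IPv6 header,
which is the amount the old allocation came up short by.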
Re: 2.6.19-rc6-mm2
On Wed, 29 Nov 2006 22:42:20 -0500 Ed Tomlinson wrote:

> On Tuesday 28 November 2006 05:02, Andrew Morton wrote:
>
> > Will appear eventually at
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/
>
> This kernel does not boot here. It does not get far enough to post anything
> to my serial console.

Have you tried using "earlyprintk=..." to see if it produces
any more output?

> The last booted kernel here is 19-rc5-mm2. Grub is used to boot, here is
> the starting log
> of rc5-mm2 build is UP AMD64:
>
> [0.00] Linux version 2.6.19-rc5-mm2 ([EMAIL PROTECTED]) (gcc version
> 4.1.1 (Gentoo 4.1.1-r1)) #1 PREEM6
> [0.00] Command line: root=/dev/sda3 vga=0x318
> video=vesafb:ywrap,mtrr:3 console=tty0 console=tty1
> [0.00] BIOS-provided physical RAM map:
> [0.00] BIOS-e820: - 0009f800 (usable)
> [0.00] BIOS-e820: 0009f800 - 000a (reserved)
> [0.00] BIOS-e820: 000f - 0010 (reserved)
> [0.00] BIOS-e820: 0010 - 3fff (usable)
> [0.00] BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
> [0.00] BIOS-e820: 3fff3000 - 4000 (ACPI data)
> [0.00] BIOS-e820: fec0 - fec01000 (reserved)
> [0.00] BIOS-e820: fee0 - fef0 (reserved)
> [0.00] BIOS-e820: fefffc00 - ff00 (reserved)
> [0.00] BIOS-e820: - 0001 (reserved)
> [0.00] end_pfn_map = 1048576
> [0.00] DMI 2.2 present.
> [0.00] Zone PFN ranges:
> [0.00] DMA 0 -> 4096
> [0.00] DMA32 4096 -> 1048576
> [0.00] Normal 1048576 -> 1048576
> [0.00] early_node_map[2] active PFN ranges
> [0.00] 0: 0 -> 159
> [0.00] 0: 256 -> 262128
> [0.00] Nvidia board detected. Ignoring ACPI timer override.
> [0.00] ACPI: PM-Timer IO Port: 0x4008
> [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [0.00] Processor #0 (Bootup-CPU)
> [0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
> [0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
> [0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [0.00] ACPI: BIOS IRQ0 pin2 override ignored.
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
> [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
> [0.00] Setting APIC routing to flat
> [0.00] Using ACPI (MADT) for SMP configuration information
> [0.00] Nosave address range: 0009f000 - 000a
> [0.00] Nosave address range: 000a - 000f
> [0.00] Nosave address range: 000f - 0010
> [0.00] Allocating PCI resources starting at 5000 (gap:
> 4000:bec0)
> [0.00] Built 1 zonelists. Total pages: 257320
> [0.00] Kernel command line: root=/dev/sda3 vga=0x318
> video=vesafb:ywrap,mtrr:3 console=tty0 cons1
> [0.00] Initializing CPU#0
> [0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)
>
> Any ideas what I should try or suggestions on patches to remove/try.
>
> Thanks
> Ed

---
~Randy
man-pages-2.43 is released
Gidday,

I just released man-pages-2.43.

This release is now available for download at:

    ftp://ftp.kernel.org/pub/linux/docs/manpages
    or mirrors: ftp://ftp.XX.kernel.org/pub/linux/docs/manpages

and soon at:

    ftp://ftp.win.tue.nl/pub/linux-local/manpages

Changes in this release that may be of interest to readers
of this list include the following:

Changes to individual pages
---------------------------

rtc.4
    David Brownell
        Update the RTC man page to reflect the new RTC class framework:

        - Generalize ... it's not just for PC/AT style RTCs, and there may
          be more than one RTC per system.

        - Not all RTCs expose the same feature set as PC/AT ones; most of
          these ioctls will be rejected by some RTCs.

        - Be explicit about when {A,P}IE_{ON,OFF} calls are needed.

        - Describe the parameter to the get/set epoch request; correct the
          description of the get/set frequency parameter.

        - Document RTC_WKALM_{RD,SET}, which don't need AIE_{ON,OFF} and
          which support longer alarm periods.

        - Hey, not all system clock implementations count timer irqs any
          more now that the new RT-derived clock support is merging.

raw.7
udp.7
    Andi Kleen
        Describe the correct default for UDP/RAW path MTU discovery.

==

Cheers,

Michael

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance? Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.
Re: 2.6.19 panic on boot -- i386
Quoting Randy Dunlap
> > 2.6.19 panics at boot. Good up through rc6-git11.
> > Hand copied screen below.
>
> Try the patch that DaveM recently posted:
> http://lkml.org/lkml/2006/11/29/335
>
> ---
> ~Randy
>
That fixed it.

--
Pete Clements
Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
On 11/30, Oleg Nesterov wrote:
>
> On 11/29, Paul E. McKenney wrote:
> >
> > Hmmm... Now I am wondering if the memory barriers inherent in the
> > __wait_event() suffice for this last barrier... :-/ Thoughts?
> >
> > > + smp_mb();
>
> Fastpath skips __wait_event(), and it is possible that the reader does
> lock/unlock between the first 'mb()' and 'if (atomic_read() == 1)'.

In fact, a slow path needs (I think) it too. We can have an unrelated
wakeup, and then the reader does unlock() before we check !atomic_read()
in the __wait_event()'s loop. The reader removes us from ->wq, in that
case finish_wait() does nothing.

Oleg.
Re: A commit between 2.6.16.4 and 2.6.16.5 failed crashme
> Thanks for your report.
>
> A git-bisect might be a bit of overkill considering that there were only
> two patches applied between 2.6.16.4 and 2.6.16.5:
>
> Andi Kleen (2):
>       x86_64: Clean up execve
>       x86_64: When user could have changed RIP always force IRET (CVE-2006-0744)
>
> I've attached both patches.

Hi Andi,

I found that this patch is also in 2.6.18.3, but crashme doesn't trigger
a kernel panic for 2.6.18.3..weird.

Thanks,
Forrest
Re: 2.6.19-rc6-mm2
On Tuesday 28 November 2006 05:02, Andrew Morton wrote: > Will appear eventually at > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/ This kernel does not boot here. It does not get far enough to post anything to my serial console. The last booted kernel here is 19-rc5-mm2. Grub is used to boot, here is the starting log of rc5-mm2 build is UP AMD64: [0.00] Linux version 2.6.19-rc5-mm2 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r1)) #1 PREEM6 [0.00] Command line: root=/dev/sda3 vga=0x318 video=vesafb:ywrap,mtrr:3 console=tty0 console=tty1 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009f800 (usable) [0.00] BIOS-e820: 0009f800 - 000a (reserved) [0.00] BIOS-e820: 000f - 0010 (reserved) [0.00] BIOS-e820: 0010 - 3fff (usable) [0.00] BIOS-e820: 3fff - 3fff3000 (ACPI NVS) [0.00] BIOS-e820: 3fff3000 - 4000 (ACPI data) [0.00] BIOS-e820: fec0 - fec01000 (reserved) [0.00] BIOS-e820: fee0 - fef0 (reserved) [0.00] BIOS-e820: fefffc00 - ff00 (reserved) [0.00] BIOS-e820: - 0001 (reserved) [0.00] end_pfn_map = 1048576 [0.00] DMI 2.2 present. [0.00] Zone PFN ranges: [0.00] DMA 0 -> 4096 [0.00] DMA324096 -> 1048576 [0.00] Normal1048576 -> 1048576 [0.00] early_node_map[2] active PFN ranges [0.00] 0:0 -> 159 [0.00] 0: 256 -> 262128 [0.00] Nvidia board detected. Ignoring ACPI timer override. [0.00] ACPI: PM-Timer IO Port: 0x4008 [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) [0.00] Processor #0 (Bootup-CPU) [0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) [0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: BIOS IRQ0 pin2 override ignored. 
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Nosave address range: 0009f000 - 000a
[0.00] Nosave address range: 000a - 000f
[0.00] Nosave address range: 000f - 0010
[0.00] Allocating PCI resources starting at 5000 (gap: 4000:bec0)
[0.00] Built 1 zonelists. Total pages: 257320
[0.00] Kernel command line: root=/dev/sda3 vga=0x318 video=vesafb:ywrap,mtrr:3 console=tty0 cons1
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 4096 (order: 12, 32768 bytes)

Any ideas what I should try or suggestions on patches to remove/try.

Thanks
Ed
[rfc patch] optimize o_direct on block device
I've been complaining about O_DIRECT I/O processing being exceedingly complex and slow since March 2005, see posting below:

http://marc.theaimsgroup.com/?l=linux-kernel&m=111033309732261&w=2

At that time, a patch was written for raw device to demonstrate that large performance headroom is achievable (~20% speedup for a micro-benchmark and ~2% for a db transaction processing benchmark) with a tight I/O submission processing loop.

Since raw device is being slowly phased out, I've rewritten the patch for block device. O_DIRECT on block device is much simpler than O_DIRECT on a file system. Part of the reason that direct_io_worker is so complex is O_DIRECT on a file system, where it needs to perform block allocation, hole detection, extending the file on write, and tons of other corner cases. The end result is that it takes tons of CPU time to submit an I/O.

For block device, the block allocation is much simpler and I can write a really tight double loop to iterate over each iovec and each page within the iovec in order to construct/prepare the bio structure and then subsequently submit it to the block layer. So here it goes, posted here for comments.

A few notes on the patch:

(1) I need a vector structure similar to pagevec; however, pagevec doesn't have everything that I need, i.e., an iterator variable. So I create a new struct pvec. Maybe something can be worked out with pagevec?

(2) There is some inconsistency for synchronous I/O: the condition to update ppos and the condition to wait on sync_kiocb are incompatible. The call chain looks like the following:

    do_sync_read
      generic_file_aio_read
        ...
          blkdev_direct_IO

do_sync_read will wait for I/O completion only if the lower function returned -EIOCBQUEUED. Updating ppos is done via generic_file_aio_read, but only if the lower function returned a positive value. So I either have to construct my own wait_on_sync_kiocb, or hack around the ppos update.

(3) I/O length setup in kiocb is inconsistent between normal read vs vector read or aio_read.
One is passed in kiocb->ki_left, while the others pass the total length in kiocb->ki_nbytes. I've made them consistent in the read path (note to self: I need to add the same thing in do_sync_write).

Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

--- ./fs/block_dev.c.orig	2006-11-29 14:52:20.0 -0800
+++ ./fs/block_dev.c	2006-11-29 16:45:36.0 -0800
@@ -129,43 +129,147 @@ blkdev_get_block(struct inode *inode, se
 	return 0;
 }
 
-static int
-blkdev_get_blocks(struct inode *inode, sector_t iblock,
-		struct buffer_head *bh, int create)
+int blk_end_aio(struct bio *bio, unsigned int bytes_done, int error)
 {
-	sector_t end_block = max_block(I_BDEV(inode));
-	unsigned long max_blocks = bh->b_size >> inode->i_blkbits;
+	struct kiocb* iocb = bio->bi_private;
+	atomic_t* bio_count = (atomic_t*) &iocb->private;
+	long res;
+
+	if ((bio->bi_rw & 1) == READ)
+		bio_check_pages_dirty(bio);
+	else {
+		bio_release_pages(bio);
+		bio_put(bio);
+	}
 
-	if ((iblock + max_blocks) > end_block) {
-		max_blocks = end_block - iblock;
-		if ((long)max_blocks <= 0) {
-			if (create)
-				return -EIO;	/* write fully beyond EOF */
-			/*
-			 * It is a read which is fully beyond EOF. We return
-			 * a !buffer_mapped buffer
-			 */
-			max_blocks = 0;
-		}
+	if (error)
+		iocb->ki_left = -EIO;
+
+	if (atomic_dec_and_test(bio_count)) {
+		res = (iocb->ki_left < 0) ? iocb->ki_left : iocb->ki_nbytes;
+		aio_complete(iocb, res, 0);
 	}
-	bh->b_bdev = I_BDEV(inode);
-	bh->b_blocknr = iblock;
-	bh->b_size = max_blocks << inode->i_blkbits;
-	if (max_blocks)
-		set_buffer_mapped(bh);
 	return 0;
 }
 
+#define VEC_SIZE 16
+struct pvec {
+	unsigned short nr;
+	unsigned short idx;
+	struct page *page[VEC_SIZE];
+};
+
+
+struct page *blk_get_page(unsigned long addr, size_t count, int rw,
+			struct pvec *pvec)
+{
+	int ret, nr_pages;
+	if (pvec->idx == pvec->nr) {
+		nr_pages = (addr + count + PAGE_SIZE - 1) / PAGE_SIZE -
+				addr / PAGE_SIZE;
+		nr_pages = min(nr_pages, VEC_SIZE);
+		down_read(&current->mm->mmap_sem);
+		ret = get_user_pages(current, current->mm, addr, nr_pages,
+				rw==READ, 0, pvec->page, NULL);
+		up_read(&current->mm->mmap_sem);
+		if (ret < 0)
+			return ERR_PTR(ret);
+		pvec->nr = ret;
+		pvec->idx = 0;
+	}
+	return pvec->page[pvec->idx++];
+}
+
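[Editorial aside, not part of the posted patch.] The nr_pages arithmetic in blk_get_page() above is easy to misread, so here is a minimal userspace sketch of just that calculation. It assumes a 4096-byte page; SKETCH_PAGE_SIZE, SKETCH_VEC_SIZE, pages_spanned() and batch_pages() are names made up for this illustration and do not appear in the patch.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, illustration only */
#define SKETCH_VEC_SIZE  16UL    /* mirrors VEC_SIZE in the patch */

/* Number of pages touched by the byte range [addr, addr + count). */
static unsigned long pages_spanned(unsigned long addr, size_t count)
{
    /* Index one past the last page touched, minus index of the first page. */
    return (addr + count + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
           - addr / SKETCH_PAGE_SIZE;
}

/* Pages to pin in one batch: the span, capped at the vector size. */
static unsigned long batch_pages(unsigned long addr, size_t count)
{
    unsigned long n = pages_spanned(addr, count);

    return n < SKETCH_VEC_SIZE ? n : SKETCH_VEC_SIZE;
}
```

A two-byte buffer that starts on the last byte of a page spans two pages, which is why the start address has to enter the calculation and not only the length.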
Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
On 11/29, Paul E. McKenney wrote:
>
> On Thu, Nov 30, 2006 at 04:57:14AM +0300, Oleg Nesterov wrote:
> > (the same patch + comments from Paul)
> >
>
> With the addition of a comment for the smp_mb() at the beginning of
> synchronize_qrcu(), shown below:
>
> Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

Thanks!

> 	/*
> 	 * The following memory barrier is needed to ensure that
> 	 * and subsequent freeing of data elements previously
> 	 * removed is seen by other CPUs after the wait completes.
> 	 */

I think we have another reason for mb(), but I can't suggest a clear comment.

	struct data {
		...
		int in_use;
		...
	}

	void free_data(struct data *p)
	{
		BUG_ON(p->in_use);
		kfree(p);
	}

	struct data *DATA;

Reader:

	qrcu_read_lock();
	data = rcu_dereference(DATA);
	data->in_use = 1;
	do_something(data);
	data->in_use = 0;
	qrcu_read_unlock();

Writer:

	old = DATA;
	DATA = alloc_new_data();
	synchronize_qrcu();
	free_data(old);

qrcu_read_unlock() does (implicit) mb() on reader's side, but we must pair it on our side, otherwise we can't be sure (of course, _only_ in theory) we are seeing all the changes (->in_use == 0) made by the reader.

> Hmmm... Now I am wondering if the memory barriers inherent in the
> __wait_event() suffice for this last barrier... :-/ Thoughts?
>
> > +	smp_mb();

Fastpath skips __wait_event(), and it is possible that the reader does lock/unlock between the first 'mb()' and 'if (atomic_read() == 1)'.

Oleg.
Re: [patch] genapic: default to physical mode on hotplug CPU kernels
On Wed, Nov 29, 2006 at 09:08:34AM +0100, Ingo Molnar wrote:
>
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> > hm - indeed. Then we can indeed do the patch below. Nice simplification!
>
> forgot to convert a few more places - full patch below.

Acked-by: Suresh Siddha <[EMAIL PROTECTED]>
Re: 2.6.19 panic on boot -- i386
On Wed, 29 Nov 2006 22:13:09 -0500 (EST) Pete Clements wrote:

> 2.6.19 panics at boot. Good up through rc6-git11.
> Hand copied screen below.

Try the patch that DaveM recently posted:
http://lkml.org/lkml/2006/11/29/335

---
~Randy
2.6.19 panic on boot -- i386
2.6.19 panics at boot. Good up through rc6-git11.
Hand copied screen below.

--
Pete Clements

Call Trace:
 [] ndisc_send_rs+0x420/0x460 [ipv6]
 [] ndisc_send_rs+0x42c/0x460 [ipv6]
 [] ndisc_send_rs+0x420/0x460 [ipv6]
 [] addrconf_dad_completed+0x93/0xe0 [ipv6]
 [] addrconf_dad_timer+0x119/0x120 [ipv6]
 [] rebalance_tick+0x131/0x350
 [] addrconf_dad_timer+0x0/0x120 [ipv6]
 [] run_timer_softirq+0x113/0x190
 [] __do_softirq+0x75/0xf0
 [] do_softirq+0x3b/0x50
 [] smp_apic_timer_interrupt+0xa5/0xc0
 [] apic_timer_interrupt+0x1f/0x24
 [] default_idle+0x0/0x60
 [] default_idle+0x31/0x60
 [] cpu_idle+0x6c/0x90
 [] start_kernel+0x34e/0x3d0
 [] unknown_bootoption+0x0/0x290
Code: 8c 00 00 00 89 44 24 10 8b 44 24 2c 89 44 24 0c 8b 41 60 c7 04 24 e4 ac 36 c0 89 44 24 08 8b 44 24 30 89 44 24 04 e8 9d 51 e6 ff <0f> 0b 5d 00 1a 84 36 c0 83 c4 24 c3 90 55 57 56 53 83 ec 2c 8b
EIP: [] skb_over_panic+0x63/0x70 SS:ESP 0068:c03cfe08
<0>Kernel panic - not syncing: Fatal exception in interrupt
Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
On Thu, Nov 30, 2006 at 04:57:14AM +0300, Oleg Nesterov wrote:
> (the same patch + comments from Paul)
>
> [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
>
> Very much based on ideas, corrections, and patient explanations from
> Alan and Paul.
>
> The current srcu implementation is very good for readers, lock/unlock
> are extremely cheap. But for that reason it is not possible to avoid
> synchronize_sched() and polling in synchronize_srcu().
>
> Jens Axboe wrote:
> >
> > It works for me, but the overhead is still large. Before it would take
> > 8-12 jiffies for a synchronize_srcu() to complete without there actually
> > being any reader locks active, now it takes 2-3 jiffies. So it's
> > definitely faster, and as suspected the loss of two of three
> > synchronize_sched() cut down the overhead to a third.
>
> 'qrcu' behaves the same as srcu but optimized for writers. The fast path
> for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock().
> The slow path is __wait_event(), no polling. However, the reader does
> atomic inc/dec on lock/unlock, and the counters are not per-cpu.
>
> Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context,
> and 'qrcu_struct' can be compile-time initialized.
>
> See also (a long) discussion:
> http://marc.theaimsgroup.com/?t=11637085763

With the addition of a comment for the smp_mb() at the beginning of synchronize_qrcu(), shown below:

Acked-by: Paul E. McKenney <[EMAIL PROTECTED]>

> Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
>
> --- 19-rc6/include/linux/srcu.h~1_qrcu	2006-10-22 18:24:03.0 +0400
> +++ 19-rc6/include/linux/srcu.h	2006-11-30 04:32:42.0 +0300
> @@ -27,6 +27,8 @@
>  #ifndef _LINUX_SRCU_H
>  #define _LINUX_SRCU_H
>
> +#include
> +
>  struct srcu_struct_array {
>  	int c[2];
>  };
> @@ -50,4 +52,32 @@ void srcu_read_unlock(struct srcu_struct
>  void synchronize_srcu(struct srcu_struct *sp);
>  long srcu_batches_completed(struct srcu_struct *sp);
>
> +/*
> + * fully compatible with srcu, but optimized for writers.
> + */
> +
> +struct qrcu_struct {
> +	int completed;
> +	atomic_t ctr[2];
> +	wait_queue_head_t wq;
> +	struct mutex mutex;
> +};
> +
> +int init_qrcu_struct(struct qrcu_struct *qp);
> +int qrcu_read_lock(struct qrcu_struct *qp);
> +void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
> +void synchronize_qrcu(struct qrcu_struct *qp);
> +
> +/**
> + * cleanup_qrcu_struct - deconstruct a quick-RCU structure
> + * @qp: structure to clean up.
> + *
> + * Must invoke this after you are finished using a given qrcu_struct that
> + * was initialized via init_qrcu_struct(). We reserve the right to
> + * leak memory should you fail to do this!
> + */
> +static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
> +{
> +}
> +
>  #endif
> --- 19-rc6/kernel/srcu.c~1_qrcu	2006-10-22 18:24:03.0 +0400
> +++ 19-rc6/kernel/srcu.c	2006-11-30 04:39:53.0 +0300
> @@ -256,3 +256,94 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
>  EXPORT_SYMBOL_GPL(synchronize_srcu);
>  EXPORT_SYMBOL_GPL(srcu_batches_completed);
>  EXPORT_SYMBOL_GPL(srcu_readers_active);
> +
> +/**
> + * init_qrcu_struct - initialize a quick-RCU structure.
> + * @qp: structure to initialize.
> + *
> + * Must invoke this on a given qrcu_struct before passing that qrcu_struct
> + * to any other function. Each qrcu_struct represents a separate domain
> + * of QRCU protection.
> + */
> +int init_qrcu_struct(struct qrcu_struct *qp)
> +{
> +	qp->completed = 0;
> +	atomic_set(qp->ctr + 0, 1);
> +	atomic_set(qp->ctr + 1, 0);
> +	init_waitqueue_head(&qp->wq);
> +	mutex_init(&qp->mutex);
> +
> +	return 0;
> +}
> +
> +/**
> + * qrcu_read_lock - register a new reader for an QRCU-protected structure.
> + * @qp: qrcu_struct in which to register the new reader.
> + *
> + * Counts the new reader in the appropriate element of the qrcu_struct.
> + * Returns an index that must be passed to the matching qrcu_read_unlock().
> + */
> +int qrcu_read_lock(struct qrcu_struct *qp)
> +{
> +	for (;;) {
> +		int idx = qp->completed & 0x1;
> +		if (likely(atomic_inc_not_zero(qp->ctr + idx)))
> +			return idx;
> +	}
> +}
> +
> +/**
> + * qrcu_read_unlock - unregister a old reader from an QRCU-protected structure.
> + * @qp: qrcu_struct in which to unregister the old reader.
> + * @idx: return value from corresponding qrcu_read_lock().
> + *
> + * Removes the count for the old reader from the appropriate element of
> + * the qrcu_struct.
> + */
> +void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
> +{
> +	if (atomic_dec_and_test(qp->ctr + idx))
> +		wake_up(&qp->wq);
> +}
> +
> +/**
> + * synchronize_qrcu - wait for prior QRCU read-side critical-section completion
> + * @qp: qrcu_struct with which to synchronize.
> + *
> + * Flip the completed counter, and wait for the old count to drain to zero.
> + * As with
Re: Linux 2.6.19
On Wed, Nov 29, 2006 at 05:08:15PM -0800, Randy Dunlap wrote:
> On Wed, 29 Nov 2006 18:56:31 -0600 Greg Norris wrote:
> > On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html
> > would break the entries into separate lines.
>
> I prefer to use
> http://www.kernel.org/kdist/finger_banner
> for that.

I use that in some cases as well, but the browser on my PDA insists upon trying to download that file rather than simply displaying it. So I sometimes need to use version.html instead, even though it renders poorly under every browser I've tried.
Re: 2.6.19-rc6-mm2: uli526x only works after reload
On Thu, 30 Nov 2006 02:04:15 +0100 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> > >
> > > git-netdev-all.patch
> > > git-netdev-all-fixup.patch
> > > libphy-dont-do-that.patch
> >
> > Are you able to eliminate libphy-dont-do-that.patch?
> >
> > > Is a broken-out version of git-netdev-all.patch available from somewhere?
> >
> > Nope, and my few fumbling attempts to generate the sort of patch series
> > which you want didn't work out too well. One has to downgrade to
> > git-bisect :(
> >
> > What does "doesn't work" mean, btw?
>
> Well, it turns out not to be 100% reproducible. I can only reproduce it after
> a soft reboot (eg. shutdown -r now).
>
> Then, while configuring network interfaces the system says the interface name
> is ethxx0, but it should be eth1 (eth0 is an RTL-8139, which is not used). Now
> if I run ifconfig, it says:
>
> eth0: error fetching interface information: Device not found
>
> and that's all (normally, ifconfig would show the information for lo and eth1,
> without eth0). Moreover, 'ifconfig eth1' says:
>
> eth1: error fetching interface information: Device not found
>
> Next, I run 'rmmod uli526x' and 'modprobe uli526x' and then 'ifconfig' is
> still saying the above (about eth0), but 'ifconfig eth1' seems to work as
> it should. However, the interface often fails to transfer anything after
> that.

Lovely. Sounds like some startup race, perhaps against userspace.

Is CONFIG_PCI_MULTITHREAD_PROBE set? (err, we meant to disable that for 2.6.19 but forgot).
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
From: Wenji Wu <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 19:56:58 -0600

> >We could also pepper tcp_recvmsg() with some very carefully placed
> >preemption disable/enable calls to deal with this even with
> >CONFIG_PREEMPT enabled.
>
> I also think about this approach. But since the "problem" happens in
> the 2.6 Desktop and Low-latency Desktop (not server), system
> responsiveness is a key feature, simply placing preemption
> disabled/enable call might not work. If you want to place
> preemption disable/enable calls within tcp_recvmsg, you have to put
> them in the very beginning and end of the call. Disabling preemption
> would degrade system responsiveness.

We can make explicit preemption checks in the main loop of tcp_recvmsg(), and release the socket and run the backlog if need_resched() is TRUE.

This is the simplest and most elegant solution to this problem.

The one suggested in your patch and paper is way overkill; there is no reason to solve a TCP-specific problem inside of the generic scheduler.
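[Editorial aside, not from the thread.] To make the suggestion above concrete, here is a rough userspace model of the idea: a receive loop that periodically drops the socket lock, which drains the backlog, when a reschedule is pending. Nothing here is kernel code. struct sim_sock, sim_lock(), sim_release() and sim_recv() are hypothetical stand-ins, and the resched_every parameter merely models need_resched() becoming true every N chunks.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a locked socket; not a kernel type. */
struct sim_sock {
    bool locked;    /* models ownership taken by lock_sock() */
    int backlog;    /* packets queued while the socket was locked */
    int processed;  /* packets that received protocol processing */
};

static void sim_lock(struct sim_sock *sk)
{
    sk->locked = true;
}

/* Models release_sock(): drain the backlog, then drop ownership. */
static void sim_release(struct sim_sock *sk)
{
    sk->processed += sk->backlog;
    sk->backlog = 0;
    sk->locked = false;
}

/*
 * Copy `chunks` pieces of data to the "user". One packet arrives per chunk
 * and lands on the backlog because the socket is locked. If resched_every
 * is non-zero, every resched_every-th chunk models need_resched() being
 * true: the loop releases the socket (running the backlog) and relocks.
 * Returns the largest backlog depth observed.
 */
static int sim_recv(struct sim_sock *sk, int chunks, int resched_every)
{
    int max_backlog = 0;

    sim_lock(sk);
    for (int i = 1; i <= chunks; i++) {
        sk->backlog++;
        if (sk->backlog > max_backlog)
            max_backlog = sk->backlog;

        if (resched_every && i % resched_every == 0) {
            sim_release(sk);
            /* a real kernel would give up the CPU at this point */
            sim_lock(sk);
        }
    }
    sim_release(sk);
    return max_backlog;
}
```

In this toy model the backlog never grows beyond the number of packets that arrive between two checks, without touching the scheduler at all; whether that bound is good enough in practice is exactly what the thread is arguing about.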
[PATCH -mm] sata_nv: add suspend/resume support
The attached patch is against 2.6.19-rc6-mm1, to be applied on top of the patch "sata_nv: fix ATAPI in ADMA mode" which Andrew and Jeff already have in their trees.

I've only been able to test this myself by doing an aborted suspend and immediate resume and verifying it doesn't blow up in that case (suspend-to-RAM is broken on my box and something isn't configured properly for suspend-to-disk to work). However, since resume will definitely not work on some of these controllers without this patch, I think it's an improvement in any case.

---

This patch adds the necessary callbacks to support suspend/resume properly in sata_nv. Most of the controllers don't need any specific handling, but CK804/MCP04 controllers, whether ADMA is enabled or not, need some additional setup on resume. As well as adding storage of the controller type, which is needed for proper resume handling, this also removes the inline helper functions for getting ADMA register locations and stores the pointers instead, so we don't have to keep calculating them all the time.
Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> --- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ --- linux-2.6.19-rc6-mm1-admafixnoresume/drivers/ata/sata_nv.c 2006-11-26 00:53:44.0 -0600 +++ linux-2.6.19-rc6-mm1-admafix/drivers/ata/sata_nv.c 2006-11-29 18:42:17.0 -0600 @@ -49,7 +49,7 @@ #include #define DRV_NAME "sata_nv" -#define DRV_VERSION"3.2" +#define DRV_VERSION"3.3" #define NV_ADMA_DMA_BOUNDARY 0xUL @@ -213,12 +213,21 @@ struct nv_adma_port_priv { dma_addr_t cpb_dma; struct nv_adma_prd *aprd; dma_addr_t aprd_dma; + void __iomem * ctl_block; + void __iomem * gen_block; + void __iomem * notifier_clear_block; u8 flags; }; +struct nv_host_priv { + unsigned long type; +}; + #define NV_ADMA_CHECK_INTR(GCTL, PORT) ((GCTL) & ( 1 << (19 + (12 * (PORT) static int nv_init_one (struct pci_dev *pdev, const struct pci_device_id *ent); +static void nv_remove_one (struct pci_dev *pdev); +static int nv_pci_device_resume(struct pci_dev *pdev); static void nv_ck804_host_stop(struct ata_host *host); static irqreturn_t nv_generic_interrupt(int irq, void *dev_instance); static irqreturn_t nv_nf2_interrupt(int irq, void *dev_instance); @@ -239,6 +248,8 @@ static irqreturn_t nv_adma_interrupt(int static void nv_adma_irq_clear(struct ata_port *ap); static int nv_adma_port_start(struct ata_port *ap); static void nv_adma_port_stop(struct ata_port *ap); +static int nv_adma_port_suspend(struct ata_port *ap, pm_message_t mesg); +static int nv_adma_port_resume(struct ata_port *ap); static void nv_adma_error_handler(struct ata_port *ap); static void nv_adma_host_stop(struct ata_host *host); static void nv_adma_bmdma_setup(struct ata_queued_cmd *qc); @@ -292,7 +303,9 @@ static struct pci_driver nv_pci_driver = .name = DRV_NAME, .id_table = nv_pci_tbl, .probe = nv_init_one, - .remove = ata_pci_remove_one, + .suspend= ata_pci_device_suspend, + .resume = nv_pci_device_resume, + .remove = nv_remove_one, }; 
static struct scsi_host_template nv_sht = { @@ -311,6 +324,8 @@ static struct scsi_host_template nv_sht .slave_configure= ata_scsi_slave_config, .slave_destroy = ata_scsi_slave_destroy, .bios_param = ata_std_bios_param, + .suspend= ata_scsi_device_suspend, + .resume = ata_scsi_device_resume, }; static struct scsi_host_template nv_adma_sht = { @@ -330,6 +345,8 @@ static struct scsi_host_template nv_adma .slave_configure= nv_adma_slave_config, .slave_destroy = ata_scsi_slave_destroy, .bios_param = ata_std_bios_param, + .suspend= ata_scsi_device_suspend, + .resume = ata_scsi_device_resume, }; static const struct ata_port_operations nv_generic_ops = { @@ -438,6 +455,8 @@ static const struct ata_port_operations .scr_write = nv_scr_write, .port_start = nv_adma_port_start, .port_stop = nv_adma_port_stop, + .port_suspend = nv_adma_port_suspend, + .port_resume= nv_adma_port_resume, .host_stop = nv_adma_host_stop, }; @@ -476,6 +495,7 @@ static struct ata_port_info nv_port_info { .sht= _adma_sht, .flags = ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY | + ATA_FLAG_HRST_TO_RESUME
Re: XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c (kernel 2.6.18.1)
On Wed, Nov 29, 2006 at 10:17:25AM +0100, Jesper Juhl wrote:
> On 29/11/06, David Chinner <[EMAIL PROTECTED]> wrote:
> >On Tue, Nov 28, 2006 at 04:49:00PM +0100, Jesper Juhl wrote:
> >> Filesystem "dm-1": XFS internal error xfs_trans_cancel at line 1138 of
> >> file fs/xfs/xfs_trans.c. Caller 0x8034b47e
> >>
> >> Call Trace:
> >> [] show_trace+0xb2/0x380
> >> [] dump_stack+0x15/0x20
> >> [] xfs_error_report+0x3c/0x50
> >> [] xfs_trans_cancel+0x6e/0x130
> >> [] xfs_create+0x5ee/0x6a0
> >> [] xfs_vn_mknod+0x156/0x2e0
> >> [] xfs_vn_create+0xb/0x10
> >> [] vfs_create+0x8c/0xd0
> >> [] nfsd_create_v3+0x31a/0x560
> >> [] nfsd3_proc_create+0x148/0x170
> >> [] nfsd_dispatch+0xf9/0x1e0
> >> [] svc_process+0x437/0x6e0
> >> [] nfsd+0x1cd/0x360
> >> [] child_rip+0xa/0x12
> >> xfs_force_shutdown(dm-1,0x8) called from line 1139 of file
> >> fs/xfs/xfs_trans.c. Return address = 0x80359daa
> >
> >We shut down the filesystem because we cancelled a dirty transaction.
> >Once we start to dirty the incore objects, we can't roll back to
> >an unchanged state if a subsequent fatal error occurs during the
> >transaction and we have to abort it.
> >
> So you are saying that there's nothing I can do to prevent this from
> happening in the future?

Pretty much - we need to work out what is going wrong and we can't from the shutdown message above - the error has occurred in a path that doesn't have error report traps in it.

Is this reproducible?

> >If I understand historic occurrences of this correctly, there is
> >a possibility that it can be triggered in ENOMEM situations. Was your
> >machine running out of memory when this occurred?
> >
> Not really. I just checked my monitoring software and, at the time
> this happened, the box had ~5.9G RAM free (of 8G total) and no swap
> used (but 11G available).

Ok. Sounds like we need more error reporting points inserted into that code so we dump an error earlier and hence have some hope of working out what went wrong next time.

OOC, there weren't any I/O errors reported before this shutdown?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
2.6.18-3.rt10.0001 report
Hi,

As I wrote for rt7, I have successfully booted without notsc, but not every time. The same happens with rt10. When I boot without notsc I can get one of three results (with notsc I don't see any problem):

1st: boots without errors (dmesg at http://bugzilla.kernel.org/show_bug.cgi?id=6419#c59)
2nd: hangs on boot, with the last message being "input: ImPS/2 Generic Wheel Mouse as /class/input/input1"
3rd: boots, but gives a long oops (dmesg at http://bugzilla.kernel.org/show_bug.cgi?id=6419#c60)

Thanks,
--
Sérgio M.B.
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
> That yield() will need to be removed - yield()'s behaviour is truly > awfulif the system is otherwise busy. What is it there for? Please read the uploaded paper, which has detailed description. thanks, wenji - Original Message - From: Andrew Morton <[EMAIL PROTECTED]> Date: Wednesday, November 29, 2006 7:08 pm Subject: Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP > On Wed, 29 Nov 2006 16:53:11 -0800 (PST) > David Miller <[EMAIL PROTECTED]> wrote: > > > > > Please, it is very difficult to review your work the way you have > > submitted this patch as a set of 4 patches. These patches have not > > been split up "logically", but rather they have been split up "per > > file" with the same exact changelog message in each patch posting. > > This is very clumsy, and impossible to review, and wastes a lot of > > mailing list bandwith. > > > > We have an excellent file, called > Documentation/SubmittingPatches, in > > the kernel source tree, which explains exactly how to do this > > correctly. > > > > By splitting your patch into 4 patches, one for each file touched, > > it is impossible to review your patch as a logical whole. > > > > Please also provide your patch inline so people can just hit reply > > in their mail reader client to quote your patch and comment on it. > > This is impossible with the attachments you've used. > > > > Here you go - joined up, cleaned up, ported to mainline and test- > compiled. > That yield() will need to be removed - yield()'s behaviour is truly > awfulif the system is otherwise busy. What is it there for? > > > > From: Wenji Wu <[EMAIL PROTECTED]> > > For Linux TCP, when the network applcaiton make system call to move > data from > socket's receive buffer to user space by calling tcp_recvmsg(). > The socket > will be locked. 
During this period, all the incoming packet for > the TCP > socket will go to the backlog queue without being TCP processed > > Since Linux 2.6 can be inerrupted mid-task, if the network application > expires, and moved to the expired array with the socket locked, all > thepackets within the backlog queue will not be TCP processed till > the network > applicaton resume its execution. If the system is heavily loaded, > TCP can > easily RTO in the Sender Side. > > > > include/linux/sched.h |2 ++ > kernel/fork.c |3 +++ > kernel/sched.c| 24 ++-- > net/ipv4/tcp.c|9 + > 4 files changed, 32 insertions(+), 6 deletions(-) > > diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c > --- a/net/ipv4/tcp.c~tcp-speedup > +++ a/net/ipv4/tcp.c > @@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru > struct task_struct *user_recv = NULL; > int copied_early = 0; > > + current->backlog_flag = 1; > + > lock_sock(sk); > > TCP_CHECK_TIMER(sk); > @@ -1468,6 +1470,13 @@ skip_copy: > > TCP_CHECK_TIMER(sk); > release_sock(sk); > + > + current->backlog_flag = 0; > + if (current->extrarun_flag == 1){ > + current->extrarun_flag = 0; > + yield(); > + } > + > return copied; > > out: > diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h > --- a/include/linux/sched.h~tcp-speedup > +++ a/include/linux/sched.h > @@ -1023,6 +1023,8 @@ struct task_struct { > #ifdefCONFIG_TASK_DELAY_ACCT > struct task_delay_info *delays; > #endif > + int backlog_flag; /* packets wait in tcp backlog queue flag */ > + int extrarun_flag; /* extra run flag for TCP performance */ > }; > > static inline pid_t process_group(struct task_struct *tsk) > diff -puN kernel/sched.c~tcp-speedup kernel/sched.c > --- a/kernel/sched.c~tcp-speedup > +++ a/kernel/sched.c > @@ -3099,12 +3099,24 @@ void scheduler_tick(void) > > if (!rq->expired_timestamp) > rq->expired_timestamp = jiffies; > - if (!TASK_INTERACTIVE(p) || expired_starving(rq)) { > - enqueue_task(p, rq->expired); > - if (p->static_prio < 
rq->best_expired_prio) > - rq->best_expired_prio = p->static_prio; > - } else > - enqueue_task(p, rq->active); > + if (p->backlog_flag == 0) { > + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) { > + enqueue_task(p, rq->expired); > + if (p->static_prio < rq->best_expired_prio) > + rq->best_expired_prio = p- > >static_prio;+} else > + enqueue_task(p, rq->active); > + } else { > + if (expired_starving(rq)) { > + enqueue_task(p,rq->expired); > + if (p->static_prio < rq->best_expired_prio) > + rq->best_expired_prio = p- >
Re: [RFC, PATCH 1/2] qrcu: "quick" srcu implementation
(the same patch + comments from Paul)

[RFC, PATCH 1/2] qrcu: "quick" srcu implementation

Very much based on ideas, corrections, and patient explanations from Alan and Paul.

The current srcu implementation is very good for readers, lock/unlock are extremely cheap. But for that reason it is not possible to avoid synchronize_sched() and polling in synchronize_srcu().

Jens Axboe wrote:
>
> It works for me, but the overhead is still large. Before it would take
> 8-12 jiffies for a synchronize_srcu() to complete without there actually
> being any reader locks active, now it takes 2-3 jiffies. So it's
> definitely faster, and as suspected the loss of two of three
> synchronize_sched() cut down the overhead to a third.

'qrcu' behaves the same as srcu but optimized for writers. The fast path for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock(). The slow path is __wait_event(), no polling. However, the reader does atomic inc/dec on lock/unlock, and the counters are not per-cpu.

Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context, and 'qrcu_struct' can be compile-time initialized.

See also (a long) discussion:
http://marc.theaimsgroup.com/?t=11637085763

Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>

--- 19-rc6/include/linux/srcu.h~1_qrcu	2006-10-22 18:24:03.0 +0400
+++ 19-rc6/include/linux/srcu.h	2006-11-30 04:32:42.0 +0300
@@ -27,6 +27,8 @@
 #ifndef _LINUX_SRCU_H
 #define _LINUX_SRCU_H
 
+#include
+
 struct srcu_struct_array {
 	int c[2];
 };
@@ -50,4 +52,32 @@ void srcu_read_unlock(struct srcu_struct
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
+/*
+ * fully compatible with srcu, but optimized for writers.
+ */
+
+struct qrcu_struct {
+	int completed;
+	atomic_t ctr[2];
+	wait_queue_head_t wq;
+	struct mutex mutex;
+};
+
+int init_qrcu_struct(struct qrcu_struct *qp);
+int qrcu_read_lock(struct qrcu_struct *qp);
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx);
+void synchronize_qrcu(struct qrcu_struct *qp);
+
+/**
+ * cleanup_qrcu_struct - deconstruct a quick-RCU structure
+ * @qp: structure to clean up.
+ *
+ * Must invoke this after you are finished using a given qrcu_struct that
+ * was initialized via init_qrcu_struct(). We reserve the right to
+ * leak memory should you fail to do this!
+ */
+static inline void cleanup_qrcu_struct(struct qrcu_struct *qp)
+{
+}
+
 #endif
--- 19-rc6/kernel/srcu.c~1_qrcu	2006-10-22 18:24:03.0 +0400
+++ 19-rc6/kernel/srcu.c	2006-11-30 04:39:53.0 +0300
@@ -256,3 +256,94 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock);
 EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
 EXPORT_SYMBOL_GPL(srcu_readers_active);
+
+/**
+ * init_qrcu_struct - initialize a quick-RCU structure.
+ * @qp: structure to initialize.
+ *
+ * Must invoke this on a given qrcu_struct before passing that qrcu_struct
+ * to any other function. Each qrcu_struct represents a separate domain
+ * of QRCU protection.
+ */
+int init_qrcu_struct(struct qrcu_struct *qp)
+{
+	qp->completed = 0;
+	atomic_set(qp->ctr + 0, 1);
+	atomic_set(qp->ctr + 1, 0);
+	init_waitqueue_head(&qp->wq);
+	mutex_init(&qp->mutex);
+
+	return 0;
+}
+
+/**
+ * qrcu_read_lock - register a new reader for an QRCU-protected structure.
+ * @qp: qrcu_struct in which to register the new reader.
+ *
+ * Counts the new reader in the appropriate element of the qrcu_struct.
+ * Returns an index that must be passed to the matching qrcu_read_unlock().
+ */
+int qrcu_read_lock(struct qrcu_struct *qp)
+{
+	for (;;) {
+		int idx = qp->completed & 0x1;
+		if (likely(atomic_inc_not_zero(qp->ctr + idx)))
+			return idx;
+	}
+}
+
+/**
+ * qrcu_read_unlock - unregister a old reader from an QRCU-protected structure.
+ * @qp: qrcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding qrcu_read_lock().
+ *
+ * Removes the count for the old reader from the appropriate element of
+ * the qrcu_struct.
+ */
+void qrcu_read_unlock(struct qrcu_struct *qp, int idx)
+{
+	if (atomic_dec_and_test(qp->ctr + idx))
+		wake_up(&qp->wq);
+}
+
+/**
+ * synchronize_qrcu - wait for prior QRCU read-side critical-section completion
+ * @qp: qrcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates. Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_qrcu() from the corresponding
+ * QRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_qrcu() on one
+ * qrcu_struct from some other qrcu_struct's read-side critical section.
+ */
+void
Re: [patch 1/4] - Potential performance bottleneck for Linux TCP
Yes, when CONFIG_PREEMPT is disabled, the "problem" won't happen. That is
why I put "for 2.6 desktop, low-latency desktop" in the uploaded paper.
This "problem" happens in the 2.6 Desktop and Low-latency Desktop.

>We could also pepper tcp_recvmsg() with some very carefully placed preemption
>disable/enable calls to deal with this even with CONFIG_PREEMPT enabled.

I also thought about this approach. But since the "problem" happens in the
2.6 Desktop and Low-latency Desktop (not server), where system
responsiveness is a key feature, simply placing preemption disable/enable
calls might not work. If you want to place preemption disable/enable calls
within tcp_recvmsg(), you have to put them at the very beginning and end of
the call. Disabling preemption for that long would degrade system
responsiveness.

wenji

----- Original Message -----
From: David Miller <[EMAIL PROTECTED]>
Date: Wednesday, November 29, 2006 7:13 pm
Subject: Re: [patch 1/4] - Potential performance bottleneck for Linux TCP

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Wed, 29 Nov 2006 17:08:35 -0800
>
> > On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> > David Miller <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > Please, it is very difficult to review your work the way you have
> > > submitted this patch as a set of 4 patches. These patches have not
> > > been split up "logically", but rather they have been split up "per
> > > file" with the same exact changelog message in each patch posting.
> > > This is very clumsy, and impossible to review, and wastes a lot of
> > > mailing list bandwidth.
> > >
> > > We have an excellent file, called Documentation/SubmittingPatches, in
> > > the kernel source tree, which explains exactly how to do this
> > > correctly.
> > >
> > > By splitting your patch into 4 patches, one for each file touched,
> > > it is impossible to review your patch as a logical whole.
> > >
> > > Please also provide your patch inline so people can just hit reply
> > > in their mail reader client to quote your patch and comment on it.
> > > This is impossible with the attachments you've used.
> > >
> >
> > Here you go - joined up, cleaned up, ported to mainline and test-compiled.
> >
> > That yield() will need to be removed - yield()'s behaviour is truly awful
> > if the system is otherwise busy. What is it there for?
>
> What about simply turning off CONFIG_PREEMPT to fix this "problem"?
>
> We always properly run the backlog (by doing a release_sock()) before
> going to sleep otherwise except for the specific case of taking a page
> fault during the copy to userspace. It is only CONFIG_PREEMPT that
> can cause this situation to occur in other circumstances as far as I
> can see.
>
> We could also pepper tcp_recvmsg() with some very carefully placed
> preemption disable/enable calls to deal with this even with
> CONFIG_PREEMPT enabled.
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: isochronous receives?
Hi Robert,

I never resolved the problem. I turned on the excessive debugging output,
but it didn't print out info about receiving packets or interrupts. My test
app claimed there were no packets received although the bus analyzer showed
lots of packets going by.

If I can help out, let me know, but I'm not sure where to start at this
point.

Keith

-----Original Message-----
From: Robert Crocombe [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 28, 2006 4:59 PM
To: Keith Curtis; linux1394-devel; linux-kernel
Subject: isochronous receives?

Keith, et al.,

I am having problems with isochronous receives, and remembered just as I
was getting ready to dig into the source that there was a message about
this stuff. Lo and behold your message to linux1394-user from September 7:

> I'm trying to receive isochronous streams (using libraw1394 1.2.0), and
> I've noticed that if data is transmitted on channel 63, then my app tends
> to work fine. If the stream is on a different channel, then I don't see
> any isochronous packets at all. I'm using 2.4.29, I've also tried 2.6.15
> with similar results, can't seem to receive channels < 63.

Did you ultimately have any success getting this going? Funnily enough,
when I tested isochronous stuff in July, I just did iso transmit since I
figured receives *must* be working since everyone has camcorders and
whatnot.

Currently my iso xmit stuff does appear to be working, but iso receives are
not. I have a Firespy and no reason not to trust it, so I can see the junk
I'm spewing out. I've tried transmitting on channels 4 and 63 (per your
advice), but neither works for me.

I suppose it could be my stuff... nah.

--
Robert Crocombe
[EMAIL PROTECTED]
Re: Infinite retries reading the partition table
--- Luben Tuikov <[EMAIL PROTECTED]> wrote:
> Suppose reading sector 0 always reports an error,
> sense key HARDWARE ERROR.
>
> What I'm observing is that the request to read sector 0,
> reading partition information, is retried forever, ad infinitum.
>
> Does anyone have a patch to resolve this? (2.6.19-rc6)

Actually the device sends SK: MEDIUM ERROR, ASC: UNRECOVERED READ ERR,
but SCSI Core seems to retry reading the partition table (sector 0)
forever.

Anyone seen this and/or has a patch in their tree for it?

    Luben
P.S. This is fairly straightforward to inject/test.
Re: Linux 2.6.19
Getting an oops on boot here, caused by commit
e81c73596704793e73e6dbb478f41686f15a4b34 titled "[NET]: Fix MAX_HEADER
setting". Reverting that patch fixes things up for me. Dave?

Phil

Bringing up interface eth0:
skb_over_panic: text:c02af809 len:56 put:16 head:d7e213c0 data:d7e213d0 tail:d7e21408 end:d7e21400 dev:eth0
[ cut here ]
kernel BUG at net/core/skbuff.c:93!
invalid opcode: [#1]
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010296 (2.6.19 #1)
EIP is at skb_over_panic+0x59/0x70
eax: 006f ebx: d7e213c0 ecx: edx: c03102c0
esi: d7e4f000 edi: d7e213f8 ebp: d7e4f000 esp: c037aec4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c037a000 task=c03023e0 task.ti=c0347000)
Stack: c02fbb9c c02af809 0038 0010 d7e213c0 d7e213d0 d7e21408 d7e21400
       d7e4f000 0010 d6e84520 c02af80e d6c718a0 c037af6c 003a 0010
       c037af6c d6c718a0 d6c09920 d79749c0 0001 02ff
Call Trace:
 [] ndisc_send_rs+0x399/0x3e0
 [] ndisc_send_rs+0x39e/0x3e0
 [] addrconf_dad_completed+0x82/0xc0
 [] addrconf_dad_timer+0xe5/0xf0
 [] e100_poll+0x259/0x420
 [] it_real_fn+0x0/0x60
 [] cascade+0x3f/0x60
 [] addrconf_dad_timer+0x0/0xf0
 [] run_timer_softirq+0xab/0x170
 [] __do_softirq+0x42/0xa0
 [] do_softirq+0x60/0xb0
 [] handle_edge_irq+0x0/0x110
 [] do_IRQ+0x85/0xe0
 [] schedule+0x29e/0x580
 [] common_interrupt+0x1a/0x20
 [] default_idle+0x32/0x60
 [] cpu_idle+0x42/0x60
 [] start_kernel+0x283/0x330
 [] unknown_bootoption+0x0/0x260
 ===
Code: 00 00 89 5c 24 14 8b 98 8c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 60 89 4c 24 04 c7 04 24 9c bb 2f c0 89 44 24 08 e8 47 07 ed ff <0f> 0b 5d 00 a4 91 2f c0 83 c4 24 5b 5e c3 89 f6 8d bc 27 00 00
EIP: [] skb_over_panic+0x59/0x70 SS:ESP 0068:c037aec4
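The numbers on the skb_over_panic line can be checked by hand. What follows is only a userspace sketch, not kernel code: it assumes the usual layout head <= data <= tail <= end, and that the reported tail is the value after the 16-byte put (which agrees with len:56 and with edi holding d7e213f8). The names struct fake_skb and fake_skb_put_overrun() are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the four buffer positions printed in the report. */
struct fake_skb {
	uint32_t head, data, tail, end;
};

/*
 * Returns by how many bytes tail would pass end after appending `put`
 * bytes to the data area, or 0 if the append still fits.
 */
static uint32_t fake_skb_put_overrun(const struct fake_skb *skb, uint32_t put)
{
	uint32_t new_tail = skb->tail + put;

	return new_tail > skb->end ? new_tail - skb->end : 0;
}
```

With head d7e213c0, data d7e213d0, a pre-put tail of d7e213f8 and end d7e21400, a put of 16 moves tail to d7e21408, which is 8 bytes past end; the resulting tail - data is 0x38 = 56, the len in the report. In other words, under these assumptions the buffer was 8 bytes too small for that packet.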
Re: [PATCH -mm] x86_64 UP needs smp_call_function_single
On Wed, 29 Nov 2006 17:01:11 -0800
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> smp_call_function_single() needs to be visible in non-SMP builds, to fix:
>
> arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single'
>
> The (other/trivial) fix (instead of this one) is to add:
> #include
> to linux-2.6.19-rc6-mm2/arch/x86_64/kernel/vsyscall.c
>
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
> ---
>  include/asm-x86_64/smp.h |7 ---
>  include/linux/smp.h |7 +++
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> --- linux-2.6.19-rc6-mm2.orig/include/asm-x86_64/smp.h
> +++ linux-2.6.19-rc6-mm2/include/asm-x86_64/smp.h
> @@ -113,13 +113,6 @@ static __inline int logical_smp_processo
>  #define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu]
>  #else
>  #define cpu_physical_id(cpu) boot_cpu_id
> -static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
> -				void *info, int retry, int wait)
> -{
> -	/* Disable interrupts here? */
> -	func(info);
> -	return 0;
> -}
>  #endif /* !CONFIG_SMP */
>  #endif
>
> --- linux-2.6.19-rc6-mm2.orig/include/linux/smp.h
> +++ linux-2.6.19-rc6-mm2/include/linux/smp.h
> @@ -99,6 +99,13 @@ static inline int up_smp_call_function(v
>  static inline void smp_send_reschedule(int cpu) { }
>  #define num_booting_cpus() 1
>  #define smp_prepare_boot_cpu() do {} while (0)
> +static inline int smp_call_function_single(int cpuid, void (*func) (void *info),
> +				void *info, int retry, int wait)
> +{
> +	/* Disable interrupts here? */
> +	func(info);
> +	return 0;
> +}
>
>  #endif /* !SMP */
>

No, I think this patch is right - the declaration of the CONFIG_SMP
smp_call_function_single() is in linux/smp.h so the !CONFIG_SMP declaration
or definition should be there too.

It's still buggy though. It should disable local interrupts around the
call to match the SMP version. I'll fix that separately.
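For readers following along, "disable local interrupts around the call" could look roughly like the sketch below. This is a userspace mock and not the fix that was actually merged: fake_local_irq_disable()/fake_local_irq_enable() are hypothetical stand-ins for the kernel primitives, and the choice between a bare disable/enable pair and a save/restore pair is left open here.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel primitives; not the real thing. */
static bool fake_irqs_enabled = true;
static void fake_local_irq_disable(void) { fake_irqs_enabled = false; }
static void fake_local_irq_enable(void)  { fake_irqs_enabled = true; }

/*
 * UP variant: there is only one CPU, so func runs right here, with
 * "interrupts" off for the duration of the call.
 */
static int up_call_function_single(int cpuid, void (*func)(void *info),
				   void *info, int retry, int wait)
{
	(void)cpuid;
	(void)retry;
	(void)wait;

	fake_local_irq_disable();
	func(info);
	fake_local_irq_enable();
	return 0;
}

/* Example callback: records whether interrupts were enabled when it ran. */
static void record_irq_state(void *info)
{
	*(bool *)info = fake_irqs_enabled;
}
```

Passing record_irq_state() with a pointer to a bool shows the callback observing interrupts as disabled, which is the property the UP stub is meant to share with the SMP version.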
Re: PM-Timer clock source is slow. Try something else: How slow? What other source(s)?
On Wed, 2006-11-29 at 16:56 -0800, Linda Walsh wrote:
> I recently noticed this message in my bootup that I don't remember
> from before:
>
> PCI: Probing PCI hardware (bus 00)
> * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
> * this clock source is slow. Consider trying other clock sources

This basically means that your chipset has a bug which requires the ACPI PM
timer to be read three times in order to get a valid reading. This will
cause gettimeofday/clock_gettime to take longer to execute, which is what
is meant by "slow" (rather than the counter's frequency being incorrect).

> How would this affect my clock? It says to try another
> clock source, what type of clock source would it be suggesting I
> use? Another chip already in the computer? It is an Intel 440BX
> chipset; on a Dell motherboard. Would that be likely to have
> another chip source that is compensating?
>
> I don't notice a significant clock slowdown, but I'm running NTP,
> so that could be masking the problem.

Unless you're running performance critical programs that utilize
gettimeofday/clock_gettime, you probably won't notice anything. Time should
still function properly.

If you are having performance issues, you can try using a different
clocksource (the TSC is probably safe, but not necessarily).

thanks
-john
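To make the "read three times" cost concrete, here is a userspace sketch of that kind of workaround. It is an illustration under assumptions, not the kernel's code: the raw port read is replaced by a caller-supplied function, the 24-bit mask is assumed, and pm_read_verified(), struct sample_feed and feed_read() are hypothetical names.

```c
#include <stdint.h>

#define PM_MASK 0xFFFFFFu	/* assuming a 24-bit counter */

/* Hypothetical raw read: stands in for the I/O port access. */
typedef uint32_t (*pm_read_fn)(void *ctx);

/*
 * Take three samples and retry until they are in a plausible order
 * (increasing, or increasing with one wrap), then return the middle one.
 * Each successful call therefore costs at least three raw reads.
 */
static uint32_t pm_read_verified(pm_read_fn rd, void *ctx)
{
	uint32_t v1, v2, v3;

	do {
		v1 = rd(ctx) & PM_MASK;
		v2 = rd(ctx) & PM_MASK;
		v3 = rd(ctx) & PM_MASK;
	} while ((v1 > v2 && v1 < v3) || (v2 > v3 && v2 < v1) ||
		 (v3 > v1 && v3 < v2));

	return v2;
}

/* Test double: replays a fixed list of samples and counts the reads. */
struct sample_feed {
	const uint32_t *vals;
	unsigned int n, pos;
};

static uint32_t feed_read(void *ctx)
{
	struct sample_feed *f = ctx;
	uint32_t v = f->vals[f->pos % f->n];

	f->pos++;
	return v;
}
```

Feeding it the samples 100, 50, 102 followed by 103, 104, 105 makes the first triple get rejected (the middle sample is a glitch) and the second accepted, for six raw reads in total. Slow port reads multiplied like this are the reason the boot message calls the clock source "slow".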
Re: [Bulk] Re: [patch 2.6.19-rc6] fix hotplug for legacy platform drivers
On Wednesday 29 November 2006 3:02 pm, Greg KH wrote:
> >
> > Here's my fix. ...
> > ... I audited all the drivers using the relevant APIs, and I can't
> > see many (if any!) folk hitting problems from this.
>
> But this still can cause the problem that your 'modalias' file in sysfs
> contains exactly the same name as the module itself, right?

Not a problem if folk stick to the original design. Hotplug will at most
"modprobe $MODALIAS" (iff the device needs a driver) before doing a
udevsend ... and only coldplug uses "modprobe $(cat modalias)". The two
were provided to address distinct problems.

And the issue that was described to me was _only_ relevant on the hotplug
paths; coldplug scripts, using /sys/devices/.../modalias files, see no
problems.

I could update the patch so that attribute turns into a null string, but
that would have a **negative effect** since it would break coldplug for all
the platform init code which doesn't use platform_add_devices() or maybe
platform_device_register().

> That's not good, it should be an alias, not the "real name".

Well, adding unjustified complexity _after the fact_ isn't good either, and
that's what I see going on here.

How many years has KMOD been around? It's worked just fine without that
sort of bizarre (and un-needed) rule. Aliases were provided just to give
*additional* names to modules ... not to make one needlessly "special".
Kernel request_module() calls make no distinction between which type of
name they use ... and when the filesystem name changes, they still work
when the old name is properly aliased.

That whole "give a module an alias of itself" model just seems ludicrous to
me. We _know_ that "A" means "A", so there's no point in aliasing it as
itself.

... plus it's a distraction from the real problem, namely that certain
"legacy" drivers, primarily stuffed onto the "platform" bus, can't ever
hotplug. (While normal platform drivers do so just fine, without needing
strange rules to make some names "more equal than others".)

> That will ensure that userspace tools do not get confused,

I don't observe any confusion on my systems. Platform device hotplug works
just fine. Udev creates their /dev nodes. If there are some tools getting
confused, that seems best characterized as tool bugs.

- Dave
Re: 2.6.16.32 stuck in generic_file_aio_write()
Dear Igmar Palsenberg,

If you are running arcmsr 1.20.00.13 from the official kernel, that is the
latest version. Could you check your RAID controller events and tell me
what you find? You can check "MBIOS" => "Physical Drive Information" =>
"View Drive Information" => "Select The Drive" => "Timeout Count". That can
tell you which disk misbehaved and caused your RAID volume to go offline.

As for the message dump from arcmsr, it says that something went wrong with
your RAID volume and it was kicked out of the system. What is your RAID
config?

Areca has new firmware (1.42). If you are using the "sg" device with the
SCSI passthrough ioctl method to feed data into Areca's RAID volume, you
need to limit your data to under 512 blocks (256K) per transfer. The new
firmware will enlarge this to 4096 blocks (2M) per transfer. Firmware
version 1.42 is in the release process but has not yet been put on the
Areca ftp site. If you need it, please let me know.

Best Regards
Erich Chen

----- Original Message -----
From: "Igmar Palsenberg" <[EMAIL PROTECTED]>
To:
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, November 29, 2006 8:41 PM
Subject: 2.6.16.32 stuck in generic_file_aio_write()

Hi,

I've got a machine which occasionally locks up. I can still sysrq it from a
serial console, so it's not entirely dead.

A sysrq-t shows me that it's got a large number of httpd processes stuck in
D state:

httpd D F7619440 2160 11635 2057 11636 (NOTLB)
dbb7ae14 cc9b0550 c33224a0 f7619440 de187604 00b3 0001 00b3
d374a550 c33224a0 0005b8d8 f04af800 000f75e7 d374a550 cc9b0550 cc9b0678
ef7d33ec ef7d33e8 cc9b0550 ef7d33fc c041bf70
Call Trace:
 [] __mutex_lock_slowpath+0x92/0x43e
 [] generic_file_aio_write+0x5c/0xfa
 [] generic_file_aio_write+0x5c/0xfa
 [] generic_file_aio_write+0x5c/0xfa
 [] permission+0xad/0xcb
 [] ext3_file_write+0x3b/0xb0
 [] do_sync_write+0xd5/0x130
 [] _spin_unlock+0xb/0xf
 [] autoremove_wake_function+0x0/0x4b
 [] vfs_write+0x1a3/0x1a8
 [] sys_write+0x4b/0x74
 [] sysenter_past_esp+0x54/0x75

After this, the machine is rendered useless (probably due to the fact that
disk IO isn't working anymore).

The lock debugging gives me this:

D httpd:11635 [cc9b0550, 116] blocked on mutex: [ef7d33e8] {inode_init_once}
.. held by: httpd: 506 [d67e1000, 121]
... acquired at: generic_file_aio_write+0x5c/0xfa

I see similar things as mentioned in http://lkml.org/lkml/2006/1/10/64,
with the difference that I'm not running software RAID or SATA (it's an
Areca ARC-1110).

So far I can't reproduce it, it 'just' happens. Can someone give me a
pointer where to start looking?

Erich, I've CC-ed you since the machine is running an Areca RAID config.
It's also the only used disk subsystem in this machine.

Regards,

Igmar
Re: [RCU] adds a prefetch() in rcu_do_batch()
On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote:
> On some workloads (for example when a lot of close() syscalls are done),
> RCU qlen can be quite large, and RCU heads are no longer in cpu cache when
> rcu_do_batch() is called.
>
> This patch adds a prefetch() in rcu_do_batch() to give CPU a hint to bring
> back cache lines containing 'struct rcu_head's.
>
> Most list manipulation macros include prefetch(), but not open coded ones
> (at least with current C compilers :) )
>
> I got a nice speedup on a trivial benchmark (3.48 us per iteration instead
> of 3.95 us on a 1.6 GHz Pentium-M)
> while (1) { pipe(p); close(fd[0]); close(fd[1]);}

Interesting! How much of the speedup was due to the prefetch() and how
much to removing the extra store to rdp->donelist?

						Thanx, Paul

> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

> --- linux-2.6.19-rc6/kernel/rcupdate.c 2006-11-16 05:03:40.0 +0100
> +++ linux-2.6.19-rc6-ed/kernel/rcupdate.c 2006-11-22 15:12:09.0 +0100
> @@ -235,12 +235,14 @@ static void rcu_do_batch(struct rcu_data
>
>  	list = rdp->donelist;
>  	while (list) {
> -		next = rdp->donelist = list->next;
> +		next = list->next;
> +		prefetch(next);
>  		list->func(list);
>  		list = next;
>  		if (++count >= rdp->blimit)
>  			break;
>  	}
> +	rdp->donelist = list;
>
>  	local_irq_disable();
>  	rdp->qlen -= count;
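The two changes Paul asks about (the prefetch and the single store of the list head after the loop) can be seen side by side in a small userspace sketch. This is only an approximation of the loop shape in the quoted hunk: struct cb_head, run_batch() and count_call() are hypothetical names, and GCC/Clang's __builtin_prefetch() is used here as a stand-in for the kernel's prefetch().

```c
#include <stddef.h>

/* Hypothetical callback node, loosely shaped like struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/*
 * Run at most `limit` callbacks from *listp. The next pointer is loaded
 * before the callback runs (the callback may free its node), and the
 * next node is prefetched while the current callback executes. The list
 * head is written back once, after the loop, instead of on every pass.
 * Returns the number of callbacks invoked.
 */
static int run_batch(struct cb_head **listp, int limit)
{
	struct cb_head *list = *listp, *next;
	int count = 0;

	while (list) {
		next = list->next;
		__builtin_prefetch(next);	/* hint only; NULL is harmless */
		list->func(list);
		list = next;
		if (++count >= limit)
			break;
	}
	*listp = list;
	return count;
}

/* Example callback: just counts invocations. */
static int calls;
static void count_call(struct cb_head *head)
{
	(void)head;
	calls++;
}
```

Separating the two effects, as Paul suggests, would mean timing a variant with only the prefetch and another with only the deferred store; the sketch does not attempt to measure either.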
Infinite retries reading the partition table
Suppose reading sector 0 always reports an error,
sense key HARDWARE ERROR.

What I'm observing is that the request to read sector 0,
reading partition information, is retried forever, ad infinitum.

Does anyone have a patch to resolve this? (2.6.19-rc6)

Thanks,
    Luben
Re: [patch 1/4] - Potential performance bottleneck for Linux TCP
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Wed, 29 Nov 2006 17:08:35 -0800

> On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
> David Miller <[EMAIL PROTECTED]> wrote:
>
> >
> > Please, it is very difficult to review your work the way you have
> > submitted this patch as a set of 4 patches. These patches have not
> > been split up "logically", but rather they have been split up "per
> > file" with the same exact changelog message in each patch posting.
> > This is very clumsy, and impossible to review, and wastes a lot of
> > mailing list bandwidth.
> >
> > We have an excellent file, called Documentation/SubmittingPatches, in
> > the kernel source tree, which explains exactly how to do this
> > correctly.
> >
> > By splitting your patch into 4 patches, one for each file touched,
> > it is impossible to review your patch as a logical whole.
> >
> > Please also provide your patch inline so people can just hit reply
> > in their mail reader client to quote your patch and comment on it.
> > This is impossible with the attachments you've used.
> >
>
> Here you go - joined up, cleaned up, ported to mainline and test-compiled.
>
> That yield() will need to be removed - yield()'s behaviour is truly awful
> if the system is otherwise busy. What is it there for?

What about simply turning off CONFIG_PREEMPT to fix this "problem"?

We always properly run the backlog (by doing a release_sock()) before
going to sleep otherwise except for the specific case of taking a page
fault during the copy to userspace. It is only CONFIG_PREEMPT that
can cause this situation to occur in other circumstances as far as I
can see.

We could also pepper tcp_recvmsg() with some very carefully placed
preemption disable/enable calls to deal with this even with
CONFIG_PREEMPT enabled.
Re: 2.6.19-rc6-mm2: uli526x only works after reload
On Thursday, 30 November 2006 00:26, Andrew Morton wrote:
> On Thu, 30 Nov 2006 00:08:21 +0100
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Wednesday, 29 November 2006 22:31, Rafael J. Wysocki wrote:
> > > On Wednesday, 29 November 2006 22:30, Andrew Morton wrote:
> > > > On Wed, 29 Nov 2006 21:08:00 +0100
> > > > "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > On Wednesday, 29 November 2006 20:54, Rafael J. Wysocki wrote:
> > > > > > On Tuesday, 28 November 2006 11:02, Andrew Morton wrote:
> > > > > > >
> > > > > > > Temporarily at
> > > > > > >
> > > > > > > http://userweb.kernel.org/~akpm/2.6.19-rc6-mm2/
> > > > > > >
> > > > > > > Will appear eventually at
> > > > > > >
> > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc6/2.6.19-rc6-mm2/
> > > > > >
> > > > > > A minor issue: on one of my (x86-64) test boxes the uli526x driver
> > > > > > doesn't work when it's first loaded. I have to rmmod and modprobe
> > > > > > it to make it work.
> > > >
> > > > That isn't a minor issue.
> > > >
> > > > > > It worked just fine on -mm1, so something must have happened to it
> > > > > > recently.
> > > > >
> > > > > Sorry, I was wrong. The driver doesn't work at all, even after
> > > > > reload.
> > > > >
> > > >
> > > > tulip-dmfe-carrier-detection-fix.patch was added in rc6-mm2. But you're
> > > > not using that (correct?)
> > > >
> > > > git-netdev-all changes drivers/net/tulip/de2104x.c, but you're not using
> > > > that either.
> > > >
> > > > git-powerpc(!) alters drivers/net/tulip/de4x5.c, but you're not using
> > > > that.
> > > >
> > > > Beats me, sorry. Perhaps it's due to changes in networking core. It's
> > > > presumably a showstopper for statically-linked-uli526x users. If you
> > > > could bisect it, please? I'd start with git-netdev-all, then tulip-*.
> > >
> > > OK, but it'll take some time.
> >
> > OK, done.
> >
> > It's one of these (the first one alone doesn't compile):
> >
> > git-netdev-all.patch
> > git-netdev-all-fixup.patch
> > libphy-dont-do-that.patch
>
> Are you able to eliminate libphy-dont-do-that.patch?
>
> > Is a broken-out version of git-netdev-all.patch available from somewhere?
>
> Nope, and my few fumbling attempts to generate the sort of patch series
> which you want didn't work out too well. One has to downgrade to
> git-bisect :(
>
> What does "doesn't work" mean, btw?

Well, it turns out not to be 100% reproducible. I can only reproduce it
after a soft reboot (eg. shutdown -r now). Then, while configuring network
interfaces the system says the interface name is ethxx0, but it should be
eth1 (eth0 is an RTL-8139, which is not used). Now if I run ifconfig, it
says:

eth0: error fetching interface information: Device not found

and that's all (normally, ifconfig would show the information for lo and
eth1, without eth0). Moreover, 'ifconfig eth1' says:

eth1: error fetching interface information: Device not found

Next, I run 'rmmod uli526x' and 'modprobe uli526x' and then 'ifconfig' is
still saying the above (about eth0), but 'ifconfig eth1' seems to work as
it should. However, the interface often fails to transfer anything after
that.

Greetings,
Rafael

--
You never change things by fighting the existing reality.
		R. Buckminster Fuller
Re: [patch 1/4] - Potential performance bottleneck for Linux TCP
On Wed, 29 Nov 2006 16:53:11 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

>
> Please, it is very difficult to review your work the way you have
> submitted this patch as a set of 4 patches. These patches have not
> been split up "logically", but rather they have been split up "per
> file" with the same exact changelog message in each patch posting.
> This is very clumsy, and impossible to review, and wastes a lot of
> mailing list bandwidth.
>
> We have an excellent file, called Documentation/SubmittingPatches, in
> the kernel source tree, which explains exactly how to do this
> correctly.
>
> By splitting your patch into 4 patches, one for each file touched,
> it is impossible to review your patch as a logical whole.
>
> Please also provide your patch inline so people can just hit reply
> in their mail reader client to quote your patch and comment on it.
> This is impossible with the attachments you've used.
>

Here you go - joined up, cleaned up, ported to mainline and test-compiled.

That yield() will need to be removed - yield()'s behaviour is truly awful
if the system is otherwise busy. What is it there for?


From: Wenji Wu <[EMAIL PROTECTED]>

For Linux TCP, when the network application makes a system call to move
data from the socket's receive buffer to user space by calling
tcp_recvmsg(), the socket will be locked. During this period, all incoming
packets for the TCP socket will go to the backlog queue without being TCP
processed.

Since Linux 2.6 can be interrupted mid-task, if the network application's
timeslice expires and it is moved to the expired array with the socket
locked, all the packets within the backlog queue will not be TCP processed
until the network application resumes its execution. If the system is
heavily loaded, TCP can easily RTO on the sender side.

 include/linux/sched.h |2 ++
 kernel/fork.c |3 +++
 kernel/sched.c| 24 ++--
 net/ipv4/tcp.c|9 +
 4 files changed, 32 insertions(+), 6 deletions(-)

diff -puN net/ipv4/tcp.c~tcp-speedup net/ipv4/tcp.c
--- a/net/ipv4/tcp.c~tcp-speedup
+++ a/net/ipv4/tcp.c
@@ -1109,6 +1109,8 @@ int tcp_recvmsg(struct kiocb *iocb, stru
 	struct task_struct *user_recv = NULL;
 	int copied_early = 0;

+	current->backlog_flag = 1;
+
 	lock_sock(sk);

 	TCP_CHECK_TIMER(sk);
@@ -1468,6 +1470,13 @@ skip_copy:

 	TCP_CHECK_TIMER(sk);
 	release_sock(sk);
+
+	current->backlog_flag = 0;
+	if (current->extrarun_flag == 1){
+		current->extrarun_flag = 0;
+		yield();
+	}
+
 	return copied;

 out:
diff -puN include/linux/sched.h~tcp-speedup include/linux/sched.h
--- a/include/linux/sched.h~tcp-speedup
+++ a/include/linux/sched.h
@@ -1023,6 +1023,8 @@ struct task_struct {
 #ifdef CONFIG_TASK_DELAY_ACCT
 	struct task_delay_info *delays;
 #endif
+	int backlog_flag;	/* packets wait in tcp backlog queue flag */
+	int extrarun_flag;	/* extra run flag for TCP performance */
 };

 static inline pid_t process_group(struct task_struct *tsk)
diff -puN kernel/sched.c~tcp-speedup kernel/sched.c
--- a/kernel/sched.c~tcp-speedup
+++ a/kernel/sched.c
@@ -3099,12 +3099,24 @@ void scheduler_tick(void)
 		if (!rq->expired_timestamp)
 			rq->expired_timestamp = jiffies;
-		if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
-			enqueue_task(p, rq->expired);
-			if (p->static_prio < rq->best_expired_prio)
-				rq->best_expired_prio = p->static_prio;
-		} else
-			enqueue_task(p, rq->active);
+		if (p->backlog_flag == 0) {
+			if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
+				enqueue_task(p, rq->expired);
+				if (p->static_prio < rq->best_expired_prio)
+					rq->best_expired_prio = p->static_prio;
+			} else
+				enqueue_task(p, rq->active);
+		} else {
+			if (expired_starving(rq)) {
+				enqueue_task(p,rq->expired);
+				if (p->static_prio < rq->best_expired_prio)
+					rq->best_expired_prio = p->static_prio;
+			} else {
+				if (!TASK_INTERACTIVE(p))
+					p->extrarun_flag = 1;
+				enqueue_task(p,rq->active);
+			}
+		}
 	} else {
 		/*
 		 * Prevent a too long timeslice allowing a task to monopolize
diff -puN kernel/fork.c~tcp-speedup kernel/fork.c
--- a/kernel/fork.c~tcp-speedup
+++ a/kernel/fork.c
@@ -1032,6 +1032,9 @@ static struct task_struct *copy_process(
Re: Linux 2.6.19
On Wed, 29 Nov 2006 18:56:31 -0600 Greg Norris wrote:

> On Wed, Nov 29, 2006 at 03:11:11PM -0800, Randy Dunlap wrote:
> > What would it take to have the kernel.org web page and finger banner
> > give the correct version information? (yessir, not your problem)
>
> On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html
> would break the entries into separate lines.

I prefer to use http://www.kernel.org/kdist/finger_banner for that.
And script it so that I can just type:

$ kcurrent

to see it.

---
~Randy
RE: [patch 2.6.19-rc6] Stop gcc 4.1.0 optimizing wait_hpet_tick away
Ask yourself this question: Can an assignment to a non-volatile variable be
optimized out? Then ask yourself this question: Does casting away volatile
make it not volatile any more?

> The volatile'ness does not simply disappear the moment you
> assign the result to some local variable which is not volatile.

Yes, it does. That's what a cast does, it tells the compiler to, in all
respects, pretend that a variable is of a different type than it 'actually
is', such that it actually isn't anymore.

> Half of our drivers would break if this were true.

On the contrary, they'd break if it was true. If casting away volatile
didn't make it go away, then casting in volatile wouldn't have to make it
appear. A cast causes the compiler to act as if a variable really was the
type you cast it to. If you cast volatile away, that has the reverse of the
same effect casting to volatile has.

The 'readl' function should actually assign the value to a volatile
variable. Assignments to volatiles cannot be cast away, but casts can and
assignments to non-volatile variables can be optimized out.

DS
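For context, the pattern both sides are arguing about looks roughly like the sketch below: the volatile qualifier is attached to the access itself through a cast, and the result is then stored in an ordinary local. This is a userspace illustration only, with hypothetical names read_reg32() and poll_for_change(); it does not settle what the C standard guarantees, it only shows the shape of a readl-style helper. In practice GCC performs a load through a volatile-qualified lvalue each time the helper is called.

```c
#include <stdint.h>

/*
 * Hypothetical readl-style helper: the load goes through a
 * pointer-to-volatile, and the value comes back as a plain uint32_t.
 */
static inline uint32_t read_reg32(const void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Poll until the register differs from `old`, making at least one and at
 * most `max_polls` reads. Returns the last value read. The local `cur`
 * is not volatile; each new value comes from a fresh read_reg32() call.
 */
static uint32_t poll_for_change(const void *addr, uint32_t old,
				unsigned int max_polls)
{
	uint32_t cur = read_reg32(addr);

	while (cur == old && max_polls-- > 1)
		cur = read_reg32(addr);
	return cur;
}
```

A wait loop like the one in the patch under discussion would call something like poll_for_change() on the hardware counter; whether the compiler may drop such a loop is exactly the point in dispute in this thread.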
Re: Bug 7596 - Potential performance bottleneck for Linxu TCP
The delays dealt with in your paper might actually help a highly loaded server with lots of sockets and threads trying to communicate. The packet processing delays caused by the scheduling delay pace the TCP sender by controlling the rate at which ACKs go back to that sender. Those ACKs will go out paced to the rate at which the sleeping TCP receiver gets back onto the cpu, and this will cause the TCP sender to naturally adjust to the overall processing rate of the receiver system, on a per-connection basis. Perhaps try a system with hundreds of processes and potentially hundreds of thousands of TCP sockets, with thousands of unique sender sites, and see what happens. This is a topic similar to TSO, where we are trying to balance the gains from batching work against the losses from gaps in the communication stream.
Re: [v4l-dvb-maintainer] [2.6 patch] remove DVB_AV7110_FIRMWARE
Adrian Bunk wrote: > On Tue, Nov 28, 2006 at 08:45:56PM -0800, Trent Piepho wrote: > > On Wed, 29 Nov 2006, Adrian Bunk wrote: > > > On Tue, Nov 28, 2006 at 01:06:02PM -0800, Trent Piepho wrote: > > > > On Sun, 26 Nov 2006, Adrian Bunk wrote: > > > > > DVB_AV7110_FIRMWARE was (except for some OSS drivers) the only option > > > > > that was still compiling a binary-only user-supplied firmware file at > > > > > build-time into the kernel. > > > > > > > > > > This patch changes the driver to always use the standard > > > > > request_firmware() way for firmware by removing DVB_AV7110_FIRMWARE. > > > > > > > > Doesn't this also prevent the AV7110 module from getting compiled > > > > into the kernel? Shouldn't the Kconfig file be adjusted so > > > > that 'y' can't be selected anymore and it depends on MODULES? > > > > > > No. > > > No. > > > > > > request_firmware() works fine for built-in drivers. > > > > Wouldn't that require loading the firmware file before the filesystems are > > mounted? > > Sure. And you have to create an initrd for the firmware! As I wrote before: I NAK any attempt to remove this option. The option _is_ useful because it allows a user to build an av7110 driver without hotplug, initrd etc. Nobody has to use this option, but it should be possible to do so. CU Oliver -- VDR Remote Plugin 0.3.8 available at http://www.escape-edv.de/endriss/vdr/
[PATCH -mm] x86_64 UP needs smp_call_function_single
From: Randy Dunlap <[EMAIL PROTECTED]> smp_call_function_single() needs to be visible in non-SMP builds, to fix: arch/x86_64/kernel/vsyscall.c:283: warning: implicit declaration of function 'smp_call_function_single' The (other/trivial) fix (instead of this one) is to add: #include to linux-2.6.19-rc6-mm2/arch/x86_64/kernel/vsyscall.c Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]> --- include/asm-x86_64/smp.h |7 --- include/linux/smp.h |7 +++ 2 files changed, 7 insertions(+), 7 deletions(-) --- linux-2.6.19-rc6-mm2.orig/include/asm-x86_64/smp.h +++ linux-2.6.19-rc6-mm2/include/asm-x86_64/smp.h @@ -113,13 +113,6 @@ static __inline int logical_smp_processo #define cpu_physical_id(cpu) x86_cpu_to_apicid[cpu] #else #define cpu_physical_id(cpu) boot_cpu_id -static inline int smp_call_function_single(int cpuid, void (*func) (void *info), - void *info, int retry, int wait) -{ - /* Disable interrupts here? */ - func(info); - return 0; -} #endif /* !CONFIG_SMP */ #endif --- linux-2.6.19-rc6-mm2.orig/include/linux/smp.h +++ linux-2.6.19-rc6-mm2/include/linux/smp.h @@ -99,6 +99,13 @@ static inline int up_smp_call_function(v static inline void smp_send_reschedule(int cpu) { } #define num_booting_cpus() 1 #define smp_prepare_boot_cpu() do {} while (0) +static inline int smp_call_function_single(int cpuid, void (*func) (void *info), + void *info, int retry, int wait) +{ + /* Disable interrupts here? */ + func(info); + return 0; +} #endif /* !SMP */ ---
PM-Timer clock source is slow. Try something else: How slow? What other source(s)?
I recently noticed this message in my bootup that I don't remember from before: PCI: Probing PCI hardware (bus 00) * Found PM-Timer Bug on the chipset. Due to workarounds for a bug, * this clock source is slow. Consider trying other clock sources -- How would this affect my clock? It says to try another clock source; what type of clock source would it be suggesting I use? Another chip already in the computer? It is an Intel 440BX chipset on a Dell motherboard. Would that be likely to have another clock source that is compensating? I don't notice a significant clock slowdown, but I'm running NTP, so that could be masking the problem. The NTP values appear to indicate smallish values for clock variance, but I'm not sure what is "standardly" considered good or bad, so I don't have anything to compare to. Relevant ntp time vars show: leap indicator: 00 stratum: 2 precision: -20 root distance: 0.01445 s root dispersion: 0.01372 s jitter: 0.002335 s stability: 58.565 ppm broadcastdelay: 0.003998 s --- maximum error 130449 us, estimated error 1923 us ntp_adjtime() returns code 0 (OK) offset 1384.000 us, frequency 74.584 ppm, interval 1 s, maximum error 130449 us, estimated error 1923 us, status 0x1 (PLL), time constant 3, precision 1.000 us, tolerance 512 ppm, It seems the estimated error is 1.923 ms, with a precision of 1 us. Is the clock "slowness" indicated by the "offset 1384us, 74.584ppm @ interval 1s"? I.e. do I read that as the clock is off by 74.584 ppm, or ~75us/sec, or do I look at the offset of 1384us/sec, meaning off by 1.384ms/s (wouldn't that be 1384ppm?). The stability figure seems fairly low, on the order of 58.565 ppm, or about .058ms/s? Seems like fewer questions are being answered these days than in days past. Is this because of a change in the list focus (maybe all the patches being submitted), - or change in list membership, i.e. fewer people up-to-speed on older HW, - or increased specialization in specific kernel areas with fewer having knowledge outside their specific domain, or what? It is an ugly tradeoff between development time spent and answering questions that might increase understanding of people on the list (or maybe it's such common knowledge that no one bothers to answer...). Dunno... but thanks for any ideas, especially on the original issue. Linda W.
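On the unit question in the message above: a frequency error in ppm is already a rate, so 74.584 ppm corresponds to 74.584 us of drift per second, while the "offset" field reported by ntp_adjtime() is a current time offset in microseconds and not a per-second figure. A small sketch of the arithmetic follows; the helper names are hypothetical and the only inputs are the numbers quoted in the message.

```c
/*
 * 1 ppm of frequency error is 1 microsecond of drift per second.
 * Returns the accumulated drift, in microseconds, after `seconds`.
 */
static double ppm_to_drift_us(double ppm, double seconds)
{
    return ppm * seconds;
}

/* Same drift expressed in seconds per day (86400 s). */
static double ppm_to_seconds_per_day(double ppm)
{
    return ppm_to_drift_us(ppm, 86400.0) / 1e6;
}
```

For the reported 74.584 ppm this works out to roughly 6.44 seconds per day if the error were left uncorrected, which is the kind of drift NTP would be steering out.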
Re: Linux 2.6.19
On Wed, Nov 29, 2006 at 03:11:11PM -0800, Randy Dunlap wrote: > What would it take to have the kernel.org web page and finger banner > give the correct version information? (yessir, not your problem) On a similar vein, it'd be nice if http://www.kernel.org/kdist/version.html would break the entries into separate lines.
Re: [PATCH] mips tx4927 missing brace fix
On Wed, 29 Nov 2006 19:43:46 +, Ralf Baechle <[EMAIL PROTECTED]> wrote: > On Wed, Nov 29, 2006 at 08:30:35PM +0100, Mariusz Kozlowski wrote: > > > This patch adds missing brace at the end of > > toshiba_rbtx4927_irq_isa_init(). > > Thanks Mariusz! Applied, Oh, that was my fault. Thank you. I see the fix was folded into linux-queue tree. Thanks. --- Atsushi Nemoto
Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP
Please, it is very difficult to review your work the way you have submitted this patch as a set of 4 patches. These patches have not been split up "logically", but rather they have been split up "per file" with the same exact changelog message in each patch posting. This is very clumsy, and impossible to review, and wastes a lot of mailing list bandwidth. We have an excellent file, called Documentation/SubmittingPatches, in the kernel source tree, which explains exactly how to do this correctly. By splitting your patch into 4 patches, one for each file touched, it is impossible to review your patch as a logical whole. Please also provide your patch inline so people can just hit reply in their mail reader client to quote your patch and comment on it. This is impossible with the attachments you've used. Thanks.
Re: [PATCH -mm 1/5][AIO] - Rework compat_sys_io_submit
On Nov 29, 2006, at 2:32 AM, Sébastien Dugué wrote: compat_sys_io_submit() cleanup Cleanup compat_sys_io_submit by duplicating some of the native syscall logic in the compat layer and directly calling io_submit_one() instead of fooling the syscall into thinking it is called from a native 64-bit caller. This is needed for the completion notification patch to avoid having to rewrite each iocb on the caller stack for sys_io_submit() to find the sigevents. You could explicitly mention that this eliminates: - the overhead of copying nr pointers on the userspace caller's stack - the arbitrary PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs that can be submitted Those alone make this worth merging. + if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32))))) + return -EFAULT; I'm glad you got that right :) I no doubt would have initially hoisted these little checks into a shared helper function and missed that detail of getting the size of the access_ok() right in the compat case. + put_ioctx(ctx); + + return i? i: ret; sys_io_getevents() reads: put_ioctx(ctx); return i ? i : ret; So while this compat_sys_io_submit() logic seems fine and I would be comfortable with it landing as-is, I'd also appreciate it if we didn't introduce differences between the two functions when it seems just as easy to make them the same. (That chunk is just one example. There's whitespace, missing unlikely()s, etc). - z
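The two numbers discussed in the review above can be checked with a few lines of arithmetic. This is only an illustrative sketch with hypothetical helper names: in the compat case the user array holds 32-bit pointers, so the range to verify is nr * sizeof(u32), and the old bounce-through-the-stack approach capped nr at PAGE_SIZE / sizeof(void *). The page size and native pointer size are passed in as parameters because they are assumptions about the target, not values taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of user memory covered by an array of nr compat (32-bit) iocb pointers. */
static size_t compat_iocb_array_bytes(size_t nr)
{
    return nr * sizeof(uint32_t);
}

/*
 * Upper bound on nr under the old scheme, where the pointers were
 * widened into a single page on the native side.
 */
static size_t old_compat_iocb_limit(size_t page_size, size_t native_ptr_size)
{
    return page_size / native_ptr_size;
}
```

Assuming 4096-byte pages and 8-byte native pointers, the old limit comes out at 512 iocbs per call, which is the "arbitrary" cap the message refers to.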
hrtimer.h
Hi, Since kernel 2.6.18 has incorporated the high resolution timer itself, I'm trying to test it, but on my GNU/Debian I can't figure out how to include hrtimer.h, which is among the headers in /usr/src/linux/include/. I use the following command to try to compile it. gcc -D__KERNEL__ -I /usr/src/linux/include ex.c ex.c is just the inclusion of hrtimer.h #include int main() { return 0; } and I get this: In file included from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/thread_info.h:16, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/thread_info.h:21, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/preempt.h:9, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/spinlock.h:49, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/seqlock.h:29, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/time.h:7, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/ktime.h:24, from /usr/src/linux-headers-2.6.18sbr-24-11-06/include/linux/hrtimer.h:19, from ex.c:1: /usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/processor.h:80: error: CONFIG_X86_L1_CACHE_SHIFT undeclared here (not in a function) /usr/src/linux-headers-2.6.18sbr-24-11-06/include/asm/processor.h:80: error: requested alignment is not a constant I will appreciate any hint. Thanks in advance.. Ariel
[PATCH] doc: atomic_add_unless() doesn't imply mb() on failure
Most implementations of atomic_add_unless() can fail (return 0) after the first atomic_read() (before cmpxchg). In that case we have a compiler barrier only. Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> Documentation/atomic_ops.txt |3 ++- Documentation/memory-barriers.txt |2 +- 2 files changed, 3 insertions(+), 2 deletions(-) --- 19-rc6/Documentation/memory-barriers.txt~doc2006-11-27 21:20:20.0 +0300 +++ 19-rc6/Documentation/memory-barriers.txt2006-11-30 03:32:06.0 +0300 @@ -1492,7 +1492,7 @@ about the state (old or new) implies an atomic_dec_and_test(); atomic_sub_and_test(); atomic_add_negative(); - atomic_add_unless(); + atomic_add_unless();/* when succeeds (returns 1) */ test_and_set_bit(); test_and_clear_bit(); test_and_change_bit(); --- 19-rc6/Documentation/atomic_ops.txt~doc 2006-07-29 05:05:33.0 +0400 +++ 19-rc6/Documentation/atomic_ops.txt 2006-11-30 03:22:58.0 +0300 @@ -137,7 +137,8 @@ If the atomic value v is not equal to u, returns non zero. If v is equal to u then it returns zero. This is done as an atomic operation. -atomic_add_unless requires explicit memory barriers around the operation. +atomic_add_unless requires explicit memory barriers around the operation +unless it fails (returns 0). atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)
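To make the failure path in the changelog above concrete, here is a userspace sketch of the usual read-then-cmpxchg shape, written with C11 atomics. It is an analogue, not any architecture's kernel implementation, and add_unless() is a hypothetical name. The relaxed load stands in for atomic_read(); when the value already equals u the function returns 0 without ever reaching the compare-and-swap, so no ordering stronger than that plain load is provided on that path.

```c
#include <stdatomic.h>

/*
 * Add a to *v unless *v == u.
 * Returns 1 if the add happened, 0 if *v was equal to u.
 */
static int add_unless(atomic_int *v, int a, int u)
{
    /* Plays the role of the initial atomic_read(): no barrier implied. */
    int c = atomic_load_explicit(v, memory_order_relaxed);

    while (c != u) {
        /*
         * Success path: the compare-and-swap is a full (seq_cst) RMW.
         * On failure c is refreshed with the current value and the
         * loop re-tests it against u.
         */
        if (atomic_compare_exchange_weak(v, &c, c + a))
            return 1;
    }

    /* Failure path: left the loop without any successful RMW. */
    return 0;
}
```

A caller that needs ordering even when the operation fails has to add it explicitly, which is the point of the documentation change.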
[RFC PATCH -rt] RCU priority boosting that survives mild testing
This patch boosts the priority of RCU read-side critical sections when they block to prevent them from being preempted by other non-realtime threads. This patch allows transitive boosting (e.g., to processes holding locks waited on by the RCU read-side critical section) and actually survives light testing, in contrast with its rather large number of predecessors. (All of which are preserved for posterity at http://rdrop.com/users/paulmck/patches -- nothing to hide, so there!!!) The trick is to provide a per-task mutex that is acquired when a task enters the scheduler while in an RCU read-side critical section. This mutex is released by the outermost rcu_read_unlock(). This works even if rcu_read_unlock() is invoked by (say) a hardware irq handler, since the critical section cannot be preempted in that case. One remaining case not handled is the following: rcu_read_lock(); /* code that might be preempted. */ local_irq_save(oldirq); rcu_read_unlock(); local_irq_restore(oldirq); If this case is important to you, please don't keep it a secret!!! A separate task (not yet implemented, but in process) can then acquire a given task's mutex, boosting its priority for the duration of the RCU read-side critical section, as needed to expedite a given RCU grace period. The formerly painful races with rcu_read_unlock() are now harmless -- the boosting task simply needlessly acquires and immediately releases the mutex in that case. There is a new CONFIG_PREEMPT_RCU_BOOST that enables the boosting, defaulting to "n" because this code is quite new and because people writing realtime applications that carefully avoid realtime-priority CPU hogs may not want the degradation in scheduling latency that comes with this patch. This config variable should also greatly reduce the risk that this patch might otherwise pose to innocent bystanders. Some questions: o I currently unconditionally boost to the highest non-realtime priority when a task blocks in an RCU read-side critical section. 
This is to aid in testing, but I am thinking in terms of removing it. It degrades scheduling latency, and if there is a real problem, the TBD booster task should kick it later. Plus, getting rid of this would significantly reduce the size and intrusiveness of the patch. Does this approach make sense? o I believe I can acquire a mutex with impunity near the beginning of __schedule(). I have a flag that prevents more than one level of recursion in face of nested preemptions (e.g., due to getting a scheduling-clock interrupt just as one was starting __schedule() anyway). Any gotchas I am missing? o Is the code snippet above likely to show up? If it is, I would check for interrupts disabled in rcu_read_unlock(), IPI myself if so, and clean up in preempt_schedule_irq(). I would like to avoid this due to the extra test on the preemption path. Thoughts? I am in the process of testing on 2.6.19-rc6-rt10. Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]> --- include/linux/init_task.h | 12 include/linux/rcupreempt.h |4 include/linux/sched.h | 12 kernel/Kconfig.preempt | 11 +++ kernel/rcupreempt.c| 23 --- kernel/rtmutex.c |9 ++--- kernel/sched.c | 17 + kernel/softirq.c |1 + 8 files changed, 83 insertions(+), 6 deletions(-) diff -urpNa -X dontdiff linux-2.6.18-rt3/include/linux/init_task.h linux-2.6.18-rt3-rcubp/include/linux/init_task.h --- linux-2.6.18-rt3/include/linux/init_task.h 2006-10-09 17:27:12.0 -0700 +++ linux-2.6.18-rt3-rcubp/include/linux/init_task.h2006-11-27 11:04:03.0 -0800 @@ -91,6 +91,7 @@ extern struct group_info init_groups; .prio = MAX_PRIO-20, \ .static_prio= MAX_PRIO-20, \ .normal_prio= MAX_PRIO-20, \ + INIT_RCU_PRIO \ .policy = SCHED_NORMAL, \ .cpus_allowed = CPU_MASK_ALL, \ .mm = NULL, \ @@ -98,6 +99,7 @@ extern struct group_info init_groups; .run_list = LIST_HEAD_INIT(tsk.run_list), \ .ioprio = 0,\ .time_slice = HZ, \ + INIT_RCU_BOOST \ .tasks = LIST_HEAD_INIT(tsk.tasks),\ .ptrace_children= LIST_HEAD_INIT(tsk.ptrace_children), \ .ptrace_list=
Re: [PATCH] alternatives/paravirt: use NULL for pointers
On Wednesday 29 November 2006 22:17, Randy Dunlap wrote: > From: Randy Dunlap <[EMAIL PROTECTED]> > > Use NULL instead of 0 for pointers. > > arch/x86_64/kernel/../../i386/kernel/alternative.c:432:18: warning: Using > plain integer as NULL pointer > arch/x86_64/kernel/../../i386/kernel/alternative.c:432:44: warning: Using > plain integer as NULL pointer I fixed it in the original patch thanks -Andi
[PATCH] autofs: fix error code path in autofs_fill_sb()
When the kernel is compiled with the old version of autofs (CONFIG_AUTOFS_FS) and a new (observed at least with 5.x.x) automount daemon is started, the kernel correctly reports an incompatible version of kernel and userland daemon, but then screws things up instead of handling the error correctly: autofs: kernel does not match daemon version = [ BUG: bad unlock balance detected! ] - automount/4199 is trying to release lock (>s_umount_key) at: [] get_sb_nodev+0x76/0xa4 but there are no more locks to release! other info that might help us debug this: no locks held by automount/4199. stack backtrace: [] dump_trace+0x68/0x1b2 [] show_trace_log_lvl+0x18/0x2c [] show_trace+0xf/0x11 [] dump_stack+0x12/0x14 [] print_unlock_inbalance_bug+0xe7/0xf3 [] lock_release+0x8d/0x164 [] up_write+0x14/0x27 [] get_sb_nodev+0x76/0xa4 [] vfs_kern_mount+0x83/0xf6 [] do_kern_mount+0x2d/0x3e [] do_mount+0x607/0x67a [] sys_mount+0x72/0xa4 [] sysenter_past_esp+0x5f/0x99 DWARF2 unwinder stuck at sysenter_past_esp+0x5f/0x99 Leftover inexact backtrace: === and then a deadlock follows. The problem: autofs_fill_super() returns EINVAL to get_sb_nodev(), but before that, it calls kill_anon_super() to destroy the superblock which won't be needed. This is however way too soon to call kill_anon_super(), because get_sb_nodev() has to perform its own cleanup of the superblock first (deactivate_super(), etc.). The correct time to call kill_anon_super() is in the autofs_kill_sb() callback, which is called by deactivate_super() at the proper time, when the superblock is ready to be killed. I can see the same faulty codepath also in autofs4. This patch solves the issue in both filesystems in the same way: it postpones kill_anon_super() until the proper time is signaled by deactivate_super() calling the kill_sb() callback. Patch against 2.6.19-rc6-mm2. 
Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> --- fs/autofs/inode.c|4 ++-- fs/autofs4/inode.c |4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/autofs/inode.c b/fs/autofs/inode.c index 38ede5c..61e04ab 100644 --- a/fs/autofs/inode.c +++ b/fs/autofs/inode.c @@ -31,7 +31,7 @@ void autofs_kill_sb(struct super_block * * just exit when we are called from deactivate_super. */ if (!sbi) - return; + goto out_kill_sb; if ( !sbi->catatonic ) autofs_catatonic_mode(sbi); /* Free wait queues, close pipe */ @@ -44,6 +44,7 @@ void autofs_kill_sb(struct super_block * kfree(sb->s_fs_info); +out_kill_sb: DPRINTK(("autofs: shutting down\n")); kill_anon_super(sb); } @@ -209,7 +210,6 @@ fail_iput: fail_free: kfree(sbi); s->s_fs_info = NULL; - kill_anon_super(s); fail_unlock: return -EINVAL; } diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c index ce7c0f1..be14200 100644 --- a/fs/autofs4/inode.c +++ b/fs/autofs4/inode.c @@ -155,7 +155,7 @@ void autofs4_kill_sb(struct super_block * just exit when we are called from deactivate_super. */ if (!sbi) - return; + goto out_kill_sb; sb->s_fs_info = NULL; @@ -167,6 +167,7 @@ void autofs4_kill_sb(struct super_block kfree(sbi); +out_kill_sb: DPRINTK("shutting down"); kill_anon_super(sb); } @@ -426,7 +427,6 @@ fail_ino: fail_free: kfree(sbi); s->s_fs_info = NULL; - kill_anon_super(s); fail_unlock: return -EINVAL; } -- Jiri Kosina SUSE Labs
Re: Linux 2.6.19
On Wed, 29 Nov 2006 14:21:21 -0800 (PST) Linus Torvalds (LT) wrote: LT> So go get it. It's one of those rare "perfect" kernels. So if it doesn't LT> happen to compile with your config (or it does compile, but then does LT> unspeakable acts of perversion with your pet dachshund), you can rest easy LT> knowing that it's all your own d*mn fault, and you should just fix your LT> evil ways. Ok, so 2.6.18 used to get along fine with cryptoloop and 2.6.19 refuses to cooperate. An strace of "losetup -e aes /dev/loop0 /dev/hda7" without all the terminal interaction shows: open("/dev/hda7", O_RDWR|O_LARGEFILE) = 3 open("/dev/loop0", O_RDWR|O_LARGEFILE) = 4 mlockall(MCL_CURRENT|MCL_FUTURE)= 0 ... munmap(0xb7fc8000, 4096)= 0 ioctl(4, 0x4c00, 0x3) = 0 close(3)= 0 ioctl(4, 0x4c04, 0xbfc21670)= -1 ENOENT (No such file or directory) ioctl(4, 0x4c02, 0xbfc215e0)= -1 ENOENT (No such file or directory) dup(2) = 3 fcntl64(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) fstat64(3, {st_mode=S_IFCHR|0720, st_rdev=makedev(4, 1), ...}) = 0 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fc8000 _llseek(3, 0, 0xbfc21040, SEEK_CUR) = -1 ESPIPE (Illegal seek) write(3, "ioctl: LOOP_SET_STATUS: No such "..., 50ioctl: LOOP_SET_STATUS: No such file or directory) = 50 close(3)= 0 munmap(0xb7fc8000, 4096)= 0 ioctl(4, 0x4c01, 0) = 0 close(4)= 0 exit_group(1) = ? Linux 2.6.18 does not fail at ioctl(4, 0x4c04, ...) I know that dm-crypt is now the preferred method of doing such things, but as long as cryptoloop exists in the kernel I'd expect it to work. Cheers, - Udo
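The raw ioctl numbers in the strace above are easier to follow with names attached. The table below mirrors the loop-device request codes defined in <linux/loop.h>; the lookup helper loop_ioctl_name() is a hypothetical convenience for reading traces and is not part of losetup or the kernel.

```c
#include <stddef.h>

struct loop_ioctl_entry {
    unsigned long nr;
    const char *name;
};

/* Loop device ioctl request codes as defined in <linux/loop.h>. */
static const struct loop_ioctl_entry loop_ioctl_table[] = {
    { 0x4C00, "LOOP_SET_FD" },
    { 0x4C01, "LOOP_CLR_FD" },
    { 0x4C02, "LOOP_SET_STATUS" },
    { 0x4C03, "LOOP_GET_STATUS" },
    { 0x4C04, "LOOP_SET_STATUS64" },
    { 0x4C05, "LOOP_GET_STATUS64" },
};

/* Map a request number from a trace to its symbolic name. */
static const char *loop_ioctl_name(unsigned long nr)
{
    size_t n = sizeof(loop_ioctl_table) / sizeof(loop_ioctl_table[0]);

    for (size_t i = 0; i < n; i++)
        if (loop_ioctl_table[i].nr == nr)
            return loop_ioctl_table[i].name;
    return "unknown";
}
```

Read this way, the trace shows LOOP_SET_FD succeeding, then LOOP_SET_STATUS64 and the older LOOP_SET_STATUS both failing with ENOENT, after which losetup detaches the device again with LOOP_CLR_FD.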
Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
On 11/29, Paul E. McKenney wrote: > > On Wed, Nov 29, 2006 at 11:16:46PM +0300, Oleg Nesterov wrote: > > > > Hmm... SRCU can't be used from irq, yes. But I think that both versions > > (spinlock needs _irqsave) can ? > > I didn't think you could call wait_event() from irq. Ah, sorry for confusion, I talked only about read lock/unlock of course. Just in case, it is not safe to do srcu_read_{,un}lock() from irq, per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++ we need local_t for that. > For the locked version, you would also need spin_lock_irqsave() or some > such to avoid self-deadlock. > > For the atomic version, the fact that synchronize_qrcu() increments > the new counter before decrementing the old one should mean that calls > to qrcu_read_lock() and qrcu_read_unlock() can be called from irq. Yes, exactly! There is another reason, suppose we did qp->completed++; atomic_inc(qp->ctr + (idx ^ 0x1)); In that case the reader could be stalled if synchronize_qrcu() takes a preemption in between. > But synchronize_qrcu() must be called from process context, since it > can block. Surely. Oleg.
Re: Linux 2.6.19
On Wed, 29 Nov 2006, Randy Dunlap wrote: > On Wed, 29 Nov 2006 23:21:12 + Alan wrote: > > > On Wed, 29 Nov 2006 15:11:11 -0800 > > Randy Dunlap <[EMAIL PROTECTED]> wrote: > > > > > What would it take to have the kernel.org web page and finger banner > > > give the correct version information? > > > > Patience 8) > > OK. How many days? It _should_ update automatically once everything has mirrored out. Linus
[RFC, PATCH 1/2] qrcu: "quick" srcu implementation
Very much based on ideas, corrections, and patient explanations from Alan and Paul. The current srcu implementation is very good for readers, lock/unlock are extremely cheap. But for that reason it is not possible to avoid synchronize_sched() and polling in synchronize_srcu(). Jens Axboe wrote: > > It works for me, but the overhead is still large. Before it would take > 8-12 jiffies for a synchronize_srcu() to complete without there actually > being any reader locks active, now it takes 2-3 jiffies. So it's > definitely faster, and as suspected the loss of two of three > synchronize_sched() cut down the overhead to a third. 'qrcu' behaves the same as srcu but optimized for writers. The fast path for synchronize_qrcu() is mutex_lock() + atomic_read() + mutex_unlock(). The slow path is __wait_event(), no polling. However, the reader does atomic inc/dec on lock/unlock, and the counters are not per-cpu. Also, unlike srcu, qrcu read lock/unlock can be used in interrupt context, and 'qrcu_struct' can be compile-time initialized. See also (a long) discussion: http://marc.theaimsgroup.com/?t=11637085763 Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> --- 19-rc6/include/linux/srcu.h~1_qrcu 2006-11-17 19:42:31.0 +0300 +++ 19-rc6/include/linux/srcu.h 2006-11-29 20:22:37.0 +0300 @@ -27,6 +27,8 @@ #ifndef _LINUX_SRCU_H #define _LINUX_SRCU_H +#include + struct srcu_struct_array { int c[2]; }; @@ -50,4 +52,24 @@ void srcu_read_unlock(struct srcu_struct void synchronize_srcu(struct srcu_struct *sp); long srcu_batches_completed(struct srcu_struct *sp); +/* + * fully compatible with srcu, but optimized for writers. 
+ */ + +struct qrcu_struct { + int completed; + atomic_t ctr[2]; + wait_queue_head_t wq; + struct mutex mutex; +}; + +int init_qrcu_struct(struct qrcu_struct *qp); +int qrcu_read_lock(struct qrcu_struct *qp); +void qrcu_read_unlock(struct qrcu_struct *qp, int idx); +void synchronize_qrcu(struct qrcu_struct *qp); + +static inline void cleanup_qrcu_struct(struct qrcu_struct *qp) +{ +} + #endif --- 19-rc6/kernel/srcu.c~1_qrcu 2006-11-17 19:42:31.0 +0300 +++ 19-rc6/kernel/srcu.c2006-11-29 20:09:49.0 +0300 @@ -256,3 +256,55 @@ EXPORT_SYMBOL_GPL(srcu_read_unlock); EXPORT_SYMBOL_GPL(synchronize_srcu); EXPORT_SYMBOL_GPL(srcu_batches_completed); EXPORT_SYMBOL_GPL(srcu_readers_active); + +int init_qrcu_struct(struct qrcu_struct *qp) +{ + qp->completed = 0; + atomic_set(qp->ctr + 0, 1); + atomic_set(qp->ctr + 1, 0); + init_waitqueue_head(&qp->wq); + mutex_init(&qp->mutex); + + return 0; +} + +int qrcu_read_lock(struct qrcu_struct *qp) +{ + for (;;) { + int idx = qp->completed & 0x1; + if (likely(atomic_inc_not_zero(qp->ctr + idx))) + return idx; + } +} + +void qrcu_read_unlock(struct qrcu_struct *qp, int idx) +{ + if (atomic_dec_and_test(qp->ctr + idx)) + wake_up(&qp->wq); +} + +void synchronize_qrcu(struct qrcu_struct *qp) +{ + int idx; + + smp_mb(); + mutex_lock(&qp->mutex); + + idx = qp->completed & 0x1; + if (atomic_read(qp->ctr + idx) == 1) + goto out; + + atomic_inc(qp->ctr + (idx ^ 0x1)); + qp->completed++; + + atomic_dec(qp->ctr + idx); + __wait_event(qp->wq, !atomic_read(qp->ctr + idx)); +out: + mutex_unlock(&qp->mutex); + smp_mb(); +} + +EXPORT_SYMBOL_GPL(init_qrcu_struct); +EXPORT_SYMBOL_GPL(qrcu_read_lock); +EXPORT_SYMBOL_GPL(qrcu_read_unlock); +EXPORT_SYMBOL_GPL(synchronize_qrcu);
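The counter scheme in the patch above can be traced in a small single-threaded userspace model. This is only a sketch for following the state transitions: it uses C11 atomics in place of atomic_t, leaves out the mutex, the wait queue and the memory barriers, and splits synchronize_qrcu() so that the blocking wait is replaced by a return value. All qrcu_model_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct qrcu_model {
    int completed;
    atomic_int ctr[2];
};

static void qrcu_model_init(struct qrcu_model *qp)
{
    qp->completed = 0;
    atomic_init(&qp->ctr[0], 1);    /* bias of 1 marks the active counter */
    atomic_init(&qp->ctr[1], 0);
}

/* Increment *v unless it is zero; returns true if incremented. */
static bool inc_not_zero(atomic_int *v)
{
    int c = atomic_load(v);

    while (c != 0) {
        if (atomic_compare_exchange_weak(v, &c, c + 1))
            return true;
    }
    return false;
}

/* Reader entry: take a reference on the currently active counter. */
static int qrcu_model_read_lock(struct qrcu_model *qp)
{
    for (;;) {
        int idx = qp->completed & 0x1;

        if (inc_not_zero(&qp->ctr[idx]))
            return idx;
    }
}

/* Reader exit: returns true where the patch would call wake_up(). */
static bool qrcu_model_read_unlock(struct qrcu_model *qp, int idx)
{
    return atomic_fetch_sub(&qp->ctr[idx], 1) == 1;
}

/*
 * Writer side up to the point where the patch sleeps. Returns -1 on the
 * fast path (no readers), otherwise the index whose counter must drain
 * to zero before the grace period is over.
 */
static int qrcu_model_sync_begin(struct qrcu_model *qp)
{
    int idx = qp->completed & 0x1;

    if (atomic_load(&qp->ctr[idx]) == 1)
        return -1;

    atomic_fetch_add(&qp->ctr[idx ^ 0x1], 1);   /* bias the new counter first */
    qp->completed++;                            /* redirect new readers */
    atomic_fetch_sub(&qp->ctr[idx], 1);         /* drop the old bias */
    return idx;
}
```

Walking one reader through a grace period shows why the new counter is incremented before the old bias is dropped: at no point does a reader find both counters at zero, so qrcu_model_read_lock() never spins for long.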
[RFC, PATCH 2/2] qrcu: add rcutorture test
Add rcutorture test for qrcu. Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]> --- 19-rc6/kernel/__rcutorture.c2006-11-17 19:42:31.0 +0300 +++ 19-rc6/kernel/rcutorture.c 2006-11-29 20:05:23.0 +0300 @@ -465,6 +465,73 @@ static struct rcu_torture_ops srcu_ops = }; /* + * Definitions for qrcu torture testing. + */ + +static struct qrcu_struct qrcu_ctl; + +static void qrcu_torture_init(void) +{ + init_qrcu_struct(_ctl); + rcu_sync_torture_init(); +} + +static void qrcu_torture_cleanup(void) +{ + synchronize_qrcu(_ctl); + cleanup_qrcu_struct(_ctl); +} + +static int qrcu_torture_read_lock(void) +{ + return qrcu_read_lock(_ctl); +} + +static void qrcu_torture_read_unlock(int idx) +{ + qrcu_read_unlock(_ctl, idx); +} + +static int qrcu_torture_completed(void) +{ + return qrcu_ctl.completed; +} + +static void qrcu_torture_synchronize(void) +{ + synchronize_qrcu(_ctl); +} + +static int qrcu_torture_stats(char *page) +{ + int cnt = 0; + int idx = qrcu_ctl.completed & 0x1; + + cnt += sprintf([cnt], "%s%s per-CPU(idx=%d):", + torture_type, TORTURE_FLAG, idx); + + cnt += sprintf([cnt], " (%d,%d)", + atomic_read(qrcu_ctl.ctr + 0), + atomic_read(qrcu_ctl.ctr + 1)); + + cnt += sprintf([cnt], "\n"); + return cnt; +} + +static struct rcu_torture_ops qrcu_ops = { + .init = qrcu_torture_init, + .cleanup = qrcu_torture_cleanup, + .readlock = qrcu_torture_read_lock, + .readdelay = srcu_read_delay, + .readunlock = qrcu_torture_read_unlock, + .completed = qrcu_torture_completed, + .deferredfree = rcu_sync_torture_deferred_free, + .sync = qrcu_torture_synchronize, + .stats = qrcu_torture_stats, + .name = "qrcu" +}; + +/* * Definitions for sched torture testing. */ @@ -503,8 +570,8 @@ static struct rcu_torture_ops sched_ops }; static struct rcu_torture_ops *torture_ops[] = - { _ops, _sync_ops, _bh_ops, _bh_sync_ops, _ops, - _ops, NULL }; + { _ops, _sync_ops, _bh_ops, _bh_sync_ops, + _ops, _ops, _ops, NULL }; /* * RCU torture writer kthread. 
Repeatedly substitutes a new structure
[PATCH] netfilter: remove broken macro
Hello, This patch removes a broken and unused macro. Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]> net/ipv4/netfilter/ip_nat_standalone.c |6 -- 1 file changed, 6 deletions(-) --- linux-2.6.19-rc6-mm2-a/net/ipv4/netfilter/ip_nat_standalone.c 2006-11-16 05:03:40.0 +0100 +++ linux-2.6.19-rc6-mm2-b/net/ipv4/netfilter/ip_nat_standalone.c 2006-11-29 15:31:37.0 +0100 @@ -44,12 +44,6 @@ #define DEBUGP(format, args...) #endif -#define HOOKNAME(hooknum) ((hooknum) == NF_IP_POST_ROUTING ? "POST_ROUTING" \ - : ((hooknum) == NF_IP_PRE_ROUTING ? "PRE_ROUTING" \ - : ((hooknum) == NF_IP_LOCAL_OUT ? "LOCAL_OUT" \ -: ((hooknum) == NF_IP_LOCAL_IN ? "LOCAL_IN" \ - : "*ERROR*"))) - #ifdef CONFIG_XFRM static void nat_decode_session(struct sk_buff *skb, struct flowi *fl) { -- Regards, Mariusz Kozlowski