Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Hi Nick,

On Tue, Apr 17, 2007 at 06:29:54AM +0200, Nick Piggin wrote:
(...)
> And my scheduler for example cuts down the amount of policy code and
> code size significantly. I haven't looked at Con's ones for a while,
> but I believe they are also much more straightforward than mainline...
>
> For example, let's say all else is equal between them, then why would
> we go with the O(logN) implementation rather than the O(1)?

Of course, if this is the case, the question will be raised. But as a general rule, I don't see much potential in O(1) to finely tune scheduling according to several criteria. In O(logN), you can adjust scheduling in realtime at a very low cost. Better processing of varying priorities or fork() comes to mind.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
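A minimal user-space sketch of the property Willy describes, assuming a runqueue kept as a binary min-heap (CFS itself uses an rbtree; the heap is swapped in here only because it is short). Each task carries a hypothetical `key`, for example its weighted runtime. Re-keying a queued task, as a nice change would, costs O(log N) sift steps, so the ordering can be adjusted continuously instead of moving tasks between a fixed set of priority lists. `struct task`, `rq_push()`, `rq_pop_min()` and `rq_update_key()` are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define RQ_MAX 64                 /* hypothetical fixed capacity for the sketch */

struct task {
	int id;
	unsigned long long key;   /* hypothetical fairness key, e.g. weighted runtime */
	size_t slot;              /* current heap index, kept so re-keying is O(log N) */
};

struct runqueue {
	struct task *heap[RQ_MAX];
	size_t nr;
};

static void rq_swap(struct runqueue *rq, size_t a, size_t b)
{
	struct task *t = rq->heap[a];

	rq->heap[a] = rq->heap[b];
	rq->heap[b] = t;
	rq->heap[a]->slot = a;
	rq->heap[b]->slot = b;
}

static void sift_up(struct runqueue *rq, size_t i)
{
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (rq->heap[parent]->key <= rq->heap[i]->key)
			break;
		rq_swap(rq, i, parent);
		i = parent;
	}
}

static void sift_down(struct runqueue *rq, size_t i)
{
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;

		if (l < rq->nr && rq->heap[l]->key < rq->heap[min]->key)
			min = l;
		if (r < rq->nr && rq->heap[r]->key < rq->heap[min]->key)
			min = r;
		if (min == i)
			break;
		rq_swap(rq, i, min);
		i = min;
	}
}

/* Enqueue a task: O(log N). */
void rq_push(struct runqueue *rq, struct task *t)
{
	assert(rq->nr < RQ_MAX);
	t->slot = rq->nr;
	rq->heap[rq->nr++] = t;
	sift_up(rq, t->slot);
}

/* Remove and return the task with the smallest key: O(log N). */
struct task *rq_pop_min(struct runqueue *rq)
{
	struct task *min;

	if (rq->nr == 0)
		return NULL;
	min = rq->heap[0];
	rq->nr--;
	if (rq->nr > 0) {
		rq->heap[0] = rq->heap[rq->nr];
		rq->heap[0]->slot = 0;
		sift_down(rq, 0);
	}
	return min;
}

/* Change a queued task's key (e.g. after renicing) and restore order: O(log N). */
void rq_update_key(struct runqueue *rq, struct task *t, unsigned long long key)
{
	t->key = key;
	sift_up(rq, t->slot);     /* moves it up if the key shrank */
	sift_down(rq, t->slot);   /* moves it down if the key grew */
}
```

Only one of the two sift calls in `rq_update_key()` ever moves the task, which keeps the update cost logarithmic in the number of queued tasks.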
Re: [PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver
On Tue, 17 Apr 2007 11:15:46 +0900 izumi <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I encountered the following kernel panic. The cause of this problem was
> NULL pointer access in check_modem_status() in 8250.c. I confirmed
> this problem is fixed by the attached patch, but I don't know whether this
> is the correct fix.
>
> sadc[4378]: NaT consumption 2216203124768 [1]
> Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan
> container button sg e100 eepro100 mii ehci_hcd ohci_hcd
>
> Pid: 4378, CPU 0, comm: sadc
> psr : 1210085a2010 ifs : 8289 ip : []
> Not tainted
> ip is at check_modem_status+0xf1/0x360
> unat: pfs : 0289 rsc : 0003
> rnat: 8000cc18 bsps: pr : 00aa6a99
> ldrs: ccv : fpsr: 0009804c8a70033f
> csd : ssd :
> b0 : a00100481fb0 b6 : a001004822e0 b7 : a00100477f20
> f6 : 1003e f7 : 0ffdba200
> f8 : 100018000 f9 : 10002a000
> f10 : 0fffdc8c0 f11 : 1003e
> r1 : a00100b9af40 r2 : 0008 r3 : a00100ad4e21
> r8 : 00bb r9 : 0001 r10 :
> r11 : a00100ad4d58 r12 : e37b7df0 r13 : e37b
> r14 : 0001 r15 : 0018 r16 : a00100ad4d6c
> r17 : r18 : r19 :
> r20 : a0010099bc88 r21 : 00bb r22 : 00bb
> r23 : c003fc0ff3fe r24 : c003fc00 r25 : 000ff3fe
> r26 : a001009b7ad0 r27 : 0001 r28 : a001009b7ad8
> r29 : r30 : a001009b7ad0 r31 : a001009b7ad0
>
> Call Trace:
> [] show_stack+0x40/0xa0
> sp=e37b7810 bsp=e37b1118
> [] show_regs+0x840/0x880
> sp=e37b79e0 bsp=e37b10c0
> [] die+0x1c0/0x2c0
> sp=e37b79e0 bsp=e37b1078
> [] die_if_kernel+0x50/0x80
> sp=e37b7a00 bsp=e37b1048
> [] ia64_fault+0x11e0/0x1300
> sp=e37b7a00 bsp=e37b0fe8
> [] ia64_leave_kernel+0x0/0x280
> sp=e37b7c20 bsp=e37b0fe8
> [] check_modem_status+0xf0/0x360
> sp=e37b7df0 bsp=e37b0fa0
> [] serial8250_get_mctrl+0x20/0xa0
> sp=e37b7df0 bsp=e37b0f80
> [] uart_read_proc+0x250/0x860
> sp=e37b7df0 bsp=e37b0ee0
> [] proc_file_read+0x1d0/0x4c0
> sp=e37b7e10 bsp=e37b0e80
> [] vfs_read+0x1b0/0x300
> sp=e37b7e20 bsp=e37b0e30
> [] sys_read+0x70/0xe0
> sp=e37b7e20 bsp=e37b0db0
> [] ia64_ret_from_syscall+0x0/0x20
> sp=e37b7e30 bsp=e37b0db0
> [] __kernel_syscall_via_break+0x0/0x20
> sp=e37b8000 bsp=e37b0db0
>
> --- a/drivers/serial/8250.c~fix-possible-null-pointer-access-in-8250-serial-driver
> +++ a/drivers/serial/8250.c
> @@ -1310,7 +1310,8 @@ static unsigned int check_modem_status(s
>  {
>  	unsigned int status = serial_in(up, UART_MSR);
>  
> -	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI) {
> +	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI &&
> +		up->port.info != NULL) {
>  		if (status & UART_MSR_TERI)
>  			up->port.icount.rng++;
>  		if (status & UART_MSR_DDSR)
> _
>

I'd imagine that other serial drivers might get upset having their ->get_mctrl() called prior to being opened.

Perhaps we should be fixing this in uart_read_proc()?
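A user-space model of Andrew's suggestion, under stated assumptions: the guard moves into the caller (the role `uart_read_proc()` plays), so a driver's `get_mctrl()` callback never runs on a port that has not been opened. The structs below are drastically simplified, hypothetical stand-ins for the serial-core types, and `report_mctrl()`, `demo_get_mctrl()` and `demo_ops` are invented names; the sketch only illustrates where the NULL check lives, not how serial core actually fixed it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the serial-core types. */
struct uart_info {
	unsigned int mctrl;             /* pretend modem-control state */
};

struct uart_port;

struct uart_ops {
	/* Driver callback; like the 8250 one, it assumes port->info is set. */
	unsigned int (*get_mctrl)(struct uart_port *port);
};

struct uart_port {
	struct uart_info *info;         /* NULL until the port is opened */
	const struct uart_ops *ops;
};

/* Example driver callback that dereferences info unconditionally. */
unsigned int demo_get_mctrl(struct uart_port *port)
{
	return port->info->mctrl;
}

const struct uart_ops demo_ops = { demo_get_mctrl };

/*
 * Caller-side guard: skip the driver callback for ports that are not
 * open, so individual drivers need no NULL check of their own.
 * Returns 1 and stores the status in *mctrl, or 0 if the port is closed.
 */
int report_mctrl(struct uart_port *port, unsigned int *mctrl)
{
	assert(port != NULL && mctrl != NULL);
	if (port->info == NULL)
		return 0;
	*mctrl = port->ops->get_mctrl(port);
	return 1;
}
```

Putting the check in the one caller covers every driver at once, which is the advantage Andrew points to over patching `check_modem_status()` alone.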
Re: [patch] CFS (Completely Fair Scheduler), v2
On Tuesday 17 April 2007, Willy Tarreau wrote:
>Hi Gene,
>
>On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
>> On Monday 16 April 2007, Ingo Molnar wrote:
>> >this is the second release of the CFS (Completely Fair Scheduler)
>> >patchset, against v2.6.21-rc7:
>> >
>> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
>> >
>> >i'd like to thank everyone for the tremendous amount of feedback and
>> >testing the v1 patch got - i could hardly keep up with just reading the
>> >mails! Some of the stuff people addressed i couldnt implement yet, i
>> >mostly concentrated on bugs, regressions and debuggability.
>> >
>> >there's a fair amount of churn:
>> >
>> > 15 files changed, 456 insertions(+), 241 deletions(-)
>> >
>> >But it's an encouraging sign that there was no crash bug found in v1,
>> >all the bugs were related to scheduling-behavior details. The code was
>> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
>> >code size increase in -v2 is due to debugging helpers, they'll be
>> >removed later. (The new /proc/sched_debug file can be used to see the
>> >fine details of CFS scheduling.)
>> >
>> >Changes since -v1:
>> >
>> > - make nice levels less starvable. (reported by Willy Tarreau)
>> >
>> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
>> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
>> >   reported by S.Çağlar Onur <)
>> >
>> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
>> >
>> > - UP build fix. (reported by Gabriel C)
>> >
>> > - timer tick micro-optimization (Dmitry Adamushko)
>> >
>> > - preemption fix: sched_class->check_preempt_curr method to decide
>> >   whether to preempt after a wakeup (or at a timer tick).
>> >   (Found via a fairness-test-utility written for CFS by Mike Galbraith)
>> >
>> > - start forked children with neutral statistics instead of trying to
>> >   inherit them from the parent: Willy Tarreau reported that this
>> >   results in better behavior on extreme workloads, and it also
>> >   simplifies the code quite nicely. Removed sched_exit() and the
>> >   ->task_exit() methods.
>> >
>> > - make nice levels independent of the sched_granularity value
>> >
>> > - new /proc/sched_debug file listing runqueue details and the rbtree
>> >
>> > - new SCH-* fields in /proc//status to see scheduling details
>> >
>> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
>> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
>> >   0 (off). Positive values are meant the maximum 'memory' that the
>> >   scheduler has of CPU hogs.
>> >
>> > - various code cleanups
>> >
>> > - added more statistics temporarily: sum_exec_runtime,
>> >   sum_wait_runtime.
>> >
>> > - added -CFS-v2 to EXTRAVERSION
>> >
>> >as usual, any sort of feedback, bugreports, fixes and suggestions are
>> >more than welcome,
>> >
>> >Ingo
>>
>> This one (v2-rc2) is not a keeper I'm sorry to say, Ingo. v2-rc0 was much
>> better. Watching amanda run with htop, KMail's composer is being subjected
>> to 5 to 10 second pauses, and htop says that gzip --best isn't getting more
>> than 15% of the cpu, and the /amandatapes drive is being written to in a
>> regular pattern that seems to be the cause of the pauses according to
>> gkrellm, which also seems to track the size of the writes, and can show
>> anything from 4.3k to 54 megs as being written in one cycle of its screen
>> update.

Somewhat related to this, I have amanda doing a verify phase too.
During the verify phase (and while I was waiting for gmail to transmit this message, it took 30 minutes before it showed up on the list) I noted that when amrestore fired up, it and its child tar were only taking about 20% of the cpu between them, and that /dev/hdd was showing a pretty steady 55 to 75MB/sec being read. As to what this tells us, I'm not going to hazard a guess because it wouldn't, this time of the night here in WV, USA, even be a SWAG. It's coming up on 2am and the toothpicks holding my eyes open are sagging badly, making creaking noises even.

>Have you tried the previous version with the fair-fork patch? It might be
> possible that your workload is sensitive to the fork()'s child getting much
> CPU upon startup.

Willy, I think that patch went by, and was followed by the v2-rc2 so fast that I never got a chance to try it with the v2-rc0 framework. So I believe the answer there is probably no. I never saw a problem with the v2-rc0, but Ingo shot me a message about it without enough detail that I could have tested for it.

FWIW, I've been using the CFQ I/O scheduler for quite a while; is it time I gave the AS or Deadline versions another check? They are all built in but I don't know how to change the default on the fly, or even if it can be done.

>Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
>new tasks are "forked", they are queued at the end of the run queue
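On 2.6 kernels the I/O scheduler can be switched per device at runtime through sysfs: reading `/sys/block/<dev>/queue/scheduler` lists the compiled-in schedulers with the active one in brackets, and writing a name selects it, while the `elevator=` boot parameter sets the default. A small C sketch of both halves follows. `active_elevator()` and `set_elevator()` are illustrative helper names, and the sample line in the comment assumes all four schedulers are built in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the active elevator from the contents of
 * /sys/block/<dev>/queue/scheduler, e.g. "noop anticipatory deadline [cfq]".
 * Copies the bracketed name into out (outlen bytes) and returns 0,
 * or returns -1 if there is no bracketed entry or it does not fit.
 */
int active_elevator(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);
	open = strchr(line, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!open || !close)
		return -1;
	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;
	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Switch one queue's elevator by writing its name, for example
 * set_elevator("/sys/block/hdd/queue/scheduler", "deadline").
 * Needs root. The kernel rejects unknown names when the buffered
 * write is flushed, so the fclose() result is checked as well.
 * Returns 0 on success, -1 on failure.
 */
int set_elevator(const char *path, const char *name)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(name, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

From a shell the equivalent is `echo deadline > /sys/block/hdd/queue/scheduler` as root; the change applies only to that one queue.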
Re: [patch] CFS (Completely Fair Scheduler), v2
On Tue, 2007-04-17 at 07:25 +0200, Willy Tarreau wrote:

> Have you tried the previous version with the fair-fork patch? It might be possible
> that your workload is sensitive to the fork()'s child getting much CPU upon
> startup.

Dunno about that, but here's a possibly related datapoint.

I reported to Ingo yesterday that I was sometimes losing control of my GUI (KDE) under heavy IO. I just reproduced it in mainline rc7. If I start a bonnie, and click around popping windows to the foreground, then poke KDE's menu button, I may lose all GUI capability for a _very_ long time. Here, with bonnie, that means until it gets past writing with putc, and moves on to rewrite. Ages.

	-Mike
Re: [xfs-masters] Re: mm snapshot broken-out-2007-04-11-02-24.tar.gz uploaded
There are a couple of different ways I can see to fix the problem - the first is to not reference the buffer in xlog_iodone() after running the callbacks that may trigger it being freed. I'd prefer to see if this fixes the problem before having to do more invasive surgery.

Can you try the patch below to see if it fixes the problem?

 fs/xfs/xfs_log.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c	2007-04-03 09:09:36.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c	2007-04-16 11:40:21.655306665 +1000
@@ -988,14 +988,11 @@ xlog_iodone(xfs_buf_t *bp)
 	} else if (iclog->ic_state & XLOG_STATE_IOERROR) {
 		aborted = XFS_LI_ABORTED;
 	}
+	/* log I/O is always issued ASYNC, so we should see that here */

I guess this is a leftover because at a prior time xlog_sync() took an extra flags param (which could have XFS_LOG_SYNC set) which could do a SYNC write of the iclog. IIRC, we took this extra param out because nobody was ever calling with it set for xlog_sync().

+	WARN_ON(!(XFS_BUF_ISASYNC(bp)));
 	xlog_state_done_syncing(iclog, aborted);
-	if (!(XFS_BUF_ISASYNC(bp))) {
-		/*
-		 * Corresponding psema() will be done in bwrite().  If we don't
-		 * vsema() here, panic.
-		 */
-		XFS_BUF_V_IODONESEMA(bp);
-	}
+	/* do not reference bp here - it may have been freed during unmount */
+
 } /* xlog_iodone */
 
 /*

--Tim
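The rule this patch follows (sample anything you need from the buffer before running callbacks that may free it, and never touch it afterwards) can be sketched in user-space C. `struct buf`, `buf_iodone()` and `buf_free_cb()` are hypothetical stand-ins for `xfs_buf_t`, `xlog_iodone()` and a completion path that frees the buffer during unmount; they are not XFS code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for xfs_buf_t. */
struct buf {
	int async;                        /* was the I/O issued asynchronously? */
	void (*done)(struct buf *bp);     /* completion callback; may free bp */
};

/* Example callback that frees the buffer, as the unmount path can. */
void buf_free_cb(struct buf *bp)
{
	free(bp);
}

/*
 * I/O completion: read what we need from bp first, run the callback,
 * and do not dereference bp afterwards. Returns the async flag seen.
 */
int buf_iodone(struct buf *bp)
{
	int was_async;

	assert(bp != NULL && bp->done != NULL);
	was_async = bp->async;            /* sample before bp can go away */
	bp->done(bp);                     /* bp may be freed here */
	/* do not reference bp below this point */
	return was_async;
}
```

The same shape explains why the patch replaces the post-callback `XFS_BUF_ISASYNC(bp)` test with a `WARN_ON()` placed before `xlog_state_done_syncing()`.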
Re: CPU ordering with respect to krefs
On Thursday, 12 April 2007 08:27, Greg KH wrote:
> On Mon, Apr 02, 2007 at 04:33:54PM +0200, Eric Dumazet wrote:
> > On Mon, 2 Apr 2007 14:47:59 +0200
> > Oliver Neukum <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > some atomic operations are only atomic, not ordered. Thus a CPU is allowed
> > > to reorder memory references to an object to before the reference is
> > > obtained. This fixes it.
> > >
> > > Regards
> > > Oliver
> > > Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>
> > > --
> > >
> > > --- a/lib/kref.c	2007-04-02 14:40:40.000000000 +0200
> > > +++ b/lib/kref.c	2007-04-02 14:40:50.000000000 +0200
> > > @@ -21,6 +21,7 @@
> > >  void kref_init(struct kref *kref)
> > >  {
> > >  	atomic_set(&kref->refcount,1);
> > > +	smp_mb();
> > >  }
> >
> > I don't understand why smp_mb() is needed here, and not in
> > spinlock_init() for example.
>
> I think, after reading the Documentation/memory-barriers.txt and
> Documentation/atomic_ops.txt documentation, that spin_lock_init() also
> needs this kind of memory barrier.

spin_lock_init() is not an atomic operation. In principle, the issue exists. However, the whole issue is a bit of a grey area. You might take the viewpoint that upping the refcount needs to be under lock, which needs to take care of ordering issues in case of krefs.

A new spinlock has the same issue. You need to be careful making them accessible to other CPUs. If you take code like:

static int producer()
{
	...
	data = kmalloc(...);
	spin_lock_init(&data->lock);
	data->value = some_value;
	data->next = global_pointer;
	global_pointer = data;
	...
}

You have an ordering bug anyway, which you can't fix in spin_lock_init().

	Regards
		Oliver
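A user-space analogue of Oliver's `producer()` example, written with C11 atomics rather than kernel primitives: initialise every field first, then publish the pointer with a release store that pairs with an acquire load in the reader. In kernel code the corresponding tools would be `smp_wmb()` before the pointer store, or `rcu_assign_pointer()`. The struct layout and the function names are illustrative, and the sketch assumes a single producer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* User-space stand-in for the object in Oliver's example (illustrative). */
struct data {
	int value;
	struct data *next;
};

/* Shared list head read by other threads; atomic so ordering is explicit. */
static _Atomic(struct data *) global_pointer;

/*
 * Producer: fully initialise the object, then publish it with a release
 * store, so a reader that sees the new pointer via an acquire load also
 * sees value and next. Assumes a single producer: with several, this
 * load/store pair could lose an update and would need a CAS loop.
 */
int producer(int some_value)
{
	struct data *data = malloc(sizeof *data);

	if (!data)
		return -1;
	data->value = some_value;
	data->next = atomic_load_explicit(&global_pointer, memory_order_relaxed);
	atomic_store_explicit(&global_pointer, data, memory_order_release);
	return 0;
}

/* Consumer: the acquire load pairs with the producer's release store. */
int consumer_peek(int *out)
{
	struct data *d = atomic_load_explicit(&global_pointer, memory_order_acquire);

	if (!d)
		return -1;
	*out = d->value;
	return 0;
}
```

The point matches Oliver's: the ordering has to be enforced where the object is published, which is why no barrier inside `spin_lock_init()` or `kref_init()` alone can make the plain `global_pointer = data;` store safe.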
slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN
The flag SLAB_MUST_HWCACHE_ALIGN is

1. Never checked by SLAB at all.

2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB.

3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

The only remaining use is in sparc64 and ppc64 and their use there reflects some earlier role that the slab flag once may have had. If it's specified then SLAB_HWCACHE_ALIGN is also specified.

The flag is confusing, inconsistent and has no purpose. Remove it.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6/include/linux/slab.h
===================================================================
--- linux-2.6.21-rc6.orig/include/linux/slab.h	2007-04-16 21:55:03.000000000 -0700
+++ linux-2.6.21-rc6/include/linux/slab.h	2007-04-16 21:55:10.000000000 -0700
@@ -26,7 +26,6 @@ typedef struct kmem_cache kmem_cache_t _
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
 #define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
-#define SLAB_MUST_HWCACHE_ALIGN	0x00008000UL	/* Force alignment even if debuggin is active */
 #define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
 #define SLAB_RECLAIM_ACCOUNT	0x00020000UL	/* Objects are reclaimable */
 #define SLAB_PANIC		0x00040000UL	/* Panic if kmem_cache_create() fails */
Index: linux-2.6.21-rc6/mm/slab.c
===================================================================
--- linux-2.6.21-rc6.orig/mm/slab.c	2007-04-16 21:55:16.000000000 -0700
+++ linux-2.6.21-rc6/mm/slab.c	2007-04-16 21:55:33.000000000 -0700
@@ -175,12 +175,12 @@
 # define CREATE_MASK	(SLAB_DEBUG_INITIAL | SLAB_RED_ZONE | \
 			 SLAB_POISON | SLAB_HWCACHE_ALIGN | \
 			 SLAB_CACHE_DMA | \
-			 SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \
+			 SLAB_STORE_USER | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #else
 # define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
-			 SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \
+			 SLAB_CACHE_DMA | \
 			 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 			 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #endif
Index: linux-2.6.21-rc6/mm/slub.c
===================================================================
--- linux-2.6.21-rc6.orig/mm/slub.c	2007-04-16 21:55:38.000000000 -0700
+++ linux-2.6.21-rc6/mm/slub.c	2007-04-16 21:56:07.000000000 -0700
@@ -1500,7 +1500,7 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags, unsigned long align)
 {
-	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
+	if (flags & SLAB_HWCACHE_ALIGN)
 		return max_t(unsigned long, align, L1_CACHE_BYTES);
 
 	if (align < ARCH_SLAB_MINALIGN)
@@ -3083,8 +3083,7 @@ SLAB_ATTR(reclaim_account);
 
 static ssize_t hwcache_align_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%d\n", !!(s->flags &
-		(SLAB_HWCACHE_ALIGN|SLAB_MUST_HWCACHE_ALIGN)));
+	return sprintf(buf, "%d\n", !!(s->flags & SLAB_HWCACHE_ALIGN));
 }
 SLAB_ATTR_RO(hwcache_align);
 
Index: linux-2.6.21-rc6/arch/powerpc/mm/hugetlbpage.c
===================================================================
--- linux-2.6.21-rc6.orig/arch/powerpc/mm/hugetlbpage.c	2007-04-16 21:58:53.000000000 -0700
+++ linux-2.6.21-rc6/arch/powerpc/mm/hugetlbpage.c	2007-04-16 21:59:02.000000000 -0700
@@ -1063,8 +1063,7 @@ static int __init hugetlbpage_init(void)
 	huge_pgtable_cache = kmem_cache_create("hugepte_cache",
 					       HUGEPTE_TABLE_SIZE,
 					       HUGEPTE_TABLE_SIZE,
-					       SLAB_HWCACHE_ALIGN |
-					       SLAB_MUST_HWCACHE_ALIGN,
+					       SLAB_HWCACHE_ALIGN,
 					       zero_ctor, NULL);
 	if (! huge_pgtable_cache)
 		panic("hugetlbpage_init(): could not create hugepte cache\n");
Index: linux-2.6.21-rc6/arch/powerpc/mm/init_64.c
===================================================================
--- linux-2.6.21-rc6.orig/arch/powerpc/mm/init_64.c	2007-04-16 21:59:08.000000000 -0700
+++ linux-2.6.21-rc6/arch/powerpc/mm/init_64.c	2007-04-16 21:59:19.000000000 -0700
@@ -183,8 +183,7 @@ void pgtable_cache_init(void)
 		       "for size: %08x...\n", name, i, size);
 		pgtable_cache[i] = kmem_cache_create(name,
 						     size, size,
-						     SLAB_HWCACHE_ALIGN |
-
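After this patch only `SLAB_HWCACHE_ALIGN` asks SLUB's `calculate_alignment()` for cache-line alignment. The sketch below mirrors that visible first branch in user space. The cache-line size and everything after `if (align < ARCH_SLAB_MINALIGN)` are assumptions, since the patch excerpt is cut off there, and all `SKETCH_*` names and `sketch_calculate_alignment()` are hypothetical.

```c
#include <assert.h>

/* The flag value is from include/linux/slab.h; the other two are assumed. */
#define SKETCH_SLAB_HWCACHE_ALIGN 0x00002000UL
#define SKETCH_L1_CACHE_BYTES     64UL            /* assumed cache-line size */
#define SKETCH_MINALIGN           sizeof(void *)  /* assumed minimum alignment */

/*
 * With SLAB_MUST_HWCACHE_ALIGN gone, a single flag test decides whether
 * objects get cache-line alignment. The fallback (raise to a minimum and
 * round up to a multiple of it) is an assumption for this sketch.
 */
unsigned long sketch_calculate_alignment(unsigned long flags, unsigned long align)
{
	if (flags & SKETCH_SLAB_HWCACHE_ALIGN)
		return align > SKETCH_L1_CACHE_BYTES ? align : SKETCH_L1_CACHE_BYTES;
	if (align < SKETCH_MINALIGN)
		return SKETCH_MINALIGN;
	/* round align up to a multiple of the (power-of-two) minimum */
	return (align + SKETCH_MINALIGN - 1) & ~(SKETCH_MINALIGN - 1);
}
```

Callers that previously passed `SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN` get the same result from `SLAB_HWCACHE_ALIGN` alone, which is why the powerpc call sites can simply drop the second flag.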
Re: [patch] CFS (Completely Fair Scheduler), v2
Hi Gene,

On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
> On Monday 16 April 2007, Ingo Molnar wrote:
> >this is the second release of the CFS (Completely Fair Scheduler)
> >patchset, against v2.6.21-rc7:
> >
> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
> >
> >i'd like to thank everyone for the tremendous amount of feedback and
> >testing the v1 patch got - i could hardly keep up with just reading the
> >mails! Some of the stuff people addressed i couldnt implement yet, i
> >mostly concentrated on bugs, regressions and debuggability.
> >
> >there's a fair amount of churn:
> >
> > 15 files changed, 456 insertions(+), 241 deletions(-)
> >
> >But it's an encouraging sign that there was no crash bug found in v1,
> >all the bugs were related to scheduling-behavior details. The code was
> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
> >code size increase in -v2 is due to debugging helpers, they'll be
> >removed later. (The new /proc/sched_debug file can be used to see the
> >fine details of CFS scheduling.)
> >
> >Changes since -v1:
> >
> > - make nice levels less starvable. (reported by Willy Tarreau)
> >
> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
> >   reported by S.Çağlar Onur <)
> >
> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
> >
> > - UP build fix. (reported by Gabriel C)
> >
> > - timer tick micro-optimization (Dmitry Adamushko)
> >
> > - preemption fix: sched_class->check_preempt_curr method to decide
> >   whether to preempt after a wakeup (or at a timer tick). (Found via a
> >   fairness-test-utility written for CFS by Mike Galbraith)
> >
> > - start forked children with neutral statistics instead of trying to
> >   inherit them from the parent: Willy Tarreau reported that this
> >   results in better behavior on extreme workloads, and it also
> >   simplifies the code quite nicely.
> >   Removed sched_exit() and the ->task_exit() methods.
> >
> > - make nice levels independent of the sched_granularity value
> >
> > - new /proc/sched_debug file listing runqueue details and the rbtree
> >
> > - new SCH-* fields in /proc//status to see scheduling details
> >
> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
> >   0 (off). Positive values are meant the maximum 'memory' that the
> >   scheduler has of CPU hogs.
> >
> > - various code cleanups
> >
> > - added more statistics temporarily: sum_exec_runtime,
> >   sum_wait_runtime.
> >
> > - added -CFS-v2 to EXTRAVERSION
> >
> >as usual, any sort of feedback, bugreports, fixes and suggestions are
> >more than welcome,
> >
> >	Ingo
>
> This one (v2-rc2) is not a keeper I'm sorry to say, Ingo. v2-rc0 was much
> better. Watching amanda run with htop, KMail's composer is being subjected to
> 5 to 10 second pauses, and htop says that gzip --best isn't getting more than
> 15% of the cpu, and the /amandatapes drive is being written to in a regular
> pattern that seems to be the cause of the pauses according to gkrellm, which
> also seems to track the size of the writes, and can show anything from 4.3k
> to 54 megs as being written in one cycle of its screen update.

Have you tried the previous version with the fair-fork patch? It might be possible that your workload is sensitive to the fork()'s child getting much CPU upon startup.

Ingo, maybe I'm saying something stupid, but in my userland scheduler, when new tasks are "forked", they are queued at the end of the run queue with a fixed priority. In our case, this would translate into assigning them the same prio and timeslice as their parent, but queuing them at the end so that they don't make existing tasks starve during huge fork() loads.
I don't know how that would be possible (nor if that would help in anything), but I found it was a good compromise over sharing the timeslice with the parent.

Perhaps we should have some absolute timeslice and some relative timeslice (e.g. X percent of total time divided by the number of tasks)?

Regards,
Willy
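Willy's closing suggestion, an absolute timeslice plus a relative one equal to X percent of the total time divided by the number of tasks, works out to simple arithmetic. This is a hypothetical sketch: `hybrid_slice_ns()` and its parameters are illustrative and do not correspond to any CFS tunable.

```c
#include <assert.h>

/*
 * Hypothetical hybrid timeslice: a fixed (absolute) part plus a relative
 * part equal to pct percent of a scheduling period, shared among the
 * runnable tasks. All values are in nanoseconds.
 */
unsigned long long hybrid_slice_ns(unsigned long long abs_ns,
				   unsigned long long period_ns,
				   unsigned int pct,
				   unsigned int nr_running)
{
	assert(pct <= 100);
	if (nr_running == 0)
		nr_running = 1;   /* avoid dividing by zero on an empty queue */
	return abs_ns + (period_ns * pct / 100) / nr_running;
}
```

With a 1 ms absolute part, a 100 ms period and X = 10, five runnable tasks each get 1 ms + 10 ms / 5 = 3 ms. As the task count grows during a fork() storm the relative part shrinks, but the absolute part keeps each slice from collapsing to zero.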
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
On Monday April 16, [EMAIL PROTECTED] wrote:
>
> cfq_dispatch_insert() was called with rq == 0. This one is getting really
> annoying... and md is involved again (RAID0 this time.)

Yeah... weird. RAID0 is so light-weight and so different from RAID1 or RAID5 that I feel fairly safe concluding that the problem isn't in or near md.

But that doesn't help you.

This really feels like a locking problem.

The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is not NULL. That happens plenty of times in the code (particularly as the first request is inserted) but always under ->queue_lock so it should never be visible to cfq_dispatch_insert.

Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls elv_next_request which could ultimately call __cfq_dispatch_requests without taking ->queue_lock (that I can see). But you probably aren't using ide-scsi (does anyone?).

Given that interrupts are always disabled when queue_lock is taken, it might be useful to add

	WARN_ON(!irqs_disabled());

every time ->next_rq is set. Something like the following.

It might show something useful if we are lucky.

NeilBrown

diff .prev/block/cfq-iosched.c ./block/cfq-iosched.c
--- .prev/block/cfq-iosched.c	2007-04-17 15:01:36.000000000 +1000
+++ ./block/cfq-iosched.c	2007-04-17 15:02:25.000000000 +1000
@@ -628,6 +628,7 @@ static void cfq_remove_request(struct re
 {
 	struct cfq_queue *cfqq = RQ_CFQQ(rq);
 
+	BUG_ON(!irqs_disabled());
 	if (cfqq->next_rq == rq)
 		cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
 
@@ -1810,6 +1811,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, s
 	/*
 	 * check if this request is a better next-serve candidate)) {
 	 */
+	BUG_ON(!irqs_disabled());
 	cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
 	BUG_ON(!cfqq->next_rq);
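Neil's debugging idea, asserting at every write to `->next_rq` that the protecting lock is held, can be sketched in user space with a pthread mutex. The `lock_held` flag is a hypothetical debug aid standing in for the `irqs_disabled()` check in the patch: it only shows that some thread holds the lock, so it is a cheap tripwire rather than a proof of correct locking. `struct sketch_queue` and the `sq_*` functions are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical queue whose next_rq field is protected by lock. */
struct sketch_queue {
	pthread_mutex_t lock;
	int lock_held;          /* debug aid: set while lock is held */
	void *next_rq;
};

void sq_lock(struct sketch_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->lock_held = 1;
}

void sq_unlock(struct sketch_queue *q)
{
	q->lock_held = 0;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Every update of next_rq goes through this helper, so a caller that
 * forgot to take the lock trips the assertion at the point of the bug
 * instead of leaving a NULL for a later reader to crash on.
 */
void sq_set_next_rq(struct sketch_queue *q, void *rq)
{
	assert(q->lock_held);
	q->next_rq = rq;
}
```

Like the `BUG_ON()`s in the patch, the check fires at the unlocked writer rather than at the reader that later finds an inconsistent queue, which is what makes it useful for hunting a race.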
Re: Memory Allocation
Brian D. McGrew wrote:
> Good evening gents!
>
> I need some help in allocating memory and understanding how the system
> allocates memory with physical versus virtual page tables. Please
> consider the following snippet of code. Please, no wisecracks about bad
> code; it was written in 30 seconds in haste :-)
>
(snip)
>
> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM. If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512). Obviously I have enough
> physical memory in the box to do this. However, I suspect that I'm
> running out of page table entries.
>
> Please, correct me if I'm wrong; but if I allocate (2048 * 2048 * 236) it
> works. When I

Pretty sure you're wrong.

> increment to 256 or 512 it fails and it is my suspicion that I just
> don't have enough room in kernel memory to allocate this much memory in
> user space.

Are you using a 32-bit kernel? If so, most likely you're hitting a limit of the address space layout - there's just not enough room in the address space for an allocation of this size.

> Because of a piece of 3rd party hardware, I'm forced to run the kernel
> in the 4GB memory model. What I need to be able to do is allocate an
> array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
> need the addresses that I get back to be contiguous, that's just the way
> my 3rd party hardware works.
>
> I'm inclined to believe that this is not specifically a Linux problem
> but maybe an architecture problem??? But maybe there is some kind of
> workaround in the kernel for it??? I'd find it hard to believe that I'm
> the first one that ever needed to use this much memory.
>
> I ran this same code on two different Macs. One of them a PowerBook G4
> with 4GB of RAM and it was successful. The other was a MacBook Pro with
> 4GB of RAM and it failed. Both running OS 10.4.9. And of course it runs
> just lovely on my Sun workstation with Solaris. Thus, I'm thinking it's
> an Intel/X86 issue! How the heck do I get past this problem in Linux on
> the X86 platform???

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
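The arithmetic behind this thread: 2048 * 2048 bytes is 4 MiB, so 236 planes is 944 MiB, 256 planes is exactly 1 GiB and 512 planes is 2 GiB. On a 32-bit kernel with the usual 3 GB/1 GB split a process has roughly 3 GiB of virtual address space in total, and one allocation needs a single free range that large between the program image, heap, shared libraries and stack, which fits Robert's explanation. The helper names below are illustrative, and whether `try_alloc()` succeeds depends entirely on the layout of the running process.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes requested by the poster's (2048 * 2048 * planes) char array. */
uint64_t request_bytes(unsigned planes)
{
	return (uint64_t)2048 * 2048 * planes;   /* 4 MiB per plane */
}

/*
 * Attempt one virtually contiguous allocation of that size. Returns NULL
 * if the size does not fit in size_t or if malloc() cannot find a free
 * range that large in this process's address space.
 */
char *try_alloc(unsigned planes)
{
	uint64_t n = request_bytes(planes);

	if (n > SIZE_MAX)
		return NULL;
	return malloc((size_t)n);
}
```

Calling `try_alloc(236)` and then `try_alloc(256)` on a 32-bit system is a quick way to see where the largest free range ends for a given process.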
Re: [PATCHSET #master] sysfs: make sysfs disconnect immediately on deletion, take 2
Hello, Maneesh.

Maneesh Soni wrote:
> I started looking at these patches and in parallel also did some testing on an
> 8-CPU system. I am using the patches from Greg's tree at
> http://www.kernel.org/pub/scm/linux/kernel/git/gregkh/patches.git/
>
> I ran the following loops in parallel
>
> # while true; do insmod drivers/net/dummy.ko; sleep 1;rmmod dummy; done
> # while true; do find /sys/class/net/dummy0 | xargs cat; sleep 1; done
> # while true; do umount /sys; sleep 1; mount -t sysfs none /sys; done
> # while true; do find /sys | xargs cat > /dev/null; sleep 1; done
>
> and got the following oops
>
> Unable to handle kernel NULL pointer dereference at 004c RIP:
> [] simple_unlink+0x14/0x5c

Eeek... I'll try to replicate and track down the bug here. FWIW, SCSI also oopses if udev is running due to a bug in SCSI open/close handling.

Thanks for testing.

-- 
tejun
Re: Major qla2xxx regression on sparc64
From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 19:41:07 -0700

> That verbiage sounds fine -- so would you consider the previous patch
> I submitted (with module parameter) along with the wording above?

Yes, that sounds fine.

> I'm in transit for a redeye to NY so I won't be able to modify the
> patch. If you would be amenable to the above, Seokmann, could you
> rework the patch?

Thanks guys.
Re: [PATCH] Blackfin: blackfin on-chip SPI controller driver
On Mon, 2007-04-16 at 18:31 -0800, David Brownell wrote:
> Cleaning out some of my pending-reviews queue ... after you address
> these comments I think what I'd like to do is sign off on one clean
> patch, rather than initial-plus-cleanups.
>

Thanks a lot, David. We will try to clean up the code and address most of the issues pointed out in your review.

> On Monday 05 March 2007 2:41 am, Wu, Bryan wrote:
>
> > --- linux-2.6.orig/drivers/spi/Kconfig	2007-03-01 11:33:07.000000000 +0800
> > +++ linux-2.6/drivers/spi/Kconfig	2007-03-01 11:40:22.000000000 +0800
>
> I'm adjusting this to address the later patches you sent.
>
> One global comment I'll make, just in case -- please make
> sure all your line-start indents only include tabs, and
> there's no space-at-end-of-line stuff going on, or lines
> wrapping past column 80.
>
> I did this review in KMail, which doesn't highlight such
> minor errors; and I suspect you're mostly OK, but for a
> new driver there's no reason not to be 100% OK in those
> particular respects! (And I *did* notice one of your
> cleanup patches clearly adding tabs-then-spaces indents.)
>

Yes, I sent out an incremental coding-style patch, which is queued in the -mm tree. Should I send out a new patch that includes the coding-style cleanup and the code updated according to this review, or keep submitting incremental patches to Andrew?

> > @@ -156,7 +156,11 @@
> >  #
> >  # Add new SPI protocol masters in alphabetical order above this line
> >  #
> > -
> > +config SPI_BFIN
> > +	tristate "SPI controller driver for ADI Blackfin5xx"
> > +	depends on SPI_MASTER && BFIN
> > +	help
> > +	  This is the SPI controller master driver for Blackfin 5xx processor.
>
> Please put this in Kconfig up with the other SPI controller drivers, in
> alphabetical order. Just like the comment says.
>
> Likewise, please add it to the Makefile in alphabetical order.
>

Got it, we will follow that.
> > > --- /dev/null 1970-01-01 00:00:00.0 + > > +++ linux-2.6/drivers/spi/spi_bfin5xx.c 2007-03-01 11:40:22.0 > > +0800 > > > +#ifdef DEBUG > > +#define ASSERT(expr) \ > > + if (!(expr)) { \ > > + printk(KERN_DEBUG "assertion failed! %s[%d]: %s\n", \ > > + __FUNCTION__, __LINE__, #expr); \ > > + panic(KERN_DEBUG "%s", __FUNCTION__); \ > > Seems like either WARN_ON(expr) or BUG_ON(expr) will be better. > The general rule of BUG variants is: don't, unless the system > really can't continue operating. (I see a later patch removed > this entirely, good. > > Yes, we are trying to use kernel generic BUG_ON and WARN_ON to replace our own assert function. I fixed this in other code and obviously it was missed in this driver patch. > > + } > > +#else > > +#define ASSERT(expr) > > +#endif > > + > > +#define IS_DMA_ALIGNED(x) (((u32)(x)&0x07)==0) > > + > > +#define DEFINE_SPI_REG(reg, off) \ > > +static inline u16 read_##reg(void) \ > > +{ return *(volatile unsigned short*)(SPI0_REGBASE + off); } \ > > +static inline void write_##reg(u16 v) \ > > +{*(volatile unsigned short*)(SPI0_REGBASE + off) = v;\ > > + SSYNC();} > > These should be readw() and writew() or similar... also, I can't tell > what SSYNC() does, but it sure looks like something that shouldn't be > hidden like that. I/O memory should be mapped such that writes don't > get re-ordered. And flushing any write buffer should not be forced in > such low-level accessors ... if it's needed, it should be done at the > relevant points in the driver. (Which you seem to do in a few places > below. The duplication is undesirable.) > > > > + > > +DEFINE_SPI_REG(CTRL, 0x00) > > ... this particular style of register accessor is not generally used in Linux. > The typical style is > > u16 value = __raw_readw(SPI0_REGBASE + SPI_CTRL) > __raw_writew(SPI0_REGBASE + SPI_CTRL, value); > > or wrapped in macros so spi_readw(CTRL) and spi_writew(CTRL, value) work. > > Of course, SPI1/SPI2/etc should be supported too ... 
so it's common to have > those take a pointer to some controller struct with a "void __iomem *regs" > pointer to the registers for that instance. spi_readw(master, CTRL) etc. > > > > +#define START_STATE ((void*)0) > > +#define RUNNING_STATE ((void*)1) > > +#define DONE_STATE ((void*)2) > > +#define ERROR_STATE ((void*)-1) > > Normally states would be represented by enum values, which among other > things supports "switch (state) { ... }" state machine code. This driver > is full of uncommon idioms, which will make it harder for most kernel > developers to dive in and help. > > Even if you have a style guide internal to Analog which says to do things > this way ... don't. > > Apparently, the driver's author, Luke, based it on drivers/spi/pxa2xx_spi.c, and these idioms all come from that driver. I will update our driver according to your comments. > > + > > +#define QUEUE_RUNNING 0 > > +#define QUEUE_STOPPED 1 >
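David's two suggestions, per-instance register accessors that take a controller pointer and enum-valued states driven by a switch, can be sketched together in userspace C. This is a hedged illustration, not the Blackfin driver. The struct and function names are hypothetical, and the register block is ordinary memory standing in for an ioremap()ed region. Only the SPI_CTRL offset (0x00) comes from the patch; the SPI_STAT offset and the CTRL "enable" value are made up. One detail is worth keeping in a real driver: the kernel's writew()/__raw_writew() take the value first and the address second.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of registers within one controller's register block.
 * SPI_CTRL (0x00) is taken from the patch; SPI_STAT is a hypothetical
 * offset used only for this illustration. */
#define SPI_CTRL 0x00
#define SPI_STAT 0x08

/* Transfer states as an enum instead of (void *) casts, so the state
 * machine can be written as a switch statement. */
enum bfin_xfer_state {
	XFER_START,
	XFER_RUNNING,
	XFER_DONE,
	XFER_ERROR,
};

/* One instance per controller (SPI0, SPI1, ...). In a kernel driver,
 * regs would be a "void __iomem *" from ioremap() and the accessors
 * would use readw()/writew(); here regs points at ordinary memory so
 * the sketch runs in userspace. */
struct bfin_spi_master {
	volatile uint8_t *regs;
	enum bfin_xfer_state state;
};

static inline uint16_t spi_read16(struct bfin_spi_master *m, unsigned int off)
{
	assert(off % 2 == 0);	/* 16-bit registers must be 2-byte aligned */
	return *(volatile uint16_t *)(m->regs + off);
}

/* Value first, then location, mirroring the kernel's writew(value, addr). */
static inline void spi_write16(struct bfin_spi_master *m, uint16_t v,
			       unsigned int off)
{
	assert(off % 2 == 0);
	*(volatile uint16_t *)(m->regs + off) = v;
}

/* spi_readw(m, CTRL) expands to spi_read16(m, SPI_CTRL). */
#define spi_readw(m, reg)     spi_read16((m), SPI_##reg)
#define spi_writew(m, reg, v) spi_write16((m), (v), SPI_##reg)

/* Advance the (toy) transfer state machine by one step. */
static enum bfin_xfer_state bfin_spi_step(struct bfin_spi_master *m)
{
	switch (m->state) {
	case XFER_START:
		spi_writew(m, CTRL, 0x4000);	/* hypothetical "enable" value */
		m->state = XFER_RUNNING;
		break;
	case XFER_RUNNING:
		m->state = XFER_DONE;
		break;
	case XFER_DONE:
	case XFER_ERROR:
		break;			/* terminal states: nothing to do */
	}
	return m->state;
}
```

Because every accessor takes the controller pointer, a second controller needs only a second struct with its own regs pointer. No per-port read_CTRL()/write_CTRL() functions have to be generated.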
Re: [patch] CFS (Completely Fair Scheduler), v2
On Monday 16 April 2007, Ingo Molnar wrote: >this is the second release of the CFS (Completely Fair Scheduler) >patchset, against v2.6.21-rc7: > > http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch > >i'd like to thank everyone for the tremendous amount of feedback and >testing the v1 patch got - i could hardly keep up with just reading the >mails! Some of the stuff people addressed i couldnt implement yet, i >mostly concentrated on bugs, regressions and debuggability. > >there's a fair amount of churn: > > 15 files changed, 456 insertions(+), 241 deletions(-) > >But it's an encouraging sign that there was no crash bug found in v1, >all the bugs were related to scheduling-behavior details. The code was >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the >code size increase in -v2 is due to debugging helpers, they'll be >removed later. (The new /proc/sched_debug file can be used to see the >fine details of CFS scheduling.) > >Changes since -v1: > > - make nice levels less starvable. (reported by Willy Tarreau) > > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first > flag can be used to turn it on/off. (This might fix the Kaffeine bug > reported by S.Çağlar Onur) > > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas) > > - UP build fix. (reported by Gabriel C) > > - timer tick micro-optimization (Dmitry Adamushko) > > - preemption fix: sched_class->check_preempt_curr method to decide > whether to preempt after a wakeup (or at a timer tick). (Found via a > fairness-test-utility written for CFS by Mike Galbraith) > > - start forked children with neutral statistics instead of trying to > inherit them from the parent: Willy Tarreau reported that this > results in better behavior on extreme workloads, and it also > simplifies the code quite nicely. Removed sched_exit() and the > ->task_exit() methods.
> > - make nice levels independent of the sched_granularity value > > - new /proc/sched_debug file listing runqueue details and the rbtree > > - new SCH-* fields in /proc//status to see scheduling details > > - new cpu-hog feature (off by default) and sysctl tunable to set it: > /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to > 0 (off). Positive values are meant the maximum 'memory' that the > scheduler has of CPU hogs. > > - various code cleanups > > - added more statistics temporarily: sum_exec_runtime, > sum_wait_runtime. > > - added -CFS-v2 to EXTRAVERSION > >as usual, any sort of feedback, bugreports, fixes and suggestions are >more than welcome, > > Ingo This one (v2-rc2) is not a keeper, I'm sorry to say, Ingo. v2-rc0 was much better. Watching amanda run with htop, KMail's composer is being subjected to 5- to 10-second pauses, htop says that gzip -best isn't getting more than 15% of the CPU, and according to gkrellm, the /amandatapes drive is being written to in a regular pattern that seems to be the cause of the pauses. gkrellm also seems to track the size of the writes, and can show anything from 4.3k to 54 megs being written in one cycle of its screen update. Normally hdd will fire up and take it at a steady 40+ MB/second until it's done whenever there is a file ready to write, even if it's a 7GB file, and I can keep typing right through the disk I/O. But not now. In short, I seem to be heavily I/O bound. But when the write to /dev/hdd3 is done, gzip -best pops right up to 90%-plus CPU and I get my machine back. In between file writes I checked the drive's speed with hdparm: [EMAIL PROTECTED] ~]# hdparm -Tt /dev/hdd /dev/hdd: Timing cached reads: 856 MB in 2.01 seconds = 426.15 MB/sec Timing buffered disk reads: 222 MB in 3.01 seconds = 73.68 MB/sec That's not too shabby, and DMA is obviously active, at least for reading. gzip -best was running while this was executing. So I think the drive is fine and the scheduling is what's funkity. Sorry.
-- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) After they got rid of capital punishment, they had to hang twice as many people as before. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
floppy.ko
Greetings everybody; At some point in the last 6 months or so, some patches went into the floppy.c area of the tree, and ever since, I have not been able to build the driver in without wasting around a minute during bootup on lags and squawks about fd1 showing up in the boot trace on screen. But if I go look, it's fd0 that's being pounded on by the driver, mainly bitching about not being able to read the first sector, something it repeats several times, like 4 or 5. I have the usual fd0, a 3.5" 1.44MB drive, and fd1, a 5.25" 720k drive, in this machine; both are enabled in the BIOS with the correct types set there. If I insert a disk and attempt to mount it, the correct lights come on according to what I typed, but I have had a hell of a time trying to get it to write good images of a legacy machine's disk format using dd, from files that I can read with khexedit and know are correct from that inspection. The only use it's getting these days is for the coco/os9 formats, read and written only by dd and some specialty tools from an os9 kit called toolshed, AFAIK. When it was built as a module and then modprobed for use, I don't recall seeing this problem. Is this fixable, or is it that I just don't know how to handle this newer code? The currently running kernel, 2.6.21-rc7-CFS-v2, has it built in, and it gave me static while booting with no disk in either drive: the messages name fd1, while the access LEDs on the drives show it banging on fd0. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) Nobody wants constructive criticism. It's all we can do to put up with constructive praise.
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, Apr 17, 2007 at 02:25:39PM +1000, Peter Williams wrote: > Nick Piggin wrote: > >On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote: > >>On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote: > >>>Note that I talk of run queues > >>>not CPUs as I think a shift to multiple CPUs per run queue may be a good > >>>idea. > >>This observation of Peter's is the best thing to come out of this > >>whole foofaraw. Looking at what's happening in CPU-land, I think it's > >>going to be necessary, within a couple of years, to replace the whole > >>idea of "CPU scheduling" with "run queue scheduling" across a complex, > >>possibly dynamic mix of CPU-ish resources. Ergo, there's not much > >>point in churning the mainline scheduler through a design that isn't > >>significantly more flexible than any of those now under discussion. > > > >Why? If you do that, then your load balancer just becomes less flexible > >because it is harder to have tasks run on one or the other. > > > >You can have single-runqueue-per-domain behaviour (or close to) just by > >relaxing all restrictions on idle load balancing within that domain. It > >is harder to go the other way and place any per-cpu affinity or > >restrictions with multiple cpus on a single runqueue. > > Allowing N (where N can be one or greater) CPUs per run queue actually > increases flexibility as you can still set N to 1 to get the current > behaviour. But you add extra code for that on top of what we have, and are also prevented from making per-cpu assumptions. And you can get N CPUs per runqueue behaviour by having them in a domain with no restrictions on idle balancing. So where does your increased flexibility come from? > One advantage of allowing multiple CPUs per run queue would be at the > smaller end of the system scale i.e. a PC with a single hyper threading > chip (i.e.
2 CPUs) would not need to worry about load balancing at all > if both CPUs used the one runqueue and all the nasty side effects that > come with hyper threading would be minimized at the same time. I don't know about that -- the current load balancer already minimises the nasty multi threading effects. SMT is very important for IBM's chips for example, and they've never had any problem with that side of it since it was introduced and bugs ironed out (at least, none that I've heard).
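A toy model can illustrate Nick's point that relaxing idle-balancing restrictions within a domain gives close to single-runqueue behaviour. In the model, whenever any CPU in the domain has a task waiting, an idle CPU in the same domain takes it, so no CPU sits idle while work is queued beside it. This is a hypothetical userspace sketch, not the kernel's sched-domains code. The names toy_rq, toy_domain and toy_idle_balance are invented, and tasks are reduced to counters. The real balancer also weighs task load, cache-hotness and CPU affinity.

```c
#include <assert.h>

#define NR_CPUS 4

/* Toy per-CPU runqueue: only the number of runnable tasks. */
struct toy_rq {
	int nr_running;
};

/* Toy scheduling domain: a contiguous span of CPUs allowed to
 * balance with each other. */
struct toy_domain {
	int first_cpu;
	int nr_cpus;
};

/* Called when `cpu` has gone idle. With no imbalance threshold, it
 * pulls one task from the busiest runqueue in its domain whenever that
 * queue has a task waiting (nr_running > 1: one running, one or more
 * queued). Returns the CPU it pulled from, or -1 if nothing moved. */
static int toy_idle_balance(struct toy_rq rq[], const struct toy_domain *sd,
			    int cpu)
{
	int busiest = -1;
	int max = 1;

	assert(cpu >= sd->first_cpu && cpu < sd->first_cpu + sd->nr_cpus);

	for (int i = sd->first_cpu; i < sd->first_cpu + sd->nr_cpus; i++) {
		if (i != cpu && rq[i].nr_running > max) {
			max = rq[i].nr_running;
			busiest = i;
		}
	}
	if (busiest >= 0) {
		rq[busiest].nr_running--;
		rq[cpu].nr_running++;
	}
	return busiest;
}
```

Shrinking the domain, for example to two SMT siblings, limits which queues an idle CPU may pull from. Widening it lets work flow anywhere in the span, while each CPU still keeps its own runqueue for affinity purposes.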
[PATCH] [ALSA] Add first generation macbook subsystem ID
From: Abhijit Bhopatkar <[EMAIL PROTECTED]> First generation MacBooks were being ignored by the sigmatel driver and wrongly identified as MACMINI. This patch makes them identify as MACBOOK. Signed-off-by: Abhijit Bhopatkar <[EMAIL PROTECTED]> --- sound/pci/hda/patch_sigmatel.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/sound/pci/hda/patch_sigmatel.c b/sound/pci/hda/patch_sigmatel.c index c94291b..cb99df5 100644 --- a/sound/pci/hda/patch_sigmatel.c +++ b/sound/pci/hda/patch_sigmatel.c @@ -1905,6 +1905,9 @@ static int patch_stac922x(struct hda_codec *codec) */ printk(KERN_INFO "hda_codec: STAC922x, Apple subsys_id=%x\n", codec->subsystem_id); switch (codec->subsystem_id) { + case 0x106b0a00: /* MacBook First generation */ + spec->board_config = STAC_MACBOOK; + break; case 0x106b0200: /* MacBook Pro first generation */ spec->board_config = STAC_MACBOOK_PRO_V1; break; -- 1.4.4.2
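The patch works because the Apple branch of patch_stac922x() dispatches on the codec's 32-bit subsystem ID. The upper 16 bits hold the subsystem vendor, 0x106b for Apple. A machine whose ID has no case falls through to some other configuration, which is how first-generation MacBooks ended up treated as a Mac mini. Below is a stripped-down userspace sketch of that dispatch. The enum names are hypothetical stand-ins for the driver's STAC_* values. The two IDs are the ones in the patch. The default branch is a placeholder, since what the real driver falls back to is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's STAC_* board configs. */
enum board_config {
	BOARD_DEFAULT,		/* placeholder for the driver's fallback */
	BOARD_MACBOOK,
	BOARD_MACBOOK_PRO_V1,
};

#define APPLE_SUBSYS_VENDOR 0x106b

/* Pick a board config from the codec's 32-bit subsystem ID. The upper
 * 16 bits are the subsystem vendor; this sketch assumes it is only
 * called for Apple codecs, as in the patch's Apple-specific switch. */
static enum board_config pick_board(uint32_t subsystem_id)
{
	assert((subsystem_id >> 16) == APPLE_SUBSYS_VENDOR);

	switch (subsystem_id) {
	case 0x106b0a00:	/* MacBook, first generation (added by the patch) */
		return BOARD_MACBOOK;
	case 0x106b0200:	/* MacBook Pro, first generation */
		return BOARD_MACBOOK_PRO_V1;
	default:		/* unknown Apple model: generic fallback */
		return BOARD_DEFAULT;
	}
}
```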
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, Apr 17, 2007 at 02:17:22PM +1000, Peter Williams wrote: > Nick Piggin wrote: > >On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: > >>On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote: > >>>Mike Galbraith wrote: > Demystify what? The casual observer need only read either your attempt > at writing a scheduler, or my attempts at fixing the one we have, to see > that it was high time for someone with the necessary skills to step in. > >>>Make that "someone with the necessary clout". > >>No, I was brutally honest to both of us, but quite correct. > >> > Now progress can happen, which was _not_ happening before. > > >>>This is true. > >>Yup, and progress _is_ happening now, quite rapidly. > > > >Progress as in progress on Ingo's scheduler. I still don't know how we'd > >decide when to replace the mainline scheduler or with what. > > > >I don't think we can say Ingo's is better than the alternatives, can we? > >If there is some kind of bakeoff, then I'd like one of Con's designs to > >be involved, and mine, and Peter's... > > I myself was thinking of this as the chance for a much needed > simplification of the scheduling code and if this can be done with the > result being "reasonable" it then gives us the basis on which to propose > improvements based on the ideas of others such as you mention. > > As the size of the cpusched indicates, trying to evaluate alternative > proposals based on the current O(1) scheduler is fraught. Hopefully, I don't know why. The problem is that you can't really evaluate good proposals by looking at the code (you can say that one is bad, ie. the current one, which has a huge amount of temporal complexity and is explicitly unfair), but it is pretty hard to say one behaves well. And my scheduler for example cuts down the amount of policy code and code size significantly. I haven't looked at Con's ones for a while, but I believe they are also much more straightforward than mainline... 
For example, let's say all else is equal between them, then why would we go with the O(logN) implementation rather than the O(1)?
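"O(1)" in this thread refers to how the mainline scheduler selects the next task in constant time. Each priority level has its own run list, a bitmap records which lists are non-empty, and picking the next task is a find-first-set over that bitmap. CFS instead keeps tasks in a red-black tree, where insertion costs O(log N). Below is a toy userspace model of the bitmap side. It uses hypothetical names and per-level counts instead of task lists, and substitutes GCC/Clang's __builtin_ctzll for the kernel's sched_find_first_bit(). It is a sketch of the idea, not the kernel's prio_array code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PRIO 140			/* 0-99 real-time, 100-139 nice -20..19 */
#define BITMAP_WORDS ((MAX_PRIO + 63) / 64)

/* Toy priority array: one bit per priority level, set while that
 * level has queued tasks; counts stand in for the real run lists. */
struct toy_prio_array {
	uint64_t bitmap[BITMAP_WORDS];
	unsigned int nr_queued[MAX_PRIO];
};

static void toy_enqueue(struct toy_prio_array *a, int prio)
{
	assert(prio >= 0 && prio < MAX_PRIO);
	a->nr_queued[prio]++;
	a->bitmap[prio / 64] |= UINT64_C(1) << (prio % 64);
}

static void toy_dequeue(struct toy_prio_array *a, int prio)
{
	assert(prio >= 0 && prio < MAX_PRIO && a->nr_queued[prio] > 0);
	if (--a->nr_queued[prio] == 0)
		a->bitmap[prio / 64] &= ~(UINT64_C(1) << (prio % 64));
}

/* Best runnable level = lowest set bit. The loop scans a fixed number
 * of words (3 here), so the cost does not grow with the number of
 * queued tasks. Returns -1 when nothing is runnable. */
static int toy_find_first(const struct toy_prio_array *a)
{
	for (int w = 0; w < BITMAP_WORDS; w++)
		if (a->bitmap[w])
			return w * 64 + __builtin_ctzll(a->bitmap[w]);
	return -1;
}
```

The constant-time pick is the attraction. The thread's disagreement is over whether an O(log N) tree buys enough extra policy flexibility to justify its cost.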
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Nick Piggin wrote: On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote: On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote: Note that I talk of run queues not CPUs as I think a shift to multiple CPUs per run queue may be a good idea. This observation of Peter's is the best thing to come out of this whole foofaraw. Looking at what's happening in CPU-land, I think it's going to be necessary, within a couple of years, to replace the whole idea of "CPU scheduling" with "run queue scheduling" across a complex, possibly dynamic mix of CPU-ish resources. Ergo, there's not much point in churning the mainline scheduler through a design that isn't significantly more flexible than any of those now under discussion. Why? If you do that, then your load balancer just becomes less flexible because it is harder to have tasks run on one or the other. You can have single-runqueue-per-domain behaviour (or close to) just by relaxing all restrictions on idle load balancing within that domain. It is harder to go the other way and place any per-cpu affinity or restrictions with multiple cpus on a single runqueue. Allowing N (where N can be one or greater) CPUs per run queue actually increases flexibility as you can still set N to 1 to get the current behaviour. One advantage of allowing multiple CPUs per run queue would be at the smaller end of the system scale i.e. a PC with a single hyper threading chip (i.e. 2 CPUs) would not need to worry about load balancing at all if both CPUs used the one runqueue and all the nasty side effects that come with hyper threading would be minimized at the same time. Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
Re: [Announce] [patch] Modular Scheduler Core and Completely FairScheduler [CFS]
On Tue, 17 Apr 2007, Mike Galbraith wrote: Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely FairScheduler [CFS] On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote: On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: Yup, and progress _is_ happening now, quite rapidly. Progress as in progress on Ingo's scheduler. I still don't know how we'd decide when to replace the mainline scheduler or with what. I don't think we can say Ingo's is better than the alternatives, can we? No, that would require massive performance testing of all alternatives. If there is some kind of bakeoff, then I'd like one of Con's designs to be involved, and mine, and Peter's... The trouble with a bakeoff is that it's pretty darn hard to get people to test in the first place, and then comes weighting the subjective and hard performance numbers. If they're close in numbers, do you go with the one which starts the least flamewars or what? It's especially hard if the people doing the testing need to find the latest patch and apply it. Even a compile-time option to switch between them would at least give testers confidence that the various patches haven't bitrotted. Boot-time options would be even better, but from previous discussions I've watched, I understand that this code is performance-critical enough that the overhead of a boot-time switch would throw off the results. David Lang
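A compile-time switch between schedulers is easiest if each policy sits behind a common table of hooks. The CFS patch introduces a sched_class, and Ingo's changelog mentions its check_preempt_curr method. The sketch below is hypothetical userspace C. All names, the two toy policies, their fields and the TOY_SCHED_PRIO macro are invented, and it says nothing about the real sched_class layout. It only shows the shape of the idea: the core calls through a table, and the build picks which table.

```c
#include <assert.h>
#include <stddef.h>

/* Toy task: each toy policy looks at a different field. */
struct toy_task {
	int id;
	unsigned long virt_time;	/* used by the "fair" toy policy */
	int prio;			/* used by the "prio" toy policy */
};

/* Hypothetical, heavily simplified hook table. */
struct toy_sched_ops {
	const char *name;
	/* Index of the task to run next, or -1 if n == 0. */
	int (*pick_next)(const struct toy_task *t, size_t n);
};

static int pick_least_virt_time(const struct toy_task *t, size_t n)
{
	int best = -1;
	for (size_t i = 0; i < n; i++)
		if (best < 0 || t[i].virt_time < t[best].virt_time)
			best = (int)i;
	return best;
}

static int pick_best_prio(const struct toy_task *t, size_t n)
{
	int best = -1;
	for (size_t i = 0; i < n; i++)
		if (best < 0 || t[i].prio < t[best].prio)	/* lower = better */
			best = (int)i;
	return best;
}

static const struct toy_sched_ops fair_ops = { "fair", pick_least_virt_time };
static const struct toy_sched_ops prio_ops = { "prio", pick_best_prio };

/* Compile-time choice: build with -DTOY_SCHED_PRIO for the other
 * policy. A boot-time choice would instead set this pointer from a
 * command-line option. */
#ifdef TOY_SCHED_PRIO
static const struct toy_sched_ops *const toy_sched = &prio_ops;
#else
static const struct toy_sched_ops *const toy_sched = &fair_ops;
#endif

/* Policy-agnostic "core": only ever calls through the table. */
static int toy_schedule(const struct toy_sched_ops *ops,
			const struct toy_task *t, size_t n)
{
	assert(ops != NULL && ops->pick_next != NULL);
	return ops->pick_next(t, n);
}
```

With every policy compiled into the same tree, a tester only flips a build option, which addresses the bitrot concern. Whether the indirect call is acceptable on the real hot path is exactly the performance question raised above.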
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Nick Piggin wrote: On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote: Mike Galbraith wrote: Demystify what? The casual observer need only read either your attempt at writing a scheduler, or my attempts at fixing the one we have, to see that it was high time for someone with the necessary skills to step in. Make that "someone with the necessary clout". No, I was brutally honest to both of us, but quite correct. Now progress can happen, which was _not_ happening before. This is true. Yup, and progress _is_ happening now, quite rapidly. Progress as in progress on Ingo's scheduler. I still don't know how we'd decide when to replace the mainline scheduler or with what. I don't think we can say Ingo's is better than the alternatives, can we? If there is some kind of bakeoff, then I'd like one of Con's designs to be involved, and mine, and Peter's... I myself was thinking of this as the chance for a much needed simplification of the scheduling code and if this can be done with the result being "reasonable" it then gives us the basis on which to propose improvements based on the ideas of others such as you mention. As the size of the cpusched indicates, trying to evaluate alternative proposals based on the current O(1) scheduler is fraught. Hopefully, this initiative can fix this problem. Then we just need Ingo to listen to suggestions and he's showing signs of being willing to do this :-) Maybe the progress is that more key people are becoming open to the idea of changing the scheduler. That too. Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, Apr 17, 2007 at 06:01:29AM +0200, Mike Galbraith wrote: > On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote: > > On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: > > > > Yup, and progress _is_ happening now, quite rapidly. > > > > Progress as in progress on Ingo's scheduler. I still don't know how we'd > > decide when to replace the mainline scheduler or with what. > > > > I don't think we can say Ingo's is better than the alternatives, can we? > > No, that would require massive performance testing of all alternatives. > > > If there is some kind of bakeoff, then I'd like one of Con's designs to > > be involved, and mine, and Peter's... > > The trouble with a bakeoff is that it's pretty darn hard to get people > to test in the first place, and then comes weighting the subjective and > hard performance numbers. If they're close in numbers, do you go with > the one which starts the least flamewars or what? I don't know how you'd do it. I know you wouldn't count people telling you how good they are (getting people to tell you how bad they are, and whether others do better in a given situation, might be slightly more viable). But we have to choose somehow. I'd hope that choice is based solely on the results and technical properties of the code, so... if we were to somehow determine that the results are exactly the same, we'd go for the simpler one, wouldn't we? > > Maybe the progress is that more key people are becoming open to the idea > > of changing the scheduler. > > Could be. All was quiet for quite a while, but when RSDL showed up, it > aroused enough interest to show that scheduling woes is on folks radar. Well I know people have had woes with the scheduler forever (I guess that isn't going to change :P). I think people generally lost a bit of interest in trying to improve the situation because of the upstream problem.
Re: [patch] CFS (Completely Fair Scheduler), v2
Ingo Molnar wrote: this is the second release of the CFS (Completely Fair Scheduler) patchset, against v2.6.21-rc7: http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch i'd like to thank everyone for the tremendous amount of feedback and testing the v1 patch got - i could hardly keep up with just reading the mails! Some of the stuff people addressed i couldnt implement yet, i mostly concentrated on bugs, regressions and debuggability. Can I make a suggestion? Would it be possible (from now on) to publish changes relative to the previous patch (eventually leading to a series of patches that describes the evolution of the new scheduler), so that it's easier for us reviewers/critics to see the latest changes? E.g. if I import such changes into something like quilt (using my gquilt GUI wrapper, of course :-)), I can then use meld (or similar) to follow what's going on as suggestions get folded in and bugs get fixed, etc. Thanks Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote: > On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: > > Yup, and progress _is_ happening now, quite rapidly. > > Progress as in progress on Ingo's scheduler. I still don't know how we'd > decide when to replace the mainline scheduler or with what. > > I don't think we can say Ingo's is better than the alternatives, can we? No, that would require massive performance testing of all alternatives. > If there is some kind of bakeoff, then I'd like one of Con's designs to > be involved, and mine, and Peter's... The trouble with a bakeoff is that it's pretty darn hard to get people to test in the first place, and then comes weighing the subjective and hard performance numbers. If they're close in numbers, do you go with the one which starts the least flamewars or what? > Maybe the progress is that more key people are becoming open to the idea > of changing the scheduler. Could be. All was quiet for quite a while, but when RSDL showed up, it aroused enough interest to show that scheduling woes are on folks' radar. -Mike
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote: > On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote: > >Note that I talk of run queues > >not CPUs as I think a shift to multiple CPUs per run queue may be a good > >idea. > > This observation of Peter's is the best thing to come out of this > whole foofaraw. Looking at what's happening in CPU-land, I think it's > going to be necessary, within a couple of years, to replace the whole > idea of "CPU scheduling" with "run queue scheduling" across a complex, > possibly dynamic mix of CPU-ish resources. Ergo, there's not much > point in churning the mainline scheduler through a design that isn't > significantly more flexible than any of those now under discussion. Why? If you do that, then your load balancer just becomes less flexible because it is harder to have tasks run on one or the other. You can have single-runqueue-per-domain behaviour (or close to) just by relaxing all restrictions on idle load balancing within that domain. It is harder to go the other way and place any per-cpu affinity or restrictions with multiple cpus on a single runqueue. > For instance, there are architectures where several "CPUs" > (instruction stream decoders feeding execution pipelines) share parts > of a cache hierarchy ("chip-level multitasking"). On these machines, > you may want to co-schedule a "real" processing task on one pipeline > with a "cache warming" task on the other pipeline -- but only for > tasks whose memory access patterns have been sufficiently analyzed to > write the "cache warming" task code. Some other tasks may want to > idle the second pipeline so they can use the full cache-to-RAM > bandwidth. Yet other tasks may be genuinely CPU-intensive (or I/O > bound but so context-heavy that it's not worth yielding the CPU during > quick I/Os), and hence perfectly happy to run concurrently with an > unrelated task on the other pipeline.
We can do all that now with load balancing, affinities or by shutting down threads dynamically. > There are other architectures where several "hardware threads" fight > over parts of a cache hierarchy (sometimes bizarrely described as > "sharing" the cache, kind of the way most two-year-olds "share" toys). > On these machines, one instruction pipeline can't help the other > along cache-wise, but it sure can hurt. A scheduler designed, tested, > and tuned principally on one of these architectures (hint: > "hyperthreading") will probably leave a lot of performance on the > floor on processors in the former category. > > In the not-so-distant future, we're likely to see architectures with > dynamically reconfigurable interconnect between instruction issue > units and execution resources. (This is already quite feasible on, > say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with > as many Nios II cores as fit on the chip.) Restoring task context may > involve not just MMU swaps and FPU instructions (with state-dependent > hidden costs) but processor reconfiguration. Achieving "fairness" > according to any standard that a platform integrator cares about (let > alone an end user) will require a fairly detailed model of the hidden > costs associated with different sorts of task switch. > > So if you are interested in schedulers for some reason other than a > paycheck, let the distros worry about 5% improvements on x86[_64]. > Get hold of some different "hardware" -- say: > - a Xilinx ML410 if you've got $3K to blow and want to explore > reconfigurable processors; > - a SunFire T2000 if you've got $11K and want to mess with a CMT > system that's actually shipping; > - a QEMU-simulated massively SMP x86 if you're poor but clever > enough to implement funky cross-core cache effects yourself; or > - a cycle-accurate simulator from Gaisler or Virtio if you want a > real research project.
> Then go explore some more interesting regions of parameter space and > see what the demands on mainline Linux will look like in a few years. There are no doubt improvements to be made, but the intent is generally that they be done within the sched-domains framework. I am not aware of a particular need that would be impossible to meet using that topology hierarchy and per-CPU runqueues, and there are added complications involved with multiple CPUs per runqueue.
Re: [v4l-dvb-maintainer] [GIT PATCHES] V4L/DVB updates
> > I have tested these patches with 2.6.20-mh1 + v4l-dvb-b5be3479f070 patchset. > > I also tried 2.6.21-rc6 + v4l-dvb-b5be3479f070 patchset and this > > combination > > also works without OOPS. > > > Yes, that shows that the changesets prevent the oops, but it says > nothing about vanilla 2.6.20.y > > Winfast dongles are both dvb-usb based (DiBcom 3000M-C and DiBcom > > DiB7000P), > > but pluto2 is cardbus (pci) based. > > > just as I figured. The pluto2 test results are great to hear, though -- > thank you. > > I think we can include these patches into 2.6.21 and if we receive any > > problem, we still have 2.6.21.Z for fixing, don't we? > > The stable kernel series is not there for that purpose. It is not there > to encourage a rush of patches into a final kernel release, only to > cause potential problems, with the 2.6.x.y series as a fallback for > fixes. We should avoid the need for such last-minute fixes wherever > possible. Sure, we should do our best to avoid regressions. But, IMO, a driver for a hotpluggable device (USB) that can't support device hotplug is a serious issue. If nobody reports a regression caused by these patches, we should really apply the fix for 2.6.21. -- Cheers, Mauro
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote: > On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote: > > Mike Galbraith wrote: > > > > > > Demystify what? The casual observer need only read either your attempt > > > at writing a scheduler, or my attempts at fixing the one we have, to see > > > that it was high time for someone with the necessary skills to step in. > > > > Make that "someone with the necessary clout". > > No, I was brutally honest to both of us, but quite correct. > > > > Now progress can happen, which was _not_ happening before. > > > > > > > This is true. > > Yup, and progress _is_ happening now, quite rapidly. Progress as in progress on Ingo's scheduler. I still don't know how we'd decide when to replace the mainline scheduler or with what. I don't think we can say Ingo's is better than the alternatives, can we? If there is some kind of bakeoff, then I'd like one of Con's designs to be involved, and mine, and Peter's... Maybe the progress is that more key people are becoming open to the idea of changing the scheduler.
Re: [2/2] 2.6.21-rc7: known regressions
On Mon, Apr 16, 2007 at 10:26:43AM +0100, Richard Purdie wrote: > > > > > CONFIG_FB_BACKLIGHT=y > > > > > CONFIG_ACPI_VIDEO=n > > > > > > > > That also gets me a dead display. Backlight doesn't turn back on. > > > > > > Anything under /sys/class/backlight? > > > > Entries from ibm_acpi. I rmmod'd that, leaving the dir empty, > > and it still fails. > > What happens if you never load ibm-acpi? Same thing. No backlight on resume. I rm'd the .ko, so there's no chance it got loaded. > I'm a bit puzzled as CONFIG_FB_BACKLIGHT doesn't do anything with the > intelfb driver. One thing it does do is set > CONFIG_BACKLIGHT_CLASS_DEVICE. When you disabled FB_BACKLIGHT and got a > working display on resume, was that set and was (or had) ibm-acpi been > loaded? > > A variety of other options such as ACPI_IBM also set > CONFIG_BACKLIGHT_CLASS_DEVICE although without a backlight driver it > will do nothing hence the suspicion is on ibm-acpi, perhaps interacting > with the backlight class badly. > > Does echoing numbers to /sys/class/backlight/ibm_acpi/brightness change > the backlight brightness as expected? /sys/class/backlight/ibm/brightness takes a value from 0 to 7. Starts off with a default of 0. I tried all values in there, and it made no visible difference. But as the no-backlight thing happens without this even loaded, I think this is a separate problem. > If you can ssh into the machine > after it's resumed with the display problem, it would be interesting to > know what the brightness was and if changing it helped too... When the backlight doesn't come on, for some reason, nothing else runs. Capslock works, so it's at least partially alive, but even doing echo mem > /sys/power/state ; echo foo >/bar ; sync results in no /bar being created. Ethernet remains down when it's in this state too. It's the reason it's taken this long to get any debug info out of it at all.
Dave -- http://www.codemonkey.org.uk
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Mon, Apr 16, 2007 at 09:28:24AM -0500, Matt Mackall wrote: > On Mon, Apr 16, 2007 at 05:03:49AM +0200, Nick Piggin wrote: > > I'd prefer if we kept a single CPU scheduler in mainline, because I > > think that simplifies analysis and focuses testing. > > I think you'll find something like 80-90% of the testing will be done > on the default choice, even if other choices exist. So you really > won't have much of a problem here. > > But when the only choice for other schedulers is to go out-of-tree, > then only 1% of the people will try it out and those people are > guaranteed to be the ones who saw scheduling problems in mainline. > So the alternative won't end up getting any testing on many of the > workloads that work fine in mainstream so their feedback won't tell > you very much at all. Yeah I concede that perhaps it is the only way to get things going any further. But how do we decide if and when the current scheduler should be demoted from default, and which should replace it?
Re: [BUG] 2.6.21-rc7 hpt366 driver broken
On Mon, 16 Apr 2007 19:43:03 -0700 Mike Mattie <[EMAIL PROTECTED]> wrote: > On Mon, 16 Apr 2007 18:21:12 -0700 > Mike Mattie <[EMAIL PROTECTED]> wrote: > > > On Mon, 16 Apr 2007 16:36:13 +0200 > > Adrian Bunk <[EMAIL PROTECTED]> wrote: > > > > > [ Cc's added, full bug report was in > > > http://lkml.org/lkml/2007/4/16/18 ] > > > > > > On Mon, Apr 16, 2007 at 04:38:22AM -0700, Mike Mattie wrote: > > > > On Sun, 15 Apr 2007 22:48:46 -0700 > > > > Mike Mattie <[EMAIL PROTECTED]> wrote: > > > > > > > > > Hello, > > > > > > > > > > I am testing the 2.6.21-rc7 kernel release. The IDE hpt366 > > > > > driver is crashing hanging the boot. I have basically the same > > > > > config as 2.6.20.7 which works fine (except for netconsole > > > > > mentioned in a previous mail). > > > > > > > > > > here is the hand-copied info: > > > > > > > > > > * "unable to handle paging request" , null deref > > > > > * EIP @ init_chipset_hpt366 > > > > > > > > > > > > > > I am running a git-bisect to see if I can resolve it to a > > > > > commit. > > > > > > > > This was identified as the first broken commit: > > > > > > > > commit 7b73ee05d0acb926923d43d78b61add776ea4bb1 > > > > Author: Sergei Shtylyov <[EMAIL PROTECTED]> > > > > Date: Wed Feb 7 18:18:16 2007 +0100 > > > > > > > > hpt366: init code rewrite > > > > > > > > Reverting is conflicted so it will be a bit longer before I > > > > pin-point any other build-breaks. > > > > > > Thanks for your report. > > > > > > Can you use a digital camera for taking a photograph of the crash? > > > > I can later on tonight, by about 11PM west coast. I also saw > > some hex offsets after the function pointed to by EIP, is there > > a way to decode that to a line number ? I have debugging symbols > > enabled. > > > > I am also doing printk breadcrumbs to pin it down to a block > > or a line. 
> > I have narrowed the crash with breadcrumbs down to these lines: > > > /* >* Only try the DPLL if we don't have a table for the PCI > clock that >* we are running at for HPT370/A, always use it for > anything newer... * >* NOTE: Using the internal DPLL results in slow reads on 33 > MHz PCI. >* We also don't like using the DPLL because this causes > glitches >* on PRST-/SRST- when the state engine gets reset... >*/ > if (info->chip_type >= HPT374 || info->settings[clock] == > NULL) { u16 f_low, delta = pci_clk < 50 ? 2 : 4; > int adjust; > > printk(KERN_INFO "inside the if\n"); > >/* > * Select 66 MHz DPLL clock only if UltraATA/133 > mode is > * supported/enabled, use 50 MHz DPLL clock > otherwise... */ > if (info->max_mode == 0x04) { > dpll_clk = 66; > clock = ATA_CLOCK_66MHZ; > } else if (dpll_clk) { /* HPT36x chips don't > have DPLL */ dpll_clk = 50; > clock = ATA_CLOCK_50MHZ; > } > > if (info->settings[clock] == NULL) { Crashes here. Since info is deref'd all over the place, I am assuming it is the array that is blowing up. I printk'd the value of clock, which is "4". That array is either not set up correctly, or the index is out of bounds (speculation). > printk(KERN_ERR "%s: unknown bus timing!\n", > name); kfree(info); > return -EIO; > } > > printk(KERN_INFO "select DPLL clock\n"); > > This is right around 1171 , (skewed by the crumbs I added). The last > message I receive is "inside if" , it dies before "select DPLL clock". > > Without knowing much about the structs I am not sure what to > print-out. I will narrow it further, and maybe even compare against > what the old working kernel had for variable values. That would take > some time though. > > > > > > cu > > > Adrian > > > > > > -- > > > > > >"Is there not promise of rain?" Ling Tan asked suddenly out > > > of the darkness. There had been need of rain for many > > > days. "Only a promise," Lao Er said. > > >Pearl S. Buck - Dragon Seed > > > signature.asc Description: PGP signature
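Mike's speculation is that info->settings[clock] is being indexed out of bounds. Below is a minimal userspace sketch of the guard pattern: check the index against the array size before the array is touched, and treat a missing table as an error. The enum, the struct and the table contents are hypothetical stand-ins, not the real hpt366.c definitions, and the sketch does not diagnose the actual bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock indices, loosely modelled on the 25/33/40/50/66 MHz
 * clocks discussed in the thread; the real driver's enum may differ. */
enum sketch_clock {
	SKETCH_CLOCK_25MHZ,
	SKETCH_CLOCK_33MHZ,
	SKETCH_CLOCK_40MHZ,
	SKETCH_CLOCK_50MHZ,
	SKETCH_CLOCK_66MHZ,
	SKETCH_NUM_CLOCKS
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the driver's per-chip info structure:
 * one (possibly NULL) timing table per supported clock. */
struct sketch_info {
	const unsigned int *settings[SKETCH_NUM_CLOCKS];
};

/* Return the timing table for @clock, or NULL when @info is NULL, the
 * index is past the end of the array, or no table exists for that clock.
 * The bounds check runs before settings[] is ever read. */
static const unsigned int *sketch_lookup_timing(const struct sketch_info *info,
						 size_t clock)
{
	if (info == NULL || clock >= SKETCH_ARRAY_SIZE(info->settings))
		return NULL;
	return info->settings[clock];
}
```

In this sketch clock == 4 is a valid index, so the lookup simply returns whatever table (possibly NULL) sits in that slot; only an index of 5 or more is rejected by the bounds check.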
Re: BUG: Bad page state errors during kernel make (resolved)
Zach Carter wrote: Do you think there might be other bad hw, or another explanation? Well, after updating the BIOS for the motherboard, I was able to rebuild the kernel 6 times in a row with no page state errors. I noticed that the recent BIOS update includes "Enhanced compatibility with Linux": http://www.abit-usa.com/products/mb/bios.php?categories=1=316 In case anyone searching the ML archive has the same problem, the motherboard is an ABIT KN9 ULTRA Socket AM2 NVIDIA nForce 570 Ultra MCP ATX -Zach
Re: CPU_IDLE prevents resuming from STR [was: Re: 2.6.21-rc6-mm1]
On Mon, 16 Apr 2007, Shaohua Li wrote: On Sat, 2007-04-14 at 01:45 +0200, Mattia Dongili wrote: ... please check if the patch at http://marc.info/?l=linux-acpi=117523651630038=2 fixed the issue I have the same system as Mattia, and when I applied this patch and turned CPU_IDLE back on, I got a panic on boot. Unfortunately, the EIP scrolled off screen, so I can't get a line number. (I had the same STR breakage as him; STR did not work with CPU_IDLE turned on, and it did work with CPU_IDLE turned off.) I'm running +rc6+mm(April 11) on a Sony VAIO SZ. joshua
Re: [PATCH] Blackfin: blackfin on-chip SPI controller driver
Cleaning out some of my pending-reviews queue ... after you address these comments I think what I'd like to do is sign off on one clean patch, rather than initial-plus-cleanups. On Monday 05 March 2007 2:41 am, Wu, Bryan wrote: > --- linux-2.6.orig/drivers/spi/Kconfig 2007-03-01 11:33:07.0 > +0800 > +++ linux-2.6/drivers/spi/Kconfig 2007-03-01 11:40:22.0 +0800 I'm adjusting this to address the later patches you sent. One global comment I'll make, just in case -- please make sure all your line-start indents only include tabs, and there's no space-at-end-of-line stuff going on, or lines wrapping past column 80. I did this review in KMail, which doesn't highlight such minor errors; and I suspect you're mostly OK, but for a new driver there's no reason not to be 100% OK in those particular respects! (And I *did* notice one of your cleanup patches clearly adding tabs-then-spaces indents.) > @@ -156,7 +156,11 @@ > # > # Add new SPI protocol masters in alphabetical order above this line > # > - > +config SPI_BFIN > + tristate "SPI controller driver for ADI Blackfin5xx" > + depends on SPI_MASTER && BFIN > + help > + This is the SPI controller master driver for Blackfin 5xx processor. Please put this in Kconfig up with the other SPI controller drivers, in alphabetical order. Just like the comment says. Likewise, please add it to the Makefile in alphabetical order. > --- /dev/null 1970-01-01 00:00:00.0 + > +++ linux-2.6/drivers/spi/spi_bfin5xx.c 2007-03-01 11:40:22.0 > +0800 > +#ifdef DEBUG > +#define ASSERT(expr) \ > + if (!(expr)) { \ > + printk(KERN_DEBUG "assertion failed! %s[%d]: %s\n", \ > +__FUNCTION__, __LINE__, #expr); \ > + panic(KERN_DEBUG "%s", __FUNCTION__); \ Seems like either WARN_ON(!(expr)) or BUG_ON(!(expr)) will be better. The general rule of BUG variants is: don't, unless the system really can't continue operating. (I see a later patch removed this entirely; good.)
> + } > +#else > +#define ASSERT(expr) > +#endif > + > +#define IS_DMA_ALIGNED(x) (((u32)(x)&0x07)==0) > + > +#define DEFINE_SPI_REG(reg, off) \ > +static inline u16 read_##reg(void) \ > +{ return *(volatile unsigned short*)(SPI0_REGBASE + off); } \ > +static inline void write_##reg(u16 v) \ > +{*(volatile unsigned short*)(SPI0_REGBASE + off) = v;\ > + SSYNC();} These should be readw() and writew() or similar... also, I can't tell what SSYNC() does, but it sure looks like something that shouldn't be hidden like that. I/O memory should be mapped such that writes don't get re-ordered. And flushing any write buffer should not be forced in such low-level accessors ... if it's needed, it should be done at the relevant points in the driver. (Which you seem to do in a few places below. The duplication is undesirable.) > + > +DEFINE_SPI_REG(CTRL, 0x00) ... this particular style of register accessor is not generally used in Linux. The typical style is u16 value = __raw_readw(SPI0_REGBASE + SPI_CTRL); __raw_writew(value, SPI0_REGBASE + SPI_CTRL); or wrapped in macros so spi_readw(CTRL) and spi_writew(CTRL, value) work. Of course, SPI1/SPI2/etc should be supported too ... so it's common to have those take a pointer to some controller struct with a "void __iomem *regs" pointer to the registers for that instance. spi_readw(master, CTRL) etc. > +#define START_STATE ((void*)0) > +#define RUNNING_STATE ((void*)1) > +#define DONE_STATE ((void*)2) > +#define ERROR_STATE ((void*)-1) Normally states would be represented by enum values, which among other things support "switch (state) { ... }" state machine code. This driver is full of uncommon idioms, which will make it harder for most kernel developers to dive in and help. Even if you have a style guide internal to Analog which says to do things this way ... don't.
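The two suggestions above (accessors that take a per-controller pointer, and an enum instead of ((void*)N) state values) can be sketched together in userspace C. This is a simulation under stated assumptions: the register offsets are made up, the register block is plain memory standing in for an ioremap()ed "void __iomem *regs", and all sketch_* names are hypothetical. A real driver would use readw()/writew() or the __raw_ variants on the mapped address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices (in 16-bit units) for this simulation;
 * the real Blackfin SPI layout may differ. */
enum sketch_spi_reg {
	SKETCH_SPI_CTRL = 0,
	SKETCH_SPI_FLAG = 1,
	SKETCH_SPI_STAT = 2,
	SKETCH_SPI_NREGS
};

/* Explicit transfer states instead of ((void *)0), ((void *)1), ... */
enum sketch_xfer_state {
	SKETCH_START,
	SKETCH_RUNNING,
	SKETCH_DONE,
	SKETCH_ERROR
};

/* Per-controller state: each instance (SPI0, SPI1, ...) carries its own
 * register pointer. In a kernel driver this would be "void __iomem *regs"
 * from ioremap(); here it is ordinary memory so the sketch runs anywhere. */
struct sketch_spi_master {
	volatile uint16_t *regs;
	enum sketch_xfer_state state;
};

/* Accessors take the controller, so no instance is hard-wired. */
static inline uint16_t sketch_spi_readw(struct sketch_spi_master *m,
					enum sketch_spi_reg r)
{
	return m->regs[r];
}

static inline void sketch_spi_writew(struct sketch_spi_master *m,
				     enum sketch_spi_reg r, uint16_t v)
{
	m->regs[r] = v;
}

/* One step of a switch-based transfer state machine. */
static void sketch_step(struct sketch_spi_master *m, int error)
{
	switch (m->state) {
	case SKETCH_START:
		m->state = error ? SKETCH_ERROR : SKETCH_RUNNING;
		break;
	case SKETCH_RUNNING:
		m->state = error ? SKETCH_ERROR : SKETCH_DONE;
		break;
	case SKETCH_DONE:
	case SKETCH_ERROR:
		/* terminal states: nothing to do */
		break;
	}
}
```

Because each struct sketch_spi_master carries its own regs pointer, two instances never share register state, and the switch makes every state transition explicit (and lets the compiler warn about unhandled states).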
> + > +#define QUEUE_RUNNING 0 > +#define QUEUE_STOPPED 1 > + > +int dma_requested; > +char chip_select_flag; These should probably be members of the per-controller state struct, and otherwise should certainly be static. This driver exports a LOT of stuff that should be static ... > + > +struct driver_data { Not the most explanatory of names. Could you do better? > + /* Driver model hookup */ > + struct platform_device *pdev; > + > + /* SPI framework hookup */ > + struct spi_master *master; > + > + /* BFIN hookup */ > + struct bfin5xx_spi_master *master_info; I would have assumed this struct would *BE* the Blackfin-specific spi_master state ... > + > + /* Driver message queue */ > + struct workqueue_struct *workqueue; > + struct work_struct pump_messages; > + spinlock_t lock; > + struct list_head queue; > + int busy; > + int run; > + > + /* Message Transfer pump */ > + struct tasklet_struct pump_transfers; > + > + /* Current message transfer state info */ > +
Re: [Patch -mm 0/3] RFC: module unloading vs. release function
On Tue, 2007-04-17 at 00:44 +0400, Alexey Dobriyan wrote: > On Mon, Apr 16, 2007 at 03:38:52PM -0400, Alan Stern wrote: > > 3. Change the module code so that rmmod can return _before_ the > > module is actually unloaded from memory (but after the module's > > exit routine has completed). This will lead to more problems. > > For example, what if someone tries to modprobe my_module back > > again before it has finished unloading? > > This problem (or its absence) must be already in tree: module_mutex is > dropped for the duration of ->exit() function, so init_module(2) could > load new old module meanwhile. Only if you give it a different name when loading it the second time. Rusty.
Re: CPU_IDLE prevents resuming from STR [was: Re: 2.6.21-rc6-mm1]
On Mon, 2007-04-16 at 22:50 -0400, Joshua Wise wrote: > On Mon, 16 Apr 2007, Shaohua Li wrote: > > On Sat, 2007-04-14 at 01:45 +0200, Mattia Dongili wrote: > >> ... > > please check if the patch at > > http://marc.info/?l=linux-acpi=117523651630038=2 fixed the issue > > I have the same system as Mattia, and when I applied this patch and turned > CPU_IDLE back on, I got a panic on boot. Unfortunately, the EIP scrolled off > screen, so I can't get a line number. > > (I had the same STR breakage as him; STR did not work with CPU_IDLE turned > on, and it did work with CPU_IDLE turned off.) > > I'm running +rc6+mm(April 11) on a Sony VAIO SZ. Is it possible for you to get the log over a serial console? I'd expect you can at least see some log output on the screen; if you don't have a serial port, please write it down. The boot panic surprises me, as it works here. Thanks, Shaohua
Re: [Patch -mm 3/3] RFC: Introduce kobject->owner for refcounting.
On Mon, 2007-04-16 at 15:53 -0400, Alan Stern wrote: > The fundamental rule is that whenever you hand out a pointer to a routine > living in a module, the receiver has to increment the module's refcount. > But the driver core violates this rule all over the place. Hi Alan, Your rule is overly simplistic, unfortunately. You have two choices: take a reference count, *or* ensure that the reference will go away when the module's cleanup routine is called. Network drivers are a classic example of the latter. Note that you cannot do both: if the cleanup routine calls something which drops a reference count, it implies that the cleanup routine needs to be called with non-zero reference count, and it won't be (ignoring --force). I hope that clarifies? Rusty.
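Rusty's second option (no refcount, but the cleanup routine guarantees the reference is gone) can be illustrated with a small userspace sketch. All names are hypothetical, and the sketch is single-threaded: a real implementation would also have to make sure no caller is still running the callback when the module goes away, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "core" that stores one callback handed to it by a module.
 * It deliberately does NOT take a reference on the module. */
typedef int (*sketch_handler_fn)(int);

static sketch_handler_fn sketch_registered;

static void sketch_register(sketch_handler_fn fn)
{
	sketch_registered = fn;
}

static void sketch_unregister(sketch_handler_fn fn)
{
	if (sketch_registered == fn)
		sketch_registered = NULL;
}

/* Core-side call path: only calls into the module while registered. */
static int sketch_core_dispatch(int arg)
{
	return sketch_registered ? sketch_registered(arg) : -1;
}

/* "Module" side. */
static int sketch_module_handler(int arg)
{
	return arg * 2;
}

static void sketch_module_init(void)
{
	sketch_register(sketch_module_handler);
}

/* The cleanup routine removes the core's pointer before the module's code
 * could disappear; that guarantee is what makes skipping the refcount safe. */
static void sketch_module_exit(void)
{
	sketch_unregister(sketch_module_handler);
}
```

After sketch_module_exit(), the core no longer holds a pointer into the "module", so nothing needs to pin it with a reference count.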
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote: > Mike Galbraith wrote: > > > > Demystify what? The casual observer need only read either your attempt > > at writing a scheduler, or my attempts at fixing the one we have, to see > > that it was high time for someone with the necessary skills to step in. > > Make that "someone with the necessary clout". No, I was brutally honest to both of us, but quite correct. > > Now progress can happen, which was _not_ happening before. > > > > This is true. Yup, and progress _is_ happening now, quite rapidly. -Mike
Re: [BUG] 2.6.21-rc7 hpt366 driver broken
On Mon, 16 Apr 2007 18:21:12 -0700 Mike Mattie <[EMAIL PROTECTED]> wrote: > On Mon, 16 Apr 2007 16:36:13 +0200 > Adrian Bunk <[EMAIL PROTECTED]> wrote: > > > [ Cc's added, full bug report was in > > http://lkml.org/lkml/2007/4/16/18 ] > > > > On Mon, Apr 16, 2007 at 04:38:22AM -0700, Mike Mattie wrote: > > > On Sun, 15 Apr 2007 22:48:46 -0700 > > > Mike Mattie <[EMAIL PROTECTED]> wrote: > > > > > > > Hello, > > > > > > > > I am testing the 2.6.21-rc7 kernel release. The IDE hpt366 > > > > driver is crashing hanging the boot. I have basically the same > > > > config as 2.6.20.7 which works fine (except for netconsole > > > > mentioned in a previous mail). > > > > > > > > here is the hand-copied info: > > > > > > > > * "unable to handle paging request" , null deref > > > > * EIP @ init_chipset_hpt366 > > > > > > > > > > > I am running a git-bisect to see if I can resolve it to a > > > > commit. > > > > > > This was identified as the first broken commit: > > > > > > commit 7b73ee05d0acb926923d43d78b61add776ea4bb1 > > > Author: Sergei Shtylyov <[EMAIL PROTECTED]> > > > Date: Wed Feb 7 18:18:16 2007 +0100 > > > > > > hpt366: init code rewrite > > > > > > Reverting is conflicted so it will be a bit longer before I > > > pin-point any other build-breaks. > > > > Thanks for your report. > > > > Can you use a digital camera for taking a photograph of the crash? > > I can later on tonight, by about 11PM west coast. I also saw > some hex offsets after the function pointed to by EIP, is there > a way to decode that to a line number ? I have debugging symbols > enabled. > > I am also doing printk breadcrumbs to pin it down to a block > or a line. I have narrowed the crash with breadcrumbs down to these lines: /* * Only try the DPLL if we don't have a table for the PCI clock that * we are running at for HPT370/A, always use it for anything newer... * * NOTE: Using the internal DPLL results in slow reads on 33 MHz PCI. 
* We also don't like using the DPLL because this causes glitches * on PRST-/SRST- when the state engine gets reset... */ if (info->chip_type >= HPT374 || info->settings[clock] == NULL) { u16 f_low, delta = pci_clk < 50 ? 2 : 4; int adjust; printk(KERN_INFO "inside the if\n"); /* * Select 66 MHz DPLL clock only if UltraATA/133 mode is * supported/enabled, use 50 MHz DPLL clock otherwise... */ if (info->max_mode == 0x04) { dpll_clk = 66; clock = ATA_CLOCK_66MHZ; } else if (dpll_clk) { /* HPT36x chips don't have DPLL */ dpll_clk = 50; clock = ATA_CLOCK_50MHZ; } if (info->settings[clock] == NULL) { printk(KERN_ERR "%s: unknown bus timing!\n", name); kfree(info); return -EIO; } printk(KERN_INFO "select DPLL clock\n"); This is right around line 1171 (skewed by the crumbs I added). The last message I receive is "inside the if"; it dies before "select DPLL clock". Without knowing much about the structs I am not sure what to print out. I will narrow it further, and maybe even compare against what the old working kernel had for variable values. That would take some time though. > > > cu > > Adrian > > > > -- > > > >"Is there not promise of rain?" Ling Tan asked suddenly out > > of the darkness. There had been need of rain for many days. > >"Only a promise," Lao Er said. > >Pearl S. Buck - Dragon Seed > > signature.asc Description: PGP signature
Re: Major qla2xxx regression on sparc64
On Mon, 16 Apr 2007, David Miller wrote: > From: Andrew Vasquez <[EMAIL PROTECTED]> > Date: Mon, 16 Apr 2007 16:47:05 -0700 > > > Dave, according to your earlier emails, the qla2xxx driver worked > > 'fine' in driver versions before commit > > 7aef45ac92f49e76d990b51b7ecd714b9a608be1. If that were the case, then > > you would have seen the warning messages: > > > > ... > > qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet " > > "invalid -- WWPN) defaults.\n"); > > I have in fact seen the message several times and that messages gives > me no reason to believe something needs to be fixed. > > It should have said "PLEASE REPORT THIS to [EMAIL PROTECTED]" or > something similar to indicate the severity better. > > "An invalid WWPN, what's that?" said the user. :) > > How about "FC IDs may conflict and cause miscommunication! Please > report to driver author so this can be fixed!" or similar? That verbiage sounds fine -- so would you consider the previous patch I submitted (with module parameter) along with the wording above? I'm in transit for a redeye to NY, so I won't be able to modify the patch. If you would be amenable to the above, Seokmann, could you rework the patch?
2.6.21-rc6-mm1 ATA HPT37x regression
Hi Jeff and crew, I was just testing out 2.6.21-rc6-mm1 to test some Cyclades patches and I noticed that my HPT302 (rev1) controller and its pair of 120GB WD disks are no longer detected, and I get the following in the dmesg logs: [ 148.121490] hpt37x: DPLL did not stabilize. Under 2.6.21-rc6, by contrast, I got the following: [ 173.749349] pata_hpt37x: BIOS has not set timing clocks. [ 173.752949] hpt37x: HPT302: Bus clock 33MHz. [ 173.754409] ACPI: PCI Interrupt :03:06.0[A] -> GSI 18 (level, low) -> IRQ 18 [ 173.758403] ata5: PATA max UDMA/133 cmd 0x0001ecf8 ctl 0x0001ecf2 bmdma 0x0001e800 irq 18 [ 173.761396] ata6: PATA max UDMA/133 cmd 0x0001ece0 ctl 0x0001ecda bmdma 0x0001e808 irq 18 [ 173.764319] scsi6 : pata_hpt37x [ 173.928997] ATA: abnormal status 0x78 on port 0x0001ecff [ 173.930511] scsi7 : pata_hpt37x [ 174.094906] ATA: abnormal status 0x8 on port 0x0001ece7 Here's my lspci information on the board; it's an add-on card. My apologies for the crappy word wrapping, xterms inside screen, etc. 03:06.0 RAID bus controller: Triones Technologies, Inc. HPT302/302N (rev 01) Subsystem: Triones Technologies, Inc. Unknown device 0001 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-
Memory Allocation
Good evening gents! I need some help allocating memory and understanding how the system allocates memory with physical versus virtual page tables. Please consider the following snippet of code. Please, no wisecracks about bad code; it was written in 30 seconds in haste :-)

#include <iostream>
#include <cstdlib>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

const static u_long kMaxSize = (2048 * 2048 * 256);

void *msg(void *ptr);
static u_long threads_done = 0;

int main(int argc, char *argv[])
{
    pthread_t thread1;
    pthread_t thread2;
    const char *message1 = "Thread 1";
    const char *message2 = "Thread 2";
    int iret1;
    int iret2;

    iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
    iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);

    //pthread_join(thread1, NULL);
    //pthread_join(thread2, NULL);

    while (threads_done < 2) {
        std::cout << "Threads complete: " << threads_done << std::endl;
        sleep(3);
    }
    exit(0);
}

void *msg(void *ptr)
{
    const char *message = (const char *) ptr;
    //
    // Equal to 1 bank per thread of 256 each 4MP image buffers. 2GB.
    //
    char *buffer = new char[kMaxSize];
    u_long max = kMaxSize;
    //
    // Init each buffer to 'something'.
    //
    for (u_long inx = 0; inx < max; inx++) {
        if (inx % 10240 == 0) {
            std::cout << message << ": Index: " << inx << std::endl;
        }
        buffer[inx] = inx;
    }
    delete[] buffer;   // memory from new[] must be released with delete[], not free()
    threads_done++;    // note: not synchronized between the two threads
    return NULL;
}

My test machine is a Dell Precision 490 with dual 5140 processors and 3GB of RAM. If I reduce kMaxSize to (2048 * 2048 * 236), it works. However, I need to allocate an array of char that is (2048 * 2048 * 256) and maybe even as large as (2048 * 2048 * 512). Obviously I have enough physical memory in the box to do this. However, I suspect that I'm running out of page table entries. Please correct me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works; when I increment to 256 or 512 it fails, and my suspicion is that I just don't have enough kernel memory to map this much memory in user space. Because of a piece of 3rd party hardware, I'm forced to run the kernel in the 4GB memory model.
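A side note on the sizes above, assuming the usual 32-bit int on x86: (2048 * 2048 * 256) is 2^30 and fits in an int, but (2048 * 2048 * 512) is 2^31, one more than INT_MAX, so that expression overflows signed int before it is ever converted to u_long (GCC normally warns about this). A minimal C sketch of computing the bank size in unsigned long arithmetic instead; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* One 4-megapixel, 8-bit image: 2048 x 2048 bytes. */
#define SKETCH_IMAGE_BYTES (2048UL * 2048UL)

/* Bytes needed for a bank of @nimages images. The UL suffix makes the
 * multiplication happen in unsigned long, so 512 images (2^31 bytes)
 * does not overflow the way (2048 * 2048 * 512) overflows int. */
static unsigned long sketch_bank_bytes(unsigned long nimages)
{
	return SKETCH_IMAGE_BYTES * nimages;
}
```

With this, sketch_bank_bytes(512) evaluates to 2147483648 on both 32-bit and 64-bit x86, instead of an overflowed int.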
What I need to be able to do is allocate an array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I need the addresses that I get back to be contiguous; that's just the way my 3rd party hardware works. I'm inclined to believe that this is not specifically a Linux problem but maybe an architecture problem??? But maybe there is some kind of workaround in the kernel for it??? I'd find it hard to believe that I'm the first one that ever needed to use this much memory. I ran this same code on two different Macs. One of them, a PowerBook G4 with 4GB of RAM, was successful. The other, a MacBook Pro with 4GB of RAM, failed. Both were running OS 10.4.9. And of course it runs just lovely on my Sun workstation with Solaris. Thus, I'm thinking it's an Intel/x86 issue! How the heck do I get past this problem in Linux on the x86 platform??? Thanks, -brian [EMAIL PROTECTED]
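Two points that may help frame the question, offered as general background rather than a diagnosis. Memory returned by new or malloc() is contiguous only in the process's virtual address space; the physical pages behind it are normally scattered, so hardware that needs physically contiguous addresses usually needs a kernel driver to set the buffer up. Separately, on 32-bit x86 with the common 3G/1G split, a process has roughly 3GB of virtual address space, and each large request must fit in a single free gap between the program, shared libraries and thread stacks. The sketch below (hypothetical helper names, sizes chosen only for illustration) reserves a large anonymous region with mmap() and reports failure instead of crashing, which makes it easy to probe how big a single virtually contiguous range can get.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict compiler modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Reserve @len bytes of virtually contiguous, zero-filled anonymous memory.
 * Returns NULL (after printing the reason) if the kernel refuses, e.g. when
 * no free virtual range of that size exists. The physical pages backing
 * the region are NOT guaranteed to be contiguous. */
static void *sketch_reserve(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return NULL;
	}
	return p;
}

/* Release a region obtained from sketch_reserve(). */
static void sketch_release(void *p, size_t len)
{
	if (p != NULL)
		munmap(p, len);
}
```

For example, calling sketch_reserve((size_t)1 << 30) twice from the same process shows whether two 1GB ranges fit; a NULL return (perror prints the reason) usually points at address-space or overcommit limits rather than a lack of physical RAM.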
[PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver
Hi, I encountered the following kernel panic. The cause of this problem was a NULL pointer access in check_modem_status() in 8250.c. I confirmed this problem is fixed by the attached patch, but I don't know whether this is the correct fix. sadc[4378]: NaT consumption 2216203124768 [1] Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan container button sg e100 eepro100 mii ehci_hcd ohci_hcd Pid: 4378, CPU 0, comm: sadc psr : 1210085a2010 ifs : 8289 ip : [] Not tainted ip is at check_modem_status+0xf1/0x360 unat: pfs : 0289 rsc : 0003 rnat: 8000cc18 bsps: pr : 00aa6a99 ldrs: ccv : fpsr: 0009804c8a70033f csd : ssd : b0 : a00100481fb0 b6 : a001004822e0 b7 : a00100477f20 f6 : 1003e f7 : 0ffdba200 f8 : 100018000 f9 : 10002a000 f10 : 0fffdc8c0 f11 : 1003e r1 : a00100b9af40 r2 : 0008 r3 : a00100ad4e21 r8 : 00bb r9 : 0001 r10 : r11 : a00100ad4d58 r12 : e37b7df0 r13 : e37b r14 : 0001 r15 : 0018 r16 : a00100ad4d6c r17 : r18 : r19 : r20 : a0010099bc88 r21 : 00bb r22 : 00bb r23 : c003fc0ff3fe r24 : c003fc00 r25 : 000ff3fe r26 : a001009b7ad0 r27 : 0001 r28 : a001009b7ad8 r29 : r30 : a001009b7ad0 r31 : a001009b7ad0 Call Trace: [] show_stack+0x40/0xa0 sp=e37b7810 bsp=e37b1118 [] show_regs+0x840/0x880 sp=e37b79e0 bsp=e37b10c0 [] die+0x1c0/0x2c0 sp=e37b79e0 bsp=e37b1078 [] die_if_kernel+0x50/0x80 sp=e37b7a00 bsp=e37b1048 [] ia64_fault+0x11e0/0x1300 sp=e37b7a00 bsp=e37b0fe8 [] ia64_leave_kernel+0x0/0x280 sp=e37b7c20 bsp=e37b0fe8 [] check_modem_status+0xf0/0x360 sp=e37b7df0 bsp=e37b0fa0 [] serial8250_get_mctrl+0x20/0xa0 sp=e37b7df0 bsp=e37b0f80 [] uart_read_proc+0x250/0x860 sp=e37b7df0 bsp=e37b0ee0 [] proc_file_read+0x1d0/0x4c0 sp=e37b7e10 bsp=e37b0e80 [] vfs_read+0x1b0/0x300 sp=e37b7e20 bsp=e37b0e30 [] sys_read+0x70/0xe0 sp=e37b7e20 bsp=e37b0db0 [] ia64_ret_from_syscall+0x0/0x20 sp=e37b7e30 bsp=e37b0db0 [] __kernel_syscall_via_break+0x0/0x20 sp=e37b8000 bsp=e37b0db0 Thanks, Taku Izumi Fix the possible NULL pointer access in check_modem_status() in 8250.c.
check_modem_status() accesses the 'info' member of the uart_port structure, but that member is not initialized until uart_open() is called. check_modem_status() can be reached through /proc/tty/driver/serial before uart_open() has been called. Signed-off-by: Kenji Kaneshige <[EMAIL PROTECTED]> Signed-off-by: Taku Izumi <[EMAIL PROTECTED]> --- drivers/serial/8250.c |3 ++- 1 files changed, 2 insertions(+), 1 deletion(-) Index: linux-2.6.21-rc5/drivers/serial/8250.c === --- linux-2.6.21-rc5.orig/drivers/serial/8250.c 2007-03-26 09:14:37.0 +0900 +++ linux-2.6.21-rc5/drivers/serial/8250.c 2007-04-13 12:06:52.0 +0900 @@ -1310,7 +1310,8 @@ { unsigned int status = serial_in(up, UART_MSR); - if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI) { + if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI && + up->port.info != NULL) { if (status & UART_MSR_TERI) up->port.icount.rng++; if (status & UART_MSR_DDSR)
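The patch's idea is to skip the modem-status delta handling when up->port.info has not been set up yet, which is the case when the port is read via /proc before uart_open(). A simplified userspace sketch of that guard follows; the structs and field names are hypothetical stand-ins for uart_8250_port/uart_port, the bit values are illustrative, and the real function does more with info than bump a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values for this sketch only. */
#define SKETCH_MSR_ANY_DELTA 0x0fu
#define SKETCH_MSR_TERI      0x04u
#define SKETCH_IER_MSI       0x08u

/* Stand-in for the state that only exists once the port is opened. */
struct sketch_info {
	int wakeups;
};

/* Stand-in for the port: info stays NULL until the open path sets it. */
struct sketch_port {
	struct sketch_info *info;
	unsigned int ier;
	unsigned long rng_count;
};

/* Mirror of the patched condition: only touch delta state (and info) when
 * there is a delta, MSI is enabled, and info actually exists. The status
 * is returned either way, which is what the /proc read path needs. */
static unsigned int sketch_check_modem_status(struct sketch_port *p,
					      unsigned int status)
{
	if ((status & SKETCH_MSR_ANY_DELTA) && (p->ier & SKETCH_IER_MSI) &&
	    p->info != NULL) {
		if (status & SKETCH_MSR_TERI)
			p->rng_count++;
		p->info->wakeups++;
	}
	return status;
}
```

A never-opened port (info == NULL) still gets its status returned, while nothing behind info is touched.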
[PATCH 001 of 2] knfsd: Use a spinlock to protect sk_info_authunix
sk_info_authunix is not being protected properly so the object that it points to can be cache_put twice, leading to corruption. We borrow svsk->sk_defer_lock to provide the protection. We should probably rename that lock to have a more generic name - later. Thanks to Gabriel for reporting this. Cc: Greg Banks <[EMAIL PROTECTED]> Cc: Gabriel Barazer <[EMAIL PROTECTED]> Signed-off-by: Neil Brown <[EMAIL PROTECTED]> ### Diffstat output ./net/sunrpc/svcauth_unix.c | 21 - 1 file changed, 16 insertions(+), 5 deletions(-) diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c --- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:42:14.0 +1000 +++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:42:21.0 +1000 @@ -383,7 +383,10 @@ void svcauth_unix_purge(void) static inline struct ip_map * ip_map_cached_get(struct svc_rqst *rqstp) { - struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix; + struct ip_map *ipm; + struct svc_sock *svsk = rqstp->rq_sock; + spin_lock_bh(&svsk->sk_defer_lock); + ipm = svsk->sk_info_authunix; if (ipm != NULL) { if (!cache_valid(&ipm->h)) { /* @@ -391,12 +394,14 @@ ip_map_cached_get(struct svc_rqst *rqstp * remembered, e.g. by a second mount from the * same IP address.
*/ - rqstp->rq_sock->sk_info_authunix = NULL; + svsk->sk_info_authunix = NULL; + spin_unlock_bh(&svsk->sk_defer_lock); cache_put(&ipm->h, &ip_map_cache); return NULL; } cache_get(&ipm->h); } + spin_unlock_bh(&svsk->sk_defer_lock); return ipm; } @@ -405,9 +410,15 @@ ip_map_cached_put(struct svc_rqst *rqstp { struct svc_sock *svsk = rqstp->rq_sock; - if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL) - svsk->sk_info_authunix = ipm; /* newly cached, keep the reference */ - else + spin_lock_bh(&svsk->sk_defer_lock); + if (svsk->sk_sock->type == SOCK_STREAM && + svsk->sk_info_authunix == NULL) { + /* newly cached, keep the reference */ + svsk->sk_info_authunix = ipm; + ipm = NULL; + } + spin_unlock_bh(&svsk->sk_defer_lock); + if (ipm) cache_put(&ipm->h, &ip_map_cache); }
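The heart of patch 1 is a common pattern: decide under the lock whether the cache slot takes over the caller's reference, and drop any leftover reference only after unlocking, so each reference is put exactly once. A userspace sketch with a pthread mutex and C11 atomics follows; struct sketch_obj, the single cache slot and the helpers are hypothetical stand-ins for ip_map, sk_info_authunix and cache_get()/cache_put(), and the cache_valid() invalidation path is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct ip_map. */
struct sketch_obj {
	atomic_int refcount;
};

static void sketch_get(struct sketch_obj *o) { atomic_fetch_add(&o->refcount, 1); }
static void sketch_put(struct sketch_obj *o) { atomic_fetch_sub(&o->refcount, 1); }

/* Single cache slot (like svsk->sk_info_authunix) and the lock guarding it. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_obj *sketch_slot;

/* Like ip_map_cached_get(): read the slot and take a reference while
 * holding the lock, so the object cannot be dropped in between. */
static struct sketch_obj *sketch_cached_get(void)
{
	struct sketch_obj *o;

	pthread_mutex_lock(&sketch_lock);
	o = sketch_slot;
	if (o != NULL)
		sketch_get(o);
	pthread_mutex_unlock(&sketch_lock);
	return o;
}

/* Like ip_map_cached_put(): if the slot is empty, it keeps the caller's
 * reference; otherwise the caller's reference is dropped after unlocking. */
static void sketch_cached_put(struct sketch_obj *o)
{
	pthread_mutex_lock(&sketch_lock);
	if (sketch_slot == NULL) {
		sketch_slot = o;   /* newly cached, keep the reference */
		o = NULL;
	}
	pthread_mutex_unlock(&sketch_lock);
	if (o != NULL)
		sketch_put(o);
}
```

Under the lock, only one caller can find the slot empty and hand its reference over; any other caller drops its own reference, so no reference is lost or dropped twice.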
[PATCH 002 of 2] knfsd: Rename sk_defer_lock to sk_lock
Now that sk_defer_lock protects two different things, make the name more generic. Also don't bother with disabling _bh as the lock is only ever taken from process context. Signed-off-by: Neil Brown <[EMAIL PROTECTED]> ### Diffstat output ./include/linux/sunrpc/svcsock.h |3 ++- ./net/sunrpc/svcauth_unix.c | 10 +- ./net/sunrpc/svcsock.c | 13 +++-- 3 files changed, 14 insertions(+), 12 deletions(-) diff .prev/include/linux/sunrpc/svcsock.h ./include/linux/sunrpc/svcsock.h --- .prev/include/linux/sunrpc/svcsock.h2007-04-17 11:42:13.0 +1000 +++ ./include/linux/sunrpc/svcsock.h2007-04-17 11:42:26.0 +1000 @@ -37,7 +37,8 @@ struct svc_sock { atomic_tsk_reserved;/* space on outq that is reserved */ - spinlock_t sk_defer_lock; /* protects sk_deferred */ + spinlock_t sk_lock;/* protects sk_deferred and +* sk_info_authunix */ struct list_headsk_deferred;/* deferred requests that need to * be revisted */ struct mutexsk_mutex; /* to serialize sending data */ diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c --- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:42:21.0 +1000 +++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:42:26.0 +1000 @@ -385,7 +385,7 @@ ip_map_cached_get(struct svc_rqst *rqstp { struct ip_map *ipm; struct svc_sock *svsk = rqstp->rq_sock; - spin_lock_bh(>sk_defer_lock); + spin_lock(>sk_lock); ipm = svsk->sk_info_authunix; if (ipm != NULL) { if (!cache_valid(>h)) { @@ -395,13 +395,13 @@ ip_map_cached_get(struct svc_rqst *rqstp * same IP address. 
 			 */
 			svsk->sk_info_authunix = NULL;
-			spin_unlock_bh(&svsk->sk_defer_lock);
+			spin_unlock(&svsk->sk_lock);
 			cache_put(&ipm->h, &ip_map_cache);
 			return NULL;
 		}
 		cache_get(&ipm->h);
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	return ipm;
 }
@@ -410,14 +410,14 @@ ip_map_cached_put(struct svc_rqst *rqstp
 {
 	struct svc_sock *svsk = rqstp->rq_sock;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	if (svsk->sk_sock->type == SOCK_STREAM &&
 	    svsk->sk_info_authunix == NULL) {
 		/* newly cached, keep the reference */
 		svsk->sk_info_authunix = ipm;
 		ipm = NULL;
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	if (ipm)
 		cache_put(&ipm->h, &ip_map_cache);
 }
diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c	2007-04-17 11:42:13.0 +1000
+++ ./net/sunrpc/svcsock.c	2007-04-17 11:42:26.0 +1000
@@ -53,7 +53,8 @@
  *	svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
  *	when both need to be taken (rare), svc_serv->sv_lock is first.
  *	BKL protects svc_serv->sv_nrthread.
- *	svc_sock->sk_defer_lock protects the svc_sock->sk_deferred list
+ *	svc_sock->sk_lock protects the svc_sock->sk_deferred list
+ *	       and the ->sk_info_authunix cache.
  *	svc_sock->sk_flags.SK_BUSY prevents a svc_sock being enqueued multiply.
  *
  *	Some flags can be set to certain values at any time
@@ -1625,7 +1626,7 @@ static struct svc_sock *svc_setup_socket
 	svsk->sk_server = serv;
 	atomic_set(&svsk->sk_inuse, 1);
 	svsk->sk_lastrecv = get_seconds();
-	spin_lock_init(&svsk->sk_defer_lock);
+	spin_lock_init(&svsk->sk_lock);
 	INIT_LIST_HEAD(&svsk->sk_deferred);
 	INIT_LIST_HEAD(&svsk->sk_ready);
 	mutex_init(&svsk->sk_mutex);
@@ -1849,9 +1850,9 @@ static void svc_revisit(struct cache_def
 	dprintk("revisit queued\n");
 	svsk = dr->svsk;
 	dr->svsk = NULL;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	list_add(&dr->handle.recent, &svsk->sk_deferred);
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	set_bit(SK_DEFERRED, &svsk->sk_flags);
 	svc_sock_enqueue(svsk);
 	svc_sock_put(svsk);
@@ -1917,7 +1918,7 @@ static struct svc_deferred_req *svc_defe
 	if (!test_bit(SK_DEFERRED, &svsk->sk_flags))
 		return NULL;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	clear_bit(SK_DEFERRED, &svsk->sk_flags);
 	if (!list_empty(&svsk->sk_deferred)) {
 		dr = list_entry(svsk->sk_deferred.next,
@@ -1926,6 +1927,6 @@ static struct svc_deferred_req *svc_defe
 		list_del_init(&dr->handle.recent);
 		set_bit(SK_DEFERRED, &svsk->sk_flags);
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	return dr;
 }
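After the rename, the same lock also guards the deferred-request list, while the SK_DEFERRED bit serves as a cheap lockless hint that the list may be non-empty. A minimal userspace sketch of that producer/consumer handshake, assuming hypothetical names (`revisit()`, `deferred_dequeue()`, `has_deferred`) in place of `svc_revisit()`, the dequeue function in the last two hunks, and the flag bit. Like `list_add()` followed by taking `sk_deferred.next`, it hands back the most recently added request first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_spinlock { atomic_flag f; };

static void toy_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct deferred_req {
	struct deferred_req *next;
	int id;
};

struct sock_queue {
	struct toy_spinlock lock;	/* protects head, like sk_lock */
	struct deferred_req *head;
	atomic_bool has_deferred;	/* like the SK_DEFERRED bit */
};

#define SOCK_QUEUE_INIT { { ATOMIC_FLAG_INIT }, NULL, false }

/* Producer: queue a request under the lock, then advertise pending work. */
static void revisit(struct sock_queue *q, struct deferred_req *dr)
{
	toy_lock(&q->lock);
	dr->next = q->head;		/* push at the head, as list_add() does */
	q->head = dr;
	toy_unlock(&q->lock);
	atomic_store(&q->has_deferred, true);
}

/* Consumer: returns one request, or NULL if none is queued. */
static struct deferred_req *deferred_dequeue(struct sock_queue *q)
{
	struct deferred_req *dr = NULL;

	if (!atomic_load(&q->has_deferred))
		return NULL;		/* lockless fast path */
	toy_lock(&q->lock);
	atomic_store(&q->has_deferred, false);
	if (q->head != NULL) {
		dr = q->head;
		q->head = dr->next;
		dr->next = NULL;
		/* more may remain: keep the hint set so the next call looks */
		atomic_store(&q->has_deferred, true);
	}
	toy_unlock(&q->lock);
	return dr;
}
```

In this sketch every user of the lock runs in ordinary thread context, so a plain lock is enough. That mirrors the patch's argument for dropping the `_bh` variants, which are only needed when the same lock can also be taken from softirq context.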
[PATCH 000 of 2] knfsd: Close oopsable race in nfsd
The following two patches fix a bug introduced in commit 7b2b1fee30df7e2165525cd03f7d1d01a3a56794, which is therefore present in 2.6.19 and later. The first patch is a minimal fix that is suitable for all kernels since 2.6.19-pre1. The second adds some consequent cleanup and is probably best left for 2.6.22-rc (and so is not being cc:ed to [EMAIL PROTECTED]).

Thanks,
NeilBrown

 [PATCH 001 of 2] knfsd: Use a spinlock to protect sk_info_authunix
 [PATCH 002 of 2] knfsd: Rename sk_defer_lock to sk_lock
Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts
On Monday 16 April 2007 23:57, Alan Cox wrote:
> I don't believe the existing behaviour _IS_ a mistake.

So what would be the arguments why this behavior makes sense, other than legacy? For /proc/mounts, one could argue that the admin might want to see everything, but that's not actually true even today: /proc/mounts doesn't show lazily unmounted stuff or mounts from other namespaces, so "everything" is already quite relative. Along that line of argument, I would at least expect unambiguous output, so that it is possible to tell which mountpoints are actually meaningful to the requesting process. It's not only human operators looking at /proc/mounts; applications care as well. But after thinking about this issue for quite a while, I really can't see what that should be good for.

The current /proc/mounts interface is obviously broken; the chroot example should have demonstrated that. There are also unnecessary special cases because of that, such as having to filter out the rootfs entry when trying to figure out what's really mounted on /, and having to guess what's there and what's not in a particular context. The more complex mount scenarios get, the more obviously broken the current /proc/mounts interface will become.

The getcwd() case is even stronger, as the "see everything" argument makes even less sense there. I really can't see why the kernel should return fake pathnames to processes. The process is explicitly asking for the current pathname of the working directory; it doesn't want to know what the pathname was at some previous point in time.

> > Process can access file descriptors which are unreachable via path name
> > just fine indeed, but those fds still don't have a valid path in the
> > context of that process.
>
> Which while problematic to your name based security is just fine to
> everything else.

Actually, no. We could live fine with leaving getcwd() and /proc/mounts as ambiguous / weird / broken as they are right now.
All it would take would be to reambiguate the result of the unambiguous __d_path(), which is really easy. Everything that cares about real pathnames would use the unambiguous version, while the legacy interfaces would use the ambiguous version. But that really wouldn't make sense.

> Ok, providing the "real" root sees them all it isn't so bad, but to
> assume you can filter based upon what the task can see is dodgy as an
> assumption.

Why?

Thanks,
Andreas
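The rootfs special case mentioned above looks roughly like this in a userspace tool. A minimal sketch using glibc's `<mntent.h>` (`getmntent()`), assuming that the last non-rootfs entry whose mount point is "/" is the one really mounted there; `root_device()` is a hypothetical helper, and, as argued above, this guess can still be wrong inside a chroot or another namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Copy the device of the last non-rootfs entry mounted on "/" into buf.
 * Returns 1 if such an entry was found, 0 otherwise. */
static int root_device(FILE *mounts, char *buf, size_t len)
{
	struct mntent *m;
	int found = 0;

	while ((m = getmntent(mounts)) != NULL) {
		if (strcmp(m->mnt_dir, "/") != 0)
			continue;
		if (strcmp(m->mnt_type, "rootfs") == 0)
			continue;	/* the initial rootfs is not what is "really" on / */
		snprintf(buf, len, "%s", m->mnt_fsname);
		found = 1;		/* keep scanning: later mounts on / shadow earlier ones */
	}
	return found;
}
```

In a real tool the stream would come from `setmntent("/proc/mounts", "r")`; any `FILE *` works, which makes the helper easy to exercise on canned text.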
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Chris Friesen wrote:
> William Lee Irwin III wrote:
> > The sorts of like explicit decisions I'd like to be made for these are: (1) In a mixture of tasks with varying nice numbers, a given nice number corresponds to some share of CPU bandwidth. Implementations should not have the freedom to change this arbitrarily according to some intention.
>
> The first question that comes to my mind is whether nice levels should be linear or not.

No. That squishes one end of the table too much. It needs to be (approximately) piecewise linear around nice == 0. Here's the mapping I use in my entitlement based schedulers:

#define NICE_TO_LP(nice) ((nice >=0) ? (20 - (nice)) : (20 + (nice) * (nice)))

It has the (good) feature that a nice == 19 task has 1/20th the entitlement of a nice == 0 task and a nice == -20 task has 21 times the entitlement of a nice == 0 task. It's not strictly linear for negative nice values but is very cheap to calculate and quite easy to invert if necessary.

> I would lean towards nonlinear as it allows a wider range (although of course at the expense of precision). Maybe something like "each nice level gives X times the cpu of the previous"? I think a value of X somewhere between 1.15 and 1.25 might be reasonable.
>
> What about also having something that looks at latency, and how latency changes with niceness?
>
> What about specifying the timeframe over which the cpu bandwidth is measured? I currently have a system where the application designers would like it to be totally fair over a period of 1 second.

Have you tried the spa_ebs scheduler? The half life is no longer a run time configurable parameter (as making it highly adjustable results in less efficient code) but it could be adjusted to be approximately equivalent to 0.5 seconds by changing some constants in the code.

> As you can imagine, mainline doesn't do very well in this case.

You should look back through the plugsched patches where many of these ideas have been experimented with.
Peter
-- 
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
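For comparison, the two proposals in this exchange can be written side by side. `nice_to_lp()` below is just a function form of the NICE_TO_LP macro quoted above, and `geometric_weight()` is a hypothetical helper for the "each nice level gives X times the cpu of the previous" idea, returning a task's weight relative to a nice 0 task.

```c
#include <assert.h>

/* Function form of NICE_TO_LP: linear for nice >= 0, quadratic below 0. */
static int nice_to_lp(int nice)
{
	return nice >= 0 ? 20 - nice : 20 + nice * nice;
}

/* Hypothetical geometric mapping: each level towards negative nice
 * multiplies the weight by 'ratio'; a nice 0 task has weight 1.0. */
static double geometric_weight(int nice, double ratio)
{
	double w = 1.0;
	int steps = nice < 0 ? -nice : nice;

	for (int i = 0; i < steps; i++)
		w = nice < 0 ? w * ratio : w / ratio;
	return w;
}
```

With X = 1.25, a nice -20 task gets about 87 times the weight of a nice 0 task (1.25 to the 20th power is about 86.7) and a nice 19 task about 1/69 of it, versus 21 times and 1/20 under NICE_TO_LP; that is the wider range, and coarser resolution at the extremes, discussed above.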
Re: Problem with ufs nextstep in 2.6.18 (debian)
On Mon, Apr 16, 2007 at 05:04:22PM +0100, Dale Amon wrote:
> On Mon, Apr 16, 2007 at 11:32:04AM +0400, Evgeniy Dushistov wrote:
> > >The error also happens in 2.6.19, same as in 2.6.18.
> > >I extracted this from syslog:
> > >Apr 17 00:14:15 kdev kernel: UFS-fs error (device loop0):
> > >ufs_check_page: bad entry
> >
> > Is this happened also with this patch:
> > http://lkml.org/lkml/diff/2007/2/5/75/1
>
> Thanks. I will try that out tonight GMT. Which kernel
> is that against? Will it work against a 2.6.19 or should
> I get a 2.6.20 and work with that?

Hmmm... it looks like that patch is already applied in 2.6.20.7? I will try that.
Re: Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)
On Mon, Apr 16, 2007 at 03:34:42PM -0700, Valerie Henson wrote:
> On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote:
> > On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote:
> >
> > > IMHO chunkfs could provide a much more promising approach.
> > Agreed, that's one method of compartmentalising the problem.
>
> Agreed, the chunkfs design is only one way to implement repair-driven
> file system design - designing your file system to make file system
> check and repair fast and easy. I've written a paper on this idea,
> which includes some interesting projections estimating that fsck will
> take 10 times as long on the 2013 equivalent of a 2006 file system,
> due entirely to changes in disk hardware.

That's assuming that repair doesn't get any more efficient. ;)

> So if your server currently
> takes 2 hours to fsck, an equivalent server in 2013 will take about 20
> hours. Eek! Paper here:
>
> http://infohost.nmt.edu/~val/review/repair.pdf
>
> While I'm working on chunkfs, I also think that all file systems
> should strive for repair-driven design. XFS has already made big
> strides in this area (multi-threading fsck for multi-disk file
> systems, for example) and I'm excited to see what comes next.

Two steps forward, one step back. We found that our original approach to multithreading doesn't always work, and doesn't work at all for single disks. In some test cases, it goes *much* slower due to increased seeking of the disks. This patch from the folk at Agami:

http://oss.sgi.com/archives/xfs/2007-01/msg00135.html

used a different threading approach to speeding up the repair process - it basically did object path walking in separate threads to prime the block device page cache so that when the real repair thread needed the block it came from the blockdev cache rather than from disk. This sped up several phases of the repair process because of re-reads needed in the different phases.
What we found interesting about this approach is that it showed that prefetching gave as good as or better results than simple parallelisation with a rudimentary caching system. In most cases it was superior (lower runtime) to the existing multithreaded xfs_repair.

However, the Agami object-based prefetch does not speed up phase 3 on a single disk - like strided AG parallelism, it increases disk seeks and, as we discovered, causes lots of little backwards seeks to occur. It also performs very poorly when there is not enough memory to cache sufficient objects in the block dev cache (whose size cannot be controlled). It sped things up by using prefetch to speed up (repeated) I/O, not by using intelligent caching.

Still, this patch has been very instructive on how we could further improve the threading of xfs_repair: intelligent prefetch is better than simple parallelism (from the Agami patch), caching is far better than rereading (from the SGI repair-level caching), and prefetching complements simple parallelism on volumes that can take advantage of it.

We've ended up combining a threaded, two-phase object-walking prefetch with spatial analysis of the inode and object layouts and integration into a smarter internal cache. This cache is now similar to the xfs_buf cache in the kernel and uses direct I/O, so if you have enough memory you only need to read objects from disk once.

Spatial analysis of the metadata is used to determine the relative density of the metadata in an area of disk before we read it. Using a density function, we determine whether we want to do lots of small I/Os or one large I/O that reads the entire region in one go and then split it up in memory. Hence as metadata density increases, the number of I/Os decreases and we pull enough data in to (hopefully) keep the CPUs busy.

We still walk objects, but any blocks behind where we are currently reading go into a secondary I/O queue to be issued later. Hence we keep moving in one direction across the disk.
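The two decisions described above (one large read versus many small ones, and deferring blocks that lie behind the current position) can be sketched as follows. This is only an illustration under stated assumptions: the message does not give xfs_repair's actual density function or threshold, so `plan_region()` uses a plain ratio of metadata blocks to region size with a caller-supplied threshold, and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum read_plan { READ_INDIVIDUAL, READ_WHOLE_REGION };

/* Decide how to read the region [start, end) that contains nblocks
 * metadata blocks. The caller guarantees end > start. */
static enum read_plan plan_region(size_t nblocks, size_t start, size_t end,
				  double threshold)
{
	double density = (double)nblocks / (double)(end - start);

	/* dense: one big read, split up in memory; sparse: small reads */
	return density >= threshold ? READ_WHOLE_REGION : READ_INDIVIDUAL;
}

/* Split wanted blocks into those at or after the current position (issued
 * now) and those behind it (queued for a later sweep), so the disk head
 * keeps moving in one direction. */
static void split_by_position(const size_t *blocks, size_t n, size_t head,
			      size_t *now, size_t *n_now,
			      size_t *later, size_t *n_later)
{
	*n_now = *n_later = 0;
	for (size_t i = 0; i < n; i++) {
		if (blocks[i] >= head)
			now[(*n_now)++] = blocks[i];
		else
			later[(*n_later)++] = blocks[i];
	}
}
```

Sorting the `later` array and running the same density analysis on it would give the single second pass across the disk described in the following paragraph.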
Once the first pass is complete, we then do the same analysis on the secondary list and run that I/O all in a single pass across the disk. This is effectively a result of observing that repair is typically seek bound and only using 2-3MB/s of the bandwidth a disk has to offer. Where metadata density is high, we are now seeing LUNs max out on bandwidth rather than being seek bound. Effectively we are hiding latency by using more bandwidth, and that is a good tradeoff to make for a seek-bound app.

The result of this is that even on single disks the reading of all the metadata goes faster with this multithreaded prefetch model. A full 250GB SATA disk with a clean filesystem containing ~1.6 million inodes is now taking less than 5 minutes to repair. A 5.5TB RAID5 volume with 30 million inodes is now taking about 4.5 minutes to repair instead of 20 minutes. We're currently creating a multi-hundred-million inode filesystem to determine scalability to the current bleeding edge.

One thing this makes me consider is
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On Monday 16 April 2007 07:47, Davide Libenzi wrote:
> On Mon, 16 Apr 2007, Pavel Pisa wrote:
> > I cannot help myself to not report results with GAVL
> > tree algorithm there as an another race competitor.
> > I believe, that it is better solution for large priority
> > queues than RB-tree and even heap trees. It could be
> > disputable if the scheduler needs such scalability on
> > the other hand. The AVL heritage guarantees lower height
> > which results in shorter search times which could
> > be profitable for other uses in kernel.
> >
> > GAVL algorithm is AVL tree based, so it does not suffer from
> > "infinite" priorities granularity there as TR does. It allows
> > use for generalized case where tree is not fully balanced.
> > This allows to cut the first item withour rebalancing.
> > This leads to the degradation of the tree by one more level
> > (than non degraded AVL gives) in maximum, which is still
> > considerably better than RB-trees maximum.
> >
> > http://cmp.felk.cvut.cz/~pisa/linux/smart-queue-v-gavl.c
>
> Here are the results on my Opteron 252:
>
> Testing N=1
> gavl_cfs = 187.20 cycles/loop
> CFS = 194.16 cycles/loop
> TR = 314.87 cycles/loop
> CFS = 194.15 cycles/loop
> gavl_cfs = 187.15 cycles/loop
>
> Testing N=2
> gavl_cfs = 268.94 cycles/loop
> CFS = 305.53 cycles/loop
> TR = 313.78 cycles/loop
> CFS = 289.58 cycles/loop
> gavl_cfs = 266.02 cycles/loop
>
> Testing N=4
> gavl_cfs = 452.13 cycles/loop
> CFS = 518.81 cycles/loop
> TR = 311.54 cycles/loop
> CFS = 516.23 cycles/loop
> gavl_cfs = 450.73 cycles/loop
>
> Testing N=8
> gavl_cfs = 609.29 cycles/loop
> CFS = 644.65 cycles/loop
> TR = 308.11 cycles/loop
> CFS = 667.01 cycles/loop
> gavl_cfs = 592.89 cycles/loop
>
> Testing N=16
> gavl_cfs = 686.30 cycles/loop
> CFS = 807.41 cycles/loop
> TR = 317.20 cycles/loop
> CFS = 810.24 cycles/loop
> gavl_cfs = 688.42 cycles/loop
>
> Testing N=32
> gavl_cfs = 756.57 cycles/loop
> CFS = 852.14 cycles/loop
> TR = 301.22 cycles/loop
> CFS = 876.12 cycles/loop
> gavl_cfs = 758.46 cycles/loop
>
> Testing N=64
> gavl_cfs = 831.97 cycles/loop
> CFS = 997.16 cycles/loop
> TR = 304.74 cycles/loop
> CFS = 1003.26 cycles/loop
> gavl_cfs = 832.83 cycles/loop
>
> Testing N=128
> gavl_cfs = 897.33 cycles/loop
> CFS = 1030.36 cycles/loop
> TR = 295.65 cycles/loop
> CFS = 1035.29 cycles/loop
> gavl_cfs = 892.51 cycles/loop
>
> Testing N=256
> gavl_cfs = 963.17 cycles/loop
> CFS = 1146.04 cycles/loop
> TR = 295.35 cycles/loop
> CFS = 1162.04 cycles/loop
> gavl_cfs = 966.31 cycles/loop
>
> Testing N=512
> gavl_cfs = 1029.82 cycles/loop
> CFS = 1218.34 cycles/loop
> TR = 288.78 cycles/loop
> CFS = 1257.97 cycles/loop
> gavl_cfs = 1029.83 cycles/loop
>
> Testing N=1024
> gavl_cfs = 1091.76 cycles/loop
> CFS = 1318.47 cycles/loop
> TR = 287.74 cycles/loop
> CFS = 1311.72 cycles/loop
> gavl_cfs = 1093.29 cycles/loop
>
> Testing N=2048
> gavl_cfs = 1153.03 cycles/loop
> CFS = 1398.84 cycles/loop
> TR = 286.75 cycles/loop
> CFS = 1438.68 cycles/loop
> gavl_cfs = 1149.97 cycles/loop
>
> There seem to be some difference from your numbers. This is with:
>
> gcc version 4.1.2
>
> and -O2. But then and Opteron can behave quite differentyl than a Duron on
> a bench like this ;)

Thanks for testing, but your numbers are more accurate than my first report. My numbers seemed over-optimistic even to me; in fact, I was surprised that the difference was so large. But I had tested a bad version of the code, without the GAVL_FAFTER option set. The code pushed to the web page was the correct one. I have not been able to look into the case until now because I had a busy day preparing some Linux-based labs at the university.

Without the GAVL_FAFTER option, the insert operation fails if an item with the same key is already inserted (an intended feature of the code), and as a result not all items were inserted in the test. GAVL_FAFTER means find/insert after all items with the same key value.
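The difference between the two insert modes is easy to see on a small example. The sketch below uses a sorted array instead of an AVL tree purely to show the duplicate-key semantics (it says nothing about performance); `sorted_insert()` and its `after_equal` flag are hypothetical, with `after_equal` playing the role GAVL_FAFTER is described as playing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Insert key into the sorted array a of length *n (capacity assumed
 * sufficient). after_equal == false: reject duplicates (unique keys).
 * after_equal == true: place the key after all equal keys (FIFO order). */
static bool sorted_insert(int *a, size_t *n, int key, bool after_equal)
{
	size_t pos = 0;

	/* advance to the first element greater than key */
	while (pos < *n && a[pos] <= key) {
		if (a[pos] == key && !after_equal)
			return false;	/* duplicate rejected */
		pos++;
	}
	for (size_t i = *n; i > pos; i--)
		a[i] = a[i - 1];	/* shift the tail right by one */
	a[pos] = key;
	(*n)++;
	return true;
}
```

With keys {3, 1, 3, 2, 3}, the unique-key mode keeps only three items while the insert-after mode keeps all five, which is how a benchmark whose keys can collide ends up silently measuring a smaller queue.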
The default behavior is to operate on unique keys in the tree and reject duplicates.

My results are even worse for GAVL than yours. It is possible to tweak the code and optimize it further (likely/unlikely, not keeping the last pointer, etc.) for this particular usage. I may try this exercise, but I do not expect the result after tuning to be so much better that it would outweigh some redesign work. I can still see some advantages of AVL, but it has its own drawbacks: the need for a separate height field and slightly worse timing for deletes in the middle.

So excuse me for the disturbance. I was only curious how the GAVL code would behave in comparison with other algorithms, and I did not keep my premature enthusiasm under lock.

Best wishes

Pavel Pisa

./smart-queue-v-gavl -n 4
gavl_cfs = 279.02 cycles/loop
CFS = 200.87 cycles/loop
TR = 229.55 cycles/loop
CFS = 201.23 cycles/loop
gavl_cfs = 276.08 cycles/loop

./smart-queue-v-gavl -n 8
gavl_cfs = 310.92 cycles/loop
CFS = 288.45 cycles/loop
TR = 192.46 cycles/loop
CFS
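The height argument in the quoted GAVL description can be made concrete with the standard worst-case bounds: an AVL tree of height h has at least N(h) = N(h-1) + N(h-2) + 1 nodes, and a red-black tree with black-height bh has at least 2^bh - 1 nodes and height at most 2*bh. A small sketch (hypothetical helpers, heights counted in nodes) computing both for a given n:

```c
#include <assert.h>

/* Worst-case AVL height (in nodes) for n nodes: the largest h whose
 * sparsest AVL tree, N(h) = N(h-1) + N(h-2) + 1 nodes, still fits in n. */
static int avl_max_height(unsigned long n)
{
	unsigned long prev = 0, cur = 1;	/* N(0) = 0, N(1) = 1 */
	int h = 1;

	if (n == 0)
		return 0;
	for (;;) {
		unsigned long next = cur + prev + 1;	/* N(h + 1) */

		if (next > n)
			return h;
		prev = cur;
		cur = next;
		h++;
	}
}

/* Red-black bound: n >= 2^bh - 1 and height <= 2 * bh, so the height is
 * at most 2 * floor(log2(n + 1)). Assumes n is far below ULONG_MAX. */
static int rb_max_height_bound(unsigned long n)
{
	int bh = 0;

	while ((2UL << bh) - 1 <= n)	/* black-height bh + 1 still possible */
		bh++;
	return 2 * bh;
}
```

For n = 2048 this gives 15 levels for AVL (16 allowing for GAVL's one extra level) against a bound of 22 for red-black trees. These are worst cases only; as the measurements above show, they do not by themselves decide which structure is faster in practice.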
Re: [linux-usb-devel] [PATCH] hid: hid bus prototype 20070416
Jiri Kosina wrote:
> On Mon, 16 Apr 2007, Li Yu wrote:
>
>> HID bus prototype 20070416
>>
>
> Hi Li,
>
> thanks for taking care. Well, the patch is quite huge, do you think you
> could split it into separate independent parts (use quilt or something
> similar for patch management) which could be reviewed independently?
>
> As the code changes are often quite non-trivial, layering is changed,
> lots of files are touched, etc. it would help a lot.
>

OK, I must be next.

> Notes from a quick skim through the patch:
>
> - it seems that you accidentaly deleted the newly added quirk for
>   mightymouse in the bluetooth hid code?
>

They must have been lost while I was playing with the bluetooth code.

> - what is the point behind HID_QUIRK_SKIP? Why doesn't HID_QUIRK_IGNORE
>   suffice? And why is it defined in so strange way:
>
> @@ -270,6 +271,7 @@ struct hid_item {
>  #define HID_QUIRK_LOGITECH_DESCRIPTOR 0x0010
>  #define HID_QUIRK_DUPLICATE_USAGES 0x0020
>  #define HID_QUIRK_RESET_LEDS 0x0040
> +#define HID_QUIRK_SKIP 0x8000
>

I am sorry for missing some description here. In simple words, HID_QUIRK_IGNORE makes usbhid not register some HID devices at all, whereas HID_QUIRK_SKIP just makes usbhid skip matching the device marked with that quirk, while still registering that HID device, so another HID driver still has a chance to handle it. As you can see, hid-core.c is not a pure HID driver; it also takes the transport role here. I think this quirk differs from the others, so I defined it this way. If you want it done differently, just change it; that is OK. Maybe we need another hid_skiplist[]?

> - there are bunches of some easy codingstyle issues (spaces around '=',
>   etc)
>

Yes, this is one of the reasons it is only for review.

> But doing really thorough review is quite hard, as the patch contains lots
> of unrelated changes. I'll look at it a little bit more, but when you send
> a broken-out version with separate changes, that'd be great.
>

OK.
> Thanks,
>
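As Li Yu describes them, the two quirks differ in whether the device is registered at all. A hypothetical sketch of that decision; the flag values and the `usbhid_probe_outcome()` helper are illustrative only and are not the values or functions in the patch.

```c
#include <assert.h>

/* Illustrative bit values only; not the values used in the patch. */
#define QUIRK_IGNORE	0x1u
#define QUIRK_SKIP	0x2u

enum probe_outcome {
	NOT_REGISTERED,			/* IGNORE: the device never shows up */
	REGISTERED_FOR_OTHER_DRIVER,	/* SKIP: registered, usbhid does not bind */
	BOUND_TO_USBHID,		/* no quirk: usbhid handles it */
};

/* Hypothetical decision made when usbhid sees a device with these quirks. */
static enum probe_outcome usbhid_probe_outcome(unsigned int quirks)
{
	if (quirks & QUIRK_IGNORE)
		return NOT_REGISTERED;
	if (quirks & QUIRK_SKIP)
		return REGISTERED_FOR_OTHER_DRIVER;
	return BOUND_TO_USBHID;
}
```

Checking IGNORE first means a device carrying both flags is never registered, which seems the natural precedence but is an assumption of this sketch.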
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Chris Friesen wrote:
> Peter Williams wrote:
> > To my mind scheduling and load balancing are orthogonal and keeping them that way simplifies things.
>
> Scuse me if I jump in here, but doesn't the load balancer need some way to figure out a) when to run, and b) which tasks to pull and where to push them?

Yes, but both of these are independent of the scheduler discipline in force.

> I suppose you could abstract this into a per-scheduler API, but to me at least these are the hard parts of the load balancer...

Load balancing needs to be based on the static priorities (i.e. nice or real time priority) of the runnable tasks, not the dynamic priorities. If the load balancer manages to keep the weighted (according to static priority) load and the distribution of priorities within the loads on the CPUs roughly equal, and the scheduler does a good job of ensuring fairness, interactive responsiveness, etc. for the tasks within a CPU, then the result will be good system performance within the constraints set by the sysadmin's use of real time priorities and nice.

The smpnice modifications to the load balancer were meant to give it the appropriate behaviour, and what we need to fix now is the intra-CPU scheduling. Even if the load balancer isn't yet perfect, perfecting it can be done separately from fixing the scheduler, preferably with as little interdependency as possible. Probably the only contribution to load balancing that the scheduler really needs to make is calculating the average weighted load on each of the CPUs (or run queues, if there's more than one CPU per run queue).

Peter
-- 
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
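A minimal sketch of the weighted-load idea, assuming for illustration the NICE_TO_LP-style weights quoted elsewhere in this thread (not necessarily the weights smpnice uses); `weight_of_nice()`, `weighted_load()` and `busiest_cpu()` are hypothetical helpers. Three tasks at nice 19 weigh less than one task at nice -5, so the busiest CPU by weighted load is not the one with the most runnable tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed static-priority weight (NICE_TO_LP-style), for illustration. */
static unsigned long weight_of_nice(int nice)
{
	return nice >= 0 ? 20 - nice : 20 + nice * nice;
}

/* Sum of the static-priority weights of the runnable tasks on one CPU. */
static unsigned long weighted_load(const int *nices, size_t ntasks)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < ntasks; i++)
		sum += weight_of_nice(nices[i]);
	return sum;
}

/* Index of the CPU with the largest weighted load (first one on ties). */
static size_t busiest_cpu(const unsigned long *loads, size_t ncpus)
{
	size_t best = 0;

	for (size_t i = 1; i < ncpus; i++)
		if (loads[i] > loads[best])
			best = i;
	return best;
}
```

Nothing here depends on how each CPU then orders its own tasks, which is the separation between load balancing and intra-CPU scheduling argued for above.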
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair
Al Boldi wrote:
> Peter Williams wrote:
> > Al Boldi wrote:
> > > Reducing the prio-level granularity may also be helpful;
> >
> > Because of some of the bit operations code makes it a bad idea to have more than 160 priority levels, you're more or less limited to 60 priority levels for SCHED_OTHER tasks (as 100 are used for real time) and you need 40 of these to pay some attention to niceness leaving you about 20 priority levels to use for fiddling. Is that enough? With spa_ebs (now that CPU rate caps have been removed), you have all 60 priorities available for fiddling with as niceness is taken care of when calculating each task's entitlement.
>
> Ok, increasing the number of prio-levels is one thing, but I was more thinking of reducing the effective difference between each prio-level.
>
> For example, this would allow max_tpt_bonus=18, while the effective range would be 3, thus reducing granularity.
>
> Would this be easily introduceable?

OK. Now (I think) I see what you mean. I think that you could achieve this effect by shortening the promotion interval, which I think is still one of the tunables. This effectively controls the strength of priority levels: short promotion intervals weaken and long promotion intervals strengthen the effect of different priority levels.

Peter
-- 
Peter Williams [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
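One way to see why the promotion interval controls the strength of priority levels: if a waiting task moves up one level per interval, a gap of k levels is overcome after k intervals of waiting. A hypothetical sketch (not spa_ebs's actual promotion code), with 0 as the best level and `effective_level()` an illustrative helper:

```c
#include <assert.h>

/* Hypothetical: the level of a task that has waited wait_ms, promoted one
 * level per interval_ms (which must be non-zero), never better than 0. */
static unsigned effective_level(unsigned static_level, unsigned wait_ms,
				unsigned interval_ms)
{
	unsigned promoted = wait_ms / interval_ms;

	return promoted >= static_level ? 0 : static_level - promoted;
}
```

With a 10 ms interval, a level-13 task that has waited 30 ms already competes as a level-10 task; with a 50 ms interval it is still at level 13, so the same 3-level gap weighs more heavily.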
Re: [CRYPTO] is it really optimized ?
On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote:
>
> BTW, here are figures I got with 2 different versions of the driver
> when using tcrypt module. The second being the result with the
> optimized driver (no key reloading on each block):
>
> normal version:
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192
> bytes)
>
> optimized version:
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192
> bytes)
>
> So the gain is 16000 cycles which seems to worth the change, isn't it ?

Sounds like it would. It would help of course if you posted the patch :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
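The optimisation being measured (not reloading the key for every block) amounts to remembering whose key the engine currently holds. Going by the two figures, it saves 67991 - 51783 = 16208 cycles per 8192-byte operation, roughly 24%, or from about 8.3 to about 6.3 cycles per byte. The driver itself was not posted, so this is a hypothetical sketch: every name is an assumption, and `hw_load_key()` merely counts what would be key-register writes.

```c
#include <assert.h>
#include <string.h>

struct aes_ctx {
	unsigned char key[16];
	unsigned long key_loads;	/* number of simulated key-register loads */
};

/* Which context's key the single, shared engine currently holds. */
static const struct aes_ctx *engine_key_owner;

static void hw_load_key(struct aes_ctx *ctx)
{
	ctx->key_loads++;		/* stands in for writing ctx->key to hardware */
	engine_key_owner = ctx;
}

static void ctx_setkey(struct aes_ctx *ctx, const unsigned char key[16])
{
	memcpy(ctx->key, key, sizeof ctx->key);
	if (engine_key_owner == ctx)
		engine_key_owner = NULL;	/* engine holds a stale copy: reload */
}

static void encrypt_block(struct aes_ctx *ctx)
{
	if (engine_key_owner != ctx)
		hw_load_key(ctx);		/* reload only if another key is loaded */
	/* ... feed one 16-byte block to the engine here ... */
}
```

Tracking the owner rather than a per-context "loaded" flag matters when several transforms share one engine; a real driver would also need locking around the engine, which this single-threaded sketch leaves out.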
Re: AppArmor FAQ
On Mon, 16 Apr 2007, John Johansen wrote:
> Label-based security (exemplified by SELinux, and its predecessors in
> MLS systems) attaches security policy to the data. As the data flows
> through the system, the label sticks to the data, and so security
> policy with respect to this data stays intact. This is a good approach
> for ensuring secrecy, the kind of problem that intelligence agencies have.

Labels are also a good approach for ensuring integrity, which is one of the most fundamental aspects of the security model implemented by SELinux. Some may infer otherwise from your document.

> Pathname-based security (exemplified in AppArmor, and its predecessor
> Janus http://www.cs.berkeley.edu/~daw/janus/ and other systems like
> Systrace http://www.citi.umich.edu/u/provos/systrace/ ) attach security
> policy to the name of the data.
>
> Controlling access to filenames is important because applications
> primarily use those names to access the files behind them, and they
> depend on getting to the right files. For example, login(1) expects
> /etc/passwd to resolve to a valid list of user accounts.

And it should, but alas it may instead find otherwise due to namespace manipulation, object aliasing (e.g. symlinks), application error, configuration error, corrupted files, corrupted filesystems, misbehavior due to malware infection or various forms of user error.

A pathname tells you nothing reliable about the security properties of the object it's pointing to. It is simply a mechanism for locating and referring to an object.

> In the traditional UNIX model, files do have names but not labels, and
> applications only operate in terms of those names.

Just to be clear (as the above conflates two distinct notions): applications under SELinux still use pathnames for locating and referring to files.
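The aliasing point can be checked directly: two names for one file refer to a single inode, and the DAC permission bits live in that inode rather than in either name. A minimal POSIX sketch (the helper name and the use of a temporary file under /tmp are assumptions) that creates a hard link, changes the mode through one name and reads it back through the other:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both names share one inode and its mode, 0 if not,
 * -1 on error. Cleans up both names before returning. */
static int names_share_inode(void)
{
	char path1[] = "/tmp/aliasXXXXXX";
	char path2[sizeof path1 + 5];		/* room for the ".link" suffix */
	struct stat s1, s2;
	int fd, ret = -1;

	fd = mkstemp(path1);
	if (fd < 0)
		return -1;
	close(fd);
	snprintf(path2, sizeof path2, "%s.link", path1);
	if (link(path1, path2) != 0)		/* second name, same inode */
		goto out1;
	if (chmod(path1, 0640) != 0)		/* change DAC bits via the first name */
		goto out2;
	if (stat(path1, &s1) != 0 || stat(path2, &s2) != 0)
		goto out2;
	ret = s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino &&
	      (s2.st_mode & 07777) == 0640;	/* visible via the other name */
out2:
	unlink(path2);
out1:
	unlink(path1);
	return ret;
}
```

Whichever name is used to reach the file, the same inode and the same permission bits decide the access check, which is the sense in which traditional DAC is already object-based rather than name-based.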
SELinux security is enforced within the kernel, and an application which does not have permission to access an object will simply receive an error using the standard Unix mechanisms already used for DAC. For example, a write(2) might fail with an EACCES error code. The pathname used by an application to access an object has _nothing_ to do with the security attributes of the object.

Traditional Unix security in fact does not primarily depend on pathnames, but on DAC ownership and permission attributes stored in the file's inode. DAC is of course a form of labeled security.

Imagine if you were re-inventing Unix and decided to implement pathname security for DAC instead of inode labeling. What you would have is a more generalized version of AppArmor, with the DAC attributes of pathnames for the entire filesystem stored in a text database with an in-kernel regex engine performing path reconstruction and pattern matching on every file access. Sound like a good idea? I hope not.

How about an analogy: think of kernel objects which are protected by locks. Do you lock the path to the object or do you lock the object itself?

> Pathname-based security puts more emphasis on the integrity of the
> system, making secrecy the secondary goal that follows.

This assertion is being made without any supporting evidence or rationale. If you're comparing pathname vs. label security, then it is clear that direct object labeling allows the security attributes of the system to be specified completely and unambiguously, whereas integrity enforced via pathnames alone requires several constraints to be applied to the goals of the policy. So, it seems to me that the opposite of what you say is more correct, although it is a fairly oblique argument to start with.

More significant to note is that Type Enforcement was designed specifically to address integrity requirements, in response to the limitations of the early MLS models, which were focused on confidentiality.
See:

"A Practical Alternative to Hierarchical Integrity Policies", Boebert & Kain, Proceedings of the Eighth National Computer Security Conference, 1985.

"Meeting Critical Security Objectives with Security-Enhanced Linux"
http://www.nsa.gov/selinux/papers/ottawa01/index.html

Or pretty much any paper on the design of SELinux or Flask. Integrity control is a foundational aspect of TE, Flask and SELinux. I've never understood why AppArmor presentations tend to so bizarrely suggest the opposite.

> Caveat: Both label-based security and pathname-based security can
> provide both secrecy and integrity protection, the above discussion is
> only about which model makes it easier to provide which kind of security.

I don't see how you've established anything in this regard.

> We acknowledge that not all objects on a UNIX system are paths, and we
> agree that there is value in also protecting non-path resources.
> Contrary to popular belief, AppArmor is *not* "Pathnames R Us", but
> rather "Use native abstractions to mediate stuff": when you mediate
> something, you should use the native syntax that users normally use to
> access the
Re: PROBLEM: kernel 2.6.20.6 build failed for ppc board chestnut(ibm ppc 750GX/FX)
On Mon, Apr 16, 2007 at 01:13:01PM +0800, Wang, Baojun wrote:
> PROBLEM: linux kernel 2.6.20.6 build failed for ppc board chestnut(ibm ppc
> 750GX/FX)
>

Confirmed. arch/ppc isn't getting much love these days.

> this brute force patch sould solve the problem:

This is missing a Signed-off-by: line.

> diff -Nru /tmp/linux-2.6.20.6/arch/ppc/platforms/chestnut.c \
> linux-2.6.20.6/arch/ppc/platforms/chestnut.c
>
> --- /tmp/linux-2.6.20.6/arch/ppc/platforms/chestnut.c	2007-04-07
> 04:02:48.0 +0800
> +++ linux-2.6.20.6/arch/ppc/platforms/chestnut.c	2007-04-13
> 17:09:03.0 +0800
> @@ -432,7 +432,9 @@
>  	ptbl.name = "User FS";
>  	ptbl.size = CHESTNUT_32BIT_SIZE;
>
> -	physmap_map.size = CHESTNUT_32BIT_SIZE;
> +	// physmap_map.size = CHESTNUT_32BIT_SIZE;

Just remove this completely. It's not needed any longer.

> +	physmap_configure(CHESTNUT_32BIT_BASE, CHESTNUT_32BIT_SIZE,
> CONFIG_MTD_PHYSMAP_BANKWIDTH, NULL);

Technically, this call isn't needed. The chestnut_defconfig already provides the correct variables.

josh
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Mike Galbraith wrote: On Sun, 2007-04-15 at 13:27 +1000, Con Kolivas wrote: On Saturday 14 April 2007 06:21, Ingo Molnar wrote: [announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS] i'm pleased to announce the first release of the "Modular Scheduler Core and Completely Fair Scheduler [CFS]" patchset: http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch This project is a complete rewrite of the Linux task scheduler. My goal is to address various feature requests and to fix deficiencies in the vanilla scheduler that were suggested/found in the past few years, both for desktop scheduling and for server scheduling workloads. The casual observer will be completely confused by what on earth has happened here so let me try to demystify things for them. [...] Demystify what? The casual observer need only read either your attempt at writing a scheduler, or my attempts at fixing the one we have, to see that it was high time for someone with the necessary skills to step in. Make that "someone with the necessary clout". Now progress can happen, which was _not_ happening before. This is true. Peter -- Peter Williams [EMAIL PROTECTED] "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce
Re: Major qla2xxx regression on sparc64
From: Andrew Vasquez <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 16:47:05 -0700 > Dave, according to your earlier emails, the qla2xxx driver worked > 'fine' in driver versions before commit > 7aef45ac92f49e76d990b51b7ecd714b9a608be1. If that were the case, then > you would have seen the warning messages: > > ... > qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet " > "invalid -- WWPN) defaults.\n"); I have in fact seen the message several times, and that message gives me no reason to believe something needs to be fixed. It should have said "PLEASE REPORT THIS to [EMAIL PROTECTED]" or something similar to indicate the severity better. "An invalid WWPN, what's that?" said the user. :) How about "FC IDs may conflict and cause miscommunication! Please report to driver author so this can be fixed!" or similar?
Re: Linux 2.6.21-rc7
Linus Torvalds wrote: > Since we're still waiting for resolution for some regressions that people > weren't able to work on last week, there's a new -rc kernel out there. > Hopefully we'll get them all and I can do 2.6.21-final next weekend or > so.. > The patch to k8.c didn't make it in: cache_k8_northbridges() is storing config values to incorrect locations (in flush_words) and it's also overflowing beyond the allocation, causing slab verification failures. Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> --- arch/x86_64/kernel/k8.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c === --- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c 2007-04-05 19:36:56.0 -0700 +++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c 2007-04-13 07:51:57.0 -0700 @@ -61,8 +61,8 @@ int cache_k8_northbridges(void) dev = NULL; i = 0; while ((dev = next_k8_northbridge(dev)) != NULL) { - k8_northbridges[i++] = dev; - pci_read_config_dword(dev, 0x9c, &flush_words[i]); + k8_northbridges[i] = dev; + pci_read_config_dword(dev, 0x9c, &flush_words[i++]); } k8_northbridges[i] = NULL; return 0;
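The k8.c bug is a post-increment ordering problem: the index is bumped after storing the device but before storing the config word. The C sketch below reproduces the loop shape outside the kernel. fake_dev, cache_buggy() and cache_fixed() are hypothetical stand-ins for the PCI device, the old loop and the patched loop; they are not kernel code. With two devices, the buggy version leaves words[0] unset and writes words[1] and words[2]. If words[] only has room for n entries, as the patch description's "overflowing beyond the allocation" suggests for flush_words, that last write lands past the end. The demo gives words[] one spare slot so the misplacement can be observed instead of corrupting memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device: just the value of config reg 0x9c. */
struct fake_dev {
    unsigned int reg_9c;
};

/* Loop shape before the fix: i is bumped before words[] is written. */
static size_t cache_buggy(struct fake_dev *found[], size_t n,
                          struct fake_dev *devs[], unsigned int words[])
{
    size_t i = 0;
    for (size_t k = 0; k < n; k++) {
        devs[i++] = found[k];
        words[i] = found[k]->reg_9c;   /* one slot too far */
    }
    devs[i] = NULL;                    /* NULL-terminate the device list */
    return i;
}

/* Loop shape after the fix: both arrays use the same index, then i advances. */
static size_t cache_fixed(struct fake_dev *found[], size_t n,
                          struct fake_dev *devs[], unsigned int words[])
{
    size_t i = 0;
    for (size_t k = 0; k < n; k++) {
        devs[i] = found[k];
        words[i++] = found[k]->reg_9c;
    }
    devs[i] = NULL;
    return i;
}
```

Keeping a single index and advancing it only after the last use in the iteration is what the patch does with &flush_words[i++].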
Re: Major qla2xxx regression on sparc64
On Mon, 16 Apr 2007, David Miller wrote: > From: Andrew Vasquez <[EMAIL PROTECTED]> > Date: Mon, 16 Apr 2007 16:28:51 -0700 > > > Sorry, but let's be realistic, this type of warning would have > > *NEVER* been addressed if we kept the status quo > > Wrong. I watch the logs all the time and would have sent you a fix to > use the Sparc firmware info as soon as I saw the kernel log message. Dave, according to your earlier emails, the qla2xxx driver worked 'fine' in driver versions before commit 7aef45ac92f49e76d990b51b7ecd714b9a608be1. If that were the case, then you would have seen the warning messages: ... qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet " "invalid -- WWPN) defaults.\n"); > Anyone who has worked with me over the last 15 years will let you know > emphatically that this is true. > > AND IN THE MEAN TIME I COULD GET WORK DONE AND MY SYSTEM WOULD BOOT! I understand that, and recognize your contribution; that was never in question.
Re: intermittant petabyte usage reported with broadcom nic
On Mon, Apr 16, 2007 at 12:10:51PM -0700, Michael Chan wrote: > On Sat, 2007-04-14 at 17:20 -0700, Michael Chan wrote: > > > I also like Andi's idea of using change_page_attr() to isolate the > > problem. I'll try to send you a debug patch in the next few days to try > > that out. Thanks. > > Here's the debug patch for x86 only that will change the statistics > memory block to read-only. If the kernel is corrupting it, you should > get a page fault that will crash the system. If you continue to see > bogus counters, it is definitely a firmware or hardware problem. Please > try it and let me know. Thanks. Ahh. Would truly love to, but the moment you said 'crash the system' I had to bail. These boxes are in production and as such a crash would be, shall we say, unwelcome. I might be able to finagle something but I very much doubt it. Perhaps Jean-Daniel, who is also experiencing this problem and seemingly more frequently than I, has a box that he could run your patch on. I think we both run pretty much the same hardware (Dell [12]950s). I've CCed him. -- "To the extent that we overreact, we proffer the terrorists the greatest tribute." - High Court Judge Michael Kirby
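Michael's debug patch relies on a simple idea: make the statistics block read-only so that any stray write faults immediately instead of silently corrupting the counters. The kernel patch uses change_page_attr(). The sketch below is only a userspace analogue of the same idea, built on POSIX mmap() and mprotect(), with a SIGSEGV/SIGBUS handler so the trap can be observed without killing the process. map_stats_page(), make_read_only() and store_traps() are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back out of the faulting store instead of dying. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Map one page to stand in for the NIC statistics block. */
static unsigned long *map_stats_page(void)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned long *)p;
}

/* Flip the page to read-only, the userspace counterpart of the debug patch. */
static int make_read_only(unsigned long *page)
{
    return mprotect(page, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}

/* Returns 1 if the store trapped, 0 if it went through. */
static int store_traps(volatile unsigned long *slot, unsigned long value)
{
    struct sigaction sa, old_segv, old_bus;
    int trapped = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0)
        *slot = value;          /* the "stray writer" */
    else
        trapped = 1;            /* reached via siglongjmp from on_fault */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return trapped;
}
```

In the kernel there is no handler to jump back to, so a stray write to the protected block ends in the page fault and crash Michael describes. That is exactly why the test is unwelcome on production boxes.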
Re: Major qla2xxx regression on sparc64
From: Andrew Vasquez <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 16:28:51 -0700 > Sorry, but let's be realistic, this type of warning would have > *NEVER* been addressed if we kept the status quo Wrong. I watch the logs all the time and would have sent you a fix to use the Sparc firmware info as soon as I saw the kernel log message. Anyone who has worked with me over the last 15 years will let you know emphatically that this is true. AND IN THE MEAN TIME I COULD GET WORK DONE AND MY SYSTEM WOULD BOOT!
Re: BUG: Bad page state errors during kernel make
Zach Carter wrote: > > Dave Jones wrote: >> On Sun, Apr 15, 2007 at 08:30:27PM -0700, Zach Carter wrote: >> > list_del corruption. prev->next should be c21a4628, but was e21a4628 >> >> 'c' became 'e' in that last address. A single bit flipped. >> Given you've had this for some time, this smells like a hardware problem. >> memtest86+ will probably show up something. > > Hum. I forgot to mention in my report that I had already run thru 10 > clean passes with memtest86+ > > Do you think there might be other bad hw, or another explanation? memtest86 does not really stress everything a real kernel compile would.
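Dave's "a single bit flipped" diagnosis can be checked mechanically: XOR the expected and observed values and count the set bits. The sketch below uses hypothetical helper names. It confirms that c21a4628 and e21a4628 differ in exactly one bit, bit 29.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits that differ between the expected and observed value. */
static int bits_flipped(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;
    int n = 0;
    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Index of the single differing bit, or -1 if zero or several bits differ. */
static int flipped_bit(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;
    int bit = 0;

    if (bits_flipped(expected, observed) != 1)
        return -1;
    while (!(diff & 1u)) {
        diff >>= 1;
        bit++;
    }
    return bit;
}
```

A one-bit difference is consistent with a memory or bus fault. As the reply notes, though, a clean memtest86+ run does not rule that out, because memtest86+ does not stress the machine the way a kernel compile does.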
Re: Major qla2xxx regression on sparc64
On Mon, 16 Apr 2007, David Miller wrote: > From: Andrew Vasquez <[EMAIL PROTECTED]> > Date: Mon, 16 Apr 2007 15:25:17 -0700 > > > Fine, I'll agree that wacking-users (and > > I'll wager the outliers) with a 2x4 was a bit extreme, > > And that, right there, is basically the end of the conversation. > > You don't do this to users, ever. > Put a big loud kernel log message in there when this situation > presents itself, use as many capital letters and scary language that > you wish. Let them know that if things explode they get to keep the > pieces. > > But at least try to give them something that works when you know that > you can. > > You don't need to make someone's system unbootable in order to make > them aware of a potential problem. It's very anti-social to approach Sorry, but let's be realistic, this type of warning would have *NEVER* been addressed if we kept the status quo -- your modifications to read the wwpn/wwnn would have never been submitted, everybody would have kept going on blissfully ignorant of the issue. Changes such as these are a common Linux upstream idiom... So, meeting in the middle, with the NVRAM bits restored along with some ability for the user to *knowingly* recognize the problem, I take it, is not going to work for you?
Re: If not readdir() then what?
On Monday April 16, [EMAIL PROTECTED] wrote: > > The challenge is making it be stable across inserts/deletes, never > mind reboots. And it's not a "little bit of cacheing"; in order to be > correct we would have to cache *forever*, since at least in theory an > NFS client could hold on to a cookie for an arbitrarily long period of > time (weeks, months, years, decades), yes? Yes. But I think we've already established that the on-disk structure chosen by ext3/htree is not able to perfectly support NFS (which is a pity given that it was written for Linux and Linux is thought to support NFS). Our goal is to find the best mapping possible and, where caching can improve stability for real-world uses, use caching to help stabilise that mapping. > > You're welcome to try, but I suspect it won't take long before you'll > see why I'm asserting that a directory fd cache in nfsd is *way* less > work. :-) You have provided some very helpful insights into how ext3/htree currently works - thanks for that. I will definitely make a closer inspection of the code and see how possible it is to realise my ideas. I'll let you know how I go. NeilBrown
Staircase cpu scheduler v17.1
Greetings all. Here is the current release of the Staircase cpu scheduler (the original generation I designed, which spurred development elsewhere for RSDL), for 2.6.21-rc7: http://ck.kolivas.org/patches/pre-releases/2.6.21-rc7/2.6.21-rc7-ck1/patches/sched-staircase-17.1.patch To remind people where this cpu scheduler fits into the picture: -It is purpose built with interactivity first and foremost. -It aims to be mostly fair most of the time. -It has strong semantics describing the cpu relationship between different nice levels (nice 19 is 1/20th the cpu of nice 0). -It is resistant to most forms of starvation. -Latency of tasks that are not heavily cpu bound is exceptionally low irrespective of nice level, as long as they stay within their cpu bounds. What this means is that an audio application that uses very little cpu can run at nice 19 and still be unlikely to skip audio in the presence of a kernel compile at nice -20. -Therefore you can renice X or whatever to your heart's content, but then... you don't need to renice X with this design. -The design is a single priority array with very low overhead and a small codebase (the diffstat summary, obviously muddied by removing more comments, is 4 files changed, 418 insertions(+), 714 deletions(-)). Disadvantages: -There are heuristics. -There are some rare cpu usage patterns that can lead to excessive unfairness and relative starvation.
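The announcement pins down only one point on the nice scale: nice 19 gets 1/20th the cpu of nice 0. Purely as an illustration, the sketch below assumes a linear weight of 20 - nice, which reproduces that ratio. The formula, and the weights it gives for every other nice level, are assumptions and not Staircase's actual code.

```c
#include <assert.h>

/* Hypothetical linear weight: nice 0 -> 20, nice 19 -> 1, nice -20 -> 40.
 * Only the nice 0 : nice 19 ratio comes from the announcement. */
static int nice_weight(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return 20 - nice;
}

/* CPU share of task A, in per-mille, when it competes only with task B. */
static int share_permille(int nice_a, int nice_b)
{
    int wa = nice_weight(nice_a);
    int wb = nice_weight(nice_b);
    return 1000 * wa / (wa + wb);
}
```

Under this assumed weighting, a nice 0 task competing with a nice 19 task gets about 952/1000 of the cpu, and the nice 19 task gets about 47/1000.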
Bonuses: With the addition of further patches in that same directory above it has: - An interactive tunable flag which further increases the fairness and makes nice values more absolutely determine latency (instead of cpu usage vs entitlement determining latency as the default above) /proc/sys/kernel/interactive - A compute tunable which makes timeslices much longer and has delayed preemption for maximum cpu cache utilisation for compute intensive workloads /proc/sys/kernel/compute - A soft realtime unprivileged policy for normal users with a tunable maximum cpu usage set to 80% by default /proc/sys/kernel/iso_cpu - A background scheduling class that uses no cpu resources if any other task wants cpu. This is unashamedly a relatively unfair, slightly starveable cpu scheduler with exceptional quality _Desktop_ performance, as it was always intended to be. It is NOT intended for mainline use, as mainline needs a general purpose cpu scheduler (remember!). I have no intention of pushing it as such given its disadvantages, and don't really care about those disadvantages, as I have no intention of trying to "plug up" the theoretical exploits and disadvantages either, since desktops aren't really affected. BUT this scheduler is great fun to use. Unfortunately the version of this scheduler in plugsched is not up to date with this code. Perhaps if demand for plugsched somehow turns the world on its head then this code may have a place elsewhere too. Enjoy! If you don't like it? Doesn't matter; you have a choice so just use something else. This is code that will only be in -ck. -- -ck
Re: so what *is* obsolete and removable?
On Tue, 17 Apr 2007 00:39:10 +0200 Tilman Schmidt wrote: > On 15.04.2007 22:55, Robert P. J. Day wrote: > > as i recall, the isdn4linux was *un*obsoleted, wasn't it? > > Actually, it wasn't. > > We *did* reach a consensus that isdn4linux is not obsolete in the > accepted sense of the word, because there is no replacement for it > so far. > > OTOH I have since submitted (twice, in fact) a patch that would remove > the "(obsolete)" label from the Kconfig entry, but somehow nothing > ever became of it. My submissions just linger in LKML, uncommented and > unmerged. Did you submit the patch to Andrew Morton? Is the patch in the -mm patchset? Did Karsten ack the patch? If the patch is in -mm and it's not critical (like this subject), then it probably won't be merged until after 2.6.21 is released... > To sum it up, we agree that the "(obsolete)" label is wrong, but we > won't remove it. I have no idea how to resolve that situation. > > What I do know is that it would be very wrong to remove isdn4linux, > because it has an existing userbase with nowhere else to go. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote: Note that I talk of run queues not CPUs as I think a shift to multiple CPUs per run queue may be a good idea. This observation of Peter's is the best thing to come out of this whole foofaraw. Looking at what's happening in CPU-land, I think it's going to be necessary, within a couple of years, to replace the whole idea of "CPU scheduling" with "run queue scheduling" across a complex, possibly dynamic mix of CPU-ish resources. Ergo, there's not much point in churning the mainline scheduler through a design that isn't significantly more flexible than any of those now under discussion. For instance, there are architectures where several "CPUs" (instruction stream decoders feeding execution pipelines) share parts of a cache hierarchy ("chip-level multitasking"). On these machines, you may want to co-schedule a "real" processing task on one pipeline with a "cache warming" task on the other pipeline -- but only for tasks whose memory access patterns have been sufficiently analyzed to write the "cache warming" task code. Some other tasks may want to idle the second pipeline so they can use the full cache-to-RAM bandwidth. Yet other tasks may be genuinely CPU-intensive (or I/O bound but so context-heavy that it's not worth yielding the CPU during quick I/Os), and hence perfectly happy to run concurrently with an unrelated task on the other pipeline. There are other architectures where several "hardware threads" fight over parts of a cache hierarchy (sometimes bizarrely described as "sharing" the cache, kind of the way most two-year-olds "share" toys). On these machines, one instruction pipeline can't help the other along cache-wise, but it sure can hurt. A scheduler designed, tested, and tuned principally on one of these architectures (hint: "hyperthreading") will probably leave a lot of performance on the floor on processors in the former category. 
In the not-so-distant future, we're likely to see architectures with dynamically reconfigurable interconnect between instruction issue units and execution resources. (This is already quite feasible on, say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with as many Nios II cores as fit on the chip.) Restoring task context may involve not just MMU swaps and FPU instructions (with state-dependent hidden costs) but processor reconfiguration. Achieving "fairness" according to any standard that a platform integrator cares about (let alone an end user) will require a fairly detailed model of the hidden costs associated with different sorts of task switch. So if you are interested in schedulers for some reason other than a paycheck, let the distros worry about 5% improvements on x86[_64]. Get hold of some different "hardware" -- say: - a Xilinx ML410 if you've got $3K to blow and want to explore reconfigurable processors; - a SunFire T2000 if you've got $11K and want to mess with a CMT system that's actually shipping; - a QEMU-simulated massively SMP x86 if you're poor but clever enough to implement funky cross-core cache effects yourself; or - a cycle-accurate simulator from Gaisler or Virtio if you want a real research project. Then go explore some more interesting regions of parameter space and see what the demands on mainline Linux will look like in a few years. Cheers, - Michael
Re: [PATCH -rc7 Re] [Trivial] Spelling at drivers/video/Kconfig
On Tue, 2007-04-17 at 00:21 +0200, Miguel Ojeda wrote: > "Trivial patch, against -rc6. I don't know if anyone has fixed this by now." > I'll pick this up. Tony
Re: 2.6.21-rc6 + firstfloor patches: BUG: sleeping function called from invalid context at kernel/sched,.c:3643
Andi Kleen wrote: > Hmm, are you sure? Can you double check? With the latest tree? > > I could reproduce the problem and my change fixed the problem for me. > Hm. Me too. I just booted 2.6.21-rc7-ff-paravirt, and it seems fine. J
Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Brad Campbell wrote: > Brad Campbell wrote: >> G'day all, >> >> All I have is a digital photo of this oops. (It's 3.5mb). I have >> serial console configured, but Murphy is watching me carefully and I >> just can't seem to reproduce it while logging the console output. >> > > And as usual, after trying to capture one for 4 days, I get one 10 mins > after I've sent the E-mail :) > > I think I've just found a way to make this easier to reproduce as > /dev/sdd was not even mounted this > time. I just cold booted and started an md5sum -c run on a directory of > about 180GB. > > [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at > virtual address 005c > [ 2566.218242] printing eip: > [ 2566.226362] c0203169 > [ 2566.232906] *pde = > [ 2566.241274] Oops: [#1] > [ 2566.249637] Modules linked in: > [ 2566.258832] CPU:0 > [ 2566.258833] EIP:0060:[]Not tainted VLI > [ 2566.258834] EFLAGS: 00010082 (2.6.21-rc6-git5 #1) > [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70 > [ 2566.310761] eax: f7a0eae0 ebx: f7a0cb28 ecx: e2f869e8 edx: > > [ 2566.331076] esi: f79fea7c edi: f7d04ac0 ebp: esp: > f6945de0 > [ 2566.351388] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 > [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 > task.ti=f6944000) > [ 2566.390975] Stack: f79fea7c f7d04ac0 c02032d9 > f6ae5ef0 c0133411 1000 > [ 2566.416414]0008 0004 0b582fd4 f79fea7c > f7d04ac0 f79fea7c > [ 2566.441870]c0203519 f7a0cb28 f7a0cb28 f79e 0282 > c01fb7a9 c016ea4d > [ 2566.467326] Call Trace: > [ 2566.475236] [] __cfq_dispatch_requests+0x79/0x170 > [ 2566.492224] [] do_generic_mapping_read+0x281/0x470 > [ 2566.509473] [] cfq_dispatch_requests+0x69/0x90 > [ 2566.525681] [] elv_next_request+0x39/0x130 > [ 2566.540850] [] bio_endio+0x5d/0x90 > [ 2566.553942] [] scsi_request_fn+0x45/0x280 > [ 2566.568851] [] blk_run_queue+0x32/0x70 > [ 2566.582982] [] scsi_next_command+0x30/0x50 > [ 2566.598154] [] scsi_end_request+0x9b/0xc0 > [ 2566.613063] [] 
scsi_io_completion+0x81/0x330 > [ 2566.628751] [] scsi_delete_timer+0xb/0x20 > [ 2566.643661] [] ata_scsi_qc_complete+0x65/0xd0 > [ 2566.659613] [] sd_rw_intr+0x8b/0x220 > [ 2566.673222] [] ata_altstatus+0x1c/0x20 > [ 2566.687352] [] ata_hsm_move+0x14d/0x3f0 > [ 2566.701744] [] scsi_finish_command+0x40/0x60 > [ 2566.717434] [] scsi_softirq_done+0x6f/0xe0 > [ 2566.732604] [] sil_interrupt+0x81/0x90 > [ 2566.746733] [] blk_done_softirq+0x58/0x70 > [ 2566.761644] [] __do_softirq+0x6f/0x80 > [ 2566.775516] [] do_softirq+0x27/0x30 > [ 2566.788866] [] do_IRQ+0x3e/0x80 > [ 2566.801177] [] common_interrupt+0x23/0x28 > [ 2566.816090] === > [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 > 00 83 ec 10 89 1c 24 89 6c > 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 > 89 d0 e8 4a fa ff ff 8b 45 14 > 89 ea 25 01 80 > [ 2566.886586] EIP: [] cfq_dispatch_insert+0x19/0x70 SS:ESP > 0068:f6945de0 > [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt cfq_dispatch_insert() was called with rq == 0. This one is getting really annoying... and md is involved again (RAID0 this time.)
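The conclusion that cfq_dispatch_insert() was called with rq == 0 can be read off the oops. The Code: line brackets the faulting bytes as <8b> 72 5c, which decode to mov 0x5c(%edx),%esi, a 32-bit load from offset 0x5c of the pointer held in %edx. Under the usual i386 regparm convention, where the first arguments arrive in %eax and %edx, %edx holds the second argument, rq. A load from NULL + 0x5c then faults at exactly the reported address (005c in the log). The sketch below illustrates only that address arithmetic. struct fake_request is a hypothetical layout padded so that the field being read sits at offset 0x5c; it is not the real struct request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding only places the field read by the
 * faulting instruction at offset 0x5c. */
struct fake_request {
    unsigned char earlier_fields[0x5c];
    uint32_t field_read_by_insn;
};

/* Address touched by "mov 0x5c(%edx),%esi" when %edx holds rq. */
static uintptr_t faulting_address(uintptr_t rq)
{
    return rq + offsetof(struct fake_request, field_read_by_insn);
}
```

With rq equal to 0, the faulting address is simply the field offset. This is why small fault addresses like 005c usually point to a NULL struct pointer rather than a wild one.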
Re: so what *is* obsolete and removable?
On 15.04.2007 22:55, Robert P. J. Day wrote: > as i recall, the isdn4linux was *un*obsoleted, wasn't it? Actually, it wasn't. We *did* reach a consensus that isdn4linux is not obsolete in the accepted sense of the word, because there is no replacement for it so far. OTOH I have since submitted (twice, in fact) a patch that would remove the "(obsolete)" label from the Kconfig entry, but somehow nothing ever became of it. My submissions just linger in LKML, uncommented and unmerged. To sum it up, we agree that the "(obsolete)" label is wrong, but we won't remove it. I have no idea how to resolve that situation. What I do know is that it would be very wrong to remove isdn4linux, because it has an existing userbase with nowhere else to go. -- Tilman Schmidt E-Mail: [EMAIL PROTECTED] Wehrhausweg 66 Fax: +49 228 4299019 53227 Bonn Germany
Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)
On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote: > On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote: > > > IMHO chunkfs could provide a much more promising approach. > > Agreed, that's one method of compartmentalising the problem. Agreed, the chunkfs design is only one way to implement repair-driven file system design - designing your file system to make file system check and repair fast and easy. I've written a paper on this idea, which includes some interesting projections estimating that fsck will take 10 times as long on the 2013 equivalent of a 2006 file system, due entirely to changes in disk hardware. So if your server currently takes 2 hours to fsck, an equivalent server in 2013 will take about 20 hours. Eek! Paper here: http://infohost.nmt.edu/~val/review/repair.pdf While I'm working on chunkfs, I also think that all file systems should strive for repair-driven design. XFS has already made big strides in this area (multi-threading fsck for multi-disk file systems, for example) and I'm excited to see what comes next. -VAL
Re: Major qla2xxx regression on sparc64
From: Andrew Vasquez <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 15:25:17 -0700 > Fine, I'll agree that wacking-users (and > I'll wager the outliers) with a 2x4 was a bit extreme, And that, right there, is basically the end of the conversation. You don't do this to users, ever. Put a big loud kernel log message in there when this situation presents itself, and use as many capital letters and as much scary language as you wish. Let them know that if things explode they get to keep the pieces. But at least try to give them something that works when you know that you can. You don't need to make someone's system unbootable in order to make them aware of a potential problem. It's very anti-social to approach things in this way.
Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?
On Monday 16 April 2007 15:14, Luca Tettamanti wrote: > It seems that Asus exposes monitorining data using "ATK0110" (enumerated > in DSDT); I see it both on my P5B-E motherboard and on my notebook (L3D) > (they have different methods though). Another motherboard with the same > device may actually call it "FOOBAR123" or "WTFISTHIS". Yup, we have the same problem with other devices. See the long list of PNP IDs in 8250_pnp.c :-) > Problem is that ACPI methods are not documented at all (how am I > supposed to know that "G6T6" is the reading of the 12V rail?) while the > datasheet of hw monitoring chips (w83627ehf in my case) are public (more > or less). Yes, I see that it's attractive to use a single w83627ehf.c driver. For an ACPI driver, we'd have to build a list of PNP IDs, and possibly information about which methods read which information. That's certainly more work. On the other hand, the ACPI driver would avoid the synchronization issues that started this whole thread. That's a pretty compelling advantage. > Furthermore, sensor driver exposes all the reading of the chip > (e.g. in the DSDT I can't find the VSB or battery voltage). Maybe Asus didn't hook up those readings on the board. I would guess that PC Probe doesn't expose the VSB or battery voltage either. I'm sure you've seen these: http://lists.lm-sensors.org/pipermail/lm-sensors/2005-October/014050.html http://www.lm-sensors.org/wiki/AsusFormulaHacking Looks like nobody took up the challenge, though :-) It looks fun to play with, if only I had the time and hardware. Bjorn
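Bjorn's point is that an ACPI driver would need a table of PNP IDs plus knowledge of which method returns which reading. Below is a minimal sketch of such a table in plain C. The struct and function names are hypothetical, and this is not the kernel's ACPI driver API. Its only mapping, G6T6 as the 12V rail under ATK0110, is the one reported for one board in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One ACPI method and the reading it is believed to return. */
struct hwmon_method {
    const char *method;
    const char *label;
};

/* Per-device table, keyed by the PNP ID found in the DSDT. */
struct hwmon_device {
    const char *pnp_id;
    const struct hwmon_method *methods;
    size_t n_methods;
};

/* Only this mapping comes from the thread; other boards (and other methods
 * on this one) would have to be worked out case by case. */
static const struct hwmon_method atk0110_methods[] = {
    { "G6T6", "+12V" },
};

static const struct hwmon_device known_devices[] = {
    { "ATK0110", atk0110_methods,
      sizeof atk0110_methods / sizeof atk0110_methods[0] },
};

/* Returns the sensor label, or NULL for an unknown device or method. */
static const char *lookup_label(const char *pnp_id, const char *method)
{
    size_t n_dev = sizeof known_devices / sizeof known_devices[0];

    for (size_t d = 0; d < n_dev; d++) {
        if (strcmp(known_devices[d].pnp_id, pnp_id) != 0)
            continue;
        for (size_t m = 0; m < known_devices[d].n_methods; m++)
            if (strcmp(known_devices[d].methods[m].method, method) == 0)
                return known_devices[d].methods[m].label;
    }
    return NULL;
}
```

Every board that names its device differently, like the hypothetical "FOOBAR123", needs its own entries. That is the extra work weighed above against the synchronization problems of driving the w83627ehf chip directly.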
Re: Major qla2xxx regression on sparc64
On Mon, 16 Apr 2007, David Miller wrote: > From: Andrew Vasquez <[EMAIL PROTECTED]> > Date: Mon, 16 Apr 2007 14:10:49 -0700 > > > Ok, how about the following patch based on the one you posted which > > adds the codes to retrieve the WWPN/WWNN from firmware on SPARC, and > > also adds the module-parameter override I mentioned above. > > > > Perhaps the module-parameter should be set to non-zero in the case of > > SPARC, to take care of your system configurations? > > I think it should default to non-zero always, in fact the option > is completely pointless. > > The guy who hits this had a system which worked previously, and you're > explicitly breaking it. That's wrong. Sorry, 'it' didn't work... 'It' *never* did. > How can you not see that this quality of implementation decision > you're making stinks? You're defending a position which itself left users with a false sense of security and comfort. This is a *real* problem from an enterprise perspective where FC reigns. Fine, I'll agree that wacking-users (and I'll wager the outliers) with a 2x4 was a bit extreme, but I'd much rather handle those users on a case-by-case basis, either by: * If dealing with a PCI card, directing a user to a support staff at QLogic to resolve the NVRAM issues. * If it's some on-board ISP with no NVRAM, as was your SPARC case, then add *proper* codes to retrieve the data from some secondary persistent store.
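The disagreement in this thread comes down to two questions: what the driver should do when the NVRAM WWPN looks invalid, and what the proposed module parameter should default to. The sketch below models the three possible outcomes. wwpn_looks_valid() uses an all-zero test purely as a hypothetical stand-in for the driver's real checks, and allow_fallback stands in for the module parameter. Neither is qla2xxx code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wwpn_choice {
    WWPN_FROM_NVRAM,      /* NVRAM value looks usable */
    WWPN_DEFAULT_WARNED,  /* fall back to defaults and log a loud warning */
    WWPN_REFUSE,          /* refuse to bring the port up */
};

/* Hypothetical validity test: treat an all-zero name as invalid. */
static bool wwpn_looks_valid(const uint8_t wwpn[8])
{
    for (int i = 0; i < 8; i++)
        if (wwpn[i] != 0)
            return true;
    return false;
}

/* allow_fallback models the proposed module parameter (hypothetical). */
static enum wwpn_choice choose_wwpn(const uint8_t nvram_wwpn[8],
                                    bool allow_fallback)
{
    if (wwpn_looks_valid(nvram_wwpn))
        return WWPN_FROM_NVRAM;
    return allow_fallback ? WWPN_DEFAULT_WARNED : WWPN_REFUSE;
}
```

David's position amounts to allow_fallback defaulting to true, so the system boots with defaults and prints a loud warning. The behaviour being objected to amounts to false, where the machine no longer comes up.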
Re: Loud "pop" coming from hard drive on reboot
Jan Engelhardt wrote: > On Apr 15 2007 12:53, Henrique de Moraes Holschuh wrote: >> On Sat, 14 Apr 2007, Pavel Machek wrote: >>> How common are notebooks that cut power to disks during reboot? >> Assuming it also does this when running Windows, I'd report it as a grave >> bug to the vendor and demand it to be fixed, or the machine to be exchanged >> with another model that doesn't have this defect. > > Given that it does not happen on Windows (IIRC Chuck's post), > then just what is Windows [not] doing that Linux does? It looks like there are two problems here: (1) Some notebooks power off and back on when restarting. Both Linux and other OS handle that badly because they assume power is not interrupted on reboot. The noise emitted is relatively loud. (2) Linux (alone) gives a very muted pop on shutdown. This could be from bad interaction with the shutdown command, or some other reason (drive not given enough time to shut down?) The noise is not very loud, maybe the head did not have to move very far?
[PATCH -rc7 Re] [Trivial] Spelling at drivers/video/Kconfig
"Trivial patch, against -rc6. I don't know if anyone has fixed this by now." Resend comment: Still present in -rc7. --- drivers/video/Kconfig: - Spelling: "Frambuffer hardware drivers" drivers/video/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Signed-off-by: Miguel Ojeda Sandonis <[EMAIL PROTECTED]> --- diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig index e4f0dd0..8372ace 100644 --- a/drivers/video/Kconfig +++ b/drivers/video/Kconfig @@ -139,7 +139,7 @@ config FB_TILEBLITTING This is particularly important to one driver, matroxfb. If unsure, say N. -comment "Frambuffer hardware drivers" +comment "Frame buffer hardware drivers" depends on FB config FB_CIRRUS -- Miguel Ojeda http://maxextreme.googlepages.com/index.htm
Re: [linux-usb-devel] How should an exit routine wait for release() callbacks?
Ah, just found this original thread, now Cornelia's patches make more sense... On Fri, Apr 13, 2007 at 11:24:58AM -0400, Alan Stern wrote: > Tejun, it just occurred to me that you would be interested in this email > thread. Just to bring you up to speed, here's the original question: > > > I've got a module which registers a struct device. (It represents a > > virtual device, not a real one, but that doesn't matter.) Wait, that's the issue right there. Don't do that. devices should be created by busses or the platform core, which owns the release function for them. Individual drivers should not create devices. Hm, but then, how would you ever unload a bus, as the same issue might be there too... Any specific code in the kernel you can point to that has this issue today? thanks, greg k-h
Re: [PATCH 5/7] ARM: OMAP: Merge board specific files from N800 tree
* Tony Lindgren <[EMAIL PROTECTED]> [070409 21:34]: > From: Kai Svahn <[EMAIL PROTECTED]> > > This patch merges board specific files from N800 tree. > Nokia has published the files at: > > http://repository.maemo.org/pool/maemo3.0/free/source/ > kernel-source-rx-34_2.6.18.orig.tar.gz > kernel-source-rx-34_2.6.18-osso29.diff.gz Here's an updated version that fixes compile after my last fix to move externs to board-nokia.h. Regards, Tony >From fd345ea126336a514baf808170f1231999ba2c1d Mon Sep 17 00:00:00 2001 From: Kai Svahn <[EMAIL PROTECTED]> Date: Fri, 26 Jan 2007 12:39:48 -0800 Subject: [PATCH 5/7] ARM: OMAP: Merge board specific files from N800 tree This patch merges board specific files from N800 tree. Nokia has published the files at: http://repository.maemo.org/pool/maemo3.0/free/source/ kernel-source-rx-34_2.6.18.orig.tar.gz kernel-source-rx-34_2.6.18-osso29.diff.gz Signed-off-by: Kai Svahn <[EMAIL PROTECTED]> Signed-off-by: Tony Lindgren <[EMAIL PROTECTED]> Index: linux-2.6/arch/arm/mach-omap2/Kconfig === --- linux-2.6.orig/arch/arm/mach-omap2/Kconfig 2007-04-16 20:50:00.0 + +++ linux-2.6/arch/arm/mach-omap2/Kconfig 2007-04-16 20:50:00.0 + @@ -54,4 +54,13 @@ config MACH_OMAP_APOLLON config MACH_OMAP_2430SDP bool "OMAP 2430 SDP board" - depends on ARCH_OMAP2 && ARCH_OMAP24XX \ No newline at end of file + depends on ARCH_OMAP2 && ARCH_OMAP24XX + +config MACH_NOKIA_N800 + bool "Nokia N800" + depends on ARCH_OMAP24XX + +config MACH_OMAP2_TUSB6010 + bool + depends on ARCH_OMAP2 && ARCH_OMAP2420 + default y if MACH_NOKIA_N800 \ No newline at end of file Index: linux-2.6/arch/arm/mach-omap2/Makefile === --- linux-2.6.orig/arch/arm/mach-omap2/Makefile 2007-04-16 20:50:00.0 + +++ linux-2.6/arch/arm/mach-omap2/Makefile 2007-04-16 20:50:00.0 + @@ -16,4 +16,8 @@ obj-$(CONFIG_MACH_OMAP_GENERIC) += boar obj-$(CONFIG_MACH_OMAP_H4) += board-h4.o obj-$(CONFIG_MACH_OMAP_2430SDP)+= board-2430sdp.o obj-$(CONFIG_MACH_OMAP_APOLLON)+= board-apollon.o +obj-$(CONFIG_MACH_NOKIA_N800) 
+= board-n800.o board-n800-flash.o \ + board-n800-mmc.o board-n800-bt.o \ + board-n800-audio.o board-n800-usb.o \ + board-n800-dsp.o board-n800-pm.o Index: linux-2.6/arch/arm/mach-omap2/board-n800-audio.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6/arch/arm/mach-omap2/board-n800-audio.c2007-04-16 20:50:00.0 + @@ -0,0 +1,366 @@ +/* + * linux/arch/arm/mach-omap2/board-n800-audio.c + * + * Copyright (C) 2006 Nokia Corporation + * Contact: Juha Yrjola + * Jarkko Nikula <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA + * 02110-1301 USA + * + */ + +#include +#include +#include +#include + +#include +#include + +#include "../plat-omap/dsp/dsp_common.h" + +#if defined(CONFIG_SPI_TSC2301_AUDIO) && defined(CONFIG_SND_OMAP24XX_EAC) +#define AUDIO_ENABLED + +static struct clk *sys_clkout2; +static struct clk *func96m_clk; +static struct device *eac_device; +static struct device *tsc2301_device; + +static int enable_audio; +static int audio_ok; +static spinlock_t audio_lock; + +/* + * Leaving EAC and sys_clkout2 pins multiplexed to those subsystems results + * in about 2 mA extra current leak when audios are powered down. The + * workaround is to multiplex them to protected mode (with pull-ups enabled) + * whenever audio is not being used. 
+ */ +static int eac_mux_disabled = 0; +static int clkout2_mux_disabled = 0; +static u32 saved_mux[2]; + +static void n800_enable_eac_mux(void) +{ + if (!eac_mux_disabled) + return; + __raw_writel(saved_mux[1], IO_ADDRESS(0x48000124)); + eac_mux_disabled = 0; +} + +static void n800_disable_eac_mux(void) +{ + if (eac_mux_disabled) { + WARN_ON(eac_mux_disabled); + return; + } + saved_mux[1] = __raw_readl(IO_ADDRESS(0x48000124)); + __raw_writel(0x1f1f1f1f,
Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
David Rientjes wrote: Sure, but what I really like about the patch is that we're only flushing something if !flush_end in the first place. So we can eliminate any TLB flushing if that VMA didn't need it; that's a change from the current behavior. And since the most obvious use-case for /proc/pid/clear_refs is in conjunction with /proc/pid/smaps for approximating memory footprint, we'll end up saving TLB flushes because the granularity with which that measurement is taken is usually very fine. Acked-by: David Rientjes <[EMAIL PROTECTED]> I like the patch even better if you still batch the flushes, but keep the !flush_end machinery. If I read it correctly, flush_start stays at the lower bound for the whole function, so it is still accurate later. And with the flush outside the spinlock, contention time is lower. Thanks, Zach
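A rough userspace sketch of the batching Zach describes, assuming a walk that visits ptes in ascending address order: the first cleared pte sets flush_start only while flush_end is still zero, later ones only push flush_end up, and one flush of [flush_start, flush_end) would then be issued after the page-table lock is dropped. The struct, the 4096-byte page size and the walk function are hypothetical stand-ins, not the actual clear_refs code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size for the sketch */

/* Hypothetical range tracker: flush_end == 0 means nothing cleared yet. */
struct flush_range {
	unsigned long flush_start;
	unsigned long flush_end;
};

/* Record that the pte for addr was cleared and its TLB entry is stale. */
static void note_cleared(struct flush_range *r, unsigned long addr)
{
	if (!r->flush_end)              /* first hit fixes the lower bound */
		r->flush_start = addr;
	r->flush_end = addr + SKETCH_PAGE_SIZE;
}

/* Simulated walk over n pages starting at base; in the real code this
 * part runs under the page-table lock. Returns the number of pages cleared. */
static size_t clear_young_walk(bool *young, size_t n, unsigned long base,
			       struct flush_range *r)
{
	size_t cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (!young[i])
			continue;       /* untouched page: nothing to flush */
		young[i] = false;
		note_cleared(r, base + i * SKETCH_PAGE_SIZE);
		cleared++;
	}
	return cleared;
}

/* After the lock is dropped: is one batched flush needed at all? */
static bool flush_needed(const struct flush_range *r)
{
	return r->flush_end != 0;
}
```

If no pte in the VMA was young, flush_end stays zero and no flush is issued, which is the saving David points out.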
Re: [v4l-dvb-maintainer] [GIT PATCHES] V4L/DVB updates
On Monday, 16.04.2007, at 12:25 -0400, Michael Krufky wrote: > CIJOML wrote: > > On Monday 16 April 2007 17:34, Michael Krufky wrote: > > > >> Adrian Bunk wrote: > >> > >>> On Sun, Apr 15, 2007 at 08:33:38PM -0400, Michael Krufky wrote: > >>> > Mauro, > > I've been out of town for the past few days... I just got home and saw > this: > > Mauro Carvalho Chehab wrote: > > >- Fix 1/3 for bug 7819: fixed frontend hotplug issue > >- Fix 2/3 for bug 7819: demux and dvr > >- Fix 3/3 for bug 7819: fixed hotplugging for dvbnet > > > I don't think that this is 2.6.21 material. These patches have not yet > received > enough testing to be sent to mainline. > > I have tested them, and they seem to work for my cxusb device, but we > have yet to hear test results from users of usb dvb devices that do not > use the dvb-usb framework. (ttusb, flexcop-usb, cinergyT2, for example) > > The bug that these patches fix has been around throughout the entire > kernel history of the dvb subsystem. The bug is not a regression -- it > has always been > there. In my opinion, it is too late in 2.6.21 development to apply > this change. > Because these fixes are not obvious, I think we should let them get some > more testing, and have them queued for 2.6.22 . > > >>> Unless I misunderstand anything, this should fix [1]. > >>> > >>> And this is a bug that was reported to be present in 2.6.21-rc but not > >>> in 2.6.20 (and it's therefore a regression, no matter whether the > >>> underlying problem was older and only exposed by some other change). > >>> > >> Not true. The DVB subsystem has NEVER been hot-unpluggable. I confirm > >> that the patches SEEM to be correct, but this has not yet been verified. > >> None of the authors of dvb-core gave their ACK on these changesets. > >> > >> The DVB hotplug issue has been around since the very beginning. I assure > >> you that I consider this fix to be very important, and I really would love > >> to see it hit mainline.
However, given the situation, it is not > >> appropriate to push these in during -rc7. > >> > >> I have doubts about CIJOML's testing method -- there is no way he could have > >> unplugged the device while in use, while running 2.6.20.y and not received > >> an OOPS. CIJOML, please see the bottom of this email for > >> > >> Sure, this will prevent an OOPS on some, and hopefully all devices... but > >> what if it causes a regression for those untested? > >> > >> Why do we have a merge window, if new changesets are going to be rushed > >> into late -rc kernels without proper testing, and without the ack of a dvb > >> subsystem maintainer? > >> > >> Are we prepared to go for another -rc and 3 or 4 weeks of testing to > >> confirm that this fix doesn't cause new regressions? I don't think so. > >> > >> Markus Rechberger wrote: > >> > >>> The patch has been around on the dvb mailinglist ([PATCH][RFC] DVB > >>> Hotplug Fix, 5. April 2007), > >>> > >> The patch was merged into the development repository at the same time the > >> pull request was issued to Linus. This has NOT been tested on a wide > >> scale. It should go to -mm for a while before being merged to mainline. > >> > >> Mauro Carvalho Chehab wrote: > >> > >>> I also explicitly warned at DVB ML that I was about to send this patch, > >>> together with other fixes, asking the community for more tests. After > >>> that, I received two positive answers on my mailbox from people that > >>> tested and noticed that this really fixed the issue. > >>> > >> One of those positive answers was me - I explained that it worked for me, > >> but we need others to test. > >> > >> You waited ONE DAY after sending this "warning" to the dvb mailing list? ( > >> http://linuxtv.org/pipermail/linux-dvb/2007-April/017204.html ) I saw that > >> email after seeing the pull request to Linus. We don't have users testing > >> the repositories after each commit -- you _really_ need to give some more > >> time to allow for such testing.
> >> > >> CIJOML wrote: > >> > >>> Hi, > >>> > >>> I have tested these patches with: > >>> > >>> Freecom DVB-T dongle > >>> Pluto2 pcmcia card > >>> Leadtek WinFast DTV dongle 1st generation > >>> Leadtek WinFast DTV dongle 2nd generation > >>> > >>> These are 4 different devices with 4 different hw and modules. > >>> All works. Please apply. > >>> > >> Well, that helps... But it would still be nice to hear test results on a > >> CinergyT2 or flexcop-usb. > >> > >> Which driver supports those Winfast dongles? We already know for sure that > >> the patches work correctly for any driver based on the dvb-usb framework. > >> > >> If you had the device open, and then disconnect it from the usb bus, no > >> matter what kernel version you're running, you should hit the OOPS. I > >>
Re: [patch] CFS (Completely Fair Scheduler), v2
On Tuesday, 17 April 2007, Ingo Molnar wrote: > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first >flag can be used to turn it on/off. (This might fix the Kaffeine bug >reported by S.Çağlar Onur) Sorry for the delayed response, but I just found some free time. Do you still want me to test mainline + the "parent-runs first" patch, or should I drop that one and test v2, which can change the default behaviour? -- S.Çağlar Onur <[EMAIL PROTECTED]> http://cekirdek.pardus.org.tr/~caglar/ Linux is like living in a teepee. No Windows, no Gates and an Apache in house!
Re: bug in tcp?
From: Sebastian Kuzminsky <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 15:45:19 -0600 > I'm seeing some weird behavior in TCP. The issue is perfectly > reproducible using netcat and other programs. This is what I do: Please send your bug report again, but this time to the [EMAIL PROTECTED] mailing list which is where the networking developers are subscribed and deal with bug reports in the networking. Thanks.
Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching
On Mon, Apr 16, 2007 at 11:00:01PM +0100, Alan Cox wrote: > > don't actually have to care --- if loading an invalid profile can bring > > down > > the system, then that's no worse than an arbitrary module that crashes the > > machine. Not sure if there will ever be user loadable profiles; at least at > > that point we had to care. > > CAP_SYS_RAWIO is needed to do arbitrary patching/loading in the capability > model so if you are using lesser capabilities it is a (minor) capability > rise but not a big problem, just ugly and wanting a fix > > > > > + /* > > > > +* Replacement needs to allocate a new aa_task_context for each > > > > +* task confined by old_profile. To do this the profile locks > > > > +* are only held when the actual switch is done per task. While > > > > +* looping to allocate a new aa_task_context the old_task list > > > > +* may get shorter if tasks exit/change their profile but will > > > > +* not get longer as new task will not use old_profile detecting > > > > +* that is stale. > > > > +*/ > > > > + do { > > > > + new_cxt = aa_alloc_task_context(GFP_KERNEL | > > > > __GFP_NOFAIL); > > > > > > NOFAIL is usually a bad sign. It should be only used if there is no > > > alternative. > > > > At this point there is no secure alternative to allocating a task context > > --- > > except killing the task, maybe. > > Can you count the number needed, preallocate them and then when you know > for sure either succeed or fail the operation as a whole? No, to be accurate the count would have to be made with the profile lock held, which would then need to be released so as not to use GFP_ATOMIC for the allocations. An iterative approach could be taken where we do something like:

repeat:
    lock profile
    count
    if preallocated < count
        unlock profile
        if (!allocate count - preallocated)
            fail
        goto repeat
    do replacement
    unlock profile
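A userspace sketch of the repeat loop above, under stated assumptions: the profile lock is modelled as a plain flag, struct task_ctx, struct profile and replace_profile() are hypothetical stand-ins for aa_task_context and the real replacement code, and malloc() plays the role of a sleeping GFP_KERNEL allocation made with the lock dropped. It leans on the property stated in the quoted comment, that the list of tasks on the old profile can shrink but never grow, so the loop terminates and the operation either fully succeeds or fails as a whole.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for aa_task_context and the profile lock. */
struct task_ctx { int id; };

struct profile {
	size_t ntasks;   /* tasks still confined by the old profile */
	int locked;      /* plain flag standing in for the spinlock */
};

static void lock_profile(struct profile *p)   { p->locked = 1; }
static void unlock_profile(struct profile *p) { p->locked = 0; }

/* Returns the number of contexts handed out (array in *out), or -1 if an
 * allocation failed, in which case nothing was replaced. */
static long replace_profile(struct profile *p, struct task_ctx ***out)
{
	struct task_ctx **pool = NULL;
	size_t have = 0, count;

repeat:
	lock_profile(p);
	count = p->ntasks;
	if (have < count) {
		unlock_profile(p);      /* never allocate with the lock held */
		struct task_ctx **grown = realloc(pool, count * sizeof(*pool));
		if (!grown)
			goto fail;
		pool = grown;
		while (have < count) {
			pool[have] = malloc(sizeof(**pool));
			if (!pool[have])
				goto fail;
			pool[have]->id = (int)have;
			have++;
		}
		goto repeat;            /* recount: tasks may have exited */
	}
	/* Lock held and have >= count: the per-task switch would happen
	 * here. Surplus contexts (tasks that exited meanwhile) are freed. */
	while (have > count)
		free(pool[--have]);
	unlock_profile(p);
	*out = pool;
	return (long)count;

fail:
	while (have > 0)
		free(pool[--have]);
	free(pool);
	*out = NULL;
	return -1;
}
```

The design choice is the one argued above: every allocation happens with the lock dropped, so nothing needs GFP_ATOMIC or __GFP_NOFAIL, and a failed allocation aborts before any task is switched.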
Re: Major qla2xxx regression on sparc64
From: Andrew Vasquez <[EMAIL PROTECTED]> Date: Mon, 16 Apr 2007 14:10:49 -0700 > Ok, how about the following patch based on the one you posted which > adds the codes to retrieve the WWPN/WWNN from firmware on SPARC, and > also adds the module-parameter override I mentioned above. > > Perhaps the module-parameter should be set to non-zero in the case of > SPARC, to take care of your system configurations? I think it should default to non-zero always, in fact the option is completely pointless. The guy who hits this had a system which worked previously, and you're explicitly breaking it. That's wrong. How can you not see that this quality of implementation decision you're making stinks?
[patch] CFS (Completely Fair Scheduler), v2
this is the second release of the CFS (Completely Fair Scheduler) patchset, against v2.6.21-rc7: http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch i'd like to thank everyone for the tremendous amount of feedback and testing the v1 patch got - i could hardly keep up with just reading the mails! Some of the stuff people addressed i couldn't implement yet; i mostly concentrated on bugs, regressions and debuggability. there's a fair amount of churn: 15 files changed, 456 insertions(+), 241 deletions(-) But it's an encouraging sign that there was no crash bug found in v1; all the bugs were related to scheduling-behavior details. The code was tested on 3 architectures so far: i686, x86_64 and ia64. Most of the code size increase in -v2 is due to debugging helpers; they'll be removed later. (The new /proc/sched_debug file can be used to see the fine details of CFS scheduling.) Changes since -v1: - make nice levels less starvable. (reported by Willy Tarreau) - fixed child-runs-first. A /proc/sys/kernel/sched_child_runs_first flag can be used to turn it on/off. (This might fix the Kaffeine bug reported by S.Çağlar Onur) - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas) - UP build fix. (reported by Gabriel C) - timer tick micro-optimization (Dmitry Adamushko) - preemption fix: sched_class->check_preempt_curr method to decide whether to preempt after a wakeup (or at a timer tick). (Found via a fairness-test-utility written for CFS by Mike Galbraith) - start forked children with neutral statistics instead of trying to inherit them from the parent: Willy Tarreau reported that this results in better behavior on extreme workloads, and it also simplifies the code quite nicely. Removed sched_exit() and the ->task_exit() methods.
- make nice levels independent of the sched_granularity value - new /proc/sched_debug file listing runqueue details and the rbtree - new SCH-* fields in /proc//status to see scheduling details - new cpu-hog feature (off by default) and sysctl tunable to set it: /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to 0 (off). Positive values set the maximum 'memory' that the scheduler has of CPU hogs. - various code cleanups - added more statistics temporarily: sum_exec_runtime, sum_wait_runtime. - added -CFS-v2 to EXTRAVERSION as usual, any sort of feedback, bug reports, fixes and suggestions are more than welcome, Ingo
Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
Hugh Dickins wrote: You're right to want to defer your pte_updates; David is right to want to batch his TLB flushes. It bothers me that you have a surprising case, and that unless you abandon your optimization, it imposes a new constraint on how to proceed in common code (without #ifdef'ing around). But perhaps in this case David might concede that the longer we delay the TLB flush, the more likely a referenced bit is to be missed - that is, it gets cleared from the pte, but if that page is accessed again before the TLB is flushed, the processor may well omit to reinstate the accessed bit, and our stats drift away from reality. Compromise patch below: would that be satisfactory to you, David? Although I appreciate the heroics, you needn't do this on our account; the win of a couple thousand cycles is not worth the cost in complexity, IMHO, and the penalty on native hardware could quite possibly overshadow it. If you still issue the flush inside the spinlock, as required for this paravirt optimization, you are taking the risk of holding the spinlock an extra long time while issuing a TLB shootdown - which means waiting for an IPI. It might not matter that much on i386, but on big iron (or realtime) systems, this could have significant negative scaling effects for workloads where the page table was hot on some set of CPUs (say, remapping file pages for database access). In time, the benefits of this optimization to the hypervisor will decrease, while the benefits of optimizing the other way for shorter spinlock time may increase, both in a VM and on native hardware. So I would rather just drop the pte_update_defer down to a pte_update if the flush is not immediately following - as it is nice and simply correct without getting in the way. I see there is more in this thread that I haven't read yet, so I preemptively reserve the right to issue an invalidation of this opinion...
Thanks, Zach
Re: [PATCH 8/18] ARM: OMAP: Add mailbox support for IVA
* Tony Lindgren <[EMAIL PROTECTED]> [070409 21:23]: > From: Hiroshi DOYU <[EMAIL PROTECTED]> > > This patch adds a generic mailbox interface for DSP and IVA > (Image Video Accelerator). This patch itself doesn't contain > any IVA driver. Here's an updated version that merges in two later fixes from Hiroshi. Regards, Tony >From 7845896508123512184412464ca22505c13a728d Mon Sep 17 00:00:00 2001 From: Hiroshi DOYU <[EMAIL PROTECTED]> Date: Thu, 7 Dec 2006 15:43:59 -0800 Subject: [PATCH 8/18] ARM: OMAP: Add mailbox support for IVA This patch adds a generic mailbox interface for DSP and IVA (Image Video Accelerator). This patch itself doesn't contain any IVA driver. Signed-off-by: Hiroshi DOYU <[EMAIL PROTECTED]> Signed-off-by: Juha Yrjola <[EMAIL PROTECTED]> Signed-off-by: Tony Lindgren <[EMAIL PROTECTED]> --- arch/arm/mach-omap1/mailbox.c | 206 arch/arm/mach-omap2/mailbox.c | 310 ++ arch/arm/plat-omap/mailbox.c| 352 +++ arch/arm/plat-omap/mailbox.h| 193 +++ include/asm-arm/arch-omap/mailbox.h | 68 +++ 5 files changed, 1129 insertions(+), 0 deletions(-) Index: linux-2.6/arch/arm/mach-omap1/mailbox.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6/arch/arm/mach-omap1/mailbox.c 2007-04-16 18:19:40.0 + @@ -0,0 +1,206 @@ +/* + * Mailbox reservation modules for DSP + * + * Copyright (C) 2006 Nokia Corporation + * Written by: Hiroshi DOYU <[EMAIL PROTECTED]> + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details.
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#define MAILBOX_ARM2DSP1 0x00 +#define MAILBOX_ARM2DSP1b 0x04 +#define MAILBOX_DSP2ARM1 0x08 +#define MAILBOX_DSP2ARM1b 0x0c +#define MAILBOX_DSP2ARM2 0x10 +#define MAILBOX_DSP2ARM2b 0x14 +#define MAILBOX_ARM2DSP1_Flag 0x18 +#define MAILBOX_DSP2ARM1_Flag 0x1c +#define MAILBOX_DSP2ARM2_Flag 0x20 + +unsigned long mbox_base; + +struct omap_mbox1_fifo { + unsigned long cmd; + unsigned long data; + unsigned long flag; +}; + +struct omap_mbox1_priv { + struct omap_mbox1_fifo tx_fifo; + struct omap_mbox1_fifo rx_fifo; +}; + +static inline int mbox_read_reg(unsigned int reg) +{ + return __raw_readw(mbox_base + reg); +} + +static inline void mbox_write_reg(unsigned int val, unsigned int reg) +{ + __raw_writew(val, mbox_base + reg); +} + +/* msg */ +static inline mbox_msg_t omap1_mbox_fifo_read(struct omap_mbox *mbox) +{ + struct omap_mbox1_fifo *fifo = + &((struct omap_mbox1_priv *)mbox->priv)->rx_fifo; + mbox_msg_t msg; + + msg = mbox_read_reg(fifo->data); + msg |= ((mbox_msg_t) mbox_read_reg(fifo->cmd)) << 16; + + return msg; +} + +static inline void +omap1_mbox_fifo_write(struct omap_mbox *mbox, mbox_msg_t msg) +{ + struct omap_mbox1_fifo *fifo = + &((struct omap_mbox1_priv *)mbox->priv)->tx_fifo; + + mbox_write_reg(msg & 0xffff, fifo->data); + mbox_write_reg(msg >> 16, fifo->cmd); +} + +static inline int omap1_mbox_fifo_empty(struct omap_mbox *mbox) +{ + return 0; +} + +static inline int omap1_mbox_fifo_full(struct omap_mbox *mbox) +{ + struct omap_mbox1_fifo *fifo = + &((struct omap_mbox1_priv *)mbox->priv)->rx_fifo; + + return (mbox_read_reg(fifo->flag)); +} + +/* irq */ +static inline void +omap1_mbox_enable_irq(struct omap_mbox *mbox, omap_mbox_type_t irq) +{ + if (irq == IRQ_RX) + enable_irq(mbox->irq); +} + +static inline void +omap1_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_type_t irq) +{ + if (irq == IRQ_RX) + disable_irq(mbox->irq); +} + +static inline int
+omap1_mbox_is_irq(struct omap_mbox *mbox, omap_mbox_type_t irq) +{ + if (irq == IRQ_TX) + return 0; + return 1; +} + +static struct omap_mbox_ops omap1_mbox_ops = { + .type = OMAP_MBOX_TYPE1, + .fifo_read = omap1_mbox_fifo_read, + .fifo_write = omap1_mbox_fifo_write, + .fifo_empty = omap1_mbox_fifo_empty, + .fifo_full = omap1_mbox_fifo_full, + .enable_irq = omap1_mbox_enable_irq, + .disable_irq= omap1_mbox_disable_irq, + .is_irq = omap1_mbox_is_irq, +}; + +/* FIXME: the following struct should be created automatically by the user id */ + +/* DSP */ +static struct omap_mbox1_priv omap1_mbox_dsp_priv = { + .tx_fifo = { + .cmd= MAILBOX_ARM2DSP1b, + .data = MAILBOX_ARM2DSP1, + .flag = MAILBOX_ARM2DSP1_Flag, + }, +
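The two omap1 fifo helpers in the patch above carry one 32-bit message in two 16-bit registers: the low half goes to the data register and the high half to the cmd register. A minimal userspace model of that split, assuming mbox_msg_t is 32 bits wide (its definition in mailbox.h is not shown here) and using plain variables in place of the __raw_readw()/__raw_writew() register accesses; struct fake_fifo is a hypothetical stand-in.

```c
#include <stdint.h>

/* Assumed width; the real typedef lives in the (unshown) mailbox.h. */
typedef uint32_t mbox_msg_t;

/* Plain variables standing in for the two 16-bit mailbox registers. */
struct fake_fifo {
	uint16_t cmd;
	uint16_t data;
};

/* Same split as omap1_mbox_fifo_write(): low half to data, high to cmd. */
static void fifo_write(struct fake_fifo *f, mbox_msg_t msg)
{
	f->data = (uint16_t)(msg & 0xffff);
	f->cmd  = (uint16_t)(msg >> 16);
}

/* Same reassembly as omap1_mbox_fifo_read(). */
static mbox_msg_t fifo_read(const struct fake_fifo *f)
{
	mbox_msg_t msg = f->data;

	msg |= (mbox_msg_t)f->cmd << 16;
	return msg;
}
```

A write followed by a read should round-trip any 32-bit value, which is a quick way to check the mask and shift agree with each other.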
Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching
> don't actually have to care --- if loading an invalid profile can bring down > the system, then that's no worse than an arbitrary module that crashes the > machine. Not sure if there will ever be user loadable profiles; at least at > that point we had to care. CAP_SYS_RAWIO is needed to do arbitrary patching/loading in the capability model so if you are using lesser capabilities it is a (minor) capability rise but not a big problem, just ugly and wanting a fix > > > + /* > > > + * Replacement needs to allocate a new aa_task_context for each > > > + * task confined by old_profile. To do this the profile locks > > > + * are only held when the actual switch is done per task. While > > > + * looping to allocate a new aa_task_context the old_task list > > > + * may get shorter if tasks exit/change their profile but will > > > + * not get longer as new task will not use old_profile detecting > > > + * that is stale. > > > + */ > > > + do { > > > + new_cxt = aa_alloc_task_context(GFP_KERNEL | __GFP_NOFAIL); > > > > NOFAIL is usually a bad sign. It should be only used if there is no > > alternative. > > At this point there is no secure alternative to allocating a task context --- > except killing the task, maybe. Can you count the number needed, preallocate them and then when you know for sure either succeed or fail the operation as a whole?
Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts
> > That is a fairly significant and sudden change to the existing > > kernel/user interface. > > Well, this is not meant for 2.6.21. I hope it is possible to change it in > early 2.6.22; otherwise if we can't fix mistakes from the past we are pretty > doomed. I don't believe the existing behaviour _IS_ a mistake. > > This is untrue. The process can get there (via fd passing with another > > task) > > Processes can access file descriptors which are unreachable via path name just > fine indeed, but those fds still don't have a valid path in the context of > that process. Which, while problematic for your name-based security, is just fine for everything else. > We are only talking about mount points unreachable by a particular process; > this does not mean that the mount point isn't reachable by other processes. > Human operators can choose the context from which they are looking > at /proc/mounts. If they are looking from the "real" root, they will see all > mounts that any process can reach (in that namespace). Ok, providing the "real" root sees them all it isn't so bad, but to assume you can filter based upon what the task can see is dodgy as an assumption.
Re: [v4l-dvb-maintainer] Re: [GIT PATCHES] V4L/DVB updates
On Mon, 16 Apr 2007, Dmitry Torokhov wrote: > Hi Mauro, > > On 4/15/07, Mauro Carvalho Chehab <[EMAIL PROTECTED]> wrote: > > - Fix Kernel Bugzilla #8301: spinlock fix for flexcop-pci > > While move of spin_lock_init before request_irq is obviously correct I > wonder what is the reason behind changing spin_lock_irq() into > spin_lock_irqsave() as I do not see flexcop_pci_isr being called from > anywhere but IRQ context. > > BTW, is irq_lock needed at all? There was some more discussion on the linux-dvb list http://www.linuxtv.org/pipermail/linux-dvb/2007-April/017024.html , and I think we came to the conclusion that irq_lock isn't needed at all. It does nothing but serialize the ISR and ISRs are automatically serialized by the kernel.
Re: [1/2] 2.6.21-rc7: known regressions
Adrian Bunk wrote: > This email lists some known regressions in Linus' tree compared to 2.6.20. > > Subject: snd_intel8x0: divide error: > References : http://lkml.org/lkml/2007/3/5/252 > Submitter : Michal Piotrowski <[EMAIL PROTECTED]> > Status : unknown > Oops is in sound/pci/intel8x0.c::snd_intel8x0_update(), part of the interrupt handler: Line 751: ichdev->position += step * ichdev->fragsize1; if (! chip->in_measurement) ichdev->position %= ichdev->size; ichdev->size is 0. Interrupt happened upon request_irq(). Does chip->in_measurement need to be reset because this is a crashdump kernel?
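For illustration, the failing arithmetic modelled in userspace. struct chan is a hypothetical, cut-down stand-in for the driver's per-channel state (the field types are assumed), and the size check only marks where the divide error comes from when the interrupt arrives before the stream size is set. It is not the fix the question above is asking about.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel fields used at line 751. */
struct chan {
	unsigned int position;
	unsigned int fragsize1;
	unsigned int size;      /* 0 if the stream was never set up */
};

static void update_position(struct chan *c, unsigned int step,
			    bool in_measurement)
{
	c->position += step * c->fragsize1;
	/* The original does the modulo unconditionally when not measuring;
	 * with size == 0 that is the divide error in the oops. */
	if (!in_measurement && c->size)
		c->position %= c->size;
}
```

Without the `c->size` test, the second call in a sequence like the one above would trap exactly as in the reported oops.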
Re: [PATCH v2] hpet: Enable hidden HPET on NVidia motherboards
On Tue, Apr 17, 2007 at 12:28:31AM +0300, Mikko Tiihonen wrote: > I actually was more worried that someone might complain that the pci > scanning is copy & paste code from end of the same file. I did try to use > the generic pci functions first but because they insist on enabling > interrupts they cannot be used this early. And this code needs to be run > before the timer initialization. Yes that's the issue. You're adding another PCI scanner copy'n'pasted from the caller of the function you're adding it to. See the problem? > If you want I can submit a separate patch to move the ... not nice pci > scanning code to pci directory under some early_pci_scan(u32 *pci_ids, > hook) function. The same code was already cut in That is what early-quirks is anyway. But the way to scan for multiple things is not to add another recursive scan, but to just extend or change the main loop. > >Also nothing should be done here without confirmation from > >Nvidia that HPET is actually supposed to work. Sometimes hardware > >is disabled by BIOS because it is seriously broken (there was at least > >one other chipset that could corrupt your flash if you force enabled > >HPET in some steppings) > > I hope someone has some secret contacts at NVidia because they have not > been very open with their chipsets. I looked at LinuxBios and their NForce4 > chipset code just had commented-out code that wrote to 0x44 register. > So obviously something more is needed. Andy, can you help please? There is interest in force enabling HPET on boards where the BIOS didn't choose to. We would need a list of PCI-IDs where this is safe to do and what bits to poke. Thanks. -Andi
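A sketch of the single-loop, table-driven approach Andi describes, written as plain userspace C over a simulated device list instead of a second copy of the PCI scanner. The struct names, the ANY_ID wildcard and the counting hook are all hypothetical; the real early-quirks code reads PCI config space directly and its table layout may differ. No real chipset IDs are listed, since building that table is exactly the information being asked of NVIDIA.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffu  /* hypothetical wildcard, in the spirit of PCI_ANY_ID */

struct pci_ids {
	uint16_t vendor;
	uint16_t device;
};

struct early_quirk {
	uint16_t vendor;
	uint16_t device;                        /* ANY_ID matches every device */
	void (*hook)(const struct pci_ids *dev);
};

/* Hypothetical hook: a real quirk would poke chipset registers instead. */
static unsigned int count_hook_calls;
static void count_hook(const struct pci_ids *dev)
{
	(void)dev;
	count_hook_calls++;
}

/* One pass over the (simulated) bus; each device is checked against every
 * table row, so a new quirk is a new row, not a new scanner. */
static size_t run_early_quirks(const struct pci_ids *devs, size_t ndevs,
			       const struct early_quirk *q, size_t nq)
{
	size_t calls = 0;

	for (size_t i = 0; i < ndevs; i++)
		for (size_t j = 0; j < nq; j++)
			if (q[j].vendor == devs[i].vendor &&
			    (q[j].device == ANY_ID ||
			     q[j].device == devs[i].device)) {
				q[j].hook(&devs[i]);
				calls++;
			}
	return calls;
}
```

Forcing HPET on would then be one more table row whose hook pokes the appropriate bits, once someone can say which IDs are safe.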
Re: [AppArmor 38/41] AppArmor: Module and LSM hooks
On Thu, Apr 12, 2007 at 11:21:01AM +0100, Alan Cox wrote: > > + > > + /** > > +* parent can ptrace child when > > +* - parent is unconfined > > +* - parent is in complain mode > > +* - parent and child are confined by the same profile > > +*/ > > Your profiles are name based. That means the same profile in a different > namespace does different things. It would be a very odd case where it > mattered but surely the parent ptrace child rule should also require that > the parent and child are in the same namespace when using apparmor name > based security. > You are right; we should be requiring that parent and child are in the same namespace. This has been fixed. > > +static int apparmor_capget(struct task_struct *task, > > + kernel_cap_t *effective, > > + kernel_cap_t *inheritable, > > + kernel_cap_t *permitted) > > +{ > > + return cap_capget(task, effective, inheritable, permitted); > > +} > > Pointless function should go away. > Yes, we had a few of those; thanks for pointing it out. > > +static int apparmor_sysctl(struct ctl_table *table, int op) > > +{ > > + int error = 0; > > + > > + if ((op & 002) && !capable(CAP_SYS_ADMIN)) > > + error = aa_reject_syscall(current, GFP_KERNEL, > > + "sysctl (write)"); > > + > > + return error; > > The usual file permission security override is DAC not ADMIN. What is the > logic of this choice. > This was a very coarse-grained check, done to restrict access to sysctls that could potentially be used to elevate privilege. The check is inconsistent with AppArmor's model; we should be modelling sysctl accesses as pathname accesses, and then we could use standard mediation. Thanks for the review, john
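The ptrace rule from the quoted comment, extended with the same-namespace condition Alan asked for, can be modelled as a small predicate. This is a hypothetical sketch, not AppArmor's actual code: `struct profile_sketch`, its fields and `may_ptrace()` are invented for illustration, and a NULL profile stands for an unconfined task.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical profile: a name is only meaningful inside its namespace. */
struct profile_sketch {
	const char *ns;       /* policy namespace */
	const char *name;     /* profile name within that namespace */
	bool complain;        /* complain (learning) mode */
};

/* Same profile means same name AND same namespace. */
static bool same_profile(const struct profile_sketch *a,
			 const struct profile_sketch *b)
{
	return strcmp(a->ns, b->ns) == 0 && strcmp(a->name, b->name) == 0;
}

/*
 * parent can ptrace child when
 *  - parent is unconfined (NULL), or
 *  - parent is in complain mode, or
 *  - parent and child have the same profile in the same namespace
 */
static bool may_ptrace(const struct profile_sketch *parent,
		       const struct profile_sketch *child)
{
	if (parent == NULL || parent->complain)
		return true;
	if (child == NULL)
		return false;     /* confined parent, unconfined child */
	return same_profile(parent, child);
}
```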
Re: [PATCH 7/7] [RFC] APM emulation driver for class batteries
On Tue, Apr 17, 2007 at 01:08:29AM +0400, Anton Vorontsov wrote: > On Mon, Apr 16, 2007 at 09:24:21PM +0100, Russell King wrote: > > Utterly unsafe. What happens if some other module gets loaded which > > does this, and then this module is unloaded followed by the other > > module. Result: Oops. > > Right. And loading two modules which change apm_get_power_status > is a race already. Thus, the APM interface needs a mutex. > > Or pda_power should be marked "bool" in Kconfig, as it is done > in arch/arm/common/sharpsl_pm.c. Sharpsl_pm is safe only because it > can't be a module. > > Personally I'd keep things as is for now (i.e. I'd want tristate for > PDA_POWER, not bool). Later, the APM API can be fixed. Experience shows "Later" more often than not means "never", in spite of what is said at the time the word is used... -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of:
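One shape the mutex Anton mentions could take: replace the bare global pointer with register/unregister calls that serialize updates, allow a single owner, and only clear the hook if the caller owns it. This is a hypothetical userspace sketch using POSIX threads in place of a kernel mutex; `register_power_status()` and `unregister_power_status()` are invented names, not an existing APM API.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct apm_power_info;   /* opaque here; the real struct lives in the APM code */

typedef void (*get_power_status_fn)(struct apm_power_info *);

/* Hypothetical registration API replacing the bare global pointer. */
static get_power_status_fn power_status_hook;
static pthread_mutex_t power_status_lock = PTHREAD_MUTEX_INITIALIZER;

/* Only one provider may own the hook; a second one gets -EBUSY. */
static int register_power_status(get_power_status_fn fn)
{
	int ret = 0;

	pthread_mutex_lock(&power_status_lock);
	if (power_status_hook)
		ret = -EBUSY;
	else
		power_status_hook = fn;
	pthread_mutex_unlock(&power_status_lock);
	return ret;
}

/* Clear the hook only if the caller actually owns it. */
static void unregister_power_status(get_power_status_fn fn)
{
	pthread_mutex_lock(&power_status_lock);
	if (power_status_hook == fn)
		power_status_hook = NULL;
	pthread_mutex_unlock(&power_status_lock);
}

/* Two hypothetical providers with distinct bodies. */
static int provider_calls;
static void provider_a(struct apm_power_info *info) { (void)info; provider_calls += 1; }
static void provider_b(struct apm_power_info *info) { (void)info; provider_calls += 2; }
```

With this scheme the second provider fails to register instead of silently chaining, and unloading in either order never leaves a pointer into a module that is gone.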
Re: [Kernel-discuss] Re: [PATCH 7/7] [RFC] APM emulation driver for class batteries
Hello Russell, Monday, April 16, 2007, 11:24:21 PM, you wrote: > On Fri, Apr 13, 2007 at 05:50:43PM +0400, Anton Vorontsov wrote: >> +static void (*old_apm_get_power_status)(struct apm_power_info*); >> + >> +static int __init apm_battery_init(void) >> +{ >> + printk(KERN_INFO "APM Battery Driver\n"); >> + >> + old_apm_get_power_status = apm_get_power_status; >> + apm_get_power_status = apm_battery_apm_get_power_status; >> + return 0; >> +} >> + >> +static void __exit apm_battery_exit(void) >> +{ >> + apm_get_power_status = old_apm_get_power_status; >> + return; >> +} > Utterly unsafe. What happens if some other module gets loaded which > does this, and then this module is unloaded followed by the other > module. Result: Oops. That's apparently why "APM emulation" is on its way towards deprecation, right? And why people are so particular about the new battery API, as everyone hopes it will replace APM. We provide APM emulation on top of the battery API as a separate driver precisely because of such issues with the APM API. Anyway, any suggestions on solving this "pointer API" issue? Would at least assigning NULL on exit be safer? (There really shouldn't be two APM drivers, but in the odd case there are, it would be nice to at least not segfault.) -- Best regards, Paul mailto:[EMAIL PROTECTED]
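Russell's failure sequence, and what Paul's NULL-on-exit idea would change, can be traced with a tiny simulation. The two "modules" are modelled as plain functions with hypothetical names, and `current_hook` plays the role of `apm_get_power_status`; nothing is actually unloaded.

```c
#include <stddef.h>

typedef void (*hook_fn)(void);

/* Distinct bodies so every hook has its own address. */
static int calls;
static void orig_hook(void) { calls += 1; }
static void hook_a(void)    { calls += 10; }
static void hook_b(void)    { calls += 100; }

static hook_fn current_hook = orig_hook;  /* models apm_get_power_status */

/* Each "module" remembers whatever was installed when it loaded. */
static hook_fn saved_by_a, saved_by_b;

static void load_a(void) { saved_by_a = current_hook; current_hook = hook_a; }
static void load_b(void) { saved_by_b = current_hook; current_hook = hook_b; }

/* Exit strategy 1 (the posted patch): restore the saved pointer. */
static void unload_a_restore(void) { current_hook = saved_by_a; }
static void unload_b_restore(void) { current_hook = saved_by_b; }

/* Exit strategy 2 (the NULL suggestion): clear the pointer. */
static void unload_a_null(void) { current_hook = NULL; }
```

Loading A then B, then unloading A and B with save/restore, leaves `current_hook` pointing at A's function after A is gone, which is the oops. With NULL on exit there is no dangling pointer, but unloading A drops B's hook while B is still loaded, so every caller would also have to check for NULL.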
AppArmor FAQ
Here we present our direct responses to the most frequent questions raised about AppArmor after the 2006 post.

Use of Pathnames For Access Control
-----------------------------------

Some people in the security field believe that pathnames are an inappropriate security mechanism. Whether they are depends on what you are primarily trying to protect; the rest follows from that.

Label-based security (exemplified by SELinux and its predecessors in MLS systems) attaches security policy to the data. As the data flows through the system, the label sticks to the data, so the security policy for that data stays intact. This is a good approach for ensuring secrecy, the kind of problem that intelligence agencies have.

Pathname-based security (exemplified by AppArmor, its predecessor Janus http://www.cs.berkeley.edu/~daw/janus/ and other systems like Systrace http://www.citi.umich.edu/u/provos/systrace/ ) attaches security policy to the name of the data. Controlling access to filenames is important because applications primarily use those names to access the files behind them, and they depend on getting to the right files. For example, login(1) expects /etc/passwd to resolve to a valid list of user accounts. In the traditional UNIX model, files have names but not labels, and applications operate only in terms of those names. Pathname-based security therefore puts more emphasis on the integrity of the system, with secrecy as a secondary goal.

Caveat: both label-based and pathname-based security can provide secrecy as well as integrity protection; the discussion above is only about which model makes it easier to provide which kind of security. We acknowledge that not all objects on a UNIX system are paths, and we agree that there is value in also protecting non-path resources.
Contrary to popular belief, AppArmor is *not* "Pathnames R Us", but rather "Use native abstractions to mediate stuff": when you mediate something, you should use the native syntax that users normally use to access the object. This follows the UNIX philosophy of "least surprise", so that users can understand the specification. Pathnames are the natural notation for users to understand what file access rights the policy grants, so AppArmor uses shell syntax for fully qualified pathnames, including shell wildcards. Similarly, AppArmor grants access to POSIX.1e capabilities by the name of the capability. In future work, when AppArmor adds network access control, the notation will resemble IPTables firewall rules. Always using the native abstraction for mediating access is an important part of what makes AppArmor usable.

We also acknowledge that pathname-based access control requires a way to perform pathname matching in the kernel, and that this costs more than comparing object labels (although label comparison assumes that all objects in the system already carry the appropriate labels). However, those concerned with performance should note that AppArmor's overhead is already quite low (a single-digit percentage slowdown). Security is rarely performance-neutral, and neither AppArmor nor SELinux is an exception. The overhead is small, and it can be avoided selectively by not applying AppArmor to performance-sensitive programs. It is also easy to overlook that putting all those labels in place is itself an expensive operation, particularly on large file systems. By providing string matching in the kernel, AppArmor trades some run-time performance for reduced administrative work.

It has been suggested that AppArmor's pathname-based syntax could be compiled into SELinux policy, and this is in fact what the SEEdit project http://seedit.sourceforge.net/ does.
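To illustrate matching a fully qualified pathname against shell-style wildcard rules, the sketch below uses POSIX fnmatch(3) with FNM_PATHNAME, so that `*` never crosses a `/`. It is a userspace approximation only: the rule table is hypothetical, and AppArmor's in-kernel matcher and its full rule syntax are not reproduced.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical rule: a shell-style glob plus the access it grants. */
struct path_rule {
	const char *glob;
	const char *perms;    /* e.g. "r" or "rw" */
};

static const struct path_rule rules[] = {
	{ "/etc/passwd",          "r"  },
	{ "/home/*/.plan",        "r"  },
	{ "/var/log/myapp/*.log", "rw" },
};

/* Return the perms of the first rule whose glob matches path, or NULL. */
static const char *lookup_perms(const char *path)
{
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		/* FNM_PATHNAME: wildcards never match a '/' */
		if (fnmatch(rules[i].glob, path, FNM_PATHNAME) == 0)
			return rules[i].perms;
	return NULL;
}
```

Because the check looks only at the name, a rule such as /var/log/myapp/*.log also covers log files that do not exist yet, which is the property the FAQ contrasts with relabeling.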
With SEEdit, however, any change in policy requires a complete relabeling of the file system, and the policy cannot apply to files that do not yet exist. AppArmor's in-kernel string matching allows policy to specify access to files that may come to exist in the future.

Use Of d_path() For Computing Pathnames
---------------------------------------

We have been criticized for the use of d_path() for various reasons:

- heuristic discovery of the vfsmount of a dentry,
- inability to reliably identify deleted files,
- inability to detect unreachable paths,
- ambiguity of paths for chroot processes,
- file lookup and the access check are not atomic.

Most of these issues are fixable (and have since been fixed), and the non-atomicity is not really an issue. Because struct vfsmount was not available to LSM hooks for computing pathnames from (dentry, vfsmount) pairs, the version of AppArmor posted last year used heuristics to rediscover the vfsmounts associated with dentries, and possibly found the wrong ones. We now pass the vfsmount objects through to all LSM hooks that compute pathnames, so the heuristic is gone and we always use the appropriate vfsmount. The d_path patch already in the -mm tree
Re: [PATCH 6/7] [RFC] ds2760 battery driver
On 4/16/07, Anton Vorontsov <[EMAIL PROTECTED]> wrote: On Mon, Apr 16, 2007 at 12:14:27PM -0700, Matt Reimer wrote: > The shifts (<< 3 and >> 5) are just to get the bits reassembled in the > right positions. The multiplication by 5 and subtracting 1/8 is > because (AFAIK) we can't do floating point multiplication in the > kernel. I'm open to suggestions. Because we work in micro-units now, the division is already replaced by a multiplication, i.e.: /* DS2760 reports voltage in units of 4.88mV, but the battery class * reports in units of uV, so convert by multiplying by 4880. */ di->voltage_raw = (di->raw[DS2760_VOLTAGE_MSB] << 3) | (di->raw[DS2760_VOLTAGE_LSB] >> 5); di->voltage_uV = di->voltage_raw * 4880; As a side effect, we're now not losing any precision. :-) That's a good way to solve the problem. :-) By the way, Matt, you're more familiar with the ds2760 specs; could you enlighten me about the "* 4" in this snippet? > acr[0] = (di->full_active_mAh * 4) >> 8; ^^^ > acr[1] = (di->full_active_mAh * 4) & 0xff; ^^^ > if (w1_ds2760_write(di->w1_dev, acr, > DS2760_CURRENT_ACCUM_MSB, 2) < 2) > printk(KERN_ERR "ACR reset failed\n"); The accumulated current register (acr) value is in units of 0.25 mAh, so we have to multiply by 4 to convert from units of 1 mAh to 0.25 mAh. Thanks for all your work on this, Anton. Matt
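The two conversions discussed in this exchange, raw voltage register bytes to microvolts and a capacity in mAh to the two ACR bytes in 0.25 mAh units, can be checked in isolation. The function names are hypothetical, and the bit layout follows the snippets in this thread as-is, not the DS2760 datasheet.

```c
#include <stdint.h>

/* Reassemble the voltage reading: MSB supplies bits 10..3,
 * the top three bits of LSB supply bits 2..0. */
static unsigned int ds2760_voltage_raw(uint8_t msb, uint8_t lsb)
{
	return ((unsigned int)msb << 3) | (lsb >> 5);
}

/* One raw step is 4.88 mV, i.e. 4880 uV: integer-only conversion. */
static unsigned long ds2760_voltage_uV(uint8_t msb, uint8_t lsb)
{
	return (unsigned long)ds2760_voltage_raw(msb, lsb) * 4880UL;
}

/* The ACR counts in 0.25 mAh units, so mAh * 4, split into two bytes. */
static void ds2760_acr_bytes(unsigned int mAh, uint8_t acr[2])
{
	unsigned int units = mAh * 4;

	acr[0] = (uint8_t)(units >> 8);    /* written at DS2760_CURRENT_ACCUM_MSB */
	acr[1] = (uint8_t)(units & 0xff);  /* written at the next register address */
}
```

For 1200 mAh the ACR bytes come out as 0x12 and 0xC0 (4800 quarter-mAh units), and register bytes 0x66/0x60 give 819 steps, or 3996720 uV.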
Re: [PATCH v2] hpet: Enable hidden HPET on NVidia motherboards
On Mon, 16 Apr 2007, Andi Kleen wrote: Mikko Tiihonen <[EMAIL PROTECTED]> writes: It looks probable that most NVidia chipsets have the HPET address at 0x44. It might be possible to enable the HPET even if BIOS did not That seems like a dangerous assumption. If anything this needs to be keyed on specific PCI IDs. The patch contains a list of PCI IDs. Currently the CK804 and MCP55 have been verified to work. Other PCI IDs can be added if needed. And the way you coded a recursive PCI scan is just ... not nice. I actually was more worried that someone might complain that the pci scanning is copy & paste code from the end of the same file. I did try to use the generic pci functions first, but because they insist on enabling interrupts they cannot be used this early. And this code needs to run before the timer initialization. If you want, I can submit a separate patch to move the ... not nice pci scanning code to the pci directory under some early_pci_scan(u32 *pci_ids, hook) function. The same code has already been copied into i386/kernel/acpi/earlyquirk.c, x86_64/kernel/aperture.c and x86_64/kernel/early-quirks.c. Moving the ugliness to a central place would at least hide it from the casual browser. Or would a global flag, which the pci scanning code checks to decide whether locks should be used, be better? initialize it properly by writing the wanted address there. Some other pci config space bits might need to be fiddled with too; the most likely candidates are 0x74 bit 2 and 0xA3 bit ?. One or both of them have been identified to change on some motherboards when HPET is enabled/disabled in BIOS. Or just add a random generator and poke random bits? Should be roughly equivalent. Fair enough, that was just my written hope that some day someone might reverse engineer how HPET is enabled on NVidia chipsets. The patch does not try to write to any registers, so it should be safe? The code also properly checks that the memory area does not collide with any existing resource.
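The collision check Mikko mentions, making sure the candidate HPET window does not overlap any resource already claimed, boils down to an interval-overlap test. A hypothetical sketch: `struct res_sketch` and the flat array are stand-ins, since the kernel tracks claimed regions in its resource tree instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range, like start/end of a resource. */
struct res_sketch {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool ranges_overlap(const struct res_sketch *a, const struct res_sketch *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* True if the candidate window collides with any already-claimed range. */
static bool collides(const struct res_sketch *cand,
		     const struct res_sketch *claimed, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ranges_overlap(cand, &claimed[i]))
			return true;
	return false;
}
```

Ranges are treated as inclusive at both ends, matching the start/end convention of kernel resources.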
I could of course add a check that there is an HPET at that address, but the hpet driver already checks this itself later and disables itself if it cannot see valid data. Also nothing should be done here without confirmation from Nvidia that HPET is actually supposed to work. Sometimes hardware is disabled by BIOS because it is seriously broken (there was at least one other chipset that could corrupt your flash if you force-enabled HPET in some steppings) I hope someone has some secret contacts at NVidia, because they have not been very open about their chipsets. I looked at LinuxBios, and their NForce4 chipset code just had commented-out code that wrote to the 0x44 register. So obviously something more is needed. Even Intel posted code a while ago that allows enabling HPET from a quirk even if the BIOS did not set it up properly. Does anyone know how to _really_ test whether HPET works properly (from user space, for example)? So far I have only tested by busy-looping on gettimeofday while changing the clocksource and measuring the speed difference. We could then change the quirk to point to instructions on how to test the HPET manually and ask users to submit the PCI ID. -Mikko
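One rough way to script the test Mikko describes: busy-loop on gettimeofday(), count readings that go backwards, and record the elapsed time so that runs under different clocksources can be compared. This is a sanity sketch under the assumption that a misbehaving timer shows up as non-monotonic readings or a large speed change; it does not prove the HPET is correct, the function name is hypothetical, and switching the clocksource is done outside the program.

```c
#include <stddef.h>
#include <sys/time.h>

struct tod_stats {
	long backward_steps;   /* readings earlier than the one before them */
	long long elapsed_us;  /* time covered by the whole loop */
};

static long long tv_to_us(const struct timeval *tv)
{
	return (long long)tv->tv_sec * 1000000LL + tv->tv_usec;
}

/* Busy-loop over gettimeofday() and count non-monotonic readings. */
static struct tod_stats probe_gettimeofday(long iterations)
{
	struct tod_stats st = { 0, 0 };
	struct timeval tv;
	long long first, prev, now;
	long i;

	gettimeofday(&tv, NULL);
	first = prev = tv_to_us(&tv);
	for (i = 0; i < iterations; i++) {
		gettimeofday(&tv, NULL);
		now = tv_to_us(&tv);
		if (now < prev)
			st.backward_steps++;
		prev = now;
	}
	st.elapsed_us = prev - first;
	return st;
}
```

Dividing the iteration count by `elapsed_us` gives calls per microsecond for the active clocksource, which is the speed difference being compared.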