date:20070416

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Willy Tarreau

Hi Nick,

On Tue, Apr 17, 2007 at 06:29:54AM +0200, Nick Piggin wrote:
(...)
> And my scheduler for example cuts down the amount of policy code and
> code size significantly. I haven't looked at Con's ones for a while,
> but I believe they are also much more straightforward than mainline...
> 
> For example, let's say all else is equal between them, then why would
> we go with the O(logN) implementation rather than the O(1)?

Of course, if this is the case, the question will be raised. But as a
general rule, I don't see much potential in O(1) to finely tune scheduling
according to several criteria. In O(logN), you can adjust scheduling in
realtime at a very low cost. Better processing of varying priorities or
fork() comes to mind.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver

2007-04-16 Thread Andrew Morton

On Tue, 17 Apr 2007 11:15:46 +0900 izumi <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I encountered the following kernel panic. The cause of this problem was
> NULL pointer access in check_modem_status() in 8250.c. I confirmed
> this problem is fixed by the attached patch, but I don't know whether this
> is the correct fix.
> 
> sadc[4378]: NaT consumption 2216203124768 [1]
> Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan
> container button sg e100 eepro100 mii ehci_hcd ohci_hcd
> 
> Pid: 4378, CPU 0, comm: sadc
> psr : 1210085a2010 ifs : 8289 ip : []
> Not tainted
> ip is at check_modem_status+0xf1/0x360
> unat:  pfs : 0289 rsc : 0003
> rnat: 8000cc18 bsps:  pr : 00aa6a99
> ldrs:  ccv :  fpsr: 0009804c8a70033f
> csd :  ssd : 
> b0 : a00100481fb0 b6 : a001004822e0 b7 : a00100477f20
> f6 : 1003e f7 : 0ffdba200
> f8 : 100018000 f9 : 10002a000
> f10 : 0fffdc8c0 f11 : 1003e
> r1 : a00100b9af40 r2 : 0008 r3 : a00100ad4e21
> r8 : 00bb r9 : 0001 r10 : 
> r11 : a00100ad4d58 r12 : e37b7df0 r13 : e37b
> r14 : 0001 r15 : 0018 r16 : a00100ad4d6c
> r17 :  r18 :  r19 : 
> r20 : a0010099bc88 r21 : 00bb r22 : 00bb
> r23 : c003fc0ff3fe r24 : c003fc00 r25 : 000ff3fe
> r26 : a001009b7ad0 r27 : 0001 r28 : a001009b7ad8
> r29 :  r30 : a001009b7ad0 r31 : a001009b7ad0
> 
> Call Trace:
> [] show_stack+0x40/0xa0
> sp=e37b7810 bsp=e37b1118
> [] show_regs+0x840/0x880
> sp=e37b79e0 bsp=e37b10c0
> [] die+0x1c0/0x2c0
> sp=e37b79e0 bsp=e37b1078
> [] die_if_kernel+0x50/0x80
> sp=e37b7a00 bsp=e37b1048
> [] ia64_fault+0x11e0/0x1300
> sp=e37b7a00 bsp=e37b0fe8
> [] ia64_leave_kernel+0x0/0x280
> sp=e37b7c20 bsp=e37b0fe8
> [] check_modem_status+0xf0/0x360
> sp=e37b7df0 bsp=e37b0fa0
> [] serial8250_get_mctrl+0x20/0xa0
> sp=e37b7df0 bsp=e37b0f80
> [] uart_read_proc+0x250/0x860
> sp=e37b7df0 bsp=e37b0ee0
> [] proc_file_read+0x1d0/0x4c0
> sp=e37b7e10 bsp=e37b0e80
> [] vfs_read+0x1b0/0x300
> sp=e37b7e20 bsp=e37b0e30
> [] sys_read+0x70/0xe0
> sp=e37b7e20 bsp=e37b0db0
> [] ia64_ret_from_syscall+0x0/0x20
> sp=e37b7e30 bsp=e37b0db0
> [] __kernel_syscall_via_break+0x0/0x20
> sp=e37b8000 bsp=e37b0db0
> 
> --- a/drivers/serial/8250.c~fix-possible-null-pointer-access-in-8250-serial-driver
> +++ a/drivers/serial/8250.c
> @@ -1310,7 +1310,8 @@ static unsigned int check_modem_status(s
>  {
>   unsigned int status = serial_in(up, UART_MSR);
>  
> - if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI) {
> + if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI &&
> + up->port.info != NULL) {
>   if (status & UART_MSR_TERI)
>   up->port.icount.rng++;
>   if (status & UART_MSR_DDSR)
> _
> 

I'd imagine that other serial drivers might get upset having their
->get_mctrl() called prior to being opened.  Perhaps we should be fixing
this in uart_read_proc()?


Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Gene Heskett

On Tuesday 17 April 2007, Willy Tarreau wrote:
>Hi Gene,
>
>On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
>> On Monday 16 April 2007, Ingo Molnar wrote:
>> >this is the second release of the CFS (Completely Fair Scheduler)
>> >patchset, against v2.6.21-rc7:
>> >
>> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
>> >
>> >i'd like to thank everyone for the tremendous amount of feedback and
>> >testing the v1 patch got - i could hardly keep up with just reading the
>> >mails! Some of the stuff people addressed i couldn't implement yet, i
>> >mostly concentrated on bugs, regressions and debuggability.
>> >
>> >there's a fair amount of churn:
>> >
>> >   15 files changed, 456 insertions(+), 241 deletions(-)
>> >
>> >But it's an encouraging sign that there was no crash bug found in v1,
>> >all the bugs were related to scheduling-behavior details. The code was
>> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
>> >code size increase in -v2 is due to debugging helpers, they'll be
>> >removed later. (The new /proc/sched_debug file can be used to see the
>> >fine details of CFS scheduling.)
>> >
>> >Changes since -v1:
>> >
>> > - make nice levels less starvable. (reported by Willy Tarreau)
>> >
>> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
>> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
>> >   reported by S.Çağlar Onur)
>> >
>> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
>> >
>> > - UP build fix. (reported by Gabriel C)
>> >
>> > - timer tick micro-optimization (Dmitry Adamushko)
>> >
>> > - preemption fix: sched_class->check_preempt_curr method to decide
>> >   whether to preempt after a wakeup (or at a timer tick). (Found via a
>> >   fairness-test-utility written for CFS by Mike Galbraith)
>> >
>> > - start forked children with neutral statistics instead of trying to
>> >   inherit them from the parent: Willy Tarreau reported that this
>> >   results in better behavior on extreme workloads, and it also
>> >   simplifies the code quite nicely. Removed sched_exit() and the
>> >   ->task_exit() methods.
>> >
>> > - make nice levels independent of the sched_granularity value
>> >
>> > - new /proc/sched_debug file listing runqueue details and the rbtree
>> >
>> > - new SCH-* fields in /proc/<pid>/status to see scheduling details
>> >
>> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
>> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
>> >   0 (off). Positive values set the maximum 'memory' that the
>> >   scheduler has of CPU hogs.
>> >
>> > - various code cleanups
>> >
>> > - added more statistics temporarily: sum_exec_runtime,
>> >   sum_wait_runtime.
>> >
>> > - added -CFS-v2 to EXTRAVERSION
>> >
>> >as usual, any sort of feedback, bugreports, fixes and suggestions are
>> >more than welcome,
>> >
>> >Ingo
>>
>> This one (v2-rc2) is not a keeper I'm sorry to say, Ingo.  v2-rc0 was much
>> better.  Watching amanda run with htop, kmail's composer is being subjected
>> to 5 to 10 second pauses, and htop says that gzip -best isn't getting more
>> than 15% of the cpu, and the /amandatapes drive is being written to in a
>> regular pattern that seems to be the cause of the pauses  according to
>> gkrellm, which also seems to track the size of the writes, and can show
>> anything from 4.3k to 54 megs as being written in one cycle of its screen
>> update.

Somewhat interesting to this, I have amanda doing a verify phase too.  During 
the verify phase (and while I was waiting for gmail to transmit this message, 
it took 30 minutes before it showed up on the list) I noted that when 
amrestore fired up, it, and its child tar were only taking about 20% of the 
cpu between them, and that /dev/hdd was showing a pretty steady 55 to 
75MB/sec being read.  As to what this tells us, I'm not going to hazard a 
guess because it wouldn't, this time of the night here in WV, USA, even be a 
SWAG.  It's coming up on 2am and the toothpicks holding my eyes open are
sagging badly, making creaking noises even.

>Have you tried the previous version with the fair-fork patch? It might be
> possible that your workload is sensitive to the fork()'s child getting much
> CPU upon startup.

Willy, I think that patch went by, and was followed by the v2-rc2 so fast that 
I never got a chance to try it with the v2-rc0 framework.  So I believe the 
answer there is probably no.  I never saw a problem with the v2-rc0, but Ingo 
shot me a message about it without enough detail that I could have tested for 
it.

FWIW, I've been using the CFQ I/O scheduler for quite a while, is it time I 
gave the AS or Deadline versions another check?  They are all built in but I 
don't know how to change the default on the fly, or even if it can be done.

>Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
>new tasks are "forked", they are queued at the end of the run queue

Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Mike Galbraith

On Tue, 2007-04-17 at 07:25 +0200, Willy Tarreau wrote:

> Have you tried the previous version with the fair-fork patch? It might be
> possible that your workload is sensitive to the fork()'s child getting much
> CPU upon startup.

Dunno about that, but here's a possibly related datapoint.  I reported
to Ingo yesterday that I was sometimes losing control of my GUI (KDE)
under heavy IO.  I just reproduced it in mainline rc7.  If I start a
bonnie, and click around popping windows to the foreground, then poke
KDE's menu button, I may lose all GUI capability for a _very_ long time.
Here, with bonnie, that means until it gets past writing with putc, and
moves on to rewrite.  Ages.

-Mike


Re: [xfs-masters] Re: mm snapshot broken-out-2007-04-11-02-24.tar.gz uploaded

2007-04-16 Thread Timothy Shimmin



There's a couple of different ways I can see to fix the problem -
the first is to not reference the buffer in xlog_iodone() after
running the callbacks that may trigger it being freed. I'd prefer to
see if this fixes the problem before having to do more invasive
surgery.  Can you try the patch below to see if it fixes the
problem?

 fs/xfs/xfs_log.c |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_log.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_log.c 2007-04-03 09:09:36.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_log.c  2007-04-16 11:40:21.655306665 +1000
@@ -988,14 +988,11 @@ xlog_iodone(xfs_buf_t *bp)
} else if (iclog->ic_state & XLOG_STATE_IOERROR) {
aborted = XFS_LI_ABORTED;
}
+   /* log I/O is always issued ASYNC, so we should see that here */


I guess this is a leftover from a time when
xlog_sync() took an extra flags param (which could have XFS_LOG_SYNC set)
and could then do a SYNC write of the iclog.
IIRC, we took this extra param out because nobody was ever calling with
it set for xlog_sync().


+   WARN_ON(!(XFS_BUF_ISASYNC(bp)));
xlog_state_done_syncing(iclog, aborted);
-   if (!(XFS_BUF_ISASYNC(bp))) {
-   /*
-* Corresponding psema() will be done in bwrite().  If we don't
-* vsema() here, panic.
-*/
-   XFS_BUF_V_IODONESEMA(bp);
-   }
+   /* do not reference bp here - it may have been freed during unmount */
+
 }  /* xlog_iodone */

 /*




--Tim



Re: CPU ordering with respect to krefs

2007-04-16 Thread Oliver Neukum

Am Donnerstag, 12. April 2007 08:27 schrieb Greg KH:
> On Mon, Apr 02, 2007 at 04:33:54PM +0200, Eric Dumazet wrote:
> > On Mon, 2 Apr 2007 14:47:59 +0200
> > Oliver Neukum <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi,
> > > 
> > > some atomic operations are only atomic, not ordered. Thus a CPU is allowed
> > > to reorder memory references to an object to before the reference is
> > > obtained. This fixes it.
> > > 
> > >   Regards
> > >   Oliver
> > > Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>
> > > --
> > > 
> > > --- a/lib/kref.c  2007-04-02 14:40:40.0 +0200
> > > +++ b/lib/kref.c  2007-04-02 14:40:50.0 +0200
> > > @@ -21,6 +21,7 @@
> > >  void kref_init(struct kref *kref)
> > >  {
> > >   atomic_set(&kref->refcount,1);
> > > + smp_mb();
> > >  }
> > 
> > I don't understand why smp_mb() is needed here, and not in
> > spinlock_init() for example.
> 
> I think, after reading the Documentation/memory-barriers.txt and
> Documentation/atomic_ops.txt documentation, that spin_lock_init() also
> needs this kind of memory barrier.

spin_lock_init() is not an atomic operation.
In principle, the issue exists. However, the whole issue is a bit of a grey
area. You might take the viewpoint that upping the refcount needs to be
under lock, which needs to take care of ordering issues in case of krefs.
A new spinlock has the same issue. You need to be careful making them
accessible to other CPUs.

If you take code like:

static int producer()
{
...
data = kmalloc(...);
spin_lock_init(&data->lock);
data->value = some_value;
data->next = global_pointer;

global_pointer = data;
...
}

You have an ordering bug anyway, which you can't fix in spin_lock_init().

Regards
Oliver

slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN

2007-04-16 Thread Christoph Lameter

The flag SLAB_MUST_HWCACHE_ALIGN is

1. Never checked by SLAB at all.

2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB

3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB.

The only remaining use is in sparc64 and ppc64, and their use there
reflects some earlier role that the slab flag once may have had. If
it's specified, then SLAB_HWCACHE_ALIGN is also specified.

The flag is confusing, inconsistent and has no purpose.

Remove it.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6/include/linux/slab.h
===
--- linux-2.6.21-rc6.orig/include/linux/slab.h  2007-04-16 21:55:03.0 -0700
+++ linux-2.6.21-rc6/include/linux/slab.h   2007-04-16 21:55:10.0 -0700
@@ -26,7 +26,6 @@ typedef struct kmem_cache kmem_cache_t _
 #define SLAB_POISON             0x00000800UL    /* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN      0x00002000UL    /* Align objs on cache lines */
 #define SLAB_CACHE_DMA          0x00004000UL    /* Use GFP_DMA memory */
-#define SLAB_MUST_HWCACHE_ALIGN 0x00008000UL    /* Force alignment even if debuggin is active */
 #define SLAB_STORE_USER         0x00010000UL    /* DEBUG: Store the last owner for bug hunting */
 #define SLAB_RECLAIM_ACCOUNT    0x00020000UL    /* Objects are reclaimable */
 #define SLAB_PANIC              0x00040000UL    /* Panic if kmem_cache_create() fails */
Index: linux-2.6.21-rc6/mm/slab.c
===
--- linux-2.6.21-rc6.orig/mm/slab.c 2007-04-16 21:55:16.0 -0700
+++ linux-2.6.21-rc6/mm/slab.c  2007-04-16 21:55:33.0 -0700
@@ -175,12 +175,12 @@
 # define CREATE_MASK   (SLAB_DEBUG_INITIAL | SLAB_RED_ZONE | \
 SLAB_POISON | SLAB_HWCACHE_ALIGN | \
 SLAB_CACHE_DMA | \
-SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \
+SLAB_STORE_USER | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #else
 # define CREATE_MASK   (SLAB_HWCACHE_ALIGN | \
-SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \
+SLAB_CACHE_DMA | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #endif
Index: linux-2.6.21-rc6/mm/slub.c
===
--- linux-2.6.21-rc6.orig/mm/slub.c 2007-04-16 21:55:38.0 -0700
+++ linux-2.6.21-rc6/mm/slub.c  2007-04-16 21:56:07.0 -0700
@@ -1500,7 +1500,7 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags,
unsigned long align)
 {
-   if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
+   if (flags & SLAB_HWCACHE_ALIGN)
return max_t(unsigned long, align, L1_CACHE_BYTES);
 
if (align < ARCH_SLAB_MINALIGN)
@@ -3083,8 +3083,7 @@ SLAB_ATTR(reclaim_account);
 
 static ssize_t hwcache_align_show(struct kmem_cache *s, char *buf)
 {
-   return sprintf(buf, "%d\n", !!(s->flags &
-   (SLAB_HWCACHE_ALIGN|SLAB_MUST_HWCACHE_ALIGN)));
+   return sprintf(buf, "%d\n", !!(s->flags & SLAB_HWCACHE_ALIGN));
 }
 SLAB_ATTR_RO(hwcache_align);
 
Index: linux-2.6.21-rc6/arch/powerpc/mm/hugetlbpage.c
===
--- linux-2.6.21-rc6.orig/arch/powerpc/mm/hugetlbpage.c 2007-04-16 21:58:53.0 -0700
+++ linux-2.6.21-rc6/arch/powerpc/mm/hugetlbpage.c  2007-04-16 21:59:02.0 -0700
@@ -1063,8 +1063,7 @@ static int __init hugetlbpage_init(void)
huge_pgtable_cache = kmem_cache_create("hugepte_cache",
   HUGEPTE_TABLE_SIZE,
   HUGEPTE_TABLE_SIZE,
-  SLAB_HWCACHE_ALIGN |
-  SLAB_MUST_HWCACHE_ALIGN,
+  SLAB_HWCACHE_ALIGN,
   zero_ctor, NULL);
if (! huge_pgtable_cache)
panic("hugetlbpage_init(): could not create hugepte cache\n");
Index: linux-2.6.21-rc6/arch/powerpc/mm/init_64.c
===
--- linux-2.6.21-rc6.orig/arch/powerpc/mm/init_64.c 2007-04-16 21:59:08.0 -0700
+++ linux-2.6.21-rc6/arch/powerpc/mm/init_64.c  2007-04-16 21:59:19.0 -0700
@@ -183,8 +183,7 @@ void pgtable_cache_init(void)
"for size: %08x...\n", name, i, size);
pgtable_cache[i] = kmem_cache_create(name,
 size, size,
-SLAB_HWCACHE_ALIGN |
-

Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Willy Tarreau

Hi Gene,

On Tue, Apr 17, 2007 at 12:53:56AM -0400, Gene Heskett wrote:
> On Monday 16 April 2007, Ingo Molnar wrote:
> >this is the second release of the CFS (Completely Fair Scheduler)
> >patchset, against v2.6.21-rc7:
> >
> >   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
> >
> >i'd like to thank everyone for the tremendous amount of feedback and
> >testing the v1 patch got - i could hardly keep up with just reading the
> >mails! Some of the stuff people addressed i couldn't implement yet, i
> >mostly concentrated on bugs, regressions and debuggability.
> >
> >there's a fair amount of churn:
> >
> >   15 files changed, 456 insertions(+), 241 deletions(-)
> >
> >But it's an encouraging sign that there was no crash bug found in v1,
> >all the bugs were related to scheduling-behavior details. The code was
> >tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
> >code size increase in -v2 is due to debugging helpers, they'll be
> >removed later. (The new /proc/sched_debug file can be used to see the
> >fine details of CFS scheduling.)
> >
> >Changes since -v1:
> >
> > - make nice levels less starvable. (reported by Willy Tarreau)
> >
> > - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
> >   flag can be used to turn it on/off. (This might fix the Kaffeine bug
> >   reported by S.Çağlar Onur)
> >
> > - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
> >
> > - UP build fix. (reported by Gabriel C)
> >
> > - timer tick micro-optimization (Dmitry Adamushko)
> >
> > - preemption fix: sched_class->check_preempt_curr method to decide
> >   whether to preempt after a wakeup (or at a timer tick). (Found via a
> >   fairness-test-utility written for CFS by Mike Galbraith)
> >
> > - start forked children with neutral statistics instead of trying to
> >   inherit them from the parent: Willy Tarreau reported that this
> >   results in better behavior on extreme workloads, and it also
> >   simplifies the code quite nicely. Removed sched_exit() and the
> >   ->task_exit() methods.
> >
> > - make nice levels independent of the sched_granularity value
> >
> > - new /proc/sched_debug file listing runqueue details and the rbtree
> >
> > - new SCH-* fields in /proc/<pid>/status to see scheduling details
> >
> > - new cpu-hog feature (off by default) and sysctl tunable to set it:
> >   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
> >   0 (off). Positive values set the maximum 'memory' that the
> >   scheduler has of CPU hogs.
> >
> > - various code cleanups
> >
> > - added more statistics temporarily: sum_exec_runtime,
> >   sum_wait_runtime.
> >
> > - added -CFS-v2 to EXTRAVERSION
> >
> >as usual, any sort of feedback, bugreports, fixes and suggestions are
> >more than welcome,
> >
> > Ingo
> 
> This one (v2-rc2) is not a keeper I'm sorry to say, Ingo.  v2-rc0 was much 
> better.  Watching amanda run with htop, kmail's composer is being subjected to
> 5 to 10 second pauses, and htop says that gzip -best isn't getting more than
> 15% of the cpu, and the /amandatapes drive is being written to in a regular 
> pattern that seems to be the cause of the pauses  according to gkrellm, which 
> also seems to track the size of the writes, and can show anything from 4.3k 
> to 54 megs as being written in one cycle of its screen update.

Have you tried the previous version with the fair-fork patch? It might be possible
that your workload is sensitive to the fork()'s child getting much CPU upon
startup.

Ingo, maybe I'm saying something stupid, but in my userland scheduler, when
new tasks are "forked", they are queued at the end of the run queue with a
fixed priority. In our case, this would translate into assigning them the
same prio and timeslice as their parent, but queuing them at the end so that
they don't make existing tasks starve during huge fork() loads.

I don't know how that would be possible (nor whether it would help at all),
but I found it was a good compromise over sharing the timeslice with the
parent. Perhaps we should have some absolute timeslice and some relative
timeslice (e.g. X percent of the total time divided by the number of tasks)?

Regards,
Willy


Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

2007-04-16 Thread Neil Brown

On Monday April 16, [EMAIL PROTECTED] wrote:
> 
> cfq_dispatch_insert() was called with rq == 0. This one is getting really
> annoying... and md is involved again (RAID0 this time.)

Yeah... weird.
RAID0 is so light-weight and so different from RAID1 or RAID5 that I
feel fairly safe concluding that the problem isn't in or near md.
But that doesn't help you.

This really feels like a locking problem.

The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is
not NULL.  That happens plenty of times in the code (particularly as
the first request is inserted) but always under ->queue_lock so it
should never be visible to cfq_dispatch_insert().

Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls
elv_next_request which could ultimately call __cfq_dispatch_requests
without taking ->queue_lock (that I can see).  But you probably aren't
using ide-scsi (does anyone?).

Given that interrupts are always disabled when queue_lock is taken, it
might be useful to add
   WARN_ON(!irqs_disabled());
every time ->next_rq is set.
Something like the following.

It might show something useful if we are lucky.

NeilBrown

diff .prev/block/cfq-iosched.c ./block/cfq-iosched.c
--- .prev/block/cfq-iosched.c   2007-04-17 15:01:36.0 +1000
+++ ./block/cfq-iosched.c   2007-04-17 15:02:25.0 +1000
@@ -628,6 +628,7 @@ static void cfq_remove_request(struct re
 {
struct cfq_queue *cfqq = RQ_CFQQ(rq);

+   BUG_ON(!irqs_disabled());
if (cfqq->next_rq == rq)
cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);

@@ -1810,6 +1811,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, s
/*
 * check if this request is a better next-serve candidate)) {
 */
+   BUG_ON(!irqs_disabled());
cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
BUG_ON(!cfqq->next_rq);


Re: Memory Allocation

2007-04-16 Thread Robert Hancock


Brian D. McGrew wrote:

> Good evening gents!
>
> I need some help in allocating memory and understanding how the system
> allocates memory with physical versus virtual page tables.  Please
> consider the following snippet of code.  Please, no wisecracks about bad
> code; it was written in 30 seconds in haste :-)


(snip)


> My test machine is a Dell Precision 490 with dual 5140 processors and
> 3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
> However, I need to allocate an array of char that is (2048 * 2048 * 256)
> and maybe even as large as (2048 * 2048 * 512).
>
> Obviously I have enough physical memory in the box to do this.  However,
> I suspect that I'm running out of page table entries.  Please, correct
> me if I'm wrong; but if I allocate (2048 * 2048 * 236) it works.  When I


Pretty sure you're wrong.


> increment to 256 or 512 it fails and it is my suspicion that I just
> don't have enough more in kernel memory to allocate this much memory in
> user space.


Are you using a 32-bit kernel? If so, most likely you're hitting a limit 
of the address space layout - there's just not enough room in the 
address space for an allocation of this size.




> Because of a piece of 3rd party hardware, I'm forced to run the kernel
> in the 4GB memory model.  What I need to be able to do is allocate an
> array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
> need the addresses that I get back to be contiguous, that's just the way
> my 3rd party hardware works.
>
> I'm inclined to believe that this is not specifically a Linux problem
> but maybe an architecture problem???  But maybe there is some kind of
> work around in the kernel for it???  I'd find it hard to believe that
> I'm the first one that ever needed to use this much memory.
>
> I ran this same code on two different Macs.  One of them a Powerbook G4
> with 4GB of RAM and it was successful.  The other was a Macbook Pro with
> 4GB of RAM and it failed.  Both running OS 10.4.9.  And of course it
> runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
> it's an Intel/X86 issue!
>
> How the heck do I get past this problem in Linux on the X86 platform???


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [PATCHSET #master] sysfs: make sysfs disconnect immediately on deletion, take 2

2007-04-16 Thread Tejun Heo

Hello, Maneesh.

Maneesh Soni wrote:
> I started looking at these patches and, in parallel, also did some testing
> on an 8 CPU system. I am using the patches from Greg's tree at
> http://www.kernel.org/pub/scm/linux/kernel/git/gregkh/patches.git/
> 
> I ran the following loops in parallel
> 
> # while true; do insmod drivers/net/dummy.ko; sleep 1;rmmod dummy; done
> # while true; do find /sys/class/net/dummy0 | xargs cat; sleep 1; done
> # while true; do umount /sys; sleep 1; mount -t sysfs none /sys; done
> # while true; do find /sys | xargs cat > /dev/null; sleep 1; done
> 
> and got the following oops
> 
> Unable to handle kernel NULL pointer dereference at 004c RIP:
>  [] simple_unlink+0x14/0x5c

Eeek... I'll try to replicate and track down the bug here.  FWIW, SCSI
also oopses if udev is running due to a bug in SCSI open/close handling.

Thanks for testing.

-- 
tejun

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread David Miller

From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 19:41:07 -0700

> That verbiage sounds fine -- so would you consider the previous patch
> I submitted (with module parameter) along with the wording above?

Yes, that sounds fine.

> I'm in transit for a redeye to NY so I won't be able to modify the
> patch. If you would be amenable to the above, Seokmann, could you
> rework the patch?

Thanks guys.

Re: [PATCH] Blackfin: blackfin on-chip SPI controller driver

2007-04-16 Thread Wu, Bryan

On Mon, 2007-04-16 at 18:31 -0800, David Brownell wrote:
> Cleaning out some of my pending-reviews queue ... after you address
> these comments I think what I'd like to do is sign off on one clean
> patch, rather than initial-plus-cleanups.
> 
> 

Thanks a lot, David. We will clean up the code and address most of the
issues pointed out in your review.

> On Monday 05 March 2007 2:41 am, Wu, Bryan wrote:
> 
> > --- linux-2.6.orig/drivers/spi/Kconfig  2007-03-01 11:33:07.0 +0800
> > +++ linux-2.6/drivers/spi/Kconfig   2007-03-01 11:40:22.0 +0800
> 
> I'm adjusting this to address the later patches you sent.
> 
> One global comment I'll make, just in case -- please make
> sure all your line-start indents only include tabs, and
> there's no space-at-end-of-line stuff going on, or lines
> wrapping past column 80.
> 
> I did this review in KMail, which doesn't highlight such
> minor errors; and I suspect you're mostly OK, but for a
> new driver there's no reason not to be 100% OK in those
> particular respects!  (And I *did* notice one of your
> cleanup patches clearly adding tabs-then-spaces indents.)
> 

Yes, I sent out an incremental coding-style patch, which has been appended
to the -mm tree. Should I send out a single new patch that includes the
coding-style cleanup and the changes from this review, or keep submitting
incremental patches to Andrew?

> 
> > @@ -156,7 +156,11 @@
> >  #
> >  # Add new SPI protocol masters in alphabetical order above this line
> >  #
> > -
> > +config SPI_BFIN
> > +   tristate "SPI controller driver for ADI Blackfin5xx"
> > +   depends on SPI_MASTER && BFIN
> > +   help
> > + This is the SPI controller master driver for Blackfin 5xx processor.
> 
> Please put this in Kconfig up with the other SPI controller drivers, in
> alphabetical order.  Just like the comment says.
> 
> Likewise, please add it to the Makefile in alphabetical order.
> 

Got it, we will follow that.

> 
> > --- /dev/null   1970-01-01 00:00:00.0 +0000
> > +++ linux-2.6/drivers/spi/spi_bfin5xx.c 2007-03-01 11:40:22.0 +0800
> 
> > +#ifdef DEBUG
> > +#define ASSERT(expr) \
> > +   if (!(expr)) { \
> > +   printk(KERN_DEBUG "assertion failed! %s[%d]: %s\n", \
> > +  __FUNCTION__, __LINE__, #expr); \
> > +   panic(KERN_DEBUG "%s", __FUNCTION__); \
> 
> Seems like either WARN_ON(expr) or BUG_ON(expr) will be better.
> The general rule of BUG variants is: don't, unless the system
> really can't continue operating.  (I see a later patch removed
> this entirely, good.)
> 
> 
Yes, we are trying to use the kernel's generic BUG_ON and WARN_ON to
replace our own assert function. I fixed this in other code, but obviously
it was missed in this driver patch.

> > +   }
> > +#else
> > +#define ASSERT(expr)
> > +#endif
> > +
> > +#define IS_DMA_ALIGNED(x) (((u32)(x)&0x07)==0)
> > +
> > +#define DEFINE_SPI_REG(reg, off) \
> > +static inline u16 read_##reg(void) \
> > +{ return *(volatile unsigned short*)(SPI0_REGBASE + off); } \
> > +static inline void write_##reg(u16 v) \
> > +{*(volatile unsigned short*)(SPI0_REGBASE + off) = v;\
> > + SSYNC();}
> 
> These should be readw() and writew() or similar... also, I can't tell
> what SSYNC() does, but it sure looks like something that shouldn't be
> hidden like that.  I/O memory should be mapped such that writes don't
> get re-ordered.  And flushing any write buffer should not be forced in
> such low-level accessors ... if it's needed, it should be done at the
> relevant points in the driver.  (Which you seem to do in a few places
> below.  The duplication is undesirable.)
> 
> 
> > +
> > +DEFINE_SPI_REG(CTRL, 0x00)
> 
> ... this particular style of register accessor is not generally used in Linux.
> The typical style is
> 
>   u16 value = __raw_readw(SPI0_REGBASE + SPI_CTRL);
>   __raw_writew(value, SPI0_REGBASE + SPI_CTRL);
> 
> or wrapped in macros so spi_readw(CTRL) and spi_writew(CTRL, value) work.
> 
> Of course, SPI1/SPI2/etc should be supported too ... so it's common to have
> those take a pointer to some controller struct with a "void __iomem *regs"
> pointer to the registers for that instance.  spi_readw(master, CTRL) etc.
>  
> 
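The reviewer's suggestion, accessors that take a per-controller pointer so SPI1/SPI2 work as well as SPI0, might look roughly like the sketch below. The struct name, the register offsets, and the use of ordinary memory in place of ioremap()ed registers are assumptions made so the example runs in userspace; in a real driver the bodies would wrap readw()/writew() (or the __raw_ variants), whose argument order is value first, then address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register byte offsets within one SPI controller's block. */
enum {
	SPI_CTRL = 0x00,
	SPI_FLAG = 0x04,
	SPI_STAT = 0x08,
};

/*
 * Hypothetical per-controller state, one instance each for SPI0, SPI1, ...
 * A kernel driver would keep a "void __iomem *regs" from ioremap() here;
 * this sketch points at ordinary memory so it can run in userspace.
 */
struct bfin_spi_master {
	volatile uint8_t *regs;
};

/* Read a 16-bit register. Kernel equivalent: readw(m->regs + off). */
static inline uint16_t spi_readw(const struct bfin_spi_master *m, size_t off)
{
	return *(const volatile uint16_t *)(m->regs + off);
}

/* Write a 16-bit register. Kernel equivalent: writew(val, m->regs + off);
 * note that the value comes first and the address second. */
static inline void spi_writew(struct bfin_spi_master *m, size_t off,
			      uint16_t val)
{
	*(volatile uint16_t *)(m->regs + off) = val;
}

/* Read-modify-write helper: set bits in the control register. */
static void spi_ctrl_set_bits(struct bfin_spi_master *m, uint16_t bits)
{
	spi_writew(m, SPI_CTRL, spi_readw(m, SPI_CTRL) | bits);
}
```

Any write-buffer flush (the SSYNC() the reviewer questions) would then be issued explicitly at the points in the driver that need it, rather than hidden inside every accessor.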
> > +#define START_STATE ((void*)0)
> > +#define RUNNING_STATE ((void*)1)
> > +#define DONE_STATE ((void*)2)
> > +#define ERROR_STATE ((void*)-1)
> 
> Normally states would be represented by enum values, which among other
> things supports "switch (state) { ... }" state machine code.  This driver
> is full of uncommon idioms, which will make it harder for most kernel
> developers to dive in and help.
> 
> Even if you have a style guide internal to Analog which says to do things
> this way ... don't.
> 
> 

Apparently, the driver's author, Luke, based this driver on
drivers/spi/pxa2xx_spi.c, and these idioms all come from that driver.
I will update our driver according to your comments.
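Following the review comment, the `(void *)` state constants could become an enum driven by a switch. The state names and transition rules below are a simplified, hypothetical model of a message transfer, not the driver's actual logic. A side benefit: GCC's -Wswitch (enabled by -Wall) warns when a switch on an enum with no default misses one of the enum's values.

```c
/* Hypothetical transfer states replacing START_STATE, RUNNING_STATE, ... */
enum bfin_spi_state {
	STATE_START,
	STATE_RUNNING,
	STATE_DONE,
	STATE_ERROR,
};

/*
 * Advance a simplified transfer state machine by one step.
 * chunks_left: transfer chunks still to be sent after this step.
 * failed:      nonzero if the last chunk reported an error.
 */
static enum bfin_spi_state bfin_spi_next_state(enum bfin_spi_state state,
					       int chunks_left, int failed)
{
	switch (state) {
	case STATE_START:
	case STATE_RUNNING:
		if (failed)
			return STATE_ERROR;
		return chunks_left > 0 ? STATE_RUNNING : STATE_DONE;
	case STATE_DONE:
	case STATE_ERROR:
		return state;	/* terminal states stay put */
	}
	return STATE_ERROR;	/* only reached with an out-of-range value */
}

/* Readable names for debug output, another thing enums make easy. */
static const char *bfin_spi_state_name(enum bfin_spi_state state)
{
	switch (state) {
	case STATE_START:	return "start";
	case STATE_RUNNING:	return "running";
	case STATE_DONE:	return "done";
	case STATE_ERROR:	return "error";
	}
	return "invalid";
}
```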

> > +
> > +#define QUEUE_RUNNING 0
> > +#define QUEUE_STOPPED 1
>

Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Gene Heskett

On Monday 16 April 2007, Ingo Molnar wrote:
>this is the second release of the CFS (Completely Fair Scheduler)
>patchset, against v2.6.21-rc7:
>
>   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
>
>i'd like to thank everyone for the tremendous amount of feedback and
>testing the v1 patch got - i could hardly keep up with just reading the
>mails! Some of the stuff people addressed i couldnt implement yet, i
>mostly concentrated on bugs, regressions and debuggability.
>
>there's a fair amount of churn:
>
>   15 files changed, 456 insertions(+), 241 deletions(-)
>
>But it's an encouraging sign that there was no crash bug found in v1,
>all the bugs were related to scheduling-behavior details. The code was
>tested on 3 architectures so far: i686, x86_64 and ia64. Most of the
>code size increase in -v2 is due to debugging helpers, they'll be
>removed later. (The new /proc/sched_debug file can be used to see the
>fine details of CFS scheduling.)
>
>Changes since -v1:
>
> - make nice levels less starvable. (reported by Willy Tarreau)
>
> - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
>   flag can be used to turn it on/off. (This might fix the Kaffeine bug
>   reported by S.Çağlar Onur)
>
> - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)
>
> - UP build fix. (reported by Gabriel C)
>
> - timer tick micro-optimization (Dmitry Adamushko)
>
> - preemption fix: sched_class->check_preempt_curr method to decide
>   whether to preempt after a wakeup (or at a timer tick). (Found via a
>   fairness-test-utility written for CFS by Mike Galbraith)
>
> - start forked children with neutral statistics instead of trying to
>   inherit them from the parent: Willy Tarreau reported that this
>   results in better behavior on extreme workloads, and it also
>   simplifies the code quite nicely. Removed sched_exit() and the
>   ->task_exit() methods.
>
> - make nice levels independent of the sched_granularity value
>
> - new /proc/sched_debug file listing runqueue details and the rbtree
>
> - new SCH-* fields in /proc//status to see scheduling details
>
> - new cpu-hog feature (off by default) and sysctl tunable to set it:
>   /proc/sys/kernel/sched_max_hog_history_ns tunable defaults to
>   0 (off). Positive values set the maximum 'memory' that the
>   scheduler keeps of CPU hogs.
>
> - various code cleanups
>
> - added more statistics temporarily: sum_exec_runtime,
>   sum_wait_runtime.
>
> - added -CFS-v2 to EXTRAVERSION
>
>as usual, any sort of feedback, bugreports, fixes and suggestions are
>more than welcome,
>
>   Ingo

This one (v2-rc2) is not a keeper, I'm sorry to say, Ingo.  v2-rc0 was much
better.  Watching amanda run with htop, KMail's composer is being subjected to
5 to 10 second pauses, and htop says that gzip -best isn't getting more than
15% of the CPU.  The /amandatapes drive is being written to in a regular
pattern that, according to gkrellm, seems to be the cause of the pauses.
gkrellm also seems to track the size of the writes, and can show anything
from 4.3k to 54 megs being written in one cycle of its screen update.

Normally hdd will fire up and take it at a steady 40+ MB/second until it's
done whenever there is a file ready to write, even if it's a 7GB file.  And I
can type right on during the disk I/O.  But not now.

In short, I seem to be heavily I/O bound.  But when the write to /dev/hdd3 is
done, gzip -best pops right up to 90%-plus CPU and I get my machine back.

In between file writes I checked the drives speed with hdparm:

[EMAIL PROTECTED] ~]# hdparm -Tt /dev/hdd

/dev/hdd:
 Timing cached reads:   856 MB in  2.01 seconds = 426.15 MB/sec
 Timing buffered disk reads:  222 MB in  3.01 seconds =  73.68 MB/sec

That's not too shabby, and DMA is obviously active, at least for reading.

gzip -best was running while this was executing, so I think the drive is fine
and the scheduling is what's funky.  Sorry.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
  After they got rid of capital punishment, they had to hang twice
  as many people as before.

floppy.ko

2007-04-16 Thread Gene Heskett

Greetings everybody;

At some point in the last six months or so, some patches were made to
the floppy.c area of the tree, and ever since, I have not been able to build
the driver in without wasting around a minute during bootup on lags and
squawks about fd1 showing up in the boot trace on screen.  But if I go look,
it's fd0 that's being pounded on by the driver, mainly bitching about not
being able to read the first sector, something it repeats 4 or 5 times.

I have the usual fd0, a 3.5" 1.44 drive, and fd1, a 5.25" 720k drive, in this
machine; both are enabled in the BIOS with the correct types set there.

If I insert a disk and attempt to mount it, the correct lights come on
according to what I typed, but I have had a hell of a time trying to get it
to write good images of a legacy machine's disk format using dd, from files
that I can read with khexedit and know are correct from that inspection.

The only use it's getting these days is with the coco/os9 formats, read and
written only by dd and some specialty tools from an os9 kit called toolshed,
AFAIK.

When it's built as a module and then modprobed for use, I don't recall seeing this problem.

Is this fixable, or is it that I just don't know how to handle this newer 
code?

The currently running kernel, 2.6.21-rc7-CFS-v2, has it built in, and it gave
me static while booting with no disk in either drive, naming fd1 while banging
on fd0 according to the access LEDs on the drives.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Nobody wants constructive criticism.  It's all we can do to put up with
constructive praise.

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Tue, Apr 17, 2007 at 02:25:39PM +1000, Peter Williams wrote:
> Nick Piggin wrote:
> >On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote:
> >>On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote:
> >>>Note that I talk of run queues
> >>>not CPUs as I think a shift to multiple CPUs per run queue may be a good
> >>>idea.
> >>This observation of Peter's is the best thing to come out of this
> >>whole foofaraw.  Looking at what's happening in CPU-land, I think it's
> >>going to be necessary, within a couple of years, to replace the whole
> >>idea of "CPU scheduling" with "run queue scheduling" across a complex,
> >>possibly dynamic mix of CPU-ish resources.  Ergo, there's not much
> >>point in churning the mainline scheduler through a design that isn't
> >>significantly more flexible than any of those now under discussion.
> >
> >Why? If you do that, then your load balancer just becomes less flexible
> >because it is harder to have tasks run on one or the other.
> >
> >You can have single-runqueue-per-domain behaviour (or close to) just by
> >relaxing all restrictions on idle load balancing within that domain. It
> >is harder to go the other way and place any per-cpu affinity or
> >restrictions with multiple cpus on a single runqueue.
> 
> Allowing N (where N can be one or greater) CPUs per run queue actually 
> increases flexibility as you can still set N to 1 to get the current 
> behaviour.

But you add extra code for that on top of what we have, and are also
prevented from making per-cpu assumptions.

And you can get N CPUs per runqueue behaviour by having them in a domain
with no restrictions on idle balancing. So where does your increased
flexibility come from?

> One advantage of allowing multiple CPUs per run queue would be at the 
> smaller end of the system scale i.e. a PC with a single hyper threading 
> chip (i.e. 2 CPUs) would not need to worry about load balancing at all 
> if both CPUs used the one runqueue and all the nasty side effects that 
> come with hyper threading would be minimized at the same time.

I don't know about that -- the current load balancer already minimises
the nasty multi threading effects. SMT is very important for IBM's chips
for example, and they've never had any problem with that side of it
since it was introduced and bugs ironed out (at least, none that I've
heard).
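Peter's proposal, where N CPUs share one run queue and N = 1 reproduces today's per-CPU layout, can be pictured with a toy CPU-to-runqueue mapping. Everything here is hypothetical: the names, the fixed CPU count, and the assumption that hyperthread siblings are numbered consecutively. A real shared queue would also need locking, which the sketch leaves out, and it does not address Nick's objection about losing per-CPU assumptions.

```c
#define NR_CPUS_SKETCH 8	/* hypothetical number of CPUs */

/* Minimal stand-in for a run queue: only a count of runnable tasks. */
struct rq_sketch {
	int nr_running;
};

static struct rq_sketch runqueues[NR_CPUS_SKETCH];

/*
 * Map a CPU to its run queue when cpus_per_rq CPUs share one queue.
 * cpus_per_rq == 1: one queue per CPU (the current layout).
 * cpus_per_rq == 2: sibling pairs (0,1), (2,3), ... share a queue.
 */
static struct rq_sketch *cpu_to_rq(int cpu, int cpus_per_rq)
{
	return &runqueues[cpu / cpus_per_rq];
}

/* CPUs on the same queue never need balancing between each other. */
static int share_runqueue(int a, int b, int cpus_per_rq)
{
	return cpu_to_rq(a, cpus_per_rq) == cpu_to_rq(b, cpus_per_rq);
}
```

With cpus_per_rq set to 2 on a single hyperthreaded chip, both logical CPUs resolve to the same queue, which is the case Peter describes as needing no load balancing at all.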



[PATCH] [ALSA] Add first generation macbook subsystem ID

2007-04-16 Thread bainonline

From: Abhijit Bhopatkar <[EMAIL PROTECTED]>

First generation MacBooks were getting ignored by sigmatel drivers
and wrongly being identified as MACMINI. This patch makes them
identify as MACBOOK.

Signed-off-by: Abhijit Bhopatkar <[EMAIL PROTECTED]>
---
 sound/pci/hda/patch_sigmatel.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/sound/pci/hda/patch_sigmatel.c b/sound/pci/hda/patch_sigmatel.c
index c94291b..cb99df5 100644
--- a/sound/pci/hda/patch_sigmatel.c
+++ b/sound/pci/hda/patch_sigmatel.c
@@ -1905,6 +1905,9 @@ static int patch_stac922x(struct hda_codec *codec)
 */
printk(KERN_INFO "hda_codec: STAC922x, Apple subsys_id=%x\n", 
codec->subsystem_id);
switch (codec->subsystem_id) {
+   case 0x106b0a00: /* MacBook First generation */
+   spec->board_config = STAC_MACBOOK;
+   break;
case 0x106b0200: /* MacBook Pro first generation */
spec->board_config = STAC_MACBOOK_PRO_V1;
break;
-- 
1.4.4.2


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Tue, Apr 17, 2007 at 02:17:22PM +1000, Peter Williams wrote:
> Nick Piggin wrote:
> >On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
> >>On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote:
> >>>Mike Galbraith wrote:
> >>>>Demystify what?   The casual observer need only read either your attempt
> >>>>at writing a scheduler, or my attempts at fixing the one we have, to see
> >>>>that it was high time for someone with the necessary skills to step in.
> >>>Make that "someone with the necessary clout".
> >>No, I was brutally honest to both of us, but quite correct.
> >>
> >>>>Now progress can happen, which was _not_ happening before.
> 
> >>>This is true.
> >>Yup, and progress _is_ happening now, quite rapidly.
> >
> >Progress as in progress on Ingo's scheduler. I still don't know how we'd
> >decide when to replace the mainline scheduler or with what.
> >
> >I don't think we can say Ingo's is better than the alternatives, can we?
> >If there is some kind of bakeoff, then I'd like one of Con's designs to
> >be involved, and mine, and Peter's...
> 
> I myself was thinking of this as the chance for a much needed 
> simplification of the scheduling code and if this can be done with the 
> result being "reasonable" it then gives us the basis on which to propose 
> improvements based on the ideas of others such as you mention.
> 
> As the size of the cpusched indicates, trying to evaluate alternative 
> proposals based on the current O(1) scheduler is fraught.  Hopefully, 

I don't know why. The problem is that you can't really evaluate good
proposals by looking at the code (you can say that one is bad, ie. the
current one, which has a huge amount of temporal complexity and is
explicitly unfair), but it is pretty hard to say one behaves well.

And my scheduler for example cuts down the amount of policy code and
code size significantly. I haven't looked at Con's ones for a while,
but I believe they are also much more straightforward than mainline...

For example, let's say all else is equal between them, then why would
we go with the O(logN) implementation rather than the O(1)?
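For readers weighing the O(1) versus O(log N) question: the mainline O(1) scheduler picks the next task by finding the first set bit in a fixed-size bitmap of 140 priority levels, so the selection cost does not grow with the number of runnable tasks, while CFS keeps tasks in a red-black tree, where enqueueing costs O(log N). The sketch below models only the bitmap lookup; the names are hypothetical, it omits the per-priority task lists and the active/expired arrays, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <stdint.h>

#define MAX_PRIO_SKETCH 140	/* priority levels; 0 is the highest */
#define BITMAP_WORDS ((MAX_PRIO_SKETCH + 63) / 64)

/* One bit per priority level: set while that level's queue is non-empty. */
struct prio_bitmap {
	uint64_t bits[BITMAP_WORDS];
};

static void mark_runnable(struct prio_bitmap *b, int prio)
{
	b->bits[prio / 64] |= (uint64_t)1 << (prio % 64);
}

static void mark_empty(struct prio_bitmap *b, int prio)
{
	b->bits[prio / 64] &= ~((uint64_t)1 << (prio % 64));
}

/*
 * Highest-priority non-empty level, or -1 if nothing is runnable.
 * The loop runs at most BITMAP_WORDS (3) times no matter how many tasks
 * exist, which is what makes the selection O(1).
 */
static int first_runnable_prio(const struct prio_bitmap *b)
{
	for (int w = 0; w < BITMAP_WORDS; w++) {
		if (b->bits[w])
			return w * 64 + __builtin_ctzll(b->bits[w]);
	}
	return -1;
}
```

Setting and clearing a level's bit are constant-time as well, so the whole pick-next path stays bounded regardless of load.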


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Peter Williams


Nick Piggin wrote:
>On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote:
>>On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote:
>>>Note that I talk of run queues
>>>not CPUs as I think a shift to multiple CPUs per run queue may be a good
>>>idea.
>>This observation of Peter's is the best thing to come out of this
>>whole foofaraw.  Looking at what's happening in CPU-land, I think it's
>>going to be necessary, within a couple of years, to replace the whole
>>idea of "CPU scheduling" with "run queue scheduling" across a complex,
>>possibly dynamic mix of CPU-ish resources.  Ergo, there's not much
>>point in churning the mainline scheduler through a design that isn't
>>significantly more flexible than any of those now under discussion.
>
>Why? If you do that, then your load balancer just becomes less flexible
>because it is harder to have tasks run on one or the other.
>
>You can have single-runqueue-per-domain behaviour (or close to) just by
>relaxing all restrictions on idle load balancing within that domain. It
>is harder to go the other way and place any per-cpu affinity or
>restrictions with multiple cpus on a single runqueue.

Allowing N (where N can be one or greater) CPUs per run queue actually
increases flexibility as you can still set N to 1 to get the current
behaviour.

One advantage of allowing multiple CPUs per run queue would be at the
smaller end of the system scale, i.e. a PC with a single hyper threading
chip (i.e. 2 CPUs) would not need to worry about load balancing at all
if both CPUs used the one runqueue, and all the nasty side effects that
come with hyper threading would be minimized at the same time.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread David Lang


On Tue, 17 Apr 2007, Mike Galbraith wrote:

> Subject: Re: [Announce] [patch] Modular Scheduler Core and Completely
>     Fair Scheduler [CFS]
>
> On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:
>> On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
>>
>>> Yup, and progress _is_ happening now, quite rapidly.
>>
>> Progress as in progress on Ingo's scheduler. I still don't know how we'd
>> decide when to replace the mainline scheduler or with what.
>>
>> I don't think we can say Ingo's is better than the alternatives, can we?
>
> No, that would require massive performance testing of all alternatives.
>
>> If there is some kind of bakeoff, then I'd like one of Con's designs to
>> be involved, and mine, and Peter's...
>
> The trouble with a bakeoff is that it's pretty darn hard to get people
> to test in the first place, and then comes weighting the subjective and
> hard performance numbers.  If they're close in numbers, do you go with
> the one which starts the least flamewars or what?

it's especially hard if the people doing the testing need to find the latest
patch and apply it.

even having a compile-time option to switch between them at least means that
the testers can have confidence that the various patches haven't bitrotted.

boot time options would be even better, but I understand from previous
discussions I've watched that this is performance critical enough that the
overhead of this would throw off the results.

David Lang

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Peter Williams


Nick Piggin wrote:
>On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
>>On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote:
>>>Mike Galbraith wrote:
>>>>Demystify what?   The casual observer need only read either your attempt
>>>>at writing a scheduler, or my attempts at fixing the one we have, to see
>>>>that it was high time for someone with the necessary skills to step in.
>>>Make that "someone with the necessary clout".
>>No, I was brutally honest to both of us, but quite correct.
>>
>>>>Now progress can happen, which was _not_ happening before.
>>>>
>>>This is true.
>>Yup, and progress _is_ happening now, quite rapidly.
>
>Progress as in progress on Ingo's scheduler. I still don't know how we'd
>decide when to replace the mainline scheduler or with what.
>
>I don't think we can say Ingo's is better than the alternatives, can we?
>If there is some kind of bakeoff, then I'd like one of Con's designs to
>be involved, and mine, and Peter's...

I myself was thinking of this as the chance for a much needed
simplification of the scheduling code and if this can be done with the
result being "reasonable" it then gives us the basis on which to propose
improvements based on the ideas of others such as you mention.

As the size of the cpusched indicates, trying to evaluate alternative
proposals based on the current O(1) scheduler is fraught.  Hopefully,
this initiative can fix this problem.  Then we just need Ingo to listen
to suggestions and he's showing signs of being willing to do this :-)

>Maybe the progress is that more key people are becoming open to the idea
>of changing the scheduler.

That too.

Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Tue, Apr 17, 2007 at 06:01:29AM +0200, Mike Galbraith wrote:
> On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:
> > On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
>  
> > > Yup, and progress _is_ happening now, quite rapidly.
> > 
> > Progress as in progress on Ingo's scheduler. I still don't know how we'd
> > decide when to replace the mainline scheduler or with what.
> > 
> > I don't think we can say Ingo's is better than the alternatives, can we?
> 
> No, that would require massive performance testing of all alternatives.
> 
> > If there is some kind of bakeoff, then I'd like one of Con's designs to
> > be involved, and mine, and Peter's...
> 
> The trouble with a bakeoff is that it's pretty darn hard to get people
> to test in the first place, and then comes weighting the subjective and
> hard performance numbers.  If they're close in numbers, do you go with
> the one which starts the least flamewars or what?

I don't know how you'd do it. I know you wouldn't count people telling you
how good they are (getting people to tell you how bad they are, and whether
others do better in a given situation, might be slightly more viable).

But we have to choose somehow. I'd hope that is going to be based solely
on the results and technical properties of the code, so... if we were to
somehow determine that the results are exactly the same, we'd go for the
simpler one, wouldn't we?

> > Maybe the progress is that more key people are becoming open to the idea
> > of changing the scheduler.
> 
> Could be.  All was quiet for quite a while, but when RSDL showed up, it
> aroused enough interest to show that scheduling woes are on folks' radar.

Well, I know people have had woes with the scheduler forever (I guess that
isn't going to change :P). I think people generally lost a bit of interest
in trying to improve the situation because of the upstream problem.

Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Peter Williams


Ingo Molnar wrote:
> this is the second release of the CFS (Completely Fair Scheduler)
> patchset, against v2.6.21-rc7:
>
>    http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch
>
> i'd like to thank everyone for the tremendous amount of feedback and
> testing the v1 patch got - i could hardly keep up with just reading the
> mails! Some of the stuff people addressed i couldnt implement yet, i
> mostly concentrated on bugs, regressions and debuggability.

Can I make a suggestion?

Would it be possible (from now on) to publish changes relative to the
previous patch (eventually leading to a series of patches that describes
the evolution of the new scheduler), so that it's easier for us
reviewers/critics to see the latest changes?  E.g. if I import such
changes into something like quilt (using my gquilt GUI wrapper, of
course :-)), I can then use meld (or similar) to follow what's going on as
suggestions get folded in and bugs get fixed, etc.


Thanks
Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Mike Galbraith

On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:
> On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:

> > Yup, and progress _is_ happening now, quite rapidly.
> 
> Progress as in progress on Ingo's scheduler. I still don't know how we'd
> decide when to replace the mainline scheduler or with what.
> 
> I don't think we can say Ingo's is better than the alternatives, can we?

No, that would require massive performance testing of all alternatives.

> If there is some kind of bakeoff, then I'd like one of Con's designs to
> be involved, and mine, and Peter's...

The trouble with a bakeoff is that it's pretty darn hard to get people
to test in the first place, and then comes weighting the subjective and
hard performance numbers.  If they're close in numbers, do you go with
the one which starts the least flamewars or what?

> Maybe the progress is that more key people are becoming open to the idea
> of changing the scheduler.

Could be.  All was quiet for quite a while, but when RSDL showed up, it
aroused enough interest to show that scheduling woes are on folks' radar.

-Mike


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Mon, Apr 16, 2007 at 04:10:59PM -0700, Michael K. Edwards wrote:
> On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote:
> >Note that I talk of run queues
> >not CPUs as I think a shift to multiple CPUs per run queue may be a good
> >idea.
> 
> This observation of Peter's is the best thing to come out of this
> whole foofaraw.  Looking at what's happening in CPU-land, I think it's
> going to be necessary, within a couple of years, to replace the whole
> idea of "CPU scheduling" with "run queue scheduling" across a complex,
> possibly dynamic mix of CPU-ish resources.  Ergo, there's not much
> point in churning the mainline scheduler through a design that isn't
> significantly more flexible than any of those now under discussion.

Why? If you do that, then your load balancer just becomes less flexible
because it is harder to have tasks run on one or the other.

You can have single-runqueue-per-domain behaviour (or close to) just by
relaxing all restrictions on idle load balancing within that domain. It
is harder to go the other way and place any per-cpu affinity or
restrictions with multiple cpus on a single runqueue.


> For instance, there are architectures where several "CPUs"
> (instruction stream decoders feeding execution pipelines) share parts
> of a cache hierarchy ("chip-level multitasking").  On these machines,
> you may want to co-schedule a "real" processing task on one pipeline
> with a "cache warming" task on the other pipeline -- but only for
> tasks whose memory access patterns have been sufficiently analyzed to
> write the "cache warming" task code.  Some other tasks may want to
> idle the second pipeline so they can use the full cache-to-RAM
> bandwidth.  Yet other tasks may be genuinely CPU-intensive (or I/O
> bound but so context-heavy that it's not worth yielding the CPU during
> quick I/Os), and hence perfectly happy to run concurrently with an
> unrelated task on the other pipeline.

We can do all that now with load balancing, affinities or by shutting
down threads dynamically.


> There are other architectures where several "hardware threads" fight
> over parts of a cache hierarchy (sometimes bizarrely described as
> "sharing" the cache, kind of the way most two-year-olds "share" toys).
> On these machines, one instruction pipeline can't help the other
> along cache-wise, but it sure can hurt.  A scheduler designed, tested,
> and tuned principally on one of these architectures (hint:
> "hyperthreading") will probably leave a lot of performance on the
> floor on processors in the former category.
> 
> In the not-so-distant future, we're likely to see architectures with
> dynamically reconfigurable interconnect between instruction issue
> units and execution resources.  (This is already quite feasible on,
> say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with
> as many Nios II cores as fit on the chip.)  Restoring task context may
> involve not just MMU swaps and FPU instructions (with state-dependent
> hidden costs) but processor reconfiguration.  Achieving "fairness"
> according to any standard that a platform integrator cares about (let
> alone an end user) will require a fairly detailed model of the hidden
> costs associated with different sorts of task switch.
> 
> So if you are interested in schedulers for some reason other than a
> paycheck, let the distros worry about 5% improvements on x86[_64].
> Get hold of some different "hardware" -- say:
>  - a Xilinx ML410 if you've got $3K to blow and want to explore
> reconfigurable processors;
>  - a SunFire T2000 if you've got $11K and want to mess with a CMT
> system that's actually shipping;
>  - a QEMU-simulated massively SMP x86 if you're poor but clever
> enough to implement funky cross-core cache effects yourself; or
>  - a cycle-accurate simulator from Gaisler or Virtio if you want a
> real research project.
> Then go explore some more interesting regions of parameter space and
> see what the demands on mainline Linux will look like in a few years.

There are no doubt improvements to be made, but they are generally
intended to be able to be done within the sched-domains framework. I
am not aware of a particular need that would be impossible to do using
that topology hierarchy and per-CPU runqueues, and there are added
complications involved with multiple CPUs per runqueue.


Re: [v4l-dvb-maintainer] [GIT PATCHES] V4L/DVB updates

2007-04-16 Thread Mauro Carvalho Chehab

> > I have tested these patches with 2.6.20-mh1 + v4l-dvb-b5be3479f070 patchset.
> > I also tried 2.6.21-rc6 + v4l-dvb-b5be3479f070 patchset and this combination
> > also works without OOPS.
> > also works without OOPS.
> >   
> Yes, that shows that the changesets prevent the oops, but it says
> nothing about vanilla 2.6.20.y
> > Winfast dongles are both dvb-usb based (DiBcom 3000M-C and DiBcom DiB7000P),
> > but pluto2 is cardbus (pci) based.
> >   
> just as I figured.  The pluto2 test results are great to hear, though --
> thank you.
> > I think we can include these patches into 2.6.21 and if we receive any 
> > problem, we still have 2.6.21.Z for fixing, don't we?
> 
> The stable kernel series is not there for that purpose.  It is not there
> to encourage a rush of patches into a final kernel release, only to
> cause potential problems, with the 2.6.x.y series as a fallback for
> fixes.  We should avoid the need for such last-minute fixes wherever
> possible.

We should certainly do our best to avoid regressions. But, IMO, a driver
for a hotpluggable device (USB) that can't support device hotplug is a
serious issue.

If nobody has reported any regressions with this, we should really
apply the fix for 2.6.21.

-- 
Cheers,
Mauro

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
> On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote:
> > Mike Galbraith wrote:
> > >
> > > Demystify what?   The casual observer need only read either your attempt
> > > at writing a scheduler, or my attempts at fixing the one we have, to see
> > > that it was high time for someone with the necessary skills to step in.
> > 
> > Make that "someone with the necessary clout".
> 
> No, I was brutally honest to both of us, but quite correct.
> 
> > > Now progress can happen, which was _not_ happening before.
> > > 
> > 
> > This is true.
> 
> Yup, and progress _is_ happening now, quite rapidly.

Progress as in progress on Ingo's scheduler. I still don't know how we'd
decide when to replace the mainline scheduler or with what.

I don't think we can say Ingo's is better than the alternatives, can we?
If there is some kind of bakeoff, then I'd like one of Con's designs to
be involved, and mine, and Peter's...

Maybe the progress is that more key people are becoming open to the idea
of changing the scheduler.

Re: [2/2] 2.6.21-rc7: known regressions

2007-04-16 Thread Dave Jones

On Mon, Apr 16, 2007 at 10:26:43AM +0100, Richard Purdie wrote:

 > >  > >  > CONFIG_FB_BACKLIGHT=y
 > >  > >  > CONFIG_ACPI_VIDEO=n
 > >  > > 
 > >  > > That also gets me a dead display. Backlight doesn't turn back on.
 > >  > 
 > >  > Anything under /sys/class/backlight?
 > > 
 > > Entries from ibm_acpi.  I rmmod'd that, leaving the dir empty,
 > > and it still fails.
 > 
 > What happens if you never load ibm-acpi?

Same thing. No backlight on resume.
I rm'd the .ko, so there's no chance it got loaded.

 > I'm a bit puzzled as CONFIG_FB_BACKLIGHT doesn't do anything with the
 > intelfb driver. One thing it does do is set
 > CONFIG_BACKLIGHT_CLASS_DEVICE. When you disabled FB_BACKLIGHT and got a
 > working display on resume, was that set and was (or had) ibm-acpi been
 > loaded?
 > 
 > A variety of other options such as ACPI_IBM also set
 > CONFIG_BACKLIGHT_CLASS_DEVICE although without a backlight driver it
 > will do nothing hence the suspicion is on ibm-acpi, perhaps interacting
 > with the backlight class badly.
 > 
 > Does echoing numbers to /sys/class/backlight/ibm_acpi/brightness change
 > the backlight brightness as expected?

/sys/class/backlight/ibm/brightness takes a value from 0 to 7.
Starts off with a default of 0. I tried all values in there, and
it made no visible difference.  But as the no-backlight thing happens
without this even loaded, I think this is a separate problem.
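"Echoing numbers" into the brightness attribute can also be scripted, which makes it easy to step through every level and read each one back over ssh. A minimal C sketch; the helper names are hypothetical, and only the /sys/class/backlight/<name>/brightness layout and the 0..7 range come from this thread (the upper bound would normally be read from the neighbouring max_brightness attribute).

```c
#include <stdio.h>

/* Hypothetical helper: write a level to a backlight sysfs attribute such as
 * /sys/class/backlight/<name>/brightness.  max_level would normally come from
 * the max_brightness attribute (apparently 7 on this machine).
 * Returns 0 on success, -1 for an out-of-range level or an I/O error. */
static int set_brightness(const char *path, int level, int max_level)
{
	FILE *f;
	int ok;

	if (level < 0 || level > max_level)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fprintf(f, "%d\n", level) > 0;
	/* a rejected sysfs write may only show up when the buffer is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current level back; returns -1 on error. */
static int get_brightness(const char *path)
{
	FILE *f = fopen(path, "r");
	int level = -1;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &level) != 1)
		level = -1;
	fclose(f);
	return level;
}
```

Calling set_brightness() for each level from 0 to 7 and comparing get_brightness() afterwards separates "the value is not accepted" from "the value is accepted but the panel ignores it".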

 > If you can ssh into the machine
 > after it's resumed with the display problem, it would be interesting to
 > know what the brightness was and if changing it helped too...

When the backlight doesn't come on, for some reason, nothing else
runs.  Capslock works, so it's at least partially alive, but even
doing..

echo mem > /sys/power/state ; echo foo >/bar ; sync

results in no /bar being created.
Ethernet remains down when it's in this state too.

It's the reason it's taken this long to get any debug info out of it at all.

Dave

-- 
http://www.codemonkey.org.uk

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Nick Piggin

On Mon, Apr 16, 2007 at 09:28:24AM -0500, Matt Mackall wrote:
> On Mon, Apr 16, 2007 at 05:03:49AM +0200, Nick Piggin wrote:
> > I'd prefer if we kept a single CPU scheduler in mainline, because I
> > think that simplifies analysis and focuses testing.
> 
> I think you'll find something like 80-90% of the testing will be done
> on the default choice, even if other choices exist. So you really
> won't have much of a problem here.
> 
> But when the only choice for other schedulers is to go out-of-tree,
> then only 1% of the people will try it out and those people are
> guaranteed to be the ones who saw scheduling problems in mainline.
> So the alternative won't end up getting any testing on many of the
> workloads that work fine in mainstream so their feedback won't tell
> you very much at all.

Yeah I concede that perhaps it is the only way to get things going
any further. But how do we decide if and when the current scheduler
should be demoted from default, and which should replace it?

Re: [BUG] 2.6.21-rc7 hpt366 driver broken

2007-04-16 Thread Mike Mattie

On Mon, 16 Apr 2007 19:43:03 -0700
Mike Mattie <[EMAIL PROTECTED]> wrote:

> On Mon, 16 Apr 2007 18:21:12 -0700
> Mike Mattie <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 16 Apr 2007 16:36:13 +0200
> > Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > 
> > > [ Cc's added, full bug report was in
> > > http://lkml.org/lkml/2007/4/16/18 ]
> > > 
> > > On Mon, Apr 16, 2007 at 04:38:22AM -0700, Mike Mattie wrote:
> > > > On Sun, 15 Apr 2007 22:48:46 -0700
> > > > Mike Mattie <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I am testing the 2.6.21-rc7 kernel release. The IDE hpt366
> > > > > driver is crashing and hanging the boot. I have basically the same
> > > > > config as 2.6.20.7 which works fine (except for netconsole
> > > > > mentioned in a previous mail).
> > > > > 
> > > > > here is the hand-copied info:
> > > > > 
> > > > > * "unable to handle paging request" , null deref
> > > > > * EIP @ init_chipset_hpt366
> > > > > 
> > > > 
> > > > > I am running a git-bisect to see if I can resolve it to a
> > > > > commit.
> > > > 
> > > > This was identified as the first broken commit:
> > > > 
> > > > commit 7b73ee05d0acb926923d43d78b61add776ea4bb1
> > > > Author: Sergei Shtylyov <[EMAIL PROTECTED]>
> > > > Date:   Wed Feb 7 18:18:16 2007 +0100
> > > > 
> > > > hpt366: init code rewrite
> > > > 
> > > > Reverting is conflicted so it will be a bit longer before I
> > > > pin-point any other build-breaks.
> > > 
> > > Thanks for your report.
> > > 
> > > Can you use a digital camera for taking a photograph of the crash?
> > 
> > I can later on tonight, by about 11PM west coast. I also saw
> > some hex offsets after the function pointed to by EIP, is there
> > a way to decode that to a line number ? I have debugging symbols
> > enabled.
> > 
> > I am also doing printk breadcrumbs to pin it down to a block
> > or a line.
> 
> I have narrowed the crash with breadcrumbs down to these lines:
> 
> 	/*
> 	 * Only try the DPLL if we don't have a table for the PCI clock that
> 	 * we are running at for HPT370/A, always use it  for anything newer...
> 	 *
> 	 * NOTE: Using the internal DPLL results in slow reads on 33 MHz PCI.
> 	 * We also  don't like using  the DPLL because this causes glitches
> 	 * on PRST-/SRST- when the state engine gets reset...
> 	 */
> 	if (info->chip_type >= HPT374 || info->settings[clock] == NULL) {
> 		u16 f_low, delta = pci_clk < 50 ? 2 : 4;
> 		int adjust;
> 
> 		printk(KERN_INFO "inside the if\n");
> 
> 		/*
> 		 * Select 66 MHz DPLL clock only if UltraATA/133 mode is
> 		 * supported/enabled, use 50 MHz DPLL clock otherwise...
> 		 */
> 		if (info->max_mode == 0x04) {
> 			dpll_clk = 66;
> 			clock = ATA_CLOCK_66MHZ;
> 		} else if (dpll_clk) {	/* HPT36x chips don't have DPLL */
> 			dpll_clk = 50;
> 			clock = ATA_CLOCK_50MHZ;
> 		}
> 
> 		if (info->settings[clock] == NULL) {
crashes here

Since info is dereferenced all over the place, I am assuming it is the
array access that is blowing up.

I printk'd the value of clock, which is 4. That array is either not set
up correctly, or the index is out of bounds (speculation).

> 			printk(KERN_ERR "%s: unknown bus timing!\n", name);
> 			kfree(info);
> 			return -EIO;
> 		}
> 
> 		printk(KERN_INFO "select DPLL clock\n");
> 
> This is around line 1171 (skewed by the breadcrumbs I added). The last
> message I receive is "inside the if"; it dies before "select DPLL clock".
> 
> Without knowing much about the structs I am not sure what to
> print-out. I will narrow it further, and maybe even compare against
> what the old working kernel had for variable values. That would take
> some time though.
> 
> > 
> > > cu
> > > Adrian
> > > 
> > > --
> > > 
> > >"Is there not promise of rain?" Ling Tan asked suddenly out
> > > of the darkness. There had been need of rain for many
> > > days. "Only a promise," Lao Er said.
> > >Pearl S. Buck - Dragon Seed
> > > 

Re: BUG: Bad page state errors during kernel make (resolved)

2007-04-16 Thread Zach Carter

Zach Carter wrote:
> Do you think there might be other bad hw, or another explanation?

Well, after updating the BIOS for the motherboard, I was able to rebuild the kernel 6 times in a row
with no page state errors.  I noticed that the recent BIOS update includes "Enhanced compatibility
with Linux":

http://www.abit-usa.com/products/mb/bios.php?categories=1=316

In case anyone searching the ML archive has the same problem, the motherboard is an ABIT KN9 ULTRA 
Socket AM2 NVIDIA nForce 570 Ultra MCP ATX


-Zach

Re: CPU_IDLE prevents resuming from STR [was: Re: 2.6.21-rc6-mm1]

2007-04-16 Thread Joshua Wise

On Mon, 16 Apr 2007, Shaohua Li wrote:
> On Sat, 2007-04-14 at 01:45 +0200, Mattia Dongili wrote:
>> ...
> please check if the patch at
> http://marc.info/?l=linux-acpi=117523651630038=2 fixed the issue

I have the same system as Mattia, and when I applied this patch and turned
CPU_IDLE back on, I got a panic on boot. Unfortunately, the EIP scrolled off
screen, so I can't get a line number.

(I had the same STR breakage as him; STR did not work with CPU_IDLE turned
on, and it did work with CPU_IDLE turned off.)

I'm running +rc6+mm(April 11) on a Sony VAIO SZ.

joshua

Re: [PATCH] Blackfin: blackfin on-chip SPI controller driver

2007-04-16 Thread David Brownell

Cleaning out some of my pending-reviews queue ... after you address
these comments I think what I'd like to do is sign off on one clean
patch, rather than initial-plus-cleanups.

On Monday 05 March 2007 2:41 am, Wu, Bryan wrote:

> --- linux-2.6.orig/drivers/spi/Kconfig	2007-03-01 11:33:07.0 +0800
> +++ linux-2.6/drivers/spi/Kconfig	2007-03-01 11:40:22.0 +0800

I'm adjusting this to address the later patches you sent.

One global comment I'll make, just in case -- please make
sure all your line-start indents only include tabs, and
there's no space-at-end-of-line stuff going on, or lines
wrapping past column 80.

I did this review in KMail, which doesn't highlight such
minor errors; and I suspect you're mostly OK, but for a
new driver there's no reason not to be 100% OK in those
particular respects!  (And I *did* notice one of your
cleanup patches clearly adding tabs-then-spaces indents.)

> @@ -156,7 +156,11 @@
>  #
>  # Add new SPI protocol masters in alphabetical order above this line
>  #
> -
> +config SPI_BFIN
> + tristate "SPI controller driver for ADI Blackfin5xx"
> + depends on SPI_MASTER && BFIN
> + help
> +   This is the SPI controller master driver for Blackfin 5xx processor.

Please put this in Kconfig up with the other SPI controller drivers, in
alphabetical order.  Just like the comment says.

Likewise, please add it to the Makefile in alphabetical order.

> --- /dev/null 1970-01-01 00:00:00.0 +
> +++ linux-2.6/drivers/spi/spi_bfin5xx.c	2007-03-01 11:40:22.0 +0800

> +#ifdef DEBUG
> +#define ASSERT(expr) \
> + if (!(expr)) { \
> + printk(KERN_DEBUG "assertion failed! %s[%d]: %s\n", \
> +__FUNCTION__, __LINE__, #expr); \
> + panic(KERN_DEBUG "%s", __FUNCTION__); \

Seems like either WARN_ON(expr) or BUG_ON(expr) will be better.
The general rule of BUG variants is: don't, unless the system
really can't continue operating.  (I see a later patch removed
this entirely; good.)

> + }
> +#else
> +#define ASSERT(expr)
> +#endif
> +
> +#define IS_DMA_ALIGNED(x) (((u32)(x)&0x07)==0)
> +
> +#define DEFINE_SPI_REG(reg, off) \
> +static inline u16 read_##reg(void) \
> +{ return *(volatile unsigned short*)(SPI0_REGBASE + off); } \
> +static inline void write_##reg(u16 v) \
> +{*(volatile unsigned short*)(SPI0_REGBASE + off) = v;\
> + SSYNC();}

These should be readw() and writew() or similar... also, I can't tell
what SSYNC() does, but it sure looks like something that shouldn't be
hidden like that.  I/O memory should be mapped such that writes don't
get re-ordered.  And flushing any write buffer should not be forced in
such low-level accessors ... if it's needed, it should be done at the
relevant points in the driver.  (Which you seem to do in a few places
below.  The duplication is undesirable.)

> +
> +DEFINE_SPI_REG(CTRL, 0x00)

... this particular style of register accessor is not generally used in Linux.
The typical style is

u16 value = __raw_readw(SPI0_REGBASE + SPI_CTRL);
__raw_writew(value, SPI0_REGBASE + SPI_CTRL);

or wrapped in macros so spi_readw(CTRL) and spi_writew(CTRL, value) work.

Of course, SPI1/SPI2/etc should be supported too ... so it's common to have
those take a pointer to some controller struct with a "void __iomem *regs"
pointer to the registers for that instance.  spi_readw(master, CTRL) etc.
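A userspace sketch of the per-instance accessor style described above: one struct per controller holding its register base, plus spi_readw()/spi_writew() helpers that take that struct. Plain volatile pointer accesses stand in for __raw_readw()/__raw_writew() so the sketch runs outside the kernel; the struct name and every offset except CTRL (0x00, taken from the patch) are hypothetical placeholders.

```c
#include <stdint.h>

/* Register offsets: CTRL comes from the patch, the others are placeholders. */
enum { SPI_CTRL = 0x00, SPI_FLAG = 0x04, SPI_STAT = 0x08 };

/* Hypothetical per-controller state: each SPI instance (SPI0, SPI1, ...)
 * carries its own base address instead of a hard-coded SPI0_REGBASE.
 * In the kernel this would be a "void __iomem *regs" from ioremap(). */
struct bfin_spi_sketch {
	volatile uint8_t *regs;
};

/* Stand-in for __raw_readw(regs + off). */
static inline uint16_t spi_readw(struct bfin_spi_sketch *spi, unsigned int off)
{
	return *(volatile uint16_t *)(spi->regs + off);
}

/* Stand-in for __raw_writew(value, regs + off); note that the kernel's
 * write accessors take the value first and the address second. */
static inline void spi_writew(struct bfin_spi_sketch *spi, unsigned int off,
			      uint16_t value)
{
	*(volatile uint16_t *)(spi->regs + off) = value;
}
```

Because the base pointer lives in the controller struct, a second instance only needs a second struct, and any write-buffer flush (the SSYNC() question above) stays at the call sites that actually need it rather than being buried in every accessor.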

> +#define START_STATE ((void*)0)
> +#define RUNNING_STATE ((void*)1)
> +#define DONE_STATE ((void*)2)
> +#define ERROR_STATE ((void*)-1)

Normally states would be represented by enum values, which among other
things supports "switch (state) { ... }" state machine code.  This driver
is full of uncommon idioms, which will make it harder for most kernel
developers to dive in and help.

Even if you have a style guide internal to Analog which says to do things
this way ... don't.
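For comparison, the four pointer-valued states rewritten as an enum driving a switch, as suggested above. The numeric values mirror the original (void *) constants; the transition rules themselves are an illustrative assumption, not the driver's actual logic.

```c
/* Message states as an enum; values mirror START/RUNNING/DONE/ERROR above. */
enum spi_msg_state {
	STATE_ERROR = -1,
	STATE_START,		/* 0 */
	STATE_RUNNING,		/* 1 */
	STATE_DONE,		/* 2 */
};

/* Hypothetical transition function: advance a message given how many
 * transfers remain and whether the last one failed. */
static enum spi_msg_state next_state(enum spi_msg_state s, int transfers_left,
				     int error)
{
	if (error)
		return STATE_ERROR;

	switch (s) {
	case STATE_START:
		return STATE_RUNNING;
	case STATE_RUNNING:
		return transfers_left > 0 ? STATE_RUNNING : STATE_DONE;
	case STATE_DONE:
	case STATE_ERROR:
	default:
		return s;	/* terminal states stay where they are */
	}
}
```

With an enum the compiler can also warn about unhandled states in the switch, which opaque pointer casts cannot offer.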

> +
> +#define QUEUE_RUNNING 0
> +#define QUEUE_STOPPED 1
> +
> +int dma_requested;
> +char chip_select_flag;

These should probably be members of the per-controller state struct,
and otherwise should certainly be static.  This driver exports a LOT
of stuff that should be static ...

> +
> +struct driver_data {

Not the most explanatory of names.  Could you do better?

> + /* Driver model hookup */
> + struct platform_device *pdev;
> +
> + /* SPI framework hookup */
> + struct spi_master *master;
> +
> + /* BFIN hookup */
> + struct bfin5xx_spi_master *master_info;

I would have assumed this struct would *BE* the Blackfin-specific
spi_master state ...

> +
> + /* Driver message queue */
> + struct workqueue_struct *workqueue;
> + struct work_struct pump_messages;
> + spinlock_t lock;
> + struct list_head queue;
> + int busy;
> + int run;
> +
> + /* Message Transfer pump */
> + struct tasklet_struct pump_transfers;
> +
> + /* Current message transfer state info */
> +

Re: [Patch -mm 0/3] RFC: module unloading vs. release function

2007-04-16 Thread Rusty Russell

On Tue, 2007-04-17 at 00:44 +0400, Alexey Dobriyan wrote:
> On Mon, Apr 16, 2007 at 03:38:52PM -0400, Alan Stern wrote:
> >  3. Change the module code so that rmmod can return _before_ the
> > module is actually unloaded from memory (but after the module's
> > exit routine has completed).  This will lead to more problems.
> > For example, what if someone tries to modprobe my_module back
> > again before it has finished unloading?
> 
> This problem (or its absence) must be already in tree: module_mutex is
> dropped for the duration of ->exit() function, so init_module(2) could
> load new old module meanwhile.

Only if you give it a different name when loading it the second time.

Rusty.

Re: CPU_IDLE prevents resuming from STR [was: Re: 2.6.21-rc6-mm1]

2007-04-16 Thread Shaohua Li

On Mon, 2007-04-16 at 22:50 -0400, Joshua Wise wrote:
> On Mon, 16 Apr 2007, Shaohua Li wrote:
> > On Sat, 2007-04-14 at 01:45 +0200, Mattia Dongili wrote:
> >> ...
> > please check if the patch at
> > http://marc.info/?l=linux-acpi=117523651630038=2 fixed the issue
> 
> I have the same system as Mattia, and when I applied this patch and turned
> CPU_IDLE back on, I got a panic on boot. Unfortunately, the EIP scrolled off
> screen, so I can't get a line number.
> 
> (I had the same STR breakage as him; STR did not work with CPU_IDLE turned
> on, and it did work with CPU_IDLE turned off.)
> 
> I'm running +rc6+mm(April 11) on a Sony VAIO SZ.
Is it possible to get the log over a serial console? You should at least
see some log output on the screen; if you don't have a serial port, please
write it down. The boot panic surprises me, as it works here.

Thanks,
Shaohua

Re: [Patch -mm 3/3] RFC: Introduce kobject->owner for refcounting.

2007-04-16 Thread Rusty Russell

On Mon, 2007-04-16 at 15:53 -0400, Alan Stern wrote:
> The fundamental rule is that whenever you hand out a pointer to a routine
> living in a module, the receiver has to increment the module's refcount.  
> But the driver core violates this rule all over the place.

Hi Alan,

Your rule is overly simplistic, unfortunately.  You have two choices:
take a reference count, *or* ensure that the reference will go away when
the module's cleanup routine is called.  Network drivers are a classic
example of the latter.

Note that you cannot do both: if the cleanup routine calls something
which drops a reference count, it implies that the cleanup routine needs
to be called with non-zero reference count, and it won't be (ignoring
--force).
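The "you cannot do both" point can be seen in a toy model of a module's lifetime. Everything here is hypothetical (it is not struct module or try_module_get()); it only encodes the rule that unload is refused while references are held, ignoring --force.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a module; names are illustrative, not the kernel API. */
struct toy_module {
	int refcnt;		/* references handed out to other code */
	bool loaded;
	bool cleanup_ran;
};

/* Take a reference; fails once the module is gone. */
static bool toy_get(struct toy_module *m)
{
	if (!m->loaded)
		return false;
	m->refcnt++;
	return true;
}

static void toy_put(struct toy_module *m)
{
	m->refcnt--;
}

/* rmmod without --force: refused while any reference is held, so the
 * cleanup routine only ever runs with refcnt == 0. */
static int toy_unload(struct toy_module *m)
{
	if (m->refcnt != 0)
		return -EBUSY;
	m->cleanup_ran = true;	/* the module's cleanup routine runs here */
	m->loaded = false;
	return 0;
}
```

If the only code that would call toy_put() for an outstanding reference lived in the cleanup routine, toy_unload() would keep returning -EBUSY and cleanup would never run; that is why the reference must either be counted and dropped elsewhere, or not counted at all and torn down by cleanup.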

I hope that clarifies?
Rusty.

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Mike Galbraith

On Tue, 2007-04-17 at 10:06 +1000, Peter Williams wrote:
> Mike Galbraith wrote:
> >
> > Demystify what?   The casual observer need only read either your attempt
> > at writing a scheduler, or my attempts at fixing the one we have, to see
> > that it was high time for someone with the necessary skills to step in.
> 
> Make that "someone with the necessary clout".

No, I was brutally honest to both of us, but quite correct.

> > Now progress can happen, which was _not_ happening before.
> > 
> 
> This is true.

Yup, and progress _is_ happening now, quite rapidly.

-Mike

Re: [BUG] 2.6.21-rc7 hpt366 driver broken

2007-04-16 Thread Mike Mattie

On Mon, 16 Apr 2007 18:21:12 -0700
Mike Mattie <[EMAIL PROTECTED]> wrote:

> On Mon, 16 Apr 2007 16:36:13 +0200
> Adrian Bunk <[EMAIL PROTECTED]> wrote:
> 
> > [ Cc's added, full bug report was in
> > http://lkml.org/lkml/2007/4/16/18 ]
> > 
> > On Mon, Apr 16, 2007 at 04:38:22AM -0700, Mike Mattie wrote:
> > > On Sun, 15 Apr 2007 22:48:46 -0700
> > > Mike Mattie <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I am testing the 2.6.21-rc7 kernel release. The IDE hpt366
> > > > driver is crashing and hanging the boot. I have basically the same
> > > > config as 2.6.20.7 which works fine (except for netconsole
> > > > mentioned in a previous mail).
> > > > 
> > > > here is the hand-copied info:
> > > > 
> > > > * "unable to handle paging request" , null deref
> > > > * EIP @ init_chipset_hpt366
> > > > 
> > > 
> > > > I am running a git-bisect to see if I can resolve it to a
> > > > commit.
> > > 
> > > This was identified as the first broken commit:
> > > 
> > > commit 7b73ee05d0acb926923d43d78b61add776ea4bb1
> > > Author: Sergei Shtylyov <[EMAIL PROTECTED]>
> > > Date:   Wed Feb 7 18:18:16 2007 +0100
> > > 
> > > hpt366: init code rewrite
> > > 
> > > Reverting is conflicted so it will be a bit longer before I
> > > pin-point any other build-breaks.
> > 
> > Thanks for your report.
> > 
> > Can you use a digital camera for taking a photograph of the crash?
> 
> I can later on tonight, by about 11PM west coast. I also saw
> some hex offsets after the function pointed to by EIP, is there
> a way to decode that to a line number ? I have debugging symbols
> enabled.
> 
> I am also doing printk breadcrumbs to pin it down to a block
> or a line.

I have narrowed the crash with breadcrumbs down to these lines:


	/*
	 * Only try the DPLL if we don't have a table for the PCI clock that
	 * we are running at for HPT370/A, always use it  for anything newer...
	 *
	 * NOTE: Using the internal DPLL results in slow reads on 33 MHz PCI.
	 * We also  don't like using  the DPLL because this causes glitches
	 * on PRST-/SRST- when the state engine gets reset...
	 */
	if (info->chip_type >= HPT374 || info->settings[clock] == NULL) {
		u16 f_low, delta = pci_clk < 50 ? 2 : 4;
		int adjust;

		printk(KERN_INFO "inside the if\n");

		/*
		 * Select 66 MHz DPLL clock only if UltraATA/133 mode is
		 * supported/enabled, use 50 MHz DPLL clock otherwise...
		 */
		if (info->max_mode == 0x04) {
			dpll_clk = 66;
			clock = ATA_CLOCK_66MHZ;
		} else if (dpll_clk) {	/* HPT36x chips don't have DPLL */
			dpll_clk = 50;
			clock = ATA_CLOCK_50MHZ;
		}

		if (info->settings[clock] == NULL) {
			printk(KERN_ERR "%s: unknown bus timing!\n", name);
			kfree(info);
			return -EIO;
		}

		printk(KERN_INFO "select DPLL clock\n");

This is around line 1171 (skewed by the breadcrumbs I added). The last
message I receive is "inside the if"; it dies before "select DPLL clock".

Without knowing much about the structs I am not sure what to print-out.
I will narrow it further, and maybe even compare against what the old
working kernel had for variable values. That would take some time though.
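One thing worth printing is which of the three possible faults happens at the settings[] lookup: a bad info pointer, an out-of-range clock index, or a missing table entry. A minimal userspace sketch of such a bounds-checked lookup, assuming settings[] is an array sized by a clock enumeration; the enum and struct below are illustrative stand-ins written for this sketch, not copied from hpt366.c, and must be checked against the real definitions. With the ordering assumed here, clock == 4 would be ATA_CLOCK_66MHZ, which is consistent with the max_mode == 0x04 branch shown above and would be a valid index.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative clock enumeration (assumed ordering, see lead-in). */
enum ata_clock {
	ATA_CLOCK_25MHZ,
	ATA_CLOCK_33MHZ,
	ATA_CLOCK_40MHZ,
	ATA_CLOCK_50MHZ,
	ATA_CLOCK_66MHZ,
	NUM_ATA_CLOCKS
};

/* Hypothetical stand-in for the driver's per-chip info structure. */
struct hpt_info_sketch {
	const unsigned int *settings[NUM_ATA_CLOCKS];
};

/* Return the timing table for @clock or NULL, reporting which check failed. */
static const unsigned int *lookup_settings(const struct hpt_info_sketch *info,
					   int clock)
{
	if (info == NULL) {
		printf("lookup_settings: info is NULL\n");
		return NULL;
	}
	if (clock < 0 || clock >= NUM_ATA_CLOCKS) {
		printf("lookup_settings: clock %d out of range (0..%d)\n",
		       clock, NUM_ATA_CLOCKS - 1);
		return NULL;
	}
	if (info->settings[clock] == NULL)
		printf("lookup_settings: no table for clock %d\n", clock);
	return info->settings[clock];
}
```

If the index turns out to be in range, the next suspect is the memory that info (or its settings pointers) refers to, which is where comparing against the old working kernel's values would help.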

> 
> > cu
> > Adrian
> > 
> > --
> > 
> >"Is there not promise of rain?" Ling Tan asked suddenly out
> > of the darkness. There had been need of rain for many days.
> >"Only a promise," Lao Er said.
> >Pearl S. Buck - Dragon Seed
> > 

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread Andrew Vasquez

On Mon, 16 Apr 2007, David Miller wrote:

> From: Andrew Vasquez <[EMAIL PROTECTED]>
> Date: Mon, 16 Apr 2007 16:47:05 -0700
> 
> > Dave, according to your earlier emails, the qla2xxx driver worked
> > 'fine' in driver versions before commit
> > 7aef45ac92f49e76d990b51b7ecd714b9a608be1.  If that were the case, then
> > you would have seen the warning messages:
> > 
> > ...
> > qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet "
> > "invalid -- WWPN) defaults.\n");
> 
> I have in fact seen the message several times and that messages gives
> me no reason to believe something needs to be fixed.
> 
> It should have said "PLEASE REPORT THIS to [EMAIL PROTECTED]" or
> something similar to indicate the severity better.
> 
> "An invalid WWPN, what's that?" said the user. :)
> 
> How about "FC IDs may conflict and cause miscommunication!  Please
> report to driver author so this can be fixed!" or similar?

That verbiage sounds fine -- so would you consider the previous patch
I submitted (with module parameter) along with the wording above?

I'm in transit on a red-eye to NY, so I won't be able to modify the
patch. If you would be amenable to the above, Seokmann, could you
rework the patch?

2.6.21-rc6-mm1 ATA HPT37x regression

2007-04-16 Thread John Stoffel


Hi Jeff and crew,

I was just testing out 2.6.21-rc6-mm1 to test some Cyclades patches
and I noticed that my HPT302 (rev1) controller with a pair of 120GB WD
disks is no longer detected, and I get the following in the dmesg
logs:

[  148.121490] hpt37x: DPLL did not stabilize.

Where before, under 2.6.21-rc6 I got the following:

[  173.749349] pata_hpt37x: BIOS has not set timing clocks.
[  173.752949] hpt37x: HPT302: Bus clock 33MHz.
[  173.754409] ACPI: PCI Interrupt :03:06.0[A] -> GSI 18 (level, low) -> IRQ 18
[  173.758403] ata5: PATA max UDMA/133 cmd 0x0001ecf8 ctl 0x0001ecf2 bmdma 0x0001e800 irq 18
[  173.761396] ata6: PATA max UDMA/133 cmd 0x0001ece0 ctl 0x0001ecda bmdma 0x0001e808 irq 18
[  173.764319] scsi6 : pata_hpt37x
[  173.928997] ATA: abnormal status 0x78 on port 0x0001ecff
[  173.930511] scsi7 : pata_hpt37x
[  174.094906] ATA: abnormal status 0x8 on port 0x0001ece7


Here's my lspci information on the board; it's an add-on card.  My apologies
for the crappy word wrapping: xterms inside screen, etc.

  03:06.0 RAID bus controller: Triones Technologies, Inc. HPT302/302N (rev 01)
  Subsystem: Triones Technologies, Inc. Unknown device 0001
  Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
  Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR-

Memory Allocation

2007-04-16 Thread Brian D. McGrew

Good evening gents!

I need some help in allocating memory and understanding how the system
allocates memory with physical versus virtual page tables.  Please
consider the following snippet of code.  Please, no wisecracks about bad
code; it was written in 30 seconds in haste :-)

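#include <iostream>

#include <cstdlib>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

// 2048 x 2048 one-byte pixels x 256 buffers = 1 GiB.  The UL suffix keeps
// the product in unsigned long: 2048 * 2048 * 512 would overflow a 32-bit int.
const static u_long kMaxSize = (2048UL * 2048UL * 256UL);

void *msg(void *ptr);
static u_long threads_done = 0;
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;

int
main(int argc, char *argv[])
{
	pthread_t thread1;
	pthread_t thread2;

	const char *message1 = "Thread 1";
	const char *message2 = "Thread 2";

	int iret1;
	int iret2;

	iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
	iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);
	if (iret1 != 0 || iret2 != 0) {
		std::cerr << "pthread_create failed" << std::endl;
		exit(1);
	}

	//pthread_join(thread1, NULL);
	//pthread_join(thread2, NULL);

	for (;;) {
		// threads_done is shared with the worker threads, so read it under the lock.
		pthread_mutex_lock(&done_lock);
		u_long done = threads_done;
		pthread_mutex_unlock(&done_lock);

		if (done >= 2)
			break;
		std::cout << "Threads complete: " << done << std::endl;
		sleep(3);
	}

	exit(0);
}

void *
msg(void *ptr)
{
	const char *message = (const char *) ptr;

	//
	// 1 bank per thread of 256 4MP image buffers: 1 GiB per thread,
	// 2 GiB for both threads.  new[] throws std::bad_alloc on failure.
	//
	char *buffer = new char[kMaxSize];

	u_long max = kMaxSize;

	//
	// Init each buffer to 'something'.
	//
	for (u_long inx = 0; inx < max; inx++) {
		if (inx % 10240 == 0) {
			std::cout << message << ": Index: " << inx << std::endl;
		}

		buffer[inx] = (char) inx;
	}

	delete[] buffer;	// memory from new[] must be released with delete[], not free()

	pthread_mutex_lock(&done_lock);
	threads_done++;
	pthread_mutex_unlock(&done_lock);

	return NULL;
}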

My test machine is a Dell Precision 490 with dual 5140 processors and
3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
However, I need to allocate an array of char that is (2048 * 2048 * 256)
and maybe even as large as (2048 * 2048 * 512).
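The sizes involved, worked out in C with 64-bit arithmetic. This only checks the arithmetic; it makes no claim about why the allocation itself fails. The helper name is made up for the sketch.

```c
#include <stdint.h>

/* Bytes needed for `frames` image buffers of 2048 x 2048 one-byte pixels.
 * The multiplication is done in uint64_t: with plain int operands,
 * 2048 * 2048 * 512 = 2^31 is one past INT_MAX on a 32-bit int. */
static uint64_t buffer_bytes(uint64_t frames)
{
	return (uint64_t)2048 * 2048 * frames;
}
```

buffer_bytes(236) is 989,855,744 bytes (about 944 MiB), buffer_bytes(256) is exactly 1 GiB, and buffer_bytes(512) is 2 GiB. With two threads, the 256-frame case asks for 2 GiB in one process, and the 512-frame case would ask for 4 GiB, which is the entire 32-bit virtual address space.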

Obviously I have enough physical memory in the box to do this.  However,
I suspect that I'm running out of page table entries.  Please correct
me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I
increase it to 256 or 512 it fails, and it is my suspicion that I just
don't have enough kernel memory to allocate this much memory in
user space.

Because of a piece of 3rd party hardware, I'm forced to run the kernel
in the 4GB memory model.  What I need to be able to do is allocate an
array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
need the addresses that I get back to be contiguous, that's just the way
my 3rd party hardware works.

I'm inclined to believe that this is not specifically a Linux problem
but maybe an architecture problem???  But maybe there is some kind of
work around in the kernel for it???  I'd find it hard to believe that
I'm the first one that ever needed to use this much memory.

I ran this same code on two different Macs.  One of them, a PowerBook G4
with 4GB of RAM, was successful.  The other, a MacBook Pro with
4GB of RAM, failed.  Both run Mac OS X 10.4.9.  And of course it
runs just lovely on my Sun workstation with Solaris.  Thus, I'm thinking
it's an Intel/X86 issue!

How the heck do I get past this problem in Linux on the x86 platform???

Thanks,

-brian
[EMAIL PROTECTED]

[PATCH][BUG] Fix possible NULL pointer access in 8250 serial driver

2007-04-16 Thread izumi

Hi,

I encountered the following kernel panic. The cause of this problem was
a NULL pointer access in check_modem_status() in 8250.c. I confirmed
that this problem is fixed by the attached patch, but I don't know whether
this is the correct fix.

sadc[4378]: NaT consumption 2216203124768 [1]
Modules linked in: binfmt_misc dm_mirror dm_mod thermal processor fan
container button sg e100 eepro100 mii ehci_hcd ohci_hcd

Pid: 4378, CPU 0, comm: sadc
psr : 1210085a2010 ifs : 8289 ip : []
Not tainted
ip is at check_modem_status+0xf1/0x360
unat:  pfs : 0289 rsc : 0003
rnat: 8000cc18 bsps:  pr : 00aa6a99
ldrs:  ccv :  fpsr: 0009804c8a70033f
csd :  ssd : 
b0 : a00100481fb0 b6 : a001004822e0 b7 : a00100477f20
f6 : 1003e f7 : 0ffdba200
f8 : 100018000 f9 : 10002a000
f10 : 0fffdc8c0 f11 : 1003e
r1 : a00100b9af40 r2 : 0008 r3 : a00100ad4e21
r8 : 00bb r9 : 0001 r10 : 
r11 : a00100ad4d58 r12 : e37b7df0 r13 : e37b
r14 : 0001 r15 : 0018 r16 : a00100ad4d6c
r17 :  r18 :  r19 : 
r20 : a0010099bc88 r21 : 00bb r22 : 00bb
r23 : c003fc0ff3fe r24 : c003fc00 r25 : 000ff3fe
r26 : a001009b7ad0 r27 : 0001 r28 : a001009b7ad8
r29 :  r30 : a001009b7ad0 r31 : a001009b7ad0

Call Trace:
[] show_stack+0x40/0xa0
sp=e37b7810 bsp=e37b1118
[] show_regs+0x840/0x880
sp=e37b79e0 bsp=e37b10c0
[] die+0x1c0/0x2c0
sp=e37b79e0 bsp=e37b1078
[] die_if_kernel+0x50/0x80
sp=e37b7a00 bsp=e37b1048
[] ia64_fault+0x11e0/0x1300
sp=e37b7a00 bsp=e37b0fe8
[] ia64_leave_kernel+0x0/0x280
sp=e37b7c20 bsp=e37b0fe8
[] check_modem_status+0xf0/0x360
sp=e37b7df0 bsp=e37b0fa0
[] serial8250_get_mctrl+0x20/0xa0
sp=e37b7df0 bsp=e37b0f80
[] uart_read_proc+0x250/0x860
sp=e37b7df0 bsp=e37b0ee0
[] proc_file_read+0x1d0/0x4c0
sp=e37b7e10 bsp=e37b0e80
[] vfs_read+0x1b0/0x300
sp=e37b7e20 bsp=e37b0e30
[] sys_read+0x70/0xe0
sp=e37b7e20 bsp=e37b0db0
[] ia64_ret_from_syscall+0x0/0x20
sp=e37b7e30 bsp=e37b0db0
[] __kernel_syscall_via_break+0x0/0x20
sp=e37b8000 bsp=e37b0db0

Thanks,
Taku Izumi
Fix a possible NULL pointer access in check_modem_status() in
8250.c. check_modem_status() accesses the 'info' member of the
uart_port structure, but that member is not initialized until
uart_open() is called, and check_modem_status() can be reached through
/proc/tty/driver/serial before that happens.

Signed-off-by: Kenji Kaneshige <[EMAIL PROTECTED]>
Signed-off-by: Taku Izumi <[EMAIL PROTECTED]>
---
 drivers/serial/8250.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.21-rc5/drivers/serial/8250.c
===
--- linux-2.6.21-rc5.orig/drivers/serial/8250.c 2007-03-26 09:14:37.0 +0900
+++ linux-2.6.21-rc5/drivers/serial/8250.c  2007-04-13 12:06:52.0 +0900
@@ -1310,7 +1310,8 @@
 {
 	unsigned int status = serial_in(up, UART_MSR);
 
-	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI) {
+	if (status & UART_MSR_ANY_DELTA && up->ier & UART_IER_MSI &&
+	    up->port.info != NULL) {
 		if (status & UART_MSR_TERI)
 			up->port.icount.rng++;
 		if (status & UART_MSR_DDSR)
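A user-space sketch of the guard the patch adds, for readers who want to see the failure mode in isolation. Everything here is a simplified, hypothetical stand-in (`mock_port`, `mock_info`, made-up bit values), not the real `uart_8250_port` layout; the counter stands in for whatever the delta path does through `port.info`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values only; NOT the real UART_MSR_* / UART_IER_* values. */
#define MOCK_MSR_ANY_DELTA 0x0F
#define MOCK_IER_MSI       0x08

struct mock_info {              /* stands in for state set up by uart_open() */
	int wakeups;
};

struct mock_port {
	unsigned int ier;
	struct mock_info *info;     /* NULL until the port has been opened */
};

/* Returns 1 if the delta handling (which dereferences info) ran, else 0. */
static int mock_check_modem_status(struct mock_port *port, unsigned int status)
{
	if ((status & MOCK_MSR_ANY_DELTA) && (port->ier & MOCK_IER_MSI) &&
	    port->info != NULL) {
		port->info->wakeups++;  /* would be a NULL dereference without the guard */
		return 1;
	}
	return 0;
}
```

Reading the status through /proc on a never-opened port corresponds to calling this with `info == NULL`: the delta bits are still reported, but the handling that needs `info` is skipped.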

[PATCH 001 of 2] knfsd: Use a spinlock to protect sk_info_authunix

2007-04-16 Thread NeilBrown


sk_info_authunix is not being protected properly so the object that
it points to can be cache_put twice, leading to corruption.

We borrow svsk->sk_defer_lock to provide the protection.  We should probably
rename that lock to have a more generic name - later.
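A user-space analogue of the locking pattern described above, assuming a pthread mutex in place of the spinlock and a C11 atomic counter in place of the cache reference count. `struct slot`, `struct cached_obj`, `slot_get()` and `slot_put()` are hypothetical names, and the SOCK_STREAM check is omitted. The point is that the cached pointer is only read, cleared, or installed while the lock is held, so two threads can no longer both drop the slot's reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct cached_obj {
	atomic_int refcount;        /* stands in for the cache_head reference */
	int valid;                  /* stands in for cache_valid() */
};

struct slot {
	pthread_mutex_t lock;       /* plays the role of sk_defer_lock */
	struct cached_obj *cached;  /* plays the role of sk_info_authunix */
};

static void obj_get(struct cached_obj *o) { atomic_fetch_add(&o->refcount, 1); }
static void obj_put(struct cached_obj *o) { atomic_fetch_sub(&o->refcount, 1); }

/* Like ip_map_cached_get(): look up and take a reference under the lock. */
static struct cached_obj *slot_get(struct slot *s)
{
	struct cached_obj *o;

	pthread_mutex_lock(&s->lock);
	o = s->cached;
	if (o != NULL) {
		if (!o->valid) {
			s->cached = NULL;   /* unhook while locked ... */
			pthread_mutex_unlock(&s->lock);
			obj_put(o);         /* ... so only this thread drops the slot's ref */
			return NULL;
		}
		obj_get(o);
	}
	pthread_mutex_unlock(&s->lock);
	return o;
}

/* Like ip_map_cached_put(): hand our reference to an empty slot, or drop it. */
static void slot_put(struct slot *s, struct cached_obj *o)
{
	pthread_mutex_lock(&s->lock);
	if (s->cached == NULL) {
		s->cached = o;          /* newly cached, slot keeps the reference */
		o = NULL;
	}
	pthread_mutex_unlock(&s->lock);
	if (o != NULL)
		obj_put(o);             /* final put happens outside the lock */
}
```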

Thanks to Gabriel for reporting this.

Cc: Greg Banks <[EMAIL PROTECTED]>
Cc: Gabriel Barazer <[EMAIL PROTECTED]>
Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./net/sunrpc/svcauth_unix.c |   21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
--- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:42:14.0 +1000
+++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:42:21.0 +1000
@@ -383,7 +383,10 @@ void svcauth_unix_purge(void)
 static inline struct ip_map *
 ip_map_cached_get(struct svc_rqst *rqstp)
 {
-	struct ip_map *ipm = rqstp->rq_sock->sk_info_authunix;
+	struct ip_map *ipm;
+	struct svc_sock *svsk = rqstp->rq_sock;
+	spin_lock_bh(&svsk->sk_defer_lock);
+	ipm = svsk->sk_info_authunix;
 	if (ipm != NULL) {
 		if (!cache_valid(&ipm->h)) {
 			/*
@@ -391,12 +394,14 @@ ip_map_cached_get(struct svc_rqst *rqstp
 			 * remembered, e.g. by a second mount from the
 			 * same IP address.
 			 */
-			rqstp->rq_sock->sk_info_authunix = NULL;
+			svsk->sk_info_authunix = NULL;
+			spin_unlock_bh(&svsk->sk_defer_lock);
 			cache_put(&ipm->h, &ip_map_cache);
 			return NULL;
 		}
 		cache_get(&ipm->h);
 	}
+	spin_unlock_bh(&svsk->sk_defer_lock);
 	return ipm;
 }
 
@@ -405,9 +410,15 @@ ip_map_cached_put(struct svc_rqst *rqstp
 {
 	struct svc_sock *svsk = rqstp->rq_sock;
 
-	if (svsk->sk_sock->type == SOCK_STREAM && svsk->sk_info_authunix == NULL)
-		svsk->sk_info_authunix = ipm;	/* newly cached, keep the reference */
-	else
+	spin_lock_bh(&svsk->sk_defer_lock);
+	if (svsk->sk_sock->type == SOCK_STREAM &&
+	    svsk->sk_info_authunix == NULL) {
+		/* newly cached, keep the reference */
+		svsk->sk_info_authunix = ipm;
+		ipm = NULL;
+	}
+	spin_unlock_bh(&svsk->sk_defer_lock);
+	if (ipm)
 		cache_put(&ipm->h, &ip_map_cache);
 }
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 002 of 2] knfsd: Rename sk_defer_lock to sk_lock

2007-04-16 Thread NeilBrown


Now that sk_defer_lock protects two different things, make the
name more generic.

Also don't bother with disabling _bh as the lock is only
ever taken from process context.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./include/linux/sunrpc/svcsock.h |3 ++-
 ./net/sunrpc/svcauth_unix.c  |   10 +-
 ./net/sunrpc/svcsock.c   |   13 +++--
 3 files changed, 14 insertions(+), 12 deletions(-)

diff .prev/include/linux/sunrpc/svcsock.h ./include/linux/sunrpc/svcsock.h
--- .prev/include/linux/sunrpc/svcsock.h	2007-04-17 11:42:13.0 +1000
+++ ./include/linux/sunrpc/svcsock.h	2007-04-17 11:42:26.0 +1000
@@ -37,7 +37,8 @@ struct svc_sock {
 
 	atomic_t		sk_reserved;	/* space on outq that is reserved */
 
-	spinlock_t		sk_defer_lock;	/* protects sk_deferred */
+	spinlock_t		sk_lock;	/* protects sk_deferred and
+					 * sk_info_authunix */
 	struct list_head	sk_deferred;	/* deferred requests that need to
 					 * be revisted */
 	struct mutex		sk_mutex;	/* to serialize sending data */

diff .prev/net/sunrpc/svcauth_unix.c ./net/sunrpc/svcauth_unix.c
--- .prev/net/sunrpc/svcauth_unix.c 2007-04-17 11:42:21.0 +1000
+++ ./net/sunrpc/svcauth_unix.c 2007-04-17 11:42:26.0 +1000
@@ -385,7 +385,7 @@ ip_map_cached_get(struct svc_rqst *rqstp
 {
 	struct ip_map *ipm;
 	struct svc_sock *svsk = rqstp->rq_sock;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	ipm = svsk->sk_info_authunix;
 	if (ipm != NULL) {
 		if (!cache_valid(&ipm->h)) {
@@ -395,13 +395,13 @@ ip_map_cached_get(struct svc_rqst *rqstp
 			 * same IP address.
 			 */
 			svsk->sk_info_authunix = NULL;
-			spin_unlock_bh(&svsk->sk_defer_lock);
+			spin_unlock(&svsk->sk_lock);
 			cache_put(&ipm->h, &ip_map_cache);
 			return NULL;
 		}
 		cache_get(&ipm->h);
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	return ipm;
 }
 
@@ -410,14 +410,14 @@ ip_map_cached_put(struct svc_rqst *rqstp
 {
 	struct svc_sock *svsk = rqstp->rq_sock;
 
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	if (svsk->sk_sock->type == SOCK_STREAM &&
 	    svsk->sk_info_authunix == NULL) {
 		/* newly cached, keep the reference */
 		svsk->sk_info_authunix = ipm;
 		ipm = NULL;
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	if (ipm)
 		cache_put(&ipm->h, &ip_map_cache);
 }

diff .prev/net/sunrpc/svcsock.c ./net/sunrpc/svcsock.c
--- .prev/net/sunrpc/svcsock.c  2007-04-17 11:42:13.0 +1000
+++ ./net/sunrpc/svcsock.c  2007-04-17 11:42:26.0 +1000
@@ -53,7 +53,8 @@
  * svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
  * when both need to be taken (rare), svc_serv->sv_lock is first.
  * BKL protects svc_serv->sv_nrthread.
- * svc_sock->sk_defer_lock protects the svc_sock->sk_deferred list
+ * svc_sock->sk_lock protects the svc_sock->sk_deferred list
+ * and the ->sk_info_authunix cache.
  * svc_sock->sk_flags.SK_BUSY prevents a svc_sock being enqueued multiply.
  *
  * Some flags can be set to certain values at any time
@@ -1625,7 +1626,7 @@ static struct svc_sock *svc_setup_socket
 	svsk->sk_server = serv;
 	atomic_set(&svsk->sk_inuse, 1);
 	svsk->sk_lastrecv = get_seconds();
-	spin_lock_init(&svsk->sk_defer_lock);
+	spin_lock_init(&svsk->sk_lock);
 	INIT_LIST_HEAD(&svsk->sk_deferred);
 	INIT_LIST_HEAD(&svsk->sk_ready);
 	mutex_init(&svsk->sk_mutex);
@@ -1849,9 +1850,9 @@ static void svc_revisit(struct cache_def
 	dprintk("revisit queued\n");
 	svsk = dr->svsk;
 	dr->svsk = NULL;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	list_add(&dr->handle.recent, &svsk->sk_deferred);
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
 	set_bit(SK_DEFERRED, &svsk->sk_flags);
 	svc_sock_enqueue(svsk);
 	svc_sock_put(svsk);
@@ -1917,7 +1918,7 @@ static struct svc_deferred_req *svc_defe
 
 	if (!test_bit(SK_DEFERRED, &svsk->sk_flags))
 		return NULL;
-	spin_lock_bh(&svsk->sk_defer_lock);
+	spin_lock(&svsk->sk_lock);
 	clear_bit(SK_DEFERRED, &svsk->sk_flags);
 	if (!list_empty(&svsk->sk_deferred)) {
 		dr = list_entry(svsk->sk_deferred.next,
@@ -1926,6 +1927,6 @@ static struct svc_deferred_req *svc_defe
 		list_del_init(&dr->handle.recent);
 		set_bit(SK_DEFERRED, &svsk->sk_flags);
 	}
-	spin_unlock_bh(&svsk->sk_defer_lock);
+	spin_unlock(&svsk->sk_lock);
return dr;
 }

[PATCH 000 of 2] knfsd: Close oopsable race in nfsd

2007-04-16 Thread NeilBrown

The following two patches fix a bug introduced in
   7b2b1fee30df7e2165525cd03f7d1d01a3a56794
which is therefore present in 2.6.19 and later.
The first patch is a minimal fix which is suitable for all kernels
since 2.6.19-pre1.  The second adds some consequent cleaning up
and is probably best left for 2.6.22-rc (and so is not being cc:ed
to [EMAIL PROTECTED]).

Thanks,
NeilBrown


 [PATCH 001 of 2] knfsd: Use a spinlock to protect sk_info_authunix
 [PATCH 002 of 2] knfsd: Rename sk_defer_lock to sk_lock

Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts

2007-04-16 Thread Andreas Gruenbacher

On Monday 16 April 2007 23:57, Alan Cox wrote:
> I don't believe the existing behaviour _IS_ a mistake.

So what are the arguments for this behavior making sense, other than
legacy?

For /proc/mounts, one could argue that the admin might want to see everything,
but that's not actually true even today, because /proc/mounts doesn't
show lazily unmounted stuff or mounts from other namespaces, so
"everything" is already quite relative.

Along that line of argument, I would at least expect unambiguous output,
so that one can tell which mountpoints are actually meaningful to the
requesting process. It's not only human operators looking at /proc/mounts;
applications care as well.

But after thinking about this issue quite a while, I really can't see what 
that should be good for. The current /proc/mounts interface is obviously 
broken; the chroot example should have demonstrated that. There are also 
unnecessary special cases because of that, such as having to filter out the 
rootfs entry when trying to figure out what's really mounted on /, and having 
to guess what's there and what's not in a particular context. The more 
complex mount scenarios will get, the more obviously broken the 
current /proc/mounts interface will become.

The getcwd() case is even stronger, as the "see everything" argument makes even
less sense there. I really can't see why the kernel should return fake
pathnames to processes. The process is explicitly asking for the current
pathname of the working directory; it doesn't want to know what the pathname
was at some previous point in time.

> > Process can access file descriptors which are unreachable via path name
> > just fine indeed, but those fds still don't have a valid path in the
> > context of that process.
> 
> Which while problematic to your name based security is just fine to
> everything else.

Actually, no. We could live fine with leaving getcwd() and /proc/mounts as 
ambiguous / weird / broken as they are right now. All it would take would be 
to reambiguate the result of the unambiguous __d_path(), which is really 
easy. Everything that cares about real pathnames would use the unambiguous 
version while the legacy interfaces would use the ambiguous version. But that 
really wouldn't make sense.

> Ok, providing the "real" root sees them all it isn't so bad, but to
> assume you can filter based upon what the task can see is dodgy as an
> assumption.

Why?

Thanks,
Andreas

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Peter Williams


Chris Friesen wrote:

> William Lee Irwin III wrote:
>
>> The sorts of like explicit decisions I'd like to be made for these are:
>> (1) In a mixture of tasks with varying nice numbers, a given nice number
>>     corresponds to some share of CPU bandwidth. Implementations
>>     should not have the freedom to change this arbitrarily according
>>     to some intention.


> The first question that comes to my mind is whether nice levels should
> be linear or not.


No. That squishes one end of the table too much.  It needs to be 
(approximately) piecewise linear around nice == 0.  Here's the mapping I 
use in my entitlement based schedulers:


#define NICE_TO_LP(nice) ((nice >=0) ? (20 - (nice)) : (20 + (nice) * (nice)))


It has the (good) feature that a nice == 19 task has 1/20th the 
entitlement of a nice == 0 task and a nice == -20 task has 21 times the 
entitlement of a nice == 0 task.  It's not strictly linear for negative 
nice values but is very cheap to calculate and quite easy to invert if 
necessary.
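The ratios claimed above can be checked directly. This is a small sketch that reuses the quoted mapping (with `nice` additionally parenthesized in the comparison for macro hygiene) and adds a hypothetical inverse, `lp_to_nice()`, to illustrate the "easy to invert" remark; the inverse is my own illustration, not code from the spa schedulers.

```c
#include <assert.h>

/* Mapping quoted above; only the extra parentheses around `nice` are added. */
#define NICE_TO_LP(nice) (((nice) >= 0) ? (20 - (nice)) : (20 + (nice) * (nice)))

/* Entitlement of a task at `nice` relative to a nice-0 task. */
static double relative_entitlement(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return (double)NICE_TO_LP(nice) / NICE_TO_LP(0);
}

/* Hypothetical inverse for values produced by NICE_TO_LP(-20..19). */
static int lp_to_nice(int lp)
{
	int n = 0;

	if (lp <= 20)
		return 20 - lp;                  /* linear branch: nice >= 0 */
	while ((n + 1) * (n + 1) <= lp - 20) /* integer square root of lp - 20 */
		n++;
	return -n;                           /* quadratic branch: nice < 0 */
}
```

With this mapping nice 19 gets 1/20 of a nice-0 task's entitlement (1 vs 20) and nice -20 gets 21 times as much (420 vs 20).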


> I would lean towards nonlinear as it allows a wider
> range (although of course at the expense of precision).  Maybe something
> like "each nice level gives X times the cpu of the previous"?  I think a
> value of X somewhere between 1.15 and 1.25 might be reasonable.
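To compare the geometric proposal with the piecewise mapping above, here is a small sketch; `geometric_weight()` and `dynamic_range()` are hypothetical helpers, and treating nice 0 as weight 1 is an assumption made for the comparison.

```c
#include <assert.h>

/* CPU weight of a task at `nice`, relative to nice 0, if every step
 * toward lower nice multiplies the share by x (x > 1). */
static double geometric_weight(int nice, double x)
{
	int steps = nice < 0 ? -nice : nice;
	double w = 1.0;

	assert(nice >= -20 && nice <= 19 && x > 1.0);
	for (int i = 0; i < steps; i++)
		w *= x;
	return nice <= 0 ? w : 1.0 / w;   /* positive nice gets a smaller share */
}

/* Spread between the most and least favoured levels: weight(-20) / weight(19). */
static double dynamic_range(double x)
{
	return geometric_weight(-20, x) / geometric_weight(19, x);
}
```

The spread is X^39: about 233:1 for X = 1.15 and about 6000:1 for X = 1.25, compared with 420:1 for NICE_TO_LP.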


> What about also having something that looks at latency, and how latency
> changes with niceness?


> What about specifying the timeframe over which the cpu bandwidth is
> measured?  I currently have a system where the application designers
> would like it to be totally fair over a period of 1 second.


Have you tried the spa_ebs scheduler?  The half life is no longer a run 
time configurable parameter (as making it highly adjustable results in 
less efficient code) but it could be adjusted to be approximately 
equivalent to 0.5 seconds by changing some constants in the code.


> As you can
> imagine, mainline doesn't do very well in this case.


You should look back through the plugsched patches where many of these 
ideas have been experimented with.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: Problem with ufs nextstep in 2.6.18 (debian)

2007-04-16 Thread Dale Amon

On Mon, Apr 16, 2007 at 05:04:22PM +0100, Dale Amon wrote:
> On Mon, Apr 16, 2007 at 11:32:04AM +0400, Evgeniy Dushistov wrote:
> > >The error also happens in 2.6.19, same as in 2.6.18.
> > >I extracted this from syslog: 
> > >Apr 17 00:14:15 kdev kernel: UFS-fs error (device loop0):
> > >ufs_check_page: bad entry
> > 
> > Is this happened also with this patch:
> > http://lkml.org/lkml/diff/2007/2/5/75/1
> 
> Thanks. I will try that out tonight GMT. Which kernel
> is that against? Will it work against a 2.6.19 or should
> I get a 2.6.20 and work with that?

Hmmm... looks like that patch is already applied in
2.6.20.7? I will try that.

Re: Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)

2007-04-16 Thread David Chinner

On Mon, Apr 16, 2007 at 03:34:42PM -0700, Valerie Henson wrote:
> On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote:
> > On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote:
> >
> > > IMHO chunkfs could provide a much more promising approach.
> > 
> > Agreed, that's one method of compartmentalising the problem.
> 
> Agreed, the chunkfs design is only one way to implement repair-driven
> file system design - designing your file system to make file system
> check and repair fast and easy.  I've written a paper on this idea,
> which includes some interesting projections estimating that fsck will
> take 10 times as long on the 2013 equivalent of a 2006 file system,
> due entirely to changes in disk hardware.

That's assuming that repair doesn't get any more efficient. ;)

> So if your server currently
> takes 2 hours to fsck, an equivalent server in 2013 will take about 20
> hours.  Eek!  Paper here:
> 
> http://infohost.nmt.edu/~val/review/repair.pdf
> 
> While I'm working on chunkfs, I also think that all file systems
> should strive for repair-driven design.  XFS has already made big
> strides in this area (multi-threading fsck for multi-disk file
> systems, for example) and I'm excited to see what comes next.

Two steps forward, one step back.

We found that our original approach to multithreading doesn't always
work, and doesn't work at all for single disks. Under some test cases,
it goes *much* slower due to increased seeking of the disks.

This patch from the folk at Agami:

http://oss.sgi.com/archives/xfs/2007-01/msg00135.html

used a different threading approach to speeding up the repair
process - it basically did object path walking in separate threads
to prime the block device page cache so that when the real
repair thread needed the block it came from the blockdev cache
rather than from disk.

This sped up several phases of the repair process because of
re-reads needed in the different phases. What we found interesting
about this approach is that it showed that prefetching gave as good
or better results than simple parallelisation with a rudimentary
caching system. In most cases it was superior (lower runtime) to
the existing multithreaded xfs_repair.

However, the Agami object based prefetch does not speed up phase 3
on a single disk - like strided AG parallelism it increases disk
seeks and, as we discovered, causes lots of little backwards seeks
to occur. It also performs very poorly when there is not enough
memory to cache sufficient objects in the block dev cache (whose
size cannot be controlled). It sped things up by using prefetch to
speed up (repeated) I/O, not by using intelligent caching.

However, this patch has been very instructive on how we could
further improve the threading of xfs_repair - intelligent prefetch
is better than simple parallelism (from the Agami patch), caching is
far better than rereading (from the SGI repair level caching) and
prefetching complements simple parallelism on volumes that can
take advantage of it.

We've ended up combining a threaded, two phase object walking
prefetch with spatial analysis of the inode and object layouts
and integration into a smarter internal cache. This cache is now
similar to the xfs_buf cache in the kernel and uses direct I/O
so if you have enough memory you only need to read objects from
disk once.

Spatial analysis of the metadata is used to determine the relative
density of the metadata in an area of disk before we read it. Using
a density function, we determine if we want to do lots of small I/Os
or one large I/O to read the entire region in one go and then split
it up in memory. Hence as metadata density increases, the number of
I/Os decreases and we pull enough data in to (hopefully) keep the
CPUs busy.
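A hypothetical sketch of a density-driven read planner of the kind described above. This is not xfs_repair code: block units, the `struct io` layout, and the `min_density` threshold are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct io {
	unsigned long start;   /* first block */
	unsigned long len;     /* number of blocks */
};

/* blocks[] holds the metadata blocks in one region, sorted ascending and
 * non-empty.  out[] must have room for n entries.  Returns the I/O count. */
static size_t plan_reads(const unsigned long *blocks, size_t n,
			 double min_density, struct io *out)
{
	assert(n > 0);
	unsigned long span = blocks[n - 1] - blocks[0] + 1;
	double density = (double)n / (double)span;

	if (density >= min_density) {
		/* Dense region: one large read, split up in memory afterwards. */
		out[0].start = blocks[0];
		out[0].len = span;
		return 1;
	}
	/* Sparse region: one small read per metadata block. */
	for (size_t i = 0; i < n; i++) {
		out[i].start = blocks[i];
		out[i].len = 1;
	}
	return n;
}
```

The trade-off is the one described in the mail: when density is high, reading the few unneeded blocks in between costs bandwidth but saves seeks.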

We still walk objects, but any blocks behind where we are currently
reading go into a secondary I/O queue to be issued later. Hence we
keep moving in one direction across the disk. Once the first pass is
complete, we then do the same analysis on the secondary list and run
that I/O all in a single pass across the disk.
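The one-direction sweep with a secondary queue can be sketched as follows; `sweep()` is a hypothetical helper operating on block numbers, not the actual xfs_repair prefetch code, and it ignores the batching and threading the real implementation does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int cmp_ulong(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;
	return (x > y) - (x < y);
}

/* reqs[] arrive in object-walk order.  Requests at or ahead of the current
 * position are issued immediately; those behind it are deferred and then
 * issued in a second, ascending pass.  Issue order is written to out[]. */
static void sweep(const unsigned long *reqs, size_t n, unsigned long *out)
{
	unsigned long *deferred;
	unsigned long pos = 0;
	size_t nout = 0, ndef = 0;

	if (n == 0)
		return;
	deferred = malloc(n * sizeof(*deferred));
	assert(deferred != NULL);
	for (size_t i = 0; i < n; i++) {
		if (reqs[i] >= pos) {
			out[nout++] = reqs[i];       /* keep moving forward */
			pos = reqs[i];
		} else {
			deferred[ndef++] = reqs[i];  /* behind us: secondary queue */
		}
	}
	qsort(deferred, ndef, sizeof(*deferred), cmp_ulong);
	for (size_t i = 0; i < ndef; i++)
		out[nout++] = deferred[i];       /* second single pass */
	free(deferred);
}
```

For requests 10, 50, 20, 70, 30, 90 the issue order becomes 10, 50, 70, 90 followed by 20, 30, so the head never seeks backwards within a pass.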

This is effectively a result of observing that repair is typically seek
bound and only using 2-3MB/s of the bandwidth a disk has to offer.
Where metadata density is high, we are now seeing luns max out on
bandwidth rather than being seek bound. Effectively we are hiding
latency by using more bandwidth and that is a good tradeoff to
make for a seek bound app.

The result of this is that even on single disks the reading of all
the metadata goes faster with this multithreaded prefetch model.  A
full 250GB SATA disk with a clean filesystem containing ~1.6 million
inodes is now taking less than 5 minutes to repair. A 5.5TB RAID5
volume with 30 million inodes is now taking about 4.5 minutes to
repair instead of 20 minutes. We're currently creating a
multi-hundred million inode filesystem to determine scalability to
the current bleeding edge.

One thing this makes me consider is

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Pavel Pisa

On Monday 16 April 2007 07:47, Davide Libenzi wrote:
> On Mon, 16 Apr 2007, Pavel Pisa wrote:
> > I cannot help myself to not report results with GAVL
> > tree algorithm there as an another race competitor.
> > I believe, that it is better solution for large priority
> > queues than RB-tree and even heap trees. It could be
> > disputable if the scheduler needs such scalability on
> > the other hand. The AVL heritage guarantees lower height
> > which results in shorter search times which could
> > be profitable for other uses in kernel.
> >
> > GAVL algorithm is AVL tree based, so it does not suffer from
> > "infinite" priorities granularity there as TR does. It allows
> > use for generalized case where tree is not fully balanced.
> > This allows to cut the first item withour rebalancing.
> > This leads to the degradation of the tree by one more level
> > (than non degraded AVL gives) in maximum, which is still
> > considerably better than RB-trees maximum.
> >
> > http://cmp.felk.cvut.cz/~pisa/linux/smart-queue-v-gavl.c
>
> Here are the results on my Opteron 252:
>
> Testing N=1
> gavl_cfs = 187.20 cycles/loop
> CFS = 194.16 cycles/loop
> TR  = 314.87 cycles/loop
> CFS = 194.15 cycles/loop
> gavl_cfs = 187.15 cycles/loop
>
> Testing N=2
> gavl_cfs = 268.94 cycles/loop
> CFS = 305.53 cycles/loop
> TR  = 313.78 cycles/loop
> CFS = 289.58 cycles/loop
> gavl_cfs = 266.02 cycles/loop
>
> Testing N=4
> gavl_cfs = 452.13 cycles/loop
> CFS = 518.81 cycles/loop
> TR  = 311.54 cycles/loop
> CFS = 516.23 cycles/loop
> gavl_cfs = 450.73 cycles/loop
>
> Testing N=8
> gavl_cfs = 609.29 cycles/loop
> CFS = 644.65 cycles/loop
> TR  = 308.11 cycles/loop
> CFS = 667.01 cycles/loop
> gavl_cfs = 592.89 cycles/loop
>
> Testing N=16
> gavl_cfs = 686.30 cycles/loop
> CFS = 807.41 cycles/loop
> TR  = 317.20 cycles/loop
> CFS = 810.24 cycles/loop
> gavl_cfs = 688.42 cycles/loop
>
> Testing N=32
> gavl_cfs = 756.57 cycles/loop
> CFS = 852.14 cycles/loop
> TR  = 301.22 cycles/loop
> CFS = 876.12 cycles/loop
> gavl_cfs = 758.46 cycles/loop
>
> Testing N=64
> gavl_cfs = 831.97 cycles/loop
> CFS = 997.16 cycles/loop
> TR  = 304.74 cycles/loop
> CFS = 1003.26 cycles/loop
> gavl_cfs = 832.83 cycles/loop
>
> Testing N=128
> gavl_cfs = 897.33 cycles/loop
> CFS = 1030.36 cycles/loop
> TR  = 295.65 cycles/loop
> CFS = 1035.29 cycles/loop
> gavl_cfs = 892.51 cycles/loop
>
> Testing N=256
> gavl_cfs = 963.17 cycles/loop
> CFS = 1146.04 cycles/loop
> TR  = 295.35 cycles/loop
> CFS = 1162.04 cycles/loop
> gavl_cfs = 966.31 cycles/loop
>
> Testing N=512
> gavl_cfs = 1029.82 cycles/loop
> CFS = 1218.34 cycles/loop
> TR  = 288.78 cycles/loop
> CFS = 1257.97 cycles/loop
> gavl_cfs = 1029.83 cycles/loop
>
> Testing N=1024
> gavl_cfs = 1091.76 cycles/loop
> CFS = 1318.47 cycles/loop
> TR  = 287.74 cycles/loop
> CFS = 1311.72 cycles/loop
> gavl_cfs = 1093.29 cycles/loop
>
> Testing N=2048
> gavl_cfs = 1153.03 cycles/loop
> CFS = 1398.84 cycles/loop
> TR  = 286.75 cycles/loop
> CFS = 1438.68 cycles/loop
> gavl_cfs = 1149.97 cycles/loop
>
>
> There seem to be some difference from your numbers. This is with:
>
> gcc version 4.1.2
>
> and -O2. But then and Opteron can behave quite differentyl than a Duron on
> a bench like this ;)

Thanks for testing; your numbers are more correct than those in my
first report. My numbers seemed over-optimistic even to me; in fact, I
was surprised that the difference was so large. But I had tested a bad
version of the code without the GAVL_FAFTER option set. The code pushed
to the web page was the correct one. I had not gotten around to looking
into this until now because I had a busy day preparing some Linux-based
labs at the university.

Without the GAVL_FAFTER option, the insert operation fails if an item
with the same key has already been inserted (an intended feature of the
code), and as a result not all items were inserted in the test. The
meaning of GAVL_FAFTER is find/insert after all items with the same key
value. The default behavior is to operate on unique keys in the tree and
reject duplicates.
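The two duplicate-key policies can be illustrated without an AVL tree. This sketch uses a sorted array as a stand-in for the tree's ordering semantics; `sorted_insert()` and its `insert_after` flag are hypothetical and only mimic the described behaviour of GAVL with and without GAVL_FAFTER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Index of the first element strictly greater than key (upper bound). */
static size_t upper_bound(const int *a, size_t n, int key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (a[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Insert key into sorted a[0..*n); the caller guarantees room for one more.
 * insert_after == false: reject duplicates (unique-key default).
 * insert_after == true:  place the key after all equal keys (FIFO order). */
static bool sorted_insert(int *a, size_t *n, int key, bool insert_after)
{
	size_t pos = upper_bound(a, *n, key);

	if (!insert_after && pos > 0 && a[pos - 1] == key)
		return false;
	for (size_t i = *n; i > pos; i--)
		a[i] = a[i - 1];
	a[pos] = key;
	(*n)++;
	return true;
}
```

In a benchmark that inserts many items with equal keys, the unique-key mode silently drops the duplicates, so the queue holds fewer items than intended and the timings look better than they should.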

My results are even worse for GAVL than yours. It is possible to tweak
the code and optimize it more (likely/unlikely, not keeping the last
ptr, etc.) for this particular usage. I may try this exercise, but I do
not expect the result after tuning to be so much better that it would
outweigh the redesign work. I still see some advantages of AVL, but it
has its own drawbacks: it needs a separate height field, and deletion
from the middle is slightly slower.

So excuse me for the disturbance. I was only curious how the GAVL code
would behave in comparison with the other algorithms, and I did not keep
my premature enthusiasm under lock.

Best wishes

 Pavel Pisa 


./smart-queue-v-gavl -n 4
gavl_cfs = 279.02 cycles/loop
CFS = 200.87 cycles/loop
TR  = 229.55 cycles/loop
CFS = 201.23 cycles/loop
gavl_cfs = 276.08 cycles/loop

./smart-queue-v-gavl -n 8
gavl_cfs = 310.92 cycles/loop
CFS = 288.45 cycles/loop
TR  = 192.46 cycles/loop
CFS

Re: [linux-usb-devel] [PATCH] hid: hid bus prototype 20070416

2007-04-16 Thread Li Yu

Jiri Kosina wrote:
> On Mon, 16 Apr 2007, Li Yu wrote:
>
>   
>> HID bus prototype 20070416
>> 
>
> Hi Li,
>
> thanks for taking care. Well, the patch is quite huge, do you think you 
> could split it into separate independent parts (use quilt or something 
> similar for patch management) which could be reviewed independently?
>
> As the code changes are often quite non-trivial, layering is changed, 
> lots of files are touched, etc. it would help a lot.
>
>   
OK, I will do that next.
> Notes from a quick skim through the patch:
>
> - it seems that you accidentaly deleted the newly added quirk for
>mightymouse in the bluetooth hid code?
>   

They must have been lost while I was playing with the bluetooth code.
> - what is the point behind HID_QUIRK_SKIP? Why doesn't HID_QUIRK_IGNORE
>suffice? And why is it defined in so strange way:
>
> @@ -270,6 +271,7 @@ struct hid_item {
>   #define HID_QUIRK_LOGITECH_DESCRIPTOR  0x0010
>   #define HID_QUIRK_DUPLICATE_USAGES 0x0020
>   #define HID_QUIRK_RESET_LEDS   0x0040
> +#define HID_QUIRK_SKIP 0x8000
>
>   
I am sorry for leaving out the description here. In simple words,
HID_QUIRK_IGNORE makes usbhid not register some HID devices at all,
whereas HID_QUIRK_SKIP only makes usbhid skip matching a device marked
with that quirk while still registering it, so another HID driver still
has a chance to handle it. As you can see, hid-core.c is not a pure HID
driver; it also takes on the transport role here.

I think this quirk differs enough from the others that I defined it
this way. If we should do it differently, just change it; that is OK.

Maybe we need another hid_skiplist[]?
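The difference between the two quirks, as described above, reduces to a small decision. This is a sketch only: the bit values are illustrative and are not the ones in include/linux/hid.h, `classify()` is a hypothetical helper, and giving IGNORE precedence when both bits are set is an assumption.

```c
#include <assert.h>

/* Illustrative bit values only; NOT the kernel's HID_QUIRK_* values. */
#define QUIRK_IGNORE (1u << 0)
#define QUIRK_SKIP   (1u << 1)

enum hid_outcome {
	NOT_REGISTERED,         /* device never appears on the HID bus */
	BOUND_TO_USBHID,        /* registered and claimed by the generic driver */
	LEFT_FOR_OTHER_DRIVER,  /* registered, but generic matching is skipped */
};

static enum hid_outcome classify(unsigned int quirks)
{
	if (quirks & QUIRK_IGNORE)
		return NOT_REGISTERED;
	if (quirks & QUIRK_SKIP)
		return LEFT_FOR_OTHER_DRIVER;
	return BOUND_TO_USBHID;
}
```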

> - there are bunches of some easy codingstyle issues (spaces around '=',
>etc)
>
>   
Yes, this is one of the reasons it is only for review.
> But doing really thorough review is quite hard, as the patch contains lots 
> of unrelated changes. I'll look at it a little bit more, but when you send 
> a broken-out version with separate changes, that'd be great.
>
>   
OK.
> Thanks,
>
>   


Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Peter Williams


Chris Friesen wrote:

> Peter Williams wrote:
>
>> To my mind scheduling and load balancing are orthogonal and keeping
>> them that way simplifies things.


> Scuse me if I jump in here, but doesn't the load balancer need some way
> to figure out a) when to run, and b) which tasks to pull and where to
> push them?


Yes but both of these are independent of the scheduler discipline in force.



> I suppose you could abstract this into a per-scheduler API, but to me at
> least these are the hard parts of the load balancer...


Load balancing needs to be based on the static priorities (i.e. nice or
real time priority) of the runnable tasks, not the dynamic priorities.
If the load balancer manages to keep the weighted (according to static
priority) load and distribution of priorities within the loads on the
CPUs roughly equal, and the scheduler does a good job of ensuring
fairness, interactive responsiveness etc. for the tasks within a CPU,
then the result will be good system performance within the constraints
set by the sys admin's use of real time priorities and nice.


The smpnice modifications to the load balancer were meant to give it the 
appropriate behaviour and what we need to fix now is the intra CPU 
scheduling.


Even if the load balancer isn't yet perfect, perfecting it can be done
separately from fixing the scheduler, preferably with as little
interdependency as possible.  Probably the only contribution to load
balancing that the scheduler really needs to make is the calculation of
the average weighted load on each of the CPUs (or run queues if there's
more than one CPU per runqueue).
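A sketch of what "weighted load based on static priority" can mean in practice. Assumptions: the weight is the NICE_TO_LP mapping from elsewhere in this thread (a stand-in, not smpnice's actual weights), `struct runqueue` is a hypothetical layout, and the time averaging mentioned above is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in static-priority weight (NICE_TO_LP from this thread). */
#define NICE_TO_LP(nice) (((nice) >= 0) ? (20 - (nice)) : (20 + (nice) * (nice)))

struct runqueue {
	const int *nice;       /* nice values of the runnable tasks */
	size_t nr_running;
};

/* Sum of static-priority weights; dynamic priority plays no part. */
static long weighted_load(const struct runqueue *rq)
{
	long load = 0;

	for (size_t i = 0; i < rq->nr_running; i++)
		load += NICE_TO_LP(rq->nice[i]);
	return load;
}

/* Index of the runqueue with the highest weighted load (first on ties). */
static size_t busiest(const struct runqueue *rqs, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (weighted_load(&rqs[i]) > weighted_load(&rqs[best]))
			best = i;
	return best;
}
```

With one queue holding two nice-0 tasks (load 40), one a single nice -5 task (45) and one three nice-19 tasks (3), counting tasks would pick the third queue as busiest, while the weighted load picks the second.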


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair

2007-04-16 Thread Peter Williams


Al Boldi wrote:

> Peter Williams wrote:
>
>> Al Boldi wrote:
>>
>>> Reducing the prio-level granularity may also be helpful;
>>
>> Because of some of the bit operations code makes it a bad idea to have
>> more than 160 priority levels, you're more or less limited to 60
>> priority levels for SCHED_OTHER tasks (as 100 are used for real time)
>> and you need 40 of these to pay some attention to niceness leaving you
>> about 20 priority levels to use for fiddling.  Is that enough?
>>
>> With spa_ebs (now that CPU rate caps have been removed), you have all 60
>> priorities available for fiddling with as niceness is taken care of when
>> calculating each task's entitlement.


> Ok, increasing the number of prio-levels is one thing, but I was more
> thinking of reducing the effective difference between each prio-level. For
> example, this would allow max_tpt_bonus=18, while the effective range would
> be 3, thus reducing granularity.  Would this be easily introduceable?


OK.  Now (I think) I see what you mean.  I think that you could achieve 
this effect by shortening the promotion interval which I think is still 
one of the tunables.  This effectively controls the strength of priority 
levels -- short promotion intervals weaken and long promotion intervals 
strengthen the effect of different priority levels.


Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: [CRYPTO] is it really optimized ?

2007-04-16 Thread Herbert Xu

On Mon, Apr 16, 2007 at 10:37:01AM +0200, Francis Moreau wrote:
> 
> BTW, here are figures I got with 2 different versions of the driver
> when using tcrypt module. The second being the result with the
> optimized driver (no key reloading on each block):
> 
> normal version:
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 67991 cycles (8192 bytes)
> 
> optimized version:
> test 4 (128 bit key, 8192 byte blocks): 1 operation in 51783 cycles (8192 bytes)
> 
> So the gain is 16000 cycles which seems to worth the change, isn't it ?

Sounds like it would.  It would help of course if you posted the patch :)
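Working the quoted tcrypt figures through explicitly; the constants are copied from the two test runs above and the helper names are hypothetical.

```c
#include <assert.h>

/* Figures from tcrypt test 4 (128-bit key, 8192-byte blocks) quoted above. */
#define BLOCK_BYTES      8192
#define CYCLES_NORMAL    67991
#define CYCLES_OPTIMIZED 51783

static long cycles_saved(void)
{
	return CYCLES_NORMAL - CYCLES_OPTIMIZED;
}

static double cycles_per_byte(long cycles)
{
	return (double)cycles / BLOCK_BYTES;
}

static double percent_saved(void)
{
	return 100.0 * cycles_saved() / CYCLES_NORMAL;
}
```

The saving is 16208 cycles per 8 KiB block, roughly 24%, or about 8.30 vs 6.32 cycles per byte.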

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: AppArmor FAQ

2007-04-16 Thread James Morris

On Mon, 16 Apr 2007, John Johansen wrote:

> Label-based security (exemplified by SELinux, and its predecessors in
> MLS systems) attaches security policy to the data. As the data flows
> through the system, the label sticks to the data, and so security
> policy with respect to this data stays intact. This is a good approach
> for ensuring secrecy, the kind of problem that intelligence agencies have.

Labels are also a good approach for ensuring integrity, which is one of 
the most fundamental aspects of the security model implemented by SELinux.  

Some may infer otherwise from your document.

> Pathname-based security (exemplified in AppArmor, and its predecessor
> Janus http://www.cs.berkeley.edu/~daw/janus/ and other systems like
> Systrace http://www.citi.umich.edu/u/provos/systrace/ ) attach security
> policy to the name of the data.
> 
> Controlling access to filenames is important because applications
> primarily use those names to access the files behind them, and they
> depend on getting to the right files. For example, login(1) expects
> /etc/passwd to resolve to a valid list of user accounts.

And it should, but alas may instead find otherwise due to namespace 
manipulation, object aliasing (e.g. symlinks), application error, 
configuration error, corrupted files, corrupted filesystems, misbehavior 
due to malware infection or various forms of user error.

A pathname tells you nothing reliable about the security properties of the 
object it's pointing to.  It is simply a mechanism for locating and 
referring to an object.
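To make the aliasing point concrete, here is a deliberately naive toy: one check keyed on pathnames (matched with POSIX fnmatch(3)) and one keyed on the object's identity (device and inode number). The rule table and helper names are hypothetical, and this is not how AppArmor or SELinux actually implement mediation; it only shows that two names for one object can get different answers from a name-based check but not from an object-based one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical pathname policy: deny anything matching these patterns. */
static const char *const denied_patterns[] = { "/etc/shadow", "/secret/*" };

bool path_policy_allows(const char *path)
{
    for (size_t i = 0; i < sizeof denied_patterns / sizeof denied_patterns[0]; i++)
        if (fnmatch(denied_patterns[i], path, 0) == 0)
            return false;
    return true;
}

/* Hypothetical object policy: deny one specific (device, inode) pair. */
struct object_id { dev_t dev; ino_t ino; };

bool object_policy_allows(const struct object_id *denied, const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;              /* cannot identify the object: fail closed */
    return !(st.st_dev == denied->dev && st.st_ino == denied->ino);
}
```

If some other name refers to the same inode (for example a hard link on the same filesystem), path_policy_allows() answers yes for that name while object_policy_allows() still answers no.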

> In the traditional UNIX model, files do have names but not labels, and 
> applications only operate in terms of those names.

Just to be clear (as the above conflates two distinct notions):  
applications under SELinux still use pathnames for locating and referring 
to files.

SELinux security is enforced within the kernel, and an application which 
does not have permission to access an object will simply receive an error 
using the standard Unix mechanisms already used for DAC.  For example, a 
write(2) might fail with an EACCES error code.
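From the application's side nothing new is needed: a denied operation fails like any other, returning -1 with errno set (EACCES for a permission denial), whether DAC or an LSM such as SELinux refused it. A minimal sketch; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path, returning 0 or -errno.
 * The caller cannot tell (and need not care) whether DAC or an LSM
 * produced an EACCES; both arrive through the same error path. */
int write_file(const char *path, const char *buf, size_t len)
{
    assert(buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, buf, len);
    int err = (n < 0) ? errno : 0;  /* save errno before close() */
    if (n >= 0 && (size_t)n != len)
        err = EIO;                  /* treat a short write as an I/O error */
    close(fd);
    return -err;
}

void report_write(const char *path)
{
    int rc = write_file(path, "x", 1);
    if (rc == -EACCES)
        fprintf(stderr, "%s: permission denied (DAC or MAC)\n", path);
    else if (rc < 0)
        fprintf(stderr, "%s: %s\n", path, strerror(-rc));
}
```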

The pathname used by an application to access an object has _nothing_ to 
do with the security attributes of the object.

Traditional Unix security in fact does not primarily depend on pathnames, 
but on DAC ownership and permission attributes stored in the file's inode.
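That DAC attributes live in the inode rather than in the name can be observed from userspace: give one file two names with link(2), change the mode through one name, and stat(2) both. The sketch assumes a writable /tmp; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Gives one file two names, changes the mode through one name and checks
 * that both names report the same inode, owner and permission bits. */
bool dac_follows_inode(void)
{
    char path[] = "/tmp/dacdemoXXXXXX";
    char alias[sizeof path + 8];
    struct stat a, b;
    bool same = false;

    int fd = mkstemp(path);        /* creates the file with mode 0600 */
    if (fd < 0)
        return false;
    close(fd);

    int n = snprintf(alias, sizeof alias, "%s.alias", path);
    assert(n > 0 && (size_t)n < sizeof alias);

    if (link(path, alias) == 0) {
        /* chmod via the alias; the change is visible via the original name */
        if (chmod(alias, 0640) == 0 && stat(path, &a) == 0 &&
            stat(alias, &b) == 0)
            same = a.st_ino == b.st_ino && a.st_uid == b.st_uid &&
                   (a.st_mode & 07777) == 0640 &&
                   (b.st_mode & 07777) == 0640;
        unlink(alias);
    }
    unlink(path);
    return same;
}
```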

DAC is of course a form of labeled security.

Imagine if you were re-inventing Unix and decided to implement pathname 
security for DAC instead of inode labeling.  What you would have is a more 
generalized version of apparmor, with the DAC attributes of pathnames for 
the entire filesystem stored in a text database with an in-kernel regex 
engine performing path reconstruction and pattern matching on every file 
access.  Sound like a good idea?  I hope not.

How about an analogy: think of kernel objects which are protected by 
locks.  Do you lock the path to the object or do you lock the object 
itself?

> Pathname-based security puts more emphasis on the integrity of the 
> system, making secrecy the secondary goal that follows.

This assertion is being made without any supporting evidence or rationale.  

If you're comparing pathname vs. label security, then it is clear that 
direct object labeling allows the security attributes of the system to be 
specified completely and unambiguously, whereas integrity enforced via 
pathnames alone requires several constraints to be applied to the goals of 
the policy.  So, it seems to me that the opposite of what you say is more 
correct, although it is a fairly oblique argument to start with.

More significant to note is that Type Enforcement was designed 
specifically to address integrity requirements, in response to the 
limitations of the early MLS models which were focused on confidentiality.

See:

"A Practical Alternative to Hierarchical Integrity Policies"
Boebert & Kain, Proceedings of the Eighth National Computer Security 
Conference, 1985.

"Meeting Critical Security Objectives with Security-Enhanced Linux"
http://www.nsa.gov/selinux/papers/ottawa01/index.html

Or pretty much any paper on the design of SELinux or Flask.

Integrity control is a foundational aspect of TE, Flask and SELinux.  

I've never understood why AppArmor presentations tend to so bizarrely 
suggest the opposite.

> Caveat: Both label-based security and pathname-based security can
> provide both secrecy and integrity protection, the above discussion is
> only about which model makes it easier to provide which kind of security.

I don't see how you've established anything in this regard.

> We acknowledge that not all objects on a UNIX system are paths, and we
> agree that there is value in also protecting non-path resources.
> Contrary to popular belief, AppArmor is *not* "Pathnames R Us", but
> rather "Use native abstractions to mediate stuff":  when you mediate
> something, you should use the native syntax that users normally use to
> access the

Re: PROBLEM: kernel 2.6.20.6 build failed for ppc board chestnut(ibm ppc 750GX/FX)

2007-04-16 Thread Josh Boyer

On Mon, Apr 16, 2007 at 01:13:01PM +0800, Wang, Baojun wrote:
> PROBLEM: linux kernel 2.6.20.6 build failed for ppc board chestnut(ibm ppc 
> 750GX/FX)
> 

Confirmed.  arch/ppc isn't getting much love these days.



> this brute force patch should solve the problem:

This is missing a Signed-off-by: line.

> diff -Nru /tmp/linux-2.6.20.6/arch/ppc/platforms/chestnut.c linux-2.6.20.6/arch/ppc/platforms/chestnut.c
> 
> --- /tmp/linux-2.6.20.6/arch/ppc/platforms/chestnut.c   2007-04-07 04:02:48.0 +0800
> +++ linux-2.6.20.6/arch/ppc/platforms/chestnut.c   2007-04-13 17:09:03.0 +0800
> @@ -432,7 +432,9 @@
> ptbl.name = "User FS";
> ptbl.size = CHESTNUT_32BIT_SIZE;
> 
> -   physmap_map.size = CHESTNUT_32BIT_SIZE;
> +   // physmap_map.size = CHESTNUT_32BIT_SIZE;

Just remove this completely.  It's not needed any longer.

> +   physmap_configure(CHESTNUT_32BIT_BASE, CHESTNUT_32BIT_SIZE, CONFIG_MTD_PHYSMAP_BANKWIDTH, NULL);

Technically, this call isn't needed.  The chestnut_defconfig already provides
the correct variables.

josh

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Peter Williams


Mike Galbraith wrote:
> On Sun, 2007-04-15 at 13:27 +1000, Con Kolivas wrote:
>> On Saturday 14 April 2007 06:21, Ingo Molnar wrote:
>>> [announce] [patch] Modular Scheduler Core and Completely Fair Scheduler
>>> [CFS]
>>>
>>> i'm pleased to announce the first release of the "Modular Scheduler Core
>>> and Completely Fair Scheduler [CFS]" patchset:
>>>
>>>    http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch
>>>
>>> This project is a complete rewrite of the Linux task scheduler. My goal
>>> is to address various feature requests and to fix deficiencies in the
>>> vanilla scheduler that were suggested/found in the past few years, both
>>> for desktop scheduling and for server scheduling workloads.
>> The casual observer will be completely confused by what on earth has happened 
>> here so let me try to demystify things for them.
>
> [...]
>
> Demystify what?   The casual observer need only read either your attempt
> at writing a scheduler, or my attempts at fixing the one we have, to see
> that it was high time for someone with the necessary skills to step in.

Make that "someone with the necessary clout".

> Now progress can happen, which was _not_ happening before.

This is true.

Peter
--
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread David Miller

From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 16:47:05 -0700

> Dave, according to your earlier emails, the qla2xxx driver worked
> 'fine' in driver versions before commit
> 7aef45ac92f49e76d990b51b7ecd714b9a608be1.  If that were the case, then
> you would have seen the warning messages:
> 
>   ...
>   qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet "
>   "invalid -- WWPN) defaults.\n");

I have in fact seen the message several times, and that message gives
me no reason to believe something needs to be fixed.

It should have said "PLEASE REPORT THIS to [EMAIL PROTECTED]" or
something similar to indicate the severity better.

"An invalid WWPN, what's that?" said the user. :)

How about "FC IDs may conflict and cause miscommunication!  Please
report to driver author so this can be fixed!" or similar?

Re: Linux 2.6.21-rc7

2007-04-16 Thread Chuck Ebbert

Linus Torvalds wrote:
> Since we're still waiting for resolution for some regressions that people 
> weren't able to work on last week, there's a new -rc kernel out there. 
> Hopefully we'll get them all and I can do 2.6.21-final next weekend or 
> so..
> 

The patch to k8.c didn't make it in:

cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and it is also overflowing beyond the allocation, causing
slab verification failures.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/k8.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c
===
--- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c   2007-04-05 19:36:56.0 -0700
+++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c   2007-04-13 07:51:57.0 -0700
@@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
     dev = NULL;
     i = 0;
     while ((dev = next_k8_northbridge(dev)) != NULL) {
-        k8_northbridges[i++] = dev;
-        pci_read_config_dword(dev, 0x9c, &flush_words[i]);
+        k8_northbridges[i] = dev;
+        pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
     }
     k8_northbridges[i] = NULL;
     return 0;
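The bug is a post-increment ordering slip: after `k8_northbridges[i++] = dev;` the following `&flush_words[i]` already uses the incremented index, so slot 0 is never filled and the last read lands one element past the end (assuming flush_words has one slot per northbridge, as the overflow described in the commit message implies). The standalone simulation below uses a made-up count and hypothetical helper names; only the loop shape mirrors the patch. It records which flush_words indices each version touches.

```c
#include <assert.h>
#include <stddef.h>

#define NB 3                      /* hypothetical number of northbridges */

/* Old loop: records the flush_words index used on each iteration. */
size_t buggy_indices(int touched[NB])
{
    size_t n = 0, i = 0;
    for (int dev = 0; dev < NB; dev++) {
        i++;                       /* k8_northbridges[i++] = dev; */
        assert(n < NB);
        touched[n++] = (int)i;     /* ... &flush_words[i] ... */
    }
    return n;
}

/* Patched loop: the index is consumed before it is incremented. */
size_t fixed_indices(int touched[NB])
{
    size_t n = 0, i = 0;
    for (int dev = 0; dev < NB; dev++) {
        /* k8_northbridges[i] = dev; */
        assert(n < NB);
        touched[n++] = (int)i++;   /* ... &flush_words[i++] ... */
    }
    return n;
}
```

With NB = 3 the old loop touches indices 1, 2 and 3 (3 being out of bounds), while the patched loop touches 0, 1 and 2.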





Re: Major qla2xxx regression on sparc64

2007-04-16 Thread Andrew Vasquez

On Mon, 16 Apr 2007, David Miller wrote:

> From: Andrew Vasquez <[EMAIL PROTECTED]>
> Date: Mon, 16 Apr 2007 16:28:51 -0700
> 
> > Sorry, but let's be realistic, this type of warning would have
> > *NEVER* been addressed if we kept the status quo
> 
> Wrong.  I watch the logs all the time and would have sent you a fix to
> use the Sparc firmware info as soon as I saw the kernel log message.

Dave, according to your earlier emails, the qla2xxx driver worked
'fine' in driver versions before commit
7aef45ac92f49e76d990b51b7ecd714b9a608be1.  If that were the case, then
you would have seen the warning messages:

...
qla_printk(KERN_WARNING, ha, "Falling back to functioning (yet "
"invalid -- WWPN) defaults.\n");

> Anyone who has worked with me over the last 15 years will let you know
> emphatically that this is true.
> 
> AND IN THE MEAN TIME I COULD GET WORK DONE AND MY SYSTEM WOULD BOOT!

I understand that, and I recognize your contribution; that was never in
question.

Re: intermittant petabyte usage reported with broadcom nic

2007-04-16 Thread CaT

On Mon, Apr 16, 2007 at 12:10:51PM -0700, Michael Chan wrote:
> On Sat, 2007-04-14 at 17:20 -0700, Michael Chan wrote:
> 
> > I also like Andi's idea of using change_page_attr() to isolate the
> > problem.  I'll try to send you a debug patch in the next few days to try
> > that out.  Thanks.
> 
> Here's the debug patch for x86 only that will change the statistics
> memory block to read-only.  If the kernel is corrupting it, you should
> get a page fault that will crash the system.  If you continue to see
> bogus counters, it is definitely a firmware or hardware problem.  Please
> try it and let me know.  Thanks.

Ahh. Would truly love to but the moment you said 'crash the system' I
had to bail. These boxes are in production and as such a crash would be,
shall we say, unwelcome. I might be able to finagle something but I
very-much doubt it.

Perhaps Jean-Daniel, who is also experiencing this problem and seemingly
more frequently than I, has a box that he could run your patch on. I
think we both run pretty-much the same hardware (Dell [12]950s). I've
CCed him.
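Michael's change_page_attr() approach (making the statistics block read-only so that any stray kernel write faults instead of silently corrupting it) has a direct userspace analogue in mprotect(2). The sketch below is that analogue only, not the debug patch itself: it write-protects a page holding counters and reports whether a later write trapped. The names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);      /* unwind out of the faulting write */
}

/* Returns true if writing to the write-protected statistics page faulted. */
bool stray_write_traps(void)
{
    long page = sysconf(_SC_PAGESIZE);
    uint64_t *stats = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stats == MAP_FAILED)
        return false;

    stats[0] = 42;                 /* legitimate update before protecting */
    if (mprotect(stats, (size_t)page, PROT_READ) != 0) {
        munmap(stats, (size_t)page);
        return false;
    }
    assert(stats[0] == 42);        /* reads keep working */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile bool trapped = false;
    if (sigsetjmp(fault_env, 1) == 0)
        ((volatile uint64_t *)stats)[0] = 0;   /* the "stray" write */
    else
        trapped = true;

    sigaction(SIGSEGV, &old, NULL);
    munmap(stats, (size_t)page);
    return trapped;
}
```

Unlike the kernel case, the fault here is caught and recovered from; in the kernel the same stray write would oops, which is exactly why it is unwelcome on production boxes.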

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread David Miller

From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 16:28:51 -0700

> Sorry, but let's be realistic, this type of warning would have
> *NEVER* been addressed if we kept the status quo

Wrong.  I watch the logs all the time and would have sent you a fix to
use the Sparc firmware info as soon as I saw the kernel log message.

Anyone who has worked with me over the last 15 years will let you know
emphatically that this is true.

AND IN THE MEAN TIME I COULD GET WORK DONE AND MY SYSTEM WOULD BOOT!

Re: BUG: Bad page state errors during kernel make

2007-04-16 Thread Chuck Ebbert

Zach Carter wrote:
> 
> Dave Jones wrote:
>> On Sun, Apr 15, 2007 at 08:30:27PM -0700, Zach Carter wrote:
>>  > list_del corruption. prev->next should be c21a4628, but was e21a4628
>>
>> 'c' became 'e' in that last address. A single bit flipped.
>> Given you've had this for some time, this smells like a hardware problem.
>> memtest86+ will probably show up something.
> 
> Hum.   I forgot to mention in my report that I had already run thru 10
> clean passes with memtest86+
> 
> Do you think there might be other bad hw, or another explanation?

memtest86 does not really stress everything a real kernel compile
would.
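Dave's single-bit diagnosis can be checked mechanically: XOR the expected and observed pointers and count the set bits. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits that differ between two 32-bit values. */
unsigned bits_differing(uint32_t expected, uint32_t actual)
{
    uint32_t diff = expected ^ actual;
    unsigned count = 0;
    while (diff) {
        diff &= diff - 1;          /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Index of the single differing bit, or -1 if zero or several bits differ. */
int flipped_bit(uint32_t expected, uint32_t actual)
{
    uint32_t diff = expected ^ actual;
    if (bits_differing(expected, actual) != 1)
        return -1;
    int bit = 0;
    while (!(diff & 1u)) {
        diff >>= 1;
        bit++;
    }
    return bit;
}
```

For the two addresses in the list_del report, 0xc21a4628 ^ 0xe21a4628 is 0x20000000, so exactly one bit (bit 29) differs.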


Re: Major qla2xxx regression on sparc64

2007-04-16 Thread Andrew Vasquez

On Mon, 16 Apr 2007, David Miller wrote:

> From: Andrew Vasquez <[EMAIL PROTECTED]>
> Date: Mon, 16 Apr 2007 15:25:17 -0700
> 
> > Fine, I'll agree that whacking users (and
> > I'll wager the outliers) with a 2x4 was a bit extreme,
> 
> And that, right there, is basically the end of the conversation.
> 
> You don't do this to users, ever.
> Put a big loud kernel log message in there when this situation
> presents itself, use as many capital letters and scary language that
> you wish.  Let them know that if things explode they get to keep the
> pieces.
> 
> But at least try to give them something that works when you know that
> you can.
>
> You don't need to make someone's system unbootable in order to make
> them aware of a potential problem.  It's very anti-social to approach

Sorry, but let's be realistic, this type of warning would have *NEVER*
been addressed if we kept the status quo -- your modifications to read
the wwpn/wwnn would have never been submitted, everybody would have
kept going on blissfully ignorant of the issue.  Changes such as these
are a common Linux upstream idiom...

So, meeting in the middle, with the NVRAM bits restored along with
some ability for the user to *knowingly* recognize the problem, I take
it, is not going to work for you?
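The compromise under discussion (use the NVRAM name when it is sane, otherwise fall back, warn loudly, and keep booting) can be sketched as a source-selection function. Everything here is hypothetical: the names, the order of preference (NVRAM, then platform firmware such as the Sparc properties Dave mentions, then a built-in default), and the assumption that an all-zero WWPN means "not programmed". It is not qla2xxx code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum wwpn_source { WWPN_NVRAM, WWPN_FIRMWARE, WWPN_DEFAULT };

/* Assumption for this sketch: an all-zero name means "not programmed". */
static bool wwpn_valid(const uint8_t wwpn[8])
{
    static const uint8_t zero[8];
    return wwpn != NULL && memcmp(wwpn, zero, 8) != 0;
}

enum wwpn_source pick_wwpn(const uint8_t nvram[8], const uint8_t firmware[8],
                           const uint8_t fallback[8], uint8_t out[8])
{
    assert(out != NULL && fallback != NULL);
    if (wwpn_valid(nvram)) {
        memcpy(out, nvram, 8);
        return WWPN_NVRAM;
    }
    if (wwpn_valid(firmware)) {
        memcpy(out, firmware, 8);
        return WWPN_FIRMWARE;
    }
    /* Keep the system usable, but make the problem impossible to miss. */
    fprintf(stderr, "WARNING: invalid WWPN in NVRAM; using default. "
                    "FC IDs may conflict. Please report this.\n");
    memcpy(out, fallback, 8);
    return WWPN_DEFAULT;
}
```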

Re: If not readdir() then what?

2007-04-16 Thread Neil Brown

On Monday April 16, [EMAIL PROTECTED] wrote:
> 
> The challenge is making it be stable across inserts/deletes, never
> mind reboots.  And it's not a "little bit of cacheing"; in order to be
> correct we would have to cache *forever*, since at least in theory an
> NFS client could hold on to a cookie for an arbitrarily long period of
> time (weeks, months, years, decades), yes?

Yes.  But I think we've already established that the on-disk structure
chosen by ext3/htree is not able to perfectly support NFS (which is a
pity given that it was written for Linux and Linux is thought to
support NFS).  Our goal is to find the best mapping possible and,
where caching can improve stability for real-world uses, use caching
to help stabilise that mapping.

> 
> You're welcome to try, but I suspect it won't take long before you'll
> see why I'm asserting that a directory fd cache in nfsd is *way* less
> work.  :-)

You have provided some very helpful insights into how ext3/htree
currently works - thanks for that.
I will definitely take a closer look at the code and see how
feasible it is to realise my ideas.  I'll let you know how I go.

NeilBrown

Staircase cpu scheduler v17.1

2007-04-16 Thread Con Kolivas

Greetings all

Here is the current release of the Staircase cpu scheduler (the original 
generation I design that spurred development elsewhere for RSDL), for 
2.6.21-rc7

http://ck.kolivas.org/patches/pre-releases/2.6.21-rc7/2.6.21-rc7-ck1/patches/sched-staircase-17.1.patch

To remind people where this cpu scheduler fits into the picture:

-It is purpose built with interactivity first and foremost.
-It aims to be mostly fair most of the time
-It has strong semantics describing the cpu relationship between different 
nice levels (nice 19 is 1/20th the cpu of nice 0).
-It is resistant to most forms of starvation
-Latency of tasks that are not heavily cpu bound is exceptionally low 
irrespective of nice level, as long as they stay within their cpu bounds. What 
this means is that you can run an audio application that uses very little cpu 
at nice 19, and it will still be unlikely to skip audio in the presence of a 
kernel compile at nice -20.
-Therefore you can renice X or whatever to your heart's content, but then... 
you don't need to renice X with this design.
-The design is a single priority array with very low overhead and a small 
codebase (the diffstat summary, obviously muddied by the removal of more 
comments, is 4 files changed, 418 insertions(+), 714 deletions(-)).
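The stated ratio (nice 19 gets 1/20th of the cpu of nice 0) is consistent with a linear weight of 20 - nice for non-negative nice levels. That weighting is an assumption made for illustration, not taken from the Staircase code; under it, the share of each of two competing cpu-bound tasks works out as follows.

```c
#include <assert.h>

/* Assumed linear weight for nice 0..19: nice 0 -> 20, nice 19 -> 1. */
int assumed_weight(int nice)
{
    assert(nice >= 0 && nice <= 19);
    return 20 - nice;
}

/* Fraction of the cpu task A receives when competing only with task B. */
double share_of_a(int nice_a, int nice_b)
{
    int wa = assumed_weight(nice_a), wb = assumed_weight(nice_b);
    return (double)wa / (double)(wa + wb);
}
```

Under this assumption, share_of_a(19, 0) is 1/21 (about 4.8%) and the nice 0 task gets 20/21, a 1:20 ratio between the two.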

Disadvantages:
-There are heuristics
-There are some rare cpu usage patterns that can lead to excessive unfairness 
and relative starvation.

Bonuses:
With the addition of further patches in that same directory above it has:
- An interactive tunable flag which further increases the fairness and makes 
nice values more absolutely determine latency (instead of cpu usage vs 
entitlement determining latency as the default above)
/proc/sys/kernel/interactive 
- A compute tunable which makes timeslices much longer and has delayed 
preemption for maximum cpu cache utilisation for compute intensive workloads
/proc/sys/kernel/compute 
- A soft realtime unprivileged policy for normal users with a tunable maximum 
cpu usage set to 80% by default
/proc/sys/kernel/iso_cpu
- A background scheduling class that uses zero cpu usage resources if any 
other task wants cpu.
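The tunables listed above are ordinary files under /proc/sys, so changing one is a single write. A minimal sketch: the paths exist only on a kernel carrying the -ck patches, the helper is hypothetical, and treating 1 as "enabled" for the flag tunables is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write an integer to a sysctl-style file.
 * Returns 0 on success or -errno. */
int write_tunable(const char *path, int value)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -errno;
    int rc = 0;
    if (fprintf(f, "%d\n", value) < 0)
        rc = -EIO;
    if (fclose(f) != 0 && rc == 0)
        rc = -errno;               /* the write is flushed at fclose() */
    return rc;
}
```

For example, write_tunable("/proc/sys/kernel/compute", 1) would switch on the compute mode, and write_tunable("/proc/sys/kernel/iso_cpu", 50) would lower the soft realtime cpu cap from its default of 80%.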

This is unashamedly a relatively unfair slightly starveable cpu scheduler with 
exceptional quality _Desktop_ performance as it was always intended to be. 

It is NOT intended for mainline use as mainline needs a general purpose cpu 
scheduler (remember!). I have no intention of pushing it as such given its 
disadvantages, and don't really care about those disadvantages as I have no 
intention of trying to "plug up" the theoretical exploits and disadvantages 
either since desktops aren't really affected BUT this scheduler is great fun 
to use. Unfortunately the version of this scheduler in plugsched is not up to 
date with this code. Perhaps if demand for plugsched somehow turns the world 
on its head then this code may have a place elsewhere too.

Enjoy! If you don't like it? Doesn't matter; you have a choice so just use 
something else. This is code that will only be in -ck.

-- 
-ck

Re: so what is obsolete and removable?

2007-04-16 Thread Randy Dunlap

On Tue, 17 Apr 2007 00:39:10 +0200 Tilman Schmidt wrote:

> Am 15.04.2007 22:55 schrieb Robert P. J. Day:
> >   as i recall, the isdn4linux was *un*obsoleted, wasn't it?
> 
> Actually, it wasn't.
> 
> We *did* reach a consensus that isdn4linux is not obsolete in the
> accepted sense of the word, because there is no replacement for it
> so far.
> 
> OTOH I have since submitted (twice, in fact) a patch that would remove
> the "(obsolete)" label from the Kconfig entry, but somehow nothing
> ever became of it. My submissions just linger in LKML, uncommented and
> unmerged.

Did you submit the patch to Andrew Morton?
Is the patch in the -mm patchset?
Did Karsten ack the patch?

If the patch is in -mm and it's not critical (like this subject),
then it probably won't be merged until after 2.6.21 is released...


> To sum it up, we agree that the "(obsolete)" label is wrong, but we
> won't remove it. I have no idea how to resolve that situation.
> 
> What I do know is that it would be very wrong to remove isdn4linux,
> because it has an existing userbase with nowhere else to go.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Michael K. Edwards


On 4/16/07, Peter Williams <[EMAIL PROTECTED]> wrote:

> Note that I talk of run queues
> not CPUs as I think a shift to multiple CPUs per run queue may be a good
> idea.


This observation of Peter's is the best thing to come out of this
whole foofaraw.  Looking at what's happening in CPU-land, I think it's
going to be necessary, within a couple of years, to replace the whole
idea of "CPU scheduling" with "run queue scheduling" across a complex,
possibly dynamic mix of CPU-ish resources.  Ergo, there's not much
point in churning the mainline scheduler through a design that isn't
significantly more flexible than any of those now under discussion.
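A toy illustration of "multiple CPUs per run queue": CPUs that share a cache domain are mapped onto one queue. The topology table and all names are hypothetical, and nothing here reflects how any of the posted schedulers actually does it.

```c
#include <assert.h>

#define NR_CPUS_DEMO 8

/* Hypothetical topology: cache_domain[cpu] identifies the shared cache. */
static const int cache_domain[NR_CPUS_DEMO] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/* Build a CPU -> run queue map with one queue per distinct cache domain.
 * Returns the number of run queues created. */
int build_rq_map(int rq_of_cpu[NR_CPUS_DEMO])
{
    int domain_rq[NR_CPUS_DEMO];
    int nr_rq = 0;

    for (int d = 0; d < NR_CPUS_DEMO; d++)
        domain_rq[d] = -1;         /* no queue assigned yet */

    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
        int d = cache_domain[cpu];
        assert(d >= 0 && d < NR_CPUS_DEMO);
        if (domain_rq[d] < 0)
            domain_rq[d] = nr_rq++;
        rq_of_cpu[cpu] = domain_rq[d];
    }
    return nr_rq;
}
```

With the table above, CPUs 0 and 1 share queue 0, CPUs 2 and 3 share queue 1, and so on: four queues for eight CPUs.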

For instance, there are architectures where several "CPUs"
(instruction stream decoders feeding execution pipelines) share parts
of a cache hierarchy ("chip-level multithreading").  On these machines,
you may want to co-schedule a "real" processing task on one pipeline
with a "cache warming" task on the other pipeline -- but only for
tasks whose memory access patterns have been sufficiently analyzed to
write the "cache warming" task code.  Some other tasks may want to
idle the second pipeline so they can use the full cache-to-RAM
bandwidth.  Yet other tasks may be genuinely CPU-intensive (or I/O
bound but so context-heavy that it's not worth yielding the CPU during
quick I/Os), and hence perfectly happy to run concurrently with an
unrelated task on the other pipeline.

There are other architectures where several "hardware threads" fight
over parts of a cache hierarchy (sometimes bizarrely described as
"sharing" the cache, kind of the way most two-year-olds "share" toys).
On these machines, one instruction pipeline can't help the other
along cache-wise, but it sure can hurt.  A scheduler designed, tested,
and tuned principally on one of these architectures (hint:
"hyperthreading") will probably leave a lot of performance on the
floor on processors in the former category.

In the not-so-distant future, we're likely to see architectures with
dynamically reconfigurable interconnect between instruction issue
units and execution resources.  (This is already quite feasible on,
say, Virtex4 FX devices with multiple PPC cores, or Altera FPGAs with
as many Nios II cores as fit on the chip.)  Restoring task context may
involve not just MMU swaps and FPU instructions (with state-dependent
hidden costs) but processor reconfiguration.  Achieving "fairness"
according to any standard that a platform integrator cares about (let
alone an end user) will require a fairly detailed model of the hidden
costs associated with different sorts of task switch.

So if you are interested in schedulers for some reason other than a
paycheck, let the distros worry about 5% improvements on x86[_64].
Get hold of some different "hardware" -- say:
 - a Xilinx ML410 if you've got $3K to blow and want to explore
reconfigurable processors;
 - a SunFire T2000 if you've got $11K and want to mess with a CMT
system that's actually shipping;
 - a QEMU-simulated massively SMP x86 if you're poor but clever
enough to implement funky cross-core cache effects yourself; or
 - a cycle-accurate simulator from Gaisler or Virtio if you want a
real research project.
Then go explore some more interesting regions of parameter space and
see what the demands on mainline Linux will look like in a few years.

Cheers,
- Michael

Re: [PATCH -rc7 Re] [Trivial] Spelling at drivers/video/Kconfig

2007-04-16 Thread Antonino A. Daplas

On Tue, 2007-04-17 at 00:21 +0200, Miguel Ojeda wrote:
> "Trivial patch, against -rc6. I don't know if anyone has fixed this by now."
> 

I'll pick this up.

Tony



Re: 2.6.21-rc6 + firstfloor patches: BUG: sleeping function called from invalid context at kernel/sched,.c:3643

2007-04-16 Thread Jeremy Fitzhardinge

Andi Kleen wrote:
> Hmm, are you sure? Can you double check?  With the latest tree?
>
> I could reproduce the problem and my change fixed the problem for me.
>   

Hm.  Me too.  I just booted 2.6.21-rc7-ff-paravirt, and it seems fine.

J


Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

2007-04-16 Thread Chuck Ebbert

Brad Campbell wrote:
> Brad Campbell wrote:
>> G'day all,
>>
>> All I have is a digital photo of this oops. (It's 3.5mb). I have
>> serial console configured, but Murphy is watching me carefully and I
>> just can't seem to reproduce it while logging the console output.
>>
> 
> And as usual, after trying to capture one for 4 days, I get one 10 mins
> after I've sent the E-mail :)
> 
> I think I've just found a way to make this easier to reproduce as
> /dev/sdd was not even mounted this
> time. I just cold booted and started an md5sum -c run on a directory of
> about 180GB.
> 
> [ 2566.192665] BUG: unable to handle kernel NULL pointer dereference at virtual address 005c
> [ 2566.218242]  printing eip:
> [ 2566.226362] c0203169
> [ 2566.232906] *pde = 
> [ 2566.241274] Oops:  [#1]
> [ 2566.249637] Modules linked in:
> [ 2566.258832] CPU:0
> [ 2566.258833] EIP:0060:[]Not tainted VLI
> [ 2566.258834] EFLAGS: 00010082   (2.6.21-rc6-git5 #1)
> [ 2566.296146] EIP is at cfq_dispatch_insert+0x19/0x70
> [ 2566.310761] eax: f7a0eae0   ebx: f7a0cb28   ecx: e2f869e8   edx: 
> [ 2566.331076] esi: f79fea7c   edi: f7d04ac0   ebp:    esp: f6945de0
> [ 2566.351388] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [ 2566.368843] Process md5sum (pid: 2875, ti=f6944000 task=f68f4ad0 task.ti=f6944000)
> [ 2566.390975] Stack:  f79fea7c f7d04ac0  c02032d9 f6ae5ef0 c0133411 1000
> [ 2566.416414]0008  0004 0b582fd4 f79fea7c f7d04ac0 f79fea7c 
> [ 2566.441870]c0203519 f7a0cb28 f7a0cb28 f79e 0282 c01fb7a9  c016ea4d
> [ 2566.467326] Call Trace:
> [ 2566.475236]  [] __cfq_dispatch_requests+0x79/0x170
> [ 2566.492224]  [] do_generic_mapping_read+0x281/0x470
> [ 2566.509473]  [] cfq_dispatch_requests+0x69/0x90
> [ 2566.525681]  [] elv_next_request+0x39/0x130
> [ 2566.540850]  [] bio_endio+0x5d/0x90
> [ 2566.553942]  [] scsi_request_fn+0x45/0x280
> [ 2566.568851]  [] blk_run_queue+0x32/0x70
> [ 2566.582982]  [] scsi_next_command+0x30/0x50
> [ 2566.598154]  [] scsi_end_request+0x9b/0xc0
> [ 2566.613063]  [] scsi_io_completion+0x81/0x330
> [ 2566.628751]  [] scsi_delete_timer+0xb/0x20
> [ 2566.643661]  [] ata_scsi_qc_complete+0x65/0xd0
> [ 2566.659613]  [] sd_rw_intr+0x8b/0x220
> [ 2566.673222]  [] ata_altstatus+0x1c/0x20
> [ 2566.687352]  [] ata_hsm_move+0x14d/0x3f0
> [ 2566.701744]  [] scsi_finish_command+0x40/0x60
> [ 2566.717434]  [] scsi_softirq_done+0x6f/0xe0
> [ 2566.732604]  [] sil_interrupt+0x81/0x90
> [ 2566.746733]  [] blk_done_softirq+0x58/0x70
> [ 2566.761644]  [] __do_softirq+0x6f/0x80
> [ 2566.775516]  [] do_softirq+0x27/0x30
> [ 2566.788866]  [] do_IRQ+0x3e/0x80
> [ 2566.801177]  [] common_interrupt+0x23/0x28
> [ 2566.816090]  ===
> [ 2566.826793] Code: 3e 05 f0 ff e9 47 ff ff ff 89 f6 8d bc 27 00 00 00 00 83 ec 10 89 1c 24 89 6c 24 0c 89 74 24 04 89 7c 24 08 89 c3 89 d5 8b 40 0c <8b> 72 5c 8b 78 04 89 d0 e8 4a fa ff ff 8b 45 14 89 ea 25 01 80
> [ 2566.886586] EIP: [] cfq_dispatch_insert+0x19/0x70 SS:ESP 0068:f6945de0
> [ 2566.909179] Kernel panic - not syncing: Fatal exception in interrupt

cfq_dispatch_insert() was called with rq == 0. This one is getting really
annoying... and md is involved again (RAID0 this time.)
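The NULL rq diagnosis follows from the oops itself: the faulting bytes `8b 72 5c` decode to a 32-bit load from offset 0x5c of %edx, and with the i386 kernel's register calling convention %edx holds the second argument (rq). Loading a member at offset 0x5c through a NULL base touches address 0x5c, the reported fault address. A sketch with a made-up structure laid out so that a member sits at that offset (it is not the real struct request):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up layout: only the 0x5c offset mirrors the oops. */
struct fake_request {
    char other_fields[0x5c];
    uint32_t field_at_0x5c;        /* 4-byte member, naturally aligned at 0x5c */
};

static_assert(offsetof(struct fake_request, field_at_0x5c) == 0x5c,
              "layout must mirror the faulting offset");

/* Address a load of p->field_at_0x5c would touch, computed without
 * dereferencing anything. */
uintptr_t load_address(uintptr_t base)
{
    return base + offsetof(struct fake_request, field_at_0x5c);
}
```

load_address(0) is 0x5c, matching "NULL pointer dereference at virtual address ...5c".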


Re: so what is obsolete and removable?

2007-04-16 Thread Tilman Schmidt

Am 15.04.2007 22:55 schrieb Robert P. J. Day:
>   as i recall, the isdn4linux was *un*obsoleted, wasn't it?

Actually, it wasn't.

We *did* reach a consensus that isdn4linux is not obsolete in the
accepted sense of the word, because there is no replacement for it
so far.

OTOH I have since submitted (twice, in fact) a patch that would remove
the "(obsolete)" label from the Kconfig entry, but somehow nothing
ever came of it. My submissions just linger on LKML, uncommented and
unmerged.

To sum it up, we agree that the "(obsolete)" label is wrong, but we
won't remove it. I have no idea how to resolve that situation.

What I do know is that it would be very wrong to remove isdn4linux,
because it has an existing userbase with nowhere else to go.

-- 
Tilman Schmidt  E-Mail: [EMAIL PROTECTED]
Wehrhausweg 66  Fax: +49 228 4299019
53227 Bonn
Germany




Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)

2007-04-16 Thread Valerie Henson

On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote:
> On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote:
>
> > IMHO chunkfs could provide a much more promising approach.
> 
> Agreed, that's one method of compartmentalising the problem.

Agreed, the chunkfs design is only one way to implement repair-driven
file system design - designing your file system to make file system
check and repair fast and easy.  I've written a paper on this idea,
which includes some interesting projections estimating that fsck will
take 10 times as long on the 2013 equivalent of a 2006 file system,
due entirely to changes in disk hardware.  So if your server currently
takes 2 hours to fsck, an equivalent server in 2013 will take about 20
hours.  Eek!  Paper here:

http://infohost.nmt.edu/~val/review/repair.pdf

While I'm working on chunkfs, I also think that all file systems
should strive for repair-driven design.  XFS has already made big
strides in this area (multi-threading fsck for multi-disk file
systems, for example) and I'm excited to see what comes next.

-VAL

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread David Miller

From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 15:25:17 -0700

> Fine, I'll agree that whacking users (and
> I'll wager the outliers) with a 2x4 was a bit extreme,

And that, right there, is basically the end of the conversation.

You don't do this to users, ever.

Put a big loud kernel log message in there when this situation
presents itself, use as many capital letters and scary language that
you wish.  Let them know that if things explode they get to keep the
pieces.

But at least try to give them something that works when you know that
you can.

You don't need to make someone's system unbootable in order to make
them aware of a potential problem.  It's very anti-social to approach
things in this way.
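A minimal userspace sketch of the "warn loudly but keep working" policy described above. Everything here is hypothetical and is not qla2xxx code: `struct nvram_model`, `choose_port_name()` and the fallback WWPN value are illustrative stand-ins, assuming the driver's choice reduces to "use the NVRAM port name if it verifies, otherwise fall back".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of an adapter's NVRAM contents (not the real layout). */
struct nvram_model {
	int      checksum_ok;  /* nonzero if the NVRAM checksum verified */
	uint64_t port_name;    /* WWPN read from NVRAM, 0 if unset */
};

/* Illustrative fallback WWPN only; a real driver would pick its own. */
#define FALLBACK_PORT_NAME 0x2100000000000001ULL

/*
 * Pick a port name. With bad NVRAM, print a scary warning and fall back
 * instead of refusing to attach, so the system stays bootable.
 * Returns 1 if the fallback was used, 0 if the NVRAM value was used.
 */
static int choose_port_name(const struct nvram_model *nv, uint64_t *out)
{
	if (nv->checksum_ok && nv->port_name != 0) {
		*out = nv->port_name;
		return 0;
	}
	fprintf(stderr,
		"WARNING: NVRAM IS INVALID, USING A FALLBACK PORT NAME!\n"
		"WARNING: IF THINGS EXPLODE, YOU GET TO KEEP THE PIECES.\n");
	*out = FALLBACK_PORT_NAME;
	return 1;
}
```

The design choice is that the failure is reported, not enforced: the administrator sees the message in the kernel log, but the disks remain reachable.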

Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-04-16 Thread Bjorn Helgaas

On Monday 16 April 2007 15:14, Luca Tettamanti wrote:
> It seems that Asus exposes monitoring data using "ATK0110" (enumerated
> in DSDT); I see it both on my P5B-E motherboard and on my notebook (L3D)
> (they have different methods though). Another motherboard with the same
> device may actually call it "FOOBAR123" or "WTFISTHIS".

Yup, we have the same problem with other devices.  See the long list
of PNP IDs in 8250_pnp.c :-)

> Problem is that ACPI methods are not documented at all (how am I
> supposed to know that "G6T6" is the reading of the 12V rail?) while the
> datasheet of hw monitoring chips (w83627ehf in my case) are public (more
> or less).

Yes, I see that it's attractive to use a single w83627ehf.c driver.
For an ACPI driver, we'd have to build a list of PNP IDs, and possibly
information about which methods read which information.  That's
certainly more work.
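To make the extra work concrete, here is a rough sketch of the per-board table such an ACPI driver might need. Only "ATK0110" and the fact that "G6T6" reads the 12V rail come from this thread; the `FOOBAR123` entry, its method name, and all struct and function names are hypothetical placeholders, since real entries would have to be read out of each board's DSDT.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per (device ID, ACPI method) pair: what that method measures. */
struct acpi_sensor_map {
	const char *hid;     /* ACPI/PNP ID the board exposes */
	const char *method;  /* method name found in the DSDT */
	const char *label;   /* human-readable meaning of the reading */
};

static const struct acpi_sensor_map sensor_maps[] = {
	/* From the thread: on this Asus board, G6T6 reads the 12V rail. */
	{ "ATK0110",   "G6T6", "+12V" },
	/* Hypothetical: another vendor exposing the same data differently. */
	{ "FOOBAR123", "V12R", "+12V" },
};

/* Return the label for (hid, method), or NULL if we know nothing about it. */
static const char *sensor_label(const char *hid, const char *method)
{
	size_t i;

	for (i = 0; i < sizeof(sensor_maps) / sizeof(sensor_maps[0]); i++)
		if (strcmp(sensor_maps[i].hid, hid) == 0 &&
		    strcmp(sensor_maps[i].method, method) == 0)
			return sensor_maps[i].label;
	return NULL;
}
```

Every new board means another set of rows like these, which is the maintenance cost weighed here against avoiding the ACPI/native-driver synchronization problem.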

On the other hand, the ACPI driver would avoid the synchronization
issues that started this whole thread.  That's a pretty compelling
advantage.

> Furthermore, sensor driver exposes all the reading of the chip
> (e.g. in the DSDT I can't find the VSB or battery voltage).

Maybe Asus didn't hook up those readings on the board.  I would
guess that PC Probe doesn't expose the VSB or battery voltage either.

I'm sure you've seen these:
  http://lists.lm-sensors.org/pipermail/lm-sensors/2005-October/014050.html
  http://www.lm-sensors.org/wiki/AsusFormulaHacking

Looks like nobody took up the challenge, though :-)  It looks fun
to play with, if only I had the time and hardware.

Bjorn

Re: Major qla2xxx regression on sparc64

2007-04-16 Thread Andrew Vasquez

On Mon, 16 Apr 2007, David Miller wrote:

> From: Andrew Vasquez <[EMAIL PROTECTED]>
> Date: Mon, 16 Apr 2007 14:10:49 -0700
> 
> > Ok, how about the following patch based on the one you posted which
> > adds the codes to retrieve the WWPN/WWNN from firmware on SPARC, and
> > also adds the module-parameter override I mentioned above.
> > 
> > Perhaps the module-parameter should be set to non-zero in the case of
> > SPARC, to take care of your system configurations?
> 
> I think it should default to non-zero always, in fact the option
> is completely pointless.
> 
> The guy who hits this had a system which worked previously, and you're
> explicitly breaking it.  That's wrong.

Sorry, 'it' didn't work...  'It' *never* did.

> How can you not see that this quality of implementation decision
> you're making stinks?

You're defending a position which itself left users with a false sense
of security and comfort.  This is a *real* problem from an enterprise
perspective where FC reigns.  Fine, I'll agree that whacking users (and
I'll wager the outliers) with a 2x4 was a bit extreme, but I'd much
rather handle those users on a case-by-case basis, either by:

* If dealing with a PCI card, directing the user to QLogic support
  staff to resolve the NVRAM issues.

* If it's some on-board ISP with no NVRAM, as was your SPARC case,
  then add *proper* code to retrieve the data from some secondary
  persistent store.
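For contrast, a rough sketch of the lookup order argued for in this message, combined with the module-parameter override proposed earlier in the thread. The enum, function name and parameters are hypothetical; "secondary store" stands for something like the firmware properties used on SPARC.

```c
#include <assert.h>

/* Where the WWPN ends up coming from (hypothetical names). */
enum wwpn_source {
	WWPN_FROM_NVRAM,      /* PCI card with valid NVRAM */
	WWPN_FROM_SECONDARY,  /* on-board ISP: e.g. firmware properties on SPARC */
	WWPN_FROM_OVERRIDE,   /* user explicitly opted in via a module parameter */
	WWPN_REFUSE,          /* nothing trustworthy: do not bring the port up */
};

static enum wwpn_source pick_wwpn_source(int nvram_valid,
					 int secondary_available,
					 int override_param)
{
	if (nvram_valid)
		return WWPN_FROM_NVRAM;
	if (secondary_available)
		return WWPN_FROM_SECONDARY;
	if (override_param)
		return WWPN_FROM_OVERRIDE;
	return WWPN_REFUSE;
}
```

The disagreement in the thread is essentially about the last two branches: whether the override should be opt-in (as here) or the default.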

Re: Loud "pop" coming from hard drive on reboot

2007-04-16 Thread Chuck Ebbert

Jan Engelhardt wrote:
> On Apr 15 2007 12:53, Henrique de Moraes Holschuh wrote:
>> On Sat, 14 Apr 2007, Pavel Machek wrote:
>>> How common are notebooks that cut power to disks during reboot?
>> Assuming it also does this when running Windows, I'd report it as a grave
>> bug to the vendor and demand it to be fixed, or the machine to be exchanged
>> with another model that doesn't have this defect.
> 
> Given that it does not happen on Windows (IIRC Chuck's post),
> then just what is Windows [not] doing that Linux does?

It looks like there are two problems here:

(1) Some notebooks power off and back on when restarting.
Both Linux and other OSes handle that badly because they
assume power is not interrupted on reboot. The noise
emitted is relatively loud.

(2) Linux (alone) gives a very muted pop on shutdown. This could
be from bad interaction with the shutdown command, or some
other reason (drive not given enough time to shut down?)
The noise is not very loud, maybe the head did not have to
move very far?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -rc7 Re] [Trivial] Spelling at drivers/video/Kconfig

2007-04-16 Thread Miguel Ojeda


"Trivial patch, against -rc6. I don't know if anyone has fixed this by now."

Resend comment: Still present in -rc7.
---

drivers/video/Kconfig:
   - Spelling: "Frambuffer hardware drivers"

drivers/video/Kconfig |2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Signed-off-by: Miguel Ojeda Sandonis <[EMAIL PROTECTED]>
---
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index e4f0dd0..8372ace 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -139,7 +139,7 @@ config FB_TILEBLITTING
This is particularly important to one driver, matroxfb.  If
unsure, say N.

-comment "Frambuffer hardware drivers"
+comment "Frame buffer hardware drivers"
   depends on FB

config FB_CIRRUS

--
Miguel Ojeda
http://maxextreme.googlepages.com/index.htm

Re: [linux-usb-devel] How should an exit routine wait for release() callbacks?

2007-04-16 Thread Greg KH

Ah, just found this original thread, now Cornelia's patches make more
sense...

On Fri, Apr 13, 2007 at 11:24:58AM -0400, Alan Stern wrote:
> Tejun, it just occurred to me that you would be interested in this email 
> thread.  Just to bring you up to speed, here's the original question:
> 
> > I've got a module which registers a struct device.  (It represents a
> > virtual device, not a real one, but that doesn't matter.)

Wait, that's the issue right there.

Don't do that.

devices should be created by busses or the platform core, which owns the
release function for them.  Individual drivers should not create
devices.
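Why ownership of release() matters can be shown with a minimal userspace model (hypothetical names, not the kernel's `struct device`/kref API): release() runs only when the last reference is dropped, which can happen after whoever registered the object has finished its own teardown. If the callback lived in a module whose exit routine had already returned, that final put would call into unloaded code; keeping release() in the bus or core avoids this.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of a refcounted object with a release callback. */
struct model_device {
	int refcount;
	void (*release)(struct model_device *dev);
};

static int released;  /* set once the release callback has run */

/* In the kernel this would free the object; here it just records the call. */
static void bus_release(struct model_device *dev)
{
	(void)dev;
	released = 1;
}

static void model_get(struct model_device *dev)
{
	dev->refcount++;
}

/* Drop a reference; the last put invokes release(). */
static void model_put(struct model_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

In the usage below, "unregistering" drops the creator's reference, but another holder (for example, an open sysfs file) keeps the object alive, so release() runs later than the creator might expect.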

Hm, but then, how would you ever unload a bus, as the same issue might
be there too...

Any specific code in the kernel you can point to that has this issue
today?

thanks,

greg k-h

Re: [PATCH 5/7] ARM: OMAP: Merge board specific files from N800 tree

2007-04-16 Thread Tony Lindgren

* Tony Lindgren <[EMAIL PROTECTED]> [070409 21:34]:
> From: Kai Svahn <[EMAIL PROTECTED]>
> 
> This patch merges board specific files from N800 tree.
> Nokia has published the files at:
> 
> http://repository.maemo.org/pool/maemo3.0/free/source/
> kernel-source-rx-34_2.6.18.orig.tar.gz
> kernel-source-rx-34_2.6.18-osso29.diff.gz

Here's an updated version that fixes the compile breakage caused by my
last fix, which moved the externs to board-nokia.h.

Regards,

Tony
>From fd345ea126336a514baf808170f1231999ba2c1d Mon Sep 17 00:00:00 2001
From: Kai Svahn <[EMAIL PROTECTED]>
Date: Fri, 26 Jan 2007 12:39:48 -0800
Subject: [PATCH 5/7] ARM: OMAP: Merge board specific files from N800 tree

This patch merges board specific files from N800 tree.
Nokia has published the files at:

http://repository.maemo.org/pool/maemo3.0/free/source/
kernel-source-rx-34_2.6.18.orig.tar.gz
kernel-source-rx-34_2.6.18-osso29.diff.gz

Signed-off-by: Kai Svahn <[EMAIL PROTECTED]>
Signed-off-by: Tony Lindgren <[EMAIL PROTECTED]>

Index: linux-2.6/arch/arm/mach-omap2/Kconfig
===
--- linux-2.6.orig/arch/arm/mach-omap2/Kconfig  2007-04-16 20:50:00.0 +
+++ linux-2.6/arch/arm/mach-omap2/Kconfig   2007-04-16 20:50:00.0 +
@@ -54,4 +54,13 @@ config MACH_OMAP_APOLLON
 
 config MACH_OMAP_2430SDP
bool "OMAP 2430 SDP board"
-   depends on ARCH_OMAP2 && ARCH_OMAP24XX
\ No newline at end of file
+   depends on ARCH_OMAP2 && ARCH_OMAP24XX
+
+config MACH_NOKIA_N800
+   bool "Nokia N800"
+   depends on ARCH_OMAP24XX
+
+config MACH_OMAP2_TUSB6010
+   bool
+   depends on ARCH_OMAP2 && ARCH_OMAP2420
+   default y if MACH_NOKIA_N800
\ No newline at end of file
Index: linux-2.6/arch/arm/mach-omap2/Makefile
===
--- linux-2.6.orig/arch/arm/mach-omap2/Makefile 2007-04-16 20:50:00.0 +
+++ linux-2.6/arch/arm/mach-omap2/Makefile  2007-04-16 20:50:00.0 +
@@ -16,4 +16,8 @@ obj-$(CONFIG_MACH_OMAP_GENERIC)   += boar
 obj-$(CONFIG_MACH_OMAP_H4) += board-h4.o
 obj-$(CONFIG_MACH_OMAP_2430SDP)+= board-2430sdp.o
 obj-$(CONFIG_MACH_OMAP_APOLLON)+= board-apollon.o
+obj-$(CONFIG_MACH_NOKIA_N800)  += board-n800.o board-n800-flash.o \
+  board-n800-mmc.o board-n800-bt.o \
+  board-n800-audio.o board-n800-usb.o \
+  board-n800-dsp.o board-n800-pm.o
 
Index: linux-2.6/arch/arm/mach-omap2/board-n800-audio.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/arch/arm/mach-omap2/board-n800-audio.c 2007-04-16 20:50:00.0 +
@@ -0,0 +1,366 @@
+/*
+ * linux/arch/arm/mach-omap2/board-n800-audio.c
+ *
+ * Copyright (C) 2006 Nokia Corporation
+ * Contact: Juha Yrjola
+ *  Jarkko Nikula <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "../plat-omap/dsp/dsp_common.h"
+
+#if defined(CONFIG_SPI_TSC2301_AUDIO) && defined(CONFIG_SND_OMAP24XX_EAC)
+#define AUDIO_ENABLED
+
+static struct clk *sys_clkout2;
+static struct clk *func96m_clk;
+static struct device *eac_device;
+static struct device *tsc2301_device;
+
+static int enable_audio;
+static int audio_ok;
+static spinlock_t audio_lock;
+
+/*
+ * Leaving EAC and sys_clkout2 pins multiplexed to those subsystems results
+ * in about 2 mA extra current leak when audios are powered down. The
+ * workaround is to multiplex them to protected mode (with pull-ups enabled)
+ * whenever audio is not being used.
+ */
+static int eac_mux_disabled = 0;
+static int clkout2_mux_disabled = 0;
+static u32 saved_mux[2];
+
+static void n800_enable_eac_mux(void)
+{
+   if (!eac_mux_disabled)
+   return;
+   __raw_writel(saved_mux[1], IO_ADDRESS(0x48000124));
+   eac_mux_disabled = 0;
+}
+
+static void n800_disable_eac_mux(void)
+{
+   if (eac_mux_disabled) {
+   WARN_ON(eac_mux_disabled);
+   return;
+   }
+   saved_mux[1] = __raw_readl(IO_ADDRESS(0x48000124));
+   __raw_writel(0x1f1f1f1f,

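The pin-mux workaround described in the comment in board-n800-audio.c above (park the EAC pins in safe mode while audio is idle, restore them afterwards) can be modelled in userspace as follows. This is a sketch under stated assumptions: a plain variable stands in for the register at IO_ADDRESS(0x48000124), its initial value is hypothetical, and the destination of the 0x1f1f1f1f write is truncated in the archive, so it is assumed here to be the same register.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the pad-configuration register; initial value is made up. */
static uint32_t fake_pad_reg = 0x01010101;

static int eac_mux_disabled;
static uint32_t saved_mux;

/* Park the pins in safe mode (pull-ups on), remembering the old setting. */
static void disable_eac_mux(void)
{
	if (eac_mux_disabled)
		return;                 /* the patch also WARN_ON()s here */
	saved_mux = fake_pad_reg;   /* keep the audio mux value for later */
	fake_pad_reg = 0x1f1f1f1f;  /* safe-mode value written by the patch */
	eac_mux_disabled = 1;
}

/* Restore the saved audio mux setting, if the pins were parked. */
static void enable_eac_mux(void)
{
	if (!eac_mux_disabled)
		return;
	fake_pad_reg = saved_mux;
	eac_mux_disabled = 0;
}
```

The guard flag matters: without it, a second disable would overwrite the saved value with the safe-mode value, and a later enable would never restore the audio mux.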
Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

2007-04-16 Thread Zachary Amsden


David Rientjes wrote:
> Sure, but what I really like about the patch is that we're only flushing 
> something if !flush_end in the first place.  So we can eliminate any TLB 
> flushing if that VMA didn't need it; that's a change from the current 
> behavior.  And since the most obvious use-case for /proc/pid/clear_refs is 
> in conjunction with /proc/pid/smaps for approximating memory footprint, 
> we'll end up saving TLB flushes because the granularity with which that 
> measurement is taken is usually very fine.
>
> Acked-by: David Rientjes <[EMAIL PROTECTED]>

I like the patch even better if you still batch the flushes, but keep 
the !flush_end machinery.  If I read it correctly, flush_start stays at 
the lower bound for the whole function, so it is still accurate later.  
And with the flush outside the spinlock, contention time is lower.
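A rough userspace model of the batching scheme discussed here: clear the accessed bits under the lock while recording a [flush_start, flush_end) range, then issue a single range flush after dropping the lock, or none at all if nothing changed. The array of accessed bits, the fake lock flag and `fake_flush_tlb_range()` are hypothetical stand-ins, not the i386 page-table code or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES    8
#define PAGE_SIZE 4096UL

static bool young[NPAGES];  /* model of per-PTE accessed bits */
static int lock_held;       /* stand-in for the page-table spinlock */
static int flushes;         /* number of range flushes issued */
static unsigned long flushed_lo, flushed_hi;

static void fake_flush_tlb_range(unsigned long start, unsigned long end)
{
	assert(!lock_held);  /* the flush happens after the lock is dropped */
	flushes++;
	flushed_lo = start;
	flushed_hi = end;
}

static void clear_refs_range(unsigned long base)
{
	unsigned long flush_start = 0, flush_end = 0;  /* 0: nothing to flush */
	size_t i;

	lock_held = 1;
	for (i = 0; i < NPAGES; i++) {
		unsigned long addr = base + i * PAGE_SIZE;

		if (!young[i])
			continue;
		young[i] = false;
		if (!flush_end)
			flush_start = addr;  /* lower bound, set once */
		flush_end = addr + PAGE_SIZE;
	}
	lock_held = 0;

	if (flush_end)  /* skip the flush entirely if no PTE changed */
		fake_flush_tlb_range(flush_start, flush_end);
}
```

With pages 2 and 5 marked young, one flush covers pages 2 through 5; a second pass finds nothing young and issues no flush.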


Thanks,

Zach

Re: [v4l-dvb-maintainer] [GIT PATCHES] V4L/DVB updates

2007-04-16 Thread hermann pitton

On Monday, 16.04.2007, at 12:25 -0400, Michael Krufky wrote:
> CIJOML wrote:
> > On Monday 16 April 2007 17:34, Michael Krufky wrote:
> >   
> >> Adrian Bunk wrote:
> >> 
> >>> On Sun, Apr 15, 2007 at 08:33:38PM -0400, Michael Krufky wrote:
> >>>   
>  Mauro,
> 
>  I've been out of town for the past few days... I just got home and saw
>  this:
> 
>  Mauro Carvalho Chehab wrote:
>  
> >- Fix 1/3 for bug 7819: fixed frontend hotplug issue
> >- Fix 2/3 for bug 7819: demux and dvr
> >- Fix 3/3 for bug 7819: fixed hotplugging for dvbnet
> >   
>  I don't think that this is 2.6.21 material.  These patches have not yet
>  received
>  enough testing to be sent to mainline.
> 
>  I have tested them, and they seem to work for my cxusb device, but we
>  have yet to hear test results from users of usb dvb devices that do not
>  use the dvb-usb framework.  (ttusb, flexcop-usb, cinergyT2, for example)
> 
>  The bug that these patches fix has been around throughout the entire
>  kernel history of the dvb subsystem.  The bug is not a regression -- it
>  has always been
>  there.  In my opinion, it is too late in 2.6.21 development to apply
>  this change.
>  Because these fixes are not obvious, I think we should let them get some
>  more testing, and have them queued for 2.6.22 .
>  
> >>> Unless I misunderstand anything, this should fix [1].
> >>>
> >>> And this is a bug that was reported to be present in 2.6.21-rc but not
> >>> in 2.6.20 (and it's therefore a regression, no matter whether the
> >>> underlying problem was older and only exposed by some other change).
> >>>   
> >> Not true.  The DVB subsystem has NEVER been hot-unpluggable.  I confirm
> >> that the patches SEEM to be correct, but this has not yet been verified. 
> >> None of the authors of dvb-core gave their ACK on these changesets.
> >>
> >> The DVB hotplug issue has been around since the very beginning.  I assure
> >> you, that I consider this fix to be very important, and I really would love
> >> to see it hit mainline.  However, given the situation, it is not
> >> appropriate to push these in during -rc7
> >>
> >> I have doubts on CIJOML's testing method -- there is no way he could have
> >> unplugged the device while in use, while running 2.6.20.y and not receive
> >> an OOPS.  CIJOML, please see the bottom of this email for
> >>
> >> Sure, this will prevent an OOPS on some, and hopefully all devices...  but
> >> what if it causes a regression for those untested?
> >>
> >> Why do we have a merge window, if new changesets are going to be rushed
> >> into late -rc kernels without proper testing, and without the ack of a dvb
> >> subsystem maintainer?
> >>
> >> Are we prepared to go for another -rc and 3 or 4 weeks of testing to
> >> confirm that this fix doesn't cause new regressions?  I don't think so.
> >>
> >> Markus Rechberger wrote:
> >> 
> >>> The patch has been around on the dvb mailinglist ([PATCH][RFC] DVB
> >>> Hotplug Fix, 5. April 2007),
> >>>   
> >> The patch was merged into the development repository at the same time the
> >> pull request was issued to Linus.  This has NOT been tested on a wide
> >> scale.  It should go to -mm for a while before being merged to mainline.
> >>
> >> Mauro Carvalho Chehab wrote:
> >> 
> >>> I also explicitly warned on the DVB ML that I was about to send this patch,
> >>> together with other fixes, asking the community for more tests. After
> >>> that, I received two positive answers on my mailbox from people that
> >>> tested and noticed that this really fixed the issue.
> >>>   
> >> One of those positive answers was me -  I explained that it worked for me,
> >> but we need others to test.
> >>
> >> You waited ONE DAY after sending this "warning" to the dvb mailing list?  (
> >> http://linuxtv.org/pipermail/linux-dvb/2007-April/017204.html ) I saw that
> >> email after seeing the pull request to Linus.  We don't have users testing
> >> the repositories after each commit -- you _really_ need to give some more
> >> time to allow for such testing.
> >>
> >> CIJOML wrote:
> >> 
> >>> Hi,
> >>>
> >>> I have tested these patches with:
> >>>
> >>> Freecom DVB-T dongle
> >>> Pluto2 pcmcia card
> >>> Leadtek WinFast DTV dongle 1st generation
> >>> Leadtek WinFast DTV dongle 2nd generation
> >>>
> >>> These are 4 different devices with 4 different hw and modules.
> >>> All works. Please apply.
> >>>   
> >> Well, that helps...  But it would still be nice to hear test results on a
> >> CinergyT2 or flexcop-usb.
> >>
> >> Which driver supports those Winfast dongles?  We already know for sure that
> >> the patches work correctly for any driver based on the dvb-usb framework.
> >>
> >> If you had the device open, and then disconnect it from the usb bus, no
> >> matter what kernel version you're running, you should hit the OOPS.  I
> >>

Re: [patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread S.Çağlar Onur

On Tuesday 17 April 2007, Ingo Molnar wrote:
>  - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first
>flag can be used to turn it on/off. (This might fix the Kaffeine bug
>reported by S.Çağlar Onur <)

Sorry for the delayed response, but I just found some free time. Do you still
want me to test mainline + the "parent-runs first" patch, or should I drop that
one and test v2, which can change the default behaviour?

-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!



Re: bug in tcp?

2007-04-16 Thread David Miller

From: Sebastian Kuzminsky <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 15:45:19 -0600

> I'm seeing some weird behavior in TCP.  The issue is perfectly
> reproducible using netcat and other programs.  This is what I do:

Please send your bug report again, but this time to the
[EMAIL PROTECTED] mailing list which is where the
networking developers are subscribed and deal with
networking bug reports.

Thanks.

Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching

2007-04-16 Thread John Johansen

On Mon, Apr 16, 2007 at 11:00:01PM +0100, Alan Cox wrote:
> > don't actually have to care --- if loading an invalid profile can bring 
> > down 
> > the system, then that's no worse than an arbitrary module that crashes the 
> > machine. Not sure if there will ever be user loadable profiles; at least at 
> > that point we had to care.
> 
> CAP_SYS_RAWIO is needed to do arbitrary patching/loading in the capability
> model so if you are using lesser capabilities it is a (minor) capability
> rise but not a big problem, just ugly and wanting a fix
> 
> > > > +   /*
> > > > +* Replacement needs to allocate a new aa_task_context for each
> > > > +* task confined by old_profile.  To do this the profile locks
> > > > +* are only held when the actual switch is done per task.  While
> > > > +* looping to allocate a new aa_task_context the old_task list
> > > > +* may get shorter if tasks exit/change their profile but will
> > > > +* not get longer as new task will not use old_profile detecting
> > > > +* that is stale.
> > > > +*/
> > > > +   do {
> > > > +   new_cxt = aa_alloc_task_context(GFP_KERNEL | 
> > > > __GFP_NOFAIL);
> > > 
> > > NOFAIL is usually a bad sign. It should be only used if there is no
> > > alternative.
> > 
> > At this point there is no secure alternative to allocating a task context 
> > --- 
> > except killing the task, maybe.
> 
> Can you count the number needed, preallocate them and then when you know
> for sure either succeed or fail the operation as a whole ?

No, to be accurate the count would have to be made with the profile lock
held, which would then need to be released so as not to use GFP_ATOMIC
for the allocations.

An iterative approach could be taken where we do something like:

repeat:
  lock profile
  count
  if preallocated < count
    unlock profile
    if (! allocate count - preallocated)
      Fail
    goto repeat
  do replacement
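A minimal runnable version of the iterative approach above, as a userspace model. The fake lock, `tasks_on_profile` and `preallocate_contexts()` are hypothetical; `malloc()` stands in for a sleeping `GFP_KERNEL` allocation, which is why it only happens while the lock is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Tasks confined by the old profile; may change whenever the lock is dropped. */
static int tasks_on_profile;
static int lock_held;  /* stand-in for the profile lock */

static void lock_profile(void)   { lock_held = 1; }
static void unlock_profile(void) { lock_held = 0; }

/*
 * Grow a pool of contexts until it covers every task counted under the
 * lock. Returns the pool size with the lock HELD (ready for replacement),
 * or -1 with the lock released and the pool freed on failure. If the
 * count shrank, the surplus contexts must be freed after replacement.
 */
static int preallocate_contexts(void **pool, int pool_max)
{
	int have = 0;

	for (;;) {
		int need;

		lock_profile();
		need = tasks_on_profile;
		if (have >= need)
			return have;  /* do the replacement with the lock held */
		unlock_profile();

		if (need > pool_max)
			goto fail;
		while (have < need) {
			pool[have] = malloc(64);  /* size is arbitrary for the model */
			if (!pool[have])
				goto fail;
			have++;
		}
	}
fail:
	while (have > 0)
		free(pool[--have]);
	return -1;
}
```

Because the count is rechecked each time the lock is retaken, tasks that appear while allocating simply trigger another round, and the operation as a whole either succeeds or fails cleanly.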



Re: Major qla2xxx regression on sparc64

2007-04-16 Thread David Miller

From: Andrew Vasquez <[EMAIL PROTECTED]>
Date: Mon, 16 Apr 2007 14:10:49 -0700

> Ok, how about the following patch based on the one you posted which
> adds the codes to retrieve the WWPN/WWNN from firmware on SPARC, and
> also adds the module-parameter override I mentioned above.
> 
> Perhaps the module-parameter should be set to non-zero in the case of
> SPARC, to take care of your system configurations?

I think it should default to non-zero always, in fact the option
is completely pointless.

The guy who hits this had a system which worked previously, and you're
explicitly breaking it.  That's wrong.

How can you not see that this quality of implementation decision
you're making stinks?

[patch] CFS (Completely Fair Scheduler), v2

2007-04-16 Thread Ingo Molnar


this is the second release of the CFS (Completely Fair Scheduler) 
patchset, against v2.6.21-rc7:

   http://redhat.com/~mingo/cfs-scheduler/sched-cfs-v2.patch

i'd like to thank everyone for the tremendous amount of feedback and 
testing the v1 patch got - i could hardly keep up with just reading the 
mails! Some of the things people raised i couldn't implement yet; i 
mostly concentrated on bugs, regressions and debuggability.

there's a fair amount of churn:

   15 files changed, 456 insertions(+), 241 deletions(-)

But it's an encouraging sign that no crash bug was found in v1: all 
the bugs were related to scheduling-behavior details. The code was 
tested on 3 architectures so far: i686, x86_64 and ia64. Most of the 
code size increase in -v2 is due to debugging helpers; they'll be 
removed later. (The new /proc/sched_debug file can be used to see the 
fine details of CFS scheduling.)

Changes since -v1:

 - make nice levels less starvable. (reported by Willy Tarreau)

 - fixed child-runs first. A /proc/sys/kernel/sched_child_runs_first 
   flag can be used to turn it on/off. (This might fix the Kaffeine bug
   reported by S.Çağlar Onur <)

 - changed SCHED_FAIR back to SCHED_NORMAL (suggested by Con Kolivas)

 - UP build fix. (reported by Gabriel C)

 - timer tick micro-optimization (Dmitry Adamushko)

 - preemption fix: sched_class->check_preempt_curr method to decide 
   whether to preempt after a wakeup (or at a timer tick). (Found via a
   fairness-test-utility written for CFS by Mike Galbraith)

 - start forked children with neutral statistics instead of trying to 
   inherit them from the parent: Willy Tarreau reported that this 
   results in better behavior on extreme workloads, and it also 
   simplifies the code quite nicely. Removed sched_exit() and the 
   ->task_exit() methods.

 - make nice levels independent of the sched_granularity value

 - new /proc/sched_debug file listing runqueue details and the rbtree

 - new SCH-* fields in /proc//status to see scheduling details

 - new cpu-hog feature (off by default) and a sysctl tunable to control
   it: /proc/sys/kernel/sched_max_hog_history_ns defaults to 0 (off).
   Positive values set the maximum 'memory' (in nanoseconds) that the
   scheduler keeps of CPU hogs.

 - various code cleanups

 - added more statistics temporarily: sum_exec_runtime, 
   sum_wait_runtime.

 - added -CFS-v2 to EXTRAVERSION
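The new tunables are plain integer files under /proc/sys/kernel, so they can be flipped from a test harness. A small sketch, assuming a kernel with this patch applied (the files do not exist otherwise) and root privileges for writing; `write_tunable()` and `read_tunable()` are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Write an integer to a sysctl-style file. Returns 0 on success, -1 on error. */
static int write_tunable(const char *path, long value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Read an integer back from a sysctl-style file. */
static int read_tunable(const char *path, long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

For example, `write_tunable("/proc/sys/kernel/sched_child_runs_first", 0)` would switch to parent-runs-first for a comparison run, and the same helper works for `sched_max_hog_history_ns`.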

as usual, any sort of feedback, bug reports, fixes and suggestions are 
more than welcome,

Ingo

Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

2007-04-16 Thread Zachary Amsden


Hugh Dickins wrote:

> You're right to want to defer your pte_updates, David is right to want
> to batch his TLB flushes.  It bothers me that you have a surprising case,
> and that unless you abandon your optimization, it imposes a new constraint
> on how to proceed in common code (without #ifdef'ing around).
>
> But perhaps in this case David might concede that the longer we delay
> the TLB flush, the more likely a referenced bit is to be missed - that is,
> it gets cleared from the pte, but if that page is accessed again before
> the TLB is flushed, the processor may well omit to reinstate the accessed
> bit, and our stats drift away from reality.
>
> Compromise patch below: would that be satisfactory to you, David?



Although I appreciate the heroics, you needn't do this on our account; 
the win of a couple thousand cycles is not worth the cost in complexity, 
IMHO, and the penalty on native quite potentially overshadows this.  If 
you still issue the flush inside the spinlock, as required for this 
paravirt optimization, you are taking the risk of holding the spinlock 
an extra long time while issuing a TLB shootdown - which means waiting 
for an IPI.


It might not matter that much on i386, but on big iron (or realtime) 
systems, this could have significant negative scaling effects for 
workloads where the page table was hot on some set of CPUs (say, 
remapping file pages for database access).  In time, the benefits of 
this optimization to the hypervisor will decrease, while the benefits of 
optimizing the other way for shorter spinlock time may increase, both in 
a VM and on native hardware.


So I would rather just drop the pte_update_defer down to a pte_update if 
the flush is not immediately following - as it is nice and simply 
correct without getting in the way.  I see there is more in this thread 
that I haven't read yet, so I preemptively reserve the right to issue an 
invalidation of this opinion...


Thanks,

Zach

Re: [PATCH 8/18] ARM: OMAP: Add mailbox support for IVA

2007-04-16 Thread Tony Lindgren

* Tony Lindgren <[EMAIL PROTECTED]> [070409 21:23]:
> From: Hiroshi DOYU <[EMAIL PROTECTED]>
> 
> This patch adds a generic mailbox interface for DSP and IVA
> (Image Video Accelerator). This patch itself doesn't contain
> any IVA driver.

Here's an updated version that merges in two later fixes from Hiroshi.

Regards,

Tony
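In the OMAP1 mailbox code in the patch that follows, a 32-bit `mbox_msg_t` is carried in two 16-bit registers: the low half goes to the data register and the high half to the cmd register (`omap1_mbox_fifo_read()` reassembles it as `data | cmd << 16`). A loopback model of that split, assuming a 32-bit message type and 16-bit registers; the mask on the write side is truncated in this archive copy, so 0xffff below is this sketch's assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t mbox_msg_t;  /* assumed 32-bit, as the << 16 implies */

/* Simulated 16-bit registers (the patch uses __raw_readw/__raw_writew). */
static uint16_t reg_data, reg_cmd;

static void mbox_write(mbox_msg_t msg)
{
	reg_data = (uint16_t)(msg & 0xffff);  /* low half -> data register */
	reg_cmd  = (uint16_t)(msg >> 16);     /* high half -> cmd register */
}

static mbox_msg_t mbox_read(void)
{
	return (mbox_msg_t)reg_data | ((mbox_msg_t)reg_cmd << 16);
}
```

The real hardware uses separate tx and rx FIFOs; this model loops one register pair back to itself only to show that the split and the reassembly are inverses.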
>From 7845896508123512184412464ca22505c13a728d Mon Sep 17 00:00:00 2001
From: Hiroshi DOYU <[EMAIL PROTECTED]>
Date: Thu, 7 Dec 2006 15:43:59 -0800
Subject: [PATCH 8/18] ARM: OMAP: Add mailbox support for IVA

This patch adds a generic mailbox interface for DSP and IVA
(Image Video Accelerator). This patch itself doesn't contain
any IVA driver.

Signed-off-by: Hiroshi DOYU <[EMAIL PROTECTED]>
Signed-off-by: Juha Yrjola <[EMAIL PROTECTED]>
Signed-off-by: Tony Lindgren <[EMAIL PROTECTED]>
---
 arch/arm/mach-omap1/mailbox.c   |  206 
 arch/arm/mach-omap2/mailbox.c   |  310 ++
 arch/arm/plat-omap/mailbox.c|  352 +++
 arch/arm/plat-omap/mailbox.h|  193 +++
 include/asm-arm/arch-omap/mailbox.h |   68 +++
 5 files changed, 1129 insertions(+), 0 deletions(-)

Index: linux-2.6/arch/arm/mach-omap1/mailbox.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/arch/arm/mach-omap1/mailbox.c 2007-04-16 18:19:40.0 +
@@ -0,0 +1,206 @@
+/*
+ * Mailbox reservation modules for DSP
+ *
+ * Copyright (C) 2006 Nokia Corporation
+ * Written by: Hiroshi DOYU <[EMAIL PROTECTED]>
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MAILBOX_ARM2DSP1   0x00
+#define MAILBOX_ARM2DSP1b  0x04
+#define MAILBOX_DSP2ARM1   0x08
+#define MAILBOX_DSP2ARM1b  0x0c
+#define MAILBOX_DSP2ARM2   0x10
+#define MAILBOX_DSP2ARM2b  0x14
+#define MAILBOX_ARM2DSP1_Flag  0x18
+#define MAILBOX_DSP2ARM1_Flag  0x1c
+#define MAILBOX_DSP2ARM2_Flag  0x20
+
+unsigned long mbox_base;
+
+struct omap_mbox1_fifo {
+   unsigned long cmd;
+   unsigned long data;
+   unsigned long flag;
+};
+
+struct omap_mbox1_priv {
+   struct omap_mbox1_fifo tx_fifo;
+   struct omap_mbox1_fifo rx_fifo;
+};
+
+static inline int mbox_read_reg(unsigned int reg)
+{
+   return __raw_readw(mbox_base + reg);
+}
+
+static inline void mbox_write_reg(unsigned int val, unsigned int reg)
+{
+   __raw_writew(val, mbox_base + reg);
+}
+
+/* msg */
+static inline mbox_msg_t omap1_mbox_fifo_read(struct omap_mbox *mbox)
+{
+   struct omap_mbox1_fifo *fifo =
+       &((struct omap_mbox1_priv *)mbox->priv)->rx_fifo;
+   mbox_msg_t msg;
+
+   msg = mbox_read_reg(fifo->data);
+   msg |= ((mbox_msg_t) mbox_read_reg(fifo->cmd)) << 16;
+
+   return msg;
+}
+
+static inline void
+omap1_mbox_fifo_write(struct omap_mbox *mbox, mbox_msg_t msg)
+{
+   struct omap_mbox1_fifo *fifo =
+       &((struct omap_mbox1_priv *)mbox->priv)->tx_fifo;
+
+   mbox_write_reg(msg & 0xffff, fifo->data);
+   mbox_write_reg(msg >> 16, fifo->cmd);
+}
+
+static inline int omap1_mbox_fifo_empty(struct omap_mbox *mbox)
+{
+   return 0;
+}
+
+static inline int omap1_mbox_fifo_full(struct omap_mbox *mbox)
+{
+   struct omap_mbox1_fifo *fifo =
+       &((struct omap_mbox1_priv *)mbox->priv)->rx_fifo;
+
+   return (mbox_read_reg(fifo->flag));
+}
+
+/* irq */
+static inline void
+omap1_mbox_enable_irq(struct omap_mbox *mbox, omap_mbox_type_t irq)
+{
+   if (irq == IRQ_RX)
+       enable_irq(mbox->irq);
+}
+
+static inline void
+omap1_mbox_disable_irq(struct omap_mbox *mbox, omap_mbox_type_t irq)
+{
+   if (irq == IRQ_RX)
+       disable_irq(mbox->irq);
+}
+
+static inline int
+omap1_mbox_is_irq(struct omap_mbox *mbox, omap_mbox_type_t irq)
+{
+   if (irq == IRQ_TX)
+       return 0;
+   return 1;
+}
+
+static struct omap_mbox_ops omap1_mbox_ops = {
+   .type   = OMAP_MBOX_TYPE1,
+   .fifo_read  = omap1_mbox_fifo_read,
+   .fifo_write = omap1_mbox_fifo_write,
+   .fifo_empty = omap1_mbox_fifo_empty,
+   .fifo_full  = omap1_mbox_fifo_full,
+   .enable_irq = omap1_mbox_enable_irq,
+   .disable_irq    = omap1_mbox_disable_irq,
+   .is_irq = omap1_mbox_is_irq,
+};
+
+/* FIXME: the following struct should be created automatically by the user id */
+
+/* DSP */
+static struct omap_mbox1_priv omap1_mbox_dsp_priv = {
+   .tx_fifo = {
+       .cmd    = MAILBOX_ARM2DSP1b,
+       .data   = MAILBOX_ARM2DSP1,
+       .flag   = MAILBOX_ARM2DSP1_Flag,
+   },
+
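In the OMAP1 helpers above, a 32-bit `mbox_msg_t` travels through two 16-bit registers: the low half goes to the data register and the high half to the command register, and the read side reassembles them. The same split, taken out of the register accessors, is sketched below in plain C. The `mbox_split()`/`mbox_join()` names are hypothetical, and `mbox_msg_t` is assumed to be a 32-bit unsigned type, as the `<< 16` in the patch implies.

```c
#include <stdint.h>

typedef uint32_t mbox_msg_t;    /* assumption: 32-bit, as the << 16 implies */

/* Split a message the way omap1_mbox_fifo_write() does. */
void mbox_split(mbox_msg_t msg, uint16_t *data, uint16_t *cmd)
{
	*data = (uint16_t)(msg & 0xffff);   /* low half -> data register */
	*cmd = (uint16_t)(msg >> 16);       /* high half -> command register */
}

/* Reassemble it the way omap1_mbox_fifo_read() does. */
mbox_msg_t mbox_join(uint16_t data, uint16_t cmd)
{
	return (mbox_msg_t)data | ((mbox_msg_t)cmd << 16);
}
```

For example, 0x12345678 is written as data 0x5678 and command 0x1234, and joining those two halves gives back the original message.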

Re: [AppArmor 39/41] AppArmor: Profile loading and manipulation, pathname matching

2007-04-16 Thread Alan Cox

> don't actually have to care --- if loading an invalid profile can bring down 
> the system, then that's no worse than an arbitrary module that crashes the 
> machine. Not sure if there will ever be user loadable profiles; at least at 
> that point we had to care.

CAP_SYS_RAWIO is needed to do arbitrary patching/loading in the capability
model, so if you are using lesser capabilities it is a (minor) capability
rise, but not a big problem; just ugly and wanting a fix.

> > > + /*
> > > +  * Replacement needs to allocate a new aa_task_context for each
> > > +  * task confined by old_profile.  To do this the profile locks
> > > +  * are only held when the actual switch is done per task.  While
> > > +  * looping to allocate a new aa_task_context the old_task list
> > > +  * may get shorter if tasks exit/change their profile but will
> > > +  * not get longer as new task will not use old_profile detecting
> > > +  * that is stale.
> > > +  */
> > > + do {
> > > + new_cxt = aa_alloc_task_context(GFP_KERNEL | __GFP_NOFAIL);
> > 
> > NOFAIL is usually a bad sign. It should be only used if there is no
> > alternative.
> 
> At this point there is no secure alternative to allocating a task context --- 
> except killing the task, maybe.

Can you count the number needed, preallocate them and then when you know
for sure either succeed or fail the operation as a whole ?
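Alan's suggestion (count the contexts needed, preallocate them, then either commit all of them or fail the operation as a whole) is a general all-or-nothing allocation pattern. Below is a minimal userspace sketch of it. `struct task_ctx`, `struct task` and `replace_profile()` are hypothetical stand-ins, not AppArmor code. A kernel version would also have to cope with the task list changing between counting and committing.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for aa_task_context and a confined task. */
struct task_ctx { int profile_id; };
struct task { struct task_ctx *ctx; };

/*
 * Give every task in tasks[0..n-1] a new context bound to new_profile.
 * Either all tasks are switched or none are: every allocation happens up
 * front, so the commit loop cannot fail half way through.
 * Returns 0 on success, -1 if any allocation failed.
 */
int replace_profile(struct task *tasks, size_t n, int new_profile)
{
	struct task_ctx **fresh = calloc(n ? n : 1, sizeof(*fresh));
	size_t i;

	if (!fresh)
		return -1;

	/* Phase 1: preallocate everything we will need. */
	for (i = 0; i < n; i++) {
		fresh[i] = malloc(sizeof(**fresh));
		if (!fresh[i]) {
			/* Roll back: free what we got, leave tasks untouched. */
			while (i--)
				free(fresh[i]);
			free(fresh);
			return -1;
		}
		fresh[i]->profile_id = new_profile;
	}

	/* Phase 2: commit; nothing in this loop can fail. */
	for (i = 0; i < n; i++) {
		free(tasks[i].ctx);
		tasks[i].ctx = fresh[i];
	}

	free(fresh);
	return 0;
}
```

The design point is that the only failure path runs before any task has been touched, which removes the need for `__GFP_NOFAIL` in the switching loop.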

Re: [AppArmor 31/41] Fix __d_path() for lazy unmounts and make it unambiguous; exclude unreachable mount points from /proc/mounts

2007-04-16 Thread Alan Cox

> > That is a fairly significant and sudden change to the existing
> > kernel/user interface.
> 
> Well, this is not meant for 2.6.21. I hope it is possible to change it in 
> early 2.6.22; otherwise if we can't fix mistakes from the past we are pretty 
> doomed.

I don't believe the existing behaviour _IS_ a mistake.

> > This is untrue. The process can get there (via fd passing with another
> > task)
> 
> Process can access file descriptors which are unreachable via path name just 
> fine indeed, but those fds still don't have a valid path in the context of 
> that process.

Which, while problematic for your name-based security, is just fine for
everything else.

> We are only talking about mount points unreachable by a particular process; 
> this does not mean that the mount point isn't reachable by other processes. 
> Human operators can choose the context from which they are looking 
> at /proc/mounts. If they are looking from the "real" root, they will see all 
> mounts that any process can reach (in that namespace).

Ok, providing the "real" root sees them all it isn't so bad, but to
assume you can filter based upon what the task can see is dodgy as an
assumption.

Re: [v4l-dvb-maintainer] Re: [GIT PATCHES] V4L/DVB updates

2007-04-16 Thread Trent Piepho

On Mon, 16 Apr 2007, Dmitry Torokhov wrote:
> Hi Mauro,
>
> On 4/15/07, Mauro Carvalho Chehab <[EMAIL PROTECTED]> wrote:
> >   - Fix Kernel Bugzilla #8301: spinlock fix for flexcop-pci
>
> While move of spin_lock_init before request_irq is obviously correct I
> wonder what is the reason behind changing spin_lock_irq() into
> spin_lock_irqsave() as I do not see flexcop_pci_isr being called from
> anywhere but IRQ context.
>
> BTW, is irq_lock needed at all?

There was some more discussion on the linux-dvb list
(http://www.linuxtv.org/pipermail/linux-dvb/2007-April/017024.html), and I
think we came to the conclusion that irq_lock isn't needed at all.  It does
nothing but serialize the ISR, and ISRs are automatically serialized by the
kernel.

Re: [1/2] 2.6.21-rc7: known regressions

2007-04-16 Thread Chuck Ebbert

Adrian Bunk wrote:

> This email lists some known regressions in Linus' tree compared to 2.6.20.
> 
> Subject: snd_intel8x0: divide error: 
> References : http://lkml.org/lkml/2007/3/5/252
> Submitter  : Michal Piotrowski <[EMAIL PROTECTED]>
> Status : unknown
> 

Oops is in sound/pci/intel8x0.c::snd_intel8x0_update(), part of
the interrupt handler:

Line 751:

ichdev->position += step * ichdev->fragsize1;
if (! chip->in_measurement)
        ichdev->position %= ichdev->size;

ichdev->size is 0, so the modulo above divides by zero. The interrupt
happened upon request_irq().

Does chip->in_measurement need to be reset because this is a
crashdump kernel?
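The oops follows from the quoted lines: with `ichdev->size` still 0, the `%=` is a modulo by zero, which traps as a divide error. The sketch below reproduces that arithmetic in userspace with a guard added. The struct is a trimmed, hypothetical stand-in for the driver's per-stream data, and the guard only illustrates the failure mode; it is not necessarily how the regression was fixed, and it does not answer the question about `chip->in_measurement`.

```c
/* Trimmed, hypothetical stand-in for the fields used in snd_intel8x0_update(). */
struct ichdev_sketch {
	unsigned int position;
	unsigned int size;       /* still 0 if the stream was never set up */
	unsigned int fragsize1;
};

/*
 * Advance the ring position by `step` fragments.  The wrap-around is
 * skipped while measuring (as in the original code) and also when
 * size == 0, which would otherwise be a modulo by zero.
 */
void update_position(struct ichdev_sketch *d, unsigned int step, int in_measurement)
{
	d->position += step * d->fragsize1;
	if (!in_measurement && d->size != 0)
		d->position %= d->size;
}
```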

Re: [PATCH v2] hpet: Enable hidden HPET on NVidia motherboards

2007-04-16 Thread Andi Kleen

On Tue, Apr 17, 2007 at 12:28:31AM +0300, Mikko Tiihonen wrote:
> I actually was more worried that someone might complain that the pci 
> scanning is copy & paste code from end of the same file. I did try to use 
> the generic pci functions first but because they insist on enabling 
> interrupts they cannot be used this early. And this code needs to be run 
> before the timer initialization.

Yes that's the issue. You're adding another PCI scanner copy'n'pasted
from the caller of the function you're adding it to. See the problem?

> If you want I can submit a separate patch to move the ... not nice pci 
> scanning code to pci directory under some early_pci_scan(u32 *pci_ids, 
> hook) function. The same code was already cut in 

That is what early-quirks is anyway. But the way to scan for multiple
things is not to add another recursive scan, but to just extend or
change the main loop.
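Andi's point is that early quirks should be driven from one scan of the bus rather than from a separate copy of the scanning loop per quirk. A minimal sketch of that table-driven shape follows. Everything in it is hypothetical: `read_id_fn` stands in for raw config-space reads such as the kernel's early `read_pci_config()` helpers, only bus 0 is scanned, multifunction and bridge handling are omitted, and the device ID in the demo table is a placeholder, not a real NVidia chipset ID (0x10de is NVIDIA's PCI vendor ID).

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per device that needs an early fixup, keyed on PCI IDs. */
struct early_quirk {
	uint16_t vendor;
	uint16_t device;
	void (*fixup)(int bus, int slot, int func);
};

/* Returns (device << 16) | vendor, or 0xffffffff if nothing is there. */
typedef uint32_t (*read_id_fn)(int bus, int slot, int func);

/*
 * Single scan over bus 0 that dispatches every matching quirk, instead
 * of one scanning loop per quirk.  Returns the number of fixups run.
 */
int early_quirk_scan(read_id_fn read_id, const struct early_quirk *q, size_t nq)
{
	int calls = 0;

	for (int slot = 0; slot < 32; slot++) {
		for (int func = 0; func < 8; func++) {
			uint32_t id = read_id(0, slot, func);

			if (id == 0xffffffffu)
				continue;               /* empty function */
			for (size_t i = 0; i < nq; i++) {
				if (q[i].vendor == (id & 0xffff) &&
				    q[i].device == (id >> 16)) {
					q[i].fixup(0, slot, func);
					calls++;
				}
			}
		}
	}
	return calls;
}

/* Demo: a fake bus with a single device at slot 1, function 0. */
static int demo_hits;

static uint32_t demo_read_id(int bus, int slot, int func)
{
	(void)bus;
	if (slot == 1 && func == 0)
		return (0xabcdu << 16) | 0x10deu;   /* placeholder device ID */
	return 0xffffffffu;
}

static void demo_fixup(int bus, int slot, int func)
{
	(void)bus; (void)slot; (void)func;
	demo_hits++;
}

static const struct early_quirk demo_quirks[] = {
	{ 0x10de, 0xabcd, demo_fixup },
};
```

Adding the HPET case would then mean adding a row per verified PCI ID to the table, rather than another scanner.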

> >Also, nothing should be done here without confirmation from
> >Nvidia that HPET is actually supposed to work. Sometimes hardware
> >is disabled by BIOS because it is seriously broken (there was at least
> >one other chipset that could corrupt your flash if you force enabled
> >HPET in some steppings)
> 
> I hope someone has some secret contacts at NVidia because they have not 
> been very open with their chipsets. I looked at LinuxBIOS and their NForce4 
> chipset code just had commented-out code that wrote to the 0x44 register. 
> So obviously something more is needed.

Andy, can you help, please?  There is interest in force-enabling HPET on
boards where the BIOS didn't choose to. We would need a list of PCI IDs
where this is safe to do and what bits to poke. Thanks.

-Andi

Re: [AppArmor 38/41] AppArmor: Module and LSM hooks

2007-04-16 Thread John Johansen

On Thu, Apr 12, 2007 at 11:21:01AM +0100, Alan Cox wrote:
> > +
> > +   /**
> > +* parent can ptrace child when
> > +* - parent is unconfined
> > +* - parent is in complain mode
> > +* - parent and child are confined by the same profile
> > +*/
> 
> Your profiles are name based. That means the same profile in a different
> namespace does different things. It would be a very odd case where it
> mattered but surely the parent ptrace child rule should also require that
> the parent and child are in the same namespace when using apparmor name
> based security.
> 
You are right; we should be requiring that parent and child are in the
same namespace.  This has been fixed.
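The quoted ptrace rule plus Alan's namespace requirement can be written as a single predicate. This is a sketch under assumptions: `struct confinement` is a hypothetical summary of a task's state, not AppArmor's real `aa_task_context`; namespaces are compared by pointer identity; and the namespace check is applied to the same-profile case, because that is where name-based matching is ambiguous. The fixed AppArmor code may structure this differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of a task's confinement. */
struct confinement {
	const char *profile;    /* NULL when the task is unconfined */
	bool complain;          /* profile is in complain (learning) mode */
	const void *ns;         /* identifies the task's namespace */
};

/*
 * Parent may ptrace child when the parent is unconfined, in complain
 * mode, or confined by the same profile name in the same namespace.
 */
bool may_ptrace(const struct confinement *parent, const struct confinement *child)
{
	if (parent->profile == NULL)
		return true;                    /* parent is unconfined */
	if (parent->complain)
		return true;                    /* complain mode: log, don't deny */
	return child->profile != NULL &&
	       strcmp(parent->profile, child->profile) == 0 &&
	       parent->ns == child->ns;         /* same name, same namespace */
}
```

Without the last comparison, two tasks confined by same-named profiles in different namespaces would pass the check even though the names resolve to different policy.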

> > +static int apparmor_capget(struct task_struct *task,
> > +   kernel_cap_t *effective,
> > +   kernel_cap_t *inheritable,
> > +   kernel_cap_t *permitted)
> > +{
> > +   return cap_capget(task, effective, inheritable, permitted);
> > +}
> 
> Pointless function should go away.
> 
Yes, we had a few of those; thanks for pointing it out.

> > +static int apparmor_sysctl(struct ctl_table *table, int op)
> > +{
> > +   int error = 0;
> > +
> > +   if ((op & 002) && !capable(CAP_SYS_ADMIN))
> > +   error = aa_reject_syscall(current, GFP_KERNEL,
> > + "sysctl (write)");
> > +
> > +   return error;
> 
> The usual file permission security override is DAC not ADMIN. What is the
> logic of this choice.
> 
This was a very coarse-grained check, done to restrict access to
sysctls that could potentially be used to elevate privilege.  The check
is inconsistent with AppArmor's model; we should be modelling sysctl
accesses as pathname accesses, and then we could be using standard
mediation.
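John's proposal is to treat a sysctl access as an access to the corresponding /proc/sys pathname, so ordinary path rules apply instead of a blanket CAP_SYS_ADMIN check. A rough userspace sketch of that mapping follows. It assumes hypothetical `struct sysctl_node` and `struct path_rule` types, uses POSIX fnmatch(3) as a stand-in for AppArmor's own pattern matcher, and assumes the usual rwx encoding in `op` (002 for write, as in the quoted hook; 004 for read).

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified sysctl entry: a name plus its parent directory. */
struct sysctl_node {
	const char *procname;
	const struct sysctl_node *parent;   /* NULL for a top-level entry */
};

/* Hypothetical profile rule: a glob over paths and the permissions granted. */
struct path_rule {
	const char *pattern;
	const char *perms;                  /* e.g. "r" or "rw" */
};

/* Build "/proc/sys/a/b/c" for a node.  Returns 0, or -1 if it does not fit. */
static int sysctl_path(const struct sysctl_node *n, char *buf, size_t len)
{
	const char *names[16];
	int depth = 0, w;
	size_t used;

	for (; n != NULL && depth < 16; n = n->parent)
		names[depth++] = n->procname;
	if (n != NULL)
		return -1;                      /* nested deeper than we support */

	w = snprintf(buf, len, "/proc/sys");
	if (w < 0 || (size_t)w >= len)
		return -1;
	used = (size_t)w;
	while (depth-- > 0) {               /* names[] holds the leaf first */
		w = snprintf(buf + used, len - used, "/%s", names[depth]);
		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	return 0;
}

/*
 * Mediate a sysctl access as a pathname access.  Returns 1 if some rule
 * matching the /proc/sys path grants every permission requested in op.
 */
int sysctl_allowed(const struct sysctl_node *n, int op,
		   const struct path_rule *rules, size_t nrules)
{
	char path[256];
	int want_r = (op & 004) != 0;
	int want_w = (op & 002) != 0;

	if (sysctl_path(n, path, sizeof(path)) != 0)
		return 0;
	for (size_t i = 0; i < nrules; i++) {
		if (fnmatch(rules[i].pattern, path, FNM_PATHNAME) != 0)
			continue;
		if ((!want_r || strchr(rules[i].perms, 'r') != NULL) &&
		    (!want_w || strchr(rules[i].perms, 'w') != NULL))
			return 1;
	}
	return 0;
}
```

With a rule such as `/proc/sys/kernel/*` granting `r`, reading kernel.hostname is allowed while writing it is denied, without any capability check.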

thanks for the review
john




Re: [PATCH 7/7] [RFC] APM emulation driver for class batteries

2007-04-16 Thread Russell King

On Tue, Apr 17, 2007 at 01:08:29AM +0400, Anton Vorontsov wrote:
> On Mon, Apr 16, 2007 at 09:24:21PM +0100, Russell King wrote:
> > Utterly unsafe.  What happens if some other module gets loaded which
> > does this, and then this module is unloaded followed by the other
> > module.  Result: Oops.
> 
> Right. And loading two modules which change apm_get_power_status
> is a race already. Thus, APM interface needs a mutex.
> 
> Or pda_power should be marked "bool" in Kconfig, as it is done
> in arch/arm/common/sharpsl_pm.c. Sharpsl_pm is safe only because it
> can't be a module.
> 
> Personally I'd keep things as is for now (i.e. I'd want tristate for
> PDA_POWER, not bool). Later APM API can be fixed.

Experience shows "Later" more often than not means "never", in spite
of what is said at the time the word is used...

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

Re: [Kernel-discuss] Re: [PATCH 7/7] [RFC] APM emulation driver for class batteries

2007-04-16 Thread Paul Sokolovsky

Hello Russell,

Monday, April 16, 2007, 11:24:21 PM, you wrote:

> On Fri, Apr 13, 2007 at 05:50:43PM +0400, Anton Vorontsov wrote:
>> +static void (*old_apm_get_power_status)(struct apm_power_info*);
>> +
>> +static int __init apm_battery_init(void)
>> +{
>> + printk(KERN_INFO "APM Battery Driver\n");
>> +
>> + old_apm_get_power_status = apm_get_power_status;
>> + apm_get_power_status = apm_battery_apm_get_power_status;
>> + return 0;
>> +}
>> +
>> +static void __exit apm_battery_exit(void)
>> +{
>> + apm_get_power_status = old_apm_get_power_status;
>> + return;
>> +}

> Utterly unsafe.  What happens if some other module gets loaded which
> does this, and then this module is unloaded followed by the other
> module.  Result: Oops.

That's apparently why "APM emulation" is on its way towards
deprecation, right? And why people are so detailed about the new battery
API: everyone hopes that it will replace APM.

We provide APM emulation on top of the battery API as a separate
driver precisely because of such issues with the APM API. Anyway, any
suggestions for solving this "pointer API" issue? Would at least assigning
NULL on exit be safer? (Because yes, there just shouldn't be two APM
drivers, and in the weird case that there are, it would be nice to at least
not segfault.)
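Anton suggests that the APM hook needs a mutex, and Paul asks whether clearing the pointer on exit would be safer. Clearing alone still lets a second driver's saved "old" pointer resurrect an unloaded function. A minimal userspace sketch of one alternative follows: a single registered provider guarded by a mutex, where only the registering owner can unregister and callers check for NULL. The `apm_register_power_status()`, `apm_unregister_power_status()` and `apm_query_power_status()` names and the trimmed `struct apm_power_info` are hypothetical, not the kernel's APM emulation API. A kernel version would also need module reference counting so a provider cannot be unloaded while it runs.

```c
#include <pthread.h>
#include <stddef.h>

struct apm_power_info { int battery_life; };   /* trimmed stand-in */

typedef void (*power_status_fn)(struct apm_power_info *);

/* One provider at a time; no saved "old" pointer is ever restored. */
static pthread_mutex_t apm_hook_lock = PTHREAD_MUTEX_INITIALIZER;
static power_status_fn apm_hook;

int apm_register_power_status(power_status_fn fn)
{
	int ret = -1;

	pthread_mutex_lock(&apm_hook_lock);
	if (apm_hook == NULL) {         /* refuse a second provider */
		apm_hook = fn;
		ret = 0;
	}
	pthread_mutex_unlock(&apm_hook_lock);
	return ret;
}

void apm_unregister_power_status(power_status_fn fn)
{
	pthread_mutex_lock(&apm_hook_lock);
	if (apm_hook == fn)             /* ignore stale or foreign unregisters */
		apm_hook = NULL;
	pthread_mutex_unlock(&apm_hook_lock);
}

/* Callers take the lock too, so the hook cannot change mid-call. */
int apm_query_power_status(struct apm_power_info *info)
{
	int ret = -1;

	pthread_mutex_lock(&apm_hook_lock);
	if (apm_hook != NULL) {
		apm_hook(info);
		ret = 0;
	}
	pthread_mutex_unlock(&apm_hook_lock);
	return ret;
}

/* Two demo providers standing in for two battery drivers. */
static void provider_a(struct apm_power_info *i) { i->battery_life = 50; }
static void provider_b(struct apm_power_info *i) { i->battery_life = 80; }
```

With this shape, the unload order Russell describes no longer matters: the second driver's registration fails up front, and an unregister from anyone but the current owner is a no-op.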

-- 
Best regards,
 Paul                            mailto:[EMAIL PROTECTED]


AppArmor FAQ

2007-04-16 Thread John Johansen

Here we present our direct responses to the most frequent questions raised
about AppArmor after the 2006 posting.

Use of Pathnames For Access Control
-----------------------------------

Some people in the security field believe that pathnames are an
inappropriate security mechanism.  This depends on what you are
primarily trying to protect, and the rest follows from that.

Label-based security (exemplified by SELinux, and its predecessors in
MLS systems) attaches security policy to the data. As the data flows
through the system, the label sticks to the data, and so security
policy with respect to this data stays intact. This is a good approach
for ensuring secrecy, the kind of problem that intelligence agencies have.

Pathname-based security (exemplified by AppArmor, its predecessor
Janus http://www.cs.berkeley.edu/~daw/janus/ and other systems like
Systrace http://www.citi.umich.edu/u/provos/systrace/ ) attaches security
policy to the name of the data.

Controlling access to filenames is important because applications
primarily use those names to access the files behind them, and they
depend on getting to the right files. For example, login(1) expects
/etc/passwd to resolve to a valid list of user accounts.  In the
traditional UNIX model, files do have names but not labels, and
applications only operate in terms of those names.  Pathname-based
security puts more emphasis on the integrity of the system, making
secrecy the secondary goal that follows.

Caveat: Both label-based security and pathname-based security can
provide both secrecy and integrity protection; the above discussion is
only about which model makes it easier to provide which kind of security.

We acknowledge that not all objects on a UNIX system are paths, and we
agree that there is value in also protecting non-path resources.
Contrary to popular belief, AppArmor is *not* "Pathnames R Us", but
rather "Use native abstractions to mediate stuff":  when you mediate
something, you should use the native syntax that users normally use to
access the object. This follows the UNIX philosophy of "least surprise"
so that users can understand the specification. Pathnames are the
natural notation for users to understand what file access rights are
being granted in the policy, and so AppArmor uses shell syntax for fully
qualified pathnames, including shell syntax wildcards.
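To make the shell-wildcard point concrete, here is a small sketch that checks a path against glob-style rules using POSIX fnmatch(3). It is only an approximation: AppArmor compiles its own pattern syntax (including `**`, which fnmatch does not treat specially), and the rule set below is invented for the example. Matching needs only the name, so a rule can cover a file that does not exist yet, the point the SEEdit paragraph below makes.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Invented example rules in the spirit of a profile: glob + permissions. */
struct rule {
	const char *glob;
	const char *perms;
};

static const struct rule example_profile[] = {
	{ "/etc/passwd",   "r"  },
	{ "/home/*/.plan", "rw" },
	{ "/var/log/*.log", "w" },
};

/*
 * Return 1 if some rule matches `path` and grants permission `perm`
 * ('r' or 'w').  FNM_PATHNAME keeps a '*' from matching a '/', so the
 * .plan rule covers exactly one directory level below /home.
 */
int path_allowed(const struct rule *rules, size_t n, const char *path, char perm)
{
	for (size_t i = 0; i < n; i++)
		if (fnmatch(rules[i].glob, path, FNM_PATHNAME) == 0 &&
		    strchr(rules[i].perms, perm) != NULL)
			return 1;
	return 0;
}
```

For instance, `/home/alice/.plan` is writable under these rules even before it is created, while `/home/alice/sub/.plan` is not matched at all.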

Similarly, AppArmor grants access to POSIX.1e capabilities by the name of
the capability. In future work, when AppArmor adds network access control,
the notation will resemble IPTables firewall rules. This is an important
part of what makes AppArmor usable: always using the native abstraction
for mediating access.

We also acknowledge that pathname based access control requires a way to
perform pathname matching in the kernel, and this comes at a cost higher
than comparing object labels -- which assumes that all objects in the
system already have the appropriate labels.

However, those concerned with performance should note that AppArmor
overhead is already quite low (single-digit percent slowdown). Security
is rarely performance-neutral, and AppArmor, and SELinux, are no
exception. However, that overhead is small, and can be selectively
avoided by not applying AppArmor to performance-sensitive programs.

It is also easy to overlook the fact that putting all those labels in
place is a pretty expensive operation as well, particularly on large
file systems. So by providing string matching in the kernel, AppArmor
trades some run-time performance for reduced administrative work.

It has been suggested that AppArmor's pathname-based syntax could be
compiled into SELinux policy, and this is in fact what the SEEdit
project http://seedit.sourceforge.net/ does. However, any change in
policy requires a complete re-labeling of the file system, and the
policy cannot apply to files that do not yet exist. AppArmor's in-kernel
string matching allows for policy specifying access to files that might
come to exist in the future.

Use Of d_path() For Computing Pathnames
---------------------------------------

We have been criticized for the use of d_path(), for various reasons:

 - heuristic discovery of the vfsmount of a dentry,
 - inability to reliably identify deleted files,
 - inability to detect unreachable paths,
 - ambiguity of paths for chroot processes,
 - file lookup and the access check are not atomic.

Most of these issues are fixable (and fixed in the meantime), while the
non-atomicity is not really an issue.

Because struct vfsmount was not available to LSM hooks for computing
pathnames from (dentry, vfsmount) pairs, the version of AppArmor posted
last year used heuristics for rediscovering the vfsmounts associated
with dentries -- and possibly the wrong ones.  We are now passing the
vfsmount objects through to all LSM hooks that compute pathnames, and so
this heuristic is gone, and now we always use the appropriate vfsmount.
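Why the vfsmount matters shows up in how a path is built: walking up from a dentry only reaches the root of its own filesystem, and only the vfsmount says where that filesystem is mounted. The sketch below builds a path right to left at the end of a buffer, as d_path() does, but on hypothetical, heavily simplified `sk_dentry`/`sk_vfsmount` structures with no locking, no deleted-file handling and no chroot handling. It returns NULL for a dentry that cannot be reached from its mount's root.

```c
#include <string.h>

/* Hypothetical, heavily simplified dentry: a name and a parent link. */
struct sk_dentry {
	const char *name;
	struct sk_dentry *parent;           /* points to itself at a fs root */
};

/* Hypothetical, heavily simplified vfsmount. */
struct sk_vfsmount {
	struct sk_dentry *mnt_root;         /* root dentry of this filesystem */
	struct sk_dentry *mnt_mountpoint;   /* dentry it is mounted on */
	struct sk_vfsmount *mnt_parent;     /* points to itself for the root mount */
};

/*
 * Build the path of (dentry, mnt) right to left at the end of buf.
 * At the root of a filesystem, the vfsmount tells us which mountpoint
 * to continue from; a dentry alone cannot.  Returns a pointer into buf,
 * or NULL if buf is too small or the dentry is unreachable.
 */
char *sketch_d_path(struct sk_dentry *dentry, struct sk_vfsmount *mnt,
		    char *buf, size_t buflen)
{
	char *p = buf + buflen;

	if (buflen < 2)
		return NULL;
	*--p = '\0';

	for (;;) {
		if (dentry == mnt->mnt_root) {
			if (mnt->mnt_parent == mnt)
				break;                  /* reached the global root */
			dentry = mnt->mnt_mountpoint;   /* continue in parent fs */
			mnt = mnt->mnt_parent;
			continue;
		}
		if (dentry->parent == dentry)
			return NULL;                /* a fs root, but not mnt_root */

		size_t len = strlen(dentry->name);
		if ((size_t)(p - buf) < len + 1)
			return NULL;                /* buffer too small */
		p -= len;
		memcpy(p, dentry->name, len);
		*--p = '/';
		dentry = dentry->parent;
	}
	if (*p == '\0')
		*--p = '/';                         /* path of the root itself */
	return p;
}
```

A file on a filesystem mounted at /mnt/usb resolves to /mnt/usb/data.txt only because the walk hops from that filesystem's root to its mountpoint; given the dentry alone, the walk would stop at the inner root.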

The d_path patch already in the -mm tree

Re: [PATCH 6/7] [RFC] ds2760 battery driver

2007-04-16 Thread Matt Reimer

On 4/16/07, Anton Vorontsov <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 16, 2007 at 12:14:27PM -0700, Matt Reimer wrote:
> > The shifts (<< 3 and >> 5) are just to get the bits reassembled in the
> > right positions. The multiplication by 5 and subtracting 1/8 is
> > because (AFAIK) we can't do floating point multiplication in the
> > kernel. I'm open to suggestions.
>
> Because we are working in micro-units now, the division is already
> replaced by a multiplication, i.e.
>
> /* DS2760 reports voltage in units of 4.88mV, but the battery class
>  * reports in units of uV, so convert by multiplying by 4880. */
> di->voltage_raw = (di->raw[DS2760_VOLTAGE_MSB] << 3) |
>                   (di->raw[DS2760_VOLTAGE_LSB] >> 5);
> di->voltage_uV = di->voltage_raw * 4880;
>
> As a side effect, now we're not losing any precision. :-)

That's a good way to solve the problem. :-)

> By the way, Matt, you're more familiar with the ds2760 specs; could you
> enlighten me about "* 4" in this snippet?
>
> >   acr[0] = (di->full_active_mAh * 4) >> 8;
>                                    ^^^
> >   acr[1] = (di->full_active_mAh * 4) & 0xff;
>                                    ^^^
> >   if (w1_ds2760_write(di->w1_dev, acr,
> >   DS2760_CURRENT_ACCUM_MSB, 2) < 2)
> >  printk(KERN_ERR "ACR reset failed\n");

The accumulated current register (acr) value is in units of 0.25 mAh,
so we have to multiply by 4 to convert from units of 1 mAh to 0.25
mAh.
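The two conversions discussed in this thread, the 11-bit voltage reading in 4.88 mV steps and the ACR value in 0.25 mAh steps, can be checked in isolation. A small sketch with hypothetical function names follows. The register layout is taken from the quoted driver code (MSB << 3 | LSB >> 5), and like that code it treats the voltage as unsigned.

```c
#include <stdint.h>

/*
 * Voltage: the MSB register holds the top 8 bits and the top 3 bits of
 * the LSB register hold the rest; one unit is 4.88 mV, i.e. 4880 uV.
 */
int32_t ds2760_voltage_uV(uint8_t msb, uint8_t lsb)
{
	int32_t raw = ((int32_t)msb << 3) | (lsb >> 5);

	return raw * 4880;
}

/*
 * The ACR counts in 0.25 mAh units, so a capacity in mAh is multiplied
 * by 4 before being split into MSB and LSB bytes.
 */
void ds2760_acr_bytes(unsigned int full_active_mAh, uint8_t acr[2])
{
	unsigned int units = full_active_mAh * 4;

	acr[0] = (uint8_t)((units >> 8) & 0xff);   /* CURRENT_ACCUM_MSB */
	acr[1] = (uint8_t)(units & 0xff);          /* CURRENT_ACCUM_LSB */
}
```

For example, MSB 0x64 and LSB 0xE0 give a raw value of 807, i.e. 3938160 uV, and a 1000 mAh capacity becomes 4000 units, written as 0x0F, 0xA0.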

Thanks for all your work on this Anton.

Matt

Re: [PATCH v2] hpet: Enable hidden HPET on NVidia motherboards

2007-04-16 Thread Mikko Tiihonen



On Mon, 16 Apr 2007, Andi Kleen wrote:

> Mikko Tiihonen <[EMAIL PROTECTED]> writes:
>
>> It looks probable that most NVidia chipsets have the HPET address at
>> 0x44. It might be possible to enable the HPET even if BIOS did not
>
> That seems like a dangerous assumption. If anything this needs to be keyed
> on specific PCI IDs.

The patch contains a list of PCI IDs. Currently the CK804 and MCP55 have been
verified to work. Other PCI IDs can be added if needed.

> And the way you coded a recursive PCI scan is just ... not nice.


I actually was more worried that someone might complain that the pci scanning
is copy & paste code from the end of the same file. I did try to use the generic
pci functions first, but because they insist on enabling interrupts they cannot
be used this early. And this code needs to be run before the timer
initialization.

If you want, I can submit a separate patch to move the ... not nice pci
scanning code to the pci directory under some early_pci_scan(u32 *pci_ids, hook)
function. The same code has already been copied into
i386/kernel/acpi/earlyquirk.c, x86_64/kernel/aperture.c and
x86_64/kernel/early-quirks.c. Moving the ugliness to a central place would at
least hide it from the casual browser. Or would a global flag, checked by the
pci scanning code to decide whether locks should be used, be better?


>> initialize it properly by writing the wanted address there. Some other
>> pci config space bits might need to be fiddled around too, most likely
>> candidates are 0x74 bit 2 and 0xA3 bit ?. One or both of them have been
>> identified to change in some motherboards when HPET is enabled/disabled
>> in BIOS.
>
> Or just add a random generator and poke random bits? Should be roughly
> equivalent.


Fair enough; that was just my written hope that some day someone might reverse
engineer how HPET is enabled on NVidia chipsets. The patch does not try to write
to any registers, so it should be safe? The code also properly checks that the
memory area does not collide with any existing resource. I could of course add
a check that there is an HPET at that address, but the hpet driver already
checks this itself later and disables itself if it cannot see valid data.



> Also, nothing should be done here without confirmation from
> Nvidia that HPET is actually supposed to work. Sometimes hardware
> is disabled by BIOS because it is seriously broken (there was at least
> one other chipset that could corrupt your flash if you force enabled
> HPET in some steppings)


I hope someone has some secret contacts at NVidia, because they have not been
very open about their chipsets. I looked at LinuxBIOS, and its NForce4 chipset
code just had commented-out code that wrote to the 0x44 register. So obviously
something more is needed.

Even Intel posted code a while ago that allows enabling HPET from a quirk
even if the BIOS did not set it up properly.



Does anyone know how to _really_ test if HPET works properly (from user space,
for example)? I've just tested by busy-looping on gettimeofday() while changing
the clocksource and measuring the speed difference. We could then change the
quirk to point to instructions on how to test the HPET manually and ask users
to submit their PCI IDs.
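The busy-loop check described above (calling gettimeofday() in a tight loop and comparing speeds across clocksources) could be scripted roughly as below. This is a sketch, not an HPET validation. It assumes the kernel exposes the active clocksource at /sys/devices/system/clocksource/clocksource0/current_clocksource, it reports only the average cost per call and how often time appeared to step backwards, and switching clocksources (for example by writing `hpet` to that file as root) is left to the user.

```c
#define _DEFAULT_SOURCE         /* expose gettimeofday() under strict -std flags */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Read the active clocksource name; returns 0 on success, -1 otherwise. */
int current_clocksource(char *buf, size_t len)
{
	FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/"
			"current_clocksource", "r");
	int ok;

	if (f == NULL)
		return -1;
	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	if (!ok)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline */
	return 0;
}

/*
 * Call gettimeofday() `iterations` times in a tight loop.  Returns the
 * average cost per call in nanoseconds and stores in *backwards how many
 * times the clock appeared to step backwards.
 */
double time_gettimeofday(long iterations, long *backwards)
{
	struct timeval start, prev, now;

	*backwards = 0;
	if (iterations <= 0)
		return 0.0;
	gettimeofday(&start, NULL);
	prev = start;
	now = start;
	for (long i = 0; i < iterations; i++) {
		gettimeofday(&now, NULL);
		if (now.tv_sec < prev.tv_sec ||
		    (now.tv_sec == prev.tv_sec && now.tv_usec < prev.tv_usec))
			(*backwards)++;
		prev = now;
	}
	return ((double)(now.tv_sec - start.tv_sec) * 1e6 +
		(double)(now.tv_usec - start.tv_usec)) * 1000.0 / (double)iterations;
}
```

Running it once per clocksource and recording the name, the per-call cost and the backward-step count would give the kind of report that could accompany a PCI ID submission.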


-Mikko
