date:20061214

I found a clever way to make the extra IOPL switching invisible to
non-paravirt compiles: kernel_rpl is statically defined to be zero
there, and only a kernel running at a non-zero RPL has a problem
restoring IOPL, because popf does not restore the IOPL flags unless
it runs at CPL 0.
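
For readers who want to see the trick in isolation, here is a minimal
user-space sketch.  It is not the kernel code: SKETCH_PARAVIRT,
sketch_kernel_rpl, struct sketch_thread, switch_iopl() and the counters are
all made up for illustration.  It only shows that when get_kernel_rpl() is a
compile-time zero, the condition is constant-false and the compiler is free
to drop the extra IOPL step.

```c
#include <assert.h>

#define SKETCH_EFLAGS_IOPL 0x3000UL   /* IOPL field, bits 12-13 of EFLAGS */

/* Hypothetical stand-in: a native build sees a constant 0, a paravirt
 * build would read the RPL the hypervisor runs the kernel at. */
#ifdef SKETCH_PARAVIRT
static int sketch_kernel_rpl = 1;
#define get_kernel_rpl() (sketch_kernel_rpl)
#else
#define get_kernel_rpl() 0
#endif

struct sketch_thread {
	unsigned long iopl;           /* saved IOPL bits of the task */
};

static unsigned long current_iopl;    /* models the live IOPL field */
static int iopl_writes;               /* counts explicit IOPL updates */

static void set_iopl_mask(unsigned long mask)
{
	assert((mask & ~SKETCH_EFLAGS_IOPL) == 0);
	current_iopl = mask;
	iopl_writes++;
}

static void switch_iopl(const struct sketch_thread *prev,
			const struct sketch_thread *next)
{
	/* Constant-false when get_kernel_rpl() is 0, so the branch and
	 * the call can be eliminated at compile time. */
	if (get_kernel_rpl() && prev->iopl != next->iopl)
		set_iopl_mask(next->iopl);
}
```

Building the same file with -DSKETCH_PARAVIRT turns the check back into a
real run-time test, which is the only configuration that needs it.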

Subject: IOPL handling for paravirt guests
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 8110943fd7ad arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c   Thu Dec 14 16:15:20 2006 -0800
+++ b/arch/i386/kernel/process.c   Thu Dec 14 16:21:57 2006 -0800
@@ -665,6 +665,15 @@ struct task_struct fastcall * __switch_t
load_TLS(next, cpu);
 
/*
+* Restore IOPL if needed.  In normal use, the flags restore
+* in the switch assembly will handle this.  But if the kernel
+* is running virtualized at a non-zero CPL, the popf will
+* not restore flags, so it must be done in a separate step.
+*/
+   if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
+   set_iopl_mask(next->iopl);
+
+   /*
 * Now maybe handle debug registers and/or IO bitmaps
 */
if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/6] Paravirt CPU hypercall batching mode

The VMI ROM has a mode where hypercalls can be queued and batched.  This turns
out to be a significant win during context switch, but must be done at a
specific point before side effects to CPU state are visible to subsequent
instructions.  This is similar to the MMU batching hooks already provided.
The same hooks could be used by the Xen backend to implement a context switch
multicall.

To explain a bit more about lazy modes in the paravirt patches: the idea is
that only one of lazy CPU or lazy MMU mode can be active at any given time.
Lazy MMU mode is similar to this lazy CPU mode, and allows batching of
multiple PTE updates (say, inside a remap loop).  To avoid keeping some kind
of state machine about when to flush CPU or MMU updates, we just allow one or
the other to be active.  Although there is no real reason a more comprehensive
scheme could not be implemented, there is also no demonstrated need for this
extra complexity.
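
The batching rule described above can be sketched in user space as follows.
This is only a model under stated assumptions: the enum, the fixed-size
queue, and the enter_lazy()/leave_lazy()/hypercall() helpers are hypothetical
names, not the VMI ROM interface or the paravirt-ops API.  It shows the three
conventions: one mode at a time, no nesting, and a flush when the mode is
left.

```c
#include <assert.h>
#include <stddef.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };   /* hypothetical names */

#define QUEUE_MAX 16

static enum lazy_mode lazy = LAZY_NONE;
static int queue[QUEUE_MAX];    /* pending hypercall numbers */
static size_t queued;
static int issued;              /* hypercalls actually delivered */
static int flushes;             /* number of traps into the hypervisor */

static void flush_queue(void)
{
	if (queued == 0)
		return;
	issued += (int)queued;  /* one trap delivers the whole batch */
	queued = 0;
	flushes++;
}

static void enter_lazy(enum lazy_mode mode)
{
	assert(lazy == LAZY_NONE);      /* never nested, one mode at a time */
	lazy = mode;
}

static void leave_lazy(void)
{
	assert(lazy != LAZY_NONE);      /* entry and exit are always paired */
	flush_queue();
	lazy = LAZY_NONE;
}

static void hypercall(int nr)
{
	if (lazy == LAZY_NONE) {        /* eager: deliver immediately */
		issued++;
		flushes++;
		return;
	}
	if (queued == QUEUE_MAX)        /* queue full: flush early */
		flush_queue();
	queue[queued++] = nr;
}
```

In this model three calls made between enter_lazy(LAZY_CPU) and leave_lazy()
cost one trap instead of three, which is the effect the context switch path
is after.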

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
Subject: Paravirt CPU hypercall batching mode

diff -r 01f2e46c1416 arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Thu Dec 14 14:26:24 2006 -0800
+++ b/arch/i386/kernel/paravirt.c   Thu Dec 14 14:44:56 2006 -0800
@@ -545,6 +545,7 @@ struct paravirt_ops paravirt_ops = {
.apic_write_atomic = native_apic_write_atomic,
.apic_read = native_apic_read,
 #endif
+   .set_lazy_mode = (void *)native_nop,
 
.flush_tlb_user = native_flush_tlb,
.flush_tlb_kernel = native_flush_tlb_global,
diff -r 01f2e46c1416 arch/i386/kernel/process.c
--- a/arch/i386/kernel/process.c   Thu Dec 14 14:26:24 2006 -0800
+++ b/arch/i386/kernel/process.c   Thu Dec 14 14:50:22 2006 -0800
@@ -665,6 +665,31 @@ struct task_struct fastcall * __switch_t
load_TLS(next, cpu);
 
/*
+* Now maybe handle debug registers and/or IO bitmaps
+*/
+   if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
+   || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
+   __switch_to_xtra(next_p, tss);
+
+   disable_tsc(prev_p, next_p);
+
+   /*
+* Leave lazy mode, flushing any hypercalls made here.
+* This must be done before restoring TLS segments so
+* the GDT and LDT are properly updated, and must be
+* done before math_state_restore, so the TS bit is up
+* to date.
+*/
+   arch_leave_lazy_cpu_mode();
+
+   /* If the task has used fpu the last 5 timeslices, just do a full
+* restore of the math state immediately to avoid the trap; the
+* chances of needing FPU soon are obviously high now
+*/
+   if (next_p->fpu_counter > 5)
+   math_state_restore();
+
+   /*
 * Restore %fs if needed.
 *
 * Glibc normally makes %fs be zero.
@@ -673,22 +698,6 @@ struct task_struct fastcall * __switch_t
loadsegment(fs, next->fs);
 
write_pda(pcurrent, next_p);
-
-   /*
-* Now maybe handle debug registers and/or IO bitmaps
-*/
-   if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
-   || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
-   __switch_to_xtra(next_p, tss);
-
-   disable_tsc(prev_p, next_p);
-
-   /* If the task has used fpu the last 5 timeslices, just do a full
-* restore of the math state immediately to avoid the trap; the
-* chances of needing FPU soon are obviously high now
-*/
-   if (next_p->fpu_counter > 5)
-   math_state_restore();
 
return prev_p;
 }
diff -r 01f2e46c1416 include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h Thu Dec 14 14:26:24 2006 -0800
+++ b/include/asm-generic/pgtable.h Thu Dec 14 14:44:56 2006 -0800
@@ -183,6 +183,19 @@ static inline void ptep_set_wrprotect(st
 #endif
 
 /*
+ * A facility to provide batching of the reload of page tables with the
+ * actual context switch code for paravirtualized guests.  By convention,
+ * only one of the lazy modes (CPU, MMU) should be active at any given
+ * time, entry should never be nested, and entry and exits should always
+ * be paired.  This is for sanity of maintaining and reasoning about the
+ * kernel code.
+ */
+#ifndef __HAVE_ARCH_ENTER_LAZY_CPU_MODE
+#define arch_enter_lazy_cpu_mode() do {} while (0)
+#define arch_leave_lazy_cpu_mode() do {} while (0)
+#endif
+
+/*
  * When walking page tables, get the address of the next boundary,
  * or the end address of the range if that comes earlier.  Although no
  * vma end wraps to 0, rounded up __boundary may wrap to 0 throughout.
diff -r 01f2e46c1416 include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h   Thu Dec 14 14:26:24 2006 -0800
+++ b/include/asm-i386/paravirt.h   Thu Dec 14 14:44:56 2006 -0800
@@ -146,6 +146,8 @@ struct paravirt_ops
void (fastcall *pmd_clear)(pmd_t *pmdp);
 #endif
 
+   void (fastcall *set_l

[PATCH 5/6] VMI backend for paravirt-ops

Fairly straightforward implementation of VMI backend for paravirt-ops.

Subject: VMI backend for paravirt-ops
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r d8711b11c1eb arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/Kconfig Tue Dec 12 13:51:13 2006 -0800
@@ -192,6 +192,15 @@ config PARAVIRT
  under a hypervisor, improving performance significantly.
  However, when run without a hypervisor the kernel is
  theoretically slower.  If in doubt, say N.
+
+config VMI
+   bool "VMI Paravirt-ops support"
+   depends on PARAVIRT
+   default y
+   help
+ VMI provides a paravirtualized interface to multiple hypervisors
+ including VMware ESX server and Xen by connecting to a ROM module
+ provided by the hypervisor.
 
 config ACPI_SRAT
bool
diff -r d8711b11c1eb arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/kernel/Makefile Tue Dec 12 13:51:13 2006 -0800
@@ -39,6 +39,8 @@ obj-$(CONFIG_EARLY_PRINTK)+= early_prin
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
+
+obj-$(CONFIG_VMI)  += vmi.o
 
 # Make sure this is linked after any other paravirt_ops structs: see head.S
 obj-$(CONFIG_PARAVIRT) += paravirt.o
diff -r d8711b11c1eb arch/i386/kernel/head.S
--- a/arch/i386/kernel/head.S   Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/kernel/head.S   Tue Dec 12 13:51:13 2006 -0800
@@ -360,7 +360,7 @@ 1:  movb $1,X86_HARD_MATH
  * cpu_gdt_table and boot_pda; for secondary CPUs, these will be
  * that CPU's GDT and PDA.
  */
-setup_pda:
+ENTRY(setup_pda)
/* get the PDA pointer */
movl start_pda, %eax
 
diff -r d8711b11c1eb arch/i386/kernel/io_apic.c
--- a/arch/i386/kernel/io_apic.c   Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/kernel/io_apic.c   Tue Dec 12 13:51:13 2006 -0800
@@ -1914,7 +1914,7 @@ static void __init setup_ioapic_ids_from
 static void __init setup_ioapic_ids_from_mpc(void) { }
 #endif
 
-static int no_timer_check __initdata;
+int no_timer_check __initdata;
 
 static int __init notimercheck(char *s)
 {
diff -r d8711b11c1eb arch/i386/kernel/setup.c
--- a/arch/i386/kernel/setup.c  Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/kernel/setup.c  Tue Dec 12 13:51:13 2006 -0800
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -581,6 +582,14 @@ void __init setup_arch(char **cmdline_p)
 
max_low_pfn = setup_memory();
 
+#ifdef CONFIG_VMI
+   /*
+* Must be after max_low_pfn is determined, and before kernel
+* pagetables are setup.
+*/
+   vmi_init();
+#endif
+
/*
 * NOTE: before this point _nobody_ is allowed to allocate
 * any memory using the bootmem allocator.  Although the
diff -r d8711b11c1eb arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c   Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/kernel/smpboot.c   Tue Dec 12 13:51:13 2006 -0800
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Set if we find a B stepping CPU */
 static int __devinitdata smp_b_stepping;
@@ -547,6 +548,9 @@ static void __devinit start_secondary(vo
 * booting is too fragile that we want to limit the
 * things done here to the most necessary things.
 */
+#ifdef CONFIG_VMI
+   vmi_bringup();
+#endif
secondary_cpu_init();
preempt_disable();
smp_callin();
diff -r d8711b11c1eb arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.c   Tue Dec 12 13:51:06 2006 -0800
+++ b/arch/i386/mm/pgtable.c   Tue Dec 12 13:51:13 2006 -0800
@@ -171,6 +171,8 @@ void reserve_top_address(unsigned long r
 void reserve_top_address(unsigned long reserve)
 {
BUG_ON(fixmaps > 0);
+   printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
+  (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
BUG_ON(reserve != 0);
 #else
diff -r d8711b11c1eb include/asm-i386/timer.h
--- a/include/asm-i386/timer.h  Tue Dec 12 13:51:06 2006 -0800
+++ b/include/asm-i386/timer.h  Tue Dec 12 13:51:13 2006 -0800
@@ -8,6 +8,7 @@ void setup_pit_timer(void);
 /* Modifiers for buggy PIT handling */
 extern int pit_latch_buggy;
 extern int timer_ack;
+extern int no_timer_check;
 extern int recalibrate_cpu_khz(void);
 
 #endif
diff -r d8711b11c1eb arch/i386/kernel/vmi.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/arch/i386/kernel/vmi.c   Tue Dec 12 13:51:13 2006 -0800
@@ -0,0 +1,901 @@
+/*
+ * VMI specific paravirt-ops implementation
+ *
+ * Copyright (C) 2005, VMware, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distri

[PATCH 1/6] Page allocation hooks for VMI backend

The VMI backend uses explicit page type notification to track shadow
page tables.  The allocation of page table roots is especially tricky.
We need to clone the root for non-PAE mode while it is protected under
the pgd lock to correctly copy the shadow.

We don't need to allocate pgds (PDPs in Intel terminology) in PAE mode,
as they have only 4 entries and are cached entirely by the processor,
which makes shadowing them rather simple.

For base page table level allocation, pmd_populate provides the exact hook
point we need.  Also, we need to allocate pages when splitting a large page,
and we must release pages before returning the page to any free pool.

Although these hooks have slightly odd semantics that VMI requires, Xen
also uses them to determine the exact moment when page tables are created
or released.
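
To make "explicit page type notification" concrete, here is a small
user-space sketch of what a backend could do with such hooks.  It is an
assumption-laden model, not the VMI implementation: the shadow_type table,
the SKETCH_MAX_PFN limit and the sketch_* functions are invented for
illustration.  The only parts taken from the patch are the idea of
alloc/release notifications and passing a page frame number computed as
pa >> PAGE_SHIFT.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define SKETCH_MAX_PFN 1024     /* hypothetical: tiny physical memory */

enum page_type { PT_NONE, PT_PTE_PAGE, PT_PD_PAGE };

/* What the (hypothetical) backend believes each physical page is used for. */
static enum page_type shadow_type[SKETCH_MAX_PFN];

static unsigned long pa_to_pfn(unsigned long pa)
{
	return pa >> PAGE_SHIFT;
}

/* Called before the page is hooked into a page directory. */
static void sketch_alloc_pt(unsigned long pfn)
{
	assert(pfn < SKETCH_MAX_PFN && shadow_type[pfn] == PT_NONE);
	shadow_type[pfn] = PT_PTE_PAGE;
}

static void sketch_alloc_pd(unsigned long pfn)
{
	assert(pfn < SKETCH_MAX_PFN && shadow_type[pfn] == PT_NONE);
	shadow_type[pfn] = PT_PD_PAGE;
}

/* Called before the page goes back to any free pool. */
static void sketch_release_page(unsigned long pfn)
{
	assert(pfn < SKETCH_MAX_PFN && shadow_type[pfn] != PT_NONE);
	shadow_type[pfn] = PT_NONE;
}
```

The asserts encode the ordering the description asks for: a page is typed
once on allocation and must be released before it can be reused as anything
else.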

Subject: Page allocation hooks for VMI backend
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

===
--- a/arch/i386/kernel/paravirt.c
+++ b/arch/i386/kernel/paravirt.c
@@ -545,6 +545,12 @@ struct paravirt_ops paravirt_ops = {
.flush_tlb_kernel = native_flush_tlb_global,
.flush_tlb_single = native_flush_tlb_single,
 
+   .alloc_pt = (void *)native_nop,
+   .alloc_pd = (void *)native_nop,
+   .alloc_pd_clone = (void *)native_nop,
+   .release_pt = (void *)native_nop,
+   .release_pd = (void *)native_nop,
+
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
.set_pmd = native_set_pmd,
===
--- a/arch/i386/mm/init.c
+++ b/arch/i386/mm/init.c
@@ -62,6 +62,7 @@ static pmd_t * __init one_md_table_init(

 #ifdef CONFIG_X86_PAE
pmd_table = (pmd_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+   paravirt_alloc_pd(__pa(pmd_table) >> PAGE_SHIFT);
set_pgd(pgd, __pgd(__pa(pmd_table) | _PAGE_PRESENT));
pud = pud_offset(pgd, 0);
if (pmd_table != pmd_offset(pud, 0)) 
@@ -82,6 +83,7 @@ static pte_t * __init one_page_table_ini
 {
if (pmd_none(*pmd)) {
pte_t *page_table = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+   paravirt_alloc_pt(__pa(page_table) >> PAGE_SHIFT);
set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));
if (page_table != pte_offset_kernel(pmd, 0))
BUG();  
@@ -347,6 +349,8 @@ static void __init pagetable_init (void)
/* Init entries of the first-level page table to the zero page */
for (i = 0; i < PTRS_PER_PGD; i++)
set_pgd(pgd_base + i, __pgd(__pa(empty_zero_page) | _PAGE_PRESENT));
+#else
+   paravirt_alloc_pd(__pa(swapper_pg_dir) >> PAGE_SHIFT);
 #endif
 
/* Enable PSE if available */
===
--- a/arch/i386/mm/pageattr.c
+++ b/arch/i386/mm/pageattr.c
@@ -60,6 +60,7 @@ static struct page *split_large_page(uns
address = __pa(address);
addr = address & LARGE_PAGE_MASK; 
pbase = (pte_t *)page_address(base);
+   paravirt_alloc_pt(page_to_pfn(base));
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
set_pte(&pbase[i], pfn_pte(addr >> PAGE_SHIFT,
   addr == address ? prot : ref_prot));
@@ -166,6 +167,7 @@ __change_page_attr(struct page *page, pg
if (!PageReserved(kpte_page)) {
if (cpu_has_pse && (page_private(kpte_page) == 0)) {
ClearPagePrivate(kpte_page);
+   paravirt_release_pt(page_to_pfn(kpte_page));
list_add(&kpte_page->lru, &df_list);
revert_page(kpte_page, address);
}
===
--- a/arch/i386/mm/pgtable.c
+++ b/arch/i386/mm/pgtable.c
@@ -245,8 +245,14 @@ void pgd_ctor(void *pgd, kmem_cache_t *c
clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
swapper_pg_dir + USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
+
if (PTRS_PER_PMD > 1)
return;
+
+   /* must happen under lock */
+   paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
+   __pa(swapper_pg_dir) >> PAGE_SHIFT,
+   USER_PTRS_PER_PGD, PTRS_PER_PGD - USER_PTRS_PER_PGD);
 
pgd_list_add(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
@@ -257,6 +263,7 @@ void pgd_dtor(void *pgd, kmem_cache_t *c
 {
unsigned long flags; /* can be called from interrupt context */
 
+   paravirt_release_pd(__pa(pgd) >> PAGE_SHIFT);
spin_lock_irqsave(&pgd_lock, flags);
pgd_list_del(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
@@ -274,13 +281,18 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
if (!pm

[PATCH 6/6] VMI timer patches

VMI timer code.  It works by taking over the local APIC clock when APIC is
configured, which requires a couple hooks into the APIC code.  The backend
timer code could be commonized into the timer infrastructure, but there are
some pieces missing (stolen time, in particular), and the exact semantics
of when to do accounting for NO_IDLE need to be shared between different
hypervisors as well.  So for now, VMI timer is a separate module.
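
The hook approach mentioned above amounts to routing the clock setup through
a function pointer that a backend can replace.  A rough user-space sketch of
that indirection follows; the struct, the enum and every sketch_* name are
hypothetical and only setup_boot_clock() borrows its name from the patch.
No real timer is programmed here.

```c
#include <assert.h>
#include <stddef.h>

enum clock_owner { OWNER_NONE, OWNER_APIC, OWNER_VMI };

static enum clock_owner boot_clock_owner = OWNER_NONE;

/* Stand-in for the native local APIC clock setup. */
static void sketch_setup_boot_apic_clock(void)
{
	boot_clock_owner = OWNER_APIC;
}

/* Stand-in for a backend that takes over the local timer. */
static void sketch_vmi_setup_boot_clock(void)
{
	boot_clock_owner = OWNER_VMI;
}

struct sketch_clock_ops {
	void (*setup_boot_clock)(void);
};

/* Native default, analogous to the initializer in the ops table. */
static struct sketch_clock_ops clock_ops = {
	sketch_setup_boot_apic_clock,
};

/* Callers use this wrapper and never name the APIC version directly. */
static void setup_boot_clock(void)
{
	assert(clock_ops.setup_boot_clock != NULL);
	clock_ops.setup_boot_clock();
}

/* A backend installs its own hook during early init. */
static void sketch_vmi_install(void)
{
	clock_ops.setup_boot_clock = sketch_vmi_setup_boot_clock;
}
```

Call sites stay identical on native hardware and under a hypervisor; only the
table entry differs.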

Subject: VMI timer patches
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 77e4058e936b arch/i386/Kconfig
--- a/arch/i386/Kconfig Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/Kconfig Thu Dec 14 16:40:16 2006 -0800
@@ -1227,3 +1227,12 @@ config KTIME_SCALAR
 config KTIME_SCALAR
bool
default y
+
+config NO_IDLE_HZ
+   bool
+   depends on PARAVIRT
+   default y
+   help
+ Switches the regular HZ timer off when the system is going idle.
+ This helps a hypervisor detect that the Linux system is idle,
+ reducing the overhead of idle systems.
diff -r 77e4058e936b arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/Makefile Thu Dec 14 16:40:16 2006 -0800
@@ -40,7 +40,7 @@ obj-$(CONFIG_HPET_TIMER)  += hpet.o
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
 
-obj-$(CONFIG_VMI)  += vmi.o
+obj-$(CONFIG_VMI)  += vmi.o vmitime.o
 
 # Make sure this is linked after any other paravirt_ops structs: see head.S
 obj-$(CONFIG_PARAVIRT) += paravirt.o
diff -r 77e4058e936b arch/i386/kernel/apic.c
--- a/arch/i386/kernel/apic.c   Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/apic.c   Thu Dec 14 16:40:16 2006 -0800
@@ -1395,7 +1395,7 @@ int __init APIC_init_uniprocessor (void)
if (!skip_ioapic_setup && nr_ioapics)
setup_IO_APIC();
 #endif
-   setup_boot_APIC_clock();
+   setup_boot_clock();
 
return 0;
 }
diff -r 77e4058e936b arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S  Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/entry.S  Thu Dec 14 16:40:16 2006 -0800
@@ -622,6 +622,11 @@ ENTRY(name)\
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
 
+/* This alternate entry is needed because we hijack the apic LVTT */
+#if defined(CONFIG_VMI) && defined(CONFIG_X86_LOCAL_APIC)
+BUILD_INTERRUPT(apic_vmi_timer_interrupt,LOCAL_TIMER_VECTOR)
+#endif
+
 KPROBE_ENTRY(page_fault)
RING0_EC_FRAME
pushl $do_page_fault
diff -r 77e4058e936b arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/paravirt.c   Thu Dec 14 16:40:16 2006 -0800
@@ -544,6 +544,8 @@ struct paravirt_ops paravirt_ops = {
.apic_write = native_apic_write,
.apic_write_atomic = native_apic_write_atomic,
.apic_read = native_apic_read,
+   .setup_boot_clock = setup_boot_APIC_clock,
+   .setup_secondary_clock = setup_secondary_APIC_clock,
 #endif
.set_lazy_mode = (void *)native_nop,
 
diff -r 77e4058e936b arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c   Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/smpboot.c   Thu Dec 14 16:40:16 2006 -0800
@@ -556,7 +556,7 @@ static void __devinit start_secondary(vo
smp_callin();
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
rep_nop();
-   setup_secondary_APIC_clock();
+   setup_secondary_clock();
if (nmi_watchdog == NMI_IO_APIC) {
disable_8259A_irq(0);
enable_NMI_through_LVT0(NULL);
@@ -1330,7 +1330,7 @@ static void __init smp_boot_cpus(unsigne
 
smpboot_setup_io_apic();
 
-   setup_boot_APIC_clock();
+   setup_boot_clock();
 
/*
 * Synchronize the TSC with the AP
diff -r 77e4058e936b arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c   Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/time.c   Thu Dec 14 16:40:16 2006 -0800
@@ -232,6 +232,7 @@ static void sync_cmos_clock(unsigned lon
 static void sync_cmos_clock(unsigned long dummy);
 
 static DEFINE_TIMER(sync_cmos_timer, sync_cmos_clock, 0, 0);
+int no_sync_cmos_clock;
 
 static void sync_cmos_clock(unsigned long dummy)
 {
@@ -275,7 +276,8 @@ static void sync_cmos_clock(unsigned lon
 
 void notify_arch_cmos_timer(void)
 {
-   mod_timer(&sync_cmos_timer, jiffies + 1);
+   if (!no_sync_cmos_clock)
+   mod_timer(&sync_cmos_timer, jiffies + 1);
 }
 
 static long clock_cmos_diff;
diff -r 77e4058e936b arch/i386/kernel/tsc.c
--- a/arch/i386/kernel/tsc.c   Thu Dec 14 16:40:14 2006 -0800
+++ b/arch/i386/kernel/tsc.c   Thu Dec 14 16:40:16 2006 -0800
@@ -23,6 +23,7 @@
  * an extra value to store the TSC freq
  */
 unsigned int tsc_khz;
+unsigned long long (*custom_sched_clock)(void);
 
 int tsc_disable __cpuinitdata = 0;

[PATCH 0/6] VMI paravirt-ops patches

These are the patches for the VMI backend to paravirt-ops.  Base
kernel where I tested them was 2.6.19-git20.

Basically, there are only a couple of hooks needed that were left
out of the initial paravirt-ops merge, and then the backend code
is a very straightforward implementation of the paravirt-ops
functions.

Andrew or Linus, please apply or shoot me nasty feedback that I
will promptly turn into marvelous looking code.  I've Cc'd Andi,
who originally was going to take up the patches, but seems to
have been snowed in.

Zach

[PATCH 4/6] SMP boot hook for paravirt

Add VMI SMP boot hook.  We emulate a regular boot sequence and use the
same APIC IPI initiation; we just poke magic values to load into the CPU
state when the startup IPI is received, rather than having to jump through a
real mode trampoline.

This is all that was needed to get SMP to work.
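
As a rough illustration of the idea (not the VMI backend itself), the hook
can be thought of as recording the initial register state for the target CPU
before the startup IPI goes out.  Everything below is a user-space model
with hypothetical names: ap_state, SKETCH_NR_CPUS and the sketch_* functions
do not exist in the kernel, and the addresses in any usage are arbitrary.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 8        /* hypothetical limit for the model */

struct sketch_ap_state {
	unsigned long eip;      /* where the secondary CPU starts executing */
	unsigned long esp;      /* its initial stack pointer */
	int valid;              /* set once the hook has run */
};

static struct sketch_ap_state ap_state[SKETCH_NR_CPUS];

/* Backend side of the hook: stash the state for the target CPU. */
static void sketch_startup_ipi_hook(int phys_apicid, unsigned long start_eip,
				    unsigned long start_esp)
{
	assert(phys_apicid >= 0 && phys_apicid < SKETCH_NR_CPUS);
	ap_state[phys_apicid].eip = start_eip;
	ap_state[phys_apicid].esp = start_esp;
	ap_state[phys_apicid].valid = 1;
}

/*
 * Models delivery of the startup IPI: returns the EIP the CPU begins at,
 * or 0 if nothing was poked (native hardware would instead run the real
 * mode trampoline).
 */
static unsigned long sketch_deliver_startup_ipi(int phys_apicid)
{
	assert(phys_apicid >= 0 && phys_apicid < SKETCH_NR_CPUS);
	if (!ap_state[phys_apicid].valid)
		return 0;
	return ap_state[phys_apicid].eip;
}
```

On native hardware the hook is a no-op, so the trampoline path is untouched.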

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
Subject: SMP boot hook for paravirt

diff -r acfb7a15715f arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Thu Dec 14 16:22:03 2006 -0800
+++ b/arch/i386/kernel/paravirt.c   Thu Dec 14 16:51:48 2006 -0800
@@ -572,5 +572,7 @@ struct paravirt_ops paravirt_ops = {
 
.irq_enable_sysexit = native_irq_enable_sysexit,
.iret = native_iret,
+
+   .startup_ipi_hook = (void *)native_nop,
 };
 EXPORT_SYMBOL(paravirt_ops);
diff -r acfb7a15715f arch/i386/kernel/smpboot.c
--- a/arch/i386/kernel/smpboot.c   Thu Dec 14 16:22:03 2006 -0800
+++ b/arch/i386/kernel/smpboot.c   Thu Dec 14 16:51:52 2006 -0800
@@ -831,6 +831,13 @@ wakeup_secondary_cpu(int phys_apicid, un
num_starts = 0;
 
/*
+* Paravirt / VMI wants a startup IPI hook here to set up the
+* target processor state.
+*/
+   startup_ipi_hook(phys_apicid, (unsigned long) start_secondary,
+(unsigned long) stack_start.esp);
+
+   /*
 * Run STARTUP IPI loop.
 */
Dprintk("#startup loops: %d.\n", num_starts);
diff -r acfb7a15715f include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h   Thu Dec 14 16:22:03 2006 -0800
+++ b/include/asm-i386/paravirt.h   Thu Dec 14 16:51:48 2006 -0800
@@ -151,6 +151,8 @@ struct paravirt_ops
/* These two are jmp to, not actually called. */
void (fastcall *irq_enable_sysexit)(void);
void (fastcall *iret)(void);
+
+   void (fastcall *startup_ipi_hook)(int phys_apicid, unsigned long start_eip, unsigned long start_esp);
 };
 
 /* Mark a paravirt probe function. */
@@ -323,6 +325,13 @@ static inline unsigned long apic_read(un
 }
 #endif
 
+#ifdef CONFIG_SMP
+static inline void startup_ipi_hook(int phys_apicid, unsigned long start_eip,
+   unsigned long start_esp)
+{
+   return paravirt_ops.startup_ipi_hook(phys_apicid, start_eip, start_esp);
+}
+#endif
 
 #define __flush_tlb() paravirt_ops.flush_tlb_user()
 #define __flush_tlb_global() paravirt_ops.flush_tlb_kernel()
diff -r acfb7a15715f include/asm-i386/smp.h
--- a/include/asm-i386/smp.h   Thu Dec 14 16:22:03 2006 -0800
+++ b/include/asm-i386/smp.h   Thu Dec 14 16:52:21 2006 -0800
@@ -52,6 +52,11 @@ extern void cpu_uninit(void);
 extern void cpu_uninit(void);
 #endif
 
+#ifndef CONFIG_PARAVIRT
+#define startup_ipi_hook(phys_apicid, start_eip, start_esp)\
+do { } while (0)
+#endif
+
 /*
  * This function is needed by all SMP systems. It must _always_ be valid
  * from the initial startup. We map APIC_BASE very early in page_setup(),

RE: 2.6.18.4: flush_workqueue calls mutex_lock in interrupt environment

2006-12-14 Thread Chen, Kenneth W

Chen, Kenneth wrote on Thursday, December 14, 2006 5:59 PM
> > It seems utterly insane to have aio_complete() flush a workqueue. That
> > function has to be called from a number of different environments,
> > including non-sleep tolerant environments.
> > 
> > For instance it means that directIO on NFS will now cause the rpciod
> > workqueues to call flush_workqueue(aio_wq), thus slowing down all RPC
> > activity.
> 
> The bug appears to be somewhere else, somehow the ref count on ioctx is
> all messed up.
> 
> In aio_complete, __put_ioctx() should not be invoked because ref count
> on ioctx is supposedly more than 2, aio_complete decrement it once and
> should return without invoking the free function.
> 
> The real freeing ioctx should be coming from exit_aio() or io_destroy(),
> in which case both wait until no further pending AIO request via
> wait_for_all_aios().

Ah, I think I see the bug: it must be a race between io_destroy() and
aio_complete().  A possible scenario:

        cpu0                            cpu1
        io_destroy                      aio_complete
          wait_for_all_aios {             __aio_put_req
             ...                              ctx->reqs_active--;
             if (!ctx->reqs_active)
                return;
          }
          ...
          put_ioctx(ioctx)

                                          put_ioctx(ctx);
                                             bam! Bug trigger!

The AIO finishes on cpu1, and while cpu1 is in the middle of aio_complete,
cpu0 starts the io_destroy sequence, sees no pending AIO, and goes ahead and
decrements the ref count on ioctx.  At a later point in aio_complete,
put_ioctx drops the last ref count and calls the ioctx freeing function,
which triggers the bug warning.

A simple fix would be to access ctx->reqs_active inside the ctx spin lock in
wait_for_all_aios().  In the meantime, I would like to remove ref counting
for each iocb because we are already performing ref counting using
reqs_active.  This would also prevent similar buggy code in the future.
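
For anyone who wants to step through the interleaving, here is a
single-threaded replay of it as a toy model.  It is not fs/aio.c: struct
sketch_ctx, put_ctx() and replay() are hypothetical, the starting counts
(one base reference plus one per outstanding request) are my reading of the
scenario above, and the model ignores locking entirely.  It only tracks
which side ends up dropping the last reference.

```c
#include <assert.h>

enum ctx_who { NOBODY, DESTROYER, COMPLETER };

struct sketch_ctx {
	int users;              /* reference count on the context */
	int reqs_active;        /* outstanding requests */
	enum ctx_who freed_by;  /* who dropped the last reference */
};

static void put_ctx(struct sketch_ctx *c, enum ctx_who who)
{
	assert(c->users > 0);
	if (--c->users == 0)
		c->freed_by = who;
}

/*
 * Replays the cpu0/cpu1 ordering with one outstanding request.
 * per_request_ref != 0 models the old scheme (each request pins the
 * context); 0 models relying on reqs_active alone.
 */
static enum ctx_who replay(int per_request_ref)
{
	struct sketch_ctx c = { 1 + (per_request_ref ? 1 : 0), 1, NOBODY };

	c.reqs_active--;                /* cpu1: request completes */

	if (c.reqs_active == 0)         /* cpu0: nothing left to wait for */
		put_ctx(&c, DESTROYER); /* cpu0: io_destroy drops its ref */

	if (per_request_ref)
		put_ctx(&c, COMPLETER); /* cpu1: late per-request put */

	return c.freed_by;
}
```

With the per-request reference, the completion side performs the final put,
which is the case that trips the warning; without it, the final put stays
with the destroy path.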


Signed-off-by: Ken Chen <[EMAIL PROTECTED]>

--- ./fs/aio.c.orig 2006-11-29 13:57:37.0 -0800
+++ ./fs/aio.c  2006-12-14 20:45:14.0 -0800
@@ -298,17 +298,23 @@ static void wait_for_all_aios(struct kio
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);
 
+   spin_lock_irq(&ctx->ctx_lock);
if (!ctx->reqs_active)
-   return;
+   goto out;
 
add_wait_queue(&ctx->wait, &wait);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
while (ctx->reqs_active) {
+   spin_unlock_irq(&ctx->ctx_lock);
schedule();
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+   spin_lock_irq(&ctx->ctx_lock);
}
__set_task_state(tsk, TASK_RUNNING);
remove_wait_queue(&ctx->wait, &wait);
+
+out:
+   spin_unlock_irq(&ctx->ctx_lock);
 }
 
 /* wait_on_sync_kiocb:
@@ -425,7 +431,6 @@ static struct kiocb fastcall *__aio_get_
ring = kmap_atomic(ctx->ring_info.ring_pages[0], KM_USER0);
if (ctx->reqs_active < aio_ring_avail(&ctx->ring_info, ring)) {
list_add(&req->ki_list, &ctx->active_reqs);
-   get_ioctx(ctx);
ctx->reqs_active++;
okay = 1;
}
@@ -538,8 +543,6 @@ int fastcall aio_put_req(struct kiocb *r
spin_lock_irq(&ctx->ctx_lock);
ret = __aio_put_req(ctx, req);
spin_unlock_irq(&ctx->ctx_lock);
-   if (ret)
-   put_ioctx(ctx);
return ret;
 }
 
@@ -795,8 +798,7 @@ static int __aio_run_iocbs(struct kioctx
 */
iocb->ki_users++;   /* grab extra reference */
aio_run_iocb(iocb);
-   if (__aio_put_req(ctx, iocb))  /* drop extra ref */
-   put_ioctx(ctx);
+   __aio_put_req(ctx, iocb);
}
if (!list_empty(&ctx->run_list))
return 1;
@@ -942,7 +944,6 @@ int fastcall aio_complete(struct kiocb *
struct io_event *event;
unsigned long   flags;
unsigned long   tail;
-   int ret;
 
/*
 * Special case handling for sync iocbs:
@@ -1011,18 +1012,12 @@ int fastcall aio_complete(struct kiocb *
pr_debug("%ld retries: %zd of %zd\n", iocb->ki_retried,
iocb->ki_nbytes - iocb->ki_left, iocb->ki_nbytes);
 put_rq:
-   /* everything turned out well, dispose of the aiocb. */
-   ret = __aio_put_req(ctx, iocb);
-
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
 
if (waitqueue_active(&ctx->wait))
wake_up(&ctx->wait);
 
-   if (ret)
-   put_ioctx(ctx);
-
-   return ret;
+   return aio_put_req(iocb);
 }
 
 /* aio_read_evt

2.6.20-rc1-mm1


Temporarily at

http://userweb.kernel.org/~akpm/2.6.20-rc1-mm1/

Will appear later at


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc1/2.6.20-rc1-mm1/



- Added the avr32 devel tree as git-avr32.patch (Haavard Skinnemoen)

- Don't enable locking API self-tests on powerpc - it explodes in a
  spectacular fashion.




Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Semi-daily snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.



Changes since 2.6.19-mm1:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-avr32.patch
 git-cpufreq.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-libata-all.patch
 git-lxdialog.patch
 git-mmc.patch
 git-mmc-fixup.patch
 git-mtd.patch
 git-ubi.patch
 git-netdev-all.patch
 git-ioat.patch
 git-ocfs2.patch
 git-pcmcia.patch
 git-chelsio.patch
 git-selinux.patch
 git-pciseg.patch
 git-s390.patch
 git-sh.patch
 git-sas.patch
 git-sparc64.patch
 git-qla3xxx.patch
 git-wireless.patch
 git-gccbug.patch

 git trees.

-x86-smp-export-smp_num_siblings-for-oprofile.patch
-tty-export-get_current_tty.patch
-ieee80211softmac-fix-errors-related-to-the-work_struct-changes.patch
-kvm-add-missing-include.patch
-kvm-put-kvm-in-a-new-virtualization-menu.patch
-kvm-clean-up-amd-svm-debug-registers-load-and-unload.patch
-kvm-replace-__x86_64__-with-config_x86_64.patch
-fix-more-workqueue-build-breakage-tps65010.patch
-another-build-fix-header-rearrangements-osk.patch
-uml-fix-net_kern-workqueue-abuse.patch
-isdn-gigaset-fix-possible-missing-wakeup.patch
-i2o_exec_exit-and-i2o_driver_exit-should-not-be-__exit.patch
-cpufreq-fix-bug-in-duplicate-freq-elimination-code-in-acpi-cpufreq.patch
-gregkh-driver-modules-state.patch
-gregkh-driver-driver-core-delete-virtual-directory-on-class_unregister.patch
-gregkh-driver-debugfs-inotify-create-mkdir-support.patch
-gregkh-driver-debugfs-coding-style-fixes.patch
-gregkh-driver-debugfs-file-directory-creation-error-handling.patch
-gregkh-driver-debugfs-more-file-directory-creation-error-handling.patch
-gregkh-driver-debugfs-file-directory-removal-fix.patch
-gregkh-driver-driver-core-platform_driver_probe-can-save-codespace-save-codespace.patch
-gregkh-driver-driver-core-make-platform_device_add_data-accept-a-const-pointer.patch
-gregkh-driver-driver-core-deprecate-pm_legacy-default-it-to-n.patch
-drm-fix-return-value-check.patch
-drm-handle-pci_enable_device-failure.patch
-jdelvare-i2c-i2c-documentation-typos.patch
-jdelvare-i2c-i2c-update-i2c-id-list.patch
-jdelvare-i2c-i2c-delete-ite-bus-driver.patch
-jdelvare-i2c-i2c-pnx-new-driver.patch
-jdelvare-i2c-i2c-ibm_iic-add_request_release_mem_region.patch
-jdelvare-i2c-i2c-nforce2-cleanup.patch
-jdelvare-i2c-i2c-lockdep-handle-recursive-locking.patch
-jdelvare-i2c-i2c-at91-new-bus-driver.patch
-jdelvare-i2c-i2c-dev-make-I2C_FUNCS-ioctl-faster.patch
-jdelvare-i2c-i2c-remove-extraneous-whitespace.patch
-jdelvare-i2c-i2c-core-use-__ATTR.patch
-jdelvare-i2c-i2c-i801-documentation-update.patch
-jdelvare-i2c-i2c-fix-broken-ds1337-initialization.patch
-jdelvare-i2c-i2c-versatile-new-arm-bus-driver.patch
-jdelvare-i2c-i2c-discard-del-bus-wrappers.patch
-jdelvare-i2c-i2c-i801-enable-PEC-on-ICH6.patch
-jdelvare-i2c-i2c-dev-fix-return-value-check.patch
-jdelvare-i2c-i2c-dev-merge-kfree.patch
-jdelvare-i2c-i2c-omap-prescaler-formula.patch
-jdelvare-hwmon-hwmon-f71805f-add-fanctl-1-prepare.patch
-jdelvare-hwmon-hwmon-f71805f-add-fanctl-2-manual-mode.patch
-jdelvare-hwmon-hwmon-f71805f-add-fanctl-3-pwm-freq.patch
-jdelvare-hwmon-hwmon-f71805f-add-fanctl-4-pwm-mode.patch
-jdelvare-hwmon-hwmon-f71805f-add-fanctl-5-speed-mode.patch
-jd

crash in 'wake_up_interruptible()' on SMP

2006-12-14 Thread kiran kumar


Can someone explain why I see the crash below on an Intel Xeon SMP box?
The kernel version is 2.6.11. This is what I'm trying to do in the
driver:

1. Submit a request to a device in 'unlocked_ioctl()' and issue
'wait_event_interruptible_timeout()' for 10 jiffies. There can be many
such outstanding requests issued by different processes, and all of
them are placed in a queue.
2. The 'wake_up_interruptible()' is issued either from a tasklet or from
a poll-thread which polls on the status of the request.
3. The request queue is protected using 'spin_lock_bh/spin_unlock_bh'
to be softIRQ safe. I'm stating this to point out that
'spin_lock_irqsave/spin_unlock_irqrestore' is issued only within the
wait queue code.

If I either stop using 'unlocked_ioctl()' (i.e. use ioctl() instead) or
comment out the 'wake_up_interruptible()' call, I don't see the crash.
Is wake_up_interruptible() SMP safe?

Regards,
kiran

/-/
[EMAIL PROTECTED] ~]# [ cut here ]
kernel BUG at include/asm/spinlock.h:112!
invalid operand:  [#1]
SMP
Modules linked in: pkp_drv(U) md5 ipv6 parport_pc lp parport autofs4
rfcomm l2cap bluetooth sunrpc dm_mod video button battery ac uhci_hcd
hw_random i2c_i801 i2c_core shpchp e1000 e100 mii floppy sata_sil
libata scsi_mod ext3 jbd
CPU:0
EIP:0060:[]Tainted: P  VLI
EFLAGS: 00210002   (2.6.11-1.1369_FC4smp)
EIP is at _spin_unlock_irqrestore+0x26/0x30
eax: 0001   ebx: cfefb810   ecx: cfefb810   edx: 00200292
esi:    edi: e0af0f20   ebp:    esp: c8ae2e10
ds: 007b   es: 007b   ss: 0068
Process swamp (pid: 5920, threadinfo=c8ae2000 task=c79d4a80)
Stack: badc0ded e0c8e1af  e0af0268 e0a86ef8 e0c87f46 0802 
  00200286 e0a86ef8 00200286 e0c9c580  e0c87dd1 cfec1810 d028a810
   bf9fc228 e0c8dab4 0001 c8ae2000 3f37331a bf9fc228 e0c9c580
/--/

Re: GPL only modules

2006-12-14 Thread Alexandre Oliva

On Dec 14, 2006, "Jeff V. Merkey" <[EMAIL PROTECTED]> wrote:

> FREE implies a transfer of ownership

It's about freedom, not price.  And even then, it's the license that
has no cost, not the copyright.

> and you also have to contend with the Doctrine of Estoppel.  i.e. if
> someone has been using the code for over two years, and you have not
> brought a cause of action, you are BARRED from doing so under the
> Doctrine of Estoppel and statute of limitations.

Sure, but we're not necessarily talking about code that is two years
old.  We're talking about future releases.  Then, if someone
interfaces with code that was already there before, they might claim
they're still entitled to do so.  But if it's new code they interface
with, or new code they wrote after this clarification is published,
would they still be entitled to estoppel?  FWIW, IANAL.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}

Re: [PATCH] Introduce time_data, a new structure to hold jiffies, xtime, xtime_lock, wall_to_monotonic, calc_load_count and avenrun

On Wed, 13 Dec 2006 22:26:26 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> This patch introduces a new structure called time_data, where some time 
> keeping related variables are put together to share as few cache lines as 
> possible.

ia64 refers to xtime_lock from assembly and hence doesn't link.

Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi

Nikolai Joukov wrote:
> > Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF:
> > > Redundant Array of Independent Filesystems.
> >
> > Great!
> >
> > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and
> > > U320 SCSI disks.  Compared to the Linux RAID driver, RAIF has
> > > overheads of about 20-25% under the Postmark v1.5 benchmark in case of
> > > striping and replication.  In case of RAID4 and RAID5-like
> > > configurations, RAIF performed about two times *better* than software
> > > RAID and even better than an Adaptec 2120S RAID5 controller.
> >
> > I am not surprised.  RAID 4/5/6 performance is highly sensitive to the
> > underlying hw, and thus needs a fair amount of fine tuning.
>
> Nevertheless, performance is not the biggest advantage of RAIF.  For
> read-biased workloads RAID is always slightly faster than RAIF.  The
> biggest advantages of RAIF are flexible configurations (e.g., can combine
> NFS and local file systems), per-file-type storage policies, and the fact
> that files are stored as files on the lower file systems (which is
> convenient).

Ok, I was just about to inform you of a three-branch NFS RAIF setup that was
unable to fill the net pipe.  So it looks like a 25% performance hit across
the board.  Should be possible to reduce it to under 3% once RAIF matures,
don't you think?


> > > This is because RAIF is located above
> > > file system caches and can cache parity as normal data when needed. 
> > > We have more performance details in a technical report, if anyone is
> > > interested.
> >
> > Definitely interested.  Can you give a link?
>
> The main focus of the paper is on a general OS profiling method and not
> on RAIF.  However, it has some details about the RAIF benchmarking with
> Postmark in Chapter 9:
>
>   
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5
> operation under the same Postmark workload.


Thanks!

--
Al


Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Al Boldi

Nikolai Joukov wrote:
> > > We started the project in April 2004.  Right now I am using it as my
> > > /home/kolya file system at home.  We believe that at this stage RAIF
> > > is mature enough for others to try it out.  The code is available at:
> > >
> > >   
> > >
> > > The code requires no kernel patches and compiles for a wide range of
> > > kernels as a module.  The latest kernel we used it for is 2.6.13 and
> > > we are in the process of porting it to 2.6.19.
> > >
> > > We will be happy to hear back from you.
> >
> > When removing a file from the underlying branch, the oops below happens.
> > Wouldn't it be possible to just fail the branch instead of oopsing?
>
> This is a known problem of all Linux stackable file systems.  Users are
> not supposed to change the file systems below mounted stackable file
> systems (but they can read them).  One of the ways to enforce it is to use
> overlay mounts.  For example, mount the lower file systems at
> /raif/b0 ... /raif/bN and then mount RAIF at /raif.  Stackable file
> systems recently started getting into the kernel and we hope that there
> will be a better solution for this problem in the future.  Having said
> that, you are right: failing the branch would be the right thing to do.

Good.  It seems that there is also some tmpfs/raif-over-nfs deadlock 
situation.  Can't really tell if it's the kernel or the raif, but when do 
you think the patches could be brought into sync with the current mainline?


Thanks!

--
Al


Re: [PATCH 1/4] lumpy reclaim v2

On Wed, 6 Dec 2006 16:59:35 +
Andy Whitcroft <[EMAIL PROTECTED]> wrote:

> + tmp = __pfn_to_page(pfn);

ia64 doesn't implement __pfn_to_page.  Why did you not use pfn_to_page()?

Re: Abolishing the DMCA

2006-12-14 Thread Alexandre Oliva

On Dec 14, 2006, Greg KH <[EMAIL PROTECTED]> wrote:

> I think you missed the point that my patch prevents valid usages of
> non-GPL modules from happening, which is not acceptable.

What if you changed your patch so as to only permit loading of
possibly-infringing drivers after some flag in /proc is set, and to
log to the console a message explaining (i) why such drivers might
be infringing and how to contact the copyright holders to get the
infringement stopped, and (ii) how to get it loaded if you believe
it's ok?

Then the patch would change from a probably-harmful DRM technique to
an educational tool, that wouldn't impose any major inconvenience to
those who are entitled to use the combination of code that can't be
distributed.

-- 
Alexandre Oliva http://www.lsd.ic.unicamp.br/~oliva/
FSF Latin America Board Member http://www.fsfla.org/
Red Hat Compiler Engineer   [EMAIL PROTECTED], gcc.gnu.org}
Free Software Evangelist  [EMAIL PROTECTED], gnu.org}

Re: [PATCH 007 of 14] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses

On Wed, 13 Dec 2006 10:59:11 +1100
NeilBrown <[EMAIL PROTECTED]> wrote:

> From: Chuck Lever <[EMAIL PROTECTED]>
> Expand the rq_addr field to allow it to contain larger addresses.

This patch breaks the NFS server on my heroically modern RH FC1 machine.

There's a mysterious 30-second pause when initscripts are bringing up
mountd.

showmount (from a FC5 client) works:

box:/usr/src/25> 0 showmount -e vmm
Export list for vmm:
/ *
/mnt/hda5 *

But things get really exciting when we try to mount it:


box:/usr/src/25> 0 mount vmm:/mnt/hda5 /mnt 
*** buffer overflow detected ***: mount terminated
=== Backtrace: =
/lib64/libc.so.6(__chk_fail+0x2f)[0x32adbdfaef]
mount[0x40bcf8]
mount[0x4044c5]
mount[0x405850]
mount[0x406388]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32adb1ce54]
mount[0x4034a9]
=== Memory map: 
0040-00414000 r-xp  08:01 3041513 /bin/mount
00513000-00514000 rw-p 00013000 08:01 3041513 /bin/mount
00514000-00516000 rw-p 00514000 00:00 0
00613000-00615000 rw-p 00013000 08:01 3041513 /bin/mount
00615000-00636000 rw-p 00615000 00:00 0  [heap]
32ad90-32ad91a000 r-xp  08:01 1619031 /lib64/ld-2.4.so
32ada19000-32ada1a000 r--p 00019000 08:01 1619031 /lib64/ld-2.4.so
32ada1a000-32ada1b000 rw-p 0001a000 08:01 1619031 /lib64/ld-2.4.so
32adb0-32adc3f000 r-xp  08:01 1619091 /lib64/libc-2.4.so
32adc3f000-32add3f000 ---p 0013f000 08:01 1619091 /lib64/libc-2.4.so
32add3f000-32add43000 r--p 0013f000 08:01 1619091 /lib64/libc-2.4.so
32add43000-32add44000 rw-p 00143000 08:01 1619091 /lib64/libc-2.4.so
32add44000-32add49000 rw-p 32add44000 00:00 0
32ade0-32ade02000 r-xp  08:01 1619011 /lib64/libuuid.so.1.2
32ade02000-32adf02000 ---p 2000 08:01 1619011 /lib64/libuuid.so.1.2
32adf02000-32adf03000 rw-p 2000 08:01 1619011 /lib64/libuuid.so.1.2
32ae00-32ae002000 r-xp  08:01 1619095 /lib64/libdl-2.4.so
32ae002000-32ae102000 ---p 2000 08:01 1619095 /lib64/libdl-2.4.so
32ae102000-32ae103000 r--p 2000 08:01 1619095 /lib64/libdl-2.4.so
32ae103000-32ae104000 rw-p 3000 08:01 1619095 /lib64/libdl-2.4.so
32ae20-32ae20e000 r-xp  08:01 1619005 /lib64/libdevmapper.so.1.02
32ae20e000-32ae30e000 ---p e000 08:01 1619005 /lib64/libdevmapper.so.1.02
32ae30e000-32ae31 rw-p e000 08:01 1619005 /lib64/libdevmapper.so.1.02
32ae40-32ae408000 r-xp  08:01 1619066 /lib64/libblkid.so.1.0
32ae408000-32ae508000 ---p 8000 08:01 1619066 /lib64/libblkid.so.1.0
32ae508000-32ae509000 rw-p 8000 08:01 1619066 /lib64/libblkid.so.1.0
32b060-32b060d000 r-xp  08:01 1619093 /lib64/libgcc_s-4.1.1-20060525.so.1
32b060d000-32b070d000 ---p d000 08:01 1619093 /lib64/libgcc_s-4.1.1-20060525.so.1
32b070d000-32b070e000 rw-p d000 08:01 1619093 /lib64/libgcc_s-4.1.1-20060525.so.1
32b280-32b2814000 r-xp  08:01 1619102 /lib64/libselinux.so.1
32b2814000-32b2913000 ---p 00014000 08:01 1619102 /lib64/libselinux.so.1
32b2913000-32b2915000 rw-p 00013000 08:01 1619102 /lib64/libselinux.so.1
32b2915000-32b2916000 rw-p 32b2915000 00:00 0
32b2a0-32b2a38000 r-xp  08:01 1619101 /lib64/libsepol.so.1
32b2a38000-32b2b37000 ---p 00038000 08:01 1619101 /lib64/libsepol.so.1
32b2b37000-32b2b38000 rw-p 00037000 08:01 1619101 /lib64/libsepol.so.1
32b2b38000-32b2b42000 rw-p 32b2b38000 00:00 0
2b9eea00c000-2b9eea00d000 rw-p 2b9eea00c000 00:00 0
2b9eea032000-2b9eea036000 rw-p 2b9eea032000 00:00 0
2b9eea036000-2b9eea039000 r-xp  08:01 1618858 /lib64/libsetrans.so.0
2b9eea039000-2b9eea138000 ---p 3000 08:01 1618858 /lib64/libsetrans.so.0
2b9eea138000-2b9eea139000 rw-p 2000 08:01 1618858 /lib64/libsetrans.so.0
2b9eea139000-2b9eea143000 r-xp  08:01 1619053 /lib64/libnss_files-2.4.so
2b9eea143000-2b9eea242000 ---p a000 08:01 1619053 /lib64/libnss_files-2.4.so
2b9eea242000-2b9eea243000 r--p 9000 08:01 1619053 /lib64/libnss_files-2.4.so
2b9eea243000-2b9eea244000 rw-p a000 08:01 1619053 /lib64/libnss_files-2.4.so
7fffc0a8800

Re: 2.6.18.3 also 2.6.19 XFS xfs_force_shutdown (was: XFS internal error [...])

2006-12-14 Thread David Chinner

On Thu, Dec 14, 2006 at 06:21:49PM +0900, Shinichiro HIDA wrote:
> Hi,
> 
> ;; Sorry for late, and Thanks for following up.
> 
> > In <[EMAIL PROTECTED]> 
> > David Chinner <[EMAIL PROTECTED]> wrote:
> > On Wed, Dec 13, 2006 at 02:12:23PM +0900, Shinichiro HIDA wrote:
> > > Hi,
> > > 
> > > I met same problem on my 2 machines, 2.6.19 (Debian unstable) also
> > > 2.6.18.3 (Debian stable),
> > Should have been preceded with some other output explaining the
> > reason for the shutdown.

> Dec 12 21:31:25 lune kernel: xfs_da_do_buf: bno 16777216
> Dec 12 21:31:25 lune kernel: dir: inode 9078346
> Dec 12 21:31:25 lune kernel: Filesystem "hdf5": XFS internal error 
> xfs_da_do_buf(1) at line 1995 of file fs/xfs/xfs_da_btree.c.  Caller 
> 0xc02982ec

Ok, that bno (16777216) is a definite sign of corruption
caused by the 2.6.17.x (x <=6) kernels.

> > Did these machines run 2.6.17.x where x<= 6?
> > i.e. is this problem:
> 
> > http://oss.sgi.com/projects/xfs/faq.html#dir2
> 
> Yes, I could boot this machine(lune) with 2.6.17.6. 

I wasn't suggesting that you use this kernel - that could cause more
corruption to occur.  What I was asking is if you have run a kernel
of this version in the past (i.e. before you upgraded to 2.6.18.3)?

Regardless, I suggest you get the latest xfsprogs and run xfs_repair
on your filesystems to fix the problem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

[BUG -rt] scheduling in atomic.

2006-12-14 Thread Steven Rostedt

Ingo,

I've hit this. I compiled the kernel as CONFIG_PREEMPT, and turned off
IRQ's as threads.

BUG: scheduling while atomic: swapper/0x0001/1, CPU#3

Call Trace:
 [] dump_trace+0xaa/0x404
 [] show_trace+0x3c/0x52
 [] dump_stack+0x15/0x17
 [] __sched_text_start+0x8a/0xbb7
 [] schedule+0xd3/0xf3
 [] flush_cpu_workqueue+0x72/0xa4
 [] flush_workqueue+0x6d/0x95
 [] schedule_on_each_cpu+0xe8/0xff
 [] filevec_add_drain_all+0x12/0x14
 [] remove_proc_entry+0xaf/0x258
 [] unregister_handler_proc+0x23/0x48
 [] free_irq+0xda/0x114
 [] i8042_probe+0x338/0x75c
 [] platform_drv_probe+0x12/0x14
 [] really_probe+0x54/0xee
 [] driver_probe_device+0xae/0xba
 [] __device_attach+0x9/0xb
 [] bus_for_each_drv+0x47/0x7d
 [] device_attach+0x65/0x79
 [] bus_attach_device+0x24/0x4c
 [] device_add+0x38f/0x505
 [] platform_device_add+0x11a/0x152
 [] i8042_init+0x2b0/0x30d
 [] init+0x182/0x344
 [] child_rip+0xa/0x12


Seems that we have this in remove_proc_entry:

	spin_lock(&proc_subdir_lock);
	for (p = &parent->subdir; *p; p=&(*p)->next ) {

		[...]

		proc_kill_inodes(de);

		[...]

	}
	spin_unlock(&proc_subdir_lock);

And in proc_kill_inodes:

static void proc_kill_inodes(struct proc_dir_entry *de)
{
	struct file *filp;
	struct super_block *sb = proc_mnt->mnt_sb;

	/*
	 * Actually it's a partial revoke().
	 */
	filevec_add_drain_all();

	[...]
}

and in filevec_add_drain_all:

int filevec_add_drain_all(void)
{
	return schedule_on_each_cpu(filevec_add_drain_per_cpu, NULL);
}


And schedule_on_each_cpu() can easily schedule: per the trace above it
goes through flush_workqueue() and ends up in schedule().

So it seems that it schedules while holding a spin lock.

I don't know this code very well, and don't have time to look too deep
into it, but I figure that I would report it.

-- Steve





[PATCH 2.6.20-rc1] fix vm_events_fold_cpu() build breakage

2006-12-14 Thread Magnus Damm

fix vm_events_fold_cpu() build breakage

2.6.20-rc1 does not build properly if CONFIG_VM_EVENT_COUNTERS is set
and CONFIG_HOTPLUG is unset:

  CC  init/version.o
  LD  init/built-in.o
  LD  .tmp_vmlinux1
mm/built-in.o: In function `page_alloc_cpu_notify':
page_alloc.c:(.text+0x56eb): undefined reference to `vm_events_fold_cpu'
make: *** [.tmp_vmlinux1] Error 1

Signed-Off-By: Magnus Damm <[EMAIL PROTECTED]>
---

 Applies on top of linux-2.6.20-rc1.

 include/linux/vmstat.h |4 
 1 file changed, 4 insertions(+)

--- 0001/include/linux/vmstat.h
+++ 0003/include/linux/vmstat.h 2006-12-15 11:46:23.0 +0900
@@ -73,7 +73,11 @@ static inline void count_vm_events(enum 
 }
 
 extern void all_vm_events(unsigned long *);
+#ifdef CONFIG_HOTPLUG
 extern void vm_events_fold_cpu(int cpu);
+#else
+#define vm_events_fold_cpu(x)  do { } while (0)
+#endif
 
 #else
 
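
The stub pattern used by this patch can be tried outside the kernel.  The
following is only a user-space sketch under stated assumptions: the
folded_cpus counter and notify_cpu_dead() are hypothetical stand-ins, and
only the #ifdef/#else shape is taken from the diff above.

```c
/* Hypothetical counter standing in for the folded per-CPU event state. */
static int folded_cpus;

#ifdef CONFIG_HOTPLUG
/* Model of the real function: only built when the option is set. */
static void vm_events_fold_cpu(int cpu)
{
	(void)cpu;
	folded_cpus++;
}
#else
/* Stub: expands to an empty statement, so callers need no #ifdef. */
#define vm_events_fold_cpu(x)	do { } while (0)
#endif

/* Hypothetical caller: compiles and links in both configurations. */
static int notify_cpu_dead(int cpu)
{
	vm_events_fold_cpu(cpu);
	return folded_cpus;
}
```

Built without -DCONFIG_HOTPLUG the call compiles away and no symbol is
referenced at link time, which is what removes the undefined reference.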

Re: ieee1394 in 2.6.20-rc1 (was Re: Linux 2.6.20-rc1)

2006-12-14 Thread Gene Heskett

On Thursday 14 December 2006 12:48, Stefan Richter wrote:
[...]
>
>(Anyway, that's unrelated to Gene's issues.)

And which I haven't had a chance to check yet; the camera is still in the
truck and I've been busier than a one legged man in an ass kicking
contest today.  I did get 2.6.20-rc1 built and it's what's running, but
that is as far as I got; too many other honeydo's.  Tomorrow hopefully.
If I don't wind up using a backhoe for a divining rod, looking for our
sewer, which is beginning to nag us occasionally.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.

2.6.18 mmap hangs unrelated apps

2006-12-14 Thread Michal Sabala

Hello LKML,

I am observing processes entering uninterruptible sleep apparently due
to an unrelated application using mmap over nfs. Applications in
"uninterruptible sleep" hang indefinitely while other applications
continue working properly.

The code causing the mmap nfs hangs does the following:
(as replicated by the included test-mmap.c file)

  1. create file on nfs (file_A, descr_A)
  2. make file_A a sparse 200MB file
  3. mmap descr_A
  4. close descr_A
  5. unlink file_A
  6. memcpy 200MB to mmaped buffer
  7. create a second file on nfs (file_B, descr_B)
  8. write() 200MB from mmaped buffer to descr_B
  9. close descr_B
  10. munmap first file

This code may need to be run tens to hundreds of times to trigger the condition.

During the execution of the above code, unrelated applications enter
uninterruptible sleep (D) - usually firefox2.0, Xorg/XFree86, gimp2.2, gconfd
or bash; probably the most active processes.

`dmesg` shows nothing of interest.

`free` shows anywhere between 1MB and 80MB of memory still remaining
free when the problem occurs.

`cat /proc/*PID*/wchan` for all hanging processes contains page_sync.

* Client Setups:

  Linux 2.6.18 debian kernel (not tainted)
  Intel P3/800
  512MB ram
  0 swap
  NFS root (rw,noatime,rsize=8192,wsize=8192,nfsvers=3,hard,lock,udp)
  NIC: 100mbit tulip Cardbus
  NFS server is Linux 2.6.8 (debian)
  Gnome running with ooffice, gimp2.2 and firefox2 open

  and

  Linux 2.6.18 debian kernel (not tainted)
  Intel P4/2.8
  mem=192M boot option
  0 swap
  NFS home (rw,nosuid,rsize=8192,wsize=8192,hard)
  NIC: 100mbit e100 PCI
  NFS server is Apple OSX 10.3
  Gnome running with ooffice, gimp2.2 and firefox2 open

This happens with NFS servers based on Linux 2.6.8 and OSX 10.3.x. There
is nothing unusual in the server log files. Other than large nfs mmaps
on limited ram clients, NFS clients are 100% stable (file locking, performance,
6 month uptimes, etc..)

NOTE:
  I also ran the same code on the P4 machine in /tmp (local disk)
and it too caused some applications to enter uninterruptible sleep
(dozens of consecutive runs were needed). As such this looks not to
be directly related to nfs.

I would like to assist in any way I can in tracking this bug. I am open
to running patched kernels, etc...

Thank You,
 Sincerely,

   Michal Sabala



PS. thank you for all the hard work on the Linux kernel.


--- test-mmap.c: 

/*
 * The header names were lost in the archive; the ones below are those
 * the code needs, not necessarily the original list.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>

#include <sys/mman.h>

int main (int argc, char * argv[] ){

  char * data = 0;
  int blocks = 12800;
  int bSize = 16384;

  /* mkstemp() requires the template to end in six X characters */
  char mmapFileName[] = "temp-XXXXXX";
  int mmapFileDes = mkstemp( mmapFileName );
  if ( mmapFileDes == -1 ){
    printf( "cannot make temporary file %s !\n", mmapFileName );
    exit( -1 );
  }

  printf( "using desc %d tempfile %s\n", mmapFileDes, mmapFileName );

  /* seek to the last byte so the single write below makes a sparse file */
  errno = 0;
  if ( lseek( mmapFileDes, (blocks*bSize)-1, SEEK_SET ) == -1 ){
    if ( errno != 0 ){
      perror ( "lseek error: " );
    }
    printf(  "cannot lseek tempfile %s !\n", mmapFileName);
    close( mmapFileDes );
    unlink( mmapFileName );
    exit( -1 );
  }

  if ( write( mmapFileDes, "X", 1 ) != 1 ){
    printf(  "cannot sparse write tempfile %s !\n", mmapFileName);
    close( mmapFileDes );
    unlink( mmapFileName );
    exit( -1 );
  }

  data = mmap ( NULL, (blocks*bSize), PROT_READ | PROT_WRITE, MAP_SHARED,
                mmapFileDes, 0 );
  if ( data == MAP_FAILED ){
    printf(  "mmap of %s failed!\n", mmapFileName );
    close( mmapFileDes );
    unlink( mmapFileName );
    exit( -1 );
  }

  printf( "block size: %d, blocks num: %d\n", bSize, blocks);

  /* the mapping stays valid after the descriptor is closed and the file unlinked */
  close( mmapFileDes );
  unlink( mmapFileName );

  /* dirty every page of the mapping */
  int i;
  char * ptr = data;
  for ( i = 1; i <= blocks; i++ ){
    printf( "wrote %d of %d blocks to %s\n", i, blocks, mmapFileName );
    memset( ptr, 0, bSize );
    ptr += bSize;
  }

  // msync( data, blocks*bSize, MS_SYNC );

  char destFile[] = "destination-XXXXXX";
  int destDes = mkstemp( destFile );
  if ( destDes == -1 ){
    printf( "cannot make destination file %s !\n", destFile );
    exit( -1 );
  }

  printf( "using desc %d destfile %s\n", destDes, destFile);

  /* copy the mapped buffer to the second file, one block at a time */
  ptr = data;
  for ( i = 1; i <= blocks; i++ ){
    int wLen = write( destDes, ptr, bSize );
    printf( "wrote %d of %d blocks to %s\n", i, blocks, destFile );
    if ( wLen != bSize ){
      printf( "debug: short write to %s at %d bytes\n", destFile, wLen );
    }
    ptr += bSize;
  }

  close( destDes );

  munmap( data, blocks*bSize );

  exit( 0 );
}

-- 
Michal "Saahbs" Sabala

RE: 2.6.18.4: flush_workqueue calls mutex_lock in interrupt environment

2006-12-14 Thread Chen, Kenneth W

Andrew Morton wrote on Thursday, December 14, 2006 5:20 PM
> it's hard to disagree.
> 
> Begin forwarded message:
> > On Wed, 2006-12-13 at 08:25 +0100, xb wrote:
> > > Hi all,
> > > 
> > > Running some IO stress tests on a 8*ways IA64 platform, we got:
> > >  BUG: warning at kernel/mutex.c:132/__mutex_lock_common()  message
> > > followed by:
> > >  Unable to handle kernel paging request at virtual address
> > > 00200200
> > > oops corresponding to anon_vma_unlink() calling list_del() on a
> > > poisoned list.
> > > 
> > > Having a look at the stack, we see that flush_workqueue() calls
> > > mutex_lock() with softirqs disabled.
> > 
> > something is wrong here... flush_workqueue() is a sleeping function and
> > is not allowed to be called in such a context!
> 
> It seems utterly insane to have aio_complete() flush a workqueue. That
> function has to be called from a number of different environments,
> including non-sleep tolerant environments.
> 
> For instance it means that directIO on NFS will now cause the rpciod
> workqueues to call flush_workqueue(aio_wq), thus slowing down all RPC
> activity.

The bug appears to be somewhere else: somehow the ref count on ioctx is
all messed up.

In aio_complete, __put_ioctx() should not be invoked, because the ref
count on ioctx is supposedly more than 2; aio_complete decrements it once
and should return without invoking the free function.

The real freeing of ioctx should come from exit_aio() or io_destroy(),
both of which wait until there are no further pending AIO requests via
wait_for_all_aios().

- Ken

[patch 09/24] PKT_SCHED act_gact: division by zero

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: David Miller <[EMAIL PROTECTED]>

Not returning -EINVAL, because someone might want to use the value
zero in some future gact_prob algorithm?

Signed-off-by: Kim Nordlund <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/sched/act_gact.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.18.5.orig/net/sched/act_gact.c
+++ linux-2.6.18.5/net/sched/act_gact.c
@@ -54,14 +54,14 @@ static DEFINE_RWLOCK(gact_lock);
 #ifdef CONFIG_GACT_PROB
 static int gact_net_rand(struct tcf_gact *p)
 {
-	if (net_random()%p->pval)
+	if (!p->pval || net_random()%p->pval)
 		return p->action;
 	return p->paction;
 }
 
 static int gact_determ(struct tcf_gact *p)
 {
-	if (p->bstats.packets%p->pval)
+	if (!p->pval || p->bstats.packets%p->pval)
 		return p->action;
 	return p->paction;
 }

--
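
The guard in this patch relies on C's short-circuit evaluation: when pval
is zero the modulo is never evaluated and the default action is returned.
A minimal user-space sketch, assuming a cut-down stand-in structure
(gact_model and gact_determ_model are hypothetical names, not kernel code):

```c
/* Hypothetical stand-in for the fields of struct tcf_gact used by the patch. */
struct gact_model {
	unsigned int pval;	/* divisor; 0 means "not configured" */
	int action;		/* default action */
	int paction;		/* alternate action, taken when packets % pval == 0 */
	unsigned long packets;	/* models bstats.packets */
};

/* Same condition as the patched gact_determ(). */
static int gact_determ_model(const struct gact_model *p)
{
	/* The right-hand side is skipped when pval == 0, so no division by zero. */
	if (!p->pval || p->packets % p->pval)
		return p->action;
	return p->paction;
}
```

With pval set to 0 the model always returns action; with pval set to 5 it
returns paction only when the packet count is a multiple of 5.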

[patch 11/24] XFRM: Use output device disable_xfrm for forwarded packets

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: David Miller <[EMAIL PROTECTED]>

Currently the behaviour of disable_xfrm is inconsistent between
locally generated and forwarded packets. For locally generated
packets disable_xfrm disables the policy lookup if it is set on
the output device, for forwarded traffic however it looks at the
input device. This makes it impossible to disable xfrm on all
devices but a dummy device and use normal routing to direct
traffic to that device.

Always use the output device when checking disable_xfrm.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
commit 9be2b4e36fb04bbc968693ef95a75acc17cf2931
Author: Patrick McHardy <[EMAIL PROTECTED]>
Date:   Mon Dec 4 19:59:00 2006 -0800

 net/ipv4/route.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.18.5.orig/net/ipv4/route.c
+++ linux-2.6.18.5/net/ipv4/route.c
@@ -1775,7 +1775,7 @@ static inline int __mkroute_input(struct
 #endif
 	if (in_dev->cnf.no_policy)
 		rth->u.dst.flags |= DST_NOPOLICY;
-	if (in_dev->cnf.no_xfrm)
+	if (out_dev->cnf.no_xfrm)
 		rth->u.dst.flags |= DST_NOXFRM;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;

--
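
The use case described above (xfrm disabled everywhere except on a dummy
device) can be modelled in a few lines.  This is only an illustrative
sketch: dev_model, MODEL_NOXFRM and forward_flags() are hypothetical
names, and the only thing taken from the patch is that the flag is now
derived from the output device.

```c
/* Hypothetical flag bit standing in for DST_NOXFRM. */
enum { MODEL_NOXFRM = 1 };

/* Hypothetical per-device setting standing in for cnf.no_xfrm. */
struct dev_model {
	int no_xfrm;
};

/* Flags for a forwarded packet: decided by the output device only. */
static unsigned int forward_flags(const struct dev_model *in_dev,
				  const struct dev_model *out_dev)
{
	(void)in_dev;	/* before the patch, this device was consulted instead */
	return out_dev->no_xfrm ? MODEL_NOXFRM : 0;
}
```

With no_xfrm set on a physical device and clear on the dummy device,
traffic routed out of the dummy device still gets the policy lookup, no
matter which device it arrived on.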

[patch 21/24] softirq: remove BUG_ONs which can incorrectly trigger

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Zachary Amsden <[EMAIL PROTECTED]>

It is possible to have tasklets get scheduled before softirqd has had a chance
to spawn on all CPUs.  This is totally harmless; after success during action
CPU_UP_PREPARE, action CPU_ONLINE will be called, which immediately wakes
softirqd on the appropriate CPU to process the already pending tasklets.  So
there is no danger of having a missed wakeup for any tasklets that were
already pending.

In particular, i386 is affected by this during startup, and is visible when
using a very large initrd; during the time it takes for the initrd to be
decompressed, a timer IRQ can come in and schedule RCU callbacks.  It is also
possible that resending of a hardware IRQ via a softirq triggers the same bug.

Because of different timing conditions, this shows up in all emulators and
virtual machines tested, including Xen, VMware, Virtual PC, and Qemu.  It is
also possible to trigger on native hardware with a large enough initrd,
although I don't have a reliable case demonstrating that.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

 kernel/softirq.c |2 --
 1 file changed, 2 deletions(-)

--- linux-2.6.18.5.orig/kernel/softirq.c
+++ linux-2.6.18.5/kernel/softirq.c
@@ -574,8 +574,6 @@ static int __cpuinit cpu_callback(struct
 
 	switch (action) {
 	case CPU_UP_PREPARE:
-		BUG_ON(per_cpu(tasklet_vec, hotcpu).list);
-		BUG_ON(per_cpu(tasklet_hi_vec, hotcpu).list);
 		p = kthread_create(ksoftirqd, hcpu, "ksoftirqd/%d", hotcpu);
 		if (IS_ERR(p)) {
 			printk("ksoftirqd for %i failed\n", hotcpu);

--

[patch 14/24] IrDA: Incorrect TTP header reservation

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Jeet Chaudhuri <[EMAIL PROTECTED]>

We must reserve SAR + MAX_HEADER bytes for IrLMP to fit in.
This fixes an oops reported (and fixed) by Jeet Chaudhuri, when max_sdu_size
is greater than 0.

Signed-off-by: Samuel Ortiz <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---
 net/irda/irttp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.18.5.orig/net/irda/irttp.c
+++ linux-2.6.18.5/net/irda/irttp.c
@@ -1098,7 +1098,7 @@ int irttp_connect_request(struct tsap_cb
return -ENOMEM;
 
/* Reserve space for MUX_CONTROL and LAP header */
-   skb_reserve(tx_skb, TTP_MAX_HEADER);
+   skb_reserve(tx_skb, TTP_MAX_HEADER + TTP_SAR_HEADER);
} else {
tx_skb = userdata;
/*
@@ -1346,7 +1346,7 @@ int irttp_connect_response(struct tsap_c
return -ENOMEM;
 
/* Reserve space for MUX_CONTROL and LAP header */
-   skb_reserve(tx_skb, TTP_MAX_HEADER);
+   skb_reserve(tx_skb, TTP_MAX_HEADER + TTP_SAR_HEADER);
} else {
tx_skb = userdata;
/*


Re: [PATCH] procfs: Fix race between proc_readdir and remove_proc_entry

2006-12-14 Thread Darrick J. Wong

Oops, sent a corrupt and old version of the patch.  Here's
the correct patch.

While running an insmod/rmmod loop with the mptsas driver
(vanilla 2.6.19, IBM Intellistation Z30, SAS1064E controller
if it matters), I encountered a bad dereference of the
pointer "de":

spin_unlock(&proc_subdir_lock);
if (filldir(dirent, de->name, de->namelen, filp->f_pos,
de->low_ino, de->mode >> 12) < 0)
goto out;
spin_lock(&proc_subdir_lock);
filp->f_pos++;
de = de->next;

I believe what's happening here is that proc_readdir drops
proc_subdir_lock to call filldir() on the /proc/mpt directory
at the same time mptbase is being unloaded.  The unload causes
the removal of /proc/mpt, which means that de is overwritten
with the slab poison value as it is being freed.  We reacquire
the lock and try to grab the next value of de, but by then the
next pointer has been lost, and we crash.
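
The pinning idea can be sketched in userspace as follows.  This is only an
illustration under stated assumptions: struct entry, entry_get(), entry_put(),
entry_remove() and walk() are hypothetical stand-ins rather than the procfs
code, there is no real locking, and the visitor callback plays the role of
whatever runs while the lock is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct proc_dir_entry. */
struct entry {
    int id;
    int refs;            /* use count, in the spirit of de->count */
    int dead;            /* removal was requested while still in use */
    struct entry *next;
};

int freed_count;         /* counts frees so the behaviour can be observed */

struct entry *entry_new(int id, struct entry *next)
{
    struct entry *e = calloc(1, sizeof(*e));
    assert(e != NULL);
    e->id = id;
    e->next = next;
    return e;
}

void entry_get(struct entry *e)
{
    e->refs++;
}

/* Drop a reference; free the entry if removal was deferred until now. */
void entry_put(struct entry *e)
{
    if (--e->refs == 0 && e->dead) {
        free(e);
        freed_count++;
    }
}

/* Unlink e from the list; free it now, or defer until the last put. */
void entry_remove(struct entry **head, struct entry *e)
{
    struct entry **pp = head;

    while (*pp && *pp != e)
        pp = &(*pp)->next;
    if (*pp)
        *pp = e->next;   /* e->next itself stays readable while e is pinned */
    e->dead = 1;
    if (e->refs == 0) {
        free(e);
        freed_count++;
    }
}

/* Walk the list; visit() may remove the entry it is handed.
 * Returns the number of entries visited. */
int walk(struct entry **head, void (*visit)(struct entry **, struct entry *))
{
    struct entry *e = *head, *next;
    int n = 0;

    while (e) {
        entry_get(e);    /* pin before the "lock" would be dropped */
        visit(head, e);
        next = e->next;  /* safe: e cannot have been freed yet */
        entry_put(e);    /* may free e if it was removed meanwhile */
        e = next;
        n++;
    }
    return n;
}

/* Example visitor: removes entry 2 while it is being visited. */
void remove_two(struct entry **head, struct entry *e)
{
    if (e->id == 2)
        entry_remove(head, e);
}
```

Calling walk(&head, remove_two) on a three-entry list visits all three
entries, and entry 2 is freed only by the entry_put() that follows the read
of its next pointer.  The sketch does not cover the case where the entry
after the pinned one is removed as well.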

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 fs/proc/generic.c  |    7 +++++--
 fs/proc/inode.c|4 ++--
 fs/proc/internal.h |3 +++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 4ba0300..7e77d7e 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -429,7 +429,7 @@ struct dentry *proc_lookup(struct inode 
 int proc_readdir(struct file * filp,
void * dirent, filldir_t filldir)
 {
-   struct proc_dir_entry * de;
+   struct proc_dir_entry * de, *next;
unsigned int ino;
int i;
struct inode *inode = filp->f_dentry->d_inode;
@@ -477,13 +477,16 @@ int proc_readdir(struct file * filp,
 
do {
/* filldir passes info to user space */
+   de_get(de);
spin_unlock(&proc_subdir_lock);
if (filldir(dirent, de->name, de->namelen, filp->f_pos,
de->low_ino, de->mode >> 12) < 0)
goto out;
spin_lock(&proc_subdir_lock);
filp->f_pos++;
-   de = de->next;
+   next = de->next;
+   de_put(de);
+   de = next;
} while (de);
spin_unlock(&proc_subdir_lock);
}
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 49dfb2a..4b5a61c 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -21,7 +21,7 @@ #include 
 
 #include "internal.h"
 
-static inline struct proc_dir_entry * de_get(struct proc_dir_entry *de)
+struct proc_dir_entry * de_get(struct proc_dir_entry *de)
 {
if (de)
atomic_inc(&de->count);
@@ -31,7 +31,7 @@ static inline struct proc_dir_entry * de
 /*
  * Decrements the use count and checks for deferred deletion.
  */
-static void de_put(struct proc_dir_entry *de)
+void de_put(struct proc_dir_entry *de)
 {
if (de) {   
lock_kernel();  
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 987c773..f4751ac 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -65,3 +65,6 @@ static inline int proc_fd(struct inode *
 {
return PROC_I(inode)->fd;
 }
+
+struct proc_dir_entry * de_get(struct proc_dir_entry *de);
+void de_put(struct proc_dir_entry *de);


[patch 08/24] NETFILTER: ip_tables: revision support for compat code

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 79030ed07de673e8451a03aecb9ada9f4d75d491
tree 4ba8bd843c8bc95db0ea6877880b73d06da620e5
parent bec71b162747708d4b45b0cd399b484f52f2901a
author Patrick McHardy <[EMAIL PROTECTED]> Wed, 20 Sep 2006 12:05:08 -0700
committer David S. Miller <[EMAIL PROTECTED]> Fri, 22 Sep 2006 15:20:00 -0700

 net/ipv4/netfilter/ip_tables.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- linux-2.6.18.5.orig/net/ipv4/netfilter/ip_tables.c
+++ linux-2.6.18.5/net/ipv4/netfilter/ip_tables.c
@@ -1989,6 +1989,8 @@ compat_get_entries(struct compat_ipt_get
return ret;
 }
 
+static int do_ipt_get_ctl(struct sock *, int, void __user *, int *);
+
 static int
 compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
@@ -2005,8 +2007,7 @@ compat_do_ipt_get_ctl(struct sock *sk, i
ret = compat_get_entries(user, len);
break;
default:
-   duprintf("compat_do_ipt_get_ctl: unknown request %i\n", cmd);
-   ret = -EINVAL;
+   ret = do_ipt_get_ctl(sk, cmd, user, len);
}
return ret;
 }


[patch 04/24] EBTABLES: Deal with the worst-case behaviour in loop checks.

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Al Viro <[EMAIL PROTECTED]>

No need to revisit a chain we'd already finished with during
the check for the current hook.  It's either an instant loop (which
we'd just detected) or duplicate work.
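
As a rough illustration of why skipping finished chains matters, here is a
small recursive sketch.  It is not the ebtables code, which walks the chains
iteratively with an explicit stack; struct chains, check_from() and the
fixed-size jump table are assumptions made for the example.  A chain that is
currently being walked signals a loop, and a chain already finished for this
hook is skipped instead of being walked again.

```c
#include <assert.h>

#define NCHAINS  4
#define MAXJUMPS 2

/* Hypothetical chain graph: jumps[c][k] is a chain jumped to from c, or -1. */
struct chains {
    int jumps[NCHAINS][MAXJUMPS];
    int on_stack[NCHAINS];            /* being walked now: a jump here is a loop */
    unsigned int done_mask[NCHAINS];  /* bit h set: fully checked for hook h */
    int visits;                       /* number of chain walks, for illustration */
};

/* Returns -1 if a loop is reachable from chain c, 0 otherwise. */
int check_from(struct chains *g, int c, int hook)
{
    int k, ret = 0;

    assert(c >= 0 && c < NCHAINS);

    if (g->on_stack[c])
        return -1;                    /* loop detected */
    if (g->done_mask[c] & (1u << hook))
        return 0;                     /* already finished for this hook */

    g->visits++;
    g->on_stack[c] = 1;
    for (k = 0; k < MAXJUMPS && ret == 0; k++)
        if (g->jumps[c][k] >= 0)
            ret = check_from(g, g->jumps[c][k], hook);
    g->on_stack[c] = 0;
    g->done_mask[c] |= 1u << hook;
    return ret;
}
```

With a diamond-shaped graph (0 jumps to 1 and 2, both of which jump to 3),
chain 3 is walked once rather than twice.  Without the done_mask test the
number of walks can grow very quickly when such diamonds are stacked.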

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/bridge/netfilter/ebtables.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- linux-2.6.18.5.orig/net/bridge/netfilter/ebtables.c
+++ linux-2.6.18.5/net/bridge/netfilter/ebtables.c
@@ -739,7 +739,9 @@ static int check_chainloops(struct ebt_e
BUGPRINT("loop\n");
return -1;
}
-   /* this can't be 0, so the above test is correct */
+   if (cl_s[i].hookmask & (1 << hooknr))
+   goto letscontinue;
+   /* this can't be 0, so the loop test is correct */
cl_s[i].cs.n = pos + 1;
pos = 0;
cl_s[i].cs.e = ((void *)e + e->next_offset);


[patch 05/24] EBTABLES: Prevent wraparounds in checks for entry components sizes.

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Al Viro <[EMAIL PROTECTED]>

---
 net/bridge/netfilter/ebtables.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

--- linux-2.6.18.5.orig/net/bridge/netfilter/ebtables.c
+++ linux-2.6.18.5/net/bridge/netfilter/ebtables.c
@@ -360,10 +360,11 @@ ebt_check_match(struct ebt_entry_match *
const char *name, unsigned int hookmask, unsigned int *cnt)
 {
struct ebt_match *match;
+   size_t left = ((char *)e + e->watchers_offset) - (char *)m;
int ret;
 
-   if (((char *)m) + m->match_size + sizeof(struct ebt_entry_match) >
-  ((char *)e) + e->watchers_offset)
+   if (left < sizeof(struct ebt_entry_match) ||
+   left - sizeof(struct ebt_entry_match) < m->match_size)
return -EINVAL;
match = find_match_lock(m->u.name, &ret, &ebt_mutex);
if (!match)
@@ -389,10 +390,11 @@ ebt_check_watcher(struct ebt_entry_watch
const char *name, unsigned int hookmask, unsigned int *cnt)
 {
struct ebt_watcher *watcher;
+   size_t left = ((char *)e + e->target_offset) - (char *)w;
int ret;
 
-   if (((char *)w) + w->watcher_size + sizeof(struct ebt_entry_watcher) >
-  ((char *)e) + e->target_offset)
+   if (left < sizeof(struct ebt_entry_watcher) ||
+  left - sizeof(struct ebt_entry_watcher) < w->watcher_size)
return -EINVAL;
watcher = find_watcher_lock(w->u.name, &ret, &ebt_mutex);
if (!watcher)
@@ -595,6 +597,7 @@ ebt_check_entry(struct ebt_entry *e, str
struct ebt_entry_target *t;
struct ebt_target *target;
unsigned int i, j, hook = 0, hookmask = 0;
+   size_t gap = e->next_offset - e->target_offset;
int ret;
 
/* don't mess with the struct ebt_entries */
@@ -656,8 +659,7 @@ ebt_check_entry(struct ebt_entry *e, str
 
t->u.target = target;
if (t->u.target == &ebt_standard_target) {
-   if (e->target_offset + sizeof(struct ebt_standard_target) >
-  e->next_offset) {
+   if (gap < sizeof(struct ebt_standard_target)) {
BUGPRINT("Standard target size too big\n");
ret = -EFAULT;
goto cleanup_watchers;
@@ -668,8 +670,7 @@ ebt_check_entry(struct ebt_entry *e, str
ret = -EFAULT;
goto cleanup_watchers;
}
-   } else if ((e->target_offset + t->target_size +
-  sizeof(struct ebt_entry_target) > e->next_offset) ||
+   } else if (t->target_size > gap - sizeof(struct ebt_entry_target) ||
   (t->u.target->check &&
   t->u.target->check(name, hookmask, e, t->data, t->target_size) != 0)){
module_put(t->u.target->me);


[patch 06/24] NET_SCHED: policer: restore compatibility with old iproute binaries

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Patrick McHardy <[EMAIL PROTECTED]>

The tc actions increased the size of struct tc_police, which broke
compatibility with old iproute binaries since both the act_police
and the old NET_CLS_POLICE code check for an exact size match.

Since the new members are not even used, the simple fix is to also
accept the size of the old structure. Dumping is not affected since
old userspace will receive a bigger structure, which is handled fine.
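
The size test described above can be sketched in isolation.  The two
structures below are hypothetical and much smaller than struct tc_police;
the only point is that a payload is accepted when its length matches either
the current layout or the older, shorter one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old layout, as older userspace would send it. */
struct params_old {
    uint32_t index;
    uint32_t limit;
};

/* Hypothetical current layout: the old fields plus members added later. */
struct params_new {
    uint32_t index;
    uint32_t limit;
    uint32_t extra[2];   /* newer members, not read by the parser */
};

/* Returns 0 if len matches either known layout, -1 otherwise. */
int check_payload_size(size_t len)
{
    /* The compat layout must be the shorter one for this scheme to make sense. */
    assert(sizeof(struct params_old) < sizeof(struct params_new));

    if (len != sizeof(struct params_new) && len != sizeof(struct params_old))
        return -1;
    return 0;
}
```

This only works because the added members are not read; a parser that did
read them would have to treat the short payload separately.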

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
Acked-by: Jamal Hadi Salim <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/sched/act_police.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

--- linux-2.6.18.5.orig/net/sched/act_police.c
+++ linux-2.6.18.5/net/sched/act_police.c
@@ -44,6 +44,18 @@ static struct tcf_police *tcf_police_ht[
 /* Policer hash table lock */
 static DEFINE_RWLOCK(police_lock);
 
+/* old policer structure from before tc actions */
+struct tc_police_compat
+{
+   u32 index;
+   int action;
+   u32 limit;
+   u32 burst;
+   u32 mtu;
+   struct tc_ratespec  rate;
+   struct tc_ratespec  peakrate;
+};
+
 /* Each policer is serialized by its individual spinlock */
 
 static __inline__ unsigned tcf_police_hash(u32 index)
@@ -169,12 +181,15 @@ static int tcf_act_police_locate(struct 
struct tc_police *parm;
struct tcf_police *p;
struct qdisc_rate_table *R_tab = NULL, *P_tab = NULL;
+   int size;
 
if (rta == NULL || rtattr_parse_nested(tb, TCA_POLICE_MAX, rta) < 0)
return -EINVAL;
 
-   if (tb[TCA_POLICE_TBF-1] == NULL ||
-   RTA_PAYLOAD(tb[TCA_POLICE_TBF-1]) != sizeof(*parm))
+   if (tb[TCA_POLICE_TBF-1] == NULL)
+   return -EINVAL;
+   size = RTA_PAYLOAD(tb[TCA_POLICE_TBF-1]);
+   if (size != sizeof(*parm) && size != sizeof(struct tc_police_compat))
return -EINVAL;
parm = RTA_DATA(tb[TCA_POLICE_TBF-1]);
 
@@ -413,12 +428,15 @@ struct tcf_police * tcf_police_locate(st
struct tcf_police *p;
struct rtattr *tb[TCA_POLICE_MAX];
struct tc_police *parm;
+   int size;
 
if (rtattr_parse_nested(tb, TCA_POLICE_MAX, rta) < 0)
return NULL;
 
-   if (tb[TCA_POLICE_TBF-1] == NULL ||
-   RTA_PAYLOAD(tb[TCA_POLICE_TBF-1]) != sizeof(*parm))
+   if (tb[TCA_POLICE_TBF-1] == NULL)
+   return NULL;
+   size = RTA_PAYLOAD(tb[TCA_POLICE_TBF-1]);
+   if (size != sizeof(*parm) && size != sizeof(struct tc_police_compat))
return NULL;
 
parm = RTA_DATA(tb[TCA_POLICE_TBF-1]);


[patch 20/24] skip data conversion in compat_sys_mount when data_page is NULL

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Andrey Mirkin <[EMAIL PROTECTED]>

The OpenVZ Linux kernel team has found a problem with mounting in compat mode.

A simple command, "mount -t smbfs ...", on the Fedora Core 5 distro in 32-bit
mode leads to an oops:

Unable to handle kernel NULL pointer dereference at  RIP:
[] compat_sys_mount+0xd6/0x290
PGD 34d48067 PUD 34d03067 PMD 0
Oops:  [1] SMP
CPU: 0
Modules linked in: iptable_nat simfs smbfs ip_nat ip_conntrack vzdquota
parport_pc lp parport 8021q bridge llc vznetdev vzmon nfs lockd sunrpc vzdev
iptable_filter af_packet xt_length ipt_ttl xt_tcpmss ipt_TCPMSS
iptable_mangle xt_limit ipt_tos ipt_REJECT ip_tables x_tables thermal
processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801
i2c_core e100 mii floppy ide_cd cdrom
Pid: 14656, comm: mount
RIP: 0060:[]  []
compat_sys_mount+0xd6/0x290
RSP: :810034d31f38  EFLAGS: 00010292
RAX: 002c RBX:  RCX: 
RDX: 810034c86bc0 RSI: 0096 RDI: 8061fc90
RBP: 810034d31f78 R08:  R09: 000d
R10: 810034d31e58 R11: 0001 R12: 810039dc3000
R13: 0805ea48 R14:  R15: c0ed
FS:  () GS:80749000(0033) knlGS:b7d556b0
CS:  0060 DS: 007b ES: 007b CR0: 8005003b
CR2:  CR3: 34d43000 CR4: 06e0
Process mount (pid: 14656, veid=300, threadinfo 810034d3, task
810034c86bc0)
Stack:   810034dd 810034e4a000 0805ea48
    
 0805ea48 8021e64e  
Call Trace:
 [] ia32_sysret+0x0/0xa

Code: 83 3b 06 0f 85 41 01 00 00 0f b7 43 0c 89 43 14 0f b7 43 0a
RIP  [] compat_sys_mount+0xd6/0x290
 RSP 
CR2: 

The problem is that the data_page pointer can be NULL, so we should skip data
conversion in this case.
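
A stripped-down userspace sketch of the guard follows.  The names
maybe_convert() and convert_smb_data() are hypothetical, as is the converter
body; the sketch only shows the shape of the check, in which the conversion
is attempted when both the type string and the data are present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

int conversions;   /* counts converter calls, for illustration only */

/* Hypothetical converter; like the real one it assumes data is valid. */
void convert_smb_data(void *data)
{
    assert(data != NULL);
    conversions++;
}

/* Convert filesystem-specific mount data only if type and data both exist.
 * Returns 1 if a conversion ran, 0 otherwise. */
int maybe_convert(const char *type, void *data)
{
    if (type && data) {
        if (strcmp(type, "smbfs") == 0) {
            convert_smb_data(data);
            return 1;
        }
    }
    return 0;
}
```

A call such as maybe_convert("smbfs", NULL) now returns 0 without touching
the converter.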

Signed-off-by: Andrey Mirkin <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

 fs/compat.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.18.5.orig/fs/compat.c
+++ linux-2.6.18.5/fs/compat.c
@@ -873,7 +873,7 @@ asmlinkage long compat_sys_mount(char __
 
retval = -EINVAL;
 
-   if (type_page) {
+   if (type_page && data_page) {
if (!strcmp((char *)type_page, SMBFS_NAME)) {
do_smb_super_data_conv((void *)data_page);
} else if (!strcmp((char *)type_page, NCPFS_NAME)) {


[patch 18/24] ieee1394: ohci1394: add PPC_PMAC platform code to driver probe

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Stefan Richter <[EMAIL PROTECTED]>

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=7431
iBook G3 threw a machine check exception and put the display backlight
to full brightness after ohci1394 was unloaded and reloaded.

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
[EMAIL PROTECTED]: also added missing if condition, commit
 63cca59e89892497e95e1e9c7156d3345fb7e2e8]
Signed-off-by: Daniel Drake <[EMAIL PROTECTED]>
Acked-by: Stefan Richter <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
It fixes a kernel oops which occurs when the ohci1394 driver is reloaded on PPC
http://bugs.gentoo.org/154851

 drivers/ieee1394/ohci1394.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

--- linux-2.6.18.5.orig/drivers/ieee1394/ohci1394.c
+++ linux-2.6.18.5/drivers/ieee1394/ohci1394.c
@@ -3218,6 +3218,19 @@ static int __devinit ohci1394_pci_probe(
struct ti_ohci *ohci;   /* shortcut to currently handled device */
resource_size_t ohci_base;
 
+#ifdef CONFIG_PPC_PMAC
+   /* Necessary on some machines if ohci1394 was loaded/ unloaded before */
+   if (machine_is(powermac)) {
+   struct device_node *of_node = pci_device_to_OF_node(dev);
+
+   if (of_node) {
+   pmac_call_feature(PMAC_FTR_1394_CABLE_POWER, of_node,
+ 0, 1);
+   pmac_call_feature(PMAC_FTR_1394_ENABLE, of_node, 0, 1);
+   }
+   }
+#endif /* CONFIG_PPC_PMAC */
+
 if (pci_enable_device(dev))
FAIL(-ENXIO, "Failed to enable OHCI hardware");
 pci_set_master(dev);
@@ -3506,11 +3519,9 @@ static void ohci1394_pci_remove(struct p
 #endif
 
 #ifdef CONFIG_PPC_PMAC
-   /* On UniNorth, power down the cable and turn off the chip
-* clock when the module is removed to save power on
-* laptops. Turning it back ON is done by the arch code when
-* pci_enable_device() is called */
-   {
+   /* On UniNorth, power down the cable and turn off the chip clock
+* to save power on laptops */
+   if (machine_is(powermac)) {
struct device_node* of_node;
 
of_node = pci_device_to_OF_node(ohci->dev);


[patch 03/24] EBTABLES: Verify that ebt_entries have zero ->distinguisher.

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Al Viro <[EMAIL PROTECTED]>

We need that for the iterator to work; the existing check was too weak.

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/bridge/netfilter/ebtables.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.18.5.orig/net/bridge/netfilter/ebtables.c
+++ linux-2.6.18.5/net/bridge/netfilter/ebtables.c
@@ -439,7 +439,7 @@ ebt_check_entry_size_and_hooks(struct eb
/* beginning of a new chain
   if i == NF_BR_NUMHOOKS it must be a user defined chain */
if (i != NF_BR_NUMHOOKS || !(e->bitmask & EBT_ENTRY_OR_ENTRIES)) {
-   if ((e->bitmask & EBT_ENTRY_OR_ENTRIES) != 0) {
+   if (e->bitmask != 0) {
/* we make userspace set this right,
   so there is no misunderstanding */
BUGPRINT("EBT_ENTRY_OR_ENTRIES shouldn't be set "
@@ -522,7 +522,7 @@ ebt_get_udc_positions(struct ebt_entry *
int i;
 
/* we're only interested in chain starts */
-   if (e->bitmask & EBT_ENTRY_OR_ENTRIES)
+   if (e->bitmask)
return 0;
for (i = 0; i < NF_BR_NUMHOOKS; i++) {
if ((valid_hooks & (1 << i)) == 0)
@@ -572,7 +572,7 @@ ebt_cleanup_entry(struct ebt_entry *e, u
 {
struct ebt_entry_target *t;
 
-   if ((e->bitmask & EBT_ENTRY_OR_ENTRIES) == 0)
+   if (e->bitmask == 0)
return 0;
/* we're done */
if (cnt && (*cnt)-- == 0)
@@ -598,7 +598,7 @@ ebt_check_entry(struct ebt_entry *e, str
int ret;
 
/* don't mess with the struct ebt_entries */
-   if ((e->bitmask & EBT_ENTRY_OR_ENTRIES) == 0)
+   if (e->bitmask == 0)
return 0;
 
if (e->bitmask & ~EBT_F_MASK) {
@@ -1316,7 +1316,7 @@ static inline int ebt_make_names(struct 
char *hlp;
struct ebt_entry_target *t;
 
-   if ((e->bitmask & EBT_ENTRY_OR_ENTRIES) == 0)
+   if (e->bitmask == 0)
return 0;
 
hlp = ubase - base + (char *)e + e->target_offset;


[patch 23/24] forcedeth: Disable INTx when enabling MSI in forcedeth

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Daniel Barkalow <[EMAIL PROTECTED]>

At least some nforce cards continue to send legacy interrupts when MSI
is enabled, and these interrupts are treated as unhandled by the
kernel. This patch disables legacy interrupts explicitly when enabling
MSI mode.

The correct fix is to change the MSI infrastructure to disable legacy
interrupts when enabling MSI, but this is potentially risky if the
device isn't PCI-2.3 or is quirky, so the correct fix is going into
mainline, while patches like this one go into -stable.

Legend has it that it is most correct to disable legacy interrupts
before enabling MSI, but the mainline patch does it in the other
order, and this patch is "obviously" the same as mainline.
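
For readers unfamiliar with what disabling legacy interrupts amounts to, the
sketch below toggles the Interrupt Disable bit (bit 10, added in PCI 2.3) in
a copy of the PCI command word.  It is a userspace illustration only:
set_intx() and CMD_INTX_DISABLE are names made up for the example, and no
configuration space is read or written.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_INTX_DISABLE 0x400u   /* bit 10 of the PCI command register */

/* Return cmd with legacy INTx enabled (enable != 0) or disabled. */
uint16_t set_intx(uint16_t cmd, int enable)
{
    uint16_t r;

    if (enable)
        r = (uint16_t)(cmd & ~CMD_INTX_DISABLE);
    else
        r = (uint16_t)(cmd | CMD_INTX_DISABLE);

    /* Postcondition: the disable bit is clear exactly when INTx is enabled. */
    assert(((r & CMD_INTX_DISABLE) == 0) == (enable != 0));
    return r;
}
```

In the patch the equivalent step is done by the pci_intx() calls, paired with
pci_enable_msi() and pci_disable_msi().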

Signed-off-by: Daniel Barkalow <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

The general patch got into mainline last night, and this patch is clearly 
the same as that one, limited to the case of forcedeth (the pci_intx() 
calls are lifted from {enable,disable}_msi_mode to all of the indirect 
callers in forcedeth).

 drivers/net/forcedeth.c |3 +++
 1 file changed, 3 insertions(+)

--- linux-2.6.18.5.orig/drivers/net/forcedeth.c
+++ linux-2.6.18.5/drivers/net/forcedeth.c
@@ -2692,11 +2692,13 @@ static int nv_request_irq(struct net_dev
}
if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE) {
if ((ret = pci_enable_msi(np->pci_dev)) == 0) {
+   pci_intx(np->pci_dev, 0);
np->msi_flags |= NV_MSI_ENABLED;
if ((!intr_test && request_irq(np->pci_dev->irq, &nv_nic_irq, IRQF_SHARED, dev->name, dev) != 0) ||
(intr_test && request_irq(np->pci_dev->irq, &nv_nic_irq_test, IRQF_SHARED, dev->name, dev) != 0)) {
printk(KERN_INFO "forcedeth: request_irq failed %d\n", ret);
pci_disable_msi(np->pci_dev);
+   pci_intx(np->pci_dev, 1);
np->msi_flags &= ~NV_MSI_ENABLED;
goto out_err;
}
@@ -2739,6 +2741,7 @@ static void nv_free_irq(struct net_devic
free_irq(np->pci_dev->irq, dev);
if (np->msi_flags & NV_MSI_ENABLED) {
pci_disable_msi(np->pci_dev);
+   pci_intx(np->pci_dev, 1);
np->msi_flags &= ~NV_MSI_ENABLED;
}
}


[patch 24/24] Bluetooth: Add packet size checks for CAPI messages (CVE-2006-6106)

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Marcel Holtmann <[EMAIL PROTECTED]>

With malformed packets it might be possible to overwrite internal
CMTP and CAPI data structures. This patch adds additional length
checks to prevent these kinds of remote attacks.
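
The kind of check involved can be sketched for a single length-prefixed
field.  The packet layout, copy_field() and FIELD_MAX are assumptions made
for the example and do not correspond to a particular CAPI message; the
sketch is also a little stricter than the patch, since it clamps the claimed
length to both the destination size and the bytes actually present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 8   /* hypothetical destination size */

/*
 * Copy a length-prefixed field from pkt into dst (FIELD_MAX bytes).
 * Assumed layout: pkt[off] is the claimed length, the field bytes follow.
 * Returns the number of bytes copied, or -1 if the length byte is missing.
 */
int copy_field(char *dst, const unsigned char *pkt, size_t pkt_len, size_t off)
{
    size_t avail, n;

    assert(dst != NULL && pkt != NULL);

    if (pkt_len <= off)
        return -1;                 /* too short to hold the length byte */

    n = pkt[off];                  /* length claimed by the sender */
    avail = pkt_len - off - 1;     /* bytes really present after it */
    if (n > avail)
        n = avail;
    if (n > FIELD_MAX)
        n = FIELD_MAX;

    memset(dst, 0, FIELD_MAX);
    memcpy(dst, pkt + off + 1, n);
    return (int)n;
}
```

A packet that claims 200 bytes but carries only three after the length byte
results in a three-byte copy.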

Signed-off-by: Marcel Holtmann <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

 net/bluetooth/cmtp/capi.c |   39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

--- linux-2.6.18.5.orig/net/bluetooth/cmtp/capi.c
+++ linux-2.6.18.5/net/bluetooth/cmtp/capi.c
@@ -196,6 +196,9 @@ static void cmtp_recv_interopmsg(struct 
 
switch (CAPIMSG_SUBCOMMAND(skb->data)) {
case CAPI_CONF:
+   if (skb->len < CAPI_MSG_BASELEN + 10)
+   break;
+
func = CAPIMSG_U16(skb->data, CAPI_MSG_BASELEN + 5);
info = CAPIMSG_U16(skb->data, CAPI_MSG_BASELEN + 8);
 
@@ -226,6 +229,9 @@ static void cmtp_recv_interopmsg(struct 
break;
 
case CAPI_FUNCTION_GET_PROFILE:
+   if (skb->len < CAPI_MSG_BASELEN + 11 + sizeof(capi_profile))
+   break;
+
controller = CAPIMSG_U16(skb->data, CAPI_MSG_BASELEN + 11);
msgnum = CAPIMSG_MSGID(skb->data);
 
@@ -246,17 +252,26 @@ static void cmtp_recv_interopmsg(struct 
break;
 
case CAPI_FUNCTION_GET_MANUFACTURER:
+   if (skb->len < CAPI_MSG_BASELEN + 15)
+   break;
+
controller = CAPIMSG_U32(skb->data, CAPI_MSG_BASELEN + 10);
 
if (!info && ctrl) {
+   int len = min_t(uint, CAPI_MANUFACTURER_LEN,
+   skb->data[CAPI_MSG_BASELEN + 14]);
+
+   memset(ctrl->manu, 0, CAPI_MANUFACTURER_LEN);
strncpy(ctrl->manu,
-   skb->data + CAPI_MSG_BASELEN + 15,
-   skb->data[CAPI_MSG_BASELEN + 14]);
+   skb->data + CAPI_MSG_BASELEN + 15, len);
}
 
break;
 
case CAPI_FUNCTION_GET_VERSION:
+   if (skb->len < CAPI_MSG_BASELEN + 32)
+   break;
+
controller = CAPIMSG_U32(skb->data, CAPI_MSG_BASELEN + 12);
 
if (!info && ctrl) {
@@ -269,13 +284,18 @@ static void cmtp_recv_interopmsg(struct 
break;
 
case CAPI_FUNCTION_GET_SERIAL_NUMBER:
+   if (skb->len < CAPI_MSG_BASELEN + 17)
+   break;
+
controller = CAPIMSG_U32(skb->data, CAPI_MSG_BASELEN + 12);
 
if (!info && ctrl) {
+   int len = min_t(uint, CAPI_SERIAL_LEN,
+   skb->data[CAPI_MSG_BASELEN + 16]);
+
memset(ctrl->serial, 0, CAPI_SERIAL_LEN);
strncpy(ctrl->serial,
-   skb->data + CAPI_MSG_BASELEN + 17,
-   skb->data[CAPI_MSG_BASELEN + 16]);
+   skb->data + CAPI_MSG_BASELEN + 17, len);
}
 
break;
@@ -284,14 +304,18 @@ static void cmtp_recv_interopmsg(struct 
break;
 
case CAPI_IND:
+   if (skb->len < CAPI_MSG_BASELEN + 6)
+   break;
+
func = CAPIMSG_U16(skb->data, CAPI_MSG_BASELEN + 3);
 
if (func == CAPI_FUNCTION_LOOPBACK) {
+   int len = min_t(uint, skb->len - CAPI_MSG_BASELEN - 6,
+   skb->data[CAPI_MSG_BASELEN + 5]);
appl = CAPIMSG_APPID(skb->data);
msgnum = CAPIMSG_MSGID(skb->data);
cmtp_send_interopmsg(session, CAPI_RESP, appl, msgnum, func,
-   skb->data + CAPI_MSG_BASELEN + 6,
-   skb->data[CAPI_MSG_BASELEN + 5]);
+   skb->data + CAPI_MSG_BASELEN + 6, len);
}
 
break;
@@ -309,6 +333,9 @@ void cmtp_recv_capimsg(struct cmtp_sessi
 
BT_DBG("session %p skb %p len %d", session, skb, skb->len);
 
+   if (skb->len < CAPI_MSG_BASELEN)
+   return;
+
if (CAPIMSG_COMMAND(skb->data) == CAPI_INTEROPERABILITY) {
cmtp_recv_interopmsg(session, skb);
return;


[patch 02/24] EBTABLES: Fix wraparounds in ebt_entries verification.

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Al Viro <[EMAIL PROTECTED]>

We need to verify that
a) we are not too close to the end of buffer to dereference
b) next entry we'll be checking won't be _before_ our

While we are at it, don't subtract unrelated pointers...
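
The wraparound problem and the usual cure can be shown with plain size_t
arithmetic.  This is a generic sketch, not the ebtables code: fits() and
fits_naive() are hypothetical helpers, with "left" meaning the number of
bytes remaining in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check that a header of hdr_size bytes followed by payload bytes fits in
 * the left bytes remaining, without forming a sum that could wrap around.
 */
int fits(size_t left, size_t hdr_size, size_t payload)
{
    if (left < hdr_size)
        return 0;
    return left - hdr_size >= payload;   /* cannot underflow after the test */
}

/* The naive form: hdr_size + payload can wrap for a huge payload. */
int fits_naive(size_t left, size_t hdr_size, size_t payload)
{
    return hdr_size + payload <= left;
}
```

For example, fits_naive(100, 16, SIZE_MAX) reports success because the sum
wraps to 15, while fits(100, 16, SIZE_MAX) correctly reports failure.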

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/bridge/netfilter/ebtables.c |   23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

--- linux-2.6.18.5.orig/net/bridge/netfilter/ebtables.c
+++ linux-2.6.18.5/net/bridge/netfilter/ebtables.c
@@ -423,13 +423,17 @@ ebt_check_entry_size_and_hooks(struct eb
struct ebt_entries **hook_entries, unsigned int *n, unsigned int *cnt,
unsigned int *totalcnt, unsigned int *udc_cnt, unsigned int valid_hooks)
 {
+   unsigned int offset = (char *)e - newinfo->entries;
+   size_t left = (limit - base) - offset;
int i;
 
+   if (left < sizeof(unsigned int))
+   goto Esmall;
+
for (i = 0; i < NF_BR_NUMHOOKS; i++) {
if ((valid_hooks & (1 << i)) == 0)
continue;
-   if ( (char *)hook_entries[i] - base ==
-  (char *)e - newinfo->entries)
+   if ((char *)hook_entries[i] == base + offset)
break;
}
/* beginning of a new chain
@@ -450,11 +454,8 @@ ebt_check_entry_size_and_hooks(struct eb
return -EINVAL;
}
/* before we look at the struct, be sure it is not too big */
-   if ((char *)hook_entries[i] + sizeof(struct ebt_entries)
-  > limit) {
-   BUGPRINT("entries_size too small\n");
-   return -EINVAL;
-   }
+   if (left < sizeof(struct ebt_entries))
+   goto Esmall;
if (((struct ebt_entries *)e)->policy != EBT_DROP &&
   ((struct ebt_entries *)e)->policy != EBT_ACCEPT) {
/* only RETURN from udc */
@@ -477,6 +478,8 @@ ebt_check_entry_size_and_hooks(struct eb
return 0;
}
/* a plain old entry, heh */
+   if (left < sizeof(struct ebt_entry))
+   goto Esmall;
if (sizeof(struct ebt_entry) > e->watchers_offset ||
   e->watchers_offset > e->target_offset ||
   e->target_offset >= e->next_offset) {
@@ -488,10 +491,16 @@ ebt_check_entry_size_and_hooks(struct eb
BUGPRINT("target size too small\n");
return -EINVAL;
}
+   if (left < e->next_offset)
+   goto Esmall;
 
(*cnt)++;
(*totalcnt)++;
return 0;
+
+Esmall:
+   BUGPRINT("entries_size too small\n");
+   return -EINVAL;
 }
 
 struct ebt_cl_stack


[patch 17/24] V4L: Fix broken TUNER_LG_NTSC_TAPE radio support

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Hans Verkuil <[EMAIL PROTECTED]>

The TUNER_LG_NTSC_TAPE is identical in all respects to the
TUNER_PHILIPS_FM1236_MK3. So use the params struct for the Philips tuner.
Also add this LG_NTSC_TAPE tuner to the switches where radio specific
parameters are set so it behaves like a TUNER_PHILIPS_FM1236_MK3. This
change fixes the radio support for this tuner (the wrong bandswitch byte
was used).

Thanks to Andy Walls <[EMAIL PROTECTED]> for finding this bug.

Signed-off-by: Hans Verkuil <[EMAIL PROTECTED]>
Signed-off-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>
Signed-off-by: Michael Krufky <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---

 drivers/media/video/tuner-simple.c |2 ++
 drivers/media/video/tuner-types.c  |   14 ++------------
 2 files changed, 4 insertions(+), 12 deletions(-)

--- linux-2.6.18.5.orig/drivers/media/video/tuner-simple.c
+++ linux-2.6.18.5/drivers/media/video/tuner-simple.c
@@ -108,6 +108,7 @@ static int tuner_stereo(struct i2c_clien
case TUNER_PHILIPS_FM1216ME_MK3:
case TUNER_PHILIPS_FM1236_MK3:
case TUNER_PHILIPS_FM1256_IH3:
+   case TUNER_LG_NTSC_TAPE:
stereo = ((status & TUNER_SIGNAL) == TUNER_STEREO_MK3);
break;
default:
@@ -419,6 +420,7 @@ static void default_set_radio_freq(struc
case TUNER_PHILIPS_FM1216ME_MK3:
case TUNER_PHILIPS_FM1236_MK3:
case TUNER_PHILIPS_FMD1216ME_MK3:
+   case TUNER_LG_NTSC_TAPE:
buffer[3] = 0x19;
break;
case TUNER_TNF_5335MF:
--- linux-2.6.18.5.orig/drivers/media/video/tuner-types.c
+++ linux-2.6.18.5/drivers/media/video/tuner-types.c
@@ -671,16 +671,6 @@ static struct tuner_params tuner_panason
},
 };
 
-/*  TUNER_LG_NTSC_TAPE - LGINNOTEK NTSC  */
-
-static struct tuner_params tuner_lg_ntsc_tape_params[] = {
-   {
-   .type   = TUNER_PARAM_TYPE_NTSC,
-   .ranges = tuner_fm1236_mk3_ntsc_ranges,
-   .count  = ARRAY_SIZE(tuner_fm1236_mk3_ntsc_ranges),
-   },
-};
-
 /*  TUNER_TNF_8831BGFF - Philips PAL  */
 
 static struct tuner_range tuner_tnf_8831bgff_pal_ranges[] = {
@@ -1331,8 +1321,8 @@ struct tunertype tuners[] = {
},
[TUNER_LG_NTSC_TAPE] = { /* LGINNOTEK NTSC */
.name   = "LG NTSC (TAPE series)",
-   .params = tuner_lg_ntsc_tape_params,
-   .count  = ARRAY_SIZE(tuner_lg_ntsc_tape_params),
+   .params = tuner_fm1236_mk3_params,
+   .count  = ARRAY_SIZE(tuner_fm1236_mk3_params),
},
[TUNER_TNF_8831BGFF] = { /* Philips PAL */
.name   = "Tenna TNF 8831 BGFF)",


[patch 16/24] DVB: lgdt330x: fix signal / lock status detection bug

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Michael Krufky <[EMAIL PROTECTED]>

In some cases when using VSB, the AGC status register has been known to
falsely report "no signal" when in fact there is a carrier lock.  The
datasheet labels these status flags as QAM only, yet the lgdt330x
module is using these flags for both QAM and VSB.

This patch allows for the carrier recovery lock status register to be
tested, even if the AGC signal status register falsely reports no signal.
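
The resulting control flow can be sketched as follows.  The flag names and
read_status() are hypothetical stand-ins for the FE_HAS_* bits and the driver
function, and the register reads are replaced by plain arguments.

```c
#include <assert.h>

/* Hypothetical status bits, standing in for the FE_HAS_* flags. */
#define ST_SIGNAL  0x1u
#define ST_CARRIER 0x2u

/*
 * agc_signal:   nonzero if the AGC register reports a signal
 * carrier_lock: nonzero if the carrier recovery register reports lock
 */
unsigned int read_status(int agc_signal, int carrier_lock)
{
    unsigned int status = 0;

    if (agc_signal)
        status |= ST_SIGNAL;
    /* No early return: the carrier lock is tested even without ST_SIGNAL. */
    if (carrier_lock)
        status |= ST_CARRIER;

    assert((status & ~(ST_SIGNAL | ST_CARRIER)) == 0);
    return status;
}
```

With the early return gone, read_status(0, 1) reports the carrier lock even
though the signal bit is clear.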

Thanks to jcrews from #linuxtv on IRC, for initially reporting this bug.

Signed-off-by: Michael Krufky <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>

---

 drivers/media/dvb/frontends/lgdt330x.c |6 --
 1 file changed, 6 deletions(-)

--- linux-2.6.18.5.orig/drivers/media/dvb/frontends/lgdt330x.c
+++ linux-2.6.18.5/drivers/media/dvb/frontends/lgdt330x.c
@@ -435,9 +435,6 @@ static int lgdt3302_read_status(struct d
/* Test signal does not exist flag */
/* as well as the AGC lock flag.   */
*status |= FE_HAS_SIGNAL;
-   } else {
-   /* Without a signal all other status bits are meaningless */
-   return 0;
}
 
/*
@@ -500,9 +497,6 @@ static int lgdt3303_read_status(struct d
/* Test input signal does not exist flag */
/* as well as the AGC lock flag.   */
*status |= FE_HAS_SIGNAL;
-   } else {
-   /* Without a signal all other status bits are meaningless */
-   return 0;
}
 
/* Carrier Recovery Lock Status Register */

--

[patch 19/24] ARM: Add sys_*at syscalls

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Russell King <[EMAIL PROTECTED]>

Later glibc requires the *at syscalls.  Add them.

Signed-off-by: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 arch/arm/kernel/calls.S  |   13 +
 include/asm-arm/unistd.h |   13 +
 2 files changed, 26 insertions(+)

bca0b8e75f6b7cf52cf52c967286b72d84f9b37e
--- linux-2.6.18.5.orig/arch/arm/kernel/calls.S
+++ linux-2.6.18.5/arch/arm/kernel/calls.S
@@ -331,6 +331,19 @@
CALL(sys_mbind)
 /* 320 */  CALL(sys_get_mempolicy)
CALL(sys_set_mempolicy)
+   CALL(sys_openat)
+   CALL(sys_mkdirat)
+   CALL(sys_mknodat)
+/* 325 */  CALL(sys_fchownat)
+   CALL(sys_futimesat)
+   CALL(sys_fstatat64)
+   CALL(sys_unlinkat)
+   CALL(sys_renameat)
+/* 330 */  CALL(sys_linkat)
+   CALL(sys_symlinkat)
+   CALL(sys_readlinkat)
+   CALL(sys_fchmodat)
+   CALL(sys_faccessat)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
--- linux-2.6.18.5.orig/include/asm-arm/unistd.h
+++ linux-2.6.18.5/include/asm-arm/unistd.h
@@ -347,6 +347,19 @@
 #define __NR_mbind (__NR_SYSCALL_BASE+319)
 #define __NR_get_mempolicy (__NR_SYSCALL_BASE+320)
 #define __NR_set_mempolicy (__NR_SYSCALL_BASE+321)
+#define __NR_openat(__NR_SYSCALL_BASE+322)
+#define __NR_mkdirat   (__NR_SYSCALL_BASE+323)
+#define __NR_mknodat   (__NR_SYSCALL_BASE+324)
+#define __NR_fchownat  (__NR_SYSCALL_BASE+325)
+#define __NR_futimesat (__NR_SYSCALL_BASE+326)
+#define __NR_fstatat64 (__NR_SYSCALL_BASE+327)
+#define __NR_unlinkat  (__NR_SYSCALL_BASE+328)
+#define __NR_renameat  (__NR_SYSCALL_BASE+329)
+#define __NR_linkat(__NR_SYSCALL_BASE+330)
+#define __NR_symlinkat (__NR_SYSCALL_BASE+331)
+#define __NR_readlinkat(__NR_SYSCALL_BASE+332)
+#define __NR_fchmodat  (__NR_SYSCALL_BASE+333)
+#define __NR_faccessat (__NR_SYSCALL_BASE+334)
 
 /*
  * The following SWIs are ARM private.

--

Re: Linux 2.6.20-rc1

2006-12-14 Thread Alistair John Strachan

On Friday 15 December 2006 00:48, Alistair John Strachan wrote:
> On Thursday 14 December 2006 21:20, Jens Axboe wrote:
> > On Thu, Dec 14 2006, Alistair John Strachan wrote:
> > > Hi Jens,
> > >
> > > On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> > > > On Thu, Dec 14 2006, Jens Axboe wrote:
> > > > > > I'll do that if nobody comes up with anything obvious.
> > > > >
> > > > > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > > > > patch again.
> > > >
> > > > Actually, you should test 2.6.19-git1 with this patch applied as
> > > > well.
> > >
> > > 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and
> > > hddtemp continues to work fine:
> > >
> > > [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> > > /dev/sda: WDC WD2500KS-00MJB0: 29°C
> > > /dev/sdb: WDC WD2500KS-00MJB0: 27°C
> > > /dev/sdc: Maxtor 6B200M0: 28°C
> > > /dev/sdd: Maxtor 6B200M0: 26°C
> > >
> > > I've added the strace results to the URL previously posted, with the
> > > config.
> >
> > Then it is likely the sata updates, SG_IO is off the hook.
>
> I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e,
> which git reckons is the culprit. I wasn't able to revert this commit to
> test, because it has conflicts.

Whatever the change is, it's subtle. I don't see the problem in git1+patch, 
but I know this kernel _includes_ this changeset.

In total isolation, v2.6.19..0e75f9063f5c55fb0b0b546a7c356f8ec186825e it 
breaks. Reverting just 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, it works 
again.

So I think this is the source, but I can't explain why it "goes away" before 
git1 and "comes back" before 2.6.20-rc1.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

[patch 22/24] m32r: make userspace headers platform-independent

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Hirokazu Takata <[EMAIL PROTECTED]>

m32r kernels from 2.6.18-rc1 onward cause build errors of "unknown isa
configuration" for userspace application programs such as glibc and gdb.

This is because recent kernels no longer include linux/config.h, in order
not to expose kernel headers to userspace.

To fix the above compile errors, this patch fixes two headers ptrace.h and
sigcontext.h for m32r and makes them platform-independent.

Signed-off-by: Hirokazu Takata <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

 arch/m32r/kernel/entry.S  |   65 ++
 include/asm-m32r/ptrace.h |   28 ++
 include/asm-m32r/sigcontext.h |   13 +---
 3 files changed, 35 insertions(+), 71 deletions(-)

--- linux-2.6.18.5.orig/arch/m32r/kernel/entry.S
+++ linux-2.6.18.5/arch/m32r/kernel/entry.S
@@ -23,35 +23,35 @@
  * updated in fork.c:copy_thread, signal.c:do_signal,
  * ptrace.c and ptrace.h
  *
- * M32Rx/M32R2 M32R
- *   @(sp)  - r4   ditto
- *   @(0x04,sp) - r5   ditto
- *   @(0x08,sp) - r6   ditto
- *   @(0x0c,sp) - *pt_regs ditto
- *   @(0x10,sp) - r0   ditto
- *   @(0x14,sp) - r1   ditto
- *   @(0x18,sp) - r2   ditto
- *   @(0x1c,sp) - r3   ditto
- *   @(0x20,sp) - r7   ditto
- *   @(0x24,sp) - r8   ditto
- *   @(0x28,sp) - r9   ditto
- *   @(0x2c,sp) - r10  ditto
- *   @(0x30,sp) - r11  ditto
- *   @(0x34,sp) - r12  ditto
- *   @(0x38,sp) - syscall_nr   ditto
- *   @(0x3c,sp) - acc0h@(0x3c,sp) - acch
- *   @(0x40,sp) - acc0l@(0x40,sp) - accl
- *   @(0x44,sp) - acc1h@(0x44,sp) - dummy_acc1h
- *   @(0x48,sp) - acc1l@(0x48,sp) - dummy_acc1l
- *   @(0x4c,sp) - psw  ditto
- *   @(0x50,sp) - bpc  ditto
- *   @(0x54,sp) - bbpswditto
- *   @(0x58,sp) - bbpc ditto
- *   @(0x5c,sp) - spu (cr3)ditto
- *   @(0x60,sp) - fp (r13) ditto
- *   @(0x64,sp) - lr (r14) ditto
- *   @(0x68,sp) - spi (cr2)ditto
- *   @(0x6c,sp) - orig_r0  ditto
+ * M32R/M32Rx/M32R2
+ *   @(sp)  - r4
+ *   @(0x04,sp) - r5
+ *   @(0x08,sp) - r6
+ *   @(0x0c,sp) - *pt_regs
+ *   @(0x10,sp) - r0
+ *   @(0x14,sp) - r1
+ *   @(0x18,sp) - r2
+ *   @(0x1c,sp) - r3
+ *   @(0x20,sp) - r7
+ *   @(0x24,sp) - r8
+ *   @(0x28,sp) - r9
+ *   @(0x2c,sp) - r10
+ *   @(0x30,sp) - r11
+ *   @(0x34,sp) - r12
+ *   @(0x38,sp) - syscall_nr
+ *   @(0x3c,sp) - acc0h
+ *   @(0x40,sp) - acc0l
+ *   @(0x44,sp) - acc1h; ISA_DSP_LEVEL2 only
+ *   @(0x48,sp) - acc1l; ISA_DSP_LEVEL2 only
+ *   @(0x4c,sp) - psw
+ *   @(0x50,sp) - bpc
+ *   @(0x54,sp) - bbpsw
+ *   @(0x58,sp) - bbpc
+ *   @(0x5c,sp) - spu (cr3)
+ *   @(0x60,sp) - fp (r13)
+ *   @(0x64,sp) - lr (r14)
+ *   @(0x68,sp) - spi (cr2)
+ *   @(0x6c,sp) - orig_r0
  */
 
 #include 
@@ -95,17 +95,10 @@
 #define R11(reg)   @(0x30,reg)
 #define R12(reg)   @(0x34,reg)
 #define SYSCALL_NR(reg)@(0x38,reg)
-#if defined(CONFIG_ISA_M32R2) && defined(CONFIG_ISA_DSP_LEVEL2)
 #define ACC0H(reg) @(0x3C,reg)
 #define ACC0L(reg) @(0x40,reg)
 #define ACC1H(reg) @(0x44,reg)
 #define ACC1L(reg) @(0x48,reg)
-#elif defined(CONFIG_ISA_M32R2) || defined(CONFIG_ISA_M32R)
-#define ACCH(reg)  @(0x3C,reg)
-#define ACCL(reg)  @(0x40,reg)
-#else
-#error unknown isa configuration
-#endif
 #define PSW(reg)   @(0x4C,reg)
 #define BPC(reg)   @(0x50,reg)
 #define BBPSW(reg) @(0x54,reg)
--- linux-2.6.18.5.orig/include/asm-m32r/ptrace.h
+++ linux-2.6.18.5/include/asm-m32r/ptrace.h
@@ -33,21 +33,10 @@
 #define PT_R15 PT_SP
 
 /* processor status and miscellaneous context registers.  */
-#if defined(CONFIG_ISA_M32R2) && defined(CONFIG_ISA_DSP_LEVEL2)
 #define PT_ACC0H   15
 #define PT_ACC0L   16
-#define PT_ACC1H   17
-#define PT_ACC1L   18
-#define PT_ACCHPT_ACC0H
-#define PT_ACCLPT_ACC0L
-#elif defined(CONFIG_ISA_M32R2) || defined(CONFIG_ISA_M32R)
-#define PT_ACCH15
-#define PT_ACCL16
-#define PT_DUMMY_ACC1H 17
-#define PT_DUMMY_ACC1L 18
-#else
-#error unknown isa conifiguration
-#endif
+#define PT_ACC1H   17  /* ISA_DSP_LEVEL2 only */
+#define PT_ACC1L   18  /* ISA_DSP_LE

[patch 13/24] IPSEC: Fix inetpeer leak in ipv4 xfrm dst entries.

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: David Miller <[EMAIL PROTECTED]>

We grab a reference to the route's inetpeer entry but
forget to release it in xfrm4_dst_destroy().

Bug discovered by Kazunori MIYAZAWA <[EMAIL PROTECTED]>

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
commit 26db167702756d0022f8ea5f1f30cad3018cfe31
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Wed Dec 6 23:45:15 2006 -0800

 net/ipv4/xfrm4_policy.c |2 ++
 1 file changed, 2 insertions(+)

--- linux-2.6.18.5.orig/net/ipv4/xfrm4_policy.c
+++ linux-2.6.18.5/net/ipv4/xfrm4_policy.c
@@ -252,6 +252,8 @@ static void xfrm4_dst_destroy(struct dst
 
if (likely(xdst->u.rt.idev))
in_dev_put(xdst->u.rt.idev);
+   if (likely(xdst->u.rt.peer))
+   inet_putpeer(xdst->u.rt.peer);
xfrm_dst_destroy(xdst);
 }
 

--

[patch 15/24] bonding: incorrect bonding state reported via ioctl

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Andy Gospodarek <[EMAIL PROTECTED]>

This is a small fix-up to finish out the work done by Jay Vosburgh to
add carrier-state support for bonding devices.  The output in
/proc/net/bonding/bondX was correct, but when collecting the same info
via an ioctl it could still be incorrect.

Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
Cc: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---

 drivers/net/bonding/bond_main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.18.5.orig/drivers/net/bonding/bond_main.c
+++ linux-2.6.18.5/drivers/net/bonding/bond_main.c
@@ -3547,7 +3547,7 @@ static int bond_do_ioctl(struct net_devi
mii->val_out = 0;
read_lock_bh(&bond->lock);
read_lock(&bond->curr_slave_lock);
-   if (bond->curr_active_slave) {
+   if (netif_carrier_ok(bond->dev)) {
mii->val_out = BMSR_LSTATUS;
}
read_unlock(&bond->curr_slave_lock);

--

[patch 07/24] dm crypt: Fix data corruption with dm-crypt over RAID5

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Christophe Saout <[EMAIL PROTECTED]>

Fix corruption issue with dm-crypt on top of software raid5. Cancelled
readahead bios that report no error but merely have BIO_UPTODATE cleared
were reported as successful reads to the higher layers (leaving
random content in the buffer cache). Already fixed in 2.6.19.
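
The check described above can be sketched in plain C.  This is a simplified
model, not the dm-crypt code: struct sk_bio and both helpers are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct bio this sketch needs. */
struct sk_bio {
    bool uptodate;  /* models the BIO_UPTODATE flag */
    bool is_read;   /* models bio_data_dir(bio) == READ */
};

/* A completion with no error code but a cleared flag is still a failure. */
static int sk_endio_error(const struct sk_bio *bio, int error)
{
    if (!bio->uptodate && !error)
        error = -EIO;
    return error;
}

/* Only reads that really succeeded are handed on for decryption. */
static bool sk_should_decrypt(const struct sk_bio *bio, int error)
{
    return bio->is_read && sk_endio_error(bio, error) == 0;
}
```

In this model a cancelled readahead (uptodate cleared, error 0) is turned
into -EIO instead of being queued for decryption.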

Signed-off-by: Christophe Saout <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 drivers/md/dm-crypt.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- linux-2.6.18.5.orig/drivers/md/dm-crypt.c
+++ linux-2.6.18.5/drivers/md/dm-crypt.c
@@ -717,13 +717,15 @@ static int crypt_endio(struct bio *bio, 
if (bio->bi_size)
return 1;
 
+   if (!bio_flagged(bio, BIO_UPTODATE) && !error)
+   error = -EIO;
+
bio_put(bio);
 
/*
 * successful reads are decrypted by the worker thread
 */
-   if ((bio_data_dir(bio) == READ)
-   && bio_flagged(bio, BIO_UPTODATE)) {
+   if (bio_data_dir(io->bio) == READ && !error) {
kcryptd_queue_io(io);
return 0;
}

--

[patch 12/24] dm snapshot: fix freeing pending exception

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Milan Broz <[EMAIL PROTECTED]>

Fix oops when removing full snapshot
kernel bugzilla bug 7040

If a snapshot becomes invalid (full) while there is an outstanding
pending_exception, pending_complete() forgets to remove
the corresponding exception from its exception table before freeing it.

Already fixed in 2.6.19.

Signed-off-by: Milan Broz <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 drivers/md/dm-snap.c |1 +
 1 file changed, 1 insertion(+)

--- linux-2.6.18.5.orig/drivers/md/dm-snap.c
+++ linux-2.6.18.5/drivers/md/dm-snap.c
@@ -691,6 +691,7 @@ static void pending_complete(struct pend
 
free_exception(e);
 
+   remove_exception(&pe->e);
error_snapshot_bios(pe);
goto out;
}

--

[patch 00/24] -stable review

This is the start of the stable review cycle for the 2.6.18.6 release.
There are 24 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let us know.  If anyone is a maintainer of the proper subsystem, and
wants to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the
Cc: line.  If you wish to be a reviewer, please email [EMAIL PROTECTED]
to add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by Sun Dec 17 01:30 UTC.  Anything received
after that time might be too late.

thanks,

the -stable release team
--

[patch 01/24] softmac: remove netif_tx_disable when scanning

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Michael Buesch <[EMAIL PROTECTED]>

In the scan section of ieee80211softmac, network transmits are disabled.
When SoftMAC re-enables transmits, it may override the wishes of a driver
that may have very good reasons for disabling transmits. At least one failure
in bcm43xx can be traced to this problem. In addition, several unexplained
problems may arise from the unexpected enabling of transmits.

Signed-off-by: Michael Buesch <[EMAIL PROTECTED]>
Signed-off-by: Larry Finger <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 net/ieee80211/softmac/ieee80211softmac_scan.c |2 --
 1 file changed, 2 deletions(-)

--- linux-2.6.18.5.orig/net/ieee80211/softmac/ieee80211softmac_scan.c
+++ linux-2.6.18.5/net/ieee80211/softmac/ieee80211softmac_scan.c
@@ -47,7 +47,6 @@ ieee80211softmac_start_scan(struct ieee8
sm->scanning = 1;
spin_unlock_irqrestore(&sm->lock, flags);
 
-   netif_tx_disable(sm->ieee->dev);
ret = sm->start_scan(sm->dev);
if (ret) {
spin_lock_irqsave(&sm->lock, flags);
@@ -248,7 +247,6 @@ void ieee80211softmac_scan_finished(stru
if (net)
sm->set_channel(sm->dev, net->channel);
}
-   netif_wake_queue(sm->ieee->dev);
ieee80211softmac_call_events(sm, IEEE80211SOFTMAC_EVENT_SCAN_FINISHED, 
NULL);
 }
 EXPORT_SYMBOL_GPL(ieee80211softmac_scan_finished);

--

[patch 10/24] SUNHME: Fix for sunhme failures on x86

2.6.18-stable review patch.  If anyone has any objections, please let us know.
--

From: Jurij Smakov <[EMAIL PROTECTED]>

The following patch fixes the failure of sunhme drivers on x86 hosts
due to missing pci_enable_device() and pci_set_master() calls, lost
during code refactoring. It has been filed as bugzilla bug #7502 [0]
and Debian bug #397460 [1].

[0] http://bugzilla.kernel.org/show_bug.cgi?id=7502
[1] http://bugs.debian.org/397460

Signed-off-by: Jurij Smakov <[EMAIL PROTECTED]>
Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
---
 drivers/net/sunhme.c |5 +
 1 file changed, 5 insertions(+)

--- linux-2.6.18.5.orig/drivers/net/sunhme.c
+++ linux-2.6.18.5/drivers/net/sunhme.c
@@ -3012,6 +3012,11 @@ static int __devinit happy_meal_pci_prob
 #endif
 
err = -ENODEV;
+
+   if (pci_enable_device(pdev))
+   goto err_out;
+   pci_set_master(pdev);
+
if (!strcmp(prom_name, "SUNW,qfe") || !strcmp(prom_name, "qfe")) {
qp = quattro_pci_find(pdev);
if (qp == NULL)

--

Re: [patch 00/24] -stable review

patch roll-up is available at:


http://www.kernel.org/pub/linux/kernel/people/chrisw/stable/patch-2.6.18.6-rc1.{gz,bz2}

once mirroring has completed.

thanks,
-chris

Re: [patch 1/4] Add

On 12/14/06, Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > - (void) do_syslog(0,NULL,0);
> > + (void) do_syslog(KLOG_CLOSE,NULL,0);
>
> Please use a space after the commas (even though you just left it
> as it already was).

Will change for the next revision.

zw

Re: [patch 2/4] permission mapping for sys_syslog operations

On 12/14/06, Randy Dunlap <[EMAIL PROTECTED]> wrote:

> > +#define security_syslog_or_fail(type) do {   \
> > + int error = security_syslog(type);  \
> > + if (error)  \
> > + return error;   \
> > + } while (0)
> > +
>
> From Documentation/CodingStyle:
>
> Things to avoid when using macros:
>
> 1) macros that affect control flow: ...

It says "avoid", not "never use".  If you can think of another way to
code this function that won't completely obscure the actual operations
with the security checks, I will be happy to change it.
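
One possible shape for such an alternative, offered only as a sketch: map
each operation to the privileges it needs in a small helper, then do a
single check with an ordinary return before the switch.  Every name below
(the SK_* constants, sk_klog_required(), sk_klog_check()) is hypothetical,
and a bitmask of granted privileges stands in for the call into the
security module.  The mapping follows the one in the patch, including the
case that needs both the clear-history and read-history privileges.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcodes for this sketch only. */
enum sk_klog_op {
    SK_KLOG_CLOSE, SK_KLOG_OPEN, SK_KLOG_READ, SK_KLOG_READ_HIST,
    SK_KLOG_READ_CLEAR_HIST, SK_KLOG_CLEAR_HIST, SK_KLOG_DISABLE_CONSOLE,
    SK_KLOG_ENABLE_CONSOLE, SK_KLOG_SET_CONSOLE_LVL, SK_KLOG_GET_UNREAD
};

/* Hypothetical privilege codes, expressed as bits so they can be combined. */
enum {
    SK_PRIV_READ      = 1,
    SK_PRIV_READHIST  = 2,
    SK_PRIV_CLEARHIST = 4,
    SK_PRIV_CONSOLE   = 8
};

/* Returns the privilege bits an operation needs, or -EINVAL if unknown. */
static int sk_klog_required(int op)
{
    switch (op) {
    case SK_KLOG_CLOSE:
    case SK_KLOG_OPEN:
    case SK_KLOG_READ:
    case SK_KLOG_GET_UNREAD:
        return SK_PRIV_READ;
    case SK_KLOG_READ_HIST:
        return SK_PRIV_READHIST;
    case SK_KLOG_READ_CLEAR_HIST:
        return SK_PRIV_CLEARHIST | SK_PRIV_READHIST;
    case SK_KLOG_CLEAR_HIST:
        return SK_PRIV_CLEARHIST;
    case SK_KLOG_DISABLE_CONSOLE:
    case SK_KLOG_ENABLE_CONSOLE:
    case SK_KLOG_SET_CONSOLE_LVL:
        return SK_PRIV_CONSOLE;
    default:
        return -EINVAL;
    }
}

/* Single up-front check; 'granted' models what the security module allows. */
static int sk_klog_check(int op, int granted)
{
    int need = sk_klog_required(op);

    if (need < 0)
        return need;
    assert(need != 0);
    return (granted & need) == need ? 0 : -EPERM;
}
```

The trade-off is that the privilege mapping lives away from the code that
performs each operation, which may or may not be easier to audit than the
macro.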

zw

Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems

2006-12-14 Thread Nikolai Joukov

> Well, Congratulations, Doctor!!  [Must be nice to be exiled to Stony
> Brook!!  Oh, well, not I]

Long Island is a very nice place with lots of vineries and perfect sand
beaches - don't envy :-)

> Here's hoping that source exists, and that it is available for us.

I guess, you are subscribed to the linux-raid list only.  Unfortunately, I
didn't CC my post to that list and one of the replies was CC'd there
without the link.  The original post is available here:

  

And the link to the sources is:

  

Nikolai.
-
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University

Problem in EHCI 2.6.15

2006-12-14 Thread Conio sandiago


Hi all,
I am working on an ARM-based SoC and I am trying to add glue logic for
the EHCI controller, but I am facing some problems.

Are there any known problems in 2.6.15?
Thanks
Ashwini

Re: Linux 2.6.20-rc1

2006-12-14 Thread Robert Hancock


Alistair John Strachan wrote:
> I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, which
> git reckons is the culprit. I wasn't able to revert this commit to test,
> because it has conflicts.
>
> Any ideas?


That would be this one I assume?

[PATCH] block: support larger block pc requests

author  Mike Christie <[EMAIL PROTECTED]>
Fri, 1 Dec 2006 09:40:55 + (10:40 +0100)
committer   Jens Axboe <[EMAIL PROTECTED]>
Fri, 1 Dec 2006 09:40:55 + (10:40 +0100)
commit  0e75f9063f5c55fb0b0b546a7c356f8ec186825e
tree    db138f641175403546c2147def4b405f3ff453a8
parent  ad2d7225709b11da47e092634cbdf0591829ae9c
[PATCH] block: support larger block pc requests

This patch modifies blk_rq_map/unmap_user() and the cdrom and scsi_ioctl.c
users so that it supports requests larger than bio by chaining them 
together.


Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: [git patches] libata updates

>  
> +config PATA_IT8213
> + tristate "IT8213 PATA support (Experimental)"
> + depends on PCI && EXPERIMENTAL
> + help
> +   This option enables support for the ITE 821 PATA

Typo (IT8213) - probably my fault but only just noticed it



Re: [patch 2/4] permission mapping for sys_syslog operations

2006-12-14 Thread Randy Dunlap

On Thu, 14 Dec 2006 16:16:41 -0800 Zack Weinberg wrote:

> As suggested by Stephen Smalley: map the various sys_syslog operations
> to a smaller set of privilege codes before calling security modules.
> This patch changes the security module interface!  There should be no
> change in the actual security semantics enforced by dummy, capability,
> nor SELinux (with one exception, clearly marked in sys_syslog).
> 
> Change from previous version of patch: the privilege codes are now
> in linux/security.h instead of linux/klog.h, and use the LSM_* naming
> convention used for other such constants in that file.
> 
> 
> Index: linux-2.6/kernel/printk.c
> ===
> --- linux-2.6.orig/kernel/printk.c2006-12-13 16:06:22.0 -0800
> +++ linux-2.6/kernel/printk.c 2006-12-13 16:08:30.0 -0800
> @@ -164,6 +164,12 @@
>  
>  __setup("log_buf_len=", log_buf_len_setup);
>  
> +#define security_syslog_or_fail(type) do {   \
> + int error = security_syslog(type);  \
> + if (error)  \
> + return error;   \
> + } while (0)
> +

From Documentation/CodingStyle:

Things to avoid when using macros:

1) macros that affect control flow: ...


>  /* See linux/klog.h for the command numbers passed as the first argument.  */
>  int do_syslog(int type, char __user *buf, int len)
>  {
> @@ -172,16 +178,15 @@
>   char c;
>   int error = 0;
>  
> - error = security_syslog(type);
> - if (error)
> - return error;
> -
>   switch (type) {
>   case KLOG_CLOSE:
> + security_syslog_or_fail(LSM_KLOG_READ);
>   break;
>   case KLOG_OPEN:
> + security_syslog_or_fail(LSM_KLOG_READ);
>   break;
>   case KLOG_READ:
> + security_syslog_or_fail(LSM_KLOG_READ);
>   error = -EINVAL;
>   if (!buf || len < 0)
>   goto out;
> @@ -213,9 +218,11 @@
>   error = i;
>   break;
>   case KLOG_READ_CLEAR_HIST:
> + security_syslog_or_fail(LSM_KLOG_CLEARHIST);
>   do_clear = 1;
>   /* FALL THRU */
>   case KLOG_READ_HIST:
> + security_syslog_or_fail(LSM_KLOG_READHIST);
>   error = -EINVAL;
>   if (!buf || len < 0)
>   goto out;
> @@ -269,15 +276,19 @@
>   }
>   break;
>   case KLOG_CLEAR_HIST:
> + security_syslog_or_fail(LSM_KLOG_CLEARHIST);
>   logged_chars = 0;
>   break;
>   case KLOG_DISABLE_CONSOLE:
> + security_syslog_or_fail(LSM_KLOG_CONSOLE);
>   console_loglevel = minimum_console_loglevel;
>   break;
>   case KLOG_ENABLE_CONSOLE:
> + security_syslog_or_fail(LSM_KLOG_CONSOLE);
>   console_loglevel = default_console_loglevel;
>   break;
>   case KLOG_SET_CONSOLE_LVL:
> + security_syslog_or_fail(LSM_KLOG_CONSOLE);
>   error = -EINVAL;
>   if (len < 1 || len > 8)
>   goto out;
> @@ -287,9 +298,18 @@
>   error = 0;
>   break;
>   case KLOG_GET_UNREAD:
> + security_syslog_or_fail(LSM_KLOG_READ);
>   error = log_end - log_start;
>   break;


---
~Randy

[PATCH] procfs: Fix race between proc_readdir and remove_proc_entry

2006-12-14 Thread Darrick J. Wong

Hi,

While running an insmod/rmmod loop with the mptsas driver (vanilla
2.6.19, IBM Intellistation Z30, SAS1064E controller if it matters),
I encountered the following messages from the kernel:

[53092.441412] general protection fault:  [1] PREEMPT SMP 
[53092.447058] CPU 4 
[53092.449108] Modules linked in: mptbase scsi_transport_sas ext2 ext3 jbd mbcache nbd acpi_cpufreq processor cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative dm_mod md_mod ipv6 fuse ata_generic sg sd_mod ata_piix libata mousedev tsdev serio_raw evdev floppy rtc snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm ohci1394 generic ieee1394 piix scsi_mod snd_timer ide_core ehci_hcd uhci_hcd snd usbcore soundcore snd_page_alloc shpchp pci_hotplug unix
[53092.495003] Pid: 570, comm: udevd Not tainted 2.6.19-dic64 #6
[53092.500753] RIP: 0010:[]  [] 
proc_readdir+0x110/0x186
[53092.508968] RSP: 0018:8100be829e78  EFLAGS: 00010246
[53092.514289] RAX:  RBX: 6b6b6b6b6b6b6b6b RCX: 2218b2b5
[53092.521429] RDX: 801fafc5 RSI: 0001 RDI: 810092cf7e48
[53092.528564] RBP: 8100be829eb8 R08: 0002 R09: 
[53092.535700] R10: 810092cf7e48 R11: 0028 R12: 810005934a08
[53092.542836] R13: 0002 R14: 810092cf7e48 R15: 8013b9e6
[53092.549972] FS:  2b19f2a14d70() GS:8100059b3898() 
knlGS:
[53092.558067] CS:  0010 DS:  ES:  CR0: 8005003b
[53092.563817] CR2: 0051108c CR3: bedb9000 CR4: 06e0
[53092.570954] Process udevd (pid: 570, threadinfo 8100be828000, task 
8100befc0080)
[53092.579048] Stack:  0246 8100be829f38 80474ea0 

[53092.587179]  810092cf7e48 8013b9e6 8100be829f38 
8013b9e6
[53092.594679]  8100be829ee8 8014e93e 810092cf7e48 
81000593a8a8
[53092.601971] Call Trace:
[53092.604644]  [] proc_root_readdir+0x32/0x68
[53092.610397]  [] vfs_readdir+0x65/0x9a
[53092.615628]  [] sys_getdents64+0x7a/0xc1
[53092.621123]  [] system_call+0x7e/0x83
[53092.627195] DWARF2 unwinder stuck at system_call+0x7e/0x83
[53092.632681] 
[53092.634189] Leftover inexact backtrace:
[53092.634191] 
[53092.643162] 
[53092.644663] 
[53092.644665] Code: 44 8b 4b 10 0f b7 53 04 44 8b 03 49 8b 4e 38 48 8b 73 08 
48 
[53092.653935] RIP  [] proc_readdir+0x110/0x186
[53092.659798]  RSP 

The slab poison value in %rbx is suspicious, so I dug into the
relevant code:

0x801fafc5 :  mov    0x10(%rbx),%r9d
0x801fafc9 :  movzwl 0x4(%rbx),%edx
0x801fafcd :  mov    (%rbx),%r8d
0x801fafd0 :  mov    0x38(%r14),%rcx
0x801fafd4 :  mov    0x8(%rbx),%rsi
0x801fafd8 :  mov    0xffc8(%rbp),%rdi
0x801fafdc :  shr    $0xc,%r9d
0x801fafe0 :  callq  *%r15

This corresponds to this code in proc_readdir near
fs/proc/generic.c:480.  It looks like %rbx corresponds to the
"de" pointer:

spin_unlock(&proc_subdir_lock);
if (filldir(dirent, de->name, de->namelen, filp->f_pos,
de->low_ino, de->mode >> 12) < 0)
goto out;
spin_lock(&proc_subdir_lock);
filp->f_pos++;
de = de->next;

I believe what's happening here is that proc_readdir drops
proc_subdir_lock to call filldir() on the /proc/mpt directory
at the same time mptbase is being unloaded.  The unload causes
the removal of /proc/mpt, which means that de is overwritten
with the slab poison value as it is being freed.  We reacquire
the lock and try to grab the next value of de, but by then the
next pointer has been lost, and we crash.

I think an acceptable fix is to de_get() the proc_dir_entry
count before the unlock and de_put() it after the unlock.
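
The pinning idea can be modelled in user space.  The following is a
single-threaded sketch under stated assumptions, not the procfs code: the
sk_* names are hypothetical, the lock is only indicated by comments, and an
entry that is removed while pinned is assumed to be freed by the final put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted list entry. */
struct sk_entry {
    int refcount;
    int removed;            /* already unlinked from the list */
    int value;
    struct sk_entry *next;
};

static int sk_freed;        /* number of entries freed so far */

static void sk_get(struct sk_entry *e) { e->refcount++; }

static void sk_put(struct sk_entry *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0 && e->removed) {
        free(e);
        sk_freed++;
    }
}

static void sk_push(struct sk_entry **head, int value)
{
    struct sk_entry *e = calloc(1, sizeof(*e));

    if (!e)
        abort();
    e->value = value;
    e->next = *head;
    *head = e;
}

/* Unlink an entry; free it now only if nobody holds a reference. */
static void sk_remove(struct sk_entry **head, struct sk_entry *victim)
{
    struct sk_entry **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        if (*pp != victim)
            continue;
        *pp = victim->next;     /* victim->next itself is left intact */
        victim->removed = 1;
        if (victim->refcount == 0) {
            free(victim);
            sk_freed++;
        }
        return;
    }
}

typedef void (*sk_visit_fn)(struct sk_entry *e, void *ctx);

/* Walk the list, pinning each entry across the "unlocked" callback. */
static int sk_walk(struct sk_entry **head, sk_visit_fn visit, void *ctx)
{
    struct sk_entry *e = *head, *next;
    int visited = 0;

    while (e) {
        sk_get(e);          /* pin before the lock would be dropped */
        visit(e, ctx);      /* runs unlocked; may remove e */
        visited++;
        next = e->next;     /* safe: e cannot have been freed yet */
        sk_put(e);          /* frees e if it was removed meanwhile */
        e = next;
    }
    return visited;
}

/* Demo callback: removes the entry being visited when its value is 2. */
static void sk_remove_if_two(struct sk_entry *e, void *ctx)
{
    if (e->value == 2)
        sk_remove(ctx, e);
}
```

Walking a three-entry list with sk_remove_if_two() as the callback visits
all three entries even though the middle one is removed mid-walk.  Without
the get/put pair, sk_remove() would free that entry before its next pointer
is read.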

Signed-off-by: Darrick J. Wong <[EMAIL PROTECTED]>
---

 fs/proc/generic.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 4ba0300..7e77d7e 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -429,7 +429,7 @@ struct dentry *proc_lookup(struct inode 
 int proc_readdir(struct file * filp,
void * dirent, filldir_t filldir)
 {
-   struct proc_dir_entry * de;
+   struct proc_dir_entry * de, *next;
unsigned int ino;
int i;
struct inode *inode = filp->f_dentry->d_inode;
@@ -477,13 +477,16 @@ int proc_readdir(struct file * filp,
 
do {
/* filldir passes info to user space */
+   de_get(de);
spin_unlock(&proc_subdir_lock);
if (filldir(dirent, de->name, de->namelen, 
filp->f_pos,
de->low_ino, de->mode >> 12) < 0)
goto out;
spin_lock(&proc_subdir_lock);
filp->f_pos++;
-

Re: 2.6.19-git20 cciss: cmd f7b00000 timedout

2006-12-14 Thread dann frazier

On Thu, Dec 14, 2006 at 04:16:39PM -0600, Miller, Mike (OS Dev) wrote:
> H. Dann, did you see this on 32-bit Debian?

yep - all reports I've seen so far are on i386




Re: Abolishing the DMCA (was GPL only modules)

> The best ways to get rich corporations on our side in fighting the
> DMCA is to use the DMCA to hurt their profits. Companies that rely on
> binary drivers would have several options:
> 
> 1) Lobby politicians to repeal the DMCA, 

They already are. The tech industry is mostly anti DMCA and there are
plenty of deeply proprietary companies who fought against the DMCA, are
fighting the US broadcast flag idiocy and so on. So you'd be fighting the
wrong people.

Alan

Re: [patch 1/4] Add <linux/klog.h>

2006-12-14 Thread Randy Dunlap

On Thu, 14 Dec 2006 16:16:40 -0800 Zack Weinberg wrote:

> This patch introduces <linux/klog.h> with symbolic constants for the
> various sys_syslog() opcodes, and changes all in-kernel references to
> those opcodes to use the constants.  The header is added to the set of
> user/kernel interface headers.  (Unlike the previous revision of this
> patch series, no kernel-private additions to this file are contemplated.)

Hi Zack,

This patch looks good except for one nit:

> --- linux-2.6.orig/fs/proc/kmsg.c 2006-12-13 15:53:29.0 -0800
> +++ linux-2.6/fs/proc/kmsg.c  2006-12-13 16:04:46.0 -0800
> @@ -21,27 +22,28 @@
>  
>  static int kmsg_open(struct inode * inode, struct file * file)
>  {
> - return do_syslog(1,NULL,0);
> + return do_syslog(KLOG_OPEN,NULL,0);
>  }
>  
>  static int kmsg_release(struct inode * inode, struct file * file)
>  {
> - (void) do_syslog(0,NULL,0);
> + (void) do_syslog(KLOG_CLOSE,NULL,0);
>   return 0;
>  }

Please use a space after the commas (even though you just left it
as it already was).

---
~Randy

Re: kref refcnt and false positives

2006-12-14 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

> Guys, we have about 100 reports of weirdo
> crashes, smashes, bashes and splats in the kref code.  The last thing we
> need is some obscure, tricksy little optimisation which leads legitimate
> uses of the API to mysteriously fail.  
>
> If we are allocating and freeing kref-counted objects at a sufficiently
> high frequency for this thing to make a difference then we should fix that
> instead of trying to suck faster.

Agreed. Correct code maintenance certainly trumps performance.

For the same reason someone reusing the data structure shouldn't
assume the kref code left it in any particular state.

So both sides should be liberal in what they accept.

Eric

Re: [PATCH] Fix help text for CONFIG_ATA_PIIX

> Thanks for clarifying Bill, and sorry Alan. ata_piix does indeed work 
> correctly. The help text is a bit confusing:

The help text is out of date - thanks that is a real bug 

Re: [PATCH 2.6.19-git19] BUG due to bad argument to ieee80211softmac_assoc_work

2006-12-14 Thread Larry Finger

Michael Bommarito wrote:

Hello Uli,
Yes, apologies, I had been waiting for an abandoned bugzilla entry
to get attention, and when I realized it was assigned to a dead-end, I
had simply posted the patch without checking for prior messages.
I was further confused by the fact that it hadn't made its way into
any of the 19-gitX sets (and for that matter, the window for
2.6.20-rc1 has come and gone and this still remains unfixed), despite
how clear the error was and how trivial the fix seems.

I was not aware that a bugzilla entry existed for this problem. I learned about it when my system
would hang on bootup if the bcm43xx card was installed. By bisection, I learned which commit was
causing the problem. About that time, the complete fix was discussed on the netdev and bcm43xx
mailing lists. I was a little perturbed that only part of the fix was accepted into 2.6.19-gitX.

The full fix was pushed to John Linville on Dec. 10, who pushed it on to Jeff Garzik on Dec. 11. I
have not yet seen any message sending it on to Andrew Morton or Linus.

A bug fix will always be accepted, particularly one that only changes 2 lines - it is only a new
feature that will no longer be accepted once the -rc1 stage is reached. If this message doesn't do
the trick and it isn't included by -rc2, I'll ping Jeff to see what happened. Changes always take
longer than one likes, but one needs to be careful.

Larry


[patch 1/4] Add <linux/klog.h>

This patch introduces <linux/klog.h> with symbolic constants for the
various sys_syslog() opcodes, and changes all in-kernel references to
those opcodes to use the constants.  The header is added to the set of
user/kernel interface headers.  (Unlike the previous revision of this
patch series, no kernel-private additions to this file are contemplated.)

zw

Index: linux-2.6/include/linux/Kbuild
===
--- linux-2.6.orig/include/linux/Kbuild 2006-12-13 15:58:13.0 -0800
+++ linux-2.6/include/linux/Kbuild  2006-12-13 16:06:57.0 -0800
@@ -100,6 +100,7 @@
 header-y += ixjuser.h
 header-y += jffs2.h
 header-y += keyctl.h
+header-y += klog.h
 header-y += limits.h
 header-y += lock_dlm_plock.h
 header-y += magic.h
Index: linux-2.6/include/linux/klog.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/include/linux/klog.h  2006-12-13 16:06:22.0 -0800
@@ -0,0 +1,26 @@
+#ifndef _LINUX_KLOG_H
+#define _LINUX_KLOG_H
+
+/*
+ * Constants for the first argument to the syslog() system call
+ * (aka klogctl()).  These numbers are part of the user space ABI!
+ */
+enum {
+   KLOG_CLOSE           =  0, /* close log */
+   KLOG_OPEN            =  1, /* open log */
+   KLOG_READ            =  2, /* read from log (klogd) */
+
+   KLOG_READ_HIST       =  3, /* read history of log messages (dmesg) */
+   KLOG_READ_CLEAR_HIST =  4, /* read and clear history */
+   KLOG_CLEAR_HIST      =  5, /* just clear history */
+
+   KLOG_DISABLE_CONSOLE =  6, /* disable printk to console */
+   KLOG_ENABLE_CONSOLE  =  7, /* enable printk to console */
+   KLOG_SET_CONSOLE_LVL =  8, /* set minimum severity of messages to be
+                               * printed to console */
+
+   KLOG_GET_UNREAD      =  9, /* return number of unread characters */
+   KLOG_GET_SIZE        = 10  /* return size of log buffer */
+};
+
+#endif /* klog.h */
Index: linux-2.6/kernel/printk.c
===
--- linux-2.6.orig/kernel/printk.c  2006-12-13 15:58:16.0 -0800
+++ linux-2.6/kernel/printk.c   2006-12-13 16:06:22.0 -0800
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <linux/klog.h>
 
 #include 
 
@@ -163,21 +164,7 @@
 
 __setup("log_buf_len=", log_buf_len_setup);
 
-/*
- * Commands to do_syslog:
- *
- * 0 -- Close the log.  Currently a NOP.
- * 1 -- Open the log. Currently a NOP.
- * 2 -- Read from the log.
- * 3 -- Read all messages remaining in the ring buffer.
- * 4 -- Read and clear all messages remaining in the ring buffer
- * 5 -- Clear ring buffer.
- * 6 -- Disable printk's to console
- * 7 -- Enable printk's to console
- * 8 -- Set level of messages printed to console
- * 9 -- Return number of unread characters in the log buffer
- * 10 -- Return size of the log buffer
- */
+/* See linux/klog.h for the command numbers passed as the first argument.  */
 int do_syslog(int type, char __user *buf, int len)
 {
unsigned long i, j, limit, count;
@@ -190,11 +177,11 @@
return error;
 
switch (type) {
-   case 0: /* Close log */
+   case KLOG_CLOSE:
break;
-   case 1: /* Open log */
+   case KLOG_OPEN:
break;
-   case 2: /* Read from log */
+   case KLOG_READ:
error = -EINVAL;
if (!buf || len < 0)
goto out;
@@ -225,10 +212,10 @@
if (!error)
error = i;
break;
-   case 4: /* Read/clear last kernel messages */
+   case KLOG_READ_CLEAR_HIST:
do_clear = 1;
/* FALL THRU */
-   case 3: /* Read last kernel messages */
+   case KLOG_READ_HIST:
error = -EINVAL;
if (!buf || len < 0)
goto out;
@@ -281,16 +268,16 @@
}
}
break;
-   case 5: /* Clear ring buffer */
+   case KLOG_CLEAR_HIST:
logged_chars = 0;
break;
-   case 6: /* Disable logging to console */
+   case KLOG_DISABLE_CONSOLE:
console_loglevel = minimum_console_loglevel;
break;
-   case 7: /* Enable logging to console */
+   case KLOG_ENABLE_CONSOLE:
console_loglevel = default_console_loglevel;
break;
-   case 8: /* Set level of messages printed to console */
+   case KLOG_SET_CONSOLE_LVL:
error = -EINVAL;
if (len < 1 || len > 8)
goto out;
@@ -299,10 +286,10 @@
console_loglevel = len;
error = 0;
break;
-   case 9: /* Number of ch

[patch 0/4] /proc/kmsg permissions, take three

Here's a re-revised version of my patch set to allow klogd to drop
privileges and continue reading from /proc/kmsg (currently, even if klogd
has a legitimately opened fd on /proc/kmsg, it cannot read from it unless
it has CAP_SYS_ADMIN asserted).  SELinux's pickier and finer-grained
privilege rules for /proc/kmsg are unchanged.

The major change from the previous patchset
[q.v. http://comments.gmane.org/gmane.linux.kernel/466034 ] is that,
as Arjan van de Ven requested, the new header linux/klog.h contains only
userspace-visible definitions (the constants for sys_syslog()).  Thanks to
Alexey Dobriyan for telling me the proper place to put the KLOGSEC_*
constants (now renamed LSM_KLOG_* in keeping with other such constants).
They have also been rediffed versus yesterday's git.  They should be
applied in sequence; each step compiles, and the complete set has been
booted and tested to work as intended.

Any comments, as usual, appreciated.  I would very much like to see this
in 2.6.20.

zw


[patch 2/4] permission mapping for sys_syslog operations

As suggested by Stephen Smalley: map the various sys_syslog operations
to a smaller set of privilege codes before calling security modules.
This patch changes the security module interface!  There should be no
change in the actual security semantics enforced by dummy, capability,
nor SELinux (with one exception, clearly marked in sys_syslog).

Change from previous version of patch: the privilege codes are now
in linux/security.h instead of linux/klog.h, and use the LSM_* naming
convention used for other such constants in that file.
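
As a rough illustration of the mapping (a sketch only, not the code
in the patch): the opcode values are those from patch 1/4, the
privilege codes 1..3 are the ones visible in linux/security.h, and
LSM_KLOG_READ == 0 is my assumption.  klog_allowed() and priv_bit()
are hypothetical helpers that model "does the caller hold what this
opcode needs", with the held privileges passed in as a bitmask.

```c
#include <assert.h>

/* Opcodes as listed in patch 1/4 (linux/klog.h). */
enum {
    KLOG_CLOSE = 0, KLOG_OPEN = 1, KLOG_READ = 2,
    KLOG_READ_HIST = 3, KLOG_READ_CLEAR_HIST = 4, KLOG_CLEAR_HIST = 5,
    KLOG_DISABLE_CONSOLE = 6, KLOG_ENABLE_CONSOLE = 7,
    KLOG_SET_CONSOLE_LVL = 8, KLOG_GET_UNREAD = 9, KLOG_GET_SIZE = 10
};

/* Privilege codes; the value of LSM_KLOG_READ is assumed. */
enum {
    LSM_KLOG_READ = 0,
    LSM_KLOG_READHIST = 1,
    LSM_KLOG_CLEARHIST = 2,
    LSM_KLOG_CONSOLE = 3
};

/* Hypothetical helper: one bit per privilege code. */
static unsigned priv_bit(int code)
{
    assert(code >= 0 && code < 32);
    return 1u << code;
}

/* Return 1 if a caller holding `granted` may perform `opcode`. */
static int klog_allowed(int opcode, unsigned granted)
{
    switch (opcode) {
    case KLOG_CLOSE:
    case KLOG_OPEN:
    case KLOG_READ:
    case KLOG_GET_UNREAD:
        return (granted & priv_bit(LSM_KLOG_READ)) != 0;
    case KLOG_READ_HIST:
        return (granted & priv_bit(LSM_KLOG_READHIST)) != 0;
    case KLOG_READ_CLEAR_HIST:
        /* falls through two checks in the patch: needs both */
        return (granted & priv_bit(LSM_KLOG_CLEARHIST)) != 0 &&
               (granted & priv_bit(LSM_KLOG_READHIST)) != 0;
    case KLOG_CLEAR_HIST:
        return (granted & priv_bit(LSM_KLOG_CLEARHIST)) != 0;
    case KLOG_DISABLE_CONSOLE:
    case KLOG_ENABLE_CONSOLE:
    case KLOG_SET_CONSOLE_LVL:
        return (granted & priv_bit(LSM_KLOG_CONSOLE)) != 0;
    case KLOG_GET_SIZE:
        /* either READ or READHIST is enough */
        return (granted & (priv_bit(LSM_KLOG_READ) |
                           priv_bit(LSM_KLOG_READHIST))) != 0;
    default:
        return 0;   /* unknown opcode: refused in this sketch */
    }
}
```

Eleven opcodes collapse to four codes this way, so a security module
only has to express policy for the four.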

zw


Index: linux-2.6/kernel/printk.c
===
--- linux-2.6.orig/kernel/printk.c  2006-12-13 16:06:22.0 -0800
+++ linux-2.6/kernel/printk.c   2006-12-13 16:08:30.0 -0800
@@ -164,6 +164,12 @@
 
 __setup("log_buf_len=", log_buf_len_setup);
 
+#define security_syslog_or_fail(type) do { \
+   int error = security_syslog(type);  \
+   if (error)  \
+   return error;   \
+   } while (0)
+
 /* See linux/klog.h for the command numbers passed as the first argument.  */
 int do_syslog(int type, char __user *buf, int len)
 {
@@ -172,16 +178,15 @@
char c;
int error = 0;
 
-   error = security_syslog(type);
-   if (error)
-   return error;
-
switch (type) {
case KLOG_CLOSE:
+   security_syslog_or_fail(LSM_KLOG_READ);
break;
case KLOG_OPEN:
+   security_syslog_or_fail(LSM_KLOG_READ);
break;
case KLOG_READ:
+   security_syslog_or_fail(LSM_KLOG_READ);
error = -EINVAL;
if (!buf || len < 0)
goto out;
@@ -213,9 +218,11 @@
error = i;
break;
case KLOG_READ_CLEAR_HIST:
+   security_syslog_or_fail(LSM_KLOG_CLEARHIST);
do_clear = 1;
/* FALL THRU */
case KLOG_READ_HIST:
+   security_syslog_or_fail(LSM_KLOG_READHIST);
error = -EINVAL;
if (!buf || len < 0)
goto out;
@@ -269,15 +276,19 @@
}
break;
case KLOG_CLEAR_HIST:
+   security_syslog_or_fail(LSM_KLOG_CLEARHIST);
logged_chars = 0;
break;
case KLOG_DISABLE_CONSOLE:
+   security_syslog_or_fail(LSM_KLOG_CONSOLE);
console_loglevel = minimum_console_loglevel;
break;
case KLOG_ENABLE_CONSOLE:
+   security_syslog_or_fail(LSM_KLOG_CONSOLE);
console_loglevel = default_console_loglevel;
break;
case KLOG_SET_CONSOLE_LVL:
+   security_syslog_or_fail(LSM_KLOG_CONSOLE);
error = -EINVAL;
if (len < 1 || len > 8)
goto out;
@@ -287,9 +298,18 @@
error = 0;
break;
case KLOG_GET_UNREAD:
+   security_syslog_or_fail(LSM_KLOG_READ);
error = log_end - log_start;
break;
case KLOG_GET_SIZE:
+   /* This one is allowed if you have _either_
+  LSM_KLOG_READ or LSM_KLOG_READHIST.  */
+   error = security_syslog(LSM_KLOG_READ);
+   if (error)
+   error = security_syslog(LSM_KLOG_READHIST);
+   if (error)
+   break;
+
error = log_buf_len;
break;
default:
Index: linux-2.6/security/commoncap.c
===
--- linux-2.6.orig/security/commoncap.c 2006-12-13 16:06:22.0 -0800
+++ linux-2.6/security/commoncap.c  2006-12-13 16:11:13.0 -0800
@@ -311,7 +311,7 @@
 
 int cap_syslog (int type)
 {
-   if ((type != 3 && type != 10) && !capable(CAP_SYS_ADMIN))
+   if (type != LSM_KLOG_READHIST && !capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
 }
Index: linux-2.6/security/dummy.c
===
--- linux-2.6.orig/security/dummy.c 2006-12-13 16:06:22.0 -0800
+++ linux-2.6/security/dummy.c  2006-12-13 16:11:31.0 -0800
@@ -96,7 +96,7 @@
 
 static int dummy_syslog (int type)
 {
-   if ((type != 3 && type != 10) && current->euid)
+   if (type != LSM_KLOG_READHIST && current->euid)
return -EPERM;
return 0;
 }
Index: linux-2.6/security/selinux/hooks.c
===
--- linux-2.6.orig/security/selinux/hooks.c 2006-12-13 16:06:22.0 -0800
+++ linux-2.6/security/selinux/hooks.c  2006-12-13 16:11:41.0 -0800
@@ -1509,25 +1509,17 @@
return rc;
 
switch (type) {
-

[patch 3/4] Refactor do_syslog interface

This patch breaks out the read operations in do_syslog() into their
own functions (klog_read, klog_readhist) and adds a klog_poll.
klog_read grows the ability to do a nonblocking read, which I expose
in the sys_syslog interface because there doesn't seem to be any
reason not to.  do_syslog itself is folded into sys_syslog.  The
security checks remain there, not in the subfunctions.

kmsg.c is then changed to use those functions instead of calling
do_syslog and/or poll_wait itself.  This entails that it must call
security_syslog as appropriate itself.  In this patch I preserve the
security checks exactly as they were with one exception: neither
kmsg_close() nor sys_syslog(KLOG_CLOSE, ...) calls security_syslog
at all anymore (close operations should never fail).

Finally, I fixed a couple of minor bugs.  __put_user error handling in
klog_read was slightly off: if __put_user returns an error, that
character should not be consumed from the kernel buffer; if it returns
an error after some characters have already been copied successfully,
the read operation should return the count of already-copied
characters, not the error code.  Seeking on /proc/kmsg has never been
meaningful, so kmsg_open() should call nonseekable_open() to enforce that.
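
The intended __put_user semantics can be sketched in user space like
this.  It is only a model: fake_put_user() and struct fake_user_buf
are hypothetical stand-ins for __put_user() and the user buffer, and
log_read() is not the klog_read() from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user buffer that faults once fail_at chars are stored. */
struct fake_user_buf {
    char data[16];
    size_t fail_at;
};

/* Stand-in for __put_user(): 0 on success, -EFAULT on a "fault". */
static int fake_put_user(char c, struct fake_user_buf *ub, size_t pos)
{
    if (pos >= ub->fail_at || pos >= sizeof(ub->data))
        return -EFAULT;
    ub->data[pos] = c;
    return 0;
}

/*
 * Copy up to len characters from log[*start .. end) to the user buffer.
 * A character is consumed (*start advanced) only after it was stored.
 * Returns the number of characters copied, or the error if none were.
 */
static int log_read(const char *log, size_t *start, size_t end,
                    struct fake_user_buf *ub, size_t len)
{
    size_t copied = 0;
    int error = 0;

    assert(log != NULL && start != NULL && ub != NULL);
    while (copied < len && *start < end) {
        error = fake_put_user(log[*start], ub, copied);
        if (error)
            break;          /* leave the character in the log */
        (*start)++;
        copied++;
    }
    return copied ? (int)copied : error;
}
```

A fault after three characters therefore returns 3 and leaves the
fourth character unread; a fault on the very first character returns
-EFAULT and consumes nothing.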

Change from previous version of patch: proc/kmsg.c declares the
kernel/printk.c interfaces itself, instead of getting them from klog.h
which people want to be purely userspace-visible constants.  kmsg.c has
always had private declarations of printk.c functions (before, there were
declarations of do_syslog and a wait queue there); as it is unlikely that
more users of these functions will appear, I think this will do fine.
(It might be reasonable to put declarations in console.h.)

zw

Index: linux-2.6/fs/proc/kmsg.c
===
--- linux-2.6.orig/fs/proc/kmsg.c   2006-12-13 16:04:46.0 -0800
+++ linux-2.6/fs/proc/kmsg.c2006-12-13 16:36:56.0 -0800
@@ -12,40 +12,43 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
-extern wait_queue_head_t log_wait;
-
-extern int do_syslog(int type, char __user *bug, int count);
+/* interfaces from kernel/printk.c */
+extern int klog_read(char __user *, int, int);
+extern unsigned int klog_poll(struct file *, poll_table *);
 
 static int kmsg_open(struct inode * inode, struct file * file)
 {
-   return do_syslog(KLOG_OPEN,NULL,0);
+   int error = security_syslog(LSM_KLOG_READ);
+   if (error)
+   return error;
+   return nonseekable_open(inode, file);
 }
 
 static int kmsg_release(struct inode * inode, struct file * file)
 {
-   (void) do_syslog(KLOG_CLOSE,NULL,0);
return 0;
 }
 
 static ssize_t kmsg_read(struct file *file, char __user *buf,
 size_t count, loff_t *ppos)
 {
-   if ((file->f_flags & O_NONBLOCK)
-   && !do_syslog(KLOG_GET_UNREAD, NULL, 0))
-   return -EAGAIN;
-   return do_syslog(KLOG_READ, buf, count);
+   int error = security_syslog(LSM_KLOG_READ);
+   if (error)
+   return error;
+   return klog_read(buf, count, !(file->f_flags & O_NONBLOCK));
 }
 
 static unsigned int kmsg_poll(struct file *file, poll_table *wait)
 {
-   poll_wait(file, &log_wait, wait);
-   if (do_syslog(KLOG_GET_UNREAD, NULL, 0))
-   return POLLIN | POLLRDNORM;
-   return 0;
+   int error = security_syslog(LSM_KLOG_READ);
+   if (error)
+   return error;
+   return klog_poll(file, wait);
 }
 
 
Index: linux-2.6/include/linux/klog.h
===
--- linux-2.6.orig/include/linux/klog.h 2006-12-13 16:12:43.0 -0800
+++ linux-2.6/include/linux/klog.h  2006-12-13 16:33:09.0 -0800
@@ -20,7 +20,9 @@
* printed to console */
 
    KLOG_GET_UNREAD      =  9, /* return number of unread characters */
-   KLOG_GET_SIZE        = 10  /* return size of log buffer */
+   KLOG_GET_SIZE        = 10, /* return size of log buffer */
+   KLOG_READ_NONBLOCK   = 11, /* read from log, don't block if empty
+                               * -- new in 2.6.20 */
 };
 
 #endif /* klog.h */
Index: linux-2.6/kernel/printk.c
===
--- linux-2.6.orig/kernel/printk.c  2006-12-13 16:08:30.0 -0800
+++ linux-2.6/kernel/printk.c   2006-12-13 16:39:24.0 -0800
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -45,7 +46,7 @@
 #define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
 #define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */
 
-DECLARE_WAIT_QUEUE_HEAD(log_wait);
+static DECLARE_WAIT_QUEUE_HEAD(log_wait);
 
 int console_printk[4] = {
DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
@@ -164,116 +165,142 @@
 
 __setup("log_buf_len=", l

Re: [PATCH/RFC] CodingStyle updates

2006-12-14 Thread Robert P. J. Day

On Thu, 14 Dec 2006, Randy Dunlap wrote:

> On Thu, 14 Dec 2006 19:07:27 -0500 (EST) Robert P. J. Day wrote:
>
> > On Thu, 14 Dec 2006, Randy Dunlap wrote:
> >
> > > David Weinehall wrote:
> > > > On Thu, Dec 07, 2006 at 12:48:38AM -0800, Randy Dunlap wrote:
> > > >
> > > > [snip]
> > > >
> > > > > +but no space after unary operators:
> > > > > + sizeof  ++  --  &  *  +  -  ~  !  defined
> > > >
> > > > Uhm, that doesn't compute...  If you don't put a space after sizeof,
> > > > the program won't compile.
> > > >
> > > > int c;
> > > > printf("%d", sizeofc);
> > >
> > > Uh, we prefer not to see "sizeof c".  IOW, we prefer to have the
> > > parentheses use all the time.  Maybe I need to say that better?
> >
> > here's a *really* rough first pass, i'm sure the end result would need
> > some hand tweaking:
>
> You can certainly send such (generated) patches to Andrew or other
> subsystem maintainers if you'd like, but I'm more interested in not
> adding more crud to the tree in the future.
>
> IOW, sure, we prefer sizeof(foo) to sizeof foo, but the latter isn't
> killing us.  If someone is there making other changes, it would be
> OK to change that also.

the advantage to standardizing what's there is that it makes it easier
to make subsequent changes.  as a perfect example, because there are
several variations to the use of "sizeof", trying to catch every
combination that might be replaceable by the use of ARRAY_SIZE() is
just that much harder since you'd have to (as i had to) use regular
expressions to check for every variant -- parentheses or no
parentheses?  space after the word or not?  internal spaces within the
parentheses?

all that variation makes global changes for more useful stuff a real
pain.
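
a quick illustration of what i mean (a made-up array, not code from
the tree; the ARRAY_SIZE here has the same shape as the kernel macro
but without its type checking).  all four open-coded spellings
compute the same count, and each needs its own regular expression to
find:

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's ARRAY_SIZE(), minus its type check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table used only for this example. */
static const int primes[] = { 2, 3, 5, 7, 11 };

/* Four spellings of the same open-coded element count. */
static size_t count_a(void) { return sizeof(primes) / sizeof(primes[0]); }
static size_t count_b(void) { return sizeof primes / sizeof primes[0]; }
static size_t count_c(void) { return sizeof (primes) / sizeof (*primes); }
static size_t count_d(void) { return sizeof( primes ) / sizeof( int ); }

/* The single spelling they could all be replaced with. */
static size_t count_macro(void) { return ARRAY_SIZE(primes); }

/* Sanity check that every variant agrees with the macro. */
static void check_variants(void)
{
    assert(count_a() == count_macro());
    assert(count_b() == count_macro());
    assert(count_c() == count_macro());
    assert(count_d() == count_macro());
}
```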

rday


[patch 4/4] Distinguish /proc/kmsg access from sys_syslog

Finally, add a new security class for access to /proc/kmsg, distinct
from the class used for the "read current messages" operations on
sys_syslog.  The dummy and capability modules permit access to
/proc/kmsg to any user (who has somehow acquired an open fd on it);
SELinux is unchanged.  This accomplishes what I was trying to do in
the first place, i.e. enable running klogd unprivileged without a root
shim, in a non-SELinux installation.  Please remember that the
default DAC permissions for /proc/kmsg restrict it to root, so unless
you chmod it in your installation or modify klogd to open the file and
then drop privs, the actual restrictions are unchanged.
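
For reference, the resulting capability-module policy can be modelled
in a few lines of user-space C.  This is a sketch: cap_syslog_model()
is a hypothetical name, the capable(CAP_SYS_ADMIN) test is replaced
by a boolean parameter, and LSM_KLOG_READ == 0 is assumed (the other
values are the ones in linux/security.h after this patch).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum {
    LSM_KLOG_READ = 0,       /* assumed value */
    LSM_KLOG_READHIST = 1,
    LSM_KLOG_CLEARHIST = 2,
    LSM_KLOG_CONSOLE = 3,
    LSM_KLOG_READ_PROC = 4
};

/* Model of cap_syslog() after patch 4/4; returns 0 or -EPERM. */
static int cap_syslog_model(int type, bool has_cap_sys_admin)
{
    assert(type >= LSM_KLOG_READ && type <= LSM_KLOG_READ_PROC);
    if (type != LSM_KLOG_READHIST && type != LSM_KLOG_READ_PROC
        && !has_cap_sys_admin)
        return -EPERM;
    return 0;
}
```

So an unprivileged process that already holds an fd on /proc/kmsg
passes the LSM_KLOG_READ_PROC check, while the same read through the
system call (LSM_KLOG_READ) still needs CAP_SYS_ADMIN.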

zw


Index: linux-2.6/fs/proc/kmsg.c
===
--- linux-2.6.orig/fs/proc/kmsg.c   2006-12-13 16:36:56.0 -0800
+++ linux-2.6/fs/proc/kmsg.c2006-12-13 16:41:33.0 -0800
@@ -23,7 +23,7 @@
 
 static int kmsg_open(struct inode * inode, struct file * file)
 {
-   int error = security_syslog(LSM_KLOG_READ);
+   int error = security_syslog(LSM_KLOG_READ_PROC);
if (error)
return error;
return nonseekable_open(inode, file);
@@ -37,7 +37,7 @@
 static ssize_t kmsg_read(struct file *file, char __user *buf,
 size_t count, loff_t *ppos)
 {
-   int error = security_syslog(LSM_KLOG_READ);
+   int error = security_syslog(LSM_KLOG_READ_PROC);
if (error)
return error;
return klog_read(buf, count, !(file->f_flags & O_NONBLOCK));
@@ -45,7 +45,7 @@
 
 static unsigned int kmsg_poll(struct file *file, poll_table *wait)
 {
-   int error = security_syslog(LSM_KLOG_READ);
+   int error = security_syslog(LSM_KLOG_READ_PROC);
if (error)
return error;
return klog_poll(file, wait);
Index: linux-2.6/security/commoncap.c
===
--- linux-2.6.orig/security/commoncap.c 2006-12-13 16:11:13.0 -0800
+++ linux-2.6/security/commoncap.c  2006-12-13 16:41:33.0 -0800
@@ -311,7 +311,14 @@
 
 int cap_syslog (int type)
 {
-   if (type != LSM_KLOG_READHIST && !capable(CAP_SYS_ADMIN))
+   /*
+* Reading history is allowed to any user, and so is reading
+* current messages via /proc/kmsg (by default that file is
+* only readable by root, but root is allowed to change that,
+* or open it and hand the fd to an unprivileged process).
+*/
+   if (type != LSM_KLOG_READHIST && type != LSM_KLOG_READ_PROC
+   && !capable(CAP_SYS_ADMIN))
return -EPERM;
return 0;
 }
Index: linux-2.6/security/selinux/hooks.c
===
--- linux-2.6.orig/security/selinux/hooks.c 2006-12-13 16:11:41.0 -0800
+++ linux-2.6/security/selinux/hooks.c  2006-12-13 16:41:33.0 -0800
@@ -1515,7 +1515,14 @@
case LSM_KLOG_CONSOLE:
return task_has_system(current, SYSTEM__SYSLOG_CONSOLE);
 
+   /*
+* N.B. Unlike the default security model, with
+* SELinux active you have to have SYSTEM__SYSLOG_MOD
+* privilege to read current messages either with the
+* system call or from /proc/kmsg.
+*/
case LSM_KLOG_READ:
+   case LSM_KLOG_READ_PROC:
case LSM_KLOG_CLEARHIST:
default:
return task_has_system(current, SYSTEM__SYSLOG_MOD);
Index: linux-2.6/include/linux/security.h
===
--- linux-2.6.orig/include/linux/security.h 2006-12-13 16:41:45.0 -0800
+++ linux-2.6/include/linux/security.h  2006-12-13 16:42:26.0 -0800
@@ -94,6 +94,8 @@
 #define LSM_KLOG_READHIST  1  /* read message history (dmesg) */
 #define LSM_KLOG_CLEARHIST 2  /* clear message history (dmesg -c) */
 #define LSM_KLOG_CONSOLE   3  /* set console log level */
+#define LSM_KLOG_READ_PROC 4  /* read current messages, but from /proc/kmsg
+   rather than the system call */
 
 /* forward declares to avoid warnings */
 struct nfsctl_arg;

--


Re: Linux 2.6.20-rc1

2006-12-14 Thread Alistair John Strachan

On Thursday 14 December 2006 21:20, Jens Axboe wrote:
> On Thu, Dec 14 2006, Alistair John Strachan wrote:
> > Hi Jens,
> >
> > On Thursday 14 December 2006 20:48, Jens Axboe wrote:
> > > On Thu, Dec 14 2006, Jens Axboe wrote:
> > > > > I'll do that if nobody comes up with anything obvious.
> > > >
> > > > If you can just test 2.6.19-git1, then we'll know if it's the SG_IO
> > > > patch again.
> > >
> > > Actually, you should test 2.6.19-git1 with this patch applied as well.
> >
> > 2.6.19-git1 with FUJITA Tomonori's bio-leak fix doesn't break, and
> > hddtemp continues to work fine:
> >
> > [root] 21:10 [~] hddtemp /dev/sda /dev/sdb /dev/sdc /dev/sdd
> > /dev/sda: WDC WD2500KS-00MJB0: 29°C
> > /dev/sdb: WDC WD2500KS-00MJB0: 27°C
> > /dev/sdc: Maxtor 6B200M0: 28°C
> > /dev/sdd: Maxtor 6B200M0: 26°C
> >
> > I've added the strace results to the URL previously posted, with the
> > config.
>
> Then it is likely the sata updates, SG_IO is off the hook.

I bisected all the way down to 0e75f9063f5c55fb0b0b546a7c356f8ec186825e, which 
git reckons is the culprit. I wasn't able to revert this commit to test, 
because it has conflicts.

Any ideas?

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

Re: [RFC: 2.6 patch] simplify drivers/md/md.c:update_size()

2006-12-14 Thread Adrian Bunk

On Thu, Dec 14, 2006 at 07:36:35PM -0500, Doug Ledford wrote:
> On Fri, 2006-12-15 at 01:19 +0100, Adrian Bunk wrote:
> > While looking at commit 8ddeeae51f2f197b4fafcba117ee8191b49d843e,
> > I got the impression that this commit couldn't fix anything, since the 
> > "size" variable can't be changed before "fit" gets used.
> > 
> > Is there any big thinko, or is the patch below that slightly simplifies 
> > update_size() semantically equivalent to the current code?
> 
> No, this patch is broken.  Where it fails is specifically the case where
> you want to autofit the largest possible size, you have different size
> devices, and the first device is not the smallest.  When you hit the
> first device, you will set size, then as you repeat the ITERATE_RDEV
> loop, when you hit the smaller device, size will be non-0 and you'll
> then trigger the later if and return -ENOSPC.  In the case of autofit,
> you have to preserve the fit variable instead of looking at size so you
> know whether or not to modify the size when you hit a smaller device
> later in the list.
>...

OK, sorry, I've got my thinko:

ITERATE_RDEV() is a loop.

That's what I missed.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: kref refcnt and false positives

On Thu, 14 Dec 2006 17:19:55 -0700
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> "Pallipadi, Venkatesh" <[EMAIL PROTECTED]> writes:
> 
> >>But I believe Venkatesh problem comes from its release() 
> >>function : It is 
> >>supposed to free the object.
> >>If not, it should properly setup it so that further uses are OK.
> >>
> >>ie doing in release(kref)
> >>atomic_set(&kref->count, 0);
> >>
> >
> > Agreed that setting kref refcnt to 0 in release will solve the problem.
> > But, once the optimization code is removed, we don't need to set it to
> > zero as release will only be called after the count reaches zero anyway.
> 
> The primary point of the optimization is to not write allocate a cache line
> unnecessarily.   I don't know its value, but it can have one especially
> on big way SMP machines.

Guys, we have about 100 reports of weirdo
crashes, smashes, bashes and splats in the kref code.  The last thing we
need is some obscure, tricksy little optimisation which leads legitimate
uses of the API to mysteriously fail.  

If we are allocating and freeing kref-counted objects at a sufficiently
high frequency for this thing to make a difference then we should fix that
instead of trying to suck faster.

Re: libata-pata with ICH4, rootfs

On Thu, 14 Dec 2006 18:32:50 +
Alistair John Strachan <[EMAIL PROTECTED]> wrote:
> Correct me if I'm wrong, but SATA wasn't available on ICH4. Only 5 and 
> greater. The kernel help text agrees with me.
> 
> My IDE controller usually works with CONFIG_BLK_DEV_PIIX; I was interested in 
> using your pata_xxx drivers in replacement, assuming there was support.

The ata_piix driver does both SATA and PATA for the later chips. The
reason for this is that the SATA ICH devices have PATA ports as well
which are closely interlinked in how they operate. Since the ata_piix
driver has to drive those and the PATA only ones from PIIX3 onward are
similar it handles them all.

Alan


Re: [RFC: 2.6 patch] simplify drivers/md/md.c:update_size()

2006-12-14 Thread Doug Ledford

On Fri, 2006-12-15 at 01:19 +0100, Adrian Bunk wrote:
> While looking at commit 8ddeeae51f2f197b4fafcba117ee8191b49d843e,
> I got the impression that this commit couldn't fix anything, since the 
> "size" variable can't be changed before "fit" gets used.
> 
> Is there any big thinko, or is the patch below that slightly simplifies 
> update_size() semantically equivalent to the current code?

No, this patch is broken.  Where it fails is specifically the case where
you want to autofit the largest possible size, you have different size
devices, and the first device is not the smallest.  When you hit the
first device, you will set size, then as you repeat the ITERATE_RDEV
loop, when you hit the smaller device, size will be non-0 and you'll
then trigger the later if and return -ENOSPC.  In the case of autofit,
you have to preserve the fit variable instead of looking at size so you
know whether or not to modify the size when you hit a smaller device
later in the list.
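
Here is a small user-space model of the two loops, in case it helps.
It is a sketch only: pick_size_current() and pick_size_simplified()
are hypothetical names, the device list is a plain array instead of
ITERATE_RDEV, and the sizes are arbitrary numbers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Current logic: `fit` remembers that the caller asked for autofit. */
static long pick_size_current(const unsigned long *dev_size, size_t n,
                              unsigned long size)
{
    int fit = (size == 0);
    size_t i;

    assert(dev_size != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long avail = dev_size[i] * 2;

        if (fit && (size == 0 || size > avail / 2))
            size = avail / 2;       /* shrink to the smaller device */
        if (avail < (size << 1))
            return -ENOSPC;
    }
    return (long)size;
}

/* Proposed simplification: only the first device can set the size. */
static long pick_size_simplified(const unsigned long *dev_size, size_t n,
                                 unsigned long size)
{
    size_t i;

    assert(dev_size != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long avail = dev_size[i] * 2;

        if (size == 0)
            size = avail / 2;
        if (avail < (size << 1))
            return -ENOSPC;         /* hit by any later, smaller device */
    }
    return (long)size;
}
```

With devices of size 300, 100 and 200 and size == 0, the current code
settles on 100 while the simplified loop returns -ENOSPC at the
second device.  The two only agree when the first device happens to
be the smallest.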

> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/md/md.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- linux-2.6.19-mm1/drivers/md/md.c.old  2006-12-15 00:57:05.0 +0100
> +++ linux-2.6.19-mm1/drivers/md/md.c  2006-12-15 00:57:42.0 +0100
> @@ -4039,57 +4039,56 @@
>* Generate a 128 bit UUID
>*/
>   get_random_bytes(mddev->uuid, 16);
>  
>   mddev->new_level = mddev->level;
>   mddev->new_chunk = mddev->chunk_size;
>   mddev->new_layout = mddev->layout;
>   mddev->delta_disks = 0;
>  
>   mddev->dead = 0;
>   return 0;
>  }
>  
>  static int update_size(mddev_t *mddev, unsigned long size)
>  {
>   mdk_rdev_t * rdev;
>   int rv;
>   struct list_head *tmp;
> - int fit = (size == 0);
>  
>   if (mddev->pers->resize == NULL)
>   return -EINVAL;
>   /* The "size" is the amount of each device that is used.
>* This can only make sense for arrays with redundancy.
>* linear and raid0 always use whatever space is available
>* We can only consider changing the size if no resync
>* or reconstruction is happening, and if the new size
>* is acceptable. It must fit before the sb_offset or,
>* if that is <data_offset, it must fit before the
>* size of each device.
>* If size is zero, we find the largest size that fits.
>*/
>   if (mddev->sync_thread)
>   return -EBUSY;
>   ITERATE_RDEV(mddev,rdev,tmp) {
>   sector_t avail;
>   avail = rdev->size * 2;
>  
> - if (fit && (size == 0 || size > avail/2))
> + if (size == 0)
>   size = avail/2;
>   if (avail < ((sector_t)size << 1))
>   return -ENOSPC;
>   }
>   rv = mddev->pers->resize(mddev, (sector_t)size *2);
>   if (!rv) {
>   struct block_device *bdev;
>  
>   bdev = bdget_disk(mddev->gendisk, 0);
>   if (bdev) {
>   mutex_lock(&bdev->bd_inode->i_mutex);
>   i_size_write(bdev->bd_inode, (loff_t)mddev->array_size << 10);
>   mutex_unlock(&bdev->bd_inode->i_mutex);
>   bdput(bdev);
>   }
>   }
>   return rv;
>  }
-- 
Doug Ledford <[EMAIL PROTECTED]>
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband



Register semundo task watcher

Make the semaphore undo code use a task watcher instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---
 include/linux/sem.h |   17 -
 ipc/sem.c   |   12 
 kernel/exit.c   |2 --
 kernel/fork.c   |6 +-
 4 files changed, 9 insertions(+), 28 deletions(-)

Index: linux-2.6.19/ipc/sem.c
===
--- linux-2.6.19.orig/ipc/sem.c
+++ linux-2.6.19/ipc/sem.c
@@ -81,10 +81,11 @@
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "util.h"
 
 #define sem_ids(ns)(*((ns)->ids[IPC_SEM_IDS]))
@@ -1287,11 +1288,11 @@ asmlinkage long sys_semop (int semid, st
  * See the notes above unlock_semundo() regarding the spin_lock_init()
  * in this code.  Initialize the undo_list->lock here instead of get_undo_list()
  * because of the reasoning in the comment above unlock_semundo.
  */
 
-int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
+static int __task_init copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
 {
struct sem_undo_list *undo_list;
int error;
 
if (clone_flags & CLONE_SYSVSEM) {
@@ -1303,10 +1304,11 @@ int copy_semundo(unsigned long clone_fla
} else 
tsk->sysvsem.undo_list = NULL;
 
return 0;
 }
+DEFINE_TASK_INITCALL(copy_semundo);
 
 /*
  * add semadj values to semaphores, free undo structures.
  * undo structures are not freed when semaphore arrays are destroyed
  * so some of them may be out of date.
@@ -1316,22 +1318,22 @@ int copy_semundo(unsigned long clone_fla
  * should we queue up and wait until we can do so legally?
  * The original implementation attempted to do this (queue and wait).
  * The current implementation does not do so. The POSIX standard
  * and SVID should be consulted to determine what behavior is mandated.
  */
-void exit_sem(struct task_struct *tsk)
+static int __task_free exit_sem(unsigned long ignored, struct task_struct *tsk)
 {
struct sem_undo_list *undo_list;
struct sem_undo *u, **up;
struct ipc_namespace *ns;
 
undo_list = tsk->sysvsem.undo_list;
if (!undo_list)
-   return;
+   return 0;
 
if (!atomic_dec_and_test(&undo_list->refcnt))
-   return;
+   return 0;
 
ns = tsk->nsproxy->ipc_ns;
/* There's no need to hold the semundo list lock, as current
  * is the last task exiting for this undo list.
 */
@@ -1394,11 +1396,13 @@ found:
update_queue(sma);
 next_entry:
sem_unlock(sma);
}
kfree(undo_list);
+   return 0;
 }
+DEFINE_TASK_FREECALL(exit_sem);
 
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 {
struct sem_array *sma = it;
Index: linux-2.6.19/kernel/exit.c
===
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -45,11 +45,10 @@
 #include 
 #include 
 #include 
 #include 
 
-extern void sem_exit (void);
 extern struct task_struct *child_reaper;
 
 static void exit_mm(struct task_struct * tsk);
 
 static void __unhash_process(struct task_struct *p)
@@ -916,11 +915,10 @@ fastcall NORET_TYPE void do_exit(long co
exit_mm(tsk);
notify_task_watchers(WATCH_TASK_FREE, code, tsk);
 
if (group_dead)
acct_process();
-   exit_sem(tsk);
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
cpuset_exit(tsk);
exit_keys(tsk);
Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1102,14 +1102,12 @@ static struct task_struct *copy_process(
 #endif
 
if ((retval = security_task_alloc(p)))
goto bad_fork_cleanup_policy;
/* copy all the process information */
-   if ((retval = copy_semundo(clone_flags, p)))
-   goto bad_fork_cleanup_security;
if ((retval = copy_files(clone_flags, p)))
-   goto bad_fork_cleanup_semundo;
+   goto bad_fork_cleanup_security;
if ((retval = copy_fs(clone_flags, p)))
goto bad_fork_cleanup_files;
if ((retval = copy_sighand(clone_flags, p)))
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
@@ -1276,12 +1274,10 @@ bad_fork_cleanup_sighand:
__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
exit_files(p); /* blocking */
-bad_fork_cleanup_semundo:
-   exit_sem(p);
 bad_fork_cleanup_security:
security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
Index: linux-2.6.19/include/linux/sem.h
===

Register cpuset task watcher

Register a task watcher for cpusets instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
Cc: Paul Jackson <[EMAIL PROTECTED]>
---
 include/linux/cpuset.h |4 
 kernel/cpuset.c|   11 +--
 kernel/exit.c  |2 --
 kernel/fork.c  |6 +-
 4 files changed, 10 insertions(+), 13 deletions(-)

Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -28,11 +28,10 @@
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
@@ -1060,17 +1059,16 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
 
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-   cpuset_fork(p);
 #ifdef CONFIG_NUMA
p->mempolicy = mpol_copy(p->mempolicy);
if (IS_ERR(p->mempolicy)) {
retval = PTR_ERR(p->mempolicy);
p->mempolicy = NULL;
-   goto bad_fork_cleanup_cpuset;
+   goto bad_fork_cleanup_delays_binfmt;
}
mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
@@ -1279,13 +1277,11 @@ bad_fork_cleanup_files:
 bad_fork_cleanup_security:
security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
 #endif
-   cpuset_exit(p);
 bad_fork_cleanup_delays_binfmt:
delayacct_tsk_free(p);
notify_task_watchers(WATCH_TASK_FREE, 0, p);
if (p->binfmt)
module_put(p->binfmt->module);
Index: linux-2.6.19/kernel/cpuset.c
===
--- linux-2.6.19.orig/kernel/cpuset.c
+++ linux-2.6.19/kernel/cpuset.c
@@ -47,10 +47,11 @@
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 
@@ -2173,17 +2174,20 @@ void __init cpuset_init_smp(void)
  *
  * At the point that cpuset_fork() is called, 'current' is the parent
  * task, and the passed argument 'child' points to the child task.
  **/
 
-void cpuset_fork(struct task_struct *child)
+static int __task_init cpuset_fork(unsigned long clone_flags,
+   struct task_struct *child)
 {
task_lock(current);
child->cpuset = current->cpuset;
atomic_inc(&child->cpuset->count);
task_unlock(current);
+   return 0;
 }
+DEFINE_TASK_INITCALL(cpuset_fork);
 
 /**
  * cpuset_exit - detach cpuset from exiting task
  * @tsk: pointer to task_struct of exiting process
  *
@@ -2240,11 +2244,12 @@ void cpuset_fork(struct task_struct *chi
  *to NULL here, and check in cpuset_update_task_memory_state()
  *for a NULL pointer.  This hack avoids that NULL check, for no
  *cost (other than this way too long comment ;).
  **/
 
-void cpuset_exit(struct task_struct *tsk)
+static int __task_free cpuset_exit(unsigned long exit_code,
+   struct task_struct *tsk)
 {
struct cpuset *cs;
 
cs = tsk->cpuset;
tsk->cpuset = &top_cpuset;  /* the_top_cpuset_hack - see above */
@@ -2258,11 +2263,13 @@ void cpuset_exit(struct task_struct *tsk
mutex_unlock(&manage_mutex);
cpuset_release_agent(pathbuf);
} else {
atomic_dec(&cs->count);
}
+   return 0;
 }
+DEFINE_TASK_FREECALL(cpuset_exit);
 
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  *
Index: linux-2.6.19/kernel/exit.c
===
--- linux-2.6.19.orig/kernel/exit.c
+++ linux-2.6.19/kernel/exit.c
@@ -28,11 +28,10 @@
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
@@ -918,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
if (group_dead)
acct_process();
__exit_files(tsk);
__exit_fs(tsk);
exit_thread();
-   cpuset_exit(tsk);
exit_keys(tsk);
 
if (group_dead && tsk->signal->leader)
disassociate_ctty(1);
 
Index: linux-2.6.19/include/linux/cpuset.h
===
--- linux-2.6.19.orig/include/linux/cpuset.h
+++ linux-2.6.19/include/linux/cpuset.h
@@ -17,12 +17,10 @@
 extern int number_of_cpusets;  /* How many cpusets are defined in system? */
 
 extern int cpuset_init_early(void);
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct t

Re: [PATCH/RFC] CodingStyle updates

2006-12-14 Thread Randy Dunlap

On Thu, 14 Dec 2006 19:07:27 -0500 (EST) Robert P. J. Day wrote:

> On Thu, 14 Dec 2006, Randy Dunlap wrote:
> 
> > David Weinehall wrote:
> > > On Thu, Dec 07, 2006 at 12:48:38AM -0800, Randy Dunlap wrote:
> > >
> > > [snip]
> > >
> > > > +but no space after unary operators:
> > > > +   sizeof  ++  --  &  *  +  -  ~  !  defined
> > >
> > > Uhm, that doesn't compute...  If you don't put a space after sizeof,
> > > the program won't compile.
> > >
> > > int c;
> > > printf("%d", sizeofc);
> >
> > Uh, we prefer not to see "sizeof c".  IOW, we prefer to have the
> > parentheses used all the time.  Maybe I need to say that better?
> 
> here's a *really* rough first pass, i'm sure the end result would need
> some hand tweaking:

You can certainly send such (generated) patches to Andrew or other
subsystem maintainers if you'd like, but I'm more interested in
not adding more crud to the tree in the future.

IOW, sure, we prefer sizeof(foo) to sizeof foo, but the latter isn't
killing us.  If someone is there making other changes, it would be
OK to change that also.
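
For reference, a small sketch of the forms under discussion. The struct and function names are made up for illustration. Note that the parentheses are optional only when the operand is an expression; a type name always needs them.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int a;
    char b[12];
};

/* Preferred form: parentheses are always used. */
static size_t preferred_form(void)
{
    struct foo f;

    return sizeof(f);
}

/* Legal C for an expression operand, tolerated but not preferred. */
static size_t tolerated_form(void)
{
    struct foo f;

    return sizeof f;
}

/* A type name as operand requires the parentheses. */
static size_t type_form(void)
{
    return sizeof(struct foo);
}

/* All three spellings yield the same value. */
static void check_forms_agree(void)
{
    assert(preferred_form() == tolerated_form());
    assert(tolerated_form() == type_form());
}
```

Since all three forms evaluate to the same size, the choice is purely one of style, which is why converting existing code is not urgent.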

Compare:  http://lkml.org/lkml/2006/12/7/191

---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Register NUMA mempolicy task watcher

Register a NUMA mempolicy task watcher instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---
 kernel/exit.c  |4 
 kernel/fork.c  |   15 +--
 mm/mempolicy.c |   25 +
 3 files changed, 26 insertions(+), 18 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone     Number of Children Cloned
          5000      7500      10000     12500     15000     17500
          -------------------------------------------------------
Mean      17836.3   18085.2   18220.4   18225     18319     18339
Dev       302.801   314.617   303.079   293.46    287.267   294.819
Err (%)   1.69767   1.73963   1.6634    1.6102    1.56814   1.60761

Fork      Number of Children Forked
          5000      7500      10000     12500     15000     17500
          -------------------------------------------------------
Mean      17896.2   17990     18100.6   18242.3   18244     18346.9
Dev       301.64    285.698   295.646   304.361   299.472   287.153
Err (%)   1.6855    1.58809   1.63335   1.66844   1.64148   1.56513

Kernbench:
Elapsed: 124.532s User: 439.732s System: 46.497s CPU: 389.9%
439.71user 46.48system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.42system 2:05.10elapsed 388%CPU (0avgtext+0avgdata 0maxresident)k
439.74user 46.44system 2:04.60elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.64system 2:04.74elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.45system 2:05.36elapsed 387%CPU (0avgtext+0avgdata 0maxresident)k
439.60user 46.43system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.47system 2:04.34elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.45system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.71system 2:04.58elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.48system 2:03.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19/mm/mempolicy.c
===
--- linux-2.6.19.orig/mm/mempolicy.c
+++ linux-2.6.19/mm/mempolicy.c
@@ -87,10 +87,11 @@
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
 /* Internal flags */
@@ -1331,10 +1332,34 @@ struct mempolicy *__mpol_copy(struct mem
}
}
return new;
 }
 
+static int __task_init init_task_mempolicy(unsigned long clone_flags,
+  struct task_struct *tsk)
+{
+   tsk->mempolicy = mpol_copy(tsk->mempolicy);
+   if (IS_ERR(tsk->mempolicy)) {
+   int retval;
+
+   retval = PTR_ERR(tsk->mempolicy);
+   tsk->mempolicy = NULL;
+   return retval;
+   }
+   mpol_fix_fork_child_flag(tsk);
+   return 0;
+}
+DEFINE_TASK_INITCALL(init_task_mempolicy);
+
+static int __task_free free_task_mempolicy(unsigned long ignored,
+  struct task_struct *tsk)
+{
+   mpol_free(tsk->mempolicy);
+   tsk->mempolicy = NULL;
+   return 0;
+}
+DEFINE_TASK_FREECALL(free_task_mempolicy);
+
 /* Slow path of a mempolicy comparison */
 int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 {
if (!a || !b)
return 0;
Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,19 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
 
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_NUMA
-   p->mempolicy = mpol_copy(p->mempolicy);
-   if (IS_ERR(p->mempolicy)) {
-   retval = PTR_ERR(p->mempolicy);
-   p->mempolicy = NULL;
-   goto bad_fork_cleanup_delays_binfmt;
-   }
-   mpol_fix_fork_child_flag(p);
-#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
p->hardirqs_enabled = 1;
 #else
@@ -1098,11 +1089,11 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
 #endif
 
if ((retval = security_task_alloc(p)))
-   goto bad_fork_cleanup_policy;
+   goto bad_fork_cleanup_delays_binfmt;
/* copy all the process information */
if ((retval = copy_files(clone_flags, p)))
goto bad_fork_cleanup_security;
if ((retval = copy_

Register lockdep task watcher

Register a task watcher for lockdep instead of hooking into copy_process().

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---
 kernel/fork.c|5 -
 kernel/lockdep.c |   11 +++
 2 files changed, 11 insertions(+), 5 deletions(-)

Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,15 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
 
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
-   p->lockdep_depth = 0; /* no locks held yet */
-   p->curr_chain_key = 0;
-   p->lockdep_recursion = 0;
-#endif
 
 #ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
 #endif
 
Index: linux-2.6.19/kernel/lockdep.c
===
--- linux-2.6.19.orig/kernel/lockdep.c
+++ linux-2.6.19/kernel/lockdep.c
@@ -25,10 +25,11 @@
  * mapping lock dependencies runtime.
  */
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
@@ -2557,10 +2558,20 @@ void __init lockdep_init(void)
INIT_LIST_HEAD(chainhash_table + i);
 
lockdep_initialized = 1;
 }
 
+static int __task_init init_task_lockdep(unsigned long clone_flags,
+struct task_struct *p)
+{
+   p->lockdep_depth = 0; /* no locks held yet */
+   p->curr_chain_key = 0;
+   p->lockdep_recursion = 0;
+   return 0;
+}
+DEFINE_TASK_INITCALL(init_task_lockdep);
+
 void __init lockdep_info(void)
 {
printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
printk("... MAX_LOCKDEP_SUBCLASSES:%lu\n", MAX_LOCKDEP_SUBCLASSES);


Register process events connector

Make the Process events connector use task watchers instead of hooking the
paths it's interested in.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---
 drivers/connector/cn_proc.c |   51 +++-
 fs/exec.c   |1 
 include/linux/cn_proc.h |   21 --
 kernel/exit.c   |2 -
 kernel/fork.c   |2 -
 kernel/sys.c|8 --
 6 files changed, 36 insertions(+), 49 deletions(-)

Index: linux-2.6.19/drivers/connector/cn_proc.c
===
--- linux-2.6.19.orig/drivers/connector/cn_proc.c
+++ linux-2.6.19/drivers/connector/cn_proc.c
@@ -44,19 +44,20 @@ static inline void get_seq(__u32 *ts, in
*ts = get_cpu_var(proc_event_counts)++;
*cpu = smp_processor_id();
put_cpu_var(proc_event_counts);
 }
 
-void proc_fork_connector(struct task_struct *task)
+static int proc_fork_connector(unsigned long clone_flags,
+  struct task_struct *task)
 {
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
 
if (atomic_read(&proc_event_num_listeners) < 1)
-   return;
+   return 0;
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -70,21 +71,24 @@ void proc_fork_connector(struct task_str
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
/*  If cn_netlink_send() failed, the data is not sent */
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+   return 0;
 }
+DEFINE_TASK_CLONECALL(proc_fork_connector);
 
-void proc_exec_connector(struct task_struct *task)
+static int proc_exec_connector(unsigned long ignore,
+  struct task_struct *task)
 {
struct cn_msg *msg;
struct proc_event *ev;
struct timespec ts;
__u8 buffer[CN_PROC_MSG_SIZE];
 
if (atomic_read(&proc_event_num_listeners) < 1)
-   return;
+   return 0;
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -95,21 +99,23 @@ void proc_exec_connector(struct task_str
 
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+   return 0;
 }
+DEFINE_TASK_EXECCALL(proc_exec_connector);
 
-void proc_id_connector(struct task_struct *task, int which_id)
+static int process_change_id(unsigned long which_id, struct task_struct *task)
 {
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
 
if (atomic_read(&proc_event_num_listeners) < 1)
-   return;
+   return 0;
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
ev->what = which_id;
ev->event_data.id.process_pid = task->pid;
@@ -119,47 +125,64 @@ void proc_id_connector(struct task_struc
ev->event_data.id.e.euid = task->euid;
} else if (which_id == PROC_EVENT_GID) {
ev->event_data.id.r.rgid = task->gid;
ev->event_data.id.e.egid = task->egid;
} else
-   return;
+   return 0;
get_seq(&msg->seq, &ev->cpu);
ktime_get_ts(&ts); /* get high res monotonic timestamp */
ev->timestamp_ns = timespec_to_ns(&ts);
 
memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
msg->ack = 0; /* not used */
msg->len = sizeof(*ev);
cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+   return 0;
+}
+
+static int proc_change_uid_connector(unsigned long ignore,
+struct task_struct *task)
+{
+   return process_change_id(PROC_EVENT_UID, task);
+}
+DEFINE_TASK_UIDCALL(proc_change_uid_connector);
+
+static int proc_change_gid_connector(unsigned long ignore,
+struct task_struct *task)
+{
+   return process_change_id(PROC_EVENT_GID, task);
 }
+DEFINE_TASK_GIDCALL(proc_change_gid_connector);
 
-void proc_exit_connector(struct task_struct *task)
+static int proc_exit_connector(unsigned long code, struct task_struct *task)
 {
struct cn_msg *msg;
struct proc_event *ev;
__u8 buffer[CN_PROC_MSG_SIZE];
struct timespec ts;
 
if (atomic_read(&proc_event_num_listeners) < 1)
-   return;
+   return 0;
 
msg = (struct cn_msg*)buffer;
ev = (struct proc_event*)msg->data;
get_seq(&msg->seq, &ev->cpu);
ktime

Register IRQ flag tracing task watcher

Register an irq-flag-tracing task watcher instead of hooking into
copy_process().

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
---
 kernel/fork.c   |   19 ---
 kernel/irq/handle.c |   24 
 2 files changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1059,29 +1059,10 @@ static struct task_struct *copy_process(
p->tgid = current->tgid;
 
retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
if (retval < 0)
goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
-   p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-   p->hardirqs_enabled = 1;
-#else
-   p->hardirqs_enabled = 0;
-#endif
-   p->hardirq_enable_ip = 0;
-   p->hardirq_enable_event = 0;
-   p->hardirq_disable_ip = _THIS_IP_;
-   p->hardirq_disable_event = 0;
-   p->softirqs_enabled = 1;
-   p->softirq_enable_ip = _THIS_IP_;
-   p->softirq_enable_event = 0;
-   p->softirq_disable_ip = 0;
-   p->softirq_disable_event = 0;
-   p->hardirq_context = 0;
-   p->softirq_context = 0;
-#endif
 #ifdef CONFIG_LOCKDEP
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
p->lockdep_recursion = 0;
 #endif
Index: linux-2.6.19/kernel/irq/handle.c
===
--- linux-2.6.19.orig/kernel/irq/handle.c
+++ linux-2.6.19/kernel/irq/handle.c
@@ -13,10 +13,11 @@
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "internals.h"
 
 /**
  * handle_bad_irq - handle spurious and unhandled irqs
@@ -266,6 +267,29 @@ void early_init_irq_lock_class(void)
 
for (i = 0; i < NR_IRQS; i++)
lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
 }
 
+static int __task_init init_task_trace_irqflags(unsigned long clone_flags,
+   struct task_struct *p)
+{
+   p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+   p->hardirqs_enabled = 1;
+#else
+   p->hardirqs_enabled = 0;
+#endif
+   p->hardirq_enable_ip = 0;
+   p->hardirq_enable_event = 0;
+   p->hardirq_disable_ip = _THIS_IP_;
+   p->hardirq_disable_event = 0;
+   p->softirqs_enabled = 1;
+   p->softirq_enable_ip = _THIS_IP_;
+   p->softirq_enable_event = 0;
+   p->softirq_disable_ip = 0;
+   p->softirq_disable_event = 0;
+   p->hardirq_context = 0;
+   p->softirq_context = 0;
+   return 0;
+}
+DEFINE_TASK_INITCALL(init_task_trace_irqflags);
 #endif


Register audit task watcher

Change audit to register a task watcher function rather than modify
the copy_process() and do_exit() paths directly.

Removes an unlikely() hint from kernel/exit.c:
if (unlikely(tsk->audit_context))
audit_free(tsk);
This use of unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit: fa84cb935d4ec601528f5e2f0d5d31e7876a5044).
Clearly in the __put_task_struct() path it would be called much more frequently
than do_exit() and hence the use of unlikely() there was justified. However, in
the new location the hint most likely offers no measurable performance impact.
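
For readers unfamiliar with the hint, here is a minimal userspace sketch of the two variants of the check. The likely()/unlikely() macros are written the way I understand the kernel defines them, on top of the GCC/Clang __builtin_expect builtin; struct task, audit_free_model() and the two exit_path_* functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the fields of task_struct that matter here. */
struct task {
    void *audit_context;
    int freed;
};

static void audit_free_model(struct task *tsk)
{
    tsk->freed++;
}

/* Former exit path: the branch is annotated as rarely taken. */
static void exit_path_with_hint(struct task *tsk)
{
    assert(tsk != NULL);
    if (unlikely(tsk->audit_context))
        audit_free_model(tsk);
}

/* Same test without the annotation. */
static void exit_path_without_hint(struct task *tsk)
{
    assert(tsk != NULL);
    if (tsk->audit_context)
        audit_free_model(tsk);
}
```

Both functions behave identically; the hint only influences how the compiler lays out the branch, so dropping it cannot change results, only (possibly) timing.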

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>
Cc: Steve Grubb <[EMAIL PROTECTED]>
Cc: linux-audit@redhat.com
---
 include/linux/audit.h |4 
 kernel/auditsc.c  |9 ++---
 kernel/exit.c |3 ---
 kernel/fork.c |7 +--
 4 files changed, 7 insertions(+), 16 deletions(-)

Index: linux-2.6.19/kernel/auditsc.c
===
--- linux-2.6.19.orig/kernel/auditsc.c
+++ linux-2.6.19/kernel/auditsc.c
@@ -677,11 +677,11 @@ static inline struct audit_context *audi
  * Filter on the task information and allocate a per-task audit context
  * if necessary.  Doing so turns on system call auditing for the
  * specified task.  This is called from copy_process, so no lock is
  * needed.
  */
-int audit_alloc(struct task_struct *tsk)
+static int __task_init audit_alloc(unsigned long val, struct task_struct *tsk)
 {
struct audit_context *context;
enum audit_state state;
 
if (likely(!audit_enabled))
@@ -703,10 +703,11 @@ int audit_alloc(struct task_struct *tsk)
 
tsk->audit_context  = context;
set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
return 0;
 }
+DEFINE_TASK_INITCALL(audit_alloc);
 
 static inline void audit_free_context(struct audit_context *context)
 {
struct audit_context *previous;
int  count = 0;
@@ -1033,28 +1034,30 @@ static void audit_log_exit(struct audit_
  * audit_free - free a per-task audit context
  * @tsk: task whose audit context block to free
  *
  * Called from copy_process and do_exit
  */
-void audit_free(struct task_struct *tsk)
+static int __task_free audit_free(unsigned long val, struct task_struct *tsk)
 {
struct audit_context *context;
 
context = audit_get_context(tsk, 0, 0);
if (likely(!context))
-   return;
+   return 0;
 
/* Check for system calls that do not go through the exit
 * function (e.g., exit_group), then free context block. 
 * We use GFP_ATOMIC here because we might be doing this 
 * in the context of the idle thread */
/* that can happen only if we are called from do_exit() */
if (context->in_syscall && context->auditable)
audit_log_exit(context, tsk);
 
audit_free_context(context);
+   return 0;
 }
+DEFINE_TASK_FREECALL(audit_free);
 
 /**
  * audit_syscall_entry - fill in an audit record at syscall entry
  * @tsk: task being audited
  * @arch: architecture type
Index: linux-2.6.19/include/linux/audit.h
===
--- linux-2.6.19.orig/include/linux/audit.h
+++ linux-2.6.19/include/linux/audit.h
@@ -332,12 +332,10 @@ struct mqstat;
 extern int __init audit_register_class(int class, unsigned *list);
 extern int audit_classify_syscall(int abi, unsigned syscall);
 #ifdef CONFIG_AUDITSYSCALL
 /* These are defined in auditsc.c */
/* Public API */
-extern int  audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
 extern void audit_syscall_entry(int arch,
int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
 extern void audit_syscall_exit(int failed, long return_code);
 extern void __audit_getname(const char *name);
@@ -432,12 +430,10 @@ static inline int audit_mq_getsetattr(mq
return __audit_mq_getsetattr(mqdes, mqstat);
return 0;
 }
 extern int audit_n_rules;
 #else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
 #define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
 #define audit_syscall_exit(f,r) do { ; } while (0)
 #define audit_dummy_context() 1
 #define audit_getname(n) do { ; } while (0)
 #define audit_putname(n) do { ; } while (0)
Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -37,11 +37,10 @@
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
@@ -1102,15 +1101,13 @@ static struct task_struct *copy_process(
p->blocked_on = NULL; /* not blocked yet */
 #e

Prefetch hint

Prefetch the entire array of function pointers.

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>

---
 kernel/task_watchers.c |2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.19/kernel/task_watchers.c
===
--- linux-2.6.19.orig/kernel/task_watchers.c
+++ linux-2.6.19/kernel/task_watchers.c
@@ -1,6 +1,7 @@
 #include 
+#include 
 
 /* Defined in include/asm-generic/vmlinux.lds.h */
 extern const task_watcher_fn __start_task_init[],
__start_task_clone[], __start_task_exec[],
__start_task_uid[], __start_task_gid[],
@@ -30,10 +31,11 @@ int notify_task_watchers(unsigned int ev
 
tw_call = twtable[ev];
tw_end = twtable[ev + 1];
 
/* Call all of the watchers, report the first error */
+   prefetch_range(tw_call, (tw_end - tw_call) * sizeof(*tw_call));
for (; tw_call < tw_end; tw_call++) {
err = (*tw_call)(val, tsk);
if (unlikely((err < 0) && (ret_err == 0)))
ret_err = err;
}
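
The loop semantics can be modelled in userspace roughly as follows. This is a sketch under stated assumptions: call_watchers() and the opaque struct task_struct are stand-ins, the prefetch uses the GCC/Clang __builtin_prefetch builtin with an assumed 64-byte cache line, and the table is passed in explicitly instead of being looked up by event number. The prefetch length is expressed in bytes here.

```c
#include <assert.h>
#include <stddef.h>

struct task_struct;   /* opaque in this sketch */
typedef int (*task_watcher_fn)(unsigned long val, struct task_struct *tsk);

/* Call every watcher in [tw_call, tw_end) and report the first error.
 * Later watchers still run after an earlier one has failed. */
static int call_watchers(const task_watcher_fn *tw_call,
                         const task_watcher_fn *tw_end,
                         unsigned long val, struct task_struct *tsk)
{
    const char *p;
    int err, ret_err = 0;

    assert(tw_call <= tw_end);

    /* Hint only: touch one address per assumed 64-byte cache line. */
    for (p = (const char *)tw_call; p < (const char *)tw_end; p += 64)
        __builtin_prefetch(p);

    for (; tw_call < tw_end; tw_call++) {
        err = (*tw_call)(val, tsk);
        if (err < 0 && ret_err == 0)
            ret_err = err;
    }
    return ret_err;
}

/* Sample watchers used to exercise the loop. */
static int calls_made;

static int ok_watcher(unsigned long val, struct task_struct *tsk)
{
    (void)val; (void)tsk;
    calls_made++;
    return 0;
}

static int fail_first(unsigned long val, struct task_struct *tsk)
{
    (void)val; (void)tsk;
    calls_made++;
    return -12;
}

static int fail_second(unsigned long val, struct task_struct *tsk)
{
    (void)val; (void)tsk;
    calls_made++;
    return -5;
}
```

Given a table of ok_watcher, fail_first, fail_second, ok_watcher, all four functions run and the caller sees the value returned by fail_first.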


Register process keyrings task watcher

Make the keyring code use a task watcher to initialize and free per-task data.

NOTE:
We can't make copy_thread_group_keys() in copy_signal() a task watcher because 
it needs the task's signal field (struct signal_struct).

Signed-off-by: Matt Helsley <[EMAIL PROTECTED]>
Cc: David Howells <[EMAIL PROTECTED]>
---
 include/linux/key.h  |8 
 kernel/exit.c|2 --
 kernel/fork.c|6 +-
 kernel/sys.c |8 
 security/keys/process_keys.c |   21 ++---
 5 files changed, 15 insertions(+), 30 deletions(-)

Index: linux-2.6.19/include/linux/key.h
===
--- linux-2.6.19.orig/include/linux/key.h
+++ linux-2.6.19/include/linux/key.h
@@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru
  */
 extern struct key root_user_keyring, root_session_keyring;
 extern int alloc_uid_keyring(struct user_struct *user,
 struct task_struct *ctx);
 extern void switch_uid_keyring(struct user_struct *new_user);
-extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk);
 extern int copy_thread_group_keys(struct task_struct *tsk);
-extern void exit_keys(struct task_struct *tsk);
 extern void exit_thread_group_keys(struct signal_struct *tg);
 extern int suid_keys(struct task_struct *tsk);
 extern int exec_keys(struct task_struct *tsk);
-extern void key_fsuid_changed(struct task_struct *tsk);
-extern void key_fsgid_changed(struct task_struct *tsk);
 extern void key_init(void);
 
 #define __install_session_keyring(tsk, keyring)\
 ({ \
struct key *old_session = tsk->signal->session_keyring; \
@@ -365,18 +361,14 @@ extern void key_init(void);
 #define key_ref_to_ptr(k)  ({ NULL; })
 #define is_key_possessed(k)0
 #define alloc_uid_keyring(u,c) 0
 #define switch_uid_keyring(u)  do { } while(0)
 #define __install_session_keyring(t, k)({ NULL; })
-#define copy_keys(f,t) 0
 #define copy_thread_group_keys(t)  0
-#define exit_keys(t)   do { } while(0)
 #define exit_thread_group_keys(tg) do { } while(0)
 #define suid_keys(t)   do { } while(0)
 #define exec_keys(t)   do { } while(0)
-#define key_fsuid_changed(t)   do { } while(0)
-#define key_fsgid_changed(t)   do { } while(0)
 #define key_init() do { } while(0)
 
 /* Initial keyrings */
 extern struct key root_user_keyring;
 extern struct key root_session_keyring;
Index: linux-2.6.19/kernel/fork.c
===
--- linux-2.6.19.orig/kernel/fork.c
+++ linux-2.6.19/kernel/fork.c
@@ -1077,14 +1077,12 @@ static struct task_struct *copy_process(
goto bad_fork_cleanup_fs;
if ((retval = copy_signal(clone_flags, p)))
goto bad_fork_cleanup_sighand;
if ((retval = copy_mm(clone_flags, p)))
goto bad_fork_cleanup_signal;
-   if ((retval = copy_keys(clone_flags, p)))
-   goto bad_fork_cleanup_mm;
if ((retval = copy_namespaces(clone_flags, p)))
-   goto bad_fork_cleanup_keys;
+   goto bad_fork_cleanup_mm;
retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
if (retval)
goto bad_fork_cleanup_namespaces;
 
p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
@@ -1226,12 +1224,10 @@ static struct task_struct *copy_process(
proc_fork_connector(p);
return p;
 
 bad_fork_cleanup_namespaces:
exit_task_namespaces(p);
-bad_fork_cleanup_keys:
-   exit_keys(p);
 bad_fork_cleanup_mm:
if (p->mm)
mmput(p->mm);
 bad_fork_cleanup_signal:
cleanup_signal(p);
Index: linux-2.6.19/security/keys/process_keys.c
===
--- linux-2.6.19.orig/security/keys/process_keys.c
+++ linux-2.6.19/security/keys/process_keys.c
@@ -15,10 +15,11 @@
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "internal.h"
 
 /* session keyring create vs join semaphore */
 static DEFINE_MUTEX(key_session_mutex);
@@ -276,11 +277,12 @@ int copy_thread_group_keys(struct task_s
 
 /*/
 /*
  * copy the keys for fork
  */
-int copy_keys(unsigned long clone_flags, struct task_struct *tsk)
+static int __task_init copy_keys(unsigned long clone_flags,
+struct task_struct *tsk)
 {
key_check(tsk->thread_keyring);
key_check(tsk->request_key_auth);
 
/* no thread keyring yet */
@@ -290,10 +292,11 @@ int copy_keys(unsigned long clone_flags,
key_get(tsk->request_key_auth);
 
return 0;

[PATCH 00/10] Introduction

This is version 2 of my Task Watchers patches with performance enhancements.

Task watchers call registered functions whenever a task forks, execs, changes
its [re][ug]id, or exits.

Task watchers are primarily useful to existing kernel code as a means of making
the code in fork and exit more readable. Kernel code hooks into these paths by
marking a function as a task watcher, much like modules mark their init
functions with module_init(), so the per-feature calls no longer have to be
spelled out in copy_process().

The first patch adds the basic infrastructure of task watchers: notification
function calls in the various paths and a table of function pointers to be
called. It uses an ELF section because parts of the table must be gathered
from all over the kernel code and using the linker is easier than resolving
and maintaining complex header interdependencies. Furthermore, using a list
proved to have much higher impact on the size of the patches and was deemed
unacceptable overhead. An ELF table is also ideal because its "readonly" nature
means that neither locking nor list traversal is required.
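
For readers who have not seen the linker-gathered table trick before, it can
be sketched in userspace C roughly as follows. This is only an illustration
and not the code from the patches: the section name task_watchers, the macro
task_watcher_func(), the event names, and the two handlers are all made up
here. It assumes GCC or Clang with GNU ld, which defines __start_<name> and
__stop_<name> symbols for any section whose name is a valid C identifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event IDs and handler type; not the names used in the patches. */
enum { WATCH_TASK_INIT, WATCH_TASK_FREE };
typedef int (*task_watcher_fn)(unsigned long event, void *task);

/* Emit a pointer to fn into the "task_watchers" section (GCC/Clang syntax). */
#define task_watcher_func(fn) \
	static task_watcher_fn watcher_entry_##fn \
	__attribute__((used, section("task_watchers"))) = fn

/* Section bounds supplied by the linker; no registration code runs at startup. */
extern task_watcher_fn __start_task_watchers[];
extern task_watcher_fn __stop_task_watchers[];

int watcher_calls;	/* counts handler invocations, for demonstration only */

static int audit_watch(unsigned long event, void *task)
{
	(void)event; (void)task;
	watcher_calls++;
	return 0;
}
task_watcher_func(audit_watch);

static int keys_watch(unsigned long event, void *task)
{
	(void)event; (void)task;
	watcher_calls++;
	return 0;
}
task_watcher_func(keys_watch);

/* Walk the table and stop at the first handler that reports an error. */
int notify_task_watchers(unsigned long event, void *task)
{
	task_watcher_fn *fn;
	int err;

	for (fn = __start_task_watchers; fn < __stop_task_watchers; fn++) {
		assert(*fn != NULL);
		err = (*fn)(event, task);
		if (err)
			return err;
	}
	return 0;
}
```

Calling notify_task_watchers(WATCH_TASK_INIT, NULL) in this sketch runs both
handlers without taking a lock or following list pointers, since the table is
fixed at link time.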

Subsequent patches adapt existing parts of the kernel to use a task watcher
 -- typically in the fork, clone, and exit paths:

FEATURE (notes)                                 RELEVANT CONFIG VARIABLE
------------------------------------------------------------------------
audit                                           [ CONFIG_AUDIT ...      ]
semundo                                         [ CONFIG_SYSVIPC        ]
cpusets                                         [ CONFIG_CPUSETS        ]
mempolicy                                       [ CONFIG_NUMA           ]
trace irqflags                                  [ CONFIG_TRACE_IRQFLAGS ]
lockdep                                         [ CONFIG_LOCKDEP        ]
keys (for processes -- not for thread groups)   [ CONFIG_KEYS           ]
process events connector                        [ CONFIG_PROC_EVENTS    ]

TODO:
Mark the task watcher table ELF section read-only. I've tried to "fix"
the .lds files to do this with no success. I'd really appreciate help
from folks familiar with writing linker scripts.

I'm working on three more patches that add support for creating a task
watcher from within a module using an ELF section. They haven't received
as much attention since I've been focusing on measuring the performance
impact of these patches.

Changes:
since v2 ():
Added ELF section annotations to the functions handling the events
Added section annotation to the lookup table in kernel/task_watchers.c
Added prefetch hints to the function pointer array walk
Renamed the macros (better?)
Retested the patches
Reduced noise in test results (0.6 - 1%, 2+% previously)
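
As a rough idea of what a prefetch hint in the function pointer array walk can
look like, here is a minimal userspace sketch. It is not the code from the
prefetch patch: walk_watchers(), count_event(), and fail_event() are
hypothetical names, and prefetching the next table slot (rather than anything
else) is an assumption made for illustration. It relies on the GCC/Clang
builtin __builtin_prefetch(), which is only a hint to the CPU.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*watcher_fn)(unsigned long event, void *task);

int events_seen;	/* counts successful handler calls, for demonstration only */

int count_event(unsigned long event, void *task)
{
	(void)event; (void)task;
	events_seen++;
	return 0;
}

int fail_event(unsigned long event, void *task)
{
	(void)event; (void)task;
	return -1;
}

/* Call each handler in table[0..n), hinting the next slot into cache first. */
int walk_watchers(watcher_fn *table, size_t n, unsigned long event, void *task)
{
	size_t i;
	int err;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (i + 1 < n)
			__builtin_prefetch(&table[i + 1]);	/* hint only */
		err = table[i](event, task);
		if (err)
			return err;	/* stop at the first failing handler */
	}
	return 0;
}
```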

With the last prefetch patch I was able to measure a performance increase in
the range of 0.4 to 2.8%. I sampled 100 times and took the mean for each patch.
Since the numbers seemed to be a source of confusion last time I've tried to
simplify them here:

Patch   Mean (forks/second)
0       6925.16   (baseline)
1       7170.81   task watchers
2       7100.34   audit
3       7114.47   semundo
4       7185.7    cpusets
5       7121.41   numa-mempolicy
6       7070.82   irqflags
7       7012.61   lockdep
8       7116.54   keys
9       7116.35   procevents
12      7109.52   prefetch

7109.52 - 6925.16 = +184 forks/second (+2.7%)

So the patch series now actually improves performance a little.

All the numbers from the tests are available if anyone wishes to analyze them
independently.

Please consider for inclusion in -mm.

Cheers,
-Matt Helsley

Task watchers v2