date:20070409

Re: I give up

2007-04-09 Thread CaT

On Mon, Apr 09, 2007 at 11:34:44PM -0400, Gene Heskett wrote:
> I haven't seen any 200GB for $55 yet, more like $129 & maybe a rebate at 
> Circuit City.  We don't have a Fry's around here.

Wow. 200GB HDs can be had for AUD91 here. I think you need to shop
around. The internet can be your friend. :)

-- 
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Jeff Garzik


Eric W. Biederman wrote:
> At 10 kernel threads per cpu there may be a little bloat but it isn't
> out of control.  It is mostly that we are observing the kernel as
> NR_CPUS approaches infinity.  4096 isn't infinity yet but it's easily
> a 1000 fold bigger than most people are used to :)



I disagree there is only a little bloat:  the current mechanism in place 
does not scale as NR_CPUS increases, as this thread demonstrates.


Beyond a certain point, on an 8-CPU box, it gets silly.  You certainly 
don't need eight kblockd threads or eight ata threads.


Jeff



Re: I give up

2007-04-09 Thread Jeff Garzik


Gene Heskett wrote:
> I haven't seen any 200GB for $55 yet, more like $129 & maybe a rebate at 
> Circuit City.  We don't have a Fry's around here.



pricewatch.com is your friend :)

Jeff



Re: [patch] sched: align rq to cacheline boundary

2007-04-09 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > -static DEFINE_PER_CPU(struct rq, runqueues);
> > +static DEFINE_PER_CPU(struct rq, runqueues) ____cacheline_aligned_in_smp;
> 
> Remember that this can consume up to (linesize-4 * NR_CPUS) bytes, 
> which is rather a lot.

yes - but one (special) issue here is that there are other 'hot' but 
truly per-CPU structures nearby:

 8067e800 D per_cpu__current_kprobe
 8067e820 D per_cpu__kprobe_ctlblk
 8067e960 D per_cpu__mmu_gathers
 8067f960 d per_cpu__runqueues
 80680c60 d per_cpu__cpu_domains
 80680df0 d per_cpu__sched_group_cpus

cpu_domains is being dirtied too (sd->nr_balance_failed, 
sd->last_balance, etc.) and mmu_gathers too. So while both mmu_gathers 
and cpu_domains are mostly purely per-CPU, runqueue fields can bounce 
around a lot and drag those nearby fields with them (and then get dragged 
back due to those nearby fields being used per-CPU again.)

the runqueue is really supposed to be cacheline-isolated at _both_ ends 
- at its beginning and at its end as well.

> And that putting a gap in the per-cpu memory like this will reduce its 
> overall cache-friendliness.

yes - although the per-cpu runqueue overhead is nearly 5K anyway.

> Remember also that the linesize on VSMP is 4k.

that sucks ...

maybe, to mitigate some of the costs, do a special PER_CPU_CACHE_ALIGNED 
area that collects per-cpu fields that also have significant cross-CPU 
use and need cacheline isolation? Such cacheline-aligned variables, if 
collected separately, would pack up more tightly and would cause only 
half of the wasted space.
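
A minimal userspace sketch of what "cacheline-isolated at both ends" means, assuming a 64-byte line and GCC-style attributes. The names rq_like, percpu_area, same_line and rq_is_isolated are made up for illustration and are not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed line size; a VSMP build would use 4096 */

/* Hypothetical stand-in for struct rq: fields that other CPUs also touch. */
struct rq_like {
	unsigned long nr_running;
	unsigned long long clock;
	int cpu;
} __attribute__((aligned(CACHE_LINE)));	/* aligning the type also pads sizeof */

/* Neighbours that should stay purely per-CPU. */
struct percpu_area {
	long before;		/* plays the role of mmu_gathers */
	struct rq_like rq;
	long after;		/* plays the role of cpu_domains */
};

static struct percpu_area area;

/* Returns 1 if the two addresses fall on the same cache line. */
static int same_line(const void *a, const void *b)
{
	return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Returns 1 when the rq stand-in shares no line with either neighbour. */
static int rq_is_isolated(const struct percpu_area *p)
{
	assert(p != NULL);
	return !same_line(&p->before, &p->rq) &&
	       !same_line((const char *)&p->rq + sizeof(p->rq) - 1, &p->after);
}
```

Aligning only the start of the structure protects it from whatever precedes it; it is the size being a multiple of the line that keeps the following neighbour off the last line.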

Ingo

Re: [patch] sched: align rq to cacheline boundary

2007-04-09 Thread Ingo Molnar


* Siddha, Suresh B <[EMAIL PROTECTED]> wrote:

> Align the per cpu runqueue to the cacheline boundary. This will 
> minimize the number of cachelines touched during remote wakeup.

> -static DEFINE_PER_CPU(struct rq, runqueues);
> +static DEFINE_PER_CPU(struct rq, runqueues) ____cacheline_aligned_in_smp;

ouch!! Now how did _that_ slip through. The runqueues had been 
cacheline-aligned for ages. Or at least, they were supposed to be.

could you see any improvement in profiles or workloads with this patch 
applied? (just curious - it's an obviously right fix)

> Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Acked-by: Ingo Molnar <[EMAIL PROTECTED]>

Ingo

Re: m68knommu and vmlinux.lds.h RODATA

2007-04-09 Thread Greg Ungerer


Hi Mathieu,

Mathieu Desnoyers wrote:
> Is there any particular reason why m68knommu does not use the RODATA
> linker script macro defined in asm-generic/vmlinux.lds.h ? It makes it
> rather inconvenient to add new RO sections to the kernel.


This is going back some way, but this was the original commit message:

# 
# 03/02/18  [EMAIL PROTECTED]   1.984
# [PATCH] use local RODATA setup for m68knommu linker script
#
# This patch removes the use of the common RODATA define in the m68knommu
# architecture. It cannot be used the same way for the m68knommu target.
# For starters just inserting it here is syntactically wrong. All the read
# only parts are grouped into a single "text" segment, and this is the root
# cause of the problem. So for the m68knommu arch it makes sense to not
# use the generic RODATA setup, but to list them locally.

The problem looks to be the same today.

The current vmlinux.lds.S for m68knommu directs sections out to ROM
or RAM depending on whether we are building to be run from some type
of read only memory (ROM, FLASH, etc) or RAM. The RODATA macro as it
stands doesn't allow that.

Regards
Greg




Greg Ungerer  --  Chief Software Dude       EMAIL: [EMAIL PROTECTED]
Secure Computing Corporation                PHONE: +61 7 3435 2888
825 Stanley St,                             FAX:   +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB:   http://www.SnapGear.com

Re: [PATCH 1/2] fs: use memclear_highpage_flush to zero page data

2007-04-09 Thread Andrew Morton

On Mon, 09 Apr 2007 21:31:37 -0700 Nate Diller <[EMAIL PROTECTED]> wrote:

> It's very common for file systems to need to zero part or all of a page; the
> simplest way is just to use kmap_atomic() and memset().  There's actually a
> library function in include/linux/highmem.h that does exactly that, but it's
> confusingly named memclear_highpage_flush(), which is descriptive of *how*
> it does the work rather than what the *purpose* is.  So this patch renames
> the function to zero_page_data(), and calls it from the various places that
> currently open code it.
> 
> Compile tested in x86_64.
> 
> signed-off-by: Nate Diller <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/block/loop.c |6 ---
>  fs/affs/file.c   |6 ---
>  fs/buffer.c  |   53 +--
>  fs/direct-io.c   |8 +---
>  fs/ecryptfs/mmap.c   |   14 +---
>  fs/ext3/inode.c  |   12 +--
>  fs/ext4/inode.c  |   12 +--
>  fs/ext4/writeback.c  |   12 +--
>  fs/gfs2/bmap.c   |6 ---
>  fs/mpage.c   |   11 +-
>  fs/nfs/read.c|   10 ++---
>  fs/nfs/write.c   |2 -
>  fs/ntfs/aops.c   |   32 +++---
>  fs/ntfs/file.c   |   47 +--
>  fs/ocfs2/aops.c  |5 --
>  fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
>  fs/reiser4/plugin/file/file.c|6 ---
>  fs/reiser4/plugin/item/ctail.c   |6 ---
>  fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
>  fs/reiser4/plugin/item/tail.c|8 +---
>  fs/reiserfs/file.c   |   39 ++
>  fs/reiserfs/inode.c  |   13 +--
>  fs/xfs/linux-2.6/xfs_lrw.c   |2 -
>  include/linux/highmem.h  |2 -
>  mm/filemap_xip.c |7 
>  mm/truncate.c|2 -
>  26 files changed, 78 insertions(+), 281 deletions(-)
> 

Not sure that I agree with the name zero_page_data().  People might use it
to, err, zero a page's data.  Whereas it is really only for use against
*user* pages.   zero_user_page(), perhaps.

Plus..

This patch as presented causes me surprising amounts of trouble.  I need to
split it up into

  - core plus filesystems which don't have maintainers (for me to merge)

  - filesystems which do have maintainers (one patch per), for
maintainers to merge.

  - another patch for reiser4, to remain in -mm.

And this is actually not possible to do, because my merge and the subsystem
maintainers' merges will happen at different times.  In the intervening
window, the kernel won't compile.

So instead I need to

  - split off the reiser4 bit

  - get acks from fs maintainers on the rest

  - merge the whole thing in one hit (minus reiser4)

And I can do that, but it is the less preferable option.


The better way to do this merge is:

patch #1:

static inline void memclear_highpage_flush(...) __deprecated
{
zero_user_page(...);
}

patch #2..n:  convert filesystems.


then, when all filesystems are converted, we're ready to remove
memclear_highpage_flush().  But we do that six months later - let's not
screw out-of-tree fs maintainers (and their users) unnecessarily.
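
A minimal userspace sketch of the staged rename described above, assuming GCC-style attributes. It operates on a plain byte buffer rather than a struct page, so the signatures, PAGE_BYTES and legacy_fs_clear_tail() are illustrative assumptions only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096	/* assumed page size for this sketch */

/* New name: zero `size` bytes of a page-sized buffer starting at `offset`. */
static inline void zero_user_page(unsigned char *page, size_t offset, size_t size)
{
	assert(offset + size <= PAGE_BYTES);
	memset(page + offset, 0, size);
}

/*
 * Old name kept as a thin wrapper: unconverted callers still build,
 * but the compiler warns them that the name is going away.
 */
__attribute__((deprecated))
static inline void memclear_highpage_flush(unsigned char *page,
					   size_t offset, size_t size)
{
	zero_user_page(page, offset, size);
}

/* An "unconverted filesystem" that still uses the old name. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
static void legacy_fs_clear_tail(unsigned char *page, size_t to)
{
	memclear_highpage_flush(page, to, PAGE_BYTES - to);
}
#pragma GCC diagnostic pop
```

The pragma only silences the warning so the sketch builds cleanly; in a real tree the warning is the point, since it tells each remaining caller to convert before the wrapper is removed.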


Re: init's children list is long and slows reaping children.

2007-04-09 Thread Torsten Kaiser


On 4/10/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> : root   299  0.0  0.0      0     0 ?        S    18:51   0:00 [scsi_eh_0]
> : root   300  0.0  0.0      0     0 ?        S    18:51   0:00 [scsi_eh_1]
> : root   305  0.0  0.0      0     0 ?        S    18:51   0:00 [scsi_eh_2]
> : root   306  0.0  0.0      0     0 ?        S    18:51   0:00 [scsi_eh_3]
>
> This machine has one CPU, one sata disk and one DVD drive.  The above is
> hard to explain.


One thread per port, not per device.

 796 ?        S      0:00  \_ [scsi_eh_0]
 797 ?        S      0:00  \_ [scsi_eh_1]
 798 ?        S      0:00  \_ [scsi_eh_2]
 819 ?        S      0:00  \_ [scsi_eh_3]
 820 ?        S      0:00  \_ [scsi_eh_4]
 824 ?        S      0:00  \_ [scsi_eh_5]
 825 ?        S      0:14  \_ [scsi_eh_6]

bardioc ~ # lsscsi -d
[0:0:0:0]    disk    ATA      ST3160827AS      3.42  /dev/sda[8:0]
[1:0:0:0]    disk    ATA      ST3160827AS      3.42  /dev/sdb[8:16]
[5:0:0:0]    disk    ATA      IBM-DHEA-36480   HE8O  /dev/sdc[8:32]
[5:0:1:0]    disk    ATA      Maxtor 6L160P0   BAH4  /dev/sdd[8:48]
[6:0:0:0]    cd/dvd  HL-DT-ST DVDRAM GSA-4081B A100  /dev/sr0[11:0]
bardioc ~ # lsscsi -H
[0]    sata_promise
[1]    sata_promise
[2]    sata_promise
[3]    sata_via
[4]    sata_via
[5]    pata_via
[6]    pata_via

The bad part is that there is always a thread, even if the hardware is
not even hotplug capable.
I don't know if the thread is even needed for hotplug...

> I don't think it's completely silly to object to all this.  Sure, a kernel
> thread is worth 4k in the best case, but I bet they have associated unused
> resources and as we've seen, they can cause overhead.

For me it's not the 4k that annoys me, but the clutter in ps or top.

Torsten

Re: [PATCH 1/10] I386 sysenter arch pages fix.patch

2007-04-09 Thread Jeremy Fitzhardinge

Zachary Amsden wrote:
> In compat mode, the return value here was uninitialized.
>
> Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>
>
> diff -r 1fda49a076ed arch/i386/kernel/sysenter.c
> --- a/arch/i386/kernel/sysenter.c Fri Apr 06 14:25:09 2007 -0700
> +++ b/arch/i386/kernel/sysenter.c Fri Apr 06 14:27:31 2007 -0700
> @@ -254,7 +254,7 @@ int arch_setup_additional_pages(struct l
>  {
>   struct mm_struct *mm = current->mm;
>   unsigned long addr;
> - int ret;
> + int ret = 0;
>   bool compat;
>  
>   down_write(&mm->mmap_sem);
> -

Hm, OK, but what about just zeroing it in the compat leg of the if()?
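
A minimal sketch of the two alternatives being weighed, written as plain userspace C. map_vdso_page() and the address constants are hypothetical stand-ins, not the real sysenter.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the non-compat mapping step; 0 on success. */
static int map_vdso_page(unsigned long *addr)
{
	assert(addr != NULL);
	*addr = 0x1000;		/* made-up address */
	return 0;
}

/* Variant 1 (the posted patch): initialise at the declaration. */
static int setup_pages_init_at_decl(bool compat, unsigned long *addr)
{
	int ret = 0;

	if (compat)
		*addr = 0xffffe000UL;	/* made-up fixed address, cannot fail */
	else
		ret = map_vdso_page(addr);
	return ret;
}

/* Variant 2 (the suggestion): assign only in the leg that left it unset. */
static int setup_pages_init_in_branch(bool compat, unsigned long *addr)
{
	int ret;

	if (compat) {
		*addr = 0xffffe000UL;
		ret = 0;
	} else {
		ret = map_vdso_page(addr);
	}
	return ret;
}
```

Both return the same values. The second form leaves ret without a blanket initialiser, so the compiler can still flag any path added later that forgets to set it.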

J

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Dave Jones

On Tue, Apr 10, 2007 at 09:07:54AM +0400, Alexey Dobriyan wrote:
 > On Mon, Apr 09, 2007 at 07:30:56PM -0700, Andrew Morton wrote:
 > > On Mon, 9 Apr 2007 21:59:12 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > [possible topic for KS2007]
 > 
 > > >   164 ?        S<     0:00 [cqueue/0]
 > > >   165 ?        S<     0:00 [cqueue/1]
 > > >
 > > > I'm not even sure wth these are.
 > >
 > > Me either.
 > 
 > drivers/connector/connector.c:
 > 455  dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls);

Maybe I have apps relying on the connector stuff that I don't
even realise, but ttbomk, nothing actually *needs* this
for 99% of users if I'm not mistaken.

* wonders why he never built this modular..

config PROC_EVENTS
boolean "Report process events to userspace"
depends on CONNECTOR=y


Yay.

Dave

-- 
http://www.codemonkey.org.uk

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Alexey Dobriyan

On Mon, Apr 09, 2007 at 07:30:56PM -0700, Andrew Morton wrote:
> On Mon, 9 Apr 2007 21:59:12 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

[possible topic for KS2007]

> >   164 ?        S<     0:00 [cqueue/0]
> >   165 ?        S<     0:00 [cqueue/1]
> >
> > I'm not even sure wth these are.
>
> Me either.

drivers/connector/connector.c:
455 dev->cbdev = cn_queue_alloc_dev("cqueue", dev->nls);


Re: [PATCH] Make ati_remote button repeat sensitivity soft-configurable (V2)

2007-04-09 Thread Dmitry Torokhov

On Thursday 05 April 2007 10:23, Karl Pickett wrote:
> Dmitry, please use this instead of my previous patch.  Thanks to
> Vincent for the code review , fixes, and testing.
> 

Applied to the input tree; thank you Karl and Vincent.

-- 
Dmitry

Re: [patch] sched: align rq to cacheline boundary

2007-04-09 Thread Ravikiran G Thirumalai

On Mon, Apr 09, 2007 at 03:17:05PM -0700, Siddha, Suresh B wrote:
> On Mon, Apr 09, 2007 at 02:53:09PM -0700, Ravikiran G Thirumalai wrote:
> > On Mon, Apr 09, 2007 at 01:40:57PM -0700, Andrew Morton wrote:
> > > On Mon, 9 Apr 2007 11:08:53 -0700
> > > "Siddha, Suresh B" <[EMAIL PROTECTED]> wrote:
> 
> Kiran, can you educate me when I am supposed to use
> ____cacheline_aligned_in_smp
> Vs
> __cacheline_aligned_in_smp ?

As far as my understanding goes, the four underscore version is for 
aligning members/elements within a data structure, and the two underscore 
version is for aligning statically defined variables.
The dual underscore version places the variable in a separate section meant
for cacheline aligned variables, so that there is no false sharing on the
cacheline with a consecutive datum.  For regular statically defined data
structures, the latter has to be used, but since your patch uses per-cpu data, 
which is already in a separate section, you had to use the former I guess.
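
A minimal userspace sketch of that distinction, assuming GCC on an ELF target and a 64-byte line. The macro names my_aligned_member and my_aligned_var and the section name are invented for illustration; they only mimic the idea of "alignment only" versus "alignment plus a dedicated section":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64	/* assumed; the kernel uses its own cache line constant */

/* Alignment only: suitable for a member inside a structure. */
#define my_aligned_member  __attribute__((aligned(LINE_BYTES)))

/* Alignment plus a dedicated section: suitable for a static variable. */
#define my_aligned_var \
	__attribute__((aligned(LINE_BYTES), section(".data.my_cacheline_aligned")))

struct counters {
	long local_hits;
	long shared_hits my_aligned_member;	/* starts a new line in the struct */
};

static struct counters stats;
static long global_lock_word my_aligned_var = 0;

/* Returns 1 if the address is a multiple of the assumed line size. */
static int is_line_aligned(const void *p)
{
	assert(p != NULL);
	return (uintptr_t)p % LINE_BYTES == 0;
}
```

Because every object in the dedicated section is itself line-aligned, a variable placed there does not end up sharing a line with an unrelated, unaligned neighbour. Per-cpu data already lives in its own section, which is why the member-style attribute was the one that could be applied in the patch.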

> 
> > As for the (linesize-4 * NR_CPUS) wastage, maybe we can place the cacheline 
> > aligned per-cpu data in another section, just like we do with 
> > .data.cacheline_aligned section, but keep this new section between
> > __percpu_start and __percpu_end?
> 
> Yes. But that will still waste some memory in the new section, if the data
> elements are not multiples of 4k.

Yes.  But the wastage depends on the data structure now being aligned rather
than the structure that happened to be there before.  You cannot not lose
memory while padding I guess :).  But padding for per-cpu data seems a bit 
odd and I am not sure if it is worth it for 0.5% gain.
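
A small worked example of the padding cost, as a sketch. It takes the 0x1300-byte gap between per_cpu__runqueues and per_cpu__cpu_domains in the symbol listing posted earlier in this thread as a stand-in for the runqueue size, which is an assumption about that one build; the helper names are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of line (line must be a power of two). */
static size_t round_up_to_line(size_t size, size_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return (size + line - 1) & ~(line - 1);
}

/* Tail padding needed so that the next object starts on a fresh line. */
static size_t tail_padding(size_t size, size_t line)
{
	return round_up_to_line(size, line) - size;
}
```

For a 0x1300 (4864) byte object, tail_padding() gives 0 with 64-byte or 128-byte lines, since 4864 is a multiple of both, but 3328 bytes per CPU with the 4k VSMP line size (8192 - 4864).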

Thanks,
Kiran

Re: 2.6.21-rc5-mm3 - no boot, "address not 2M aligned"

2007-04-09 Thread Helge Hafting

Sorry, that was the wrong .config file.  Here is the right one, from
the amd64 box:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc5-mm3
# Sat Mar 31 09:01:57 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SWAP_PREFETCH=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set
CONFIG_PAGE_GROUP_BY_MOBILITY=y

#
# Loadable module support
#
# CONFIG_MODULES is not set

#
# Process debugging support
#
CONFIG_UTRACE=y
CONFIG_PTRACE=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
# CONFIG_X86_CPUID is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
# CONFIG_SMP is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_ADAPTIVE_READAHEAD=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_START=0x10
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
CONFIG_REORDER=y
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
# CONFIG_PM_SYSFS_DEPRECATED is not set
# CONFIG_SOFTWARE_SUSPEND is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_SLEEP is not set
# CONFIG_ACPI_PROCFS is not set
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_VIDEO is not set
CONFIG_ACPI_FAN=y
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
# CONFIG_ACPI_CONTAINER is not set
# CONFIG_ACPI_SBS is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
# CONFIG_CPU_FREQ_DEBUG is not set
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_US

Re: [QUICKLIST 1/4] Quicklists for page table pages V5

2007-04-09 Thread Benjamin Herrenschmidt

On Mon, 2007-04-09 at 11:25 -0700, Christoph Lameter wrote:

> Quicklists for page table pages V5

Looks interesting, but unfortunately not very useful at this point for
powerpc unless you remove the assumption that quicklists contain
pages...

On powerpc, we currently use kmem cache slabs (though that isn't
terribly node friendly) whose sizes depend on the page size.

For a 4K page size kernel, we have 4 level page tables and use 2 caches,
PTE and PGD pages are 4K (thus are PAGE_SIZE'd), and PMD & PUD are 1K.

For a 64K page size kernel, we have 3 level page tables and we use 3
caches: PGD pages are 128 bytes (yeah, not big heh...), our PMD
pages are 32K (half a page) and PTE pages are PAGE_SIZE (64K).

Cheers,
Ben.



[PATCH 2/2] fs: use simple_prepare_write to zero page data

2007-04-09 Thread Nate Diller

It's common for file systems to need to zero data on either side of a write,
if a page is not Uptodate during prepare_write.  It just so happens that
simple_prepare_write() in libfs.c does exactly that, so we can avoid
duplication and just call that function to zero page data.
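
A minimal userspace sketch of "zero data on either side of a write", mirroring the open-coded memset() pattern that the cifs hunk below removes. It works on a plain byte buffer, so the kmap_atomic() and flush_dcache_page() steps are left out; PAGE_BYTES and zero_around_write() are assumed names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096	/* assumed page size */

/* Zero [0, from) and [to, PAGE_BYTES); leave the write range [from, to) alone. */
static void zero_around_write(unsigned char *page, size_t from, size_t to)
{
	assert(from <= to && to <= PAGE_BYTES);
	if (from)
		memset(page, 0, from);
	if (to < PAGE_BYTES)
		memset(page + to, 0, PAGE_BYTES - to);
}
```

The bytes in [from, to) are untouched because the caller is about to overwrite them with the data being written.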

Compile tested on x86_64.

signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 cifs/file.c   |9 +
 ext4/writeback.c  |   17 +
 reiser4/plugin/item/extent_file_ops.c |   13 +++--
 3 files changed, 5 insertions(+), 34 deletions(-)

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/cifs/file.c linux-2.6.21-rc6-mm1-test/fs/cifs/file.c
--- linux-2.6.21-rc6-mm1/fs/cifs/file.c	2007-04-09 18:25:37.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/cifs/file.c	2007-04-09 18:23:16.0 -0700
@@ -1955,14 +1955,7 @@ static int cifs_prepare_write(struct fil
 		 * We don't need to read data beyond the end of the file.
 		 * zero it, and set the page uptodate
 		 */
-		void *kaddr = kmap_atomic(page, KM_USER0);
-
-		if (from)
-			memset(kaddr, 0, from);
-		if (to < PAGE_CACHE_SIZE)
-			memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
+		simple_prepare_write(file, page, from, to);
 		SetPageUptodate(page);
 	} else if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
 		/* might as well read a page, it is fast enough */
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/ext4/writeback.c linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c
--- linux-2.6.21-rc6-mm1/fs/ext4/writeback.c	2007-04-09 18:32:52.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/ext4/writeback.c	2007-04-09 18:23:16.0 -0700
@@ -819,21 +819,6 @@ int ext4_wb_writepages(struct address_sp
 	return 0;
 }
 
-static void ext4_wb_clear_page(struct page *page, int from, int to)
-{
-	void *kaddr;
-
-	if (to < PAGE_CACHE_SIZE || from > 0) {
-		kaddr = kmap_atomic(page, KM_USER0);
-		if (PAGE_CACHE_SIZE > to)
-			memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-		if (0 < from)
-			memset(kaddr, 0, from);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
-	}
-}
-
 int ext4_wb_prepare_write(struct file *file, struct page *page,
 			  unsigned from, unsigned to)
 {
@@ -863,7 +848,7 @@ int ext4_wb_prepare_write(struct file *f
 		/* this block isn't allocated yet, reserve space */
 		wb_debug("reserve space for new block\n");
 		page->private = 1;
-		ext4_wb_clear_page(page, from, to);
+		simple_prepare_write(file, page, from, to);
 		ClearPageMappedToDisk(page);
 	} else {
 		/* block is already mapped, so no need to reserve */
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c
--- linux-2.6.21-rc6-mm1/fs/reiser4/plugin/item/extent_file_ops.c	2007-04-09 18:32:52.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/reiser4/plugin/item/extent_file_ops.c	2007-04-09 18:31:34.0 -0700
@@ -1040,16 +1040,9 @@ ssize_t reiser4_write_extent(struct file
 		BUG_ON(get_current_context()->trans->atom != NULL);
 
 		lock_page(page);
-		if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE) {
-			void *kaddr;
-
-			kaddr = kmap_atomic(page, KM_USER0);
-			memset(kaddr, 0, page_off);
-			memset(kaddr + page_off + to_page, 0,
-			       PAGE_CACHE_SIZE - (page_off + to_page));
-			flush_dcache_page(page);
-			kunmap_atomic(kaddr, KM_USER0);
-		}
+		if (!PageUptodate(page) && to_page != PAGE_CACHE_SIZE)
+			simple_prepare_write(file, page, page_off,
+					     page_off + to_page);
 
 		written = filemap_copy_from_user(page, page_off, buf, to_page);
 		flush_dcache_page(page);

[PATCH 1/2] fs: use memclear_highpage_flush to zero page data

2007-04-09 Thread Nate Diller

It's very common for file systems to need to zero part or all of a page; the
simplest way is just to use kmap_atomic() and memset().  There's actually a
library function in include/linux/highmem.h that does exactly that, but it's
confusingly named memclear_highpage_flush(), which is descriptive of *how*
it does the work rather than what the *purpose* is.  So this patch renames
the function to zero_page_data(), and calls it from the various places that
currently open code it.

Compile tested in x86_64.

signed-off-by: Nate Diller <[EMAIL PROTECTED]>

---

 drivers/block/loop.c |6 ---
 fs/affs/file.c   |6 ---
 fs/buffer.c  |   53 +--
 fs/direct-io.c   |8 +---
 fs/ecryptfs/mmap.c   |   14 +---
 fs/ext3/inode.c  |   12 +--
 fs/ext4/inode.c  |   12 +--
 fs/ext4/writeback.c  |   12 +--
 fs/gfs2/bmap.c   |6 ---
 fs/mpage.c   |   11 +-
 fs/nfs/read.c|   10 ++---
 fs/nfs/write.c   |2 -
 fs/ntfs/aops.c   |   32 +++---
 fs/ntfs/file.c   |   47 +--
 fs/ocfs2/aops.c  |5 --
 fs/reiser4/plugin/file/cryptcompress.c   |   19 +--
 fs/reiser4/plugin/file/file.c|6 ---
 fs/reiser4/plugin/item/ctail.c   |6 ---
 fs/reiser4/plugin/item/extent_file_ops.c |   19 +++
 fs/reiser4/plugin/item/tail.c|8 +---
 fs/reiserfs/file.c   |   39 ++
 fs/reiserfs/inode.c  |   13 +--
 fs/xfs/linux-2.6/xfs_lrw.c   |2 -
 include/linux/highmem.h  |2 -
 mm/filemap_xip.c |7 
 mm/truncate.c|2 -
 26 files changed, 78 insertions(+), 281 deletions(-)

---

diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/drivers/block/loop.c linux-2.6.21-rc6-mm1-test/drivers/block/loop.c
--- linux-2.6.21-rc6-mm1/drivers/block/loop.c	2007-04-09 17:24:00.0 -0700
+++ linux-2.6.21-rc6-mm1-test/drivers/block/loop.c	2007-04-09 18:18:23.0 -0700
@@ -244,17 +244,13 @@ static int do_lo_send_aops(struct loop_d
 		transfer_result = lo_do_transfer(lo, WRITE, page, offset,
 				bvec->bv_page, bv_offs, size, IV);
 		if (unlikely(transfer_result)) {
-			char *kaddr;
-
 			/*
 			 * The transfer failed, but we still write the data to
 			 * keep prepare/commit calls balanced.
 			 */
 			printk(KERN_ERR "loop: transfer error block %llu\n",
 			       (unsigned long long)index);
-			kaddr = kmap_atomic(page, KM_USER0);
-			memset(kaddr + offset, 0, size);
-			kunmap_atomic(kaddr, KM_USER0);
+			zero_page_data(page, offset, size);
 		}
 		flush_dcache_page(page);
 		ret = aops->commit_write(file, page, offset,
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/affs/file.c linux-2.6.21-rc6-mm1-test/fs/affs/file.c
--- linux-2.6.21-rc6-mm1/fs/affs/file.c	2007-04-09 17:23:48.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/affs/file.c	2007-04-09 18:18:23.0 -0700
@@ -628,11 +628,7 @@ static int affs_prepare_write_ofs(struct
 			return err;
 	}
 	if (to < PAGE_CACHE_SIZE) {
-		char *kaddr = kmap_atomic(page, KM_USER0);
-
-		memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
+		zero_page_data(page, to, PAGE_CACHE_SIZE - to);
 		if (size > offset + to) {
 			if (size < offset + PAGE_CACHE_SIZE)
 				tmp = size & ~PAGE_CACHE_MASK;
diff -urpN -X dontdiff linux-2.6.21-rc6-mm1/fs/buffer.c linux-2.6.21-rc6-mm1-test/fs/buffer.c
--- linux-2.6.21-rc6-mm1/fs/buffer.c	2007-04-09 17:24:03.0 -0700
+++ linux-2.6.21-rc6-mm1-test/fs/buffer.c	2007-04-09 18:18:23.0 -0700
@@ -1862,13 +1862,8 @@ static int __block_prepare_write(struct 
 		if (block_start >= to)
 			break;
 		if (buffer_new(bh)) {
-			void *kaddr;
-
 			clear_buffer_new(bh);
-			kaddr = kmap_atomic(page, KM_USER0);
-			memset(kaddr+block_start, 0, bh->b_size);
-			flush_dcache_page(page);
-			kunmap_atomic(kaddr, KM_USER0);
+

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 08:51:36PM -0700, Andrew Morton wrote:
 > On Mon, 9 Apr 2007 23:08:23 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > >  > This whole file is going away in .22, and we have a viable alternative in
 > >  > .21 (acpi-cpufreq), so I'm not overly worried about fixing this up
 > >  > given it only shows up in debug kernels, especially at this stage in -rc.
 > >  > 
 > >  > (Yeah, it's a cop-out, but unless someone with more interest in this problem
 > >  >  steps up, I've bigger fishes to fry).
 > > 
 > > One last try...
 > > (I didn't think too long about this, so this might be equally busted,
 > >  but if so, see comment above).
 > 
 > Yes, I expect that should squish the warnings.  It looks all racy wrt cpu hotplug
 > and against async set_cpus_allowed(), but if those are our worst problems, we're
 > good.

It probably needs a couple more preempt_enable()'s sprinkled throughout the function
to take care of the break's. I also missed a goto case.
Meh, this cure is as bad as the disease.
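
A minimal userspace sketch of one way to avoid sprinkling enables at every break and goto: funnel every exit through a single label. This is a generic pattern, not the patch under discussion; fake_preempt_disable(), fake_preempt_enable() and find_state() are hypothetical stand-ins:

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for preempt_disable()/preempt_enable(). */
static int preempt_depth;

static void fake_preempt_disable(void)
{
	preempt_depth++;
}

static void fake_preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Search a table for `want`; every exit path goes through `out`. */
static int find_state(const int *table, int n, int want)
{
	int i, ret = -1;

	fake_preempt_disable();
	if (n <= 0)
		goto out;		/* early error path */
	for (i = 0; i < n; i++) {
		if (table[i] == want) {
			ret = i;
			break;		/* falls through to `out`, no bare return */
		}
	}
out:
	fake_preempt_enable();
	return ret;
}
```

With one disable at the top and one enable at the label, the count stays balanced whichever way the function leaves.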

Dave

-- 
http://www.codemonkey.org.uk

RFC: "mlx4" drivers for Mellanox ConnectX HCAs

2007-04-09 Thread Roland Dreier

I'd like to announce preliminary versions of a set of "mlx4" drivers
for Mellanox's new ConnectX InfiniBand/10 gigabit ethernet adapters.
(These are Mellanox's 4th generation of adapters, hence the mlx4 name)
Because these adapters can operate as both an ethernet NIC and an
InfiniBand HCA (at the same time!), the driver is split up into three
pieces:

mlx4_core: Basic support for hardware, managing resources, sending
  commands to firmware, etc.  Lives in drivers/net/mlx4 (so that
  it gets built in a natural way for ethernet support, even if
  CONFIG_INFINIBAND=n) and exports its API in include/linux/mlx4.

mlx4_ib: InfiniBand HCA driver, sits between the IB midlayer and
  mlx4_core.  Lives in drivers/infiniband/hw/mlx4.

mlx4_eth: Ethernet NIC driver, sits between networking stack and
  mlx4_core.  Also lives in drivers/net/mlx4.  This is just a stub
  right now, because firmware support for ethernet mode is still
  too immature.

In fact, the ConnectX hardware has support for fibre channel stuff
too, so in the future there may also be an FC HBA driver layered on
top of mlx4_core as well.

I will post a full set of patches for review via email once I've had a
chance to clean things up and split things into reasonable sized
chunks (the full patch is > 300 KB right now), but for those who are
interested, you can grab the connectx branch of my infiniband.git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git connectx

I've also put this code in my for-mm branch, so it should appear in
the next -mm kernel whenever Andrew has time to get one out.

Any and all comments are appreciated; my current plan is to merge the
mlx4_core and mlx4_ib drivers for 2.6.22, and I hope that mlx4_eth
will be ready for 2.6.23.

I've tried to flag areas that are not fully implemented or still need
work, and there are quite a few places that need cleanup, but the
driver is at least able to run IP-over-InfiniBand and some basic
userspace direct access tests.

Speaking of direct access, a preliminary version of a userspace driver
that works with libibverbs is available from:

git://git.kernel.org/pub/scm/libs/infiniband/libmlx4.git


Thanks to the crew at Mellanox for lots of help with sample code and
debugging, as well as early access to the hardware!

Re: Lower HD transfer rate with NCQ enabled?

2007-04-09 Thread Mark Lord


Phillip Susi wrote:
> Mark Lord wrote:
>> Phillip Susi wrote:
>>> Sounds like this is a serious bug in the WD firmware.
>>
>> For personal systems, yes.  For servers, probably not a bug.
>>
>> Disabling readahead means faster execution of queued commands,
>> since it doesn't have to "linger" and do unwanted read-ahead.
>> So this bug is a "feature" for random access servers.
>> And a big nuisance for everything else.
>
> I think you misunderstand the bug.  The bug is not that the drive 
> disables internal readahead; the bug is that host supplied readahead 
> requests work so horribly.  It is a good thing that the drive allows the 
> host to control the readahead, but something is wrong if the drive's 
> readahead is WAY better than any the host can perform.


Well, in this case, it has already been determined that switching
to a different Linux I/O scheduler gives back most of the performance.

But the drive can do readahead better than the OS:  With the OS,
everything is broken up into discrete requests, whereas with the 
drive firmware, it can continuously update its readahead projections,
even in the midst of a command.  So it does have an advantage.

But again, only the WD Raptor seems to have serious problems here.
Other drives cope well with readahead + NCQ just fine.

Cheers

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 23:08:23 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

>  > This whole file is going away in .22, and we have a viable alternative in
>  > .21 (acpi-cpufreq), so I'm not overly worried about fixing this up
>  > given it only shows up in debug kernels, especially at this stage in -rc.
>  > 
>  > (Yeah, it's a cop-out, but unless someone with more interest in this problem
>  >  steps up, I've bigger fishes to fry).
> 
> One last try...
> (I didn't think too long about this, so this might be equally busted,
>  but if so, see comment above).
> 
>   Dave
> 
> diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> index f43b987..38e31ce 100644
> --- a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> +++ b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> @@ -720,11 +720,13 @@ static int centrino_target (struct cpufreq_policy *policy,
>   cpu_set(j, set_mask);
>  
>   set_cpus_allowed(current, set_mask);
> + preempt_disable();
>   if (unlikely(!cpu_isset(smp_processor_id(), set_mask))) {
>   dprintk("couldn't limit to CPUs in this domain\n");
>   retval = -EAGAIN;
>   if (first_cpu) {
>   /* We haven't started the transition yet. */
> + preempt_enable();
>   goto migrate_end;
>   }
>   break;
> @@ -765,6 +767,7 @@ static int centrino_target (struct cpufreq_policy *policy,
>   break;
>  
>   cpu_set(j, covered_cpus);
> + preempt_enable();
>   }
>  

Yes, I expect that should squish the warnings.  It looks all racy wrt cpu hotplug
and against async set_cpus_allowed(), but if those are our worst problems, we're
good.

Re: I give up

2007-04-09 Thread Gene Heskett

On Monday 09 April 2007, Dave Jones wrote:
>On Mon, Apr 09, 2007 at 06:22:10PM -0400, Dave Dillow wrote:
> > On Mon, 2007-04-09 at 23:35 +0200, Jan Engelhardt wrote:
> > > On Apr 9 2007 15:38, Gene Heskett wrote:
> > > >On Monday 09 April 2007, H. Peter Anvin wrote:
> > > >>Jan Engelhardt wrote:
> > > >>> dm is on 254 for me.. in opensuse with a 2.6.20 that is. I
> > > >>> wonder why it even moves around. However, even then, those who
> > > >>> use udev and device names rather than (major,minor) tuples
> > > >>> should not have any problem.
> > > >>
> > > >>It moves around because someone at some point thought it was a
> > > >> great idea to assign dynamic majors to core functionality.
> > > >
> > > >What were they smoking, I want some of that!
> > >
> > > Do you actually use udev?
> >
> > udev doesn't help the problem he is having (and he is using it, since
> > he is using Fedora).
>
>However, it also doesn't explain what the point is of backing up /dev
>when it's dynamically created.
>
>   Dave

I'm glad you mentioned that, I was, leftovers from RH7.3 probably. :(

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Windows Tip of the Day:
Add DEVICE=FNGRCROS.SYS to your CONFIG.SYS file.

Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio

2007-04-09 Thread Andrew Morton

On Tue, 10 Apr 2007 12:04:54 +0900 Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:

> Hello Andrew,
> Thank you for your comments.
> 
> Andrew Morton wrote:
> > On Tue, 03 Apr 2007 19:46:04 +0900
> > Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:
> >> If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
> >> dirty pages start writeback of dirty pages by themselves. At that time,
> >> these processes are not blocked in balance_dirty_pages(), but they may
> >> be blocked if the write-requests-queue of the written disk is full
> >> (that is, the length of the queue > `nr_requests'). By this behavior,
> >> we can throttle only processes which write to the disks with heavy load,
> >> and can allow processes to write to the other disks without blocking.
> >>
> >> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> >> are throttled as current Linux does, not to fill up memory with dirty
> >> pages.
> > 
> > Does this actually solve the problem?  If the request queue is sufficiently
> > large (relative to the various dirty-memory thresholds) then I'd expect
> > that a heavy-writer will be able to very quickly take the total
> > dirty+writeback memory up to the dirty_ratio (should be renamed
> > throttle_threshold, but it's too late for that).
> > 
> > I suspect the reason why this patch was successful in your testing was
> > because dirty_start_writeback_ratio happens to exceed the size of the disk
> > request queues, so the heavy writer is getting stuck on disk request queue
> > exhaustion.
> > 
> > But that won't work if we have a lot of processes writing to a lot of
> > disks, and it won't work if the request queue size is large, or if the
> > dirty-memory thresholds are small (relative to the request queue size).
> > 
> > Do the patches still work after
> > `echo 1 > /sys/block/sda/queue/nr_requests'?
> 
> As you pointed out, this patch has no effect if nr_requests is too large,
> because it distinguishes heavy disks depending on the length of the write-
> requests queue of each disk.
> 
> This patch is for providing the system administrators with room to avoid
> the problem by adjusting parameters appropriately, rather than an automatic
> solution for any possible situations.
> 
> Could you please tell me some situations in which we should set nr_request
> that large?

It's probably not a sensible thing to do.  But it's _possible_ to do, and
the fact that the kernel will again misbehave indicates an overall weakness
in our design.

And there are other ways in which this situation could occur:

- The request queue has a fixed size (it is not scaled according to the
  amount of memory in the machine).  So if the machine is small enough
  (say, 64MB) then the problem can happen.

- The machine could have a large number of disks

- The queue size of 128 is in units of "number of requests".  But it is
  independent of the _size_ of those requests.  If someone comes up with
  a driver which wants to use 16MB-sized requests, the problem will
  recur.

For all these sorts of reasons, we have learned that we should avoid any
dependence upon request queue exhaustion within the VM/VFS/etc.
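
A rough back-of-the-envelope sketch of the three cases above, in C.  It
compares the bytes that can sit in request queues against the dirty_ratio
limit.  The 128-request queue, the 16MB request size and the 64MB machine
come from the text; the 40% dirty_ratio, the 512KB "ordinary" request size
and all function names are assumptions made for illustration, and the
model ignores everything else the VM does.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that can be parked in request queues before any writer blocks. */
static uint64_t queue_capacity_bytes(uint64_t ndisks, uint64_t nr_requests,
				     uint64_t request_bytes)
{
	return ndisks * nr_requests * request_bytes;
}

/* Dirty+writeback memory allowed before hard throttling at dirty_ratio. */
static uint64_t dirty_limit_bytes(uint64_t mem_bytes, unsigned int dirty_ratio_pct)
{
	return mem_bytes / 100 * dirty_ratio_pct;
}

/*
 * Returns 1 if the queues fill up before the dirty limit is reached, i.e.
 * the case in which queue exhaustion can throttle a heavy writer at all.
 * Returns 0 if the writer reaches dirty_ratio first.
 */
static int queue_fills_first(uint64_t mem_bytes, unsigned int dirty_ratio_pct,
			     uint64_t ndisks, uint64_t nr_requests,
			     uint64_t request_bytes)
{
	return queue_capacity_bytes(ndisks, nr_requests, request_bytes) <
	       dirty_limit_bytes(mem_bytes, dirty_ratio_pct);
}
```

Under these assumed numbers, a 1GB machine with one disk and 512KB
requests has 64MB of queue against roughly 410MB of dirty limit, so the
queue fills first.  Shrinking memory to 64MB, using 16MB requests, or
adding sixteen disks each flips the comparison.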


Re: [patch 2/8] [Intel IOMMU] Some generic search functions required to lookup device relationships.

2007-04-09 Thread Greg KH

On Mon, Apr 09, 2007 at 02:55:54PM -0700, Ashok Raj wrote:
> +/*
> + * find the upstream PCIE-to-PCI bridge of a PCI device
> + * if the device is PCIE, return NULL
> + * if the device isn't connected to a PCIE bridge (that is its parent is a
> + * legacy PCI bridge and the bridge is directly connected to bus 0), return its
> + * parent
> + */
> +struct pci_dev *
> +pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
> +{
> + struct pci_dev *tmp = NULL;
> +
> + if (pdev->is_pcie)
> + return NULL;
> + while (1) {
> + if (!pdev->bus->self)
> + break;
> + pdev = pdev->bus->self;
> + /* a p2p bridge */
> + if (!pdev->is_pcie) {
> + tmp = pdev;
> + continue;
> + }
> + /* PCI device should connect to a PCIE bridge */
> + BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
> + return pdev;
> + }
> +
> + return tmp;
> +}

No locking while you walk up the bus list?

> --- linux-2.6.21-rc5.orig/include/linux/pci.h 2007-04-03 04:30:51.0 -0700
> +++ linux-2.6.21-rc5/include/linux/pci.h  2007-04-03 06:58:58.0 -0700
> @@ -126,6 +126,7 @@
>   unsigned short  subsystem_device;
>   unsigned int    class;  /* 3 bytes: (base,sub,prog-if) */
>   u8  hdr_type;   /* PCI header type (`multi' flag masked out) */
> + u8  pcie_type;  /* PCI-E device/port type */
>   u8  rom_base_reg;   /* which config register controls the ROM */
>   u8  pin;    /* which interrupt pin this device uses */
>  
> @@ -168,6 +169,7 @@
>   unsigned int    msi_enabled:1;
>   unsigned int    msix_enabled:1;
>   unsigned int    is_managed:1;
> + unsigned int    is_pcie:1;

Do you really need both fields?  Wouldn't just the pcie_type one work
(with some NOT_PCIE type being set for it if it isn't I suppose.)
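
Roughly what that could look like, as a user-space sketch.  Everything
here is made up for illustration: struct mock_dev, the MOCK_* values and
the helper names are not the kernel's, and the upstream pointer stands in
for pdev->bus->self.  It also does no locking at all, so it does not
answer the question above about walking the bus list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port types; names and values are illustrative only. */
enum mock_pcie_type {
	MOCK_NOT_PCIE = 0,		/* sentinel: conventional PCI device */
	MOCK_PCIE_ENDPOINT,
	MOCK_PCIE_ROOT_PORT,
	MOCK_PCIE_TO_PCI_BRIDGE,
};

struct mock_dev {
	enum mock_pcie_type pcie_type;
	struct mock_dev *upstream;	/* bridge above us, NULL on bus 0 */
};

/* The separate is_pcie bit becomes a comparison against the sentinel. */
static int mock_is_pcie(const struct mock_dev *dev)
{
	return dev->pcie_type != MOCK_NOT_PCIE;
}

/* Same walk as the quoted function, using only the single type field. */
static struct mock_dev *mock_find_upstream_pcie_bridge(struct mock_dev *dev)
{
	struct mock_dev *last_pci_bridge = NULL;

	if (mock_is_pcie(dev))
		return NULL;

	while (dev->upstream) {
		dev = dev->upstream;
		if (!mock_is_pcie(dev)) {
			/* a p2p bridge: remember it and keep climbing */
			last_pci_bridge = dev;
			continue;
		}
		/* a PCI device should hang off a PCIE-to-PCI bridge */
		assert(dev->pcie_type == MOCK_PCIE_TO_PCI_BRIDGE);
		return dev;
	}

	return last_pci_bridge;
}
```

The cost is one reserved value in the type field; the gain is that the
two pieces of state can no longer disagree.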

thanks,

greg k-h

Re: [patch 1/8] [Intel IOMMU] ACPI support for Intel Virtualization Technology for Directed I/O

2007-04-09 Thread Len Brown

On Monday 09 April 2007 17:55, Ashok Raj wrote:
> This patch contains basic ACPI parsing and enumeration support.

AFAICS, ACPI supplies the envelope which delivers the table,
and ACPI has some convenience structure definitions for that
table in include/acpi/actbl1.h (primarily for the acpixtract table dis-assembler),
but ACPI is otherwise not involved in IOMMU support.

Indeed, one might argue that all new functions in this patch series with
"acpi..." would more appropriately be called "pci...", since a cursory
scan of the IOMMU spec seems to suggest it is specific to PCI.

So on first blush, it looks like the only call to a function that begins with
"acpi" in this patch series should be acpi_get_table() from some IOMMU
specific file outside of drivers/acpi,
and the only modification to any code with an "acpi" in the file path or filename
should be any updates to the convenience structure definitions in actbl1.h.

thanks,
-Len

Re: I give up

2007-04-09 Thread Gene Heskett

On Monday 09 April 2007, Jan Engelhardt wrote:
>On Apr 9 2007 15:38, Gene Heskett wrote:
>>On Monday 09 April 2007, H. Peter Anvin wrote:
>>>Jan Engelhardt wrote:
>>>> dm is on 254 for me.. in opensuse with a 2.6.20 that is. I wonder
>>>> why it even moves around. However, even then, those who use udev and
>>>> device names rather than (major,minor) tuples should not have any
>>>> problem.
>>>
>>>It moves around because someone at some point thought it was a great
>>>idea to assign dynamic majors to core functionality.
>>
>>What were they smoking, I want some of that!
>
>Do you actually use udev?
>
Yes, and it works just fine, for non-LVM filesystems.
>
>
>Jan



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"The greatest warriors are the ones who fight for peace."
-- Holly Near

Re: I give up

2007-04-09 Thread John Stoffel


Gene> I haven't seen any 200GB for $55 yet, more like $129 & maybe a
Gene> rebate at Circuit City.  We don't have a Fry's around here.

Newegg.com, 320GB for $85 ea, plus shipping, plus a SATA controller
board, just under $200.  I'm happy.  And thanks for the SATA
controller work Jeff!

John

Re: I give up

2007-04-09 Thread Gene Heskett

On Monday 09 April 2007, Jeff Garzik wrote:
>Gene Heskett wrote:
>> For those of you with big tapes that can hold a complete dump of every
>> partition (and partitions is the only way dump works in case some have
>> forgotten), go ahead and use dump/restore.  Tar quite simply, allows
>> one to break his backup files down into small enough pieces that a
>> tape drive that's only 20% of the system drives size is totally
>> usable.  I ran dds2 tapes for a long time, and it wasn't at all
>> unusual to have amanda fill those to the 95% or better mark every
>> night for a week running, without ever hitting EOT.
>
>Wow, people still use tapes for backup?
>
>With current hard drive prices (200GB @ US$55, 500GB @ US$120) you can
>just keep buying hard drives :)
>
>Surely tape price/GB is higher than hard drive price/GB...
>
>   Jeff
I haven't seen any 200GB for $55 yet, more like $129 & maybe a rebate at 
Circuit City.  We don't have a Fry's around here.


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
You will be called upon to help a friend in trouble.

Re: Linux 2.6.21-rc6

2007-04-09 Thread Dmitry Torokhov

On Sunday 08 April 2007 19:09, Andrew Morton wrote:
> driver core:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/update-documentation-driver-model-platformtxt.patch
> 

We should not encourage using platform_device_register_simple as we want
to obsolete this function.

-- 
Dmitry

[PATCH 2.6.21 3/3] cxgb3 - missing CPL handler and register setting.

2007-04-09 Thread divy

From: Divy Le Ray <[EMAIL PROTECTED]>

Remove specific CPL handler. 
Add missing CPL handler.
Add missing register setting when the interface is brought up.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_main.c    |    2 ++
 drivers/net/cxgb3/cxgb3_offload.c |   14 ++------------
 drivers/net/cxgb3/regs.h          |    6 ++++++
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index d6eb982..67b4b21 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -770,6 +770,8 @@ static int cxgb_up(struct adapter *adap)
if (err)
goto out;
 
+   t3_write_reg(adap, A_ULPRX_TDDP_PSZ, V_HPZ0(PAGE_SHIFT - 12));
+   
err = setup_sge_qsets(adap);
if (err)
goto out;
diff --git a/drivers/net/cxgb3/cxgb3_offload.c b/drivers/net/cxgb3/cxgb3_offload.c
index eed7a48..4864924 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -743,17 +743,6 @@ static int do_act_establish(struct t3cde
}
 }
 
-static int do_set_tcb_rpl(struct t3cdev *dev, struct sk_buff *skb)
-{
-   struct cpl_set_tcb_rpl *rpl = cplhdr(skb);
-
-   if (rpl->status != CPL_ERR_NONE)
-   printk(KERN_ERR
-  "Unexpected SET_TCB_RPL status %u for tid %u\n",
-  rpl->status, GET_TID(rpl));
-   return CPL_RET_BUF_DONE;
-}
-
 static int do_trace(struct t3cdev *dev, struct sk_buff *skb)
 {
struct cpl_trace_pkt *p = cplhdr(skb);
@@ -1215,7 +1204,8 @@ void __init cxgb3_offload_init(void)
t3_register_cpl_handler(CPL_CLOSE_CON_RPL, do_hwtid_rpl);
t3_register_cpl_handler(CPL_ABORT_REQ_RSS, do_abort_req_rss);
t3_register_cpl_handler(CPL_ACT_ESTABLISH, do_act_establish);
-   t3_register_cpl_handler(CPL_SET_TCB_RPL, do_set_tcb_rpl);
+   t3_register_cpl_handler(CPL_SET_TCB_RPL, do_hwtid_rpl);
+   t3_register_cpl_handler(CPL_GET_TCB_RPL, do_hwtid_rpl);
t3_register_cpl_handler(CPL_RDMA_TERMINATE, do_term);
t3_register_cpl_handler(CPL_RDMA_EC_STATUS, do_hwtid_rpl);
t3_register_cpl_handler(CPL_TRACE_PKT, do_trace);
diff --git a/drivers/net/cxgb3/regs.h b/drivers/net/cxgb3/regs.h
index f8be41c..e5a5534 100644
--- a/drivers/net/cxgb3/regs.h
+++ b/drivers/net/cxgb3/regs.h
@@ -1234,9 +1234,15 @@
 
 #define A_ULPRX_ISCSI_TAGMASK 0x514
 
+#define S_HPZ0    0
+#define M_HPZ0    0xf
+#define V_HPZ0(x) ((x) << S_HPZ0)
+#define G_HPZ0(x) (((x) >> S_HPZ0) & M_HPZ0)
+
 #define A_ULPRX_TDDP_LLIMIT 0x51c
 
 #define A_ULPRX_TDDP_ULIMIT 0x520
+#define A_ULPRX_TDDP_PSZ 0x528
 
 #define A_ULPRX_STAG_LLIMIT 0x52c
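
For readers unfamiliar with the S_/M_/V_/G_ field macros, here is a small
stand-alone sketch of what the new register write encodes, taking S_HPZ0
as 0 and M_HPZ0 as 0xf.  The two helper functions are hypothetical and
not part of the driver, and reading the field as log2(page size / 4KB) is
inferred from the PAGE_SHIFT - 12 expression in cxgb_up(), not from
hardware documentation.

```c
#include <assert.h>

/* Field macros as in the regs.h hunk: shift, mask, "value", "get". */
#define S_HPZ0    0
#define M_HPZ0    0xf
#define V_HPZ0(x) ((x) << S_HPZ0)
#define G_HPZ0(x) (((x) >> S_HPZ0) & M_HPZ0)

/* Hypothetical helper: register value for a given page shift. */
static unsigned int hpz0_for_page_shift(unsigned int page_shift)
{
	/* the field is 4 bits wide and counts shifts above 4KB pages */
	assert(page_shift >= 12 && page_shift - 12 <= M_HPZ0);
	return V_HPZ0(page_shift - 12);
}

/* Hypothetical helper: page size in bytes implied by a register value. */
static unsigned long page_bytes_from_reg(unsigned int reg)
{
	return 1UL << (G_HPZ0(reg) + 12);
}
```

With 4KB pages the value written is 0; with 64KB pages (a page shift of
16) it is 4.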
 

[PATCH 2.6.21 1/3] cxgb3 - avoid deadlock with mac watchdog

2007-04-09 Thread divy

From: Divy Le Ray <[EMAIL PROTECTED]>

Fix a deadlock when the interface is configured down and
the watchdog task is sleeping on rtnl_lock.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/cxgb3_main.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 26240fd..c6ebe25 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -2119,7 +2119,9 @@ static void check_t3b2_mac(struct adapte
 {
int i;
 
-   rtnl_lock();  /* synchronize with ifdown */
+   if (!rtnl_trylock())/* synchronize with ifdown */
+   return;
+
for_each_port(adapter, i) {
struct net_device *dev = adapter->port[i];
struct port_info *p = netdev_priv(dev);
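
The trylock pattern in the hunk above can be illustrated in user space.
This is a sketch under assumptions: a C11 atomic_flag stands in for the
rtnl lock, and mock_rtnl_trylock(), mock_rtnl_unlock() and
mock_check_mac() are hypothetical names.  Presumably the ifdown path
holds the lock while it waits for the watchdog to finish, so a watchdog
that sleeps on the same lock can never make progress; skipping the pass
avoids that.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the rtnl lock: a C11 atomic flag, not the kernel mutex. */
static atomic_flag mock_rtnl = ATOMIC_FLAG_INIT;

/* Number of watchdog passes that actually ran their checks. */
static int mock_checks_run;

/* Returns 1 if we took the lock, 0 if someone else already holds it. */
static int mock_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&mock_rtnl);
}

static void mock_rtnl_unlock(void)
{
	atomic_flag_clear(&mock_rtnl);
}

/* Periodic watchdog body: skip this pass rather than sleep on the lock. */
static void mock_check_mac(void)
{
	if (!mock_rtnl_trylock())
		return;		/* lock is busy; try again on the next tick */

	mock_checks_run++;	/* the per-port checks would go here */
	mock_rtnl_unlock();
}
```

The trade-off is that a pass is silently dropped whenever the lock is
contended, which is acceptable for a periodic check that will run again.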

[PATCH 2.6.21 2/3] cxgb3 - MAC watchdog update

2007-04-09 Thread divy

From: Divy Le Ray <[EMAIL PROTECTED]>

The MAC watchdog was failing if the peer interface was brought down.

Signed-off-by: Divy Le Ray <[EMAIL PROTECTED]>
---

 drivers/net/cxgb3/common.h |7 ++-
 drivers/net/cxgb3/cxgb3_main.c |   10 +---
 drivers/net/cxgb3/xgmac.c  |  107 ++--
 3 files changed, 89 insertions(+), 35 deletions(-)

diff --git a/drivers/net/cxgb3/common.h b/drivers/net/cxgb3/common.h
index 97128d8..8d13796 100644
--- a/drivers/net/cxgb3/common.h
+++ b/drivers/net/cxgb3/common.h
@@ -478,8 +478,11 @@ struct cmac {
struct adapter *adapter;
unsigned int offset;
unsigned int nucast;/* # of address filters for unicast MACs */
-   unsigned int tcnt;
-   unsigned int xcnt;
+   unsigned int tx_tcnt;
+   unsigned int tx_xcnt;
+   u64 tx_mcnt;
+   unsigned int rx_xcnt;
+   u64 rx_mcnt;
unsigned int toggle_cnt;
unsigned int txen;
struct mac_stats stats;
diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index c6ebe25..d6eb982 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -194,15 +194,13 @@ void t3_os_link_changed(struct adapter *
 
if (link_stat != netif_carrier_ok(dev)) {
if (link_stat) {
-   t3_set_reg_field(adapter,
-A_XGM_TXFIFO_CFG + mac->offset,
-F_ENDROPPKT, 0);
+   t3_mac_enable(mac, MAC_DIRECTION_RX);
netif_carrier_on(dev);
} else {
netif_carrier_off(dev);
-   t3_set_reg_field(adapter,
-A_XGM_TXFIFO_CFG + mac->offset,
-F_ENDROPPKT, F_ENDROPPKT);
+   pi->phy.ops->power_down(&pi->phy, 1);
+   t3_mac_disable(mac, MAC_DIRECTION_RX);
+   t3_link_start(&pi->phy, mac, &pi->link_config);
}
 
link_report(dev);
diff --git a/drivers/net/cxgb3/xgmac.c b/drivers/net/cxgb3/xgmac.c
index 94aaff0..a506792 100644
--- a/drivers/net/cxgb3/xgmac.c
+++ b/drivers/net/cxgb3/xgmac.c
@@ -367,7 +367,8 @@ int t3_mac_enable(struct cmac *mac, int
int idx = macidx(mac);
struct adapter *adap = mac->adapter;
unsigned int oft = mac->offset;
-
+   struct mac_stats *s = &mac->stats;
+   
if (which & MAC_DIRECTION_TX) {
t3_write_reg(adap, A_XGM_TX_CTRL + oft, F_TXEN);
t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CFG_CH0 + idx);
@@ -376,10 +377,16 @@ int t3_mac_enable(struct cmac *mac, int
t3_set_reg_field(adap, A_TP_PIO_DATA, 1 << idx, 1 << idx);
 
t3_write_reg(adap, A_TP_PIO_ADDR, A_TP_TX_DROP_CNT_CH0 + idx);
-   mac->tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap,
-   A_TP_PIO_DATA)));
-   mac->xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
-   A_XGM_TX_SPI4_SOP_EOP_CNT)));
+   mac->tx_mcnt = s->tx_frames;
+   mac->tx_tcnt = (G_TXDROPCNTCH0RCVD(t3_read_reg(adap,
+   A_TP_PIO_DATA)));
+   mac->tx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
+   A_XGM_TX_SPI4_SOP_EOP_CNT +
+   oft)));
+   mac->rx_mcnt = s->rx_frames;
+   mac->rx_xcnt = (G_TXSPI4SOPCNT(t3_read_reg(adap,
+   A_XGM_RX_SPI4_SOP_EOP_CNT +
+   oft)));
mac->txen = F_TXEN;
mac->toggle_cnt = 0;
}
@@ -392,6 +399,7 @@ int t3_mac_disable(struct cmac *mac, int
 {
int idx = macidx(mac);
struct adapter *adap = mac->adapter;
+   int val;
 
if (which & MAC_DIRECTION_TX) {
t3_write_reg(adap, A_XGM_TX_CTRL + mac->offset, 0);
@@ -401,44 +409,89 @@ int t3_mac_disable(struct cmac *mac, int
t3_set_reg_field(adap, A_TP_PIO_DATA, 1 << idx, 1 << idx);
mac->txen = 0;
}
-   if (which & MAC_DIRECTION_RX)
+   if (which & MAC_DIRECTION_RX) {
+   t3_set_reg_field(mac->adapter, A_XGM_RESET_CTRL + mac->offset,
+F_PCS_RESET_, 0);
+   msleep(100);
t3_write_reg(adap, A_XGM_RX_CTRL + mac->offset, 0);
+   val = F_MAC_RESET_;
+   if (is_10G(adap))
+   val |= F_PCS_RESET_;
+   else if (uses_xaui(adap))
+   val |= F_PCS_RESET_ | F_XG2G_RESET_;
+   else
+   val |= F_RGMII_RESET_ | F_XG2G_RESET_;
+   t3_write_reg

Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio

2007-04-09 Thread Tomoki Sekiyama

Hello Andrew,
Thank you for your comments.

Andrew Morton wrote:
> On Tue, 03 Apr 2007 19:46:04 +0900
> Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:
>> If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
>> dirty pages start writeback of dirty pages by themselves. At that time,
>> these processes are not blocked in balance_dirty_pages(), but they may
>> be blocked if the write-requests-queue of the written disk is full
>> (that is, the length of the queue > `nr_requests'). By this behavior,
>> we can throttle only processes which write to the disks with heavy load,
>> and can allow processes to write to the other disks without blocking.
>>
>> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
>> are throttled as current Linux does, not to fill up memory with dirty
>> pages.
> 
> Does this actually solve the problem?  If the request queue is sufficiently
> large (relative to the various dirty-memory thresholds) then I'd expect
> that a heavy-writer will be able to very quickly take the total
> dirty+writeback memory up to the dirty_ratio (should be renamed
> throttle_threshold, but it's too late for that).
> 
> I suspect the reason why this patch was successful in your testing was
> because dirty_start_writeback_ratio happens to exceed the size of the disk
> request queues, so the heavy writer is getting stuck on disk request queue
> exhaustion.
> 
> But that won't work if we have a lot of processes writing to a lot of
> disks, and it won't work if the request queue size is large, or if the
> dirty-memory thresholds are small (relative to the request queue size).
> 
> Do the patches still work after
> `echo 1 > /sys/block/sda/queue/nr_requests'?

As you pointed out, this patch has no effect if nr_requests is too large,
because it distinguishes heavy disks depending on the length of the write-
requests queue of each disk.

This patch is for providing the system administrators with room to avoid
the problem by adjusting parameters appropriately, rather than an automatic
solution for any possible situations.

Could you please tell me some situations in which we should set nr_request
that large?

Thanks,
-- 
Tomoki Sekiyama
Hitachi, Ltd., Systems Development Laboratory

[PATCH 2.6.21 0/3] cxgb3 - bug fixes

2007-04-09 Thread Divy Le Ray


Hi Jeff,

I'm submitting a set of bug fixes for inclusion in 2.6.21.
The patches are built against Linus' git tree.

Here is a brief description:
- Avoid deadlock when the interface is brought down
- Rework the MAC hang workaround since it was failing
 if the peer interface was brought down
- Add missing RNIC register setting and CPL handler

Cheers,
Divy

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 11:05:00PM -0400, Dave Jones wrote:

 > This whole file is going away in .22, and we have a viable alternative in
 > .21 (acpi-cpufreq), so I'm not overly worried about fixing this up
 > given it only shows up in debug kernels, especially at this stage in -rc.
 > 
 > (Yeah, it's a cop-out, but unless someone with more interest in this problem
 >  steps up, I've bigger fishes to fry).

One last try...
(I didn't think too long about this, so this might be equally busted,
 but if so, see comment above).

Dave

diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
index f43b987..38e31ce 100644
--- a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
+++ b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
@@ -720,11 +720,13 @@ static int centrino_target (struct cpufreq_policy *policy,
cpu_set(j, set_mask);
 
set_cpus_allowed(current, set_mask);
+   preempt_disable();
if (unlikely(!cpu_isset(smp_processor_id(), set_mask))) {
dprintk("couldn't limit to CPUs in this domain\n");
retval = -EAGAIN;
if (first_cpu) {
/* We haven't started the transition yet. */
+   preempt_enable();
goto migrate_end;
}
break;
@@ -765,6 +767,7 @@ static int centrino_target (struct cpufreq_policy *policy,
break;
 
cpu_set(j, covered_cpus);
+   preempt_enable();
}
 
for_each_cpu_mask(k, online_policy_cpus) {

-- 
http://www.codemonkey.org.uk

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 07:41:42PM -0700, Andrew Morton wrote:

 > >  > This means we'll call set_cpus_allowed() while in atomic state, but
 > >  > set_cpus_allowed() does sleepy stuff.
 > > 
 > > Puzzled. This diff shouldn't change anything about the context we're in
 > > when we call set_cpus_allowed, and as we're not seeing warnings now,
 > > I'm not sure what I'm missing?
 > 
 > set_cpus_allowed() will only sleep in special circumstances: when we're
 > telling the target task that it is not allowed to run on a CPU upon which it
 > is presently executing.  So it needs to be synchronously migrated off that
 > CPU, which requires that the set_cpus_allowed() caller block.
 > 
 > You're probably just not hitting that case.

Oh, now I see it. The set_cpus_allowed that was inside the preempt stuff
I was adding. (that the diff elided).  Yeah, that's a problem. Bugger.

 > Probably we should have a might_sleep() in set_cpus_allowed(), although
 > there might be callers who are guaranteed to never hit that case and who
 > might legitimately want special treatment to avoid the warning.

This whole file is going away in .22, and we have a viable alternative in
.21 (acpi-cpufreq), so I'm not overly worried about fixing this up
given it only shows up in debug kernels, especially at this stage in -rc.

(Yeah, it's a cop-out, but unless someone with more interest in this problem
 steps up, I've bigger fishes to fry).

Dave

-- 
http://www.codemonkey.org.uk

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Linus Torvalds

On Mon, 9 Apr 2007, Andrew Morton wrote:
> 
> >10 ?S< 0:00 [khelper]
> 
> That one's needed to parent the call_usermodehelper() apps.  I don't think
> it does anything else.  We used to use keventd for this but that had some
> problem which I forget.

I think it was one of a long series of deadlocks. 

Using a "keventd" for many different things sounds clever and nice, but 
then sucks horribly when one event triggers another event, and they depend 
on each other. Solution: use independent threads for the events.

Linus

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 22:31:08 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 09, 2007 at 05:26:51PM -0700, Andrew Morton wrote:
>  > On Thu, 5 Apr 2007 16:50:34 -0400
>  > Dave Jones <[EMAIL PROTECTED]> wrote:
>  > 
>  > > diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
>  > > index f43b987..824d0a2 100644
>  > > --- a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
>  > > +++ b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
>  > > @@ -708,6 +708,7 @@ static int centrino_target (struct cpufreq_policy *policy,
>  > >  saved_mask = current->cpus_allowed;
>  > >  first_cpu = 1;
>  > >  cpus_clear(covered_cpus);
>  > > +preempt_disable();
>  > >  for_each_cpu_mask(j, online_policy_cpus) {
>  > >  /*
>  > >   * Support for SMP systems.
>  > > @@ -798,6 +799,7 @@ static int centrino_target (struct cpufreq_policy *policy,
>  > >  }
>  > >  
>  > >  migrate_end:
>  > > +preempt_enable();
>  > >  set_cpus_allowed(current, saved_mask);
>  > >  return 0;
>  > >  }
>  > 
>  > This means we'll call set_cpus_allowed() while in atomic state, but
>  > set_cpus_allowed() does sleepy stuff.
> 
> Puzzled. This diff shouldn't change anything about the context we're in
> when we call set_cpus_allowed, and as we're not seeing warnings now,
> I'm not sure what I'm missing?

set_cpus_allowed() will only sleep in special circumstances: when we're
telling the target task that it is not allowed to run on a CPU upon which it
is presently executing.  So it needs to be synchronously migrated off that
CPU, which requires that the set_cpus_allowed() caller block.

You're probably just not hitting that case.

Probably we should have a might_sleep() in set_cpus_allowed(), although
there might be callers who are guaranteed to never hit that case and who
might legitimately want special treatment to avoid the warning.

> [which may be 'the obvious', you wouldn't believe the evening I've had
>  involving gas leaks and noxious fumes. Wheee, floaty head.]

Yeah, I get a lot of patches like that.

[ANNOUNCE] sdparm 1.01

2007-04-09 Thread Douglas Gilbert

sdparm is a command line utility designed to get and set
SCSI device parameters (cf hdparm for ATA disks). The
parameters are held in mode pages. Apart from SCSI devices
(e.g. disks, tapes and enclosures) sdparm can be used on
any device that uses a SCSI command set. Virtually all CD/DVD
drives use the SCSI MMC set irrespective of the transport.
sdparm also can decode VPD pages including the device
identification page. Commands to start and stop the media;
load and unload removable media and some other housekeeping
functions are supported. sdparm supports both the linux
kernel 2.4 and 2.6 series with ports to FreeBSD and Windows.

ChangeLog for sdparm-1.01 [20070405]
  - add element address assignment mode page (smc)
  - improve error handling in lk 2.4 series mapping to
    sg devices
  - add configure.ac rule for mingw (Windows)
    - include <inttypes.h> to use PRIx64 instead of %llx
  - add LUICLR bit to extended inquiry VPD page
  - correct some headers for C++ inclusion
    - fix some C code to compile under C++
  - fix bug when unusual transport or vendor given
  - add a Fujitsu vendor mode page
  - add "initial priority" to control extension mpage
  - add "disconnect-reconnect" mpage to generic list;
    there are still transport specific versions
  - extend block limits VPD page (sbc3r09)
  - sync with sg3_utils-1.24 pass-through code

For more information and downloads see:
http://www.torque.net/sg/sdparm.html

A release announcement has been sent to freshmeat.net .

Doug Gilbert

Re: [PATCH 9/10] Vmi timer update.patch

2007-04-09 Thread Chris Wright

* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> diff -r c02ab981c99c arch/i386/kernel/vmiclock.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +0000
> +++ b/arch/i386/kernel/vmiclock.c Mon Apr 09 15:47:17 2007 -0700
> @@ -0,0 +1,318 @@
> +/*
> + * VMI paravirtual timer support routines.
> + *
> + * Copyright (C) 2007, VMware, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "io_ports.h"
> +
> +#define VMI_ONESHOT  (VMI_ALARM_IS_ONESHOT  | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
> +#define VMI_PERIODIC (VMI_ALARM_IS_PERIODIC | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
> +
> +static DEFINE_PER_CPU(struct clock_event_device, local_events);
> +
> +static inline u32 vmi_counter(u32 flags)
> +{
> + /* Given VMI_ONESHOT or VMI_PERIODIC, return the corresponding
> +  * cycle counter. */
> + return flags & VMI_ALARM_COUNTER_MASK;
> +}
> +
> +/* paravirt_ops.get_wallclock = vmi_get_wallclock */

Style nit, these pv_ops.foo = vmi_foo style comments aren't really useful.

> +unsigned long vmi_get_wallclock(void)
> +{
> + unsigned long long wallclock;
> + wallclock = vmi_timer_ops.get_wallclock(); // nsec
> + (void)do_div(wallclock, 1000000000);   // sec
> +
> + return wallclock;
> +}
> +
> +/* paravirt_ops.set_wallclock = vmi_set_wallclock */
> +int vmi_set_wallclock(unsigned long now)
> +{
> + return 0;
> +}
> +
> +/* paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles */
> +unsigned long long vmi_get_sched_cycles(void)
> +{
> + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE);
> +}
> +
> +/* paravirt_ops.get_cpu_khz = vmi_cpu_khz */
> +unsigned long vmi_cpu_khz(void)
> +{
> + unsigned long long khz;
> + khz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(khz, 1000);
> + return khz;
> +}
> +
> +static inline unsigned int vmi_get_timer_vector(void)
> +{
> +#ifdef CONFIG_X86_IO_APIC
> + return FIRST_DEVICE_VECTOR;
> +#else
> + return FIRST_EXTERNAL_VECTOR;
> +#endif
> +}
> +
> +/** vmi clockchip */
> +#ifdef CONFIG_X86_LOCAL_APIC
> +static unsigned int startup_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, vmi_get_timer_vector());
> +
> + return (val & APIC_SEND_PENDING);
> +}
> +
> +static void mask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val | APIC_LVT_MASKED);
> +}
> +
> +static void unmask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val & ~APIC_LVT_MASKED);
> +}
> +
> +static void ack_timer_irq(unsigned int irq)
> +{
> + ack_APIC_irq();
> +}
> +
> +static struct irq_chip vmi_chip __read_mostly = {
> + .name   = "VMI-LOCAL",
> + .startup= startup_timer_irq,
> + .mask   = mask_timer_irq,
> + .unmask = unmask_timer_irq,
> + .ack= ack_timer_irq
> +};
> +#endif
> +
> +/** vmi clockevent */
> +#define VMI_ALARM_WIRED_IRQ0 0x00000000
> +#define VMI_ALARM_WIRED_LVTT 0x00010000
> +static int vmi_wiring = VMI_ALARM_WIRED_IRQ0;
> +
> +static inline int vmi_get_alarm_wiring(void)
> +{
> + return vmi_wiring;  
> +}
> +
> +static void vmi_timer_set_mode(enum clock_event_mode mode,
> +struct clock_event_device *evt)
> +{
> + cycle_t now, cycles_per_hz;
> + BUG_ON(!irqs_disabled());
> +
> + switch (mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + break;
> + case CLOCK_EVT_MODE_PERIODIC:
> + cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(cycles_per_hz, HZ);
> + now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
> + vmi_timer_ops.set_alarm(VMI_PERIODIC, now, cycles_per_hz);
> + break;
> + case CLOCK_EVT_MODE_UNUSED:
> + case CLOCK_EVT_MODE_SHUTDOWN:
> + switch (evt->mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + vmi_timer_ops.cancel_alarm

Re: Ten percent test

2007-04-09 Thread Mike Galbraith

On Mon, 2007-04-09 at 07:38 +0200, Mike Galbraith wrote:

> I don't think you can have very much effect on latency using nice with
> SD once the CPU is fully utilized.  See below.
> 
> /*
>  * This contains a bitmap for each dynamic priority level with empty slots
>  * for the valid priorities each different nice level can have. It allows
>  * us to stagger the slots where differing priorities run in a way that
>  * keeps latency differences between different nice levels at a minimum.
>  * ie, where 0 means a slot for that priority, priority running from left to
>  * right:
>  * nice -20 
>  * nice -10 1001000100100010001001000100010010001000
>  * nice   0 0101010101010101010101010101010101010101
>  * nice   5 1101011010110101101011010110101101011011
>  * nice  10 0110111011011101110110111011101101110111
>  * nice  15 0101101101011011
>  * nice  19 1110
>  */
> 
> Nice allocates bandwidth, but as long as the CPU is busy, tasks always
> proceed downward in priority until they hit the expired array.  That's
> the design.

There's another aspect of this that may require some thought - kernel
threads.  As load increases, so does rotation length.  Would you really
want CPU hogs routinely preempting house-keepers under load?

-Mike


Re: init's children list is long and slows reaping children.

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 21:59:12 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 09, 2007 at 05:23:39PM -0700, Andrew Morton wrote:
> 
>  > I suspect there are quite a few kernel threads which don't really need to
>  > be threads at all: the code would quite happily work if it was changed to
>  > use keventd, via schedule_work() and friends.  But kernel threads are
>  > somewhat easier to code for.
> 
> Perhaps too easy.  We have a bunch of kthreads sitting around that afaict,
> are there 'just in case', not because they're actually in use.

This is fun.

>10 ?S< 0:00 [khelper]

That one's needed to parent the call_usermodehelper() apps.  I don't think
it does anything else.  We used to use keventd for this but that had some
problem which I forget.  (Who went and misnamed keventd to "events", too?
Nobody calls it "events", and with good reason)

> Why doesn't this get spawned when it needs to?
> 
>   164 ?S< 0:00 [cqueue/0]
>   165 ?S< 0:00 [cqueue/1]
> 
> I'm not even sure wth these are.

Me either.

> ...
>

: root 3  0.0  0.0  0 0 ?S18:51   0:00 [watchdog/0]

That's the softlockup detector.  Confusingly named to look like a, err,
watchdog.  Could probably use keventd.

: root 5  0.0  0.0  0 0 ?S18:51   0:00 [khelper]

That's there to parent the kthread_create()d threads.  Could presumably use
khelper.

: root   152  0.0  0.0  0 0 ?S18:51   0:00 [ata/0]

Does it need to be per-cpu?

: root   153  0.0  0.0  0 0 ?S18:51   0:00 [ata_aux]

That's a single-threaded workqueue handler.  Perhaps could use keventd.

: root   299  0.0  0.0  0 0 ?S18:51   0:00 [scsi_eh_0]
: root   300  0.0  0.0  0 0 ?S18:51   0:00 [scsi_eh_1]
: root   305  0.0  0.0  0 0 ?S18:51   0:00 [scsi_eh_2]
: root   306  0.0  0.0  0 0 ?S18:51   0:00 [scsi_eh_3]

This machine has one CPU, one sata disk and one DVD drive.  The above is
hard to explain.

: root   319  0.0  0.0  0 0 ?S18:51   0:00 [pccardd]

hm.

: root   331  0.0  0.0  0 0 ?S18:51   0:00 [kpsmoused]

hm.

: root   337  0.0  0.0  0 0 ?S18:51   0:00 [kedac]

hm.  I didn't know that the Vaio had EDAC.

: root  1173  0.0  0.0  0 0 ?S18:51   0:00 [khpsbpkt]

I can't even pronounce that.

: root  1354  0.0  0.0  0 0 ?S18:51   0:00 [knodemgrd_0]

OK, I do have 1394 hardware, but it hasn't been used.

: root  1636  0.0  0.0  0 0 ?S18:52   0:00 [kondemand/0]

I blame davej.

>  > otoh, a lot of these inefficiencies are probably down in scruffy drivers
>  > rather than in core or top-level code.
> 
> You say scruffy, but most of the proliferation of kthreads comes
> from code written in the last few years.  Compare the explosion of kthreads
> we see coming from 2.4 to 2.6. It's disturbing, and I don't see it
> slowing down at all.
> 
> On the 2-way box I grabbed the above ps output from, I end up with 69 kthreads.
> It doesn't surprise me at all that bigger iron is starting to see issues.
> 

Sure.

I don't think it's completely silly to object to all this.  Sure, a kernel
thread is worth 4k in the best case, but I bet they have associated unused
resources and as we've seen, they can cause overhead.


Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 05:26:51PM -0700, Andrew Morton wrote:
 > On Thu, 5 Apr 2007 16:50:34 -0400
 > Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > > diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
 > > index f43b987..824d0a2 100644
 > > --- a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
 > > +++ b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
 > > @@ -708,6 +708,7 @@ static int centrino_target (struct cpufreq_policy *policy,
 > >saved_mask = current->cpus_allowed;
 > >first_cpu = 1;
 > >cpus_clear(covered_cpus);
 > > +  preempt_disable();
 > >for_each_cpu_mask(j, online_policy_cpus) {
 > >/*
 > > * Support for SMP systems.
 > > @@ -798,6 +799,7 @@ static int centrino_target (struct cpufreq_policy *policy,
 > >}
 > >  
 > >  migrate_end:
 > > +  preempt_enable();
 > >set_cpus_allowed(current, saved_mask);
 > >return 0;
 > >  }
 > 
 > This means we'll call set_cpus_allowed() while in atomic state, but
 > set_cpus_allowed() does sleepy stuff.

Puzzled. This diff shouldn't change anything about the context we're in
when we call set_cpus_allowed, and as we're not seeing warnings now,
I'm not sure what I'm missing?

[which may be 'the obvious', you wouldn't believe the evening I've had
 involving gas leaks and noxious fumes. Wheee, floaty head.]

Dave

-- 
http://www.codemonkey.org.uk

Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Dmitry Torokhov

On Monday 09 April 2007 18:36, Helge Hafting wrote:
> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > I have a USB touchscreen (egalax variety) that works with
> > > the 2.6.18 kernel supplied by debian.
> > > 
> > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > in question.  Unlike the debian kernel, this kernel doesn't use
> > > modules in order to save boot time.
> > > 
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like 
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > > 
> > > and a lot more. Unlike 2.6.18, it never gets around to say
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that makes the gps work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > > I believe I have turned on every config option for usb touchscreen,
> > > this should not be missing.
> > > 
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> > 
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> > 
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
> 
> Anything else I may have missed?
>

Hmm, I am concerned because not only do you not have an input device created,
you don't even see the driver being registered with usbcore. Could you please
try booting with initcall_debug to see with what error code usbtouchscreen
initialization fails?


-- 
Dmitry

Re: PROBLEM: ACPI Thermal Zone for CPU0 does not update after hibernation

2007-04-09 Thread Walter Francis

> On Mon, 9 Apr 2007 21:47:38 -0400 (EDT) "Walter Francis"
> <[EMAIL PROTECTED]> wrote:
>
>> > On Fri, 6 Apr 2007 16:13:55 -0400 (EDT)
>> > "Walter Francis" <[EMAIL PROTECTED]> wrote:
>>
>> >> After hibernating, the CPU0 thermal zone never updates.  It will stay at
>> >> 59C forever for example.

> Yes, the above looks like the correct change.

> Strange.  Maybe a timing thing.

> Yes it does.  What happens if you stop and restart the daemon a second
> time?

Same thing, although I reverted the change and it still happens, so I must
have overlooked it before.  Which is odd, because it's right at the end of my
suspend script.  Looked to see if the cpuspeed daemon has updated recently and
it hasn't.  Strange. :)

> fwiw, /proc/acpi/thermal_zone/ATF0/temperature seems to do the right thing
> here.

SMP?  Not sure if it's directly tied to SMP, my other laptop I never could get
to behave when suspending.

If there's any more info I can provide or patches to try, let me know.

-- 
Walter Francis

khayts.us
theblackmoor.net
unlimitedphoto.com


Re: [PATCH] intel_agp: fix G965 GTT size detect

2007-04-09 Thread Dave Jones

On Tue, Apr 10, 2007 at 09:42:48AM +0800, Wang Zhenyu wrote:
 > 
 > Dave, 
 > 
 > On G965, I810_PGETBL_CTL is a mmio offset, but we wrongly take it
 > as pci config space offset in detecting GTT size. This one line patch
 > fixes this.

Thanks, applied, and pushed on to Linus for .21

Be careful with trailing whitespace btw, git-applymbox complains loudly
when merging such patches. (I fixed it up by hand this time)

Dave

-- 
http://www.codemonkey.org.uk

Re: I give up

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 09:40:46PM -0400, Dave Dillow wrote:
 
 > > However, it also doesn't explain what the point is of backing up /dev
 > > when it's dynamically created.
 > 
 > It's not /dev he's backing up -- it's /home, /usr, and others. GNU tar
 > saves the device and inode numbers from the {,l}stat() call on each file
 > and decides it is a new file if either number changes from run to run.
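
The comparison described in the quoted paragraph can be sketched in userspace
C. This is only an illustration of the idea, not GNU tar's actual code: the
names seen_id, read_id() and looks_new() are hypothetical, and the only real
interface used is lstat() with its st_dev and st_ino fields.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical record of what a previous backup run saw for one file. */
struct seen_id {
	dev_t dev;	/* device number the file lived on */
	ino_t ino;	/* inode number on that device */
};

/* Fill *id from lstat(); returns 0 on success, -1 on error. */
int read_id(const char *path, struct seen_id *id)
{
	struct stat st;

	if (lstat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* A file counts as "new" if either number differs from the last run. */
bool looks_new(const struct seen_id *prev, const struct seen_id *cur)
{
	return prev->dev != cur->dev || prev->ino != cur->ino;
}
```

Under this model, a device number that changes between boots makes every
file on that filesystem look new, even though no file contents changed.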

Ah apologies, I jumped into the thread halfway, and misunderstood the problem.

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH] CONFIG_PACKET_MMAP should depend on MMU

2007-04-09 Thread Wu, Bryan

On Mon, 2007-04-09 at 16:08 -0400, Robin Getz wrote:
> On Mon 9 Apr 2007 14:43, David Miller pondered:
> > From: David Miller <[EMAIL PROTECTED]>
> > Date: Mon, 09 Apr 2007 09:55:23 -0700 (PDT)
> > >
> > > I will apply this patch.
> >
> > Actually I won't, the other comments in this thread make a lot
> > of sense, we should try to make it build and work just as we
> > do for other similar things on no-MMU.
> 
> Great. Any pointers in the right direction to remove the requirement of 
> vm_insert_page() in the noMMU case?
> 
> Thanks
> -Robin

OK, we will try to find another way to fix this bug, according to Robin's
comments.

Thanks
-Bryan

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 05:23:39PM -0700, Andrew Morton wrote:

 > I suspect there are quite a few kernel threads which don't really need to
 > be threads at all: the code would quite happily work if it was changed to
 > use keventd, via schedule_work() and friends.  But kernel threads are
 > somewhat easier to code for.

Perhaps too easy.  We have a bunch of kthreads sitting around that afaict,
are there 'just in case', not because they're actually in use.

   10 ?S< 0:00 [khelper]

Why doesn't this get spawned when it needs to?

  164 ?S< 0:00 [cqueue/0]
  165 ?S< 0:00 [cqueue/1]

I'm not even sure wth these are.

  166 ?S< 0:00 [ksuspend_usbd]

Just in case I decide to suspend ? Sounds.. handy.
But why not spawn this just after we start a suspend?

  209 ?S< 0:00 [aio/0]
  210 ?S< 0:00 [aio/1]

I'm sure I'd appreciate these a lot more if I had any AIO using apps.

  364 ?S< 0:00 [kpsmoused]

I never did figure out why this was a thread.

  417 ?S< 0:00 [scsi_eh_1]
  418 ?S< 0:00 [scsi_eh_2]
  419 ?S< 5:28 [scsi_eh_3]
  426 ?S< 0:00 [scsi_eh_4]
  427 ?S< 0:00 [scsi_eh_5]
  428 ?S< 0:00 [scsi_eh_6]
  429 ?S< 0:00 [scsi_eh_7]

Just in case my 7-in-1 card reader gets an error.
(Which is unlikely on at least 6 of the slots, as evidenced by the runtime
column. Though now I'm curious as to why the error handler was running so much,
given I've not experienced any errors.)
This must be a fun one on huge diskfarms.

  884 ?S< 0:00 [kgameportd]

Just in case I ever decide to plug something into my soundcard.

 2179 ?S< 0:00 [kmpathd/0]
 2180 ?S< 0:00 [kmpathd/1]
 2189 ?S< 0:00 [kmirrord]

Just loading the modules starts up the threads, regardless
of whether you use them. (Not sure why they're getting loaded,
something for me to look into)

 3246 ?S< 0:00 [krfcommd]

I don't even *have* bluetooth hardware.
(Yes, the module shouldn't have loaded, but that's another battle..)

 > otoh, a lot of these inefficiencies are probably down in scruffy drivers
 > rather than in core or top-level code.

You say scruffy, but most of the proliferation of kthreads comes
from code written in the last few years.  Compare the explosion of kthreads
we see coming from 2.4 to 2.6. It's disturbing, and I don't see it
slowing down at all.

On the 2-way box I grabbed the above ps output from, I end up with 69 kthreads.
It doesn't surprise me at all that bigger iron is starting to see issues.

Dave

-- 
http://www.codemonkey.org.uk

Re: PROBLEM: ACPI Thermal Zone for CPU0 does not update after hibernation

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 21:47:38 -0400 (EDT) "Walter Francis" <[EMAIL PROTECTED]> 
wrote:

> > On Fri, 6 Apr 2007 16:13:55 -0400 (EDT)
> > "Walter Francis" <[EMAIL PROTECTED]> wrote:
> 
> >> After hibernating, the CPU0 thermal zone never updates.  It will stay at
> >> 59C forever for example.
> 
> > Yeah, John spotted a bug in there the other day.
> >
> > Does this fix it?
> >
> > --- a/drivers/acpi/thermal.c~acpi-thermal-fix-mod_timer-interval
> > +++ a/drivers/acpi/thermal.c
> > @@ -758,7 +758,8 @@ static void acpi_thermal_check(void *dat
> > del_timer(&(tz->timer));
> > } else {
> > if (timer_pending(&(tz->timer)))
> > -   mod_timer(&(tz->timer), (HZ * sleep_time) / 1000);
> > +   mod_timer(&(tz->timer),
> > +   jiffies + (HZ * sleep_time) / 1000);
> > else {
> > tz->timer.data = (unsigned long)tz;
> > tz->timer.function = acpi_thermal_run;
> 
> No joy, didn't help.
>
> Didn't apply clean to 21-pre6 (still latest I see on kernel.org), but here's
> the section that seemed right.
> 
> /*
>  * Schedule Next Poll
>  * --
>  */
> if (!sleep_time) {
> if (timer_pending(&(tz->timer)))
> del_timer(&(tz->timer));
> } else {
> if (timer_pending(&(tz->timer)))
> mod_timer(&(tz->timer),jiffies + (HZ * sleep_time) / 1000);
> else {
> tz->timer.data = (unsigned long)tz;
> tz->timer.function = acpi_thermal_run;
> tz->timer.expires = jiffies + (HZ * sleep_time) / 1000;
> add_timer(&(tz->timer));

Yes, the above looks like the correct change.
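
The arithmetic behind the change can be modelled in userspace C. mod_timer()
takes an absolute expiry in jiffies, so passing only the interval gives a
value that appears to lie in the past once jiffies is well beyond it. This is
a sketch under assumptions: SKETCH_HZ, expiry_buggy(), expiry_fixed() and
tick_after() are illustrative names (tick_after() is modelled on the kernel's
time_after()), and the tick values used with them are made up.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's HZ; 250 is just an example. */
#define SKETCH_HZ 250UL

/* Buggy form: a relative interval where an absolute expiry is expected. */
unsigned long expiry_buggy(unsigned long sleep_ms)
{
	return (SKETCH_HZ * sleep_ms) / 1000;
}

/* Fixed form: absolute expiry = current tick count + interval. */
unsigned long expiry_fixed(unsigned long now, unsigned long sleep_ms)
{
	return now + (SKETCH_HZ * sleep_ms) / 1000;
}

/* Wrap-safe "a is after b", modelled on the kernel's time_after(). */
bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}
```

With a tick count of 1000000 and a 10000 ms poll interval, the fixed form
yields 1002500, which is after the current tick, while the buggy form yields
2500, which is not.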

> Also, I think this is new with the line replaced..  Not sure exactly what's
> causing it, but when I restart the cpuspeed daemon:
> 
> Starting cpuspeed: Error: Not an integer:
> /proc/acpi/thermal_zone/TZS1/temperature

Strange.  Maybe a timing thing.

> # cat /proc/acpi/thermal_zone/TZS1/temperature
> temperature: 41 C
> 
> Looks normal to me?

Yes it does.  What happens if you stop and restart the daemon a second
time?

fwiw, /proc/acpi/thermal_zone/ATF0/temperature seems to do the right thing
here.
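
For reference, the temperature file shown above holds a labelled value
("temperature: 41 C") rather than a bare integer. How cpuspeed actually reads
it is not shown in this thread, so the following is only a sketch of one way
to pull the number out of such a line; parse_temperature() is a hypothetical
helper, not part of cpuspeed or the kernel.

```c
#include <stdio.h>

/*
 * Parse a line such as "temperature:             41 C".
 * Returns 0 and stores the value on success, -1 if the line does not
 * have the expected label followed by an integer.
 */
int parse_temperature(const char *line, int *celsius)
{
	return sscanf(line, "temperature: %d C", celsius) == 1 ? 0 : -1;
}
```

A reader that expected only digits would reject this line, and so would one
that hit an empty read, which would be consistent with a timing problem.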


Re: PROBLEM: ACPI Thermal Zone for CPU0 does not update after hibernation

2007-04-09 Thread Walter Francis

> On Fri, 6 Apr 2007 16:13:55 -0400 (EDT)
> "Walter Francis" <[EMAIL PROTECTED]> wrote:

>> After hibernating, the CPU0 thermal zone never updates.  It will stay at 59C
>> forever for example.

> Yeah, John spotted a bug in there the other day.
>
> Does this fix it?
>
> --- a/drivers/acpi/thermal.c~acpi-thermal-fix-mod_timer-interval
> +++ a/drivers/acpi/thermal.c
> @@ -758,7 +758,8 @@ static void acpi_thermal_check(void *dat
>   del_timer(&(tz->timer));
>   } else {
>   if (timer_pending(&(tz->timer)))
> - mod_timer(&(tz->timer), (HZ * sleep_time) / 1000);
> + mod_timer(&(tz->timer),
> + jiffies + (HZ * sleep_time) / 1000);
>   else {
>   tz->timer.data = (unsigned long)tz;
>   tz->timer.function = acpi_thermal_run;

No joy, didn't help.

Didn't apply clean to 21-pre6 (still latest I see on kernel.org), but here's
the section that seemed right.

/*
 * Schedule Next Poll
 * --
 */
if (!sleep_time) {
if (timer_pending(&(tz->timer)))
del_timer(&(tz->timer));
} else {
if (timer_pending(&(tz->timer)))
mod_timer(&(tz->timer),jiffies + (HZ * sleep_time) / 1000);
else {
tz->timer.data = (unsigned long)tz;
tz->timer.function = acpi_thermal_run;
tz->timer.expires = jiffies + (HZ * sleep_time) / 1000;
add_timer(&(tz->timer));

Also, I think this is new with the line replaced..  Not sure exactly what's
causing it, but when I restart the cpuspeed daemon:

Starting cpuspeed: Error: Not an integer:
/proc/acpi/thermal_zone/TZS1/temperature

# cat /proc/acpi/thermal_zone/TZS1/temperature
temperature: 41 C

Looks normal to me?

Thanks!

-- 
Walter Francis

khayts.us
theblackmoor.net
unlimitedphoto.com


[PATCH] intel_agp: fix G965 GTT size detect

2007-04-09 Thread Wang Zhenyu


Dave, 

On G965, I810_PGETBL_CTL is a mmio offset, but we wrongly take it
as pci config space offset in detecting GTT size. This one line patch
fixes this.

Signed-off-by: Wang Zhenyu <[EMAIL PROTECTED]>

---
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index a9fdbf9..4c05c71 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -431,9 +431,8 @@ static void intel_i830_init_gtt_entries(
 
if (IS_I965) {
u32 pgetbl_ctl;
-
-   pci_read_config_dword(agp_bridge->dev, I810_PGETBL_CTL,
- &pgetbl_ctl);
+   pgetbl_ctl = readl(intel_i830_private.registers+I810_PGETBL_CTL);
+   
/* The 965 has a field telling us the size of the GTT,
 * which may be larger than what is necessary to map the
 * aperture.
---

RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-09 Thread Waskiewicz Jr, Peter P

> This indeed looks a lot better than the first patch. I'm too 
> tired to fully review this now, but could you please post the 
> corresponding e1000 patch? From a quick look I'm guessing 
> that this patch changes the behaviour of the prio qdisc from 
> strict priority to whatever scheduling mechanism e1000 uses 
> for its queues when the multiqueue config option is enabled, 
> which might surprise people.
> 

Thanks Pat for the initial feedback.  I can post a set of patches to
e1000 using the new API; I'll try to get them out asap (need to apply to
this kernel tree).  However, the PRIO qdisc still uses the priority in
the bands for dequeueing priority, and will feed the queues on the NIC.
The e1000, and any other multiqueue NIC, will schedule Tx based on how
the PRIO qdisc feeds the queues.  So the only priority here is the
dequeuing priority from the kernel.  The e1000 will use the new API for
starting/stopping the individual queues based on the descriptors
available, much like it does today for the global queue.

-PJ Waskiewicz

Re: I give up

2007-04-09 Thread Dave Dillow

On Mon, 2007-04-09 at 21:27 -0400, Dave Jones wrote:
> On Mon, Apr 09, 2007 at 06:22:10PM -0400, Dave Dillow wrote:
>  > On Mon, 2007-04-09 at 23:35 +0200, Jan Engelhardt wrote:
>  > > On Apr 9 2007 15:38, Gene Heskett wrote:
>  > > >On Monday 09 April 2007, H. Peter Anvin wrote:
>  > > >>Jan Engelhardt wrote:
>  > > >>> dm is on 254 for me.. in opensuse with a 2.6.20 that is. I wonder why
>  > > >>> it even moves around. However, even then, those who use udev and
>  > > >>> device names rather than (major,minor) tuples should not have any
>  > > >>> problem.
>  > > >>
>  > > >>It moves around because someone at some point thought it was a great
>  > > >>idea to assign dynamic majors to core functionality.
>  > > >>
>  > > >What were they smoking, I want some of that!
>  > > 
>  > > Do you actually use udev?
>  > 
>  > udev doesn't help the problem he is having (and he is using it, since he
>  > is using Fedora).
> 
> However, it also doesn't explain what the point is of backing up /dev
> when it's dynamically created.

It's not /dev he's backing up -- it's /home, /usr, and others. GNU tar
saves the device and inode numbers from the {,l}stat() call on each file
and decides it is a new file if either number changes from run to run.

You are correct about /dev -- who'd back it up if it is dynamically
created? But that is also not the problem Gene is having.

Re: [BUG] scheduler: first timeslice of the exiting thread

2007-04-09 Thread Satoru Takeuchi

> >  b) Don't add an extra field, and make the thread's parent its creator,
> > which is the same as process creation. However, it has many side effects;
> > for example, we would also need to change the sys_getppid() implementation.
> 
> can't understand this, sorry.

Sorry for my obscure English, perhaps I need more training about writing.

What I meant is ...

If thread A creates thread B and A's parent is C, then currently B's parent
is also C. In my idea (b), copy_process() doesn't change B's parent based on
the CLONE_THREAD flag, so B's parent is A. In this case there is no need to
change sched_exit() or to add an extra field. However, we would need to
change other code, such as sys_getppid(), forget_original_parent(), and so on.

This is the same as past Linux kernel behavior. I don't know the details of
when and why this code was changed (and this problem was overlooked then).
At first I thought I had come up with a good idea, but now I'm rocking back
on my heels because it seems to need many code changes.

Satoru

Re: I give up

2007-04-09 Thread Dave Jones

On Mon, Apr 09, 2007 at 06:22:10PM -0400, Dave Dillow wrote:
 > On Mon, 2007-04-09 at 23:35 +0200, Jan Engelhardt wrote:
 > > On Apr 9 2007 15:38, Gene Heskett wrote:
 > > >On Monday 09 April 2007, H. Peter Anvin wrote:
 > > >>Jan Engelhardt wrote:
 > > >>> dm is on 254 for me.. in opensuse with a 2.6.20 that is. I wonder why
 > > >>> it even moves around. However, even then, those who use udev and
 > > >>> device names rather than (major,minor) tuples should not have any
 > > >>> problem.
 > > >>
 > > >>It moves around because someone at some point thought it was a great
 > > >>idea to assign dynamic majors to core functionality.
 > > >>
 > > >What were they smoking, I want some of that!
 > > 
 > > Do you actually use udev?
 > 
 > udev doesn't help the problem he is having (and he is using it, since he
 > is using Fedora).

However, it also doesn't explain what the point is of backing up /dev
when it's dynamically created.

Dave

-- 
http://www.codemonkey.org.uk

Re: per-thread rusage

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 18:12:57 -0700
William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> On Mon, 9 Apr 2007 17:42:01 -0700 William Lee Irwin III <[EMAIL PROTECTED]> 
> wrote:
> >>  My use for it is report generation in VM (and possibly other)
> >> testcases.
> 
> On Mon, Apr 09, 2007 at 05:53:52PM -0700, Andrew Morton wrote:
> > OK.  The cool kids are using taskstats for this sort of thing now, but I
> > note that taskstats is inexplicably missing the context-switch accounting,
> > and perhaps other things?
> 
> Sounds interesting. I'll poke around there for testcase affairs if I
> get moving on them first. I've no sentimental attachment to the rusage
> patch, so if taskstats do happen to displace this, I'm not concerned.
> (That said, it may still make sense to do this for the purposes of API
> compatibility. I'll keep it moving along until it's all decided.)
> 

rusage() is a bit easier to use, as it delivers synchronously at task exit.
 taskstats delivers over netlink into a separate process (and can be polled
at any time during task execution) but does require new skills, more
(Linux-specific) code and perhaps more complex synchronisation.


Re: init's children list is long and slows reaping children.

2007-04-09 Thread Andrew Morton

On Mon, 09 Apr 2007 18:48:54 -0600
[EMAIL PROTECTED] (Eric W. Biederman) wrote:

> Andrew Morton <[EMAIL PROTECTED]> writes:
> 
> > I suspect there are quite a few kernel threads which don't really need to
> > be threads at all: the code would quite happily work if it was changed to
> > use keventd, via schedule_work() and friends.  But kernel threads are
> > somewhat easier to code for.
> >
> > I also suspect that there are a number of workqueue threads which
> > could/should have used create_singlethread_workqueue().  Often this is
> > because the developer just didn't think to do it.
> >
> > otoh, a lot of these inefficiencies are probably down in scruffy drivers
> > rather than in core or top-level code.
> >
> >  > presumably-not-using-kthread kernel threads are coming from>
> 
> 
> From another piece of this thread.

Yeah, sorry.  Without mentioning any names, [EMAIL PROTECTED] broke the
threading so I came in halfway.

> > > Robin how many kernel thread per cpu are you seeing?
> > 
> > 10.
> > 
> > FYI, pid 1539 is kthread.
> > 
> > a01:~ # ps -ef | egrep "\[.*\/255\]" 
> > root       512     1  0 Apr08 ?        00:00:00 [migration/255]
> > root       513     1  0 Apr08 ?        00:00:00 [ksoftirqd/255]
> > root      1281     1  0 Apr08 ?        00:00:02 [events/255]
> > root      2435  1539  0 Apr08 ?        00:00:00 [kblockd/255]
> > root      3159  1539  0 Apr08 ?        00:00:00 [aio/255]
> > root      4007  1539  0 Apr08 ?        00:00:00 [cqueue/255]
> > root      8653  1539  0 Apr08 ?        00:00:00 [ata/255]
> > root     17438  1539  0 Apr08 ?        00:00:00 [xfslogd/255]
> > root     17950  1539  0 Apr08 ?        00:00:00 [xfsdatad/255]
> > root     18426  1539  0 Apr08 ?        00:00:00 [rpciod/255]
> 
> 
> So it looks like there were about 1500 kernel threads that started up before
> kthread started.
> 
> So the reason the kernel threads appear to have init as their parent is
> that, for the most part, they started before kthread.
> 
> At 10 kernel threads per cpu there may be a little bloat but it isn't
> out of control.  It is mostly that we are observing the kernel as
> NR_CPUS approaches infinity.  4096 isn't infinity yet but it's easily
> a 1000-fold bigger than most people are used to :)

Yes, I expect we could run init_workqueues() much earlier, from process 0
rather than from process 1.  Something _might_ blow up as it often does when
we change startup ordering, but in this case it would be somewhat surprising.


Re: per-thread rusage

2007-04-09 Thread William Lee Irwin III

On Mon, 9 Apr 2007 17:42:01 -0700 William Lee Irwin III <[EMAIL PROTECTED]> 
wrote:
>>  My use for it is report generation in VM (and possibly other)
>> testcases.

On Mon, Apr 09, 2007 at 05:53:52PM -0700, Andrew Morton wrote:
> OK.  The cool kids are using taskstats for this sort of thing now, but I
> note that taskstats is inexplicably missing the context-switch accounting,
> and perhaps other things?

Sounds interesting. I'll poke around there for testcase affairs if I
get moving on them first. I've no sentimental attachment to the rusage
patch, so if taskstats do happen to displace this, I'm not concerned.
(That said, it may still make sense to do this for the purposes of API
compatibility. I'll keep it moving along until it's all decided.)

-- wli

Re: 2.6.21-rc6-mm1

2007-04-09 Thread William Lee Irwin III

On Mon, Apr 09, 2007 at 05:50:54PM -0700, Nishanth Aravamudan wrote:
>  static int hugetlbfs_set_page_dirty(struct page *page)
>  {
> - struct page *head = (struct page *)page_private(page);
> + struct page *head = compound_head(page);

Thanks for cleaning this up.


-- wli

Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Andrew Morton

On Tue, 10 Apr 2007 00:36:43 +0200
Helge Hafting <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > > I have a USB touchscreen (egalax variety) that works with
> > > > the 2.6.18 kernel supplied by debian.
> > > > 
> > > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > > in question.  Unlike the debian kernel, this kernel doesn't use
> > > > modules in order to save boot time.
> > > 
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like 
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > > 
> > > and a lot more. Unlike 2.6.18, it never gets around to say
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that makes the gps work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > > I believe I have turned on every config option for usb touchscreen,
> > > this should not be missing.
> > > 
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> > 
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> > 
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
> 
> Anything else I may have missed?
> 

Is 2.6.21-rc6 OK?

If so, please keep a close eye on 2.6.22-rcX, let us know if/when we've
moved this breakage into mainline :(

Re: 2.6.21-rc6-mm1

2007-04-09 Thread Nishanth Aravamudan

On 08.04.2007 [14:35:59 -0700], Andrew Morton wrote:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/

Get this Oops:

Unable to handle kernel NULL pointer dereference at  RIP: 
 [] hugetlbfs_set_page_dirty+0x4/0xc
PGD 414e067 PUD 4198067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: devices/system/node/node0/cpumap
CPU 1 
Modules linked in: ipv6 hidp rfcomm l2cap bluetooth sunrpc video button battery 
asus_acpi ac lp parport_pc parport nvram amd_rng rng_core i2c_amd756 i2c_core
Pid: 6053, comm: readback Not tainted 2.6.21-rc6-mm1-autokern1 #1
RIP: 0010:[]  [] 
hugetlbfs_set_page_dirty+0x4/0xc
RSP: 0018:810004145d90  EFLAGS: 00010282
RAX:  RBX: 81003f1ad000 RCX: 003f
RDX: 810004771dc0 RSI: 810004145db0 RDI: 81003f1ad000
RBP: 87800040 R08: 01258020 R09: 81000160ad84
R10: 0282 R11: 802f931c R12: 8100035db7c0
R13: 810003675c38 R14: 2ae0 R15: 810001022820
FS:  2ac8d0bd6590() GS:81000160acc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2:  CR3: 047b7000 CR4: 06e0
Process readback (pid: 6053, threadinfo 810004144000, task 81000177b140)
Stack:  80283f95 810004145d98 810004145d98 8100
 2ac0 810003675c38 2ae0 2ac0
 8100047b68b8 0036d5f18000 80284060 81003fc066c0
Call Trace:
 [] __unmap_hugepage_range+0xcf/0x163
 [] unmap_hugepage_range+0x37/0x57
 [] unmap_vmas+0xf6/0x744
 [] exit_mmap+0x78/0xed
 [] mmput+0x45/0xb7
 [] do_exit+0x23d/0x811
 [] sys_exit_group+0x0/0xe
 [] system_call+0x7e/0x83


Code: f0 0f ba 28 04 31 c0 c3 48 89 c8 48 c7 c1 5f 9b 2f 80 48 89 
RIP  [] hugetlbfs_set_page_dirty+0x4/0xc
 RSP 
CR2: 
Fixing recursive fault but reboot is needed!



Steve Fox narrowed it down to between
mm-clean-up-and-kernelify-shrinker-registration.patch (good) and
file-capabilities-accomodate-future-64-bit-caps.patch (bad). Without
testing yet, I'm betting it is:

> +make-page-pprivate-usable-in-compound-pages-v1.patch

I am not sure if there are other users of page_private() that were
missed that are also compound pages, but probably the attached will fix
this case?

Thanks,
Nish

Christoph Lameter's rework of the use of private member of struct page
missed the hugetlbfs dirtying function.

Signed-off-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

---
Only compile-tested so far (on x86_64).

diff -urpN 2.6.21-rc6-mm1/fs/hugetlbfs/inode.c 2.6.21-rc6-mm1-dev/fs/hugetlbfs/inode.c
--- 2.6.21-rc6-mm1/fs/hugetlbfs/inode.c 2007-04-09 17:17:16.0 -0700
+++ 2.6.21-rc6-mm1-dev/fs/hugetlbfs/inode.c 2007-04-09 17:42:41.0 -0700
@@ -450,7 +450,7 @@ static int hugetlbfs_symlink(struct inod
  */
 static int hugetlbfs_set_page_dirty(struct page *page)
 {
-   struct page *head = (struct page *)page_private(page);
+   struct page *head = compound_head(page);
 
SetPageDirty(head);
return 0;

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center
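
A toy userspace model of why the one-line change above helps, under my
reading of the rework: tail pages of a compound page point back at the
head page, while the private field is left free for other users, so it can
no longer be relied on to hold the head pointer. Everything named toy_*
here is invented for illustration and is not the kernel's struct page
layout or API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; field names are assumptions. */
struct toy_page {
    int is_tail;                  /* set on every page except the head */
    struct toy_page *first_page;  /* valid only when is_tail is set */
    unsigned long private_data;   /* free for other uses, may be 0 */
    int dirty;
};

/* Link n contiguous pages into one compound page with pages[0] as head. */
static void toy_prep_compound(struct toy_page *pages, size_t n)
{
    size_t i;

    assert(pages != NULL && n > 0);
    for (i = 0; i < n; i++) {
        pages[i].is_tail = (i != 0);
        pages[i].first_page = (i != 0) ? &pages[0] : NULL;
        pages[i].private_data = 0;
        pages[i].dirty = 0;
    }
}

/* Resolve any page of the compound page to its head. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->is_tail ? page->first_page : page;
}

/* Model of the fixed dirtying function: always mark the head page. */
static int toy_set_page_dirty(struct toy_page *page)
{
    toy_compound_head(page)->dirty = 1;
    return 0;
}
```

In this model, casting private_data to a page pointer would give NULL for
every page, which matches the flavour of the oops above; going through the
head lookup works for head and tail pages alike.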

Re: 2.6.21-rc6-mm1

2007-04-09 Thread Christoph Lameter

On Mon, 9 Apr 2007, Nishanth Aravamudan wrote:

> I am not sure if there are other users of page_private() that were
> missed that are also compound pages, but probably the attached will fix
> this case?

Correct. 

Acked-by: Christoph Lameter <[EMAIL PROTECTED]>

Who is off to look for more of these.

Re: [QUICKLIST 1/4] Quicklists for page table pages V5

2007-04-09 Thread Christoph Lameter

On Mon, 9 Apr 2007, William Lee Irwin III wrote:

> Basically, I'll help all this along.

Thank you.


Re: per-thread rusage

2007-04-09 Thread Andrew Morton

On Mon, 9 Apr 2007 17:42:01 -0700
William Lee Irwin III <[EMAIL PROTECTED]> wrote:

>  My use for it is report generation in VM (and possibly other)
> testcases.

OK.  The cool kids are using taskstats for this sort of thing now, but I
note that taskstats is inexplicably missing the context-switch accounting,
and perhaps other things?

> The ack-in-concept is good enough for me to go about
> sweeping up the OS/standards compatibility, testing, and documentation
> issues in the near future prior to resubmission.

Thanks.

Re: init's children list is long and slows reaping children.

2007-04-09 Thread Eric W. Biederman

Andrew Morton <[EMAIL PROTECTED]> writes:

> I suspect there are quite a few kernel threads which don't really need to
> be threads at all: the code would quite happily work if it was changed to
> use keventd, via schedule_work() and friends.  But kernel threads are
> somewhat easier to code for.
>
> I also suspect that there are a number of workqueue threads which
> could/should have used create_singlethread_workqueue().  Often this is
> because the developer just didn't think to do it.
>
> otoh, a lot of these inefficiencies are probably down in scruffy drivers
> rather than in core or top-level code.
>
>  presumably-not-using-kthread kernel threads are coming from>


From another piece of this thread.

> > Robin how many kernel thread per cpu are you seeing?
> 
> 10.
> 
> FYI, pid 1539 is kthread.
> 
> a01:~ # ps -ef | egrep "\[.*\/255\]" 
> root       512     1  0 Apr08 ?        00:00:00 [migration/255]
> root       513     1  0 Apr08 ?        00:00:00 [ksoftirqd/255]
> root      1281     1  0 Apr08 ?        00:00:02 [events/255]
> root      2435  1539  0 Apr08 ?        00:00:00 [kblockd/255]
> root      3159  1539  0 Apr08 ?        00:00:00 [aio/255]
> root      4007  1539  0 Apr08 ?        00:00:00 [cqueue/255]
> root      8653  1539  0 Apr08 ?        00:00:00 [ata/255]
> root     17438  1539  0 Apr08 ?        00:00:00 [xfslogd/255]
> root     17950  1539  0 Apr08 ?        00:00:00 [xfsdatad/255]
> root     18426  1539  0 Apr08 ?        00:00:00 [rpciod/255]


So it looks like there were about 1500 kernel threads that started up before
kthread started.

So the reason the kernel threads appear to have init as their parent is
that, for the most part, they started before kthread.

At 10 kernel threads per cpu there may be a little bloat but it isn't
out of control.  It is mostly that we are observing the kernel as
NR_CPUS approaches infinity.  4096 isn't infinity yet but it's easily
a 1000-fold bigger than most people are used to :)

Eric

Re: PROBLEM: ACPI Thermal Zone for CPU0 does not update after hibernation

2007-04-09 Thread Andrew Morton

On Fri, 6 Apr 2007 16:13:55 -0400 (EDT)
"Walter Francis" <[EMAIL PROTECTED]> wrote:

> After hibernating, the CPU0 thermal zone never updates.  It will stay at 59C
> forever for example.
> 
> I've tried making the thermal driver a module and unloading it before
> hibernating and it didn't help, also went back as far as 2.6.19 and saw the
> same behavior there.  Currently using 2.6.21-pre6.  If I reboot or suspend to
> *RAM*, the problem fixes itself.  And CPU1's thermal zone is fine.  But CPU0
> if it's (example) 59C, it stays 59C forever.  I'm seeing it in gkrellm, but
> it's coming from /proc/acpi/thermal_zone/*/temperature and verified to match
> there.

Yeah, John spotted a bug in there the other day.

Does this fix it?

--- a/drivers/acpi/thermal.c~acpi-thermal-fix-mod_timer-interval
+++ a/drivers/acpi/thermal.c
@@ -758,7 +758,8 @@ static void acpi_thermal_check(void *dat
del_timer(&(tz->timer));
} else {
if (timer_pending(&(tz->timer)))
-   mod_timer(&(tz->timer), (HZ * sleep_time) / 1000);
+   mod_timer(&(tz->timer),
+   jiffies + (HZ * sleep_time) / 1000);
else {
tz->timer.data = (unsigned long)tz;
tz->timer.function = acpi_thermal_run;
_
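
To spell out why the one-liner matters: mod_timer() takes an absolute
expiry in jiffies, not an interval. The userspace model below is not kernel
code; tick_after(), ms_to_ticks(), expiry_relative_bug() and
expiry_absolute() are invented names, with tick_after() mimicking the
wraparound-safe comparison the kernel's time_after() performs. It shows
that the old expression produces a value unrelated to the current time,
while the patched one lands sleep_time milliseconds ahead of it.

```c
#include <assert.h>

/* Nonzero if tick value a is later than b, tolerating counter wraparound. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Convert a millisecond interval to ticks at hz ticks per second. */
static unsigned long ms_to_ticks(unsigned long hz, unsigned long ms)
{
    assert(hz > 0);
    return (hz * ms) / 1000;
}

/* Old behaviour: the interval itself is passed as the expiry time. */
static unsigned long expiry_relative_bug(unsigned long now, unsigned long hz,
                                         unsigned long ms)
{
    (void)now;  /* the bug: the current time is ignored entirely */
    return ms_to_ticks(hz, ms);
}

/* Patched behaviour: expiry is the current time plus the interval. */
static unsigned long expiry_absolute(unsigned long now, unsigned long hz,
                                     unsigned long ms)
{
    return now + ms_to_ticks(hz, ms);
}
```

With now = 1000000, hz = 1000 and a 10000 ms interval, the buggy form
yields 10000, which is not after now, while the fixed form yields 1010000.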


Re: per-thread rusage

2007-04-09 Thread William Lee Irwin III

On Wed, 4 Apr 2007 11:10:50 -0700 William Lee Irwin III <[EMAIL PROTECTED]> 
wrote:
>> Respun vs. 2.6.21-rc5-mm4, still untested. Also...
>> Signed-off-by: William Irwin <[EMAIL PROTECTED]>
[...]

On Mon, Apr 09, 2007 at 04:53:15PM -0700, Andrew Morton wrote:
> Seems sane.  Could we please get it tested and get a full description in
> place?  Something which provides enough detail for the manpage maintainers.
> Also, a quick comparison between Linux's RUSAGE_THREAD and $other-os's
> implementations would reduce the possibility of silly, cast-in-stone
> incompatibilities.

The latter is the more serious of the two. I'll go about investigating
that as the primary task here. Testing and a more verbose patch
description are clearly very little work.

General maintenance-relevant commentary: This patch arose from an
observation of a lacuna in the API. There are no bugs or apps broken
awaiting this as a fix, so it's not needed by 2.6.22 or otherwise
urgently. My use for it is report generation in VM (and possibly other)
testcases. The ack-in-concept is good enough for me to go about
sweeping up the OS/standards compatibility, testing, and documentation
issues in the near future prior to resubmission.

-- wli

Re: [RFC, PATCH 1/3] introduce SYS_CLONE_MASK

2007-04-09 Thread Eric W. Biederman

Robin Holt <[EMAIL PROTECTED]> writes:

> OK.  I just got the OK from management.  The system we were booting was
> for research only.  We had NR_CPUS=num_online_cpus()=4096 which were
> non-hyperthreaded.  With no attached I/O and the tweak I originally
> posted plus one change Jack has already gotten accepted, the machine
> booted in approx 12 minutes.

How much of that time was between the time the kernel was loaded
and before user space was started?

Twelve minutes sounds like a long time for a boot, if you aren't fsck'ing
filesystems.

Eric

Re: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-09 Thread Patrick McHardy

Peter P Waskiewicz Jr wrote:
> From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
> 
> Update: Fixed a typecast in free_netdev() for the egress_subqueue list.
> 
> Added an API and associated supporting routines for multiqueue network devices.
> This allows network devices supporting multiple TX queues to configure each
> queue within the netdevice and manage each queue independently.  Changes to the
> PRIO Qdisc also allow a user to map multiple flows to individual TX queues,
> taking advantage of each queue on the device.

This indeed looks a lot better than the first patch. I'm too tired to
fully review this now, but could you please post the corresponding e1000
patch? From a quick look I'm guessing that this patch changes the
behaviour of the prio qdisc from strict priority to whatever scheduling
mechanism e1000 uses for its queues when the multiqueue config option
is enabled, which might surprise people.


Re: [PATCH 4/10] I386 pgd clone under lock fix.patch

2007-04-09 Thread William Lee Irwin III

On Mon, Apr 09, 2007 at 05:06:11PM -0700, Zachary Amsden wrote:
> Copying of the pgd range must happen under the pgd_lock.  This got broken by
> the paravirt changes in the -mm tree.  Badness can result if you copy the pgd
> before being added to the list when splitting or rejoining large pages.
> Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

Sorry my review missed this.

Acked-by: William Irwin <[EMAIL PROTECTED]>


-- wli

Re: [QUICKLIST 1/4] Quicklists for page table pages V5

2007-04-09 Thread William Lee Irwin III

On Mon, 9 Apr 2007, Andrew Morton wrote:
>> So... we skipped i386 this time?
>> I'd have gone squeamish if it was included, due to the mystery crash when
>> we (effectively) set the list size to zero.  Someone(tm) should look into 
>> that - who knows, it might indicate a problem in generic code.

On Mon, Apr 09, 2007 at 03:03:19PM -0700, Christoph Lameter wrote:
> Yeah too many scary monsters in the i386 arch code. Maybe Bill Irwin can 
> take a look at how to make this work? He liked the benchmarking code that 
> I posted so he may have the tools to ensure that it works right. Maybe he 
> can figure out some additional tricks on how to make quicklists work 
> better?

There shouldn't be anything all that interesting in the i386 code apart
from accommodations made for slab.c and pageattr.c. But yes, I can do
the grunt work there since I'm familiar enough with its history.

I used the i386 pagetable caching backout code to help verify that
nothing unusual was going on with generic code in this area. I can
debug the altered quicklist code in like fashion to what that was.

Basically, I'll help all this along.

-- wli

Re: Preemption Broken: centrino_target busted under SMP on 2.6.20.4

2007-04-09 Thread Andrew Morton

On Thu, 5 Apr 2007 16:50:34 -0400
Dave Jones <[EMAIL PROTECTED]> wrote:

> On Thu, Apr 05, 2007 at 02:33:03PM -0600, Jeff V. Merkey wrote:
>  > 
>  > BUG: using smp_processor_id() in preemptible [0001] code: kondemand/0/2473
>  > caller is centrino_target+0xfb/0x600
>  > [<401e3646>] debug_smp_processor_id+0x9e/0xb0
>  > [<40112afb>] centrino_target+0xfb/0x600
>  > [<40112a00>] centrino_target+0x0/0x600
>  > [<40305bd9>] __cpufreq_driver_target+0x5c/0x6b
>  > [] do_dbs_timer+0x1bc/0x208 [cpufreq_ondemand]
>  > [<40134a46>] run_workqueue+0x85/0x125
>  > [<40374f7f>] _spin_lock_irqsave+0x18/0x66
>  > [] do_dbs_timer+0x0/0x208 [cpufreq_ondemand]
>  > [<401353fb>] worker_thread+0xf9/0x124
>  > [<401213b9>] default_wake_function+0x0/0xc
>  > [<40135302>] worker_thread+0x0/0x124
>  > [<40137b37>] kthread+0xb0/0xd9
>  > [<40137a87>] kthread+0x0/0xd9
>  > [<40104b2f>] kernel_thread_helper+0x7/0x10
> 
> Given speedstep-centrino is obsoleted in favour of acpi-cpufreq,
> I'm reluctant to spend too much time turd-polishing.
> This big-hammer diff should fix this..
> 
>   Dave
> 
> diff --git a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> index f43b987..824d0a2 100644
> --- a/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> +++ b/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
> @@ -708,6 +708,7 @@ static int centrino_target (struct cpufreq_policy *policy,
>   saved_mask = current->cpus_allowed;
>   first_cpu = 1;
>   cpus_clear(covered_cpus);
> + preempt_disable();
>   for_each_cpu_mask(j, online_policy_cpus) {
>   /*
>* Support for SMP systems.
> @@ -798,6 +799,7 @@ static int centrino_target (struct cpufreq_policy *policy,
>   }
>  
>  migrate_end:
> + preempt_enable();
>   set_cpus_allowed(current, saved_mask);
>   return 0;
>  }

This means we'll call set_cpus_allowed() while in atomic state, but
set_cpus_allowed() does sleepy stuff.



Re: init's children list is long and slows reaping children.

2007-04-09 Thread Andrew Morton

On Fri, 06 Apr 2007 18:38:40 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Robin Holt wrote:
> > We have been testing a new larger configuration and we are seeing a very
> > large scan time of init's tsk->children list.  In the cases we are seeing,
> > there are numerous kernel processes created for each cpu (ie: events/0
> > ... events/, xfslogd/0 ... xfslogd/).  These are
> > all on the list ahead of the processes we are currently trying to reap.
> 
> What about attacking the explosion of kernel threads?
> 
> As CPU counts increase, the number of per-CPU kernel threads gets really 
> ridiculous.
> 
> I would rather change the implementation under the hood to start per-CPU 
> threads on demand, similar to a thread-pool implementation.
> 
> Boxes with $BigNum CPUs probably won't ever use half of those threads.

I suspect there are quite a few kernel threads which don't really need to
be threads at all: the code would quite happily work if it was changed to
use keventd, via schedule_work() and friends.  But kernel threads are
somewhat easier to code for.

I also suspect that there are a number of workqueue threads which
could/should have used create_singlethread_workqueue().  Often this is
because the developer just didn't think to do it.

otoh, a lot of these inefficiencies are probably down in scruffy drivers
rather than in core or top-level code.


[PATCH] NET: [UPDATED] Multiqueue network device support implementation.

2007-04-09 Thread Peter P Waskiewicz Jr

From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Update: Fixed a typecast in free_netdev() for the egress_subqueue list.

Added an API and associated supporting routines for multiqueue network devices.
This allows network devices supporting multiple TX queues to configure each
queue within the netdevice and manage each queue independently.  Changes to the
PRIO Qdisc also allow a user to map multiple flows to individual TX queues,
taking advantage of each queue on the device.

Signed-off-by: Peter P. Waskiewicz Jr <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---

 include/linux/etherdevice.h |3 +-
 include/linux/netdevice.h   |   66 ++-
 include/linux/skbuff.h  |2 +
 net/core/dev.c  |   25 +---
 net/core/skbuff.c   |3 ++
 net/ethernet/eth.c  |9 +++---
 net/sched/sch_generic.c |4 +--
 net/sched/sch_prio.c|   55 +++-
 8 files changed, 146 insertions(+), 21 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 745c988..446de39 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -39,7 +39,8 @@ extern void   eth_header_cache_update(struct hh_cache 
*hh, struct net_device *dev
 extern int eth_header_cache(struct neighbour *neigh,
 struct hh_cache *hh);
 
-extern struct net_device *alloc_etherdev(int sizeof_priv);
+extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count);
+#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
 static inline void eth_copy_and_sum (struct sk_buff *dest, 
 const unsigned char *src, 
 int len, int base)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 71fc8ff..db06169 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -106,6 +106,14 @@ struct netpoll_info;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+struct net_device_subqueue
+{
+   /* Give a control state for each queue.  This struct may contain
+* per-queue locks in the future.
+*/
+   unsigned long   state;
+};
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -324,6 +332,7 @@ struct net_device
 #define NETIF_F_GSO2048/* Enable software GSO. */
 #define NETIF_F_LLTX   4096/* LockLess TX */
 #define NETIF_F_INTERNAL_STATS 8192/* Use stats structure in net_device */
+#define NETIF_F_MULTI_QUEUE16384   /* Has multiple TX/RX queues */
 
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
@@ -534,6 +543,14 @@ struct net_device
struct device   dev;
/* space for optional statistics and wireless sysfs groups */
struct attribute_group  *sysfs_groups[3];
+
+   /* To retrieve statistics per subqueue - FOR FUTURE USE */
+   struct net_device_stats* (*get_subqueue_stats)(struct net_device *dev,
+   int queue_index);
+
+   /* The TX queue control structures */
+   struct net_device_subqueue  *egress_subqueue;
+   int egress_subqueue_count;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -675,6 +692,48 @@ static inline int netif_running(const struct net_device 
*dev)
return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the device if we're multiqueue.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, u16 
queue_index)
+{
+   clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+ u16 queue_index)
+{
+   return test_bit(__LINK_STATE_XOFF,
+   &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   if (test_and_clear_bit(__LINK_STATE_XOFF,
+  &dev->egress_subqueue[queue_index].state))
+   __netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+   return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
 
 /* Use this vari
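
The per-queue stop/wake logic in the helpers above can be modelled in
userspace as follows.  This is only a sketch: the model_* names, the queue
limit and the reschedules counter are assumptions made for illustration, and
plain bit operations stand in for the kernel's atomic set_bit() and
test_and_clear_bit().

```c
#include <stdint.h>

#define MODEL_XOFF_BIT   0u   /* stands in for __LINK_STATE_XOFF */
#define MODEL_MAX_QUEUES 8u   /* arbitrary limit for this sketch */

struct model_subqueue {
    unsigned long state;      /* one control word per TX queue */
};

struct model_dev {
    struct model_subqueue egress_subqueue[MODEL_MAX_QUEUES];
    unsigned int egress_subqueue_count;
    unsigned int reschedules; /* counts simulated __netif_schedule() calls */
};

/* Mark one queue as stopped; other queues are unaffected. */
static void model_stop_subqueue(struct model_dev *dev, uint16_t q)
{
    dev->egress_subqueue[q].state |= 1ul << MODEL_XOFF_BIT;
}

/* Clear the stopped flag without rescheduling. */
static void model_start_subqueue(struct model_dev *dev, uint16_t q)
{
    dev->egress_subqueue[q].state &= ~(1ul << MODEL_XOFF_BIT);
}

/* Returns 1 if the queue is stopped, 0 otherwise. */
static int model_subqueue_stopped(const struct model_dev *dev, uint16_t q)
{
    return (int)((dev->egress_subqueue[q].state >> MODEL_XOFF_BIT) & 1ul);
}

/* Wake: clear XOFF and reschedule only if the queue was actually stopped. */
static void model_wake_subqueue(struct model_dev *dev, uint16_t q)
{
    if (model_subqueue_stopped(dev, q)) {
        model_start_subqueue(dev, q);
        dev->reschedules++;
    }
}
```

Waking an already running queue is a no-op here, which mirrors the
test_and_clear_bit() guard in netif_wake_subqueue().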

[PATCH 10/10] Fix BusLogic to stop using check_region

2007-04-09 Thread Zachary Amsden

I got so sick of seeing the check_region warnings from BusLogic.c that I
actually fixed it properly.  Never use check_region(); reserve the region
before the probe with request_region() instead and check the error result;
release the region if setup fails.  Should be functionally identical to the
original except for fixing the potential race.

Subject: Fix BusLogic to stop using check_region
Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 15645892b508 drivers/scsi/BusLogic.c
--- a/drivers/scsi/BusLogic.c   Mon Apr 09 13:43:06 2007 -0700
+++ b/drivers/scsi/BusLogic.c   Mon Apr 09 14:52:00 2007 -0700
@@ -579,17 +579,17 @@ static void __init BusLogic_InitializePr
/*
   Append the list of standard BusLogic MultiMaster ISA I/O Addresses.
 */
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe330 : check_region(0x330, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe330)
BusLogic_AppendProbeAddressISA(0x330);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe334 : check_region(0x334, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe334)
BusLogic_AppendProbeAddressISA(0x334);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe230 : check_region(0x230, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe230)
BusLogic_AppendProbeAddressISA(0x230);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe234 : check_region(0x234, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe234)
BusLogic_AppendProbeAddressISA(0x234);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe130 : check_region(0x130, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe130)
BusLogic_AppendProbeAddressISA(0x130);
-   if (BusLogic_ProbeOptions.LimitedProbeISA ? 
BusLogic_ProbeOptions.Probe134 : check_region(0x134, 
BusLogic_MultiMasterAddressCount) == 0)
+   if (!BusLogic_ProbeOptions.LimitedProbeISA || 
BusLogic_ProbeOptions.Probe134)
BusLogic_AppendProbeAddressISA(0x134);
 }
 
@@ -795,7 +795,9 @@ static int __init BusLogic_InitializeMul
   host adapters are probed.
 */
if (!BusLogic_ProbeOptions.NoProbeISA)
-   if (PrimaryProbeInfo->IO_Address == 0 && 
(BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe330 : 
check_region(0x330, BusLogic_MultiMasterAddressCount) == 0)) {
+   if (PrimaryProbeInfo->IO_Address == 0 &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe330)) {
PrimaryProbeInfo->HostAdapterType = 
BusLogic_MultiMaster;
PrimaryProbeInfo->HostAdapterBusType = BusLogic_ISA_Bus;
PrimaryProbeInfo->IO_Address = 0x330;
@@ -805,15 +807,25 @@ static int __init BusLogic_InitializeMul
   omitting the Primary I/O Address which has already been handled.
 */
if (!BusLogic_ProbeOptions.NoProbeISA) {
-   if (!StandardAddressSeen[1] && 
(BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe334 : 
check_region(0x334, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[1] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe334))
BusLogic_AppendProbeAddressISA(0x334);
-   if (!StandardAddressSeen[2] && 
(BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe230 : 
check_region(0x230, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[2] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe230))
BusLogic_AppendProbeAddressISA(0x230);
-   if (!StandardAddressSeen[3] && 
(BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe234 : 
check_region(0x234, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddressSeen[3] &&
+   (!BusLogic_ProbeOptions.LimitedProbeISA ||
+BusLogic_ProbeOptions.Probe234))
BusLogic_AppendProbeAddressISA(0x234);
-   if (!StandardAddressSeen[4] && 
(BusLogic_ProbeOptions.LimitedProbeISA ? BusLogic_ProbeOptions.Probe130 : 
check_region(0x130, BusLogic_MultiMasterAddressCount) == 0))
+   if (!StandardAddres
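
The reserve-then-probe pattern described in this patch can be sketched in
userspace C as below.  The model_* functions and the toy port table are
hypothetical and single-threaded; in the kernel, request_region() performs
the check and the claim as one step, which is what closes the window that a
separate check_region() call leaves open.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_IO_PORTS 0x400   /* size of the toy I/O port space */

static bool port_busy[MODEL_IO_PORTS];

/* Claim [start, start+len) only if every port is free; all or nothing. */
static bool model_request_region(unsigned int start, unsigned int len)
{
    unsigned int i;

    if (start + len > MODEL_IO_PORTS)
        return false;
    for (i = start; i < start + len; i++)
        if (port_busy[i])
            return false;
    for (i = start; i < start + len; i++)
        port_busy[i] = true;
    return true;
}

/* Give back a range previously claimed with model_request_region(). */
static void model_release_region(unsigned int start, unsigned int len)
{
    unsigned int i;

    for (i = start; i < start + len; i++)
        port_busy[i] = false;
}

/* Reserve first, then probe, and give the region back if the probe fails. */
static int model_probe_adapter(unsigned int io_addr, unsigned int len,
                               bool hardware_responds)
{
    if (!model_request_region(io_addr, len))
        return -EBUSY;          /* someone else owns the ports */
    if (!hardware_responds) {
        model_release_region(io_addr, len);
        return -ENODEV;         /* nothing there; do not keep the ports */
    }
    return 0;                   /* region stays reserved for the driver */
}
```

The hardware_responds flag stands in for whatever the real probe does; the
point is only the ordering of reserve, probe and release.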

[PATCH 3/10] I386 mcheck p4 grotesque and needless warning fix.patch

2007-04-09 Thread Zachary Amsden

No, just no.  You do not use goto to skip a code block.  You do not
return an obvious variable from a singly-inlined function and give
the function a return value.  You don't put unexplained comments
about kmalloc in code which doesn't do dynamic allocation.  And
you don't leave stray warnings around for no good reason.

Also, when possible, it is better to use block scoped variables
because gcc can sometimes generate better code.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r ed741f57dae8 arch/i386/kernel/cpu/mcheck/p4.c
--- a/arch/i386/kernel/cpu/mcheck/p4.c  Fri Apr 06 14:29:52 2007 -0700
+++ b/arch/i386/kernel/cpu/mcheck/p4.c  Fri Apr 06 14:43:24 2007 -0700
@@ -124,12 +124,9 @@ static void intel_init_thermal(struct cp
 
 
 /* P4/Xeon Extended MCE MSR retrieval, return 0 if unsupported */
-static inline int intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
+static inline void intel_get_extended_msrs(struct intel_mce_extended_msrs *r)
 {
u32 h;
-
-   if (mce_num_extended_msrs == 0)
-   goto done;
 
rdmsr (MSR_IA32_MCG_EAX, r->eax, h);
rdmsr (MSR_IA32_MCG_EBX, r->ebx, h);
@@ -141,12 +138,6 @@ static inline int intel_get_extended_msr
rdmsr (MSR_IA32_MCG_ESP, r->esp, h);
rdmsr (MSR_IA32_MCG_EFLAGS, r->eflags, h);
rdmsr (MSR_IA32_MCG_EIP, r->eip, h);
-
-   /* can we rely on kmalloc to do a dynamic
-* allocation for the reserved registers?
-*/
-done:
-   return mce_num_extended_msrs;
 }
 
 static fastcall void intel_machine_check(struct pt_regs * regs, long 
error_code)
@@ -155,7 +146,6 @@ static fastcall void intel_machine_check
u32 alow, ahigh, high, low;
u32 mcgstl, mcgsth;
int i;
-   struct intel_mce_extended_msrs dbg;
 
rdmsr (MSR_IA32_MCG_STATUS, mcgstl, mcgsth);
if (mcgstl & (1<<0))/* Recoverable ? */
@@ -164,7 +154,9 @@ static fastcall void intel_machine_check
printk (KERN_EMERG "CPU %d: Machine Check Exception: %08x%08x\n",
smp_processor_id(), mcgsth, mcgstl);
 
-   if (intel_get_extended_msrs(&dbg)) {
+   if (mce_num_extended_msrs > 0) {
+   struct intel_mce_extended_msrs dbg;
+   intel_get_extended_msrs(&dbg);
printk (KERN_DEBUG "CPU %d: EIP: %08x EFLAGS: %08x\n",
smp_processor_id(), dbg.eip, dbg.eflags);
printk (KERN_DEBUG "\teax: %08x ebx: %08x ecx: %08x edx: 
%08x\n",

[PATCH 4/10] I386 pgd clone under lock fix.patch

2007-04-09 Thread Zachary Amsden

Copying of the pgd range must happen under the pgd_lock.  This got broken by
the paravirt changes in the -mm tree.  Badness can result if you copy the pgd
before it has been added to the list when splitting or rejoining large pages.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 2247ff2c3fdb arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cThu Apr 05 17:29:15 2007 -0700
+++ b/arch/i386/mm/pgtable.cThu Apr 05 17:40:02 2007 -0700
@@ -241,18 +241,16 @@ void pgd_ctor(void *pgd, struct kmem_cac
/* !PAE, no pagetable sharing */
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
+   spin_lock_irqsave(&pgd_lock, flags);
+
+   /* must happen under lock */
clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
swapper_pg_dir + USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-
-   spin_lock_irqsave(&pgd_lock, flags);
-
-   /* must happen under lock */
paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
__pa(swapper_pg_dir) >> PAGE_SHIFT,
USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-
pgd_list_add(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
 }

[PATCH 9/10] Vmi timer update.patch

2007-04-09 Thread Zachary Amsden

Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r c02ab981c99c arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile Mon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/Makefile Mon Apr 09 15:45:27 2007 -0700
@@ -41,7 +41,7 @@ obj-$(CONFIG_K8_NB)   += k8.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_STACK_UNWIND) += unwind.o
 
-obj-$(CONFIG_VMI)  += vmi.o vmitime.o
+obj-$(CONFIG_VMI)  += vmi.o vmiclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-y  += pcspeaker.o
 
diff -r c02ab981c99c arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.cMon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/vmi.cMon Apr 09 15:49:37 2007 -0700
@@ -73,6 +73,9 @@ static struct {
void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
+/* Cached VMI operations */
+struct vmi_timer_ops vmi_timer_ops;
+
 /*
  * VMI patching routines.
  */
@@ -231,18 +234,6 @@ static void vmi_nop(void)
 {
 }
 
-/* For NO_IDLE_HZ, we stop the clock when halting the kernel */
-static fastcall void vmi_safe_halt(void)
-{
-   int idle = vmi_stop_hz_timer();
-   vmi_ops.halt();
-   if (idle) {
-   local_irq_disable();
-   vmi_account_time_restart_hz_timer();
-   local_irq_enable();
-   }
-}
-
 #ifdef CONFIG_DEBUG_PAGE_TYPE
 
 #ifdef CONFIG_X86_PAE
@@ -714,7 +705,6 @@ do {
\
vmi_ops.cache = (void *)rel->eip;   \
}   \
 } while (0)
-
 
 /*
  * Activate the VMI interface and switch into paravirtualized mode
@@ -894,8 +884,8 @@ static inline int __init activate_vmi(vo
paravirt_ops.get_wallclock = vmi_get_wallclock;
paravirt_ops.set_wallclock = vmi_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   paravirt_ops.setup_boot_clock = vmi_timer_setup_boot_alarm;
-   paravirt_ops.setup_secondary_clock = 
vmi_timer_setup_secondary_alarm;
+   paravirt_ops.setup_boot_clock = vmi_time_bsp_init; 
+   paravirt_ops.setup_secondary_clock = vmi_time_ap_init;
 #endif
paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles;
paravirt_ops.get_cpu_khz = vmi_cpu_khz;
@@ -907,11 +897,7 @@ static inline int __init activate_vmi(vo
disable_vmi_timer = 1;
}
 
-   /* No idle HZ mode only works if VMI timer and no idle is enabled */
-   if (disable_noidle || disable_vmi_timer)
-   para_fill(safe_halt, Halt);
-   else
-   para_wrap(safe_halt, vmi_safe_halt, halt, Halt);
+   para_fill(safe_halt, Halt);
 
/*
 * Alternative instruction rewriting doesn't happen soon enough
diff -r c02ab981c99c include/asm-i386/vmi_time.h
--- a/include/asm-i386/vmi_time.h   Mon Apr 09 15:45:27 2007 -0700
+++ b/include/asm-i386/vmi_time.h   Mon Apr 09 15:45:27 2007 -0700
@@ -53,22 +53,8 @@ extern unsigned long vmi_cpu_khz(void);
 extern unsigned long vmi_cpu_khz(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-extern void __init vmi_timer_setup_boot_alarm(void);
-extern void __devinit vmi_timer_setup_secondary_alarm(void);
-extern void apic_vmi_timer_interrupt(void);
-#endif
-
-#ifdef CONFIG_NO_IDLE_HZ
-extern int vmi_stop_hz_timer(void);
-extern void vmi_account_time_restart_hz_timer(void);
-#else
-static inline int vmi_stop_hz_timer(void)
-{
-   return 0;
-}
-static inline void vmi_account_time_restart_hz_timer(void)
-{
-}
+extern void __devinit vmi_time_bsp_init(void);
+extern void __devinit vmi_time_ap_init(void);
 #endif
 
 /*
diff -r c02ab981c99c arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S  Mon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/entry.S  Mon Apr 09 16:03:18 2007 -0700
@@ -637,11 +637,6 @@ ENDPROC(name)
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
 
-/* This alternate entry is needed because we hijack the apic LVTT */
-#if defined(CONFIG_VMI) && defined(CONFIG_X86_LOCAL_APIC)
-BUILD_INTERRUPT(apic_vmi_timer_interrupt,LOCAL_TIMER_VECTOR)
-#endif
-
 KPROBE_ENTRY(page_fault)
RING0_EC_FRAME
pushl $do_page_fault
diff -r c02ab981c99c arch/i386/kernel/vmiclock.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/arch/i386/kernel/vmiclock.c   Mon Apr 09 15:47:17 2007 -0700
@@ -0,0 +1,318 @@
+/*
+ * VMI paravirtual timer support routines.
+ *
+ * Copyright (C) 2007, VMware, Inc

[PATCH 6/10] Vmi supports compat vdso.patch

2007-04-09 Thread Zachary Amsden

Now that the VDSO can be relocated, we can support it in VMI configurations.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 158d9ffb46fe arch/i386/Kconfig
--- a/arch/i386/Kconfig Thu Mar 29 04:17:05 2007 -0700
+++ b/arch/i386/Kconfig Thu Mar 29 04:18:05 2007 -0700
@@ -220,7 +220,7 @@ config PARAVIRT
 
 config VMI
bool "VMI Paravirt-ops support"
-   depends on PARAVIRT && !COMPAT_VDSO
+   depends on PARAVIRT
help
  VMI provides a paravirtualized interface to the VMware ESX server
  (it could be used by other hypervisors in theory too, but is not

[PATCH 7/10] Resurrect the VMI lazy mode fixes.

2007-04-09 Thread Zachary Amsden

Code changes and cleanup in the paravirt-ops queue caused the original
fix for this in 2.6.21 to create conflicts.  The easiest thing to do was
back it out before applying the queue.  In that case, this fix brings it
back with the newly and properly tidied up paravirt-ops code.

Wheee!

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r ecb571084874 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.cFri Apr 06 12:31:06 2007 -0700
+++ b/arch/i386/kernel/vmi.cFri Apr 06 14:25:03 2007 -0700
@@ -69,6 +69,7 @@ static struct {
void (*flush_tlb)(int);
void (*set_initial_ap_state)(int, int);
void (*halt)(void);
+   void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
 /*
@@ -545,6 +546,26 @@ vmi_startup_ipi_hook(int phys_apicid, un
 }
 #endif
 
+static void vmi_set_lazy_mode(enum paravirt_lazy_mode mode)
+{
+   static DEFINE_PER_CPU(int, lazy_mode);
+
+   if (!vmi_ops.set_lazy_mode)
+   return;
+
+   /* Modes should never nest or overlap */
+   BUG_ON(__get_cpu_var(lazy_mode) && !(mode == PARAVIRT_LAZY_NONE ||
+mode == PARAVIRT_LAZY_FLUSH));
+   
+   if (mode == PARAVIRT_LAZY_FLUSH) {
+   vmi_ops.set_lazy_mode(0);
+   vmi_ops.set_lazy_mode(__get_cpu_var(lazy_mode));
+   } else {
+   vmi_ops.set_lazy_mode(mode);
+   __get_cpu_var(lazy_mode) = mode;
+   }
+}
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
struct pci_header *pci;
@@ -769,7 +790,7 @@ static inline int __init activate_vmi(vo
para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, 
UpdateKernelStack);
para_fill(set_iopl_mask, SetIOPLMask);
para_fill(io_delay, IODelay);
-   para_fill(set_lazy_mode, SetLazyMode);
+   para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
/* user and kernel flush are just handled with different flags to 
FlushTLB */
para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
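
The flush handling in vmi_set_lazy_mode() above can be modelled in userspace
like this.  It is a sketch only: the model_* names, the enum values and the
call log are assumptions for illustration, a single variable replaces the
per-CPU one, and the BUG_ON() nesting check is left out.

```c
enum model_lazy_mode {
    MODEL_LAZY_NONE = 0,
    MODEL_LAZY_MMU = 1,
    MODEL_LAZY_CPU = 2,
    MODEL_LAZY_FLUSH = 3,
};

#define MODEL_LOG_SIZE 16

static int backend_log[MODEL_LOG_SIZE]; /* modes passed to the backend */
static int backend_calls;               /* number of backend invocations */
static int current_lazy_mode;           /* per-CPU in the patch; one variable here */

/* Stand-in for the hypervisor call; it only records what it was asked. */
static void backend_set_lazy_mode(int mode)
{
    if (backend_calls < MODEL_LOG_SIZE)
        backend_log[backend_calls] = mode;
    backend_calls++;
}

static void model_set_lazy_mode(enum model_lazy_mode mode)
{
    if (mode == MODEL_LAZY_FLUSH) {
        /* Leave lazy mode to flush queued updates, then re-enter it. */
        backend_set_lazy_mode(MODEL_LAZY_NONE);
        backend_set_lazy_mode(current_lazy_mode);
    } else {
        /* Ordinary transition: forward it and remember the new mode. */
        backend_set_lazy_mode(mode);
        current_lazy_mode = mode;
    }
}
```

A flush therefore never changes the remembered mode; it only bounces the
backend through the non-lazy state and back.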

[PATCH 2/10] I386 acpi remove earlyquirk warning.patch

2007-04-09 Thread Zachary Amsden

Remove a warning about unused variable in !CONFIG_ACPI compilation.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 1969f6c3440a arch/i386/kernel/acpi/earlyquirk.c
--- a/arch/i386/kernel/acpi/earlyquirk.cFri Apr 06 14:27:45 2007 -0700
+++ b/arch/i386/kernel/acpi/earlyquirk.cFri Apr 06 14:29:46 2007 -0700
@@ -21,8 +21,8 @@ static int __init nvidia_hpet_check(stru
 
 static int __init check_bridge(int vendor, int device)
 {
+#ifdef CONFIG_ACPI
static int warned;
-#ifdef CONFIG_ACPI
/* According to Nvidia all timer overrides are bogus unless HPET
   is enabled. */
if (!acpi_use_timer_override && vendor == PCI_VENDOR_ID_NVIDIA) {

[PATCH 8/10] Vmi kmap_atomic_pte fix.patch

2007-04-09 Thread Zachary Amsden

Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
operation.  The conversion is rather straightforward; call kmap_atomic
and then inform the hypervisor of the page mapping.

The _flush_tlb damage is due to macros being pulled in from highmem.h.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 2207a31829e7 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.cMon Apr 09 13:43:06 2007 -0700
+++ b/arch/i386/kernel/vmi.cMon Apr 09 15:05:33 2007 -0700
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,8 +66,8 @@ static struct {
void (*release_page)(u32, u32);
void (*set_pte)(pte_t, pte_t *, unsigned);
void (*update_pte)(pte_t *, unsigned);
-   void (*set_linear_mapping)(int, u32, u32, u32);
-   void (*flush_tlb)(int);
+   void (*set_linear_mapping)(int, void *, u32, u32);
+   void (*_flush_tlb)(int);
void (*set_initial_ap_state)(int, int);
void (*halt)(void);
void (*set_lazy_mode)(int mode);
@@ -217,12 +218,12 @@ static void vmi_load_esp0(struct tss_str
 
 static void vmi_flush_tlb_user(void)
 {
-   vmi_ops.flush_tlb(VMI_FLUSH_TLB);
+   vmi_ops._flush_tlb(VMI_FLUSH_TLB);
 }
 
 static void vmi_flush_tlb_kernel(void)
 {
-   vmi_ops.flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
+   vmi_ops._flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
 }
 
 /* Stub to do nothing at all; used for delays and unimplemented calls */
@@ -345,8 +346,11 @@ static void vmi_check_page_type(u32 pfn,
 #define vmi_check_page_type(p,t) do { } while (0)
 #endif
 
-static void vmi_map_pt_hook(int type, pte_t *va, u32 pfn)
-{
+#ifdef CONFIG_HIGHPTE
+static void *vmi_kmap_atomic_pte(struct page *page, enum km_type type)
+{
+   void *va = kmap_atomic(page, type);
+
/*
 * Internally, the VMI ROM must map virtual addresses to physical
 * addresses for processing MMU updates.  By the time MMU updates
@@ -360,8 +364,11 @@ static void vmi_map_pt_hook(int type, pt
 *  args: SLOT VACOUNT PFN
 */
BUG_ON(type != KM_PTE0 && type != KM_PTE1);
-   vmi_ops.set_linear_mapping((type - KM_PTE0)+1, (u32)va, 1, pfn);
-}
+   vmi_ops.set_linear_mapping((type - KM_PTE0)+1, va, 1, 
page_to_pfn(page));
+
+   return va;
+}
+#endif
 
 static void vmi_allocate_pt(u32 pfn)
 {
@@ -656,7 +663,7 @@ void vmi_bringup(void)
 {
/* We must establish the lowmem mapping for MMU ops to work */
if (vmi_ops.set_linear_mapping)
-   vmi_ops.set_linear_mapping(0, __PAGE_OFFSET, max_low_pfn, 0);
+   vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, 
max_low_pfn, 0);
 }
 
 /*
@@ -793,8 +800,8 @@ static inline int __init activate_vmi(vo
para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
/* user and kernel flush are just handled with different flags to 
FlushTLB */
-   para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
-   para_wrap(flush_tlb_kernel, vmi_flush_tlb_kernel, flush_tlb, FlushTLB);
+   para_wrap(flush_tlb_user, vmi_flush_tlb_user, _flush_tlb, FlushTLB);
+   para_wrap(flush_tlb_kernel, vmi_flush_tlb_kernel, _flush_tlb, FlushTLB);
para_fill(flush_tlb_single, InvalPage);
 
/*
@@ -840,9 +847,12 @@ static inline int __init activate_vmi(vo
paravirt_ops.release_pt = vmi_release_pt;
paravirt_ops.release_pd = vmi_release_pd;
}
-#if 0
-   para_wrap(map_pt_hook, vmi_map_pt_hook, set_linear_mapping,
- SetLinearMapping);
+
+   /* Set linear is needed in all cases */
+   vmi_ops.set_linear_mapping = 
vmi_get_function(VMI_CALL_SetLinearMapping);
+#ifdef CONFIG_HIGHPTE
+   if (vmi_ops.set_linear_mapping)
+   paravirt_ops.kmap_atomic_pte = vmi_kmap_atomic_pte;
 #endif
 
/*

[PATCH 0/10] i386, VMI, BusLogic, Timer fixes for -mm

2007-04-09 Thread Zachary Amsden

Latest cleanups and junk from Zach's tree.  All for -mm tree.
Based off Jeremy's latest known applied patches.  If the
paravirt or VMI patches reject let me know; we are cleaning up
tree and will redo.

Otherwise, I have 4 fixes for i386; a warning fix in sysenter
which is quite serious; some less important warning removals,
and a bug with pgd locking during root creation.

Next, I tidy up the kmap_atomic_pte code, since it makes no
sense to leave it around under !HIGHMEM and being forced to
define it introduces conflicts with where things get defined, which
is just ugly.

Some VMI fixes; we support COMPAT_VDSO now that the relocation
code is there, re-add the lazy MMU fix now that Jeremy's cleanups
have been applied to -mm, implement kmap_atomic_pte.

Next, re-implement the VMI timer, dropping all the NO_IDLE_HZ
code, making it dependent on NO_HZ and using the proper clockevents
infrastructure.  I am looking for feedback on this, but it works
and applies as is, and is certainly better and cleaner than the
existing code.  Hopefully I have taken as many fingers out of the
APIC pie as possible.

Finally, fix the BusLogic driver to use request_region properly.

Zach

[PATCH 5/10] Paravirt kmap_atomic_pte tidy.patch

2007-04-09 Thread Zachary Amsden

Don't implement native_kmap_atomic_pte for !HIGHPTE case; it is never needed,
never called, and leaving it in is just plain confusing.  Making it isolated
to the config where it is used may help find bugs.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 5c03805411a6 arch/i386/kernel/paravirt.c
--- a/arch/i386/kernel/paravirt.c   Fri Apr 06 14:43:30 2007 -0700
+++ b/arch/i386/kernel/paravirt.c   Fri Apr 06 14:43:37 2007 -0700
@@ -217,13 +217,6 @@ static void native_flush_tlb_single(unsi
 {
__native_flush_tlb_single(addr);
 }
-
-#ifndef CONFIG_HIGHPTE
-static void *native_kmap_atomic_pte(struct page *page, enum km_type type)
-{
-   return kmap_atomic(page, type);
-}
-#endif
 
 /* These are in entry.S */
 extern void native_iret(void);
@@ -324,7 +317,9 @@ struct paravirt_ops paravirt_ops = {
 
.ptep_get_and_clear = native_ptep_get_and_clear,
 
-   .kmap_atomic_pte = native_kmap_atomic_pte,
+#ifdef CONFIG_HIGHPTE
+   .kmap_atomic_pte = kmap_atomic,
+#endif
 
 #ifdef CONFIG_X86_PAE
.set_pte_atomic = native_set_pte_atomic,
diff -r 5c03805411a6 include/asm-i386/highmem.h
--- a/include/asm-i386/highmem.hFri Apr 06 14:43:30 2007 -0700
+++ b/include/asm-i386/highmem.hFri Apr 06 14:43:37 2007 -0700
@@ -74,11 +74,6 @@ void *kmap_atomic_pfn(unsigned long pfn,
 void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
 struct page *kmap_atomic_to_page(void *ptr);
 
-static inline void *native_kmap_atomic_pte(struct page *page, enum km_type 
type)
-{
-   return kmap_atomic(page, type);
-}
-
 #ifndef CONFIG_PARAVIRT
 #define kmap_atomic_pte(page, type)kmap_atomic(page, type)
 #endif
diff -r 5c03805411a6 include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h   Fri Apr 06 14:43:30 2007 -0700
+++ b/include/asm-i386/paravirt.h   Fri Apr 06 14:43:37 2007 -0700
@@ -190,7 +190,9 @@ struct paravirt_ops
 
pte_t (*ptep_get_and_clear)(pte_t *ptep);
 
+#ifdef CONFIG_HIGHPTE
void *(*kmap_atomic_pte)(struct page *page, enum km_type type);
+#endif
 
 #ifdef CONFIG_X86_PAE
void (*set_pte_atomic)(pte_t *ptep, pte_t pteval);
@@ -892,12 +894,14 @@ static inline void paravirt_release_pd(u
PVOP_VCALL1(release_pd, pfn);
 }
 
+#ifdef CONFIG_HIGHPTE
 static inline void *kmap_atomic_pte(struct page *page, enum km_type type)
 {
unsigned long ret;
ret = PVOP_CALL2(unsigned long, kmap_atomic_pte, page, type);
return (void *)ret;
 }
+#endif
 
 static inline void pte_update(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep)

[PATCH 1/10] I386 sysenter arch pages fix.patch

2007-04-09 Thread Zachary Amsden

In compat mode, the return value here was uninitialized.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r 1fda49a076ed arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Fri Apr 06 14:25:09 2007 -0700
+++ b/arch/i386/kernel/sysenter.c   Fri Apr 06 14:27:31 2007 -0700
@@ -254,7 +254,7 @@ int arch_setup_additional_pages(struct l
 {
struct mm_struct *mm = current->mm;
unsigned long addr;
-   int ret;
+   int ret = 0;
bool compat;
 
down_write(&mm->mmap_sem);

Re: per-thread rusage

2007-04-09 Thread Andrew Morton

On Wed, 4 Apr 2007 11:10:50 -0700
William Lee Irwin III <[EMAIL PROTECTED]> wrote:

> On Wed, 4 Apr 2007 10:29:31 -0700 William Lee Irwin III <[EMAIL PROTECTED]> 
> wrote:
> >> It is not now possible for a thread to retrieve its own rusage in
> >> isolation. Its rusage is nowhere exposed without being intermixed with
> >> that of its sibling threads. This patch adds support for an
> >> RUSAGE_THREAD who argument that returns rusage for only the desired
> >> thread.
> 
> On Wed, Apr 04, 2007 at 07:48:29PM +0200, Eric Dumazet wrote:
> > Please check mm, because we now return ru_inblock and ru_oublock as well.
> > r->ru_inblock = task_io_get_inblock(p);
> > r->ru_oublock = task_io_get_oublock(p);
> 
> Respun vs. 2.6.21-rc5-mm4, still untested. Also...
> 
> Signed-off-by: William Irwin <[EMAIL PROTECTED]>
> 
> 
> -- wli
> 
> 
> Index: mm-2.6.21-rc5-4/include/linux/resource.h
> ===
> --- mm-2.6.21-rc5-4.orig/include/linux/resource.h 2007-04-03 
> 23:31:19.0 -0700
> +++ mm-2.6.21-rc5-4/include/linux/resource.h  2007-04-04 13:08:47.0 
> -0700
> @@ -18,7 +18,8 @@
>   */
>  #define  RUSAGE_SELF 0
>  #define  RUSAGE_CHILDREN (-1)
> -#define RUSAGE_BOTH  (-2)/* sys_wait4() uses this */
> +#define RUSAGE_THREAD(-2)
> +#define RUSAGE_BOTH  (-3)/* sys_wait4() uses this */
>  
>  struct   rusage {
>   struct timeval ru_utime;/* user time used */
> Index: mm-2.6.21-rc5-4/kernel/sys.c
> ===
> --- mm-2.6.21-rc5-4.orig/kernel/sys.c 2007-04-03 23:33:27.0 -0700
> +++ mm-2.6.21-rc5-4/kernel/sys.c  2007-04-04 13:11:17.0 -0700
> @@ -2074,6 +2074,16 @@
>   }
>  
>   switch (who) {
> + case RUSAGE_THREAD:
> + utime = p->utime;
> + stime = p->stime;
> + r->ru_nvcsw = p->nvcsw;
> + r->ru_nivcsw = p->nivcsw;
> + r->ru_minflt = p->min_flt;
> + r->ru_majflt = p->maj_flt;
> + r->ru_inblock = task_io_get_inblock(p);
> + r->ru_oublock = task_io_get_oublock(p);
> + break;
>   case RUSAGE_BOTH:
>   case RUSAGE_CHILDREN:
>   utime = p->signal->cutime;
> @@ -2131,7 +2141,8 @@
>  
>  asmlinkage long sys_getrusage(int who, struct rusage __user *ru)
>  {
> - if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
> + if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN &&
> + who != RUSAGE_THREAD)
>   return -EINVAL;
>   return getrusage(current, who, ru);
>  }

Seems sane.  Could we please get it tested and get a full description in
place?  Something which provides enough detail for the manpage maintainers.

Also, a quick comparison between Linux's RUSAGE_THREAD and $other-os's
implementations would reduce the possibility of silly, cast-in-stone
incompatibilities.
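
For the description, a small userspace sketch of the intended use might
help. This is only an illustration: it assumes the libc headers expose
RUSAGE_THREAD (it falls back to RUSAGE_SELF when they do not), and the
helper name thread_cpu_usec is made up here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: CPU time consumed by the calling thread, in
 * microseconds. Returns 0 on success, -1 if getrusage() fails. */
static int thread_cpu_usec(long long *user_usec, long long *sys_usec)
{
	struct rusage ru;
#ifdef RUSAGE_THREAD
	int who = RUSAGE_THREAD;	/* only the calling thread */
#else
	int who = RUSAGE_SELF;		/* assumption: fall back to the whole process */
#endif

	assert(user_usec && sys_usec);
	if (getrusage(who, &ru) != 0)
		return -1;

	*user_usec = (long long)ru.ru_utime.tv_sec * 1000000LL + ru.ru_utime.tv_usec;
	*sys_usec  = (long long)ru.ru_stime.tv_sec * 1000000LL + ru.ru_stime.tv_usec;
	return 0;
}
```

A caller would sample this before and after a unit of work and subtract,
which is the per-thread accounting the patch is after.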




non-blocking io from kernel to user space

2007-04-09 Thread Marcin Krzysztof Porwit

Hi,

I'm trying to find examples of the kernel doing a non-blocking upcall to
userspace while processing a request. None of the kernel books really
get into this, so I'm hoping someone can just point me at an existing
example.

In this case, what I'm looking to do is call up to winbindd to get some
credentials for a user while the kernel is handling a mount request.

Thanks for any info.
-- 
Marcin Krzysztof Porwit
[EMAIL PROTECTED]


Re: problem in 2.6.21-rc6-mm1: i386-enable-4k-stacks-by-default.patch?

2007-04-09 Thread Adrian Bunk

On Mon, Apr 09, 2007 at 03:34:11PM -0700, Jeremy Fitzhardinge wrote:
> --- a/arch/i386/Kconfig.debug~i386-enable-4k-stacks-by-default
> +++ a/arch/i386/Kconfig.debug
> @@ -57,14 +57,16 @@ config DEBUG_RODATA
> If in doubt, say "N".
>  
>  config 4KSTACKS
> - bool "Use 4Kb for kernel stacks instead of 8Kb"
> - depends on DEBUG_KERNEL
> + bool "Use 4Kb for kernel stacks instead of 8Kb" if DEBUG_KERNEL
> + depends on n
> + default y
> 
> 
> I don't see this option appear in my .config, and I guess its always
> using 8k stacks.  If I remove the "depends on n", it does what I expect
> it to do (ie, offers me the option).  Is "depends on n" supposed to do
> something, or is it just a bogon?

Looks bogus, and the "depends on n" wasn't in the patch when I sent it.

> J

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


[patch 1/8] [Intel IOMMU] ACPI support for Intel Virtualization Technology for Directed I/O

2007-04-09 Thread Ashok Raj

This patch contains basic ACPI parsing and enumeration support.

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
Index: linux-2.6.21-rc5/arch/x86_64/Kconfig
===
--- linux-2.6.21-rc5.orig/arch/x86_64/Kconfig   2007-04-03 04:30:40.0 
-0700
+++ linux-2.6.21-rc5/arch/x86_64/Kconfig2007-04-03 06:34:17.0 
-0700
@@ -687,6 +687,14 @@
bool "Support mmconfig PCI config space access"
depends on PCI && ACPI
 
+config DMAR
+   bool "Support for DMA Remapping Devices (EXPERIMENTAL)"
+   depends on PCI_MSI && ACPI && EXPERIMENTAL
+   help
+ Support DMA Remapping Devices. The devices are reported via
+ ACPI tables and includes pci device scope under each DMA
+ remapping device.
+
 source "drivers/pci/pcie/Kconfig"
 
 source "drivers/pci/Kconfig"
Index: linux-2.6.21-rc5/drivers/acpi/Makefile
===
--- linux-2.6.21-rc5.orig/drivers/acpi/Makefile 2007-04-03 04:30:40.0 
-0700
+++ linux-2.6.21-rc5/drivers/acpi/Makefile  2007-04-03 06:34:17.0 
-0700
@@ -60,3 +60,4 @@
 obj-$(CONFIG_ACPI_HOTPLUG_MEMORY)  += acpi_memhotplug.o
 obj-y  += cm_sbs.o
 obj-$(CONFIG_ACPI_SBS) += i2c_ec.o sbs.o
+obj-$(CONFIG_DMAR) += dmar.o
Index: linux-2.6.21-rc5/drivers/acpi/dmar.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc5/drivers/acpi/dmar.c2007-04-03 06:54:27.0 
-0700
@@ -0,0 +1,344 @@
+/*
+ * Copyright (c) 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ * Copyright (C) Ashok Raj <[EMAIL PROTECTED]>
+ * Copyright (C) Shaohua Li <[EMAIL PROTECTED]>
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#undef PREFIX
+#define PREFIX "ACPI DMAR:"
+
+#define MIN_SCOPE_LEN (sizeof(struct acpi_pci_path) + sizeof(struct acpi_dev_scope))
+
+LIST_HEAD(acpi_drhd_units);
+LIST_HEAD(acpi_rmrr_units);
+u8 dmar_host_address_width;
+
+static int __init acpi_register_drhd_unit(struct acpi_drhd_unit *drhd)
+{
+   /*
+* add INCLUDE_ALL at the tail, so scan the list will find it at
+* the very end.
+*/
+   if (drhd->include_all)
+   list_add_tail(&drhd->list, &acpi_drhd_units);
+   else
+   list_add(&drhd->list, &acpi_drhd_units);
+   return 0;
+}
+
+static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
+{
+   list_add(&rmrr->list, &acpi_rmrr_units);
+   return 0;
+}
+
+static int acpi_pci_device_match(struct pci_dev *devices[], int cnt,
+struct pci_dev *dev)
+{
+   int index;
+
+   while (dev) {
+   for (index = 0; index < cnt; index ++)
+   if (dev == devices[index])
+   return 1;
+
+   /* Check our parent */
+   dev = dev->bus->self;
+   }
+
+   return 0;
+}
+
+struct acpi_drhd_unit * acpi_find_matched_drhd_unit(struct pci_dev *dev)
+{
+   struct acpi_drhd_unit *drhd = NULL;
+
+   list_for_each_entry(drhd, &acpi_drhd_units, list) {
+   if (drhd->include_all || acpi_pci_device_match(drhd->devices,
+   drhd->devices_cnt, dev))
+   break;
+   }
+
+   return drhd;
+}
+
+struct acpi_rmrr_unit * acpi_find_matched_rmrr_unit(struct pci_dev *dev)
+{
+   struct acpi_rmrr_unit *rmrr;
+
+   list_for_each_entry(rmrr, &acpi_rmrr_units, list) {
+   if (acpi_pci_device_match(rmrr->devices,
+   rmrr->devices_cnt, dev))
+   goto out;
+   }
+   rmrr = NULL;
+out:
+   return rmrr;
+}
+
+static int __init acpi_parse_one_dev_scope(struct acpi_dev_scope *scope,
+  struct pci_dev **dev, u16 segment)
+{
+   struct pci_bus *bus;
+   struct pci_dev *pdev = NULL;
+   struct acpi_pci_path *path;
+   int count;
+
+   bus = pci_find_bus(segment, scope->start_bus);
+   path = (struct acpi_pci_path *)(scope + 1);
+   count = (scope->length - sizeof(struct a

[patch 7/8] [Intel IOMMU] Support for legacy ISA devices

2007-04-09 Thread Ashok Raj

Floppy disk drivers don't work well with DMA remapping. The floppy driver
could be converted to use the DMA API on x86_64, but the gain is very small.
If someone feels compelled to clean this up, it's up for grabs. Since ISA
DMA only reaches the first 16M, we just provide a unity map of that range
for the ISA bridge device.
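
As an illustration of how the ISA bridge is picked out: the 24-bit PCI
class code packs base class, subclass and programming interface, so
dropping the low byte leaves the 16-bit class that is compared against
PCI_CLASS_BRIDGE_ISA. The sketch below is standalone userspace C; the two
constants restate the values from pci_ids.h as assumptions, and the
function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/linux/pci_ids.h. */
#define PCI_BASE_CLASS_DISPLAY	0x03
#define PCI_CLASS_BRIDGE_ISA	0x0601

/* class24 layout: (base << 16) | (sub << 8) | prog-if */
static int is_isa_bridge(uint32_t class24)
{
	assert((class24 >> 24) == 0);	/* only three bytes are meaningful */
	return (class24 >> 8) == PCI_CLASS_BRIDGE_ISA;
}

static int is_gfx_device(uint32_t class24)
{
	assert((class24 >> 24) == 0);
	return (class24 >> 16) == PCI_BASE_CLASS_DISPLAY;
}
```

The same layout is why the patch passes PCI_CLASS_BRIDGE_ISA << 8 to
pci_get_class(): that function takes the full 24-bit value.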

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5/Documentation/kernel-parameters.txt
===
--- linux-2.6.21-rc5.orig/Documentation/kernel-parameters.txt   2007-04-09 
03:05:36.0 -0700
+++ linux-2.6.21-rc5/Documentation/kernel-parameters.txt2007-04-09 
03:05:38.0 -0700
@@ -730,6 +730,11 @@
the IOMMU driver to set a unity map for all OS
visible memory. Hence the driver can continue to use
physical addresses for DMA.
+   noisamap
+   This option is required to setup identity map for
+   first 16M. The floppy disk could be modified to use
+   the DMA api's but thats a lot of pain for very small
+   gain. This option is turned on by default.
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
Index: linux-2.6.21-rc5/drivers/pci/intel-iommu.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/intel-iommu.c 2007-04-09 
03:05:34.0 -0700
+++ linux-2.6.21-rc5/drivers/pci/intel-iommu.c  2007-04-09 03:05:38.0 
-0700
@@ -37,6 +37,8 @@
 #include "pci.h"
 
 #define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY)
+#define IS_ISA_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+
 #define IOAPIC_RANGE_START (0xfee00000)
 #define IOAPIC_RANGE_END   (0xfeefffff)
 #define IOAPIC_RANGE_SIZE  (IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1)
@@ -87,6 +89,7 @@
 
 static int dmar_disabled, dmar_force_rw;
 static int dmar_map_gfx = 1, dmar_no_gfx_identity_map = 1;
+static int dmar_fix_isa = 1;
 
 static char *get_fault_reason(u8 fault_reason)
 {
@@ -113,6 +116,9 @@
} else if (!strncmp(str, "gfx_workaround", 14)) {
dmar_no_gfx_identity_map = 0;
printk(KERN_INFO"Intel-IOMMU: do 1-1 mapping whole physical memory for GFX device\n");
+   } else if (!strncmp(str, "noisamap", 8)) {
+   dmar_fix_isa = 0;
+   printk (KERN_INFO"Intel-IOMMU: Turning off 16M unity map for LPC\n");
}
 
str += strcspn(str, ",");
@@ -1575,6 +1581,25 @@
}
 }
 
+static void iommu_prepare_isa(void)
+{
+   struct pci_dev *pdev = NULL;
+   int ret;
+
+   if (!dmar_fix_isa)
+   return;
+
+   pdev = pci_get_class (PCI_CLASS_BRIDGE_ISA << 8, NULL);
+   if (!pdev)
+   return;
+
+   printk (KERN_INFO "IOMMU: Prepare 0-16M unity mapping for LPC\n");
+   ret = iommu_prepare_identity_map(pdev, 0, 16*1024*1024);
+
+   if (ret)
+   printk ("IOMMU: Failed to create 0-16M identity map, Floppy might not work\n");
+
+}
 int __init init_dmars(void)
 {
struct acpi_drhd_unit *drhd;
@@ -1631,6 +1656,7 @@
end_for_each_rmrr_device(rmrr, pdev)
 
iommu_prepare_gfx_mapping();
+   iommu_prepare_isa();
 
/*
 * for each drhd


[patch 5/8] [Intel IOMMU] Graphics driver workarounds to provide unity map

2007-04-09 Thread Ashok Raj

Most GFX drivers don't call the standard PCI DMA APIs to allocate DMA buffers,
so such drivers break when the IOMMU is enabled. To work around this issue,
we added two options.

Once graphics drivers are converted to use the DMA API, this entire
patch can be removed.

a. intel_iommu=igfx_off. With this option, a DMAR that has only gfx devices
   under it is ignored. This mostly affects integrated gfx devices.
   If the DMAR is ignored, the gfx devices under it get physical addresses
   for DMA.
b. intel_iommu=gfx_workaround. With this option, we set up a 1:1 mapping
   of all memory for gfx devices, that is, the DMA (I/O virtual) address
   equals the physical address. This way gfx devices use physical addresses
   for DMA. This is primarily for add-in card GFX devices.
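
Both options are given as comma-separated values of intel_iommu=. The
parsing pattern the patch relies on (prefix match with strncmp(), then skip
to the next comma with strcspn()) can be sketched in userspace roughly as
follows. The struct and function names are invented for this sketch, and
the flag polarity is simplified compared with the patch's dmar_map_gfx and
dmar_no_gfx_identity_map variables.

```c
#include <assert.h>
#include <string.h>

struct gfx_opts {		/* hypothetical container for the parsed flags */
	int igfx_off;		/* ignore DMARs that cover only gfx devices */
	int gfx_workaround;	/* 1:1 map all RAM for gfx devices */
};

static void parse_intel_iommu(const char *str, struct gfx_opts *o)
{
	assert(str && o);
	o->igfx_off = 0;
	o->gfx_workaround = 0;

	while (*str) {
		if (!strncmp(str, "igfx_off", 8))
			o->igfx_off = 1;
		else if (!strncmp(str, "gfx_workaround", 14))
			o->gfx_workaround = 1;

		str += strcspn(str, ",");	/* jump to the next ',' or the end */
		while (*str == ',')
			str++;			/* skip separators */
	}
}
```

Note that a prefix match accepts trailing garbage ("igfx_offX" still
enables the option), and unknown tokens are silently skipped.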

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
Index: linux-2.6.21-rc5/arch/x86_64/kernel/e820.c
===
--- linux-2.6.21-rc5.orig/arch/x86_64/kernel/e820.c 2007-04-09 
03:02:37.0 -0700
+++ linux-2.6.21-rc5/arch/x86_64/kernel/e820.c  2007-04-09 03:05:34.0 
-0700
@@ -730,3 +730,22 @@
printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n",
pci_mem_start, gapstart, gapsize);
 }
+
+int __init arch_get_ram_range(int slot, u64 *addr, u64 *size)
+{
+   int i;
+
+   if (slot < 0 || slot >= e820.nr_map)
+   return -1;
+   for (i = slot; i < e820.nr_map; i++) {
+   if(e820.map[i].type != E820_RAM)
+   continue;
+   break;
+   }
+   if (i == e820.nr_map || e820.map[i].addr > (max_pfn << PAGE_SHIFT))
+   return -1;
+   *addr = e820.map[i].addr;
+   *size = min_t(u64, e820.map[i].size + e820.map[i].addr,
+   max_pfn << PAGE_SHIFT) - *addr;
+   return i + 1;
+}
Index: linux-2.6.21-rc5/drivers/pci/intel-iommu.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/intel-iommu.c 2007-04-09 
03:05:32.0 -0700
+++ linux-2.6.21-rc5/drivers/pci/intel-iommu.c  2007-04-09 03:05:34.0 
-0700
@@ -36,6 +36,7 @@
 #include "iova.h"
 #include "pci.h"
 
+#define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY)
 #define IOAPIC_RANGE_START (0xfee00000)
 #define IOAPIC_RANGE_END   (0xfeefffff)
 #define IOAPIC_RANGE_SIZE  (IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1)
@@ -85,6 +86,7 @@
 };
 
 static int dmar_disabled, dmar_force_rw;
+static int dmar_map_gfx = 1, dmar_no_gfx_identity_map = 1;
 
 static char *get_fault_reason(u8 fault_reason)
 {
@@ -105,7 +107,14 @@
} else if (!strncmp(str, "forcerw", 7)) {
dmar_force_rw = 1;
printk(KERN_INFO"Intel-IOMMU: force R/W for W/O mapping\n");
+   } else if (!strncmp(str, "igfx_off", 8)) {
+   dmar_map_gfx = 0;
+   printk(KERN_INFO"Intel-IOMMU: disable GFX device mapping\n");
+   } else if (!strncmp(str, "gfx_workaround", 14)) {
+   dmar_no_gfx_identity_map = 0;
+   printk(KERN_INFO"Intel-IOMMU: do 1-1 mapping whole physical memory for GFX device\n");
}
+
str += strcspn(str, ",");
while (*str == ',')
str++;
@@ -1311,6 +1320,7 @@
struct pci_dev *dev; /* it's NULL for PCIE-to-PCI bridge */
struct domain *domain;
 };
+#define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
@@ -1531,10 +1541,40 @@
 static inline int iommu_prepare_rmrr_dev(struct acpi_rmrr_unit *rmrr,
struct pci_dev *pdev)
 {
+   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+   return 0;
return iommu_prepare_identity_map(pdev, rmrr->base_address,
rmrr->end_address + 1);
 }
 
+static void iommu_prepare_gfx_mapping(void)
+{
+   struct pci_dev *pdev = NULL;
+   u64 base, size;
+   int slot;
+   int ret;
+
+   if (dmar_no_gfx_identity_map)
+   return;
+
+   for_each_pci_dev(pdev) {
+   if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+   !IS_GFX_DEVICE(pdev))
+   continue;
+   printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
+   pci_name(pdev));
+   slot = 0;
+   while ((slot = arch_get_ram_range(slot, &base, &size)) >= 0) {
+   ret = iommu_prepare_identity_map(pdev, base, base + size);
+   if (ret)
+   goto error;
+   }
+   continue;
+error:
+   printk(KERN_ERR "IOMMU: mapping reserved region failed\n");
+   }
+}
+
 int __init init_dmars(void)
 {

[patch 6/8] [Intel IOMMU] Doc updates for Intel Virtualization Technology for Directed I/O.

2007-04-09 Thread Ashok Raj

Document Intel IOMMU driver boot option.

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
Index: linux-2.6.21-rc5/Documentation/Intel-IOMMU.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.21-rc5/Documentation/Intel-IOMMU.txt  2007-04-09 
03:05:36.0 -0700
@@ -0,0 +1,119 @@
+Linux IOMMU Support
+===
+
+The architecture spec can be obtained from the below location.
+
+http://www.intel.com/technology/virtualization/
+
+This guide gives a quick cheat sheet for some basic understanding.
+
+Some Keywords
+
+DMAR - DMA remapping
+DRHD - DMA Engine Reporting Structure
+RMRR - Reserved memory Region Reporting Structure
+ZLR  - Zero length reads from PCI devices
+IOVA - IO Virtual address.
+
+Basic stuff
+---
+
+ACPI enumerates and lists the different DMA engines in the platform, and
+device scope relationships between PCI devices and which DMA engine  controls
+them.
+
+What is RMRR?
+-
+
+There are some devices the BIOS controls, e.g. USB devices used to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+unity mappings for these regions for these devices to access these regions.
+
+How is IOVA generated?
+-
+
+Well-behaved drivers call pci_map_*() before sending a command to a device
+that needs to perform DMA. Once DMA is completed and the mapping is no longer
+required, the driver calls pci_unmap_*() to unmap the region.
+
+The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
+device has its own domain (hence protection). Devices under p2p bridges
+share the virtual address with all devices under the p2p bridge due to
+transaction id aliasing for p2p bridges.
+
+IOVA generation is pretty generic. We used the same technique as vmalloc()
+but these are not global address spaces, but separate for each domain.
+Different DMA engines may support different number of domains.
+
+We also allocate guard pages with each mapping, so we can attempt to catch
+any overflow that might happen.
+
+
+Graphics Problems?
+--
+If you encounter issues with graphics devices, you can try adding
+option intel_iommu=igfx_off to turn off the integrated graphics engine.
+
+If it happens to be a PCI device included in the INCLUDE_ALL Engine,
+then try the intel_iommu=gfx_workaround to setup a 1-1 map. We hear
+graphics drivers may be in process of using DMA api's in the near
+future
+
+Some exceptions to IOVA
+---
+Interrupt ranges are not address translated, (0xfee00000 - 0xfeefffff).
+The same is true for peer to peer transactions. Hence we reserve the
+address from PCI MMIO ranges so they are not allocated for IOVA addresses.
+
+
+Fault reporting
+---
+When errors are reported, the DMA engine signals via an interrupt. The fault
+reason and device that caused it with fault reason is printed on console.
+
+See below for sample.
+
+
+Boot Message Sample
+---
+
+Something like this gets printed indicating presence of DMAR tables
+in ACPI.
+
+ACPI: DMAR (v001 A M I  OEMDMAR  0x0001 MSFT 0x0097) @ 0x7f5b5ef0
+
+When DMAR is being processed and initialized by ACPI, prints DMAR locations
+and any RMRR's processed.
+
+ACPI DMAR:Host address width 36
+ACPI DMAR:DRHD (flags: 0x)base: 0xfed9
+ACPI DMAR:DRHD (flags: 0x)base: 0xfed91000
+ACPI DMAR:DRHD (flags: 0x0001)base: 0xfed93000
+ACPI DMAR:RMRR base: 0x000ed000 end: 0x000e
+ACPI DMAR:RMRR base: 0x7f60 end: 0x7fff
+
+When DMAR is enabled for use, you will notice..
+
+PCI-DMA: Using DMAR IOMMU
+
+Fault reporting
+---
+
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+DMAR:[fault reason 05] PTE Write access is not set
+
+TBD
+
+
+- No Performance tuning / analysis yet.
+- sysfs needs useful data to be populated.
+  DMAR info, device scope, stats could be exposed to some extent.
+- Add support to Firmware Developer Kit to test ACPI tables for DMAR.
+- For compatibility testing, could use unity map domain for all devices, just
+  provide a 1-1 for all useful memory under a single domain for all devices.
+- API for paravirt ops for abstracting functionality for VMM folks.
Index: linux-2.6.21-rc5/Documentation/kernel-parameters.txt
===
--- linux-2.6.21-rc5.orig/Documentation/kernel-parameters.txt   2007-04-09 
03:02:37.0 -0700
+++ linux-2.6.21-rc5/Docum

[patch 8/8] [Intel IOMMU] Preserve some Virtual Address when devices cannot address entire range.

2007-04-09 Thread Ashok Raj

Some devices may not support the entire 64-bit DMA address range. When such
devices are co-located in a shared domain, we need to ensure that some low
address space stays reserved for them instead of being depleted by other
devices that can handle high DMA addresses.
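
The allocation decision this leads to can be sketched as a small pure
function. This is a userspace illustration only: iova_alloc_floor is a
hypothetical name, and the real code passes the resulting lower bound to
alloc_iova() rather than returning it.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_32BIT_MASK	0x00000000ffffffffULL

/* Hypothetical helper: lowest address the allocator should consider for
 * a request from a device with the given DMA mask. */
static uint64_t iova_alloc_floor(int shared_domain, uint64_t dev_dma_mask,
				 uint64_t preserve_mask, uint64_t start_addr)
{
	assert(start_addr <= dev_dma_mask);

	/* A device that can reach above the preserved window, sitting in a
	 * shared domain, starts its search just past that window. */
	if (shared_domain && preserve_mask && dev_dma_mask > preserve_mask)
		return preserve_mask + 1;

	/* Otherwise allocate from the normal start of the IOVA space. */
	return start_addr;
}
```

With preserve_4g, for example, a 64-bit capable device in a shared domain
would allocate from 4G upwards, leaving the low 4G for its 32-bit
neighbours.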

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
Index: linux-2.6.21-rc5/Documentation/kernel-parameters.txt
===
--- linux-2.6.21-rc5.orig/Documentation/kernel-parameters.txt   2007-04-09 
03:05:38.0 -0700
+++ linux-2.6.21-rc5/Documentation/kernel-parameters.txt2007-04-09 
03:05:41.0 -0700
@@ -735,6 +735,11 @@
first 16M. The floppy disk could be modified to use
the DMA api's but thats a lot of pain for very small
gain. This option is turned on by default.
+   preserve_{1g/2g/4g/512m/256m/16m}
+   If a device is sharing a domain with other devices
+   and the device mask doesnt cover the 64bit range,
+   use this option to let the iommu code preserve some
+   virtual addr for such devices.
io7=[HW] IO7 for Marvel based alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
Index: linux-2.6.21-rc5/drivers/pci/intel-iommu.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/intel-iommu.c 2007-04-09 
03:05:38.0 -0700
+++ linux-2.6.21-rc5/drivers/pci/intel-iommu.c  2007-04-09 03:06:17.0 
-0700
@@ -90,6 +90,7 @@
 static int dmar_disabled, dmar_force_rw;
 static int dmar_map_gfx = 1, dmar_no_gfx_identity_map = 1;
 static int dmar_fix_isa = 1;
+static u64 dmar_preserve_iova_mask;
 
 static char *get_fault_reason(u8 fault_reason)
 {
@@ -119,6 +120,28 @@
} else if (!strncmp(str, "noisamap", 8)) {
dmar_fix_isa = 0;
printk (KERN_INFO"Intel-IOMMU: Turning off 16M unity map for LPC\n");
+   } else if (!strncmp(str, "preserve_", 9)) {
+   if (!strncmp(str + 9, "4g", 2) ||
+   !strncmp(str + 9, "4G", 2))
+   dmar_preserve_iova_mask = DMA_32BIT_MASK;
+   else if (!strncmp(str + 9, "2g", 2) ||
+   !strncmp(str + 9, "2G", 2))
+   dmar_preserve_iova_mask = DMA_31BIT_MASK;
+   else if (!strncmp(str + 9, "1g", 2) ||
+!strncmp(str + 9, "1G", 2))
+   dmar_preserve_iova_mask = DMA_30BIT_MASK;
+   else if (!strncmp(str + 9, "512m", 4) ||
+!strncmp(str + 9, "512M", 4))
+   dmar_preserve_iova_mask = DMA_29BIT_MASK;
+   else if (!strncmp(str + 9, "256m", 4) ||
+!strncmp(str + 9, "256M", 4))
+   dmar_preserve_iova_mask = DMA_28BIT_MASK;
+   else if (!strncmp(str + 9, "16m", 3) ||
+!strncmp(str + 9, "16M", 3))
+   dmar_preserve_iova_mask = DMA_24BIT_MASK;
+   printk(KERN_INFO
+   "DMAR: Preserved IOVA mask 0x%Lx for devices "
+   "sharing domain\n", dmar_preserve_iova_mask);
}
 
str += strcspn(str, ",");
@@ -1726,9 +1749,10 @@
 * leave rooms for other devices
 */
if ((domain->flags & DOMAIN_FLAG_MULTIPLE_DEVICES) &&
-   pdev->dma_mask > DMA_32BIT_MASK)
+   dmar_preserve_iova_mask &&
+   pdev->dma_mask > dmar_preserve_iova_mask)
iova = alloc_iova(domain, addr, size,
-   DMA_32BIT_MASK + 1, pdev->dma_mask);
+   dmar_preserve_iova_mask + 1, pdev->dma_mask);
else
iova = alloc_iova(domain, addr, size,
IOVA_START_ADDR, pdev->dma_mask);


[patch 0/8] [Intel IOMMU] Support for Intel Virtualization Technology for Directed I/O

2007-04-09 Thread Ashok Raj

Hi,

Pleased to announce support for Intel(R) Virtualization Technology for 
Directed I/O use as an IOMMU in Linux.

This is a series of patches to support the same. 

A brief description of the patches follows.

1. Support for ACPI framework to parse and work with DMA Remapping Tables.
2. Add support for PCI infrastructure to search parent relationships.
3. Hardware support for providing DMA remapping support for Intel Chipsets.
4. Supporting Zero Length Reads on DMAR's not able to support ZLR.
5. Graphics driver workarounds to provide unity map since they don't use the DMA API.
6. Updates to Documentation area for startup options and some basics.
7. Workaround to provide unity map for ISA bridge device to enable floppy disk.
8. Ability to preserve some mappings for devices not able to address entire 
   range.

Please help review and provide feedback.

Cheers,
Ashok Raj & Shaohua Li

[patch 2/8] [Intel IOMMU] Some generic search functions required to lookup device relationships.

2007-04-09 Thread Ashok Raj

PCI support functions for DMAR, to find the parent bridge.

When devices are under a p2p bridge, the requester id of upstream
transactions is replaced by the device id of the bridge, as it owns the
PCIE transaction. Hence it's necessary to set up translations on behalf of
the bridge as well. Due to this limitation, all devices under a p2p bridge
share the same domain in a DMAR.

We also cache whether each device is a native PCIe device
for later use.
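
The walk up the bus hierarchy can be illustrated with a toy device tree.
This is a userspace sketch under simplifying assumptions: struct toy_dev
and its parent_bridge field are invented stand-ins for struct pci_dev and
dev->bus->self, and the BUG_ON() on the bridge type is left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {			/* hypothetical stand-in for struct pci_dev */
	struct toy_dev *parent_bridge;	/* plays the role of dev->bus->self */
	int is_pcie;
};

static struct toy_dev *find_upstream_pcie_bridge(struct toy_dev *dev)
{
	struct toy_dev *last_pci_bridge = NULL;

	assert(dev);
	if (dev->is_pcie)
		return NULL;		/* native PCIe device: nothing to alias */

	while (dev->parent_bridge) {
		dev = dev->parent_bridge;
		if (dev->is_pcie)
			return dev;	/* PCIe-to-PCI bridge owns the transaction */
		last_pci_bridge = dev;	/* legacy p2p bridge, keep climbing */
	}
	return last_pci_bridge;		/* topmost legacy bridge, or NULL */
}
```

A conventional PCI device therefore resolves either to the PCIe-to-PCI
bridge above it or, on a purely legacy hierarchy, to the topmost p2p
bridge.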

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
-

Index: linux-2.6.21-rc5/drivers/pci/pci.h
===
--- linux-2.6.21-rc5.orig/drivers/pci/pci.h 2007-04-03 04:30:44.0 
-0700
+++ linux-2.6.21-rc5/drivers/pci/pci.h  2007-04-03 06:58:58.0 -0700
@@ -90,3 +90,4 @@
return NULL;
 }
 
+struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev);
Index: linux-2.6.21-rc5/drivers/pci/probe.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/probe.c   2007-04-03 04:30:44.0 
-0700
+++ linux-2.6.21-rc5/drivers/pci/probe.c2007-04-03 06:58:58.0 
-0700
@@ -822,6 +822,19 @@
kfree(pci_dev);
 }
 
+static void set_pcie_port_type(struct pci_dev *pdev)
+{
+   int pos;
+   u16 reg16;
+
+   pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
+   if (!pos)
+   return;
+   pdev->is_pcie = 1;
+   pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, &reg16);
+   pdev->pcie_type = (reg16 & PCI_EXP_FLAGS_TYPE) >> 4;
+}
+
 /**
  * pci_cfg_space_size - get the configuration space size of the PCI device.
  * @dev: PCI device
@@ -919,6 +932,7 @@
dev->device = (l >> 16) & 0xffff;
dev->cfg_size = pci_cfg_space_size(dev);
dev->error_state = pci_channel_io_normal;
+   set_pcie_port_type(dev);
 
/* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
   set this higher, assuming the system even supports it.  */
Index: linux-2.6.21-rc5/drivers/pci/search.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/search.c  2007-04-03 04:30:44.0 
-0700
+++ linux-2.6.21-rc5/drivers/pci/search.c   2007-04-03 06:58:58.0 
-0700
@@ -14,6 +14,36 @@
 #include "pci.h"
 
 DECLARE_RWSEM(pci_bus_sem);
+/*
+ * find the upstream PCIE-to-PCI bridge of a PCI device
+ * if the device is PCIE, return NULL
+ * if the device isn't connected to a PCIE bridge (that is its parent is a
+ * legacy PCI bridge and the bridge is directly connected to bus 0), return its
+ * parent
+ */
+struct pci_dev *
+pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
+{
+   struct pci_dev *tmp = NULL;
+
+   if (pdev->is_pcie)
+   return NULL;
+   while (1) {
+   if (!pdev->bus->self)
+   break;
+   pdev = pdev->bus->self;
+   /* a p2p bridge */
+   if (!pdev->is_pcie) {
+   tmp = pdev;
+   continue;
+   }
+   /* PCI device should connect to a PCIE bridge */
+   BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE);
+   return pdev;
+   }
+
+   return tmp;
+}
 
 static struct pci_bus *
 pci_do_find_bus(struct pci_bus* bus, unsigned char busnr)
Index: linux-2.6.21-rc5/include/linux/pci.h
===
--- linux-2.6.21-rc5.orig/include/linux/pci.h   2007-04-03 04:30:51.0 
-0700
+++ linux-2.6.21-rc5/include/linux/pci.h2007-04-03 06:58:58.0 
-0700
@@ -126,6 +126,7 @@
unsigned short  subsystem_device;
unsigned int    class;  /* 3 bytes: (base,sub,prog-if) */
u8  hdr_type;   /* PCI header type (`multi' flag masked out) */
+   u8  pcie_type;  /* PCI-E device/port type */
u8  rom_base_reg;   /* which config register controls the ROM */
u8  pin;    /* which interrupt pin this device uses */
 
@@ -168,6 +169,7 @@
unsigned int    msi_enabled:1;
unsigned int    msix_enabled:1;
unsigned int    is_managed:1;
+   unsigned int    is_pcie:1;
atomic_t    enable_cnt; /* pci_enable_device has been called */

u32 saved_config_space[16]; /* config space saved at suspend time */


[patch 4/8] [Intel IOMMU] Supporting Zero Length Reads in Intel IOMMU.

2007-04-09 Thread Ashok Raj

The PCI specs permit zero-length reads (ZLR) even if the mapping for that
region is write-only. Support for this feature is indicated by a bit
in the DMAR capability register. If a particular DMAR does not support this
capability, we map write-only regions as read-write.

The intel_iommu=forcerw option forces the same read-write mapping, which
also provides a workaround for drivers that request a write-only mapping
when they really should request read-write.
(We ran into one such case in eepro100.c in handling rx_ring_dma.)
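
The resulting protection logic can be sketched in isolation as below. It
is a userspace illustration: the enum and the PTE_READ/PTE_WRITE values
are placeholders rather than the kernel's definitions, while the bit
position used by cap_zlr() follows the macro added in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder direction and protection values for this sketch only. */
enum dma_dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE };
#define PTE_READ	1u
#define PTE_WRITE	2u

/* ZLR support is bit 22 of the capability register, as in the patch. */
static int cap_zlr(uint64_t cap)
{
	return (cap >> 22) & 1;
}

static unsigned int dma_prot(enum dma_dir dir, uint64_t cap, int force_rw)
{
	unsigned int prot = 0;

	assert(dir == DIR_BIDIRECTIONAL || dir == DIR_TO_DEVICE ||
	       dir == DIR_FROM_DEVICE);

	/* Read permission: requested, or needed because the hardware cannot
	 * do zero-length reads on write-only mappings, or forced. */
	if (dir == DIR_TO_DEVICE || dir == DIR_BIDIRECTIONAL ||
	    !cap_zlr(cap) || force_rw)
		prot |= PTE_READ;
	if (dir == DIR_FROM_DEVICE || dir == DIR_BIDIRECTIONAL)
		prot |= PTE_WRITE;
	return prot;
}
```

Only a DMAR that advertises ZLR, with forcerw unset, ends up with a truly
write-only mapping for DMA_FROM_DEVICE.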

Signed-off-by: Ashok Raj <[EMAIL PROTECTED]>
Signed-off-by: Shaohua Li <[EMAIL PROTECTED]>
--
 drivers/pci/intel-iommu.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc5/drivers/pci/intel-iommu.c
===
--- linux-2.6.21-rc5.orig/drivers/pci/intel-iommu.c 2007-04-09 
03:05:25.0 -0700
+++ linux-2.6.21-rc5/drivers/pci/intel-iommu.c  2007-04-09 03:05:32.0 
-0700
@@ -84,7 +84,7 @@
struct sys_device sysdev;
 };
 
-static int dmar_disabled;
+static int dmar_disabled, dmar_force_rw;
 
 static char *get_fault_reason(u8 fault_reason)
 {
@@ -102,6 +102,9 @@
if (!strncmp(str, "off", 3)) {
dmar_disabled = 1;
printk(KERN_INFO"Intel-IOMMU: disabled\n");
+   } else if (!strncmp(str, "forcerw", 7)) {
+   dmar_force_rw = 1;
+   printk(KERN_INFO"Intel-IOMMU: force R/W for W/O mapping\n");
}
str += strcspn(str, ",");
while (*str == ',')
@@ -1668,7 +1671,12 @@
goto error;
}
 
-   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+   /*
+* Check if DMAR supports zero-length reads on write only
+* mappings..
+*/
+   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL || \
+   !cap_zlr(domain->iommu->cap) || dmar_force_rw)
prot |= DMA_PTE_READ;
if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
prot |= DMA_PTE_WRITE;
Index: linux-2.6.21-rc5/include/linux/intel-iommu.h
===
--- linux-2.6.21-rc5.orig/include/linux/intel-iommu.h   2007-04-09 
03:05:25.0 -0700
+++ linux-2.6.21-rc5/include/linux/intel-iommu.h2007-04-09 
03:05:32.0 -0700
@@ -79,6 +79,7 @@
 #define cap_max_fault_reg_offset(c) \
(cap_fault_reg_offset(c) + cap_num_fault_regs(c) * 16)
 
+#define cap_zlr(c) (((c) >> 22) & 1)
 #define cap_isoch(c)   (((c) >> 23) & 1)
 #define cap_mgaw(c)    ((((c) >> 16) & 0x3f) + 1)
 #define cap_sagaw(c)   (((c) >> 8) & 0x1f)


Re: [GIT PULL -mm] Unionfs branch management code

2007-04-09 Thread Josef Sipek

On Mon, Apr 09, 2007 at 10:49:48AM -0700, Andrew Morton wrote:
> On Mon,  9 Apr 2007 10:53:51 -0400 "Josef 'Jeff' Sipek" <[EMAIL PROTECTED]> 
> wrote:
> 
> > The following patches introduce new branch-management code into Unionfs as
> > well as fix a number of stability issues and resource leaks.

First, a quick note...Unionfs used to have branch management code, but the
code was so crufty that we decided to spare the eyes (and brains) of kernel
developers at large by ripping it out. This series of patches just
reintroduces the functionality in a sane way. With that said...

> I have a mental note that unionfs is in the "stuck" state, due to general
> agreement that we should implement this functionality at the VFS level, one
> reason for which is unionfs's upper-vs-lower coherency problems.

Right. The upper-vs-lower coherency problem is indeed a problem, but it is
not a _Unionfs_ problem; rather, it is a _stackable filesystems_ problem.
(eCryptfs suffers from the same issue; people are just less likely to trip
over it, as no one in their right mind modifies encrypted data by hand.) If
we hope to have Linux do stacking (which, I think, makes sense), we need to
make a few changes to the kernel to allow stackable filesystems to work
better and more safely. We're working on an OLS paper which discusses some of
these ideas (some of which we got at LSF back in February). For example, do
we want to have strong or weak cache coherency? When the lower pages
change, do we want to have the VM enforce coherency, or can we use more of
an NFS-like coherency model, checking {a,c,m}time?
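
To make the "weak" option concrete, here is a rough userspace sketch of timestamp-based revalidation. It is only an illustration of the idea, not Unionfs or NFS code: `struct lower_stamp`, `lower_stamp_take()` and `lower_stamp_valid()` are hypothetical names, and the sketch compares only mtime and ctime at one-second granularity via `stat(2)`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical snapshot of the lower file's timestamps, taken at the
 * moment the upper layer caches data derived from it. */
struct lower_stamp {
    time_t mtime;   /* last data modification */
    time_t ctime;   /* last inode (metadata) change */
};

/* Record the current timestamps of the lower file.
 * Returns 0 on success, -1 if stat() fails. */
static int lower_stamp_take(const char *path, struct lower_stamp *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->mtime = st.st_mtime;
    out->ctime = st.st_ctime;
    return 0;
}

/* Weak coherency check: returns 1 if the cached copy can still be
 * trusted, 0 if the lower file changed and the cache must be dropped,
 * -1 if the lower file can no longer be examined. */
static int lower_stamp_valid(const char *path, const struct lower_stamp *old)
{
    struct stat st;

    assert(path != NULL && old != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return st.st_mtime == old->mtime && st.st_ctime == old->ctime;
}
```

The trade-off is the usual one for this model: the check is cheap and needs no VM hooks, but a lower-level change that lands within the timestamp granularity, or between two checks, goes unnoticed until the next revalidation.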

Josef "Jeff" Sipek.

-- 
Intellectuals solve problems; geniuses prevent them
- Albert Einstein

Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Helge Hafting

On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > I have an usb  touchscreen (egalax variety) that works with
> > the 2.6.18 kernel supplied by debian.
> > 
> > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > in question.  Unlike the debian kernel, this kernel don't use
> > modules in order to save boot time.
> > 
> > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > dmesg says things like 
> > usb 3-2: Manufacturer: eGalac Inc.
> > usb 3-2: Product: USB TouchController
> > 
> > and a lot more. Unlike 2.6.18, it never gets around to say
> > "usbcore: registered new driver usbtouchscreen"
> > which seems to indicate a problem.
> > usbcore registers several other drivers, such as usbserial and pl2303
> > that makes the gps work. It also registers other drivers like
> > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > I believe I have turned on every config option for usb touchscreen,
> > this should not be missing.
> > 
> > Is there something wrong, or could there be a seemingly unrelated option
> > that I need to turn on?
> 
> Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> 
Unfortunately, I have:
CONFIG_USB_TOUCHSCREEN=y
CONFIG_USB_TOUCHSCREEN_EGALAX=y

Anything else I may have missed?

Helge Hafting

Re: still curious about that CONFIG_FORCED_INLINING option

2007-04-09 Thread Adrian Bunk

On Mon, Apr 09, 2007 at 01:21:27PM -0400, Robert P. J. Day wrote:
> 
>   i'm curious -- has there been any decision about the config option
> of FORCED_INLINING?  the feature removal file suggests it should have
> disappeared last year.  is it still doing something useful?

It never did anything.

> rday

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

