Re: [PATCH v1] sata_mv: conversion to new EH

2007-01-25 Thread Jeff Garzik

Jeff Garzik wrote:

> This is the first cut at converting sata_mv to new EH.
>
> It builds, but is untested.
>
> Done:
> - freeze, thaw
> - hardreset
> - prereset, software: intentionally not implemented


s/software/softreset/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Add a rounddown_pow_of_two() macro to log2.h.

2007-01-25 Thread Robert P. J. Day
On Thu, 25 Jan 2007, Andrew Morton wrote:

> On Thu, 25 Jan 2007 04:32:12 -0500 (EST)
> "Robert P. J. Day" <[EMAIL PROTECTED]> wrote:

> > +/*
> > + * round down to nearest power of two
> > + */
> > +static inline __attribute__((const))
> > +unsigned long __rounddown_pow_of_two(unsigned long n)
> > +{
> > +   return 1UL << (fls_long(n) - 1);
> > +}
>
> So __rounddown_pow_of_two(16) returns 8?

it does?  but if that was true, so would 17, and 18, and 19 ...  i
didn't actually test this since it seemed so straightforward.
doesn't fls_long() return the most significant bit?  oh, wait ...
reading further ...
 >
> >  /**
> >   * ilog2 - log of base 2 of 32-bit or a 64-bit unsigned value
> >   * @n - parameter
> > @@ -154,4 +174,20 @@ unsigned long __roundup_pow_of_two(unsigned long n)
> > __roundup_pow_of_two(n) \
> >   )
> >
> > +/**
> > + * rounddown_pow_of_two - round the given value down to nearest power of two
> > + * @n - parameter
> > + *
> > + * round the given value down to the nearest power of two
> > + * - the result is undefined when n == 0
> > + * - this can be used to initialise global variables from constant data
> > + */
> > +#define rounddown_pow_of_two(n)\
> > +(  \
> > +   __builtin_constant_p(n) ? ( \
> > +   (n == 1) ? 0 :  \
> > +   (1UL << ilog2(n)) : \
> > +   __rounddown_pow_of_two(n)   \
> > + )
>
> But (1UL << ilog2(16)) returns 16?
>
>
> And, afiact, your __rounddown_pow_of_two() is basically equivalent to (1UL
> << ilog2(n)) anyway.  So a suitable (and less buggy) implementation might be
>
> static inline unsigned long rounddown_pow_of_two(unsigned long n)
> {
>   return (n == 1) ? 0 : (1UL << ilog2(n));
> }

i think you're right.  it's been a long day so give me a few minutes
to convince myself.

rday

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://www.fsdev.dreamhosters.com/wiki/index.php?title=Main_Page
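A quick userspace check of the point under discussion, as a sketch: assuming fls_long() is 1-based like the kernel's fls() (so fls_long(16) == 5), 1UL << (fls_long(n) - 1) and 1UL << ilog2(n) compute the same value, and 16 maps to 16, not 8. The sketch_* helpers are hypothetical stand-ins built on the GCC/Clang builtin __builtin_clzl, not the kernel's <linux/log2.h> code, and they deliberately leave out the (n == 1) ? 0 special case from the quoted versions, treating 1 (2^0) as already a power of two.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for fls_long(): 1-based index of the most significant set bit,
 * or 0 when n == 0. __builtin_clzl() is undefined for 0, hence the guard. */
static unsigned int sketch_fls_long(unsigned long n)
{
	if (n == 0)
		return 0;
	return (unsigned int)(sizeof(n) * CHAR_BIT) - (unsigned int)__builtin_clzl(n);
}

/* Stand-in for ilog2(): floor(log2(n)) == fls_long(n) - 1, for n > 0. */
static unsigned int sketch_ilog2(unsigned long n)
{
	return sketch_fls_long(n) - 1;
}

/* Round n down to a power of two. As in the patch, n == 0 is not allowed;
 * unlike the quoted code, n == 1 is not special-cased. */
static unsigned long sketch_rounddown_pow_of_two(unsigned long n)
{
	assert(n != 0);
	return 1UL << sketch_ilog2(n);
}
```

With these helpers, 17 and 19 both round down to 16, 15 rounds down to 8, and 16 is returned unchanged.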



[PATCH v1] sata_mv: conversion to new EH

2007-01-25 Thread Jeff Garzik
This is the first cut at converting sata_mv to new EH.

It builds, but is untested.

Done:
- freeze, thaw
- hardreset
- prereset, software: intentionally not implemented

Not yet done:
- initiate EH from interrupt handler.  Right now the "whack it"
  old error handling remains, when an error is seen in irq.

So, it should do probing via new EH framework, but any errors
encountered are handled manually.
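The freeze/thaw/hardreset split listed above can be modelled in a few lines. This is a hypothetical userspace sketch, not libata code: struct sketch_port, the sketch_* functions and the fixed freeze, reset, thaw order are assumptions chosen to show how a driver with no softreset (as in this patch) can still recover through hardreset while port interrupts stay masked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified port state; not libata's struct ata_port. */
struct sketch_port {
	int irq_enabled;        /* 1 while the port may raise interrupts */
	int hardresets;         /* number of hardresets performed */
	int irq_seen_in_reset;  /* irq_enabled as observed by the last reset */
};

typedef int (*sketch_reset_fn)(struct sketch_port *);

/* freeze: mask port interrupts so error handling owns the hardware. */
static void sketch_freeze(struct sketch_port *p)
{
	p->irq_enabled = 0;
}

/* thaw: unmask port interrupts once error handling is finished. */
static void sketch_thaw(struct sketch_port *p)
{
	p->irq_enabled = 1;
}

/* Hardreset stand-in that records whether interrupts were masked. */
static int sketch_hardreset(struct sketch_port *p)
{
	p->irq_seen_in_reset = p->irq_enabled;
	p->hardresets++;
	return 0;
}

/* Freeze, try softreset if one exists (NULL here, as in the patch),
 * fall back to hardreset, then thaw. Returns the last reset's result. */
static int sketch_error_handler(struct sketch_port *p,
				sketch_reset_fn softreset,
				sketch_reset_fn hardreset)
{
	int rc = -1;

	sketch_freeze(p);
	if (softreset)
		rc = softreset(p);
	if (rc != 0 && hardreset)
		rc = hardreset(p);
	sketch_thaw(p);
	return rc;
}
```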


diff --git a/drivers/ata/sata_mv.c b/drivers/ata/sata_mv.c
index aae0b52..bfc468c 100644
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@@ -340,8 +340,8 @@ static u32 mv_scr_read(struct ata_port *ap, unsigned int sc_reg_in);
 static void mv_scr_write(struct ata_port *ap, unsigned int sc_reg_in, u32 val);
 static u32 mv5_scr_read(struct ata_port *ap, unsigned int sc_reg_in);
 static void mv5_scr_write(struct ata_port *ap, unsigned int sc_reg_in, u32 val);
-static void mv_phy_reset(struct ata_port *ap);
-static void __mv_phy_reset(struct ata_port *ap, int can_sleep);
+static void __mv_phy_reset(struct ata_port *ap, int can_sleep,
+  unsigned int *class);
 static void mv_host_stop(struct ata_host *host);
 static int mv_port_start(struct ata_port *ap);
 static void mv_port_stop(struct ata_port *ap);
@@ -349,7 +349,9 @@ static void mv_qc_prep(struct ata_queued_cmd *qc);
 static void mv_qc_prep_iie(struct ata_queued_cmd *qc);
 static unsigned int mv_qc_issue(struct ata_queued_cmd *qc);
 static irqreturn_t mv_interrupt(int irq, void *dev_instance);
-static void mv_eng_timeout(struct ata_port *ap);
+static void mv_error_handler(struct ata_port *ap);
+static void mv_eh_freeze(struct ata_port *ap);
+static void mv_eh_thaw(struct ata_port *ap);
 static int mv_init_one(struct pci_dev *pdev, const struct pci_device_id *ent);
 
 static void mv5_phy_errata(struct mv_host_priv *hpriv, void __iomem *mmio,
@@ -402,17 +404,17 @@ static const struct ata_port_operations mv5_ops = {
.exec_command   = ata_exec_command,
.dev_select = ata_std_dev_select,
 
-   .phy_reset  = mv_phy_reset,
-
.qc_prep= mv_qc_prep,
.qc_issue   = mv_qc_issue,
.data_xfer  = ata_mmio_data_xfer,
 
-   .eng_timeout= mv_eng_timeout,
-
.irq_handler= mv_interrupt,
.irq_clear  = mv_irq_clear,
 
+   .error_handler  = mv_error_handler,
+   .freeze = mv_eh_freeze,
+   .thaw   = mv_eh_thaw,
+
.scr_read   = mv5_scr_read,
.scr_write  = mv5_scr_write,
 
@@ -430,17 +432,17 @@ static const struct ata_port_operations mv6_ops = {
.exec_command   = ata_exec_command,
.dev_select = ata_std_dev_select,
 
-   .phy_reset  = mv_phy_reset,
-
.qc_prep= mv_qc_prep,
.qc_issue   = mv_qc_issue,
.data_xfer  = ata_mmio_data_xfer,
 
-   .eng_timeout= mv_eng_timeout,
-
.irq_handler= mv_interrupt,
.irq_clear  = mv_irq_clear,
 
+   .error_handler  = mv_error_handler,
+   .freeze = mv_eh_freeze,
+   .thaw   = mv_eh_thaw,
+
.scr_read   = mv_scr_read,
.scr_write  = mv_scr_write,
 
@@ -458,17 +460,17 @@ static const struct ata_port_operations mv_iie_ops = {
.exec_command   = ata_exec_command,
.dev_select = ata_std_dev_select,
 
-   .phy_reset  = mv_phy_reset,
-
.qc_prep= mv_qc_prep_iie,
.qc_issue   = mv_qc_issue,
.data_xfer  = ata_mmio_data_xfer,
 
-   .eng_timeout= mv_eng_timeout,
-
.irq_handler= mv_interrupt,
.irq_clear  = mv_irq_clear,
 
+   .error_handler  = mv_error_handler,
+   .freeze = mv_eh_freeze,
+   .thaw   = mv_eh_thaw,
+
.scr_read   = mv_scr_read,
.scr_write  = mv_scr_write,
 
@@ -1327,7 +1329,7 @@ static void mv_err_intr(struct ata_port *ap, int reset_allowed)
 
/* check for fatal here and recover if needed */
if (reset_allowed && (EDMA_ERR_FATAL & edma_err_cause))
-   mv_stop_and_reset(ap);
+   mv_stop_and_reset(ap);  /* FIXME: broken for new-EH */
 }
 
 /**
@@ -1911,7 +1913,7 @@ static void mv_stop_and_reset(struct ata_port *ap)
 
mv_channel_reset(hpriv, mmio, ap->port_no);
 
-   __mv_phy_reset(ap, 0);
+   __mv_phy_reset(ap, 0, NULL);
 }
 
 static inline void __msleep(unsigned int msec, int can_sleep)
@@ -1933,7 +1935,8 @@ static inline void __msleep(unsigned int msec, int can_sleep)
  *  Inherited from caller.  This is coded to safe to call at
  *  interrupt level, i.e. it does not sleep.
  */
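The patch gives __mv_phy_reset() an unsigned int *class out-parameter, while mv_stop_and_reset() passes NULL. A common way to make such a parameter optional is to write through it only when it is non-NULL; the sketch below assumes that convention, and sketch_phy_reset() and its class constants are hypothetical, not taken from sata_mv.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classes, for illustration only. */
enum { SKETCH_CLASS_NONE = 0, SKETCH_CLASS_ATA = 1 };

/* Reset routine with an optional out-parameter: the detected class is
 * written through 'class' only when the caller passed a pointer. */
static int sketch_phy_reset(int link_up, unsigned int *class)
{
	unsigned int detected = link_up ? SKETCH_CLASS_ATA : SKETCH_CLASS_NONE;

	if (class)
		*class = detected;
	return link_up ? 0 : -1;
}
```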

Re: [PATCH] nfs: fix congestion control -v4

2007-01-25 Thread Christoph Lameter
On Thu, 25 Jan 2007, Andrew Morton wrote:

> > We have systems with 8TB main memory and are able to get to 16TB.
> 
> But I bet you don't use 4k pages on 'em ;)

IA64 can be configured for 4k pagesize but yes 16k is the default. There 
are plans to go much higher though. Plus there may be other reasons that 
will force us to 4k pagesize on some configurations.


Re: Juju

2007-01-25 Thread Greg KH
On Thu, Jan 25, 2007 at 03:38:24PM -0800, Pete Zaitcev wrote:
> On Thu, 25 Jan 2007 16:18:35 -0500, Kristian Høgsberg <[EMAIL PROTECTED]> 
> wrote:
> 
> > > I see that ORBs are always allocated with a call (like SKB) and not
> > > embedded into drivers (like URBs). It's great, keep it up. Also,
> > > never allow drivers to pass DMA-mapped buffers into fw_send_request
> > > and friends. We made both of these mistakes in USB, and it hurts.
> > 
> > Oh, the ORBs are SBP-2 specific data structures, struct fw_transaction is 
> > probably what corresponds to USB URBs.  This struct is defined in 
> > fw-transaction.h and is available for embedding into other structs, such as 
> > struct sbp2_orb in fw-sbp2.  Is that what you're suggesting against, and what
> > are the problems with this approach?
> 
> Fortunately we do not care about out-of-tree drivers, which are most
> affected, you may even call it a feature ^_^. My main problem is,
> we can't refcount URBs, so usbmon can't tap them and must copy.

urbs are reference counted, it's just that not all drivers who create
them use them that way :(

Perhaps you can enforce this in the new codebase...

thanks,

greg k-h
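Greg's point is that a shared request object can be reference counted, so a monitor such as usbmon could hold its own reference instead of copying. The sketch below shows that get/put pattern in userspace C11; struct sketch_request and its helpers are hypothetical and are neither the USB nor the FireWire API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted request; illustrative names only. */
struct sketch_request {
	atomic_int refcount;
	size_t length;
};

static int sketch_requests_freed; /* counts frees, for the checks below */

/* Allocate a request; the caller owns the first reference. */
static struct sketch_request *sketch_request_alloc(size_t length)
{
	struct sketch_request *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	atomic_init(&req->refcount, 1);
	req->length = length;
	return req;
}

/* A monitor takes its own reference instead of copying the data. */
static struct sketch_request *sketch_request_get(struct sketch_request *req)
{
	atomic_fetch_add(&req->refcount, 1);
	return req;
}

/* Drop a reference; whoever drops the last one frees the request. */
static void sketch_request_put(struct sketch_request *req)
{
	if (atomic_fetch_sub(&req->refcount, 1) == 1) {
		free(req);
		sketch_requests_freed++;
	}
}
```

This only works if every creator follows the same get/put discipline, which is Greg's complaint about drivers that do not.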


Re: [PATCH] namespaces: fix exit race by splitting exit

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 23:26:59 -0600
"Serge E. Hallyn" <[EMAIL PROTECTED]> wrote:

> Fix exit race by splitting the nsproxy putting into two pieces.
> First piece reduces the nsproxy refcount.

This broke

introduce-and-use-get_task_mnt_ns.patch
nsproxy-externalizes-exit_task_namespaces.patch
user-namespace-add-the-framework.patch
user-ns-implement-user-ns-unshare.patch

which I repaired.  Please check the result...


Re: [RFC] Track mlock()ed pages

2007-01-25 Thread Christoph Lameter
On Fri, 26 Jan 2007, Nick Piggin wrote:

> Christoph Lameter wrote:
> > Add NR_MLOCK
> > 
> > Track mlocked pages via a ZVC
> 
> I think it is not quite right. You are tracking the number of ptes
> that point to mlocked pages, which can be >= the actual number of pages.

Mlocked pages are not inherited. I would expect sharing to be very rare.
 
> Also, page_add_anon_rmap still needs to be balanced with page_remove_rmap.

Hmmm 
 
> I can't think of an easy way to do this without per-page state. ie.
> another page flag.

That's what I am trying to avoid.


Re: [PATCH] namespaces: fix exit race by splitting exit

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 23:26:59 -0600
"Serge E. Hallyn" <[EMAIL PROTECTED]> wrote:

> Fix exit race by splitting the nsproxy putting into two pieces.
> First piece reduces the nsproxy refcount.  If we dropped the last
> reference, then it puts the mnt_ns, and returns the nsproxy as a
> hint to the caller.  Else it returns NULL.  The second piece of
> exiting task namespaces sets tsk->nsproxy to NULL, and drops the
> references to other namespaces and frees the nsproxy only if an
> nsproxy was passed in.
> 
> A little awkward and should probably be reworked, but hopefully
> it fixes the NFS oops.

I'm a bit worried about jamming something like this into 2.6.20.  Could the
usual culprits please review this carefully with some urgency?

And Daniel, if you can find time to runtime test it please?
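The two-step exit Serge describes can be sketched as follows. This is a hypothetical, single-threaded model: struct sketch_nsproxy, struct sketch_task and the two helpers are illustrative names, and the real patch's locking and the other namespaces are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an nsproxy; not the kernel structure. */
struct sketch_nsproxy {
	int count;       /* references held by tasks */
	int mnt_ns_put;  /* set once the mount namespace has been released */
};

struct sketch_task {
	struct sketch_nsproxy *nsproxy;
};

/* Step 1: drop this task's reference. If it was the last one, release the
 * mount namespace and return the nsproxy as a hint; otherwise NULL. */
static struct sketch_nsproxy *sketch_put_nsproxy_begin(struct sketch_nsproxy *ns)
{
	if (--ns->count > 0)
		return NULL;
	ns->mnt_ns_put = 1;
	return ns;
}

/* Step 2: detach the task first, then free the nsproxy only if step 1
 * handed it back (the other namespaces would be dropped here too). */
static void sketch_put_nsproxy_finish(struct sketch_task *tsk,
				      struct sketch_nsproxy *hint)
{
	tsk->nsproxy = NULL;
	if (hint)
		free(hint);
}
```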


Re: [RFC] Track mlock()ed pages

2007-01-25 Thread Nick Piggin

Christoph Lameter wrote:

Add NR_MLOCK

Track mlocked pages via a ZVC


I think it is not quite right. You are tracking the number of ptes
that point to mlocked pages, which can be >= the actual number of pages.

Also, page_add_anon_rmap still needs to be balanced with page_remove_rmap.

I can't think of an easy way to do this without per-page state. ie.
another page flag.
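Nick's objection can be illustrated with a small counting model. Assuming a page can be mapped by several mlocked ptes, a counter bumped per pte over-counts, whereas a per-page flag (the "another page flag" option) counts each page once. The types and names below are hypothetical and model only the arithmetic, not the rmap code or the unmap side.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: only the state this comparison needs. */
struct sketch_page {
	bool mlocked;   /* per-page flag, the "another page flag" option */
};

struct sketch_counters {
	long mlocked_ptes;   /* bumped on every mlocked mapping (per pte) */
	long mlocked_pages;  /* bumped once per page, guarded by the flag */
};

/* Account one more mlocked mapping of pg under both schemes. */
static void sketch_map_mlocked(struct sketch_counters *c, struct sketch_page *pg)
{
	c->mlocked_ptes++;
	if (!pg->mlocked) {
		pg->mlocked = true;
		c->mlocked_pages++;
	}
}
```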



Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>



--
SUSE Labs, Novell Inc.


Re: [RFC PATCH -rt 2/2] RCU priority boosting additions to rcutorture

2007-01-25 Thread Josh Triplett
Paul E. McKenney wrote:
> On Thu, Jan 25, 2007 at 11:06:35AM -0800, Josh Triplett wrote:
>> Paul E. McKenney wrote:
>>> On Thu, Jan 25, 2007 at 12:47:04AM -0800, Josh Triplett wrote:
>>>> One major item: this new test feature really needs a new module parameter
>>>> to enable or disable it.
>>> CONFIG_PREEMPT_RCU_BOOST is the parameter -- if not set, then no test.
>>> This parameter is provided by the accompanying RCU-boost patch.
>> It seems useful for rcutorture to use or not use the preempting thread
>> independently of CONFIG_PREEMPT_RCU_BOOST.  That would bring you from two
>> cases to four, and the two new cases both make sense:
>>
>> * CONFIG_PREEMPT_RCU_BOOST=n, but run rcutorture with the preempting thread.
>>   This configuration allows you to demonstrate the need for
>>   CONFIG_PREEMPT_RCU_BOOST, by showing what happens when you need it and don't
>>   have it.
>>
>> * CONFIG_PREEMPT_RCU_BOOST=y, but run rcutorture without the preempting
>>   thread.  This configuration allows you to test with rcutorture while running
>>   a *real* real-time workload rather than the simple preempting thread, or
>>   just test basic RCU functionality.
>>
>> A simple boolean module_param would work here.
> 
> OK, sold!  I will add this.  Perhaps CONFIG_PREEMPT_RCU_TORTURE.

Why a config option?  Why not a module parameter, settable at module load time?

static int enable_preempter;
...
module_param(enable_preempter, bool, 0);
MODULE_PARM_DESC(enable_preempter, "Enable preempting thread, to test RCU priority boosting");
...
rcu_torture_cleanup(void)
{
...
if (enable_preempter && cur_ops->preemptend)
cur_ops->preemptend();
...
if (enable_preempter && cur_ops->preemptstart)
cur_ops->preemptstart();

Then just remove the #ifdef CONFIG_PREEMPT_RCU_BOOST from rcutorture entirely,
and always supply the preempter functions.  rcutorture then doesn't depend on
CONFIG_PREEMPT_RCU_BOOST at all, and the module parameter determines whether
to run the preempter thread.
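A userspace sketch of the gating Josh describes: the preempter hooks run only if the flag is set and the torture flavour provides them. Here enable_preempter is a plain global standing in for the bool module parameter, and struct sketch_torture_ops and the hook bodies are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of a torture ops table: the two optional hooks. */
struct sketch_torture_ops {
	void (*preemptstart)(void);
	void (*preemptend)(void);
};

static bool enable_preempter;  /* stands in for the bool module_param */
static int preempter_running;  /* illustrative state for the checks */

static void sketch_preemptstart(void) { preempter_running = 1; }
static void sketch_preemptend(void) { preempter_running = 0; }

/* Start the preempting thread only when asked and when supported. */
static void sketch_torture_init(const struct sketch_torture_ops *ops)
{
	if (enable_preempter && ops->preemptstart)
		ops->preemptstart();
}

/* Mirror image of init, as in rcu_torture_cleanup() above. */
static void sketch_torture_cleanup(const struct sketch_torture_ops *ops)
{
	if (enable_preempter && ops->preemptend)
		ops->preemptend();
}
```

Because the decision is made at load time rather than build time, the same module covers both the "with preempter" and "without preempter" cases.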

 Paul E. McKenney wrote:
> diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/kernel/rcutorture.c linux-2.6.20-rc4-rt1-rcubtorture/kernel/rcutorture.c
> --- linux-2.6.20-rc4-rt1/kernel/rcutorture.c  2007-01-09 10:59:54.0 -0800
> +++ linux-2.6.20-rc4-rt1-rcubtorture/kernel/rcutorture.c  2007-01-23 11:27:49.0 -0800
> +static int rcu_torture_preempt(void *arg)
> +{
> + int completedstart;
> + time_t gcstart;
> + struct sched_param sp;
> +
> + sp.sched_priority = MAX_RT_PRIO - 1;
> + sched_setscheduler(current, SCHED_RR, &sp);
> + current->flags |= PF_NOFREEZE;
> +
> + do {
> + completedstart = rcu_torture_completed();
> + gcstart = xtime.tv_sec;
> + while ((xtime.tv_sec - gcstart < 10) &&
> +(rcu_torture_completed() == completedstart))
> + cond_resched();
> + if (rcu_torture_completed() == completedstart)
> + rcu_torture_preempt_errors++;
> + schedule_timeout_interruptible(shuffle_interval * HZ);
>>>> Why call schedule_timeout_interruptible here without actually handling
>>>> interruptions?  So that you can send it a signal to cause the shuffle
>>>> early?
>>> It allows you to kill the process in order to get the module unload to
>>> happen more quickly in case someone specified an overly long interval.
>> I didn't actually know that you could kill a kthread from userspace. :)
>>
>> That rationale makes sense.
> 
> It won't actually die, but if I understand correctly (a big "if") the
> signal would cause schedule_timeout_interruptible() to return, allowing
> the kthread_should_stop() check to happen.

Ah, that makes much more sense; thanks.

- Josh Triplett
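A userspace analogue of Paul's explanation, as a sketch under stated assumptions: a sleep that a signal can cut short returns early, so a loop gets to re-check its stop condition (kthread_should_stop() in the kernel) without waiting out a long interval. Here POSIX nanosleep() plays the role of schedule_timeout_interruptible() and a flag set from a SIGALRM handler plays the role of the stop request; none of this is rcutorture code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t sketch_stop_requested;

static void sketch_on_signal(int sig)
{
	(void)sig;
	sketch_stop_requested = 1;
}

/* Install a real handler: with the default action SIGALRM would kill the
 * process, and an ignored signal would not interrupt the sleep at all. */
static int sketch_install_handler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_signal;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}

/* Sleep for up to 'seconds'. Returns 1 if a signal cut the sleep short,
 * so the caller can re-check its stop condition at once; 0 otherwise. */
static int sketch_interruptible_sleep(time_t seconds)
{
	struct timespec req = { .tv_sec = seconds, .tv_nsec = 0 };

	return nanosleep(&req, NULL) == -1 && errno == EINTR;
}
```

A caller would loop while sketch_stop_requested is clear, doing its work and then calling sketch_interruptible_sleep(interval), so a signal shortens the wait before the next check.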



Re: SATA ahci Bug in 2.6.19.x

2007-01-25 Thread Stephen Evanchik

On 1/26/07, Luming Yu <[EMAIL PROTECTED]> wrote:

> Is there any difference in dmesg with acpi=off?
> what is your sata driver?


The only difference is that I don't see the "ACPI: PCI Interrupt
:00:0f.0[B] -> GSI
21 (level, low) -> IRQ 19" printk. The driver is AHCI but the device
is a VIA chip.

I'll get a capture of the boot log when I find my serial cable. This
could be related to the VIA PIC quirks that were changed by Alan.

Stephen


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Thu, 25 Jan 2007, Linus Torvalds wrote:
> 
> Have you noticed that the resource-allocation code _already_ never returns 
> zero on sparc64?

Btw, that was a rhetorical question, and I'm not actually sure what the 
heck sparc64 will _really_ do ;)

I picked sparc64 as an example, because I _think_ that sparc64 actually is 
an example of an architecture that sets up a separate root resource for 
each PCI domain, and they are actually set up so that the ioport regions 
are literally offset to match the hardware bases (and there are several 
different kinds of PCI domain controllers that sparc supports, so those 
bases will depend on that too).

So on sparc64, "ioport_resource" really is just a container for the actual 
per-domain resource buckets that the hardware (within that domain) will 
then do the resource allocation from. Afaik.

But you should actually verify that with somebody like Davem if you 
_really_ care.  I cc'd him in case he wants to pipe up and perhaps prove 
me wrong.


Linus
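To picture the layout Linus describes, where ioport_resource only contains per-domain windows whose ranges sit at the hardware bases, here is a hypothetical bump allocator over one such window. The struct, the function and the example base address are assumptions for illustration, not the kernel's resource API; the point is only that an allocation from a window that starts above zero can never return zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-domain I/O port window; a simplified stand-in for a
 * child of ioport_resource, not the kernel's struct resource. */
struct sketch_window {
	uint64_t start;  /* hardware base of this domain's I/O space */
	uint64_t end;    /* inclusive end of the window */
	uint64_t next;   /* next free address (bump allocator) */
};

/* Allocate size (> 0) bytes aligned to align (a power of two) from the
 * window. Returns 0 on failure; because the window starts above zero,
 * a successful allocation is never 0. */
static uint64_t sketch_alloc(struct sketch_window *w, uint64_t size,
			     uint64_t align)
{
	uint64_t base = (w->next + align - 1) & ~(align - 1);

	if (size == 0 || base < w->start || base + size - 1 > w->end)
		return 0;
	w->next = base + size;
	return base;
}
```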


Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-25 Thread Greg KH
On Thu, Jan 25, 2007 at 10:28:49PM -0500, Theodore Tso wrote:
> On Fri, Jan 26, 2007 at 06:16:13AM +0530, Sunil Naidu wrote:
> > Good thoughts ;-)  I too believe in this - Where there is a Will,
> > there is a Way! That's the reason why I have proposed India as the
> > location for KS 2007, am still awaiting for the response from Theodore
> > Tso.
> 
> I did give you a response.  Find a way to pay for 80+ kernel summit
> invitees to travel to India (preferably in business class :-), and
> we'll talk.  That's not realistic?  Well, then perhaps having the
> concept of holding Kernel Summit in India is not realistic.

Does this mean that the attendees of the 2007 summit in England all get
business class tickets to travel to it?

Sounds good to me!

thanks,

greg k-h


Re: [RFC PATCH -rt 1/2] RCU priority boosting that survives semi-vicious testing

2007-01-25 Thread Josh Triplett
Paul E. McKenney wrote:
> On Thu, Jan 25, 2007 at 11:58:16AM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 25, 2007 at 01:29:23AM -0800, Josh Triplett wrote:
>>> Overall, this code looks sensible to me.  Some comments on the patch below.
> 
> [ . . . ]
> 
>> Thank you again for the careful and thorough review!!!
>>
>> I will test these changes and send out an update.
>
> And here is the updated RCU priority-boost patch.

Thanks for integrating my changes, and so quickly! :)

> Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>

Acked-by: Josh Triplett <[EMAIL PROTECTED]>

- Josh Triplett


Re: [PATCH 7/8] user ns: handle file sigio

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 23:38:08 -0600
"Serge E. Hallyn" <[EMAIL PROTECTED]> wrote:

>  fs/fcntl.c|3 +--
>  include/linux/sched.h |4 +---
>  2 files changed, 2 insertions(+), 5 deletions(-)

My confidence in this lot:

introduce-and-use-get_task_mnt_ns.patch
introduce-and-use-get_task_mnt_ns-tweaks.patch
nsproxy-externalizes-exit_task_namespaces.patch
user-namespace-add-the-framework.patch
user-namespace-add-the-framework-fix.patch
user-namespace-add-the-framework-fixes.patch
user-ns-add-user_namespace-ptr-to-vfsmount.patch
user-ns-add-user_namespace-ptr-to-vfsmount-fixes.patch
# user-ns-hook-permission.patch: Eric no like
user-ns-hook-permission.patch
user-ns-prepare-copy_tree-copy_mnt-and-their-callers-to-handle-errs.patch
user-ns-prepare-copy_tree-copy_mnt-and-their-callers-to-handle-errs-fix.patch
user-ns-implement-shared-mounts.patch
user-ns-implement-shared-mounts-fixes.patch
# user_ns-handle-file-sigio.patch: Eric no like
user_ns-handle-file-sigio.patch
user_ns-handle-file-sigio-fix.patch
user_ns-handle-file-sigio-fix-2.patch
user-ns-implement-user-ns-unshare.patch
user-ns-implement-user-ns-unshare-tidy.patch
#
rename-attach_pid-to-find_attach_pid.patch
attach_pid-with-struct-pid-parameter.patch
remove-find_attach_pid.patch
statically-initialize-struct-pid-for-swapper.patch
explicitly-set-pgid-sid-of-init.patch
#
uts-namespace-remove-config_uts_ns.patch
ipc-namespace-remove-config_ipc_ns.patch

is very low.  If I can by some miracle get all this gunk to compile and
boot a bit, I'd ask that we all very carefully review the patches which
landed and make sure that we're all happy with it.

That's still a couple of days away, I expect.


Re: [PATCH] nfs: fix congestion control -v4

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 21:31:43 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 25 Jan 2007, Andrew Morton wrote:
> 
> > atomic_t is 32-bit.  Put 16TB of memory under writeback and blam.
> 
> We have systems with 8TB main memory and are able to get to 16TB.

But I bet you don't use 4k pages on 'em ;)

> Better change it now.

yup.
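The arithmetic behind "put 16TB of memory under writeback and blam" can be checked directly: 16TB is 2^44 bytes, which is 2^32 pages of 4KB, one more than a 32-bit counter can hold, while 16KB pages need only 2^30. The helpers below are a sketch; they also show that if the counter is treated as a signed 32-bit int (as atomic_t's counter field is on common architectures), 8TB of 4KB pages (2^31) is already out of range.

```c
#include <assert.h>
#include <stdint.h>

/* Pages needed to cover mem_bytes with a given page size (whole pages). */
static uint64_t sketch_pages_for(uint64_t mem_bytes, uint64_t page_size)
{
	return mem_bytes / page_size;
}

/* Would a page count fit a signed 32-bit counter? */
static int sketch_fits_s32(uint64_t pages)
{
	return pages <= (uint64_t)INT32_MAX;
}

/* Would it fit even if the same 32 bits were treated as unsigned? */
static int sketch_fits_u32(uint64_t pages)
{
	return pages <= (uint64_t)UINT32_MAX;
}
```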


[PATCH 2/2] Track number of users in the system

2007-01-25 Thread Srivatsa Vaddagiri
This patch tracks the number of users in a system and divides cpu bandwidth
equally among them.

Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>


---


diff -puN include/linux/sched.h~user-interface include/linux/sched.h
--- linux-2.6.20-rc5/include/linux/sched.h~user-interface   2007-01-25 20:44:53.0 +0530
+++ linux-2.6.20-rc5-vatsa/include/linux/sched.h   2007-01-25 20:44:53.0 +0530
@@ -533,7 +533,18 @@ struct signal_struct {
 
 #ifdef CONFIG_FAIRSCHED
 struct cpu_usage;
-#endif
+
+int sched_alloc_user(struct user_struct *user);
+void sched_free_user(struct user_struct *user);
+void sched_move_task(void);
+
+#else
+
+static inline int sched_alloc_user(struct user_struct *user) { return 0; }
+static inline void sched_free_user(struct user_struct *user) { }
+static inline void sched_move_task(void) { }
+
+#endif /* CONFIG_FAIRSCHED */
 
 /*
  * Some day this will be a full-fledged user tracking system..
@@ -562,6 +573,7 @@ struct user_struct {
 #ifdef CONFIG_FAIRSCHED
int cpu_limit;
struct cpu_usage *cpu_usage;
+   struct list_head fair_sched_list;
 #endif
 };
 
diff -puN kernel/user.c~user-interface kernel/user.c
--- linux-2.6.20-rc5/kernel/user.c~user-interface   2007-01-25 20:44:53.0 +0530
+++ linux-2.6.20-rc5-vatsa/kernel/user.c   2007-01-25 20:44:53.0 +0530
@@ -51,6 +51,10 @@ struct user_struct root_user = {
.uid_keyring= _user_keyring,
.session_keyring = _session_keyring,
 #endif
+#ifdef CONFIG_FAIRSCHED
+   .cpu_limit = 0, /* No limit */
+   .fair_sched_list = LIST_HEAD_INIT(root_user.fair_sched_list),
+#endif
 };
 
 /*
@@ -112,6 +116,7 @@ void free_uid(struct user_struct *up)
if (atomic_dec_and_lock(>__count, _lock)) {
uid_hash_remove(up);
spin_unlock_irqrestore(_lock, flags);
+   sched_free_user(up);
key_put(up->uid_keyring);
key_put(up->session_keyring);
kmem_cache_free(uid_cachep, up);
@@ -153,6 +158,8 @@ struct user_struct * alloc_uid(uid_t uid
return NULL;
}
 
+   sched_alloc_user(new);
+
/*
 * Before adding this, check whether we raced
 * on adding the same user already..
@@ -163,6 +170,7 @@ struct user_struct * alloc_uid(uid_t uid
key_put(new->uid_keyring);
key_put(new->session_keyring);
+   sched_free_user(new);
kmem_cache_free(uid_cachep, new);
} else {
uid_hash_insert(new, hashent);
up = new;
@@ -186,6 +194,7 @@ void switch_uid(struct user_struct *new_
atomic_inc(_user->processes);
atomic_dec(_user->processes);
switch_uid_keyring(new_user);
+   sched_move_task();
current->user = new_user;
 
/*
diff -puN kernel/sched.c~user-interface kernel/sched.c
--- linux-2.6.20-rc5/kernel/sched.c~user-interface  2007-01-25 20:44:53.0 +0530
+++ linux-2.6.20-rc5-vatsa/kernel/sched.c   2007-01-26 09:04:04.0 +0530
@@ -7221,3 +7221,63 @@ void set_curr_task(int cpu, struct task_
 }
 
 #endif
+
+#ifdef CONFIG_FAIRSCHED
+
+static struct list_head user_list = LIST_HEAD_INIT(user_list);
+static atomic_t non_root_users;
+
+static void recalc_user_limits(void)
+{
+   int nr_users;
+   struct user_struct *user;
+
+   nr_users = atomic_read(&non_root_users);
+   if (!nr_users)
+   return;
+
+   list_for_each_entry(user, &user_list, fair_sched_list)
+   user->cpu_limit = 100/nr_users;
+}
+
+/* Allocate cpu_usage structure for the new task-group */
+int sched_alloc_user(struct user_struct *user)
+{
+   int i;
+
+   user->cpu_usage = alloc_percpu(struct cpu_usage);
+   if (!user->cpu_usage)
+   return -ENOMEM;
+
+   for_each_possible_cpu(i) {
+   struct cpu_usage *cu;
+
+   cu = per_cpu_ptr(user->cpu_usage, i);
+   cu->tokens = 1;
+   cu->last_update = 0;
+   cu->starve_count = 0;
+   }
+
+   list_add(&user->fair_sched_list, &user_list);
+   atomic_inc(&non_root_users);
+
+   recalc_user_limits();
+
+   return 0;
+}
+
+/* Deallocate cpu_usage structure */
+void sched_free_user(struct user_struct *user)
+{
+   list_del(&user->fair_sched_list);
+   atomic_dec(&non_root_users);
+   recalc_user_limits();
+   free_percpu(user->cpu_usage);
+}
+
+void sched_move_task(void)
+{
+   clear_tsk_starving(current);
+}
+
+#endif
diff -puN init/Kconfig~user-interface init/Kconfig
--- linux-2.6.20-rc5/init/Kconfig~user-interface  2007-01-25 20:44:53.0 +0530
+++ linux-2.6.20-rc5-vatsa/init/Kconfig 2007-01-25 20:44:54.0 +0530
@@ -249,6 +249,13 @@ config CPUSETS
 
  Say N if unsure.
 
+config FAIRSCHED
+   bool "Fair user CPU scheduler"
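A quick check of the share computed by recalc_user_limits() in this patch, as a sketch that mirrors only its arithmetic (sketch_user_limit() is hypothetical and ignores locking and per-CPU data). Integer division leaves part of the CPU unassigned when 100 is not a multiple of the number of users, and, as the patch reads, more than 100 non-root users would make the per-user limit 0, the value the root_user initialiser documents as "No limit".

```c
#include <assert.h>

/* Mirrors 100 / nr_users from recalc_user_limits(). With no non-root
 * users there is nothing to divide; 0 is returned here to mean "no limit". */
static int sketch_user_limit(int nr_users)
{
	if (nr_users <= 0)
		return 0;
	return 100 / nr_users;
}

/* Percent of the CPU left unassigned by integer rounding. */
static int sketch_unassigned_percent(int nr_users)
{
	if (nr_users <= 0)
		return 0;
	return 100 - nr_users * sketch_user_limit(nr_users);
}
```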

[PATCH 1/2] core scheduler changes

2007-01-25 Thread Srivatsa Vaddagiri

This patch does several things:

- Introduces the notion of a control window (currently set at 1
  sec; ideally the window size should be adjusted based on the
  number of users to avoid rapid context switches). The bandwidth of each
  user is controlled within this window.  rq->last_update tracks where we
  are in the current window.

- Modifies scheduler_tick() to account cpu bandwidth consumption
  by a task group. Basically bandwidth consumed by a task is
  charged to itself (p->time_slice) -and- to its group as well.

- A task is forced off the CPU once its group has expired the
  bandwidth in the current control window. Such a task is also
  marked as "starving".

- schedule() avoids picking tasks whose group has expired its
  bandwidth in the current control window. Any task (with non-zero
  p->time_slice) which is not picked to run in schedule() for this
  reason is marked "starving".

- If a group has bandwidth left and it has starving tasks, then 
  schedule() prefers picking such tasks over non-starving tasks.
  This will avoid starvation of lower-priority tasks in a group.
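The accounting described in these points can be sketched as a per-group token counter that is refilled at each control-window boundary and charged once per tick. This is a hypothetical, single-CPU model: SKETCH_WINDOW_TICKS assumes a 1-second window at HZ=1000, and struct sketch_group and its helpers are illustrative rather than the patch's struct cpu_usage code. It also simplifies starvation tracking: starve_count here counts expiries, whereas the patch marks each task only once via PF_STARVING.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_WINDOW_TICKS 1000UL  /* assumed: 1 s window at HZ=1000 */

/* Hypothetical per-group accounting state. */
struct sketch_group {
	int cpu_limit;          /* percent of each window; 0 = unlimited */
	long tokens;            /* ticks left in the current window */
	unsigned long window;   /* index of the window the tokens belong to */
	int starve_count;       /* expiries seen (simplified) */
};

/* Refill the tokens when 'now' has moved into a new control window. */
static void sketch_maybe_renew(struct sketch_group *g, unsigned long now)
{
	unsigned long w = now / SKETCH_WINDOW_TICKS;

	if (w != g->window) {
		g->window = w;
		g->tokens = (long)g->cpu_limit * (long)SKETCH_WINDOW_TICKS / 100;
	}
}

/* Charge one tick to the group. Returns true when the running task must be
 * forced off the CPU because its group has used up this window's tokens. */
static bool sketch_charge_tick(struct sketch_group *g, unsigned long now)
{
	if (!g->cpu_limit)
		return false;
	sketch_maybe_renew(g, now);
	if (--g->tokens > 0)
		return false;
	g->starve_count++;
	return true;
}
```

With cpu_limit = 50 the group gets 500 ticks per window: the 500th tick forces the task off, and the first tick of the next window starts again with a full allowance.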


Signed-off-by : Srivatsa Vaddagiri <[EMAIL PROTECTED]>

---


diff -puN include/linux/sched.h~cpu-controller-based-on-rtlimit_rt_cpu-patch include/linux/sched.h
--- linux-2.6.20-rc5/include/linux/sched.h~cpu-controller-based-on-rtlimit_rt_cpu-patch  2007-01-19 15:17:27.0 +0530
+++ linux-2.6.20-rc5-vatsa/include/linux/sched.h  2007-01-26 09:04:07.0 +0530
@@ -531,6 +531,10 @@ struct signal_struct {
 #define is_rt_policy(p)  ((p) != SCHED_NORMAL && (p) != SCHED_BATCH)
 #define has_rt_policy(p)   unlikely(is_rt_policy((p)->policy))
 
+#ifdef CONFIG_FAIRSCHED
+struct cpu_usage;
+#endif
+
 /*
  * Some day this will be a full-fledged user tracking system..
  */
@@ -555,6 +559,10 @@ struct user_struct {
/* Hash table maintenance information */
struct list_head uidhash_list;
uid_t uid;
+#ifdef CONFIG_FAIRSCHED
+   int cpu_limit;
+   struct cpu_usage *cpu_usage;
+#endif
 };
 
 extern struct user_struct *find_user(uid_t);
@@ -1137,6 +1145,9 @@ static inline void put_task_struct(struc
/* Not implemented yet, only for 486*/
 #define PF_STARTING0x0002  /* being created */
 #define PF_EXITING 0x0004  /* getting shut down */
+#ifdef CONFIG_FAIRSCHED
+#define PF_STARVING0x0010  /* Task starving for CPU */
+#endif
 #define PF_FORKNOEXEC  0x0040  /* forked but didn't exec */
 #define PF_SUPERPRIV   0x0100  /* used super-user privileges */
 #define PF_DUMPCORE0x0200  /* dumped core */
diff -puN kernel/sched.c~cpu-controller-based-on-rtlimit_rt_cpu-patch kernel/sched.c
--- linux-2.6.20-rc5/kernel/sched.c~cpu-controller-based-on-rtlimit_rt_cpu-patch  2007-01-19 15:17:27.0 +0530
+++ linux-2.6.20-rc5-vatsa/kernel/sched.c   2007-01-26 09:04:07.0 +0530
@@ -266,6 +266,9 @@ struct rq {
unsigned long ttwu_local;
 #endif
struct lock_class_key rq_lock_key;
+#ifdef CONFIG_FAIRSCHED
+   unsigned long last_update;
+#endif
 };
 
 static DEFINE_PER_CPU(struct rq, runqueues);
@@ -710,6 +713,126 @@ enqueue_task_head(struct task_struct *p,
p->array = array;
 }
 
+#ifdef CONFIG_FAIRSCHED
+
+struct cpu_usage {
+	long tokens;
+	unsigned long last_update;
+	int starve_count;
+};
+
+#define task_starving(p)	(p->flags & PF_STARVING)
+
+/* Mark a task starving: either we short-circuited its timeslice or we didn't
+ * pick it to run (because its user ran out of bandwidth in the current epoch).
+ */
+static inline void set_tsk_starving(struct task_struct *p)
+{
+	struct user_struct *user = p->user;
+	struct cpu_usage *cu;
+
+	if (task_starving(p) || !user->cpu_limit)
+		return;
+
+	cu = per_cpu_ptr(user->cpu_usage, task_cpu(p));
+	cu->starve_count++;
+	p->flags |= PF_STARVING;
+}
+
+/* Clear a task's starving flag */
+static inline void clear_tsk_starving(struct task_struct *p)
+{
+	struct user_struct *user = p->user;
+	struct cpu_usage *cu;
+
+	if (!task_starving(p) || !user->cpu_limit)
+		return;
+
+	cu = per_cpu_ptr(user->cpu_usage, task_cpu(p));
+	cu->starve_count--;
+	p->flags &= ~PF_STARVING;
+}
+
+/* Does the task's group have starving tasks? */
+static inline int is_user_starving(struct task_struct *p)
+{
+	struct user_struct *user = p->user;
+	struct cpu_usage *cu;
+
+	if (!user->cpu_limit)
+		return 0;
+
+	cu = per_cpu_ptr(user->cpu_usage, task_cpu(p));
+	if (cu->starve_count)
+		return 1;
+
+	return 0;
+}
+
+/* Are we past the 1-sec control window? If so, all groups get to renew their
+ * expired tokens.
+ 

[RFC] Fair-user scheduler

2007-01-25 Thread Srivatsa Vaddagiri
The current Linux CPU scheduler doesn't recognize process aggregates while
allocating bandwidth. As a result, a user could simply spawn a large
number of processes and get more bandwidth than others.

Here's a patch that provides fair allocation for all users in a system.

Some benchmark numbers with and without the patch applied follow:


                        user "vatsa"            user "guest"
                        (make -s -j4 bzImage)   (make -s -j20 bzImage)

2.6.20-rc5              472.07s (real)          257.48s (real)
2.6.20-rc5+fairsched    766.74s (real)          766.73s (real)


(Numbers taken on a 2-way Intel x86_64 box)

Eventually something like this can be extended to do weighted fair share
scheduling for:

- KVM
- containers
- resource management

Salient features of the patch:

- Based on Ingo's RTLIMIT_RT_CPU patch [1]. The primary difference between
  the RTLIMIT_RT_CPU patch and this one is that this patch handles
  starvation of lower priority tasks in a group, and its accounting
  is token based (rather than a decaying average).

- Retains existing one-runqueue-per-cpu design

- breaks O(1) (ouch!)
  The best way to avoid this is to split the runqueue to be per-user and
  per-cpu, which I have not implemented to keep the patch simple.

- Fairsched-aware SMP load balancing NOT addressed (yet)
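The token-based accounting mentioned above can be pictured with a small user-space model. This is a sketch under stated assumptions, not the patch's code: it assumes each user gets a budget of `limit_ms` CPU-milliseconds per 1-second window (the window the patch's comment refers to), that CPU time is charged in milliseconds, and that expired tokens are simply renewed, not carried over, when a new window starts. `struct user_budget`, `budget_charge()` and `WINDOW_MS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define WINDOW_MS 1000L  /* hypothetical 1-second control window */

/* Hypothetical per-user, per-CPU budget, loosely mirroring struct cpu_usage. */
struct user_budget {
	long tokens;        /* CPU-ms left in the current window */
	long window_start;  /* time (ms) at which the current window began */
	long limit_ms;      /* CPU-ms this user may consume per window */
};

/* If the current window has expired, start a new one and renew the tokens. */
static void budget_maybe_refill(struct user_budget *b, long now_ms)
{
	if (now_ms - b->window_start >= WINDOW_MS) {
		b->window_start = now_ms - (now_ms - b->window_start) % WINDOW_MS;
		b->tokens = b->limit_ms;
	}
}

/*
 * Charge 'ms' of CPU time to the user at time 'now_ms'.
 * Returns true if the user may keep running, false if its tokens are
 * exhausted and its tasks should be treated as starving until the
 * next window begins.
 */
static bool budget_charge(struct user_budget *b, long now_ms, long ms)
{
	budget_maybe_refill(b, now_ms);
	if (b->tokens <= 0)
		return false;
	b->tokens -= ms;
	return true;
}
```

With `limit_ms` = 500, fifty 10 ms charges in the first window use up the budget, a further charge at 500 ms is refused, and the first charge at 1000 ms renews the budget. Because the budget is per user rather than per task, spawning more processes does not buy more tokens; choosing which starving task runs next is the part the patch handles and this sketch leaves out.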

Comments/flames welcome!


References:

1. http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc2/2.6.11-rc2-mm2/broken-out/rlimit_rt_cpu.patch

-- 
Regards,
vatsa


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Fri, 26 Jan 2007, David Woodhouse wrote:
> 
> My question was about _how_ you think this should be achieved in this
> particular case.

I already told you.

Have you noticed that the resource-allocation code _already_ never returns 
zero on a PC?

Have you noticed that the resource-allocation code _already_ never returns 
zero on sparc64?

Your special-case hack would actually be actively *wrong*, because the 
resource-allocation code already does this correctly.

But by "correctly" I very much mean "the architecture has to tell it what 
the allocation rules are".

And different architectures will have different rules.

On x86[-64], the reason the allocation code never returns a zero is 
because of several factors:

 - PCIBIOS_MIN_IO is 0x1000
 - PCIBIOS_MIN_CARDBUS_IO is 0x1000
 - the x86 arch startup code has already reserved all the special ranges, 
   including IO port 0.

so on a PC, no dynamic resource will _ever_ be allocated at zero simply 
because there are several rules in place (that the resource allocation 
code honors) that just make sure it doesn't.
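A toy first-fit allocator shows how rules like these interact: with a lower bound such as PCIBIOS_MIN_IO (0x1000) and low ports already reserved, the search never reaches zero, without any special case for zero. `struct range` and `alloc_io()` are hypothetical; this is a simplified sketch, not the kernel's resource allocator.

```c
#include <assert.h>
#include <stddef.h>

/* A closed interval [start, end] of I/O port numbers. */
struct range {
	unsigned long start, end;
};

static int overlaps(unsigned long s, unsigned long e, const struct range *r)
{
	return s <= r->end && r->start <= e;
}

/*
 * Hypothetical first-fit allocator: find 'size' ports aligned to 'align'
 * (a power of two) inside [min, max], avoiding every busy range.
 * Returns the start of the allocation, or 0 on failure. Using 0 as the
 * failure value is only safe because callers pass min > 0.
 */
static unsigned long alloc_io(unsigned long min, unsigned long max,
			      unsigned long size, unsigned long align,
			      const struct range *busy, size_t nbusy)
{
	unsigned long s = (min + align - 1) & ~(align - 1);

	while (s + size - 1 <= max) {
		unsigned long e = s + size - 1;
		size_t i;

		for (i = 0; i < nbusy; i++)
			if (overlaps(s, e, &busy[i]))
				break;
		if (i == nbusy)
			return s;	/* no conflict: take it */
		/* skip past the conflicting range and realign */
		s = (busy[i].end + 1 + align - 1) & ~(align - 1);
	}
	return 0;
}
```

For instance, with ports 0x0-0xff and 0x1000-0x10ff marked busy (made-up reservations), a 0x100-port request with `min` = 0x1000 lands at 0x1100.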

Now, the reason I say that it needs the architecture-specific code to tell 
it those rules is that on other architectures, the reasons for not 
returning zero may simply be *different*.

For example, on other platforms, as I already tried to explain, the root 
PCI bus itself may not even start at 0 at all. You may have 16 bits worth 
of "PIO space", but for all you know, you may have that for 8 different 
PCI domains, and for all the kernel cares about, the domains may end up 
mapping their PCI IO ports in any random way. For example, what might 
"physically" be port 0 on domain0 might be mapped into the IO port 
resource tree at 0xff00, and "physical port 0" in domain1 might be at 
0xff01 - so that the eight different PCI domains really end up having 
8 separate ranges of 64k ports each, covering 0xff00-0xff07 in the 
IO resource map.

See? I tried to explain this to you already..

If you are a cardbus device, and you get plugged in, your IO port range 
DOES NOT GET allocated from "ioport_resource". No, no, no. It gets 
allocated from the IO resource of the cardbus controller. And _that_ IO 
resource was allocated within the resource that was the PCI bridge for the 
PCI bus that the cardbus controller was on. And so on, all the way up to 
the root bridge on that PCI domain.

And on x86, the root bridge will use the ioport_resource directly, since 
it will always cover the whole area from 0 - IO_SPACE_LIMIT.
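The nesting described above (a CardBus device allocated from its controller's window, which was itself allocated from the bridge above it) can be sketched as each resource carrying a parent pointer, with every allocation confined to the parent's bounds. `struct res` and `res_alloc_child()` are hypothetical, and the bump-pointer placement is a deliberate simplification of real allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource node: a window [start, end] inside its parent. */
struct res {
	unsigned long start, end;
	unsigned long next_free;	/* simplistic bump pointer for children */
	struct res *parent;
};

/*
 * Carve 'size' units for 'child' out of 'parent'. The child can only
 * ever land inside the parent's window, so whether anything can sit at
 * 0 is decided by whoever set up the root, not by the child.
 * Returns 0 on success, -1 if the parent window is exhausted.
 */
static int res_alloc_child(struct res *parent, struct res *child,
			   unsigned long size)
{
	unsigned long s = parent->next_free;

	if (size == 0 || s > parent->end || parent->end - s < size - 1)
		return -1;
	child->start = s;
	child->end = s + size - 1;
	child->next_free = child->start;
	child->parent = parent;
	parent->next_free = child->end + 1;
	return 0;
}
```

For example, a root that hands out space from 0x1000 upward gives a bridge 0x1000-0x1fff, and a CardBus window and a device carved beneath it both start at 0x1000 as well; none of them can end up at 0.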

But that's an *architecture-specific* choice. It makes sense on x86, 
because that's how the IO ports physically work. They're literally tied to 
the CPU, and the CPU ends up being in that sense the "root" of the IO port 
resources.

But on a lot of non-x86 architectures, it actually could make most sense 
to never use "ioport_resource" AT ALL (in which case "cat /proc/ioports" 
will always be empty), and instead make the root PCI controller have its 
IORESOURCE_IO resource be a resource that is then mapped into 
"iomem_resource". Because that's _physically_ how most non-x86 
architectures are literally wired up.

See? Nobody happens to do that (probably because a number of them want to 
actually emulate ISA accesses too, and then you actually want to make the 
PIO accesses look like a different address space, and probably because 
others just copied that, and never did anything different), but the point 
is, the resource allocation code really doesn't care. And it _shouldn't_ 
care. Because the resource allocation code tries to be generic and cater 
to all these *different* ways people hook up hardware.

Now, I should finish this by saying that there's a number of legacy 
issues, like the fact that "ioport_resource" and "iomem_resource" actually 
even *exist* for all platforms. As mentioned, in some cases, it would 
probably actually make more sense to not even have "ioport_resource" at 
all, except perhaps as a sub-resource of "iomem_resource" (*).

So a lot of this ends up then being slightly set up in a certain way - or at 
least only tested in certain configurations - due to various historical 
reasons.

For example, we've never needed to abstract out the IO port address 
translations as much as we did for MMIO. MMIO has always had 
"remap_io_range()" and other translator functions, because even x86 needed 
them. The PIO resources have generally needed less indirection, not only 
because x86 can't even use them *anyway* (for the "iomap" interfaces we 
actually do the mapping totally in software on x86, because the hardware 
cannot do it), but also because quite often, PIO simply isn't used as 
much, and thus we've often ended up having ugly hacks instead of any real 
"IO port abstraction".
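Doing the "iomap" mapping in software can be illustrated with a cookie encoding: port numbers are offset into a low range that no real MMIO mapping occupies, and each accessor checks which range a cookie falls in before choosing PIO or MMIO. The constants `PIO_OFFSET`, `PIO_MASK` and `PIO_RESERVED` below are assumptions for illustration rather than values quoted from any particular kernel version, and the helpers only report which path would be taken.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding constants, for illustration only. */
#define PIO_OFFSET   0x10000UL	/* port p is encoded as p + PIO_OFFSET */
#define PIO_MASK     0x0ffffUL	/* 16 bits of port number */
#define PIO_RESERVED 0x40000UL	/* cookies below this are never MMIO */

enum io_kind { IO_BAD, IO_PIO, IO_MMIO };

/* Encode a port number as an opaque "iomap" cookie. */
static uintptr_t port_map(unsigned long port)
{
	return (uintptr_t)((port & PIO_MASK) + PIO_OFFSET);
}

/* Decide, in software, whether a cookie is a port or a memory address. */
static enum io_kind classify(uintptr_t cookie)
{
	if (cookie >= PIO_RESERVED)
		return IO_MMIO;
	if (cookie >= PIO_OFFSET)
		return IO_PIO;
	return IO_BAD;	/* includes 0: never a valid cookie */
}

/* Recover the port number from a PIO cookie. */
static unsigned long cookie_to_port(uintptr_t cookie)
{
	return (unsigned long)(cookie & PIO_MASK);
}
```

With this encoding, port 0 is carried as the non-zero cookie 0x10000, and a cookie of 0 is classified as invalid.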

For an example of this, look at the IDE driver. It has a lot of crap to 
just allow other architectures to just use their own MMIO accessors 
instead of even trying to abstract PIO some way. So the PIO layer actually 
lacks some 

Re: [PATCH] Fix race in efi variable delete code

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 16:20:56 -0600
Matt Domsch <[EMAIL PROTECTED]> wrote:

> Fix race when deleting an EFI variable and issuing another EFI command on the
> same variable.  The removal of the variable from the efivars_list should be
> done in efivar_delete and not delayed until the kobject release.
> 
> Furthermore, remove the item from the list at module unload time, and
> use list_for_each_entry_safe() rather than list_for_each_safe() for 
> readability.
> 

Does it actually need to use the _safe variant?  That's only needed if the
body of the loop can do list_del() and afaict that doesn't happen here.

>  static void __exit
>  efivars_exit(void)
>  {
> - struct list_head *pos, *n;
> + struct efivar_entry *entry, *n;
>  
> - list_for_each_safe(pos, n, _list)
> - efivar_unregister(get_efivar_entry(pos));
> + list_for_each_entry_safe(entry, n, _list, list) {
> + spin_lock(_lock);
> + list_del(>list);
> + spin_unlock(_lock);
> + efivar_unregister(entry);
> + }

That's not exactly a thing of beauty, sorry ;)

Given that the code is single-threaded here, there's nothing to race
against and I don't think we strictly need any locking at all.  But
consistency is OK.  Given the locking here I'm not sure that the code would
be safe against concurrent removes anyway.

A more idiomatic implementation would do:

while (!list_empty(_list)) {
struct efivar_entry *entry = list_entry(...);
list_del(...)
}

Anyway.  Stuff to think about on a rainy day...
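The drain idiom can be tried in user space. The list helpers below are a simplified re-creation of the kernel's circular `list_head` (only what the example needs), and `struct entry` and `drain()` are hypothetical; in the efivars case the loop body would also take the lock and unregister the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;	/* catch use after removal */
}

struct entry {
	int id;
	struct list_head list;
};

/*
 * Drain idiom: always take the first element, unlink it, then free it.
 * No lookahead pointer is needed because we never advance past a node
 * we have just removed. Returns how many entries were freed.
 */
static int drain(struct list_head *head)
{
	int n = 0;

	while (!list_empty(head)) {
		struct entry *e = list_entry(head->next, struct entry, list);

		list_del(&e->list);
		free(e);
		n++;
	}
	return n;
}
```

The `_safe` iterator is the alternative when a loop has to walk forward past an entry it is removing or freeing; the drain form sidesteps that by always restarting from the head.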


[RFC] Track mlock()ed pages

2007-01-25 Thread Christoph Lameter
Add NR_MLOCK

Track mlocked pages via a ZVC
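A ZVC (zoned VM counter) keeps small per-CPU deltas and folds them into the shared total only when a delta crosses a threshold, which is why counters like NR_MLOCK can be bumped on hot paths. The model below is a simplified user-space sketch of that idea; `NCPUS`, `THRESHOLD` and the function names are assumptions, not the kernel's implementation.

```c
#include <assert.h>

#define NCPUS     4	/* assumed number of CPUs */
#define THRESHOLD 32	/* assumed per-CPU fold threshold */

struct zvc {
	long global;		/* approximate total, readable without summing */
	int diff[NCPUS];	/* per-CPU pending deltas */
};

/* Apply a +1/-1 on 'cpu'; fold into the global count past the threshold. */
static void zvc_mod(struct zvc *z, int cpu, int delta)
{
	z->diff[cpu] += delta;
	if (z->diff[cpu] > THRESHOLD || z->diff[cpu] < -THRESHOLD) {
		z->global += z->diff[cpu];
		z->diff[cpu] = 0;
	}
}

static void zvc_inc(struct zvc *z, int cpu) { zvc_mod(z, cpu, 1); }
static void zvc_dec(struct zvc *z, int cpu) { zvc_mod(z, cpu, -1); }

/* Cheap read: may be off by up to NCPUS * THRESHOLD. */
static long zvc_read(const struct zvc *z)
{
	return z->global < 0 ? 0 : z->global;
}

/* Exact read: fold in every pending per-CPU delta. */
static long zvc_sum(const struct zvc *z)
{
	long s = z->global;

	for (int i = 0; i < NCPUS; i++)
		s += z->diff[i];
	return s;
}
```

Reads of the global value are cheap but approximate; an exact figure needs the per-CPU deltas folded in, as `zvc_sum()` does.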

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6/include/linux/mmzone.h
===
--- linux-2.6.20-rc6.orig/include/linux/mmzone.h	2007-01-25 20:29:58.0 -0800
+++ linux-2.6.20-rc6/include/linux/mmzone.h	2007-01-25 20:31:23.0 -0800
@@ -58,6 +58,7 @@ enum zone_stat_item {
NR_FILE_DIRTY,
NR_WRITEBACK,
/* Second 128 byte cacheline */
+   NR_MLOCK,   /* Mlocked pages */
NR_SLAB_RECLAIMABLE,
NR_SLAB_UNRECLAIMABLE,
NR_PAGETABLE,   /* used for pagetables */
Index: linux-2.6.20-rc6/mm/rmap.c
===
--- linux-2.6.20-rc6.orig/mm/rmap.c 2007-01-25 20:18:38.0 -0800
+++ linux-2.6.20-rc6/mm/rmap.c  2007-01-25 20:31:23.0 -0800
@@ -551,6 +551,8 @@ void page_add_new_anon_rmap(struct page 
 {
atomic_set(>_mapcount, 0); /* elevate count by 1 (starts at -1) */
__page_set_anon_rmap(page, vma, address);
+   if (vma->vm_flags & VM_LOCKED)
+   __inc_zone_page_state(page, NR_MLOCK);
 }
 
 /**
@@ -565,6 +567,16 @@ void page_add_file_rmap(struct page *pag
__inc_zone_page_state(page, NR_FILE_MAPPED);
 }
 
+/*
+ * Add an rmap in a known vma. This allows us to update the mlock counter.
+ */
+void page_add_file_rmap_vma(struct page *page, struct vm_area_struct *vma)
+{
+   page_add_file_rmap(page);
+   if (vma->vm_flags & VM_LOCKED)
+   __inc_zone_page_state(page, NR_MLOCK);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page: page to remove mapping from
@@ -602,6 +614,8 @@ void page_remove_rmap(struct page *page,
__dec_zone_page_state(page,
PageAnon(page) ? NR_ANON_PAGES : 
NR_FILE_MAPPED);
}
+   if (vma->vm_flags & VM_LOCKED)
+   __dec_zone_page_state(page, NR_MLOCK);
 }
 
 /*
Index: linux-2.6.20-rc6/include/linux/rmap.h
===
--- linux-2.6.20-rc6.orig/include/linux/rmap.h	2007-01-25 20:18:38.0 -0800
+++ linux-2.6.20-rc6/include/linux/rmap.h	2007-01-25 20:31:23.0 -0800
@@ -72,6 +72,7 @@ void __anon_vma_link(struct vm_area_stru
 void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_file_rmap(struct page *);
+void page_add_file_rmap_vma(struct page *, struct vm_area_struct *);
 void page_remove_rmap(struct page *, struct vm_area_struct *);
 
 /**
Index: linux-2.6.20-rc6/mm/fremap.c
===
--- linux-2.6.20-rc6.orig/mm/fremap.c   2007-01-25 20:18:38.0 -0800
+++ linux-2.6.20-rc6/mm/fremap.c2007-01-25 20:31:23.0 -0800
@@ -81,7 +81,7 @@ int install_page(struct mm_struct *mm, s
flush_icache_page(vma, page);
pte_val = mk_pte(page, prot);
set_pte_at(mm, addr, pte, pte_val);
-   page_add_file_rmap(page);
+   page_add_file_rmap_vma(page, vma);
update_mmu_cache(vma, addr, pte_val);
lazy_mmu_prot_update(pte_val);
err = 0;
Index: linux-2.6.20-rc6/mm/memory.c
===
--- linux-2.6.20-rc6.orig/mm/memory.c   2007-01-25 20:18:38.0 -0800
+++ linux-2.6.20-rc6/mm/memory.c2007-01-25 20:31:23.0 -0800
@@ -2256,7 +2256,7 @@ retry:
page_add_new_anon_rmap(new_page, vma, address);
} else {
inc_mm_counter(mm, file_rss);
-   page_add_file_rmap(new_page);
+   page_add_file_rmap_vma(new_page, vma);
if (write_access) {
dirty_page = new_page;
get_page(dirty_page);
Index: linux-2.6.20-rc6/drivers/base/node.c
===
--- linux-2.6.20-rc6.orig/drivers/base/node.c	2007-01-25 20:30:17.0 -0800
+++ linux-2.6.20-rc6/drivers/base/node.c	2007-01-25 20:31:23.0 -0800
@@ -60,6 +60,7 @@ static ssize_t node_read_meminfo(struct 
   "Node %d FilePages:%8lu kB\n"
   "Node %d Mapped:   %8lu kB\n"
   "Node %d AnonPages:%8lu kB\n"
+  "Node %d Mlock:%8lu kB\n"
   "Node %d PageTables:   %8lu kB\n"
   "Node %d NFS_Unstable: %8lu kB\n"
   "Node %d Bounce:   %8lu kB\n"
@@ -82,6 +83,7 @@ static ssize_t node_read_meminfo(struct 
   nid, K(node_page_state(nid, NR_FILE_PAGES)),
   nid, K(node_page_state(nid, 

Re: [PATCH] x86_64: fix put_user for 64-bit constant

2007-01-25 Thread Roland McGrath
I'm not aware of any code in the tree triggering it.  We copied some of the
uaccess.h macro guts into macros used in systemtap, and there we hit an
instance of someone producing code that used a large constant and hit the
problem.  Since I noticed the kernel code still had the same bug, I was
just being proactive in propagating the fix back.


Thanks,
Roland


Re: [PATCH 7/8] user ns: handle file sigio

2007-01-25 Thread Serge E. Hallyn
Quoting Serge E. Hallyn ([EMAIL PROTECTED]):
> Quoting Andrew Morton ([EMAIL PROTECTED]):
> > On Wed, 24 Jan 2007 12:58:45 -0600
> > "Serge E. Hallyn" <[EMAIL PROTECTED]> wrote:
> > 
> > > > If we need to I can see doing something special if the process setting
> > > > fown has CAP_KILL
> > > 
> > > Obviously CAP_KILL is insufficient :)  I assume you mean a new
> > > CAP_XNS_CAP_KILL?
> > > 
> > > > and bypassing the security checks that way, but
> > > > hard coding rules like that when it doesn't appear we have any
> > > > experience to indicate we need the extra functionality looks
> > > > premature.
> > > 
> > > Ok, in this case actually I suspect you're right and we can just ditch
> > > the exception.  But in general the security discussion is one we should
> > > still have.
> > 
> > People like security.
> > 
> > Where do we now stand with this patch, and with "[PATCH 4/8] user ns: hook 
> > permission"?
> 
> Later today I can send a patch against this set which removes the
> init_task exceptions (out of patch 3 and patch 7), but I'd prefer
> to leave the MS_SHARED_NS option (patch 6) in.
> 
> thanks,
> -serge

Boots with USER_NS=n (given Cedric's patch to fix that original problem)
and passes my testcases with USER_NS=y.

From: Serge E. Hallyn <[EMAIL PROTECTED]>
Subject: [PATCH] user namespace: remove exceptions for initial namespace

Both sigio and file access checks for user namespace equivalence
were being skipped for processes in the initial namespace.
Remove these exceptions, enforcing the same cross-namespace
checks for all processes in all user namespaces.
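After the patch, both checks reduce to plain equality against the task's own user namespace, with no init_task shortcut. The sketch below models them with hypothetical structs; only the uid comparisons visible in the quoted hunk are included (the rest of the real expression and any security hook are not modeled), and `MNT_SHARE_NS_FLAG` is a hypothetical stand-in for the patch's mount flag.

```c
#include <assert.h>
#include <stdbool.h>

struct user_ns { int id; };	/* hypothetical stand-in */

struct task {
	const struct user_ns *user_ns;
	unsigned int uid, suid;
};

struct fown {
	const struct user_ns *user_ns;
	unsigned int euid;
};

struct mount {
	const struct user_ns *mnt_user_ns;
	unsigned int mnt_flags;
};

#define MNT_SHARE_NS_FLAG 0x1u	/* hypothetical value */

/* Post-patch shape of sigio_perm(): the same user ns is now mandatory. */
static bool sigio_allowed(const struct task *p, const struct fown *f)
{
	if (f->user_ns != p->user_ns)
		return false;
	return f->euid == 0 || f->euid == p->suid || f->euid == p->uid;
}

/* Post-patch shape of task_mnt_same_uidns(). */
static bool mnt_same_uidns(const struct task *t, const struct mount *m)
{
	if (m->mnt_user_ns == t->user_ns)
		return true;
	return (m->mnt_flags & MNT_SHARE_NS_FLAG) != 0;
}
```

For example, a file owner in the initial namespace with euid 0 is now refused for a task in a child namespace, where before the patch the init_task exception let it through.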

Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>

---

 fs/fcntl.c|3 +--
 include/linux/sched.h |4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

939c4da5209a2c00aca70048915007d0eef8ad75
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 6a774c1..d7113d5 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -460,8 +460,7 @@ static const long band_table[NSIGPOLL] =
 static inline int sigio_perm(struct task_struct *p,
  struct fown_struct *fown, int sig)
 {
-   if (fown->user_ns != init_task.nsproxy->user_ns &&
-   fown->user_ns != p->nsproxy->user_ns)
+   if (fown->user_ns != p->nsproxy->user_ns)
return 0;
return (((fown->euid == 0) ||
 (fown->euid == p->suid) || (fown->euid == p->uid) ||
diff --git a/include/linux/sched.h b/include/linux/sched.h
index edbdce2..5c3438b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1614,12 +1614,10 @@ extern int cond_resched_softirq(void);
 static inline int task_mnt_same_uidns(struct task_struct *tsk,
struct vfsmount *mnt)
 {
-   if (tsk->nsproxy == init_task.nsproxy)
+   if (mnt->mnt_user_ns == tsk->nsproxy->user_ns)
return 1;
if (mnt->mnt_flags & MNT_SHARE_NS)
return 1;
-   if (mnt->mnt_user_ns == tsk->nsproxy->user_ns)
-   return 1;
return 0;
 }
 #else
-- 
1.1.6


Re: [PATCH] nfs: fix congestion control -v4

2007-01-25 Thread Christoph Lameter
On Thu, 25 Jan 2007, Andrew Morton wrote:

> atomic_t is 32-bit.  Put 16TB of memory under writeback and blam.

We have systems with 8TB main memory and are able to get to 16TB.
Better change it now.



[PATCH] namespaces: fix exit race by splitting exit

2007-01-25 Thread Serge E. Hallyn
Ok, could you verify that the following patch at least solves
the oopsing?

(I can't reproduce the oops with Daniel's test prog)

thanks,
-serge

From: Serge E. Hallyn <[EMAIL PROTECTED]>
Subject: [PATCH] namespaces: fix exit race by splitting exit

Fix exit race by splitting the nsproxy putting into two pieces.
First piece reduces the nsproxy refcount.  If we dropped the last
reference, then it puts the mnt_ns, and returns the nsproxy as a
hint to the caller.  Else it returns NULL.  The second piece of
exiting task namespaces sets tsk->nsproxy to NULL, and drops the
references to other namespaces and frees the nsproxy only if an
nsproxy was passed in.

A little awkward and should probably be reworked, but hopefully
it fixes the NFS oops.
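The two-phase put can be modeled in user space: the first phase drops the reference and, only if it was the last one, releases the mount namespace early and hands the object back; the second phase frees whatever it was handed. This is a sketch using C11 atomics and hypothetical types (`struct nsproxy_model`, `mnt_refs`), not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct nsproxy_model {
	atomic_int count;
	int *mnt_refs;	/* stand-in for mnt_ns: a shared refcount we hold */
};

/*
 * Phase 1: drop a reference. On the last one, release the mount
 * namespace right away and return the object so the caller can finish
 * the job later; otherwise return NULL.
 */
static struct nsproxy_model *put_phase1(struct nsproxy_model *ns)
{
	if (ns && atomic_fetch_sub(&ns->count, 1) == 1) {
		if (ns->mnt_refs) {
			(*ns->mnt_refs)--;
			ns->mnt_refs = NULL;
		}
		return ns;
	}
	return NULL;
}

/* Phase 2: free only if phase 1 reported that the last reference went. */
static void put_phase2(struct nsproxy_model *ns)
{
	if (ns)
		free(ns);
}
```

In `do_exit()` the patch runs the first phase before `exit_notify()` and the second afterwards, so when the task held the last reference, its mount namespace is released before the parent is notified.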

Signed-off-by: Serge E. Hallyn <[EMAIL PROTECTED]>

---

 include/linux/nsproxy.h |   30 +++---
 kernel/exit.c   |6 --
 kernel/fork.c   |4 ++--
 kernel/nsproxy.c|   16 +++-
 4 files changed, 40 insertions(+), 16 deletions(-)

ab969afa3624aba0bc26dc237d27178137c05d46
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 0b9f0dc..678e1d3 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -35,22 +35,30 @@ struct nsproxy *dup_namespaces(struct ns
 int copy_namespaces(int flags, struct task_struct *tsk);
 void get_task_namespaces(struct task_struct *tsk);
 void free_nsproxy(struct nsproxy *ns);
+struct nsproxy *put_nsproxy(struct nsproxy *ns);
 
-static inline void put_nsproxy(struct nsproxy *ns)
+static inline void finalize_put_nsproxy(struct nsproxy *ns)
 {
-   if (atomic_dec_and_test(>count)) {
+   if (ns)
free_nsproxy(ns);
-   }
 }
 
-static inline void exit_task_namespaces(struct task_struct *p)
+static inline void put_and_finalize_nsproxy(struct nsproxy *ns)
 {
-   struct nsproxy *ns = p->nsproxy;
-   if (ns) {
-   task_lock(p);
-   p->nsproxy = NULL;
-   task_unlock(p);
-   put_nsproxy(ns);
-   }
+   finalize_put_nsproxy(put_nsproxy(ns));
+}
+
+static inline struct nsproxy *preexit_task_namespaces(struct task_struct *p)
+{
+   return put_nsproxy(p->nsproxy);
+}
+
+static inline void exit_task_namespaces(struct task_struct *p,
+   struct nsproxy *ns)
+{
+   task_lock(p);
+   p->nsproxy = NULL;
+   task_unlock(p);
+   finalize_put_nsproxy(ns);
 }
 #endif
diff --git a/kernel/exit.c b/kernel/exit.c
index 3540172..a5bf532 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -396,7 +396,7 @@ void daemonize(const char *name, ...)
current->fs = fs;
atomic_inc(>count);
 
-   exit_task_namespaces(current);
+   put_and_finalize_nsproxy(current->nsproxy);
current->nsproxy = init_task.nsproxy;
get_task_namespaces(current);
 
@@ -853,6 +853,7 @@ static void exit_notify(struct task_stru
 fastcall NORET_TYPE void do_exit(long code)
 {
struct task_struct *tsk = current;
+   struct nsproxy *ns;
int group_dead;
 
profile_task_exit(tsk);
@@ -938,8 +939,9 @@ fastcall NORET_TYPE void do_exit(long co
 
tsk->exit_code = code;
proc_exit_connector(tsk);
+   ns = preexit_task_namespaces(tsk);
exit_notify(tsk);
-   exit_task_namespaces(tsk);
+   exit_task_namespaces(tsk, ns);
 #ifdef CONFIG_NUMA
mpol_free(tsk->mempolicy);
tsk->mempolicy = NULL;
diff --git a/kernel/fork.c b/kernel/fork.c
index fc723e5..4cf8684 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1265,7 +1265,7 @@ static struct task_struct *copy_process(
return p;
 
 bad_fork_cleanup_namespaces:
-   exit_task_namespaces(p);
+   put_and_finalize_nsproxy(p->nsproxy);
 bad_fork_cleanup_keys:
exit_keys(p);
 bad_fork_cleanup_mm:
@@ -1711,7 +1711,7 @@ asmlinkage long sys_unshare(unsigned lon
}
 
if (new_nsproxy)
-   put_nsproxy(new_nsproxy);
+   put_and_finalize_nsproxy(new_nsproxy);
 
 bad_unshare_cleanup_ipc:
if (new_ipc)
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f5b9ee6..7b05bce 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -117,7 +117,7 @@ int copy_namespaces(int flags, struct ta
goto out_pid;
 
 out:
-   put_nsproxy(old_ns);
+   put_and_finalize_nsproxy(old_ns);
return err;
 
 out_pid:
@@ -135,6 +135,20 @@ out_ns:
goto out;
 }
 
+struct nsproxy *put_nsproxy(struct nsproxy *ns)
+{
+   if (ns) {
+   if (atomic_dec_and_test(>count)) {
+   if (ns->mnt_ns) {
+   put_mnt_ns(ns->mnt_ns);
+   ns->mnt_ns = NULL;
+   }
+   return ns;
+   }
+   }
+   return NULL;
+}
+
 void free_nsproxy(struct nsproxy *ns)
 {
if (ns->mnt_ns)
-- 
1.1.6

Re: SATA ahci Bug in 2.6.19.x

2007-01-25 Thread Luming Yu

On 1/26/07, Stephen Evanchik <[EMAIL PROTECTED]> wrote:

On 1/25/07, Luming Yu <[EMAIL PROTECTED]> wrote:
> From the log:
> 2.6.18.3:
> ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 217
> 2.6.20-rc5:
> "ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21"
>
> Sounds like acpi interrupt configure problem. Please try acpi=off first.


Still does not recognize the SATA device (and the machine fails to
come up). I tested this with 2.6.19.2, 2.6.20-rc5 and -rc6 this
evening. I am going to build a vanilla 2.6.18 and see if that still
works as I am currently running an FC5 kernel.


Is there any difference in dmesg with acpi=off?
What is your SATA driver?


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread David Woodhouse
On Thu, 2007-01-25 at 20:48 -0800, Linus Torvalds wrote:
> Irq0 may _exist_. IO Port 0 may _exist_. Virtual address 0 may
> _exist_. 
> 
> Got it?
> 
> But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.

I do understand what you're saying; there's no need to shout. I think
it's very misguided and leads to both internal inconsistency (as
demonstrated by the setup_irq() patch) and external inconsistency with
stuff like hardware documentation. But I _do_ understand what you're
saying.

> When a *DRIVER* sees a [zero], it's always a sign of "not here".

Except when it isn't. Like when it's a DMA address. Or a file
descriptor. Or a CPU number. Or one of numerous other things.

But still, I do understand what you're saying although I disagree with
your intention and your statement above is plain wrong (well, at least
my misquote of it is wrong -- you actually said 'NULL' which is fair
enough, but in the middle of a rant about _zero_ so I edited the quote
to say zero because that's what we're actually talking about).
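The file-descriptor case is easy to demonstrate: POSIX `open()` returns the lowest-numbered unused descriptor, so after `close(0)` a successful `open()` yields descriptor 0, a perfectly valid result. The helper name is hypothetical, and `/dev/null` is assumed to exist.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Close stdin, then open /dev/null: open() must hand back the lowest
 * free descriptor, which is now 0. Returns that descriptor, or -1.
 * Only -1 signals failure; 0 is a real, usable fd.
 */
static int reopen_stdin_as_devnull(void)
{
	close(0);
	return open("/dev/null", O_RDONLY);
}
```

Code that tested `if (!fd)` for failure would misreport this success.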

> But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.
 ...
> NO NORMAL USER SHOULD EVER SEE [zero] AS A REAL IO PORT. 

Yes, that much I understand. We disagree, but I understand you. My last
response was not intending to pursue that part of the discussion.

My question was about _how_ you think this should be achieved in this
particular case. You didn't like the suggestion that we should put your
new special-case hack into the resource code... where/how _do_ you
suggest that it's done, so that we can protect those poor driver authors
from the number zero?

-- 
dwmw2



Re: [PATCH] nfs: fix congestion control -v4

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 16:32:28 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Hopefully the last version ;-)
> 
> 
> ---
> Subject: nfs: fix congestion control
> 
> The current NFS client congestion logic is severely broken: it marks the backing
> device congested during each nfs_writepages() call but doesn't mirror this in
> nfs_writepage(), which makes for deadlocks. Also it implements its own waitqueue.
> 
> Replace this by a more regular congestion implementation that puts a cap on the
> number of active writeback pages and uses the bdi congestion waitqueue.
> 
> Also always use an interruptible wait since it makes sense to be able to 
> SIGKILL the process even for mounts without 'intr'.
> 
> ..
>
> --- linux-2.6-git.orig/include/linux/nfs_fs_sb.h	2007-01-25 16:07:03.0 +0100
> +++ linux-2.6-git/include/linux/nfs_fs_sb.h	2007-01-25 16:07:12.0 +0100
> @@ -82,6 +82,7 @@ struct nfs_server {
>   struct rpc_clnt *   client_acl; /* ACL RPC client handle */
>   struct nfs_iostats *io_stats;   /* I/O statistics */
>   struct backing_dev_info backing_dev_info;
> + atomic_t	writeback;	/* number of writeback pages */

We're going to get in trouble with this sort of thing within a few years. 
atomic_t is 32-bit.  Put 16TB of memory under writeback and blam.
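The arithmetic behind this, assuming 4 KiB pages: 16 TiB is 2^44 bytes, or 2^32 pages, one more than an unsigned 32-bit counter can hold. Since atomic_t wraps a signed int, the signed limit of 2^31 - 1 pages is already exceeded at 8 TiB (2^31 pages). The helpers below are hypothetical and just do that arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12	/* assumed 4 KiB pages */

/* Number of 4 KiB pages in 'bytes'. */
static uint64_t pages_for(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT_4K;
}

/* Does 'pages' fit in a signed 32-bit counter such as atomic_t's int? */
static int fits_in_int32(uint64_t pages)
{
	return pages <= (uint64_t)INT32_MAX;
}
```

An `atomic_long_t` counter, which is 64 bits wide on 64-bit kernels, avoids the limit.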



Re: [PATCH] nfs: fix congestion control -v4

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 16:32:28 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> +long congestion_wait_interruptible(int rw, long timeout)
> +{
> + long ret;
> + DEFINE_WAIT(wait);
> + wait_queue_head_t *wqh = _wqh[rw];
> +
> + prepare_to_wait(wqh, , TASK_INTERRUPTIBLE);
> + if (signal_pending(current))
> + ret = -ERESTARTSYS;
> + else
> + ret = io_schedule_timeout(timeout);
> + finish_wait(wqh, );
> + return ret;
> +}
> +EXPORT_SYMBOL(congestion_wait_interruptible);

I think this can share code with congestion_wait()?

static long __congestion_wait(int rw, long timeout, int state)
{
	long ret;
	DEFINE_WAIT(wait);
	wait_queue_head_t *wqh = _wqh[rw];

	prepare_to_wait(wqh, &wait, state);
	ret = io_schedule_timeout(timeout);
	finish_wait(wqh, &wait);
	return ret;
}

long congestion_wait_interruptible(int rw, long timeout)
{
	long ret = __congestion_wait(rw, timeout, TASK_INTERRUPTIBLE);

	if (signal_pending(current))
		ret = -ERESTARTSYS;
	return ret;
}

it's only infinitesimally less efficient..


Re: [PATCH 2.6.20-rc5 1/1] MM: enhance Linux swap subsystem

2007-01-25 Thread yunfeng zhang

The current test is based on the following observation from my previous mail:


Current Linux page allocation provides pages fairly to every process. Since the
swap daemon is only started when memory is low, by the time it starts to scan
active_list the private pages of different processes are mixed up with each
other. vmscan.c:shrink_list() is the only path that attaches a disk swap page
to a page on active_list, so as a result all private pages lose their affinity
on the swap partition. I will give a test later...



Three test cases are simulated here:
1) matrix: Some software does a lot of matrix arithmetic in its PrivateVMA;
  this access pattern suits pps much better than stock Linux.
2) c: C malloc uses an algorithm similar to slab, so when an application resumes
  from the swap partition, suppose it touches three variables of different
  sizes. As a result it touches three pages that are close to each other but
  not contiguous. I also simulate the case where the application then starts
  running at full speed (touching more pages).
3) destruction: Typically, if an application resumes because the user clicked
  the close button, it visits all of its private data to run object
  destruction.

Test steps
1) Run ./entry and answer y to it; root privileges may be needed.
2) Wait until it echoes 'primarywait'.
3) swapoff -a && swapon -a.
4) Run ./hog until count = 10.
5) 'cat primary entry secondary > /dev/null'
6) Run 'cat /proc/vmstat' several times and record the 'pswpin' field once it is stable.
7) Type `1', `2' or `3' to select one of the 3 test cases; answer `2' to start the fullspeed test case.
8) Record the new 'pswpin' field.
9) Which is better? Compare the 'pswpin' increments.
   pswpin is incremented in mm/page_io.c:swap_readpage.
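Steps 6 and 8 can be scripted rather than read by eye. The sketch below, with hypothetical helper names, extracts a named counter such as `pswpin` from the text of /proc/vmstat, whose lines have the form `name value`; the difference between two readings is the increment the test compares.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "name value" in /proc/vmstat-style text. Returns 0 and stores
 * the value on success, -1 if the field is absent or malformed.
 */
static int vmstat_field(const char *text, const char *name,
			unsigned long *out)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len + 1, "%lu", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/vmstat into buf (NUL-terminated); returns bytes read or -1. */
static long read_vmstat(char *buf, size_t size)
{
	FILE *f = fopen("/proc/vmstat", "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}
```

For example, read the file once at step 6 and again at step 8, call `vmstat_field(buf, "pswpin", &v)` on each, and subtract. The buffer should be generously sized, since /proc/vmstat can run to several kilobytes.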

Purpose of the test steps
1) Step 1: 'entry' wakes up 'primary' and 'secondary' simultaneously; every time
  'primary' allocates a page, 'secondary' inserts some pages into active_list
  close to it.
2) Step 3: re-allocate swap pages.
3) Step 4: flush 'entry', 'primary' and 'secondary' to the swap partition.
4) Step 5: make the file contents of 'entry', 'primary' and 'secondary' present in memory.

The test cases were run in a VMware 5.5 virtual machine with 32MB of memory. If
you question my setup, run your own tests following these steps:
1) Run multiple memory consumers together and make them pause at a point
  (so that all private pages are mixed up in the pg_active list).
2) Flush them to the swap partition.
3) Wake up one of them, let it run at full speed for a while, and record pswpin
  from /proc/vmstat.
4) Invalidate all read-ahead pages.
5) Wake up another one and repeat the test.
6) It's also good if you can record the hard-disk LED blinking :)
Perhaps your test resumes all memory consumers together, so Linux reads ahead
some pages close to the faulting page that belong to other processes.

By the way, what is the linux-mm mailing list? It isn't mentioned in Documentation/SubmitPatches.

In fact, you will find that during the Linux column runs the hard-disk LED blinks every 5 seconds.
-------------------------------
                Linux     pps
matrix           5241    1597
                 5322    1620
                 (81)    (23)

c                8028    1937
                 8095    1954
fullspeed        8313    1964
                 (67)    (17)
                (218)    (10)

destruction      9461    4445
                 9825    4484
                (364)    (39)

With the memset() clause in secondary.c commented out, so that 'secondary'
doesn't interrupt page allocation in 'primary':
-------------------------------
                Linux     pps
matrix            207      38
                  256      59
                 (49)    (21)

c                1273     347
                 1341     383
fullspeed        1362     386
                 (68)    (36)
                 (21)     (3)

destruction      2435    1178
                 2513    1246
                 (78)    (68)

entry.c
-
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

pid_t pids[4];
int sem_set;
siginfo_t si;

int main(int argc, char **argv)
{
int i, data;
unsigned short init_data[4] = { 1, 1, 1, 1 };
if ((sem_set = semget(123321, 4, IPC_CREAT)) == -1)
goto failed;
if (semctl(sem_set, 0, SETALL, _data) == -1)
goto failed;
pid_t pid = vfork();
if (pid == -1) {
goto failed;
} else if (pid == 0) {
if (execlp("./primary", "./primary", (char *)NULL) == -1)
goto failed;
} else {
pids[0] = pid;
}
pid = vfork();
if (pid == -1) {
goto failed;
} else if (pid == 0) {
if (execlp("./secondary", "./secondary", (char *)NULL) == -1)
  

Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Jeff Garzik

David Woodhouse wrote:

When I eventually get to go home, which will hopefully still be some
time this month, I'll give some more coherent thought to the idea of
just using a (struct irq_desc *) directly instead of an integer. Then


Tejun beat you to it with devres.  devres makes all sorts of resource 
allocation and tracking loads easier.
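The core idea of managed resources is that everything acquired during probe is recorded against the device and released automatically, in reverse order, when the driver detaches, so error paths and remove functions no longer need hand-written unwind code. The sketch below is a hypothetical user-space model of that idea (`struct mini_devres`, `mdr_add()` and `mdr_release_all()` are invented names); it is not the devres API.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*release_fn)(void *data);

/* One tracked resource: how to release it, and what to release. */
struct mdr_node {
	release_fn release;
	void *data;
	struct mdr_node *next;	/* singly linked, newest first */
};

struct mini_devres {
	struct mdr_node *head;
};

/* Record a resource; returns 0, or -1 if tracking itself failed. */
static int mdr_add(struct mini_devres *d, release_fn fn, void *data)
{
	struct mdr_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->release = fn;
	n->data = data;
	n->next = d->head;
	d->head = n;
	return 0;
}

/* On detach: release everything, most recently acquired first. */
static void mdr_release_all(struct mini_devres *d)
{
	while (d->head) {
		struct mdr_node *n = d->head;

		d->head = n->next;
		n->release(n->data);
		free(n);
	}
}

/* Example release callback: logs the order in which data was released. */
static int release_log[8];
static int release_count;

static void log_release(void *data)
{
	if (release_count < 8)
		release_log[release_count++] = *(int *)data;
}
```

Acquiring an IRQ, an I/O range and a memory window in that order would release them memory window first and IRQ last.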


Learn it, love it, lick it.

:)

Jeff




Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Fri, 26 Jan 2007, David Woodhouse wrote:
>
> Of course it could, but then again why shouldn't we special-case zero in
> _all_ of those use cases, just to make life easier for those poor driver
> authors who presumably can't manage to write userspace code using stdin
> or open() either.

You're not thinking, or, more likely YOU DON'T WANT TO.

What is so hard to understand? IRQ0 is a _real_irq_ as far as the 
*platform* goes even on PC's. Same goes for IO-port 0 and MMIO area zero.

Same goes for virtual address zero in a lot of user programs. It's a real 
virtual address. The fact that the NULL pointer points there doesn't make 
it less real.

Do you really have such a hard time understanding this fundamental issue?

Irq0 may _exist_. IO Port 0 may _exist_. Virtual address 0 may _exist_. 

Got it?

But they ARE NOT VALID THINGS FOR DRIVERS TO WORRY ABOUT.

When a *DRIVER* sees a NULL pointer, it's always a sign of "not here". 

It's not a "valid pointer that just happens to be at virtual address 
zero". But in other contexts it actually _can_ be exactly that (ie when 
you call "mmap(NULL .. MAP_FIXED)" you actually will get a NULL pointer 
that is a *REAL POINTER*).

Similarly, when a *DRIVER* sees a "port 0" or MMIO 0, it is perfectly 
valid for that driver to say "ok, my device apparently doesn't have an IO 
port".

That does not mean that "port 0" doesn't exist. It just means that AS FAR 
AS A DRIVER IS CONCERNED, port 0 is not a valid port for a driver!

Port 0 actually *does* exist on a PC. It's a system port (it also happens 
to be a total legacy port that nobody would ever use, but that's another 
issue entirely).

The same thing is true of irq 0. It exists. It's a valid IRQ for 
architecture code for a PC. It's just NOT A VALID IRQ FOR A DRIVER! So 
when a driver sees a device with !irq, it knows that the irq hasn't been 
allocated.

I don't understand why this is so hard for you to accept. Even when you 
yourself accept that irq0 actually *exists*, and even when you give as an 
example why "setup_irq()" must be able to take that irq, you give that as 
some kind of ass-hat example of why you are "right". Now you do exactly 
the same thing for the IO port space.

You're totally confused. You say the words, and you seem to understand 
that device drivers are special. But then you don't seem to follow that 
understanding through, and you want to then say that everything else is 
special too.

Don't you get it? If everybody is special, then nobody is special.

System code is special. System code can do things that drivers shouldn't 
do. System code can know that irq0 is actually the timer, and can set it 
up. System code can know that IO port 0 is actually decoded by the old and 
insane AT architecture DMA controller.

This is not even kernel-specific. "normal programs" think that NULL is 
special, and is never a valid pointer. But "system programs" may actually 
know better. If you're programming in DOS-EMU, and use vm86 mode, you know 
that you actually need to map the virtual address that is at 0, and NULL 
is suddenly not actually an invalid pointer: it's literally a pointer to 
the first byte in the vm86 model.

But when "malloc()" returns NULL, you know that it doesn't return such a 
"system pointer", so when malloc returns NULL, you realize that it's a 
special value.

The *EXACT* same thing is true within the kernel. When x86 architecture 
code explicitly sets up IRQ0, it does so because it knows that it's the 
timer interrupt. But that doesn't make it a valid irq for *anybody* else.

Ok, enough shouting.

Comprende? Do you _really_ think that the NULL pointer "doesn't exist"? Or 
can you realize that it's generally just a convention, and it's a 
convention that has been selected to be maximally easy to test for (both 
on a code generation level and on a C syntax level)? It doesn't mean that 
virtual address 0 "doesn't exist", and could not be mapped. 

The exact same thing is true of "IO port 0". It's the maximally simple 
_convention_ for something that may actually exist, but it's something that 
NO NORMAL USER SHOULD EVER SEE AS A REAL IO PORT. There are special users 
that may use it, exactly the same way special users who know deep hardware 
details may decide that "on this architecture, the NULL pointer actually 
_literally_ means virtual address zero, and when I do *xyz* I actually can 
access things through it".

Does the fact that some things can use NULL as meaning something else than 
"no pointer" invalidate NULL as a concept? No. It just means that those 
things are very architecture-specific. They're not "common code" aka 
"drivers" any more.

Same exact deal.

Linus


Re: Juju

2007-01-25 Thread Pete Zaitcev
On Fri, 26 Jan 2007 03:35:19 +0100, Stefan Richter <[EMAIL PROTECTED]> wrote:

> The fundamental thing about SBP-2 is that ORBs ( = SCSI command blocks
> plus SBP-2 header) and data buffers all reside in the memory of the
> initiator (or of a 3rd party on the FireWire bus).

I recognize the concept, I worked with SRP in Infiniband a bit.

> The target wrote an SBP-2 status block into our memory. The status block
> contains the FireWire bus address of the ORB to which it belongs. [...]

I see. SRP has a more flexible tag which can be used to look up
the just completed command more effectively. But if we only submit
one, it's a moot point of course.

> [...] Since there aren't many
> mapped ORBs per target, a linked list is a reasonable data structure to
> search over.

Righto. I'm used to having thousands of outstanding commands in arrays.

-- Pete


Re: SATA ahci Bug in 2.6.19.x

2007-01-25 Thread Stephen Evanchik

On 1/25/07, Luming Yu <[EMAIL PROTECTED]> wrote:

From the log:
2.6.18.3:
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 217
2.6.20-rc5:
"ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21"

Sounds like an ACPI interrupt configuration problem. Please try acpi=off first.



Still does not recognize the SATA device (and the machine fails to
come up). I tested this with 2.6.19.2, 2.6.20-rc5 and -rc6 this
evening. I am going to build a vanilla 2.6.18 and see if that still
works as I am currently running an FC5 kernel.

Stephen


--
Stephen Evanchik
http://stephen.evanchik.com


Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-25 Thread Josh Boyer

On 1/25/07, Theodore Tso <[EMAIL PROTECTED]> wrote:
 (Unless perhaps in some conspiracy theory scenario where

Microsoft pays $$$ to some VC company to sponsor an event in Moscow,
and then contracts out to the KGB to fill the meeting room with an
aerosolized powder of Polonium 210 to kill off all of the top Linux
developers in one fell swoop.  But that sort of thing only happens in
spy novels.  :-)


And if it did, we would be sad to be sure.  But source code never dies ;)

josh


2.6 I/O schedulers per thread?

2007-01-25 Thread Siddharth Taneja

Hi,

Are the I/O scheduler queues in the 2.6 kernel (particularly for CFQ)
maintained per thread or per process? If they are per process, is there a
plan to make them per thread?

Thanks a lot.

Siddharth


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread David Woodhouse
On Thu, 2007-01-25 at 20:00 -0800, Linus Torvalds wrote:
> The resource code really is totally agnostic, and you're barking up the 
> wrong tree there. In many ways, the resource code isn't even about "IO 
> resources", it could do other things too. 

Of course it could, but then again why shouldn't we special-case zero in
_all_ of those use cases, just to make life easier for those poor driver
authors who presumably can't manage to write userspace code using stdin
or open() either.

I've already _said_ I think it's barking up the wrong tree; I think we
should just fix the driver code. But you disagree, so I'm trying to
understand what _you_ think the answer is here -- how the architecture
code should cope with this new decree that PIO address zero is invalid
even though it's actually always been OK in the past.

On this particular platform I believe that the PCI I/O resources are
allocated by firmware before we boot, and they look something like
this...

-007f : /[EMAIL PROTECTED]
  -000f : pcmcia_socket0 (the CF card at zero)
  1000-11ff : PCI CardBus #11
  1400-15ff : PCI CardBus #11
00802000-01001fff : /[EMAIL PROTECTED]
  00802400-008024ff : :00:10.0
ff7fe000-dfff : /[EMAIL PROTECTED]

Since you don't like the idea that the resource code should special-case
zero, are you instead suggesting that we should _reassign_ the
already-assigned PCI resources when we boot, just to make sure that
driver authors don't have to deal with zeroes (at least, not until they
start to think about DMA)? Or are you thinking of some other solution?

-- 
dwmw2



Re: Juju

2007-01-25 Thread Kristian Høgsberg

Stefan Richter wrote:

Pete Zaitcev wrote:

On Thu, 25 Jan 2007 16:18:35 -0500, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:

...
will do a status write to the status address specified in the ORB, at which 
point the SBP-2 transaction is complete.

You know, I wanted to use this picture for a long time:
 http://www.flickr.com/photos/zaitcev/369269557/


Haha, sure :)


The fundamental thing about SBP-2 is that ORBs ( = SCSI command blocks
plus SBP-2 header) and data buffers all reside in the memory of the
initiator (or of a 3rd party on the FireWire bus). The target peeks and
pokes them when and how it sees fit. The initiator pushes only tiny
notifications about availability of new ORBs to the target. The target
eventually completes SCSI commands in-order or out-of-order and signals
so by pushing a status block per one or more completed commands.

(Juju's fw-sbp2 gives only one command at a time to the target.
Mainline's sbp2 can optionally give more commands in a row, but the
implementation is subtly broken in several ways and therefore disabled
by default until I fix it right after hell froze over.)

Another important thing to know in order to understand fw-sbp2 and sbp2
is that they currently rely on OHCI-1394's physical DMA feature, which
I'll not explain here. It means two things: 1. FireWire bus addresses of
ORBs and buffers are directly derived from the DMA mapped address.
(FireWire bus addresses are the addresses used in communication between
SBP-2 initiator and target.) 2. Almost all of the transfers done by the
target do not generate interrupts. (Just the status write generates an
interrupt.)


Another thing that probably makes my explanation a little confusing is that 
there are two types of transactions: FireWire transactions, which consist of a 
request followed by a response and are pretty much the smallest interaction 
you can have with a remote device.  Then there are SBP-2 transactions, which 
are a higher-level sequence layered on top of FireWire transactions.  An SBP-2 
transaction consists of a sequence of FireWire transactions, the first of 
which is initiated by the initiator.  This is the FireWire transaction that 
complete_transaction handles.  When this first FireWire transaction finishes 
successfully, we know that the SBP-2 transaction has been started and we sit 
back and wait for the target to do its part.  If that initial FireWire 
transaction fails, we need to fail the SBP-2 transaction we were trying to start.



...

Now that you drew my attention to sbp2_status_write(), this looks wrong:

/* Lookup the orb corresponding to this status write. */
spin_lock_irqsave(&sd->lock, flags);
list_for_each_entry(orb, &sd->orb_list, link) {
	if (status_get_orb_high(status) == 0 &&
	    status_get_orb_low(status) == orb->request_bus) {
		list_del(&orb->link);
		break;
	}
}
spin_unlock_irqrestore(&sd->lock, flags);

Why is it that fw_request can't carry a pointer?


The target wrote an SBP-2 status block into our memory. The status block
contains the FireWire bus address of the ORB to which it belongs. Juju's
fw-sbp2 does the same as mainline's sbp2: Looking through the pile of
unfinished ORBs for one with the same FireWire bus address, which was
previously derived from the DMA mapped address.


But the status write actually does carry the address of the ORB it signals the 
completion of.  So in theory, we could just read out the ORB address from the 
status write packet and map that back to kernel virtual memory and do an 
appropriate container_of() call and we should have the struct sbp2_orb 
pointer.  The reason I still search through the list is of course that this is 
way too much trust to put into hardware as buggy as external storage devices. 
Blindly dereferencing a pointer returned by storage driver firmware is 
probably a very bad idea.


One thing I want to do (though very low priority) is to allocate the ORBs out 
of a preallocated circular buffer.  We can then check that the ORB pointer 
returned in the status write points into this buffer and that it's a multiple 
of the ORB size, at which point it should be safe to dereference it.
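A sketch of that check, assuming a hypothetical ring of fixed-size ORBs whose bus addresses are contiguous starting at `bus_base` (the ORB size, ring length and all names are illustrative, not fw-sbp2's): the address from the status block is accepted only if it lies inside the ring and on an ORB boundary, and only then mapped back to a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; a real driver would use the SBP-2 ORB layout. */
#define ORB_SIZE	32u
#define ORB_COUNT	16u

struct orb {
	unsigned char data[ORB_SIZE];
};

struct orb_ring {
	uint64_t bus_base;		/* bus address of ring[0] */
	struct orb ring[ORB_COUNT];	/* contiguous, so addresses are linear */
};

/* Map a target-supplied bus address back to an ORB, or NULL if bogus. */
static struct orb *orb_from_bus_addr(struct orb_ring *r, uint64_t addr)
{
	uint64_t off;

	assert(r != NULL);
	if (addr < r->bus_base)
		return NULL;		/* below the ring */
	off = addr - r->bus_base;
	if (off >= (uint64_t)ORB_SIZE * ORB_COUNT)
		return NULL;		/* past the end of the ring */
	if (off % ORB_SIZE != 0)
		return NULL;		/* not on an ORB boundary */
	return &r->ring[off / ORB_SIZE];
}
```

Anything the target reports outside those bounds yields NULL, which the caller can treat exactly like the current "no matching ORB in the list" case.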


> Since there aren't many

mapped ORBs per target, a linked list is a reasonable data structure to
search over. That said --- Kristian, doesn't fw-sbp2 have at most 1 ORB
in sd->orb_list?


Yes, there is only ever one pending ORB in the list, so looking through the 
list is not exactly a time sink :)


Kristian



Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Fri, 26 Jan 2007, David Woodhouse wrote:
> 
> It's not just "my laptop", I believe. It's the generic resource code,
> which is happy to assign address zero since it's never been taught that
> zero is now a special case. If we're not going to ask for the bug I
> observed to be fixed -- if we're going to declare that driver authors
> don't have to sober up and clean up their code -- then the resource code
> should be modified accordingly.

The resource code really is totally agnostic, and you're barking up the 
wrong tree there. In many ways, the resource code isn't even about "IO 
resources", it could do other things too.

[ In practice, of course, IO resources is all it does and what it was 
  designed for, since there really aren't a lot of hierarchical things 
  that need to be able to nest and handle byte-range-like things. ]

It's really up to the architecture-specific PCI initialization what the 
PCI resources look like. The resource code just takes whatever resource 
layout it is given. Yes, there's a "root" ioport_resource, but that's just 
the container for the whole PCI resource tree, and generally you'd show 
the different PCI domains exposed with their buses in that tree.

Of course, for all the historical reasons (a single domain, and it was 
written for a PC), on PC's, the root PCI bus just points directly to the 
root io port resource. But the way things work is that your cardbus card 
doesn't just allocate space from that "ioport_resource" itself. No, it 
allocates space from the cardbus controller resources, which in turn have 
allocated space from the PCI bridge controller resources, etc etc all the 
way up to whatever is the PCI root resource.

There *are* drivers that use the "ioport_resource" directly, but they are 
system devices (where "ISA" counts as a system device - augh: it's not 
enumerable or discoverable) which know where they go. But a normal driver 
never does in any modern world.

So the way to make sure that PCI devices get allocated in the proper area 
is not to change the resource manager, but to make sure that the 
architecture initializes the root bridges for all the domains properly.

(A lot of them do the "PC thing", of course: they just make the ioport 
resource the direct parent of the root bridge, and that's ok if the root 
really _is_ supposed to cover everything from zero. On a PC, that's 
actually the right thing to do, because the system devices will insert 
themselves into the low area, and then PCIBIOS_MIN_IO - 0x1000 on a PC - 
is used as the minimum for any *dynamic* allocation.)
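To make the nesting concrete, here is a toy model in C. It is a simplified first-fit bump allocator, not the kernel's `allocate_resource()`; `struct res`, `res_init()` and `res_alloc()` are hypothetical. A child range is always carved out of its parent's window, and the `min` argument plays the role of `PCIBIOS_MIN_IO` (0x1000 on a PC), so dynamic allocations stay clear of the low area that system devices claim.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy resource: an inclusive address window plus a bump pointer. */
struct res {
	unsigned long start, end;	/* inclusive bounds */
	unsigned long next_free;	/* lowest address not yet handed out */
};

static void res_init(struct res *r, unsigned long start, unsigned long end)
{
	assert(start <= end);
	r->start = start;
	r->end = end;
	r->next_free = start;
}

/*
 * Allocate 'size' bytes aligned to 'align' (a power of two), at or above
 * 'min', from inside 'parent'. On success fill in 'child' and return true.
 */
static bool res_alloc(struct res *parent, struct res *child,
		      unsigned long size, unsigned long align, unsigned long min)
{
	unsigned long base, last;

	assert(size > 0 && align != 0 && (align & (align - 1)) == 0);
	base = parent->next_free > min ? parent->next_free : min;
	base = (base + align - 1) & ~(align - 1);
	last = base + size - 1;
	if (base < parent->start || last < base || last > parent->end)
		return false;		/* does not fit in the parent window */
	res_init(child, base, last);
	parent->next_free = last + 1;
	return true;
}
```

In this model the root window legitimately starts at 0, yet a bridge allocated with `min = 0x1000` never receives a range starting at 0, and a card allocated from that bridge only ever sees addresses inside the bridge window.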

PCI PIO/IOMEM resource allocation is actually fairly complicated, and most 
people really *really* never need to care. It should be considered a sign 
of how well the resource code works that it all usually works without most 
people ever really needing to understand it.

Linus


Re: + oops-in-drivers-net-shaperc.patch added to -mm tree

2007-01-25 Thread David Miller
From: [EMAIL PROTECTED]
Date: Wed, 24 Jan 2007 19:54:51 -0800

> Hi,
> 
> The following code:
> [...]
> 
> Causes the following oops:
> 
 ...
> [   66.355188]  [] error_code+0x7c/0x84
> [   66.355192]  [] packet_sendmsg+0x147/0x201 [af_packet]
> [   66.355199]  [] sock_sendmsg+0xf9/0x116
> [   66.355204]  [] sys_sendto+0xbf/0xe0
> [   66.355208]  [] sys_socketcall+0x1aa/0x277
> [   66.355212]  [] sysenter_past_esp+0x5f/0x99
> [   66.355216]  ===
> [   66.355218] Code:  Bad EIP value.
> [   66.355223] EIP: [<>] 0x0 SS:ESP 0068:f6261d70
> 
> shaper_header() should check for shaper->dev not being NULL (ie. the
> shaper was actually attached) as in the following patch.
> This happens in mainline too (tested 2.6.19.2).
> 
> Signed-off-by: Frederik Deweerdt <[EMAIL PROTECTED]>
> Cc: "David S. Miller" <[EMAIL PROTECTED]>
> Cc: Stephen Hemminger <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

Shaper is actually OK.  None of these hardware header callbacks
should be invoked if the device is down.  Yet, this is what is
accidentally being allowed in the AF_PACKET socket layer.

Shaper makes sure to fail ->open() if shaper->dev is NULL, in order
to prevent this.

But AF_PACKET does its check of device state too late, after the
dev->header() call.  That's the bug.

I'll fix it like this:

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 594c078..6dc01bd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -359,6 +359,10 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
if (dev == NULL)
goto out_unlock;

+   err = -ENETDOWN;
+   if (!(dev->flags & IFF_UP))
+   goto out_unlock;
+
/*
 *  You may not queue a frame bigger than the mtu. This is the lowest level
 *  raw protocol and you must do your own fragmentation at this level.
@@ -407,10 +411,6 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
if (err)
goto out_free;
 
-   err = -ENETDOWN;
-   if (!(dev->flags & IFF_UP))
-   goto out_free;
-
/*
 *  Now send it
 */
@@ -738,6 +738,10 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
if (sock->type == SOCK_RAW)
reserve = dev->hard_header_len;
 
+   err = -ENETDOWN;
+   if (!(dev->flags & IFF_UP))
+   goto out_unlock;
+
err = -EMSGSIZE;
if (len > dev->mtu+reserve)
goto out_unlock;
@@ -770,10 +774,6 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
skb->dev = dev;
skb->priority = sk->sk_priority;
 
-   err = -ENETDOWN;
-   if (!(dev->flags & IFF_UP))
-   goto out_free;
-
/*
 *  Now send it
 */


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread David Woodhouse
On Thu, 2007-01-25 at 18:58 -0800, Linus Torvalds wrote:
> And it's why I decreed, that the ONLY SANE THING is to just let people do 
> the obvious thing:
> 
> if (!dev->irq)
> return -ENODEV;
> 
> you don't have to know ANYTHING, and that code just works, and just looks 
> obvious. And you know what? If it causes a bit of pain for some platform 
> maintainer, I don't care one whit. Because it's obviously much better than 
> the alternatives.

I do understand the benefits; I'm just dubious about the trade-off. If
we could _completely_ isolate our poor crack-addled driver authors from
the nasty number zero then perhaps I'd be less unimpressed, but as it is
they still have to wake up and smell the coffee when it comes to DMA
addresses _anyway_.

When I eventually get to go home, which will hopefully still be some
time this month, I'll give some more coherent thought to the idea of
just using a (struct irq_desc *) directly instead of an integer. Then
you get to use NULL as a special case still. It's not as if people
should be pulling 'raw' IRQ numbers out of a hat or even module
parameters these days, except for ISA drivers where they can do
something like isa_irq[7] for the parallel port etc. Although that kind
of stuff should be done through a platform_device anyway too these days.
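A hypothetical sketch of that idea (the `toy_irq_desc` type and helpers are made up; they are not the kernel's `struct irq_desc`): platform code can still hand out a descriptor for hardware IRQ 0, while a driver's only "no interrupt" test is a NULL check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opaque IRQ handle; hwirq 0 is perfectly legal here. */
struct toy_irq_desc {
	unsigned int hwirq;
};

static struct toy_irq_desc toy_irq_table[16];

/* Platform code: return the descriptor for an IRQ number it knows exists. */
static struct toy_irq_desc *toy_irq_lookup(unsigned int hwirq)
{
	if (hwirq >= sizeof(toy_irq_table) / sizeof(toy_irq_table[0]))
		return NULL;
	toy_irq_table[hwirq].hwirq = hwirq;
	return &toy_irq_table[hwirq];
}

/* Driver code: the only "no interrupt" test it ever needs is !desc. */
static int toy_driver_has_irq(const struct toy_irq_desc *desc)
{
	return desc != NULL;
}
```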

> But in the meantime: if nobody complains, and it happens to work on 
> hardware even though some devices _can_ see a port of zero, I also
> don't care. So I'm certainly not going to claim that your laptop "must
> be fixed". If it works, it works. Hey fine.

It's not just "my laptop", I believe. It's the generic resource code,
which is happy to assign address zero since it's never been taught that
zero is now a special case. If we're not going to ask for the bug I
observed to be fixed -- if we're going to declare that driver authors
don't have to sober up and clean up their code -- then the resource code
should be modified accordingly.

-- 
dwmw2



Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-25 Thread Theodore Tso
On Fri, Jan 26, 2007 at 06:16:13AM +0530, Sunil Naidu wrote:
> Good thoughts ;-)  I too believe in this - Where there is a Will,
> there is a Way! That's the reason why I have proposed India as the
> location for KS 2007, am still awaiting for the response from Theodore
> Tso.

I did give you a response.  Find a way to pay for 80+ kernel summit
invitees to travel to India (preferably in business class :-), and
we'll talk.  That's not realistic?  Well, then perhaps having the
concept of holding Kernel Summit in India is not realistic.

As Dirk has pointed out, the Kernel Summit is a little unusual
compared to events such as FOSDEM or FISL, where there are 4000-5000
attendees, and the emphasis is on the power of a large number of
people in the OSS community.  The Kernel Summit is a very different
event, in that it is by-invitation with less than 100 people.  The
whole point is to get the top contributors together to be able to talk
amongst themselves in a high bandwidth environment.  You can't do that
amongst a crowd of 800, never mind 2000 or 4000.

So the only reason why any organization would be willing to pay so
that top contributors would come to some country like India would be
to attract visibility and excitement to some big conference or
other big OSS/Linux initiative that happened right after the kernel
summit.  But quite frankly, I personally wouldn't consider it a wise
use of money; it would cost a heck of a lot of money and there are
plenty of other, more cost effective ways to promote a big OSS
conference in India.

And if there's no business case for the Indian government or some
local Indian companies to pay to fly all of the KS attendees to India,
why in the world do you think that companies like HP, Intel, IBM, Red
Hat, Novell, etc. will pay for their employees to travel to the Kernel
Summit?  They have even less incentive than the local
Indian companies/government to do so!  Maybe during the dot-com
madness of the late 1990's, when people spent money like crazy on
things that made no business sense whatsoever, but those days are long
gone.  Money doesn't grow on trees any more, if it ever did.

The main reason why we are trying a one-year experiment in Cambridge
is because approximately 1/3rd of the KS attendees are from Europe.
At the moment I believe we have exactly one person from India, who has
been selected through her own merit, to attend the Kernel Summit.  So
does it make sense to fly everyone else to India?  It doesn't seem so
to me!  

So the real answer to how to get the Kernel Summit to happen in India?
Bring a very large number of developers together in India.  Get them
to work really hard, encourage them to participate on LKML, and
produce lots of useful patches.  Eventually, some of them will do
enough good work that they will be recognized as maintainers of key
subsystems.  When there are 25-30+ people from India who have done
enough for the Linux kernel community and risen to be recognized as
top contributors in the Linux world such that they are invited to the
Kernel Summit on their own merits, I'm sure a Kernel Summit in
India would very quickly follow.

Still, if someone wants to pay a vast quantity of money to pay travel
for all so that the KS can be held in some exotic location (especially
if it's Waikiki beach, or Aspen Colorado during the skiing season),
I'm sure people will be willing to listen.  But realistically, it just
doesn't make sense, so it's not likely someone would make us such an
offer.  (Unless perhaps in some conspiracy theory scenario where
Microsoft pays $$$ to some VC company to sponsor an event in Moscow,
and then contracts out to the KGB to fill the meeting room with an
aerosolized powder of Polonium 210 to kill off all of the top Linux
developers in one fell swoop.  But that sort of thing only happens in
spy novels.  :-)

Regards,

- Ted


Re: SATA ahci Bug in 2.6.19.x

2007-01-25 Thread Luming Yu

From the log:

2.6.18.3:
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 217
2.6.20-rc5:
"ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21"

Sounds like an ACPI interrupt configuration problem. Please try acpi=off first.


Re: [PATCH] Add a rounddown_pow_of_two() macro to log2.h.

2007-01-25 Thread Andrew Morton
On Thu, 25 Jan 2007 04:32:12 -0500 (EST)
"Robert P. J. Day" <[EMAIL PROTECTED]> wrote:

> 
>   In the same way that include/linux/log2.h defines the
> roundup_pow_of_two() macro, define the rounddown_pow_of_two() macro so
> people can stop re-implementing this operation using a loop.
> 
> Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>
> 
> ---
> 
>   compile tested on x86 using "make allyesconfig", but there wasn't
> much chance of the build failing anyway since the patch only adds the
> macro definition, it doesn't change any existing code to use it.
> those patches will be submitted later, bit by bit.
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index d02e1a5..6cf7081 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -52,6 +63,15 @@ unsigned long __roundup_pow_of_two(unsigned long n)
>   return 1UL << fls_long(n - 1);
>  }
> 
> +/*
> + * round down to nearest power of two
> + */
> +static inline __attribute__((const))
> +unsigned long __rounddown_pow_of_two(unsigned long n)
> +{
> + return 1UL << (fls_long(n) - 1);
> +}

So __rounddown_pow_of_two(16) returns 8?

>  /**
>   * ilog2 - log of base 2 of 32-bit or a 64-bit unsigned value
>   * @n - parameter
> @@ -154,4 +174,20 @@ unsigned long __roundup_pow_of_two(unsigned long n)
>   __roundup_pow_of_two(n) \
>   )
> 
> +/**
> + * rounddown_pow_of_two - round the given value down to nearest power of two
> + * @n - parameter
> + *
> + * round the given value down to the nearest power of two
> + * - the result is undefined when n == 0
> + * - this can be used to initialise global variables from constant data
> + */
> +#define rounddown_pow_of_two(n)  \
> +(\
> + __builtin_constant_p(n) ? ( \
> + (n == 1) ? 0 :  \
> + (1UL << ilog2(n)) : \
> + __rounddown_pow_of_two(n)   \
> + )

But (1UL << ilog2(16)) returns 16?


And, afaict, your __rounddown_pow_of_two() is basically equivalent to (1UL
<< ilog2(n)) anyway.  So a suitable (and less buggy) implementation might be

static inline unsigned long rounddown_pow_of_two(unsigned long n)
{
	return (n == 1) ? 0 : (1UL << ilog2(n));
}

But I'm not sure.  Please create a userspace test harness to test this
patch.
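One way to answer both of Andrew's questions is the userspace harness below. It is a sketch: `fls_long()` and `ilog2_ul()` are re-implemented with the GCC/Clang `__builtin_clzl()` builtin on the assumption that they match the kernel's semantics for non-zero arguments, and the candidates are compared against a simple loop for every n up to a limit. Since 2^0 = 1, rounding 1 down to a power of two should presumably give 1, so the ilog2-based candidate here deliberately drops the `(n == 1) ? 0` special case.

```c
#include <assert.h>

/* Userspace fls_long(): 1-based index of the most significant set bit. */
static unsigned int fls_long(unsigned long x)
{
	return x ? (unsigned int)(8 * sizeof(x)) - __builtin_clzl(x) : 0;
}

static unsigned int ilog2_ul(unsigned long n)
{
	assert(n != 0);			/* log2(0) is undefined */
	return fls_long(n) - 1;
}

/* Candidate from the patch (non-constant branch); undefined for n == 0. */
static unsigned long rounddown_fls(unsigned long n)
{
	return 1UL << (fls_long(n) - 1);
}

/* Candidate along the lines suggested in review, minus the n == 1 case. */
static unsigned long rounddown_ilog2(unsigned long n)
{
	return 1UL << ilog2_ul(n);
}

/* Obviously-correct reference: largest power of two <= n, for n >= 1. */
static unsigned long rounddown_ref(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Compare both candidates for 1..limit; return first mismatch, or 0. */
static unsigned long check_rounddown(unsigned long limit)
{
	unsigned long n;

	for (n = 1; n <= limit; n++) {
		unsigned long want = rounddown_ref(n);

		if (rounddown_fls(n) != want || rounddown_ilog2(n) != want)
			return n;
	}
	return 0;
}
```

With this harness, `rounddown_fls(16)` and `rounddown_ilog2(16)` both come out as 16, not 8, and `check_rounddown()` reports the first n at which either candidate disagrees with the reference loop.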


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Fri, 26 Jan 2007, David Woodhouse wrote:
> 
> The quality of our drivers is low; I'm fully aware that trying to
> improve driver quality is a quixotic task.

This is an important point. I actually hold things like the kernel VM 
layer to _much_ higher standards than I would ever hold a driver writer. 
That said, we tend to have "0 is special" even there, although we tend to 
try to make it always be about a pointer for those cases.

But drivers really are not just the bulk of the code, but they also can't 
be tested nearly as well. So for that reason (and really _only_ for that 
reason), we should always just accept that a "driver" should never have to 
worry, and *all* of the inconvenience of some special case or whatever 
must be solidly elsewhere.

> But where do we draw the line? Should we abandon the dma-mapping stuff 
> too? Declare that page zero is a special case and you can't DMA to it? 
> Should we try to make every PCI write also do a read in order to flush 
> posted writes, because people can't cope with the real world?

I think we should try to aim for three things:

 - things that "just work" on a PC should generally "just work" everywhere 
   else. Just for the simple reason that most drivers never get tested 
   anywhere else.

 - if it can't "just work", we should have as many static checks as 
   possible, and not let it compile.

 - and if it does compile, the stuff that "looks simple" should just work.

For example, the reason "0" is special really _is_ about the compiler. For 
any integer (or pointer, or FP) type in C, there really is just one 
special value: 0 (or NULL, for pointers).

The reason why resources work fine with zeroes is that for a *resource*, 
zero isn't actually a special value at all. Why? A resource isn't an 
integer value. It's a struct.

So compiler type safety (or lack thereof) really ends up forcing some of 
these issues. If something is represented as an integral type, the only 
*obvious* real special case is always going to be 0, just because it's the 
only one you can "test for" implicitly. Similarly, if you return a 
pointer, you only have NULL that can sanely be used as a special value.

(Yeah, mmap() has taught some people about MAP_FAILED, but that's pretty 
unusual too. And in the VFS layer, we use the magic "error pointers" that 
actually encode error values in the bits too - but then, VFS people tend 
to have to know a lot more about the kernel than a driver writer should 
have to - VFS people are just held to higher standards).

With a pointer, NULL is usually ok anyway, and always tends to mean 
something special - and for the other stuff you can then hide things 
"behind" the pointer. So with a pointer, you could often have "ptr->valid" 
or something else. With an integer, you can't really do that, because 0 
always remains special, just because EVEN JUST BY MISTAKE the test of the 
integer actually just ends up making zero be special.

So if you want a separate "enable" value, it almost has to be a structure 
or some opaque type, because that's the only type where there isn't an 
implicit special value.

We've done it occasionally. The VM has done it, for example. And we do it 
in drivers for more complex cases (and resources is one such case: it's 
already not a single value, since it has a start and a length, and other 
structure).

We could have done it for interrupts too. A "struct irqnum" that has a bit 
that specifies "valid". That would work. But it tends to be painful, so it 
really has to give you something more than "zero is disabled".

It's just not worth it.

And that's why I decreed that the ONLY SANE THING is to just let people do 
the obvious thing:

	if (!dev->irq)
		return -ENODEV;

you don't have to know ANYTHING, and that code just works, and just looks 
obvious. And you know what? If it causes a bit of pain for some platform 
maintainer, I don't care one whit. Because it's obviously much better than 
the alternatives.

We may not "need" that rule for IO ports. If IO port 0 "happens to work", 
then hey, fine. But on the other hand, _all_ the same arguments really 
end up being still 100% true. If some driver happens to write

	if (!dev->ioport)
		return -ENODEV;

then I say: go for it. The _driver_ is correct. It's the obvious thing to 
do. It works on PCs, and it's simple and looks fine. Why not? If it 
causes some minor heartburn for an architecture maintainer, so what? 
Really? The tradeoff is obvious, and the architecture maintainer needs to 
just go and fix his IO mappings so that no device ever sees a "valid" port 
at port 0.

But in the meantime: if nobody complains, and it happens to work on 
hardware even though some devices _can_ see a port of zero, I also don't 
care. So I'm certainly not going to claim that your laptop "must be 
fixed". If it works, it works. Hey, fine.

But the first driver that doesn't work because it thought it didn't have 
an IO port (because it was zero), 

[PATCH] x86_64 - Fix FS/GS registers for VT execution

2007-01-25 Thread Zachary Amsden
Hi Andi, as we discussed, FS/GS segment state doesn't allow VT execution 
during boot.  This patch fixes the problem.  Please apply.  I will send it 
for -stable review once it is upstream.


Thanks,

Zach
Initialize FS and GS to __KERNEL_DS as well.  Their actual value is not
important, but it is important to reload them in protected mode.  At this point
they still hold the real-mode values from initial boot.  VT disallows
execution of code under such conditions, which means hardware virtualization
cannot be used to boot the kernel on Intel platforms, making boot
painfully slow.

This requires moving the GS load before the load of GS_BASE, so just move
all the segment loads there to keep them together in the code.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

Index: linux-2.6.19/arch/x86_64/kernel/head.S
===
--- linux-2.6.19.orig/arch/x86_64/kernel/head.S	2006-11-29 13:57:37.0 -0800
+++ linux-2.6.19/arch/x86_64/kernel/head.S	2007-01-11 16:57:24.0 -0800
@@ -163,6 +163,20 @@ startup_64:
 	 */
 	lgdt	cpu_gdt_descr
 
+	/* set up data segments. actually 0 would do too */
+	movl $__KERNEL_DS,%eax
+	movl %eax,%ds
+	movl %eax,%ss
+	movl %eax,%es
+
+	/*
+	 * We don't really need to load %fs or %gs, but load them anyway
+	 * to kill any stale realmode selectors.  This allows execution
+	 * under VT hardware.
+	 */
+	movl %eax,%fs
+	movl %eax,%gs
+
 	/* 
 	 * Setup up a dummy PDA. this is just for some early bootup code
 	 * that does in_interrupt() 
@@ -173,12 +187,6 @@ startup_64:
 	shrq	$32,%rdx
 	wrmsr
 
-	/* set up data segments. actually 0 would do too */
-	movl $__KERNEL_DS,%eax
-	movl %eax,%ds
-	movl %eax,%ss
-	movl %eax,%es
-
 	/* esi is pointer to real mode structure with interesting info.
 	   pass it to C */
 	movl	%esi, %edi


Re: [PATCH 02/09] atomic.h : Complete atomic_long operations in asm-generic

2007-01-25 Thread Mathieu Desnoyers
As Joe Perches pointed out, the four casts to (long) here are unneeded.

The *_and_test() and *_add_negative() functions only return an int truth
value, never a long.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>

--- a/include/asm-generic/atomic.h
+++ b/include/asm-generic/atomic.h
@@ -70,28 +70,28 @@ static inline int atomic_long_sub_and_test(long i, atomic_long_t *l)
 {
 	atomic64_t *v = (atomic64_t *)l;
 
-	return (long)atomic64_sub_and_test(i, v);
+	return atomic64_sub_and_test(i, v);
 }
 
 static inline int atomic_long_dec_and_test(atomic_long_t *l)
 {
 	atomic64_t *v = (atomic64_t *)l;
 
-	return (long)atomic64_dec_and_test(v);
+	return atomic64_dec_and_test(v);
 }
 
 static inline int atomic_long_inc_and_test(atomic_long_t *l)
 {
 	atomic64_t *v = (atomic64_t *)l;
 
-	return (long)atomic64_inc_and_test(v);
+	return atomic64_inc_and_test(v);
 }
 
 static inline int atomic_long_add_negative(long i, atomic_long_t *l)
 {
 	atomic64_t *v = (atomic64_t *)l;
 
-	return (long)atomic64_add_negative(i, v);
+	return atomic64_add_negative(i, v);
 }
 
 static inline long atomic_long_add_return(long i, atomic_long_t *l)
-- 
OpenPGP public key:  http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.21 1/4] ehca: fix improper use of yield with spinlock held

2007-01-25 Thread Christoph Hellwig
On Wed, Jan 24, 2007 at 12:10:36AM +0100, Hoang-Nam Nguyen wrote:
> Here is a patch for ehca_cq.c that fixes improper use of yield
> with spinlock held.

Btw, please don't forget to replace the yield call with a proper
condition for 2.6.21.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Juju

2007-01-25 Thread Stefan Richter
Pete Zaitcev wrote:
> On Thu, 25 Jan 2007 16:18:35 -0500, Kristian Høgsberg <[EMAIL PROTECTED]> 
> wrote:
...
>> will do a status write to the status address specified in the ORB, at which 
>> point the SBP-2 transaction is complete.
> 
> You know, I wanted to use this picture for a long time:
>  http://www.flickr.com/photos/zaitcev/369269557/

The fundamental thing about SBP-2 is that ORBs ( = SCSI command blocks
plus SBP-2 header) and data buffers all reside in the memory of the
initiator (or of a 3rd party on the FireWire bus). The target peeks and
pokes them when and how it sees fit. The initiator pushes only tiny
notifications about availability of new ORBs to the target. The target
eventually completes SCSI commands in-order or out-of-order and signals
this by pushing a status block for one or more completed commands.

(Juju's fw-sbp2 gives only one command at a time to the target.
Mainline's sbp2 can optionally give more commands in a row, but the
implementation is subtly broken in several ways and therefore disabled
by default until I fix it right after hell froze over.)

Another important thing to know in order to understand fw-sbp2 and sbp2
is that they currently rely on OHCI-1394's physical DMA feature, which
I won't explain here. It means two things: 1. FireWire bus addresses of
ORBs and buffers are directly derived from the DMA mapped address.
(FireWire bus addresses are the addresses used in communication between
SBP-2 initiator and target.) 2. Almost all of the transfers done by the
target do not generate interrupts. (Just the status write generates an
interrupt.)

...
> Now that you drew my attention to sbp2_status_write(), this looks wrong:
> 
> 	/* Lookup the orb corresponding to this status write. */
> 	spin_lock_irqsave(>lock, flags);
> 	list_for_each_entry(orb, &sd->orb_list, link) {
> 		if (status_get_orb_high(status) == 0 &&
> 		    status_get_orb_low(status) == orb->request_bus) {
> 			list_del(&orb->link);
> 			break;
> 		}
> 	}
> 	spin_unlock_irqrestore(>lock, flags);
> 
> Why is it that fw_request can't carry a pointer?

The target wrote an SBP-2 status block into our memory. The status block
contains the FireWire bus address of the ORB to which it belongs. Juju's
fw-sbp2 does the same as mainline's sbp2: Looking through the pile of
unfinished ORBs for one with the same FireWire bus address, which was
previously derived from the DMA mapped address. Since there aren't many
mapped ORBs per target, a linked list is a reasonable data structure to
search over. That said --- Kristian, doesn't fw-sbp2 have at most 1 ORB
in sd->orb_list?
-- 
Stefan Richter
-=-=-=== ---= ==-=-
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread David Woodhouse
On Thu, 2007-01-25 at 17:28 -0800, Linus Torvalds wrote:
> 
> On Fri, 26 Jan 2007, David Woodhouse wrote:
> > 
> > You're thinking of MMIO, while the case we were discussing was PIO. My
> > laptop is perfectly happy to assign PIO resources from zero.
> 
> I was indeed thinking MMIO, but I really think it should extend to PIO 
> also. It certainly is (again) true on PC's, where the low IO space is 
> special and reserved for motherboard/system devices.

As you wish. 

There's a trade-off to be made. I happen to disagree with your choice --
you seem to want to introduce special cases and new layers of
'translation', and in some (admittedly, non-PC and relatively rare)
cases reduce functionality just to account for the fact that Linux
driver authors are fairly incompetent, and I think that's the wrong
choice.

So although I've mostly given up on interrupts¹, I reserve the right to
object if the "zero is not a real number" fallacy extends itself into
new areas. The thing about PIO addresses is an example of that. The
ide-cs and pata_pcmcia drivers (and indeed most other drivers AFAICT;
certainly all PCMCIA drivers I've used in my laptop) work _fine_ with
their main PIO address being zero. This thread started when I noticed
pata_pcmcia getting it wrong if its _BMDMA_ I/O range is zero.

Certainly, the resource code does not know about this newly-invented
special case for PIO address zero and is happy to assign it.

> > It doesn't need to be per-architecture; it can just be -1.
> 
> Bollocks. People tried that. People tried to force this idiotic notion of 
> "NO_IRQ" down my throat for several years. I even accepted it.
> 
> And then, after several years, when it was clear that it still didn't 
> work, and drivers just weren't getting updated, it was time to just face 
> reality: if the choice is between 0 and -1, 0 is simply much easier for 
> the bulk of the code.

The quality of our drivers is low; I'm fully aware that trying to
improve driver quality is a quixotic task. But where do we draw the
line? Should we abandon the dma-mapping stuff too? Declare that page
zero is a special case and you can't DMA to it? Should we try to make
every PCI write also do a read in order to flush posted writes, because
people can't cope with the real world?

> Live with it, or don't. I really don't care what you do on your hardware. 
> But if you can't face that
> 
>   if (!dev->irq)
>   ..
> 
> is simpler for people to write, and that it's what we've done for a long 
> time, then that really is YOUR problem.

Even userspace people seem to cope with it in the case of file
descriptors. Kernel people have to cope too, for stuff like DMA
addresses.

> And I bet there are PIO devices out there that consider address zero to be 
> disabled. For EXACTLY the same reason.

> (And yes, hardware actually tends to do the same thing. For PCI irq 
> routing registers, an irq value of 0 pretty much universally means 
> "disabled". In fact, even your lovely Cardbus example actually is an 
> example of exactly this: the very IO limit registers are DEFINED IN 
> HARDWARE to special-case address zero - so that making the base/limit 
> registers be zero actually disables the IO window, rather than making it 
> mean "four IO bytes at address zero").

My example was 16-bit PCMCIA (actually CompactFlash), and it's
_certainly_ not true in that case. Devices often don't bother to decode
address lines higher than the two or three they need to tell which of
their registers is being accessed; they let the socket do the equivalent
of the 'chip select' decode.

-- 
dwmw2

¹ Mostly. I still wonder occasionally if we could use _pointers_ in the
generic code, and let drivers deal with a (struct irq_desc *) instead of
just a number. Since the old system of ISA IRQ numbering is fairly out
of date now and we're starting to recognise the fact that IRQs are a 
_tree_, we might get away with it. And it would let us preserve the special
case for NULL -- but I haven't had time to fully work through the
implications. 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.20-rc6 - build failure

2007-01-25 Thread Eyal Lebedinsky
I should have added that this is on Debian stable:
$ gcc --version
gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)

Eyal Lebedinsky wrote:
> i386
> Practically all modules selected.
> 
>   Building modules, stage 2.
>   MODPOST 1931 modules
> WARNING: drivers/atm/fore_200e.o - Section mismatch: reference to .init.text: 
> from .text between 'fore200e_initialize' (at offset 0x25af) and 
> 'fore200e_monitor_putc'
> WARNING: drivers/atm/lanai.o - Section mismatch: reference to .init.text: 
> from .text between 'sram_test_pass' (at offset 0x1a8) and 
> 'sram_test_and_clear'
> WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from 
> .text after 'zatm_init_one' (at offset 0x1f25)
> WARNING: drivers/atm/zatm.o - Section mismatch: reference to .init.text: from 
> .text after 'zatm_init_one' (at offset 0x1f32)
> WARNING: drivers/net/rrunner.o - Section mismatch: reference to 
> .init.text:rr_init from .text between 'rr_init_one' (at offset 0x1d0) and 
> 'rr_remove_one'
> WARNING: drivers/net/sis900.o - Section mismatch: reference to 
> .init.text:sis900_mii_probe from .text between 'sis900_probe' (at offset 
> 0x47b) and 'sis900_default_phy'
> WARNING: drivers/net/sunhme.o - Section mismatch: reference to .init.text: 
> from .text between 'happy_meal_pci_probe' (at offset 0x2add) and 
> 'happy_meal_pci_remove'
> WARNING: drivers/net/tokenring/3c359.o - Section mismatch: reference to 
> .init.text:xl_init from .text between 'xl_probe' (at offset 0x1da) and 
> 'xl_hw_reset'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to 
> .init.text: from .text between 'de_init_one' (at offset 0x2151) and 
> 'de_remove_one'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to 
> .init.text: from .text between 'de_init_one' (at offset 0x2158) and 
> 'de_remove_one'
> WARNING: drivers/net/tulip/de2104x.o - Section mismatch: reference to 
> .init.text: from .text between 'de_init_one' (at offset 0x2220) and 
> 'de_remove_one'
> WARNING: "__udivdi3" [fs/ocfs2/ocfs2.ko] undefined!
> make[1]: *** [__modpost] Error 1

-- 
Eyal Lebedinsky ([EMAIL PROTECTED]) 
attach .zip as .dat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] remove duplicate ids from ata_piix

2007-01-25 Thread Jeff Garzik

Greg KH wrote:

From: Greg Kroah-Hartman <[EMAIL PROTECTED]>

It seems that the ata_piix driver has two duplicate ids, one of them
with a different 'private' field in it, which was never being used due
to the match for the device happening on an earlier entry.

This patch removes the duplicates; whether that is the correct thing to
do for the ICH5 device, I'll leave to you :)

This duplication was pointed out to me by Kay Sievers <[EMAIL PROTECTED]>

Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/ata/ata_piix.c |4 
 1 file changed, 4 deletions(-)

--- gregkh-2.6.orig/drivers/ata/ata_piix.c
+++ gregkh-2.6/drivers/ata/ata_piix.c
@@ -191,12 +191,8 @@ static const struct pci_device_id piix_p
 	/* Intel ICH4 (i845GV, i845E, i852, i855) UDMA 100 */
 	{ 0x8086, 0x24CA, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
 	{ 0x8086, 0x24CB, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
-	/* Intel ICH5 */
-	{ 0x8086, 0x24DB, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_133 },
 	/* C-ICH (i810E2) */
 	{ 0x8086, 0x245B, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
-	/* ESB (855GME/875P + 6300ESB) UDMA 100  */
-	{ 0x8086, 0x25A2, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
 	/* ICH6 (and 6) (i915) UDMA 100 */
 	{ 0x8086, 0x266F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
 	/* ICH7/7-R (i945, i975) UDMA 100*/


Both IDs are clearly dups, but ICH5 should support ich_pata_133, so 
you're removing the wrong entry there.


I think the right thing to do is remove the entries that appear near the 
top of the list, but I would like Alan (Mr. PATA) to confirm...


 	/* Also PIIX4E (fn3 rev 2) and PIIX4M (fn3 rev 3) */
 	{ 0x8086, 0x7111, PCI_ANY_ID, PCI_ANY_ID, 0, 0, piix_pata_33 },
-	{ 0x8086, 0x24db, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },
-	{ 0x8086, 0x25a2, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich_pata_100 },

Regards,

Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Jeff Garzik

Linus Torvalds wrote:
So *if* you use the new "iomap" interfaces, and the new "pci_iomap()" 
things, that should actually not just allow drivers (like the ATA layer) 
to share much more code between the PIO and MMIO cases, but it hopefully 
actually makes it easier for strange architectures to do it all.



Another aside:  Tejun took my libata iomap and came up with something 
I'm quite happy with, so libata will /finally/ switch over to using the 
new iomap interfaces for 98% of all drivers, as of 2.6.21.


Look at "[PATCHSET] Managed device resources, take #3" on LKML if you're 
interested.  Tejun's 'devres' stuff makes it a lot easier for drivers to 
reserve, map, unmap, and free various hardware resources.


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2/5: Updates to SPI and mmc_spi: clock with cs inactive, kernel 2.6.19

2007-01-25 Thread David Brownell
On Wednesday 24 January 2007 8:50 pm, Hans-Peter Nilsson wrote:
> 
> There was a comment in the mmc_spi.c glue driver at
> <http://www.gossamer-threads.com/lists/linux/kernel/671939#671939>:
> +	 * some cards seemed happier if they were initialized first
> +	 * by the native MMC stack, not SPI ... and in other cases
> +	 * rmmod/modprobe of mmc_spi helped the card work better,
> +	 * even without power cycling
> +	 *
> +	 * FIXME find out what that important state is, which is
> +	 * not reset here... and makes robustness problems
> 
> I think I've spotted the problem, or at least a problem with a
> solution that fits the description.  What's missing is "at least
> 74 SD clocks to the SD card with keeping CMD line to high. In
> case of SPI mode, CS shall be held to high during 74 clock
> cycles" (from Section 6.4.1, in "Simplified Physical Layer
> Specification 2.0"). ...
> 
> The gotcha is that the SPI framework didn't have a way to
> express transfers with chip-select inactive.  Sure, you can set
> chip-select to inactive for a period of *time*, but never while
> also toggling the clock. 

Actually it _does_ have a way to handle it, if you think about the
problem a bit differently ... focus on high/low, not "inactive":

spi->mode |= SPI_CS_HIGH;
spi_setup(spi);
// chipselect is now low (conventionally active)

// ... then high during this next transfer:
... write 74+ one bits (10+ bytes of 0xff, keeping MOSI/CMD high)
// now low again (conventionally "active")

spi->mode &= ~SPI_CS_HIGH;
spi_setup(spi); 
// now high (conventionally "inactive")

That is, for just one transfer, say the device uses inverse
chipselect polarity:  active-high, not active-low.  Then switch back to
normal.  Right?  So long as nobody else can access that SPI bus
(the claim/release issue), all should be fine.

That mechanism has been defined for some time, but not widely
implemented; a few chips require that semantic for chipselect in
normal operation.

If you agree on this, please update your patch #4 accordingly.
You may need to update your SPI controller driver to handle this
issue too.  (ISTR punting on making the bitbang framework handle
it, but so long as the chipselect is managed with a GPIO this
ought to be almost trivial.)

- Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Thu, 25 Jan 2007, Jeff Garzik wrote:
> 
> On sparc64, for example, after I pointed this out to DaveM, he was able to
> implement the new iomap interface without the 'if (pio-mem-area)' branch
> present on x86.

However, in all honesty, we have triggered bugs in that area too, simply 
because some driver code "knew" that PIO addresses could fit in 16 bits, 
and used u16 or "unsigned short" to remember the PIO address. Both ARM and 
Sparc were bitten by this, although usually the issue is trivial to fix 
once found.

Also, many ISA-only drivers actually have hardcoded PIO numbers (eg 
"0x1f0").

But yes, I would generally suggest that architectures where the PIO range 
is really just another magic MMIO range (which is most of the non-x86 
world, as you point out) might as well at least aim for doing the 
remapping early (i.e. with "pci_resource_start()").

Making that easy was one of my goals for the "new" IO accessor functions, 
in fact.

Not that many people actually use them.

So *if* you use the new "iomap" interfaces, and the new "pci_iomap()" 
things, that should actually not just allow drivers (like the ATA layer) 
to share much more code between the PIO and MMIO cases, but it hopefully 
actually makes it easier for strange architectures to do it all.

So traditionally, we've had PIO be "limited integer addresses, and some 
drivers know magic numbers", but hopefully new drivers could at least try 
to use some of the infrastructure where we try to help people not have to 
deal with it so much as a special case any more.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] suspend/resume debugging: device filter

2007-01-25 Thread Greg KH
On Thu, Jan 25, 2007 at 12:05:01PM +0100, Ingo Molnar wrote:
> Subject: [patch] suspend/resume debugging: device filter
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> this patch implements the /sys/power/filter attribute, which takes a 
> string. If a device's name matches the filter string (exactly), then 
> that device is excluded from suspend/resume.
> 
> this can be helpful in a number of ways when debugging suspend and 
> resume problems:
> 
>  - if CONFIG_DISABLE_CONSOLE_SUSPEND is used then the serial
>    console is still suspended after which point there's no
>    log output. Doing "echo serial > /sys/power/filter" keeps
>    the serial port active, so any messages (and crash info)
>    after that point are displayed.
> 
>  - if a device is suspected to be the reason for a resume failure
>    then it can be excluded via the filter. That device obviously
>    won't work, but users can thus help us debug resume problems
>    in combination with pm_trace, without having to hack the kernel.
> 
> (note that you can obviously break suspend/resume via the filter, by 
> excluding a vital device - so it is only to be used when suspend or 
> resume is broken to begin with.)
> 
> it might be better to do this centrally in sysfs, via a per-device 
> attribute, to individually enable suspend and resume on a per device 
> basis, but my sysfs-fu is not strong enough for that now ;-)

Here's a (compile tested only) patch that does this on a per-device
basis, which is smaller, and should work just as well as your patch.

It creates a new file in the power/ directory for every device called
"can_suspend".  Write a '0' to it to prevent that device from being
suspended.

Does this work for you?

Yeah, the wording of the filename and variable isn't the best, I'm open
to better choices if anyone has them.

thanks,

greg k-h

---
 drivers/base/power/suspend.c |2 +-
 drivers/base/power/sysfs.c   |   30 ++
 include/linux/device.h   |1 +
 3 files changed, 32 insertions(+), 1 deletion(-)

--- gregkh-2.6.orig/drivers/base/power/suspend.c
+++ gregkh-2.6/drivers/base/power/suspend.c
@@ -78,7 +78,7 @@ int suspend_device(struct device * dev, 
suspend_report_result(dev->class->suspend, error);
}
 
-	if (!error && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
+	if (!error && !dev->no_suspend && dev->bus && dev->bus->suspend && !dev->power.power_state.event) {
dev_dbg(dev, "%s%s\n",
suspend_verb(state.event),
((state.event == PM_EVENT_SUSPEND)
--- gregkh-2.6.orig/drivers/base/power/sysfs.c
+++ gregkh-2.6/drivers/base/power/sysfs.c
@@ -141,12 +141,42 @@ wake_store(struct device * dev, struct d
 
 static DEVICE_ATTR(wakeup, 0644, wake_show, wake_store);
 
+static ssize_t can_suspend_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%s\n", dev->no_suspend ? "no" : "yes");
+}
+
+static ssize_t can_suspend_store(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t n)
+{
+   if (!n)
+   return -EINVAL;
+
+   switch (buf[0]) {
+   case 'y':
+   case 'Y':
+   case '1':
+   dev->no_suspend = 0;
+   break;
+   case 'n':
+   case 'N':
+   case '0':
+   dev->no_suspend = 1;
+   break;
+   }
+
+   return n;
+}
+static DEVICE_ATTR(can_suspend, 0644, can_suspend_show, can_suspend_store);
 
 static struct attribute * power_attrs[] = {
 #ifdef CONFIG_PM_SYSFS_DEPRECATED
 	&dev_attr_state.attr,
 #endif
 	&dev_attr_wakeup.attr,
+	&dev_attr_can_suspend.attr,
NULL,
 };
 static struct attribute_group pm_attr_group = {
--- gregkh-2.6.orig/include/linux/device.h
+++ gregkh-2.6/include/linux/device.h
@@ -365,6 +365,7 @@ struct device {
 	char			bus_id[BUS_ID_SIZE];	/* position on parent bus */
 	struct device_type	*type;
 	unsigned		is_registered:1;
+	unsigned		no_suspend:1;
struct device_attribute uevent_attr;
struct device_attribute *devt_attr;
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH -rt 2/2] RCU priority boosting additions to rcutorture

2007-01-25 Thread Paul E. McKenney
On Thu, Jan 25, 2007 at 11:06:35AM -0800, Josh Triplett wrote:
> Paul E. McKenney wrote:
> > On Thu, Jan 25, 2007 at 12:47:04AM -0800, Josh Triplett wrote:
> >> One major item: this new test feature really needs a new module parameter to
> >> enable or disable it.
> > 
> > CONFIG_PREEMPT_RCU_BOOST is the parameter -- if not set, then no test.
> > This parameter is provided by the accompanying RCU-boost patch.
> 
> It seems useful for rcutorture to use or not use the preempting thread
> independently of CONFIG_PREEMPT_RCU_BOOST.  That would bring you from two
> cases to four, and the two new cases both make sense:
> 
> * CONFIG_PREEMPT_RCU_BOOST=n, but run rcutorture with the preempting thread.
>   This configuration allows you to demonstrate the need for
>   CONFIG_PREEMPT_RCU_BOOST, by showing what happens when you need it and don't
>   have it.
> 
> * CONFIG_PREEMPT_RCU_BOOST=y, but run rcutorture without the preempting
>   thread.  This configuration allows you to test with rcutorture while running
>   a *real* real-time workload rather than the simple preempting thread, or
>   just test basic RCU functionality.
> 
> A simple boolean module_param would work here.

OK, sold!  I will add this.  Perhaps CONFIG_PREEMPT_RCU_TORTURE.

> At some point, we may want to add the ability to run multiple preempting
> threads, but that doesn't need to happen for this patch.

I considered that for this initial round, but you only need to preempt
a single RCU reader to force the RCU booster to do something.  ;-)

> >> Paul E. McKenney wrote:
> >>> diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/kernel/rcutorture.c linux-2.6.20-rc4-rt1-rcubtorture/kernel/rcutorture.c
> >>> --- linux-2.6.20-rc4-rt1/kernel/rcutorture.c	2007-01-09 10:59:54.0 -0800
> >>> +++ linux-2.6.20-rc4-rt1-rcubtorture/kernel/rcutorture.c	2007-01-23 11:27:49.0 -0800
> 
> >>> +static int rcu_torture_preempt(void *arg)
> >>> +{
> >>> + int completedstart;
> >>> + time_t gcstart;
> >>> + struct sched_param sp;
> >>> +
> >>> + sp.sched_priority = MAX_RT_PRIO - 1;
> >>> + sched_setscheduler(current, SCHED_RR, &sp);
> >>> + current->flags |= PF_NOFREEZE;
> >>> +
> >>> + do {
> >>> + completedstart = rcu_torture_completed();
> >>> + gcstart = xtime.tv_sec;
> >>> + while ((xtime.tv_sec - gcstart < 10) &&
> >>> +(rcu_torture_completed() == completedstart))
> >>> + cond_resched();
> >>> + if (rcu_torture_completed() == completedstart)
> >>> + rcu_torture_preempt_errors++;
> >>> + schedule_timeout_interruptible(shuffle_interval * HZ);
> >> Why call schedule_timeout_interruptible here without actually handling
> >> interruptions?  So that you can send it a signal to cause the shuffle 
> >> early?
> > 
> > It allows you to kill the process in order to get the module unload to
> > happen more quickly in case someone specified an overly long interval.
> 
> I didn't actually know that you could kill a kthread from userspace. :)
> 
> That rationale makes sense.

It won't actually die, but if I understand correctly (a big "if") the
signal would cause schedule_timeout_interruptible() to return, allowing
the kthread_should_stop() check to happen.

> > But now that you mention this, a simple one-second sleep is probably
> > appropriate here.
> 
> OK.
> 
> >>> + } while (!kthread_should_stop());
> >>> + return NULL;
> >>> +}
> >>> +
> >>> +static void rcu_preempt_start(void)
> >>> +{
> >>> + rcu_preeempt_task = kthread_run(rcu_torture_preempt, NULL,
> >>> + "rcu_torture_preempt");
> >>> + if (IS_ERR(rcu_preeempt_task)) {
> >>> + VERBOSE_PRINTK_ERRSTRING("Failed to create preempter");
> >> This ought to include the errno value, PTR_ERR(rcu_preempt_task).
> > 
> > Good point -- what I should do is return this value so that
> > rcu_torture_init() can return it, failing the module-load process
> > and unwinding.
> 
> Even better, yes.
> 
> >>> + rcu_preeempt_task = NULL;
> >>> + }
> >>> +}
> >>> +
> >>> +static void rcu_preempt_end(void)
> >>> +{
> >>> + if (rcu_preeempt_task != NULL) {
> >> if (rcu_preempt_task) would work just as well here.
> > 
> > True, but was being consistent with usage elsewhere in this file.
> 
> Fair enough; don't worry about it for this patch, then.  I'll deal with that
> particular style cleanup later, throughout rcutorture.

Sounds good to me!  ;-)

> >>>  static struct rcu_torture_ops rcu_ops = {
> >>>   .init = NULL,
> >>>   .cleanup = NULL,
> >>> @@ -267,7 +334,9 @@ static struct rcu_torture_ops rcu_ops = 
> >>>   .completed = rcu_torture_completed,
> >>>   .deferredfree = rcu_torture_deferred_free,
> >>>   .sync = synchronize_rcu,
> >>> - .stats = NULL,
> >>> + .preemptstart = rcu_preempt_start,
> >>> + .preemptend = rcu_preempt_end,
> >>> + .stats = rcu_preempt_stats,
> >>>   .name = "rcu"
> >>>  };
> >>>  
> >>> @@ -306,6 +375,8 @@ static struct rcu_torture_ops 

Re: [PATCH] x86_64: fix put_user for 64-bit constant

2007-01-25 Thread Linus Torvalds


On Thu, 25 Jan 2007, Roland McGrath wrote:
>
> On x86-64, a put_user call using a 64-bit pointer and a constant value that
> is > 0x will produce code that doesn't assemble.  This patch fixes
> the asm construct to use the Z constraint for 32-bit constants.

Ahh. Will apply.

Just out of interest: did we have such code and it just happened to use a 
register, or was this found because you wrote some new code that triggered 
something that had just never been triggered before? Or is there a newer 
gcc that is better at optimizations and finds a constant propagation where 
it used to not find it?

Inquiring minds..

Linus


Re: [RFC PATCH -rt 1/2] RCU priority boosting that survives semi-vicious testing

2007-01-25 Thread Paul E. McKenney
On Thu, Jan 25, 2007 at 11:58:16AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 25, 2007 at 01:29:23AM -0800, Josh Triplett wrote:
> > Overall, this code looks sensible to me.  Some comments on the patch below.

[ . . . ]

> Thank you again for the careful and thorough review!!!
> 
> I will test these changes and send out an update.

And here is the updated RCU priority-boost patch.

Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
---

 include/linux/init_task.h  |   12 +
 include/linux/rcupdate.h   |   12 +
 include/linux/rcupreempt.h |   20 +
 include/linux/sched.h      |   16 +
 init/main.c                |    1 
 kernel/Kconfig.preempt     |   32 ++
 kernel/fork.c              |    6 
 kernel/rcupreempt.c        |  528 +
 kernel/rtmutex.c           |    7 
 kernel/sched.c             |    5 
 10 files changed, 636 insertions(+), 3 deletions(-)

diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/include/linux/init_task.h linux-2.6.20-rc4-rt1-rcub/include/linux/init_task.h
--- linux-2.6.20-rc4-rt1/include/linux/init_task.h  2007-01-09 10:59:54.0 -0800
+++ linux-2.6.20-rc4-rt1-rcub/include/linux/init_task.h 2007-01-09 11:01:12.0 -0800
@@ -87,6 +87,17 @@ extern struct nsproxy init_nsproxy;
.siglock= __SPIN_LOCK_UNLOCKED(sighand.siglock),\
 }
 
+#ifdef CONFIG_PREEMPT_RCU_BOOST
+#define INIT_RCU_BOOST_PRIO .rcu_prio  = MAX_PRIO,
+#define INIT_PREEMPT_RCU_BOOST(tsk)\
+   .rcub_rbdp  = NULL, \
+   .rcub_state = RCU_BOOST_IDLE,   \
+   .rcub_entry = LIST_HEAD_INIT(tsk.rcub_entry),
+#else /* #ifdef CONFIG_PREEMPT_RCU_BOOST */
+#define INIT_RCU_BOOST_PRIO
+#define INIT_PREEMPT_RCU_BOOST(tsk)
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU_BOOST */
+
 extern struct group_info init_groups;
 
 /*
@@ -143,6 +154,7 @@ extern struct group_info init_groups;
.pi_lock= RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock),  \
INIT_TRACE_IRQFLAGS \
INIT_LOCKDEP\
+   INIT_PREEMPT_RCU_BOOST(tsk) \
 }
 
 
diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/include/linux/rcupdate.h linux-2.6.20-rc4-rt1-rcub/include/linux/rcupdate.h
--- linux-2.6.20-rc4-rt1/include/linux/rcupdate.h   2007-01-09 10:59:54.0 -0800
+++ linux-2.6.20-rc4-rt1-rcub/include/linux/rcupdate.h  2007-01-09 11:01:12.0 -0800
@@ -227,6 +227,18 @@ extern void rcu_barrier(void);
 extern void rcu_init(void);
 extern void rcu_advance_callbacks(int cpu, int user);
 extern void rcu_check_callbacks(int cpu, int user);
+#ifdef CONFIG_PREEMPT_RCU_BOOST
+extern void init_rcu_boost_late(void);
+extern void __rcu_preempt_boost(void);
+#define rcu_preempt_boost() \
+   do { \
+   if (unlikely(current->rcu_read_lock_nesting > 0)) \
+   __rcu_preempt_boost(); \
+   } while (0)
+#else /* #ifdef CONFIG_PREEMPT_RCU_BOOST */
+#define init_rcu_boost_late()
+#define rcu_preempt_boost()
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU_BOOST */
 
 #endif /* __KERNEL__ */
 #endif /* __LINUX_RCUPDATE_H */
diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/include/linux/rcupreempt.h linux-2.6.20-rc4-rt1-rcub/include/linux/rcupreempt.h
--- linux-2.6.20-rc4-rt1/include/linux/rcupreempt.h 2007-01-09 10:59:54.0 -0800
+++ linux-2.6.20-rc4-rt1-rcub/include/linux/rcupreempt.h  2007-01-25 16:30:46.0 -0800
@@ -42,6 +42,26 @@
 #include 
 #include 
 
+#ifdef CONFIG_PREEMPT_RCU_BOOST
+/*
+ * Task state with respect to being RCU-boosted.  This state is changed
+ * by the task itself in response to the following two events:
+ * 1. Preemption (or block on lock) while in RCU read-side critical section.
+ * 2. Outermost rcu_read_unlock() for blocked RCU read-side critical section.
+ *
+ * The RCU-boost task also updates the state when boosting priority.
+ */
+enum rcu_boost_state {
+   RCU_BOOST_IDLE = 0,/* Not yet blocked if in RCU read-side. */
+   RCU_BOOST_BLOCKED = 1, /* Blocked from RCU read-side. */
+   RCU_BOOSTED = 2,   /* Boosting complete. */
+   RCU_BOOST_INVALID = 3, /* For bogus state sightings. */
+};
+
+#define N_RCU_BOOST_STATE (RCU_BOOST_INVALID + 1)
+
+#endif /* #ifdef CONFIG_PREEMPT_RCU_BOOST */
+
 #define rcu_qsctr_inc(cpu)
 #define rcu_bh_qsctr_inc(cpu)
 #define call_rcu_bh(head, rcu) call_rcu(head, rcu)
diff -urpNa -X dontdiff linux-2.6.20-rc4-rt1/include/linux/sched.h linux-2.6.20-rc4-rt1-rcub/include/linux/sched.h
--- linux-2.6.20-rc4-rt1/include/linux/sched.h  2007-01-09 10:59:54.0 -0800
+++ linux-2.6.20-rc4-rt1-rcub/include/linux/sched.h 2007-01-09 11:01:12.0 -0800
@@ -699,6 +699,14 @@ struct signal_struct {
 #define is_rt_policy(p)((p) != SCHED_NORMAL && (p) 
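The state comment in the rcupreempt.h hunk can be read as a small state machine. Below is a minimal userspace sketch, assuming only the transitions described in that comment (the rest of rcupreempt.c is not shown here). struct model_task and the model_*() functions are hypothetical and only reuse the enum values from the patch.

```c
#include <stdbool.h>

/* Same values as enum rcu_boost_state in the patch above. */
enum rcu_boost_state {
	RCU_BOOST_IDLE = 0,
	RCU_BOOST_BLOCKED = 1,
	RCU_BOOSTED = 2,
	RCU_BOOST_INVALID = 3,
};

/* Hypothetical per-task model; not the patch's task_struct fields. */
struct model_task {
	int rcu_read_lock_nesting;
	enum rcu_boost_state state;
	bool prio_boosted;
};

static void model_read_lock(struct model_task *t)
{
	t->rcu_read_lock_nesting++;
}

/* Event 1: preemption (or blocking on a lock) inside a read-side
 * critical section marks the task as blocked. */
static void model_preempt(struct model_task *t)
{
	if (t->rcu_read_lock_nesting > 0 && t->state == RCU_BOOST_IDLE)
		t->state = RCU_BOOST_BLOCKED;
}

/* The RCU-boost task raises the priority of blocked readers. */
static void model_boost(struct model_task *t)
{
	if (t->state == RCU_BOOST_BLOCKED) {
		t->prio_boosted = true;
		t->state = RCU_BOOSTED;
	}
}

/* Event 2: the outermost rcu_read_unlock() drops any boost. */
static void model_read_unlock(struct model_task *t)
{
	if (--t->rcu_read_lock_nesting == 0 &&
	    t->state != RCU_BOOST_IDLE) {
		t->prio_boosted = false;
		t->state = RCU_BOOST_IDLE;
	}
}
```

Nested read-side sections keep the boost until the outermost unlock, which matches the rcu_preempt_boost() macro only acting when rcu_read_lock_nesting > 0.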

Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Jeff Garzik

Linus Torvalds wrote:


On Fri, 26 Jan 2007, David Woodhouse wrote:

You're thinking of MMIO, while the case we were discussing was PIO. My
laptop is perfectly happy to assign PIO resources from zero.


I was indeed thinking MMIO, but I really think it should extend to PIO 
also. It certainly is (again) true on PC's, where the low IO space is 
special and reserved for motherboard/system devices.


Via the even-less-of-an-excuse-than-you-thought department:

Many (most?) non-x86 handle PIO via special mappings and additional 
serialization instructions, but otherwise treat PIO register space in a 
very similar manner to MMIO.


Thus, it's /easier/ on non-x86 to ensure that PIO addresses never land 
at zero, because you must remap /anyway/.  It's only on x86 that PIO 
register spaces are accessed by vastly different CPU instructions.  Most 
other arches convert PIO accesses into massage+mmio R/W+massage.


On sparc64, for example, after I pointed this out to DaveM, he was able 
to implement the new iomap interface without the 'if (pio-mem-area)' 
branch present on x86.
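To illustrate Jeff's point, here is a minimal userspace sketch of the two styles of accessor. It is modelled loosely on the cookie scheme of the generic lib/iomap.c used on x86. The constants and the exact boundary checks here are assumptions for illustration, not a copy of that file.

```c
#include <stdint.h>

/* Assumed constants, loosely modelled on the generic lib/iomap.c. */
#define PIO_OFFSET   0x10000UL
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL

enum io_kind { IO_BAD, IO_PIO, IO_MMIO };

/* x86-style ioread*(): the cookie must be decoded, because port I/O
 * needs in/out instructions rather than ordinary loads and stores. */
static enum io_kind classify_cookie(uintptr_t cookie, unsigned long *port)
{
	if (cookie >= PIO_RESERVED)
		return IO_MMIO;
	if (cookie > PIO_OFFSET) {
		*port = cookie & PIO_MASK;
		return IO_PIO;
	}
	return IO_BAD;
}

/* Where port I/O is itself reached through a memory mapping, the
 * accessor can be a plain load with no PIO/MMIO branch at all. */
static uint8_t ioread8_flat(const volatile uint8_t *addr)
{
	return *addr;
}
```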


Jeff





Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Linus Torvalds


On Fri, 26 Jan 2007, David Woodhouse wrote:
> 
> You're thinking of MMIO, while the case we were discussing was PIO. My
> laptop is perfectly happy to assign PIO resources from zero.

I was indeed thinking MMIO, but I really think it should extend to PIO 
also. It certainly is (again) true on PC's, where the low IO space is 
special and reserved for motherboard/system devices.

If you want to be different, that's YOUR problem. Some architectures have 
tried to look different, and then drivers break on them, but they have 
only themselves to blame.

> I believe PCMCIA just uses the generic resource code, which also seems
> to lack any knowledge of this hackish special case for zero.

The resource code actually knows what "enabled" means. A lot of other code 
does not.

> > kernel and all devices are concerned, "!dev->irq" means that the irq 
> > doesn't exist or hasn't been mapped for that device yet).
> 
> So again you end up in a situation where zero is a strange special case.

No. We end up in a situation where *drivers* never have any strange or 
special cases. 

You need to have a "no irq" thing. It might as well be zero, since that is 
not just the de-facto standard, it's also the one and only value that 
leads to easily readable source-code (ie test it as a boolean in C).

The exact same thing has been true of MMIO. I would not be at all 
surprised if several drivers do the same for PIO.

It's something you can trust on a PC. See above on your problems if you 
decide that you want to be "generic" and use a value that is illegal on 
99% of all hardware.

> It doesn't need to be per-architecture; it can just be -1.

Bollocks. People tried that. People tried to force this idiotic notion of 
"NO_IRQ" down my throat for several years. I even accepted it.

And then, after several years, when it was clear that it still didn't 
work, and drivers just weren't getting updated, it was time to just face 
reality: if the choice is between 0 and -1, 0 is simply much easier for 
the bulk of the code.

Live with it, or don't. I really don't care what you do on your hardware. 
But if you can't face that

if (!dev->irq)
..

is simpler for people to write, and that it's what we've done for a long 
time, then that really is YOUR problem.
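Below is a minimal sketch of the driver-side convention Linus is defending. A hypothetical struct model_dev stands in for the real device structure. Because 0 means "no interrupt", the check is a plain boolean test and no per-architecture NO_IRQ constant is needed.

```c
#include <stdbool.h>

/* Hypothetical device record; only the irq field matters here. */
struct model_dev {
	const char *name;
	unsigned int irq;	/* 0 means "no interrupt assigned" */
};

/* Treat 0 as "nobody home" and fall back to polling; otherwise the
 * driver would go on to request the interrupt line. */
static bool use_polling(const struct model_dev *dev)
{
	if (!dev->irq)
		return true;
	return false;
}
```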

The exact same issues have been true in MMIO. Some code will keep track of 
separate "enabled" bits: the resource management code is such code. Guess 
what? Not a lot of drivers tend to do that. You can try to fight 
windmills, or you can just accept that the very language we use (namely, 
C) has made 0 be special, and tends to be used to say "nobody home" simply 
because it has that special meaning for a C compiler.

And I bet there are PIO devices out there that consider address zero to be 
disabled. For EXACTLY the same reason.

(And yes, hardware actually tends to do the same thing. For PCI irq 
routing registers, an irq value of 0 pretty much universally means 
"disabled". In fact, even your lovely Cardbus example actually is an 
example of exactly this: the very IO limit registers are DEFINED IN 
HARDWARE to special-case address zero - so that making the base/limit 
registers be zero actually disables the IO window, rather than making it 
mean "four IO bytes at address zero").

But hey, if it works for you, go wild. Just don't expect drivers to always 
work.

Linus


Re: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-25 Thread Jeff Garzik

Ed Lin wrote:

There may be some other errors as well, so we need a lock here.
I think the simple but reliable way to do it is just to replace the
queue lock with a host lock. James pointed out that there may be a
performance slowdown when many devices are accessed at the
same time. But I think the major part is still on the hardware,
and a host lock is the price this kind of controller must pay.



I agree.

Further, a host lock is (a) common across many controllers, to protect 
host-wide resources and (b) only limits us when the controller is 
CPU-limited, a very rare scenario.
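Below is a minimal userspace sketch of the host-lock approach, assuming pthreads. struct shared_tags, start_tag() and end_tag() are hypothetical, loosely modelled on a tag map shared by every queue on the host. The bitmap, the tag_index array and the busy count all live in one host-wide structure, so one host-wide mutex covers all of them, whichever device queue is submitting.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAGS 32

/* Hypothetical host-wide tag state shared by every device queue. */
struct shared_tags {
	pthread_mutex_t host_lock;	/* one lock for the whole host */
	uint32_t tag_map;		/* bit n set => tag n in use */
	void *tag_index[MAX_TAGS];
	int busy;
};

/* Every device queue points at the same shared_tags. */
struct model_queue {
	struct shared_tags *tags;
};

static int start_tag(struct model_queue *q, void *rq)
{
	struct shared_tags *t = q->tags;
	int tag = -1;

	pthread_mutex_lock(&t->host_lock);
	for (int i = 0; i < MAX_TAGS; i++) {
		if (!(t->tag_map & (1u << i))) {
			t->tag_map |= 1u << i;
			t->tag_index[i] = rq;
			t->busy++;
			tag = i;
			break;
		}
	}
	pthread_mutex_unlock(&t->host_lock);
	return tag;	/* -1: all tags busy */
}

static void end_tag(struct model_queue *q, int tag)
{
	struct shared_tags *t = q->tags;

	pthread_mutex_lock(&t->host_lock);
	t->tag_map &= ~(1u << tag);
	t->tag_index[tag] = NULL;
	t->busy--;
	pthread_mutex_unlock(&t->host_lock);
}

/* One "device" hammering the shared tags. */
static void *worker(void *arg)
{
	struct model_queue *q = arg;
	int dummy;

	for (int n = 0; n < 100000; n++) {
		int tag = start_tag(q, &dummy);
		if (tag >= 0)
			end_tag(q, tag);
	}
	return NULL;
}
```

Usage: run worker() in two threads on two model_queue structs that point at the same shared_tags. Afterwards busy and tag_map should both be back to 0.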


Jeff




[PATCH] x86_64: fix put_user for 64-bit constant

2007-01-25 Thread Roland McGrath
On x86-64, a put_user call using a 64-bit pointer and a constant value that
is > 0x will produce code that doesn't assemble.  This patch fixes
the asm construct to use the Z constraint for 32-bit constants.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/asm-x86_64/uaccess.h |   70 +-
 1 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/include/asm-x86_64/uaccess.h b/include/asm-x86_64/uaccess.h
index d5dbc87..0129c79 100644  
--- a/include/asm-x86_64/uaccess.h
+++ b/include/asm-x86_64/uaccess.h
@@ -157,7 +157,7 @@ do {							\
  case 1: __put_user_asm(x,ptr,retval,"b","b","iq",-EFAULT); break;\
  case 2: __put_user_asm(x,ptr,retval,"w","w","ir",-EFAULT); break;\
  case 4: __put_user_asm(x,ptr,retval,"l","k","ir",-EFAULT); break;\
- case 8: __put_user_asm(x,ptr,retval,"q","","ir",-EFAULT); break;\
+ case 8: __put_user_asm(x,ptr,retval,"q","","Zr",-EFAULT); break;\
  default: __put_user_bad();\
}   \
 } while (0)
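For reference, here is a minimal sketch of the operand ranges involved. They are written as plain C predicates (not GCC's implementation), based on the GCC documentation of the x86 machine constraints. "i" accepts any integer constant and "Z" accepts a 32-bit unsigned constant. A movq store to memory can only carry a 32-bit immediate, which the CPU sign-extends to 64 bits. With "i", a constant such as 0x100000000 is emitted as an immediate the assembler cannot encode. With "Zr", GCC uses the register alternative for it instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* "i": any compile-time integer constant, including ones wider than
 * the 32-bit immediate field of a movq store. */

/* "Z": 32-bit unsigned constant, i.e. 0..0xffffffff. */
static bool fits_constraint_Z(int64_t v)
{
	return v >= 0 && v <= (int64_t)UINT32_MAX;
}

/* movq $imm32, mem sign-extends its immediate, so the 64-bit value
 * actually stored must lie in INT32_MIN..INT32_MAX. */
static bool fits_movq_store_imm(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}
```

The two predicates disagree for values from 0x80000000 through 0xffffffff, because of the sign extension.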


RE: [patch] scsi: use lock per host instead of per device for shared queue tag host

2007-01-25 Thread Ed Lin


> -Original Message-
> From: Jens Axboe [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, January 25, 2007 7:48 AM
> To: Ed Lin
> Cc: David Somayajulu; Michael Reed; linux-scsi; linux-kernel; 
> james.Bottomley; jeff; Promise_Linux
> Subject: Re: [patch] scsi: use lock per host instead of per 
> device for shared queue tag host
> 
> 
> On Thu, Jan 25 2007, Jens Axboe wrote:
> > On Wed, Jan 24 2007, Ed Lin wrote:
> > > 
> > > 
> > > > -Original Message-
> > > > From: David Somayajulu [mailto:[EMAIL PROTECTED] 
> > > > Sent: Wednesday, January 24, 2007 5:03 PM
> > > > To: Ed Lin; Michael Reed
> > > > Cc: linux-scsi; linux-kernel; james.Bottomley; jeff; 
> > > > Promise_Linux; Jens Axboe
> > > > Subject: RE: [patch] scsi: use lock per host instead of per 
> > > > device for shared queue tag host
> > > > 
> > > > 
> > > > > It seems another driver (qla4xxx) is also using shared queue tags.
> > > > > It is natural to imagine there might be the same symptom in that
> > > > > driver. But I don't know the driver and have no hardware, so I
> > > > > cannot say anything certain about it.
> > > > 
> > > > qla4xxx implements this slightly differently, in the sense that we
> > > > don't have the equivalent of
> > > > struct st_ccb ccb[MU_MAX_REQUEST];
> > > > which is in struct st_hba. In other words, unlike stex, we don't have
> > > > a local array to keep track of the outstanding commands to the hba.
> > > > 
> > > > We had a discussion on this one while implementing block-layer tagging
> > > > in qla4xxx, and Jens Axboe added the test_and_set_bit() in the following
> > > > code in blk_queue_start_tag() to take care of it.
> > > > 	do {
> > > > 		tag = find_first_zero_bit(bqt->tag_map, bqt->max_depth);
> > > > 		if (tag >= bqt->max_depth)
> > > > 			return 1;
> > > > 	} while (test_and_set_bit(tag, bqt->tag_map));
> > > > Please see the following link for the discussion
> > > > http://marc.theaimsgroup.com/?l=linux-scsi=115886351206726=2
> > > > 
> > > > Cheers
> > > > David Somayajulu
> > > > QLogic Corporation
> > > >
> > > 
> > > Yes, this tag-allocation code is safe in itself.
> > > But the following
> > > 
> > > 	if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
> > > 		printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
> > > 		       __FUNCTION__, tag);
> > > 		return;
> > > 	}
> > > 
> > > code for freeing a tag (in blk_queue_end_tag()) seems to be using
> > > the unsafe __test_and_clear_bit() instead of test_and_clear_bit().
> > > I once changed it to test_and_clear_bit() and thought it was fixed,
> > > but the panic happened nonetheless (using gcc 3.4.6; gcc 4.1.0 is
> > > better but still produces kernel errors). bqt also needs to be
> > > protected in this case. Replacing the per-device queue lock with a
> > > host lock is a simple but logical fix for it. Introducing a more
> > > refined lock is possible, but seems too tedious and elaborate for
> > > this issue, since a queue lock is already there and a host-wide
> > > lock is needed anyway.
> > 
> > Does this fix it? There really should be no need to add extra locking
> > for this; it would be a shame.
> > 
> > diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
> > index fb67897..e752e5d 100644
> > --- a/block/ll_rw_blk.c
> > +++ b/block/ll_rw_blk.c
> > @@ -1072,12 +1072,16 @@ void blk_queue_end_tag(request_queue_t *q, struct request *rq)
> >  */
> > return;
> >  
> > -   if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
> > +   smp_mb__before_clear_bit();
> > +
> > +   if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {
> > printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
> >__FUNCTION__, tag);
> > return;
> > }
> >  
> > +   smp_mb__after_clear_bit();
> > +
> > list_del_init(&rq->queuelist);
> > rq->cmd_flags &= ~REQ_QUEUED;
> > rq->tag = -1;
> > 
> 
> Double checking the actual implementation, the smp_mb__* should not be
> needed with the test_and_*_bit operations. The __test_and_clear_bit()
> change is needed, though. What kind of crash did you see when you did
> that? It should not crash, but you could see the "attempt to clear
> non-busy tag" error though.
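Below is a minimal C11 sketch of the difference under discussion, assuming a single-word tag map of depth 32. The allocation loop has the same retry shape as the quoted blk_queue_start_tag() code, and freeing uses an atomic fetch-and in the manner of test_and_clear_bit(). The helper names are hypothetical. C11 atomic_fetch_or() and atomic_fetch_and() default to sequentially consistent ordering, so no separate fence is added. This fits Jens's note that the atomic test_and_*_bit() operations already order memory.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEPTH 32	/* assumed queue depth, fits in one word */

/* Atomic read-modify-write in the manner of test_and_set_bit(). */
static bool tas_bit(int nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << nr;
	return atomic_fetch_or(map, mask) & mask;
}

/* Atomic counterpart of test_and_clear_bit(). A plain load/modify/store,
 * as in __test_and_clear_bit(), could lose a concurrent allocator's bit. */
static bool tac_bit(int nr, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << nr;
	return atomic_fetch_and(map, ~mask) & mask;
}

static int find_first_zero(unsigned long word, int depth)
{
	for (int i = 0; i < depth; i++)
		if (!(word & (1UL << i)))
			return i;
	return depth;
}

/* Retry until we win the race for a free bit. */
static int alloc_tag(_Atomic unsigned long *map)
{
	int tag;

	do {
		tag = find_first_zero(atomic_load(map), DEPTH);
		if (tag >= DEPTH)
			return -1;	/* all busy; the kernel loop returns 1 */
	} while (tas_bit(tag, map));
	return tag;
}

/* Returns false for an "attempt to clear non-busy tag". */
static bool free_tag(_Atomic unsigned long *map, int tag)
{
	return tac_bit(tag, map);
}
```

This only makes the bitmap itself safe. Other shared bookkeeping, such as tag_index and busy, is not covered, and that is the concern Ed raises next.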

Besides the test_and_clear_bit() change, I think the bqt code (see my last mail)
also needs protection, for example:

list_del_init(&rq->queuelist);
...
if (unlikely(bqt->tag_index[tag] == NULL))
printk(KERN_ERR "%s: tag %d is missing\n",
   __FUNCTION__, tag);

bqt->tag_index[tag] = NULL;
bqt->busy--;

and

bqt->tag_index[tag] = rq;
...
list_add(&rq->queuelist, &bqt->busy_list);
bqt->busy++;

because in this case bqt is shared by all devices on the host
(q->queue_tags was set to host->bqt in scsi_activate_tcq()).

With a gcc 

Re: Powermac Sound Issue with Wallstreet PowerBook

2007-01-25 Thread Andreas Schwab
Jim Gifford <[EMAIL PROTECTED]> writes:

> For the last few weeks, I've been trying to get sound working on my
> Powerbook G3. I've tried various different things with no luck. Anyone got
> any ideas??
>
> dmasound_pmac: Awacs/Screamer Codec Mfct: 2 Rev 3
> input: dmasound beeper as /class/input/input4
> PowerMac Screamer  DMA sound driver rev 016 installed
> Core driver edition 01.06 : PowerMac Built-in Sound driver edition 00.07
> Write will use4 fragments of   32768 bytes as default
> Read  will use4 fragments of   32768 bytes as default
> Advanced Linux Sound Architecture Driver Version 1.0.13 (Tue Nov 28
> 14:07:24 2006 UTC).
> PM: Adding info for platform:snd_powermac
> snd: can't request rsrc  0 (Sound Control:
> 0xf3014000:f3014fff)
> ALSA device list:
>  No soundcards found.

Looks like the problem is that you have both the OSS and the ALSA driver
built into your kernel.  Try removing one of them, preferably the OSS
driver (it's obsolete).
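Andreas's diagnosis is consistent with the "can't request rsrc" line. Both drivers want the same Sound Control window (0xf3014000-0xf3014fff), and whichever claims it first keeps it. Below is a much simplified userspace sketch of that first-come rule, with a hypothetical flat table standing in for the kernel's resource tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for the kernel's resource tree. */
struct region {
	unsigned long start, end;
	const char *owner;
};

#define MAX_REGIONS 8
static struct region claimed[MAX_REGIONS];
static size_t nclaimed;

/* Succeeds only if [start, end] overlaps nothing already claimed, so a
 * second driver asking for the same registers is refused. */
static bool claim_region(unsigned long start, unsigned long end,
			 const char *owner)
{
	for (size_t i = 0; i < nclaimed; i++)
		if (start <= claimed[i].end && claimed[i].start <= end)
			return false;
	if (nclaimed == MAX_REGIONS)
		return false;
	claimed[nclaimed++] = (struct region){ start, end, owner };
	return true;
}
```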

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


RE: Excessive printks increase top mem usage?

2007-01-25 Thread David Schwartz

> On 1/25/07, yogeshwar sonawane <[EMAIL PROTECTED]> wrote:
> > Hi all,
> > I am running a user application which just opens and closes my driver
> > (a simple one: empty functions with only printks) in an infinite loop.
> > Massive use of printk can slow down the system noticeably or
> > affect some time calculations.
> > Apart from this, it was also increasing the memory usage shown by top.
> > After closing the application, the memory consumption did not come
> > down (memory was not freed). Is this the expected behaviour, or am I
> > missing something?
> >
> > Can anybody help me understand the reason for this? Any help or links, please.
> >
> > Thanks in advance,
> > Yogeshwar

What does "increased top mem usage" mean? You have given no reason to
suggest that there's anything unusual about this.

Here's one theory: Some program is writing these kernel messages to a file.
Because some other process might come and read that file later, the kernel
keeps copies of the file data in memory.

The kernel sees no advantage in having memory free. Free memory is memory
that's not doing any good. Better to keep data in memory that might be
useful later. We can always free the memory later if (and only if) we have
something better to do with it.
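One way to test this theory is to compare MemFree with Cached in /proc/meminfo before and after the run. Growth that shows up under Cached is page cache, which the kernel can reclaim later. Below is a minimal parsing sketch. meminfo_kb() is a hypothetical helper, and the caller is assumed to have read /proc/meminfo into a string.

```c
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of one "Key:   123 kB" line from text in
 * /proc/meminfo format, or -1 if the key is absent. Only matches the
 * key at the start of a line, so "Cached" does not match "SwapCached". */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *p = text;

	while (p && *p) {
		long val;

		if (strncmp(p, key, klen) == 0 && p[klen] == ':' &&
		    sscanf(p + klen + 1, "%ld", &val) == 1)
			return val;
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```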

DS




Re: linux i386 kernel 2.4.16 2.4.22 scheduling issue?

2007-01-25 Thread Robert Hancock

Fei Liu wrote:
Hello group, I have some concern about a scheduling issue with the 2.4.16 and 
2.4.22 i386 kernels, where I see 1200+ interrupts and context switches per 
second through vmstat when the machine is under load. Is this behavior 
normal? Is there any known scheduling issue with the above-mentioned 
kernel versions?


That number doesn't seem too unusual.

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: Linux 2.6.20-rc6 - clean boot on P4/HT

2007-01-25 Thread Sunil Naidu

On 1/25/07, Sunil Naidu <[EMAIL PROTECTED]> wrote:


It was a cool boot; I have really enjoyed this.


Here is the clean boot, after spending a good amount of time on it. Here is
the box info:

Linux Typhoon 2.6.20-rc6-Akula-II #1 SMP Fri Jan 26 05:33:18 IST 2007 i686 i686 i386 GNU/Linux

I shall test more on different boards/arches; sleep time for now.


Linux version 2.6.20-rc6-Akula-II ([EMAIL PROTECTED]) (gcc version 4.1.1
20070105 (Red Hat 4.1.1-51)) #1 SMP Fri Jan 26 05:33:18 IST 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009fc00 end:
0009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009fc00 size: 0400 end:
000a type: 2
copy_e820_map() start: 000e6000 size: 0001a000 end:
0010 type: 2
copy_e820_map() start: 0010 size: 1f62f800 end:
1f72f800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1f72f800 size: 0800 end:
1f73 type: 4
copy_e820_map() start: 1f73 size: 0001 end:
1f74 type: 3
copy_e820_map() start: 1f74 size: 000b end:
1f7f type: 4
copy_e820_map() start: 1f7f size: 0001 end:
1f80 type: 2
copy_e820_map() start: e000 size: 1000 end:
f000 type: 2
copy_e820_map() start: fed13000 size: 7000 end:
fed1a000 type: 2
copy_e820_map() start: fed1c000 size: 00084000 end:
feda type: 2
BIOS-e820:  - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000e6000 - 0010 (reserved)
BIOS-e820: 0010 - 1f72f800 (usable)
BIOS-e820: 1f72f800 - 1f73 (ACPI NVS)
BIOS-e820: 1f73 - 1f74 (ACPI data)
BIOS-e820: 1f74 - 1f7f (ACPI NVS)
BIOS-e820: 1f7f - 1f80 (reserved)
BIOS-e820: e000 - f000 (reserved)
BIOS-e820: fed13000 - fed1a000 (reserved)
BIOS-e820: fed1c000 - feda (reserved)
503MB LOWMEM available.
found SMP MP-table at 000ff780
Entering add_active_range(0, 0, 128815) 0 entries of 256 used
Zone PFN ranges:
 DMA 0 -> 4096
 Normal   4096 ->   128815
early_node_map[1] active PFN ranges
   0:0 ->   128815
On node 0 totalpages: 128815
 DMA zone: 32 pages used for memmap
 DMA zone: 0 pages reserved
 DMA zone: 4064 pages, LIFO batch:0
 Normal zone: 974 pages used for memmap
 Normal zone: 123745 pages, LIFO batch:31
DMI 2.3 present.
ACPI: RSDP (v000 ACPIAM) @ 0x000f4eb0
ACPI: RSDT (v001 INTEL  D915GAV  0x20060222 MSFT 0x0097) @ 0x1f73
ACPI: FADT (v002 INTEL  D915GAV  0x20060222 MSFT 0x0097) @ 0x1f730200
ACPI: MADT (v001 INTEL  D915GAV  0x20060222 MSFT 0x0097) @ 0x1f730390
ACPI: MCFG (v001 INTEL  D915GAV  0x20060222 MSFT 0x0097) @ 0x1f730400
ACPI: ASF! (v016 LEGEND I865PASF 0x0001 INTL 0x02002026) @ 0x1f736050
ACPI: TCPA (v001 INTEL  TBLOEMID 0x0001 MSFT 0x0097) @ 0x1f7360f0
ACPI: WDDT (v001 INTEL  OEMWDDT  0x0001 INTL 0x02002026) @ 0x1f736122
ACPI: DSDT (v001 INTEL  D915GAV  0x0001 INTL 0x02002026) @ 0x
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 2000 (gap: 1f80:c080)
Detected 3000.275 MHz processor.
Built 1 zonelists.  Total pages: 127809
Kernel command line: ro root=LABEL=/1 rhgb quiet
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c0451000 soft=c044f000
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 504940k/515260k available (2052k kernel code, 9788k reserved,
1048k data, 248k init, 0k highmem)
virtual kernel memory layout:
   fixmap  : 0xfffb7000 - 0xf000   ( 288 kB)
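The zone lines in the log above are internally consistent. Below is a small sketch that re-derives them, assuming 4 KiB pages and a 32-byte struct page. The struct page size is an assumption about this i386 build, not something the log states. With those values, 4096 DMA pages need 32 pages of memmap and 124719 Normal pages need 974. That reproduces the 4064, 123745 and 127809 figures, and 128815 pages * 4 KiB = 515260k matches the "Memory:" line.

```c
/* Assumed values for this i386 build. */
#define PAGE_KB          4UL
#define PAGE_BYTES       4096UL
#define STRUCT_PAGE_SIZE 32UL

struct zone_acct {
	unsigned long spanned;	/* pages in the zone's PFN range */
	unsigned long memmap;	/* pages used by its struct page array */
	unsigned long present;	/* pages left for the allocator */
};

static struct zone_acct account(unsigned long start_pfn, unsigned long end_pfn)
{
	struct zone_acct z;

	z.spanned = end_pfn - start_pfn;
	/* Integer division rounds down, which matches the logged figures. */
	z.memmap = z.spanned * STRUCT_PAGE_SIZE / PAGE_BYTES;
	z.present = z.spanned - z.memmap;
	return z;
}
```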
  

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-25 Thread Sunil Naidu

From: Dirk Hohndel <[EMAIL PROTECTED]>
Date: Thu, 25 Jan 2007 06:22:54 -0800

We've held netconf in Japan, Montreal, Portland, and this year will
likely be Europe.  People found a way to make it and we found
sufficient sponsorship for all attendees who needed monetary travel
assistance every time.  This is why I don't buy the funding argument
at all.  People who want to come and have the desire, will find a way.
Conferences who think attendance is important, will find a way to
provide sponsorship for travel when needed.


Good thoughts ;-)  I too believe in this: where there is a will,
there is a way! That's the reason why I have proposed India as the
location for KS 2007; I am still awaiting a response from Theodore
Tso.

But funding, the drag of travel, or time zones could be genuine obstacles
for many because of geography. Still, I feel this shouldn't deter us...
otherwise we will have to wait for KS 2008 or 2009 ;-)

[OT] Dirk, I did attend Intel Developer Forum 2006 by paying $70
because I wanted to!

~Akula2


Re: [PATCH 2.6.20-rc6] MIPS: Fix some whitespace damage

2007-01-25 Thread Ralf Baechle
On Thu, Jan 25, 2007 at 08:46:14PM +0100, Jan Altenberg wrote:

> Fix some whitespace damage in arch/mips/Kconfig.debug

Thanks, applied.

  Ralf


[PATCH 3/8] Allow huge page allocations to use GFP_HIGH_MOVABLE

2007-01-25 Thread Mel Gorman

Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed,
it can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage
pool at runtime depending on the size of ZONE_MOVABLE.

This patch adds a new sysctl called hugepages_treat_as_movable. When
a non-zero value is written to it, future allocations for the huge page
pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.
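Below is a minimal userspace model of the mechanism described above. It assumes that writing the sysctl switches htlb_alloc_mask between a highmem mask and a highmem-plus-movable mask (the handler body is not shown in this part of the patch). It also assumes that the zone is then chosen from that mask, as the new huge_zonelist(vma, addr, gfp_flags) signature suggests. The M_GFP_* bit values and model_gfp_zone() are hypothetical; the real gfp_t bits and gfp_zone() differ.

```c
/* Hypothetical flag bits; the real gfp_t values differ. */
typedef unsigned int model_gfp_t;
#define M_GFP_HIGHMEM       0x1u
#define M_GFP_MOVABLE       0x2u
#define M_GFP_HIGHUSER      (M_GFP_HIGHMEM)
#define M_GFP_HIGH_MOVABLE  (M_GFP_HIGHMEM | M_GFP_MOVABLE)

enum model_zone { Z_NORMAL, Z_HIGHMEM, Z_MOVABLE };

static unsigned long hugepages_treat_as_movable;	/* the sysctl value */
static model_gfp_t htlb_alloc_mask = M_GFP_HIGHUSER;

/* Assumed effect of writing the sysctl: choose the mask used for
 * future huge page pool allocations. */
static void treat_movable_changed(void)
{
	htlb_alloc_mask = hugepages_treat_as_movable ?
			  M_GFP_HIGH_MOVABLE : M_GFP_HIGHUSER;
}

/* Simplified gfp_zone(): the movable bit selects ZONE_MOVABLE. */
static enum model_zone model_gfp_zone(model_gfp_t flags)
{
	if (flags & M_GFP_MOVABLE)
		return Z_MOVABLE;
	if (flags & M_GFP_HIGHMEM)
		return Z_HIGHMEM;
	return Z_NORMAL;
}
```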

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 include/linux/hugetlb.h   |    3 +++
 include/linux/mempolicy.h |    6 +++---
 include/linux/sysctl.h    |    1 +
 kernel/sysctl.c           |    8 
 mm/hugetlb.c              |   23 ---
 mm/mempolicy.c            |    5 +++--
 6 files changed, 38 insertions(+), 8 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/hugetlb.h linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/hugetlb.h
--- linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/hugetlb.h	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/hugetlb.h	2007-01-25 17:34:15.0 +
@@ -14,6 +14,7 @@ static inline int is_vm_hugetlb_page(str
 }
 
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
+int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
 int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, int *, int);
 void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long);
@@ -28,6 +29,8 @@ int hugetlb_reserve_pages(struct inode *
 void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
 
 extern unsigned long max_huge_pages;
+extern unsigned long hugepages_treat_as_movable;
+extern gfp_t htlb_alloc_mask;
 extern const unsigned long hugetlb_zero, hugetlb_infinity;
 extern int sysctl_hugetlb_shm_group;
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mempolicy.h linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/mempolicy.h
--- linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mempolicy.h	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/mempolicy.h	2007-01-25 17:34:15.0 +
@@ -159,7 +159,7 @@ extern void mpol_fix_fork_child_flag(str
 
 extern struct mempolicy default_policy;
 extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
-   unsigned long addr);
+   unsigned long addr, gfp_t gfp_flags);
 extern unsigned slab_node(struct mempolicy *policy);
 
 extern enum zone_type policy_zone;
@@ -256,9 +256,9 @@ static inline void mpol_fix_fork_child_f
 #define set_cpuset_being_rebound(x) do {} while (0)
 
 static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma,
-   unsigned long addr)
+   unsigned long addr, gfp_t gfp_flags)
 {
-   return NODE_DATA(0)->node_zonelists + gfp_zone(GFP_HIGHUSER);
+   return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags);
 }
 
 static inline int do_migrate_pages(struct mm_struct *mm,
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/sysctl.h linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/sysctl.h
--- linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/sysctl.h	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/include/linux/sysctl.h	2007-01-25 17:34:15.0 +
@@ -202,6 +202,7 @@ enum
VM_PANIC_ON_OOM=33, /* panic at out-of-memory */
VM_VDSO_ENABLED=34, /* map VDSO into new processes? */
VM_MIN_SLAB=35,  /* Percent pages ignored by zone reclaim */
+   VM_HUGETLB_TREAT_MOVABLE=36, /* Allocate hugepages from ZONE_MOVABLE */
 };
 
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-002_create_movable_zone/kernel/sysctl.c linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/kernel/sysctl.c
--- linux-2.6.20-rc4-mm1-002_create_movable_zone/kernel/sysctl.c	2007-01-17 17:08:38.0 +
+++ linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/kernel/sysctl.c	2007-01-25 17:34:15.0 +
@@ -919,6 +919,14 @@ static ctl_table vm_table[] = {
.mode   = 0644,
.proc_handler   = _dointvec,
 },
+{
+ 

[PATCH 1/8] Add __GFP_MOVABLE for callers to flag allocations that may be migrated

2007-01-25 Thread Mel Gorman

It is often known at allocation time whether a page may be migrated or
not. This patch adds a flag called __GFP_MOVABLE and a new mask called
GFP_HIGH_MOVABLE. Allocations using __GFP_MOVABLE can be either migrated
using the page migration mechanism or reclaimed by syncing with backing
storage and discarding.
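The flag and mask described above compose as ordinary bit flags. A minimal sketch, assuming illustrative bit values (the real definitions are in include/linux/gfp.h and the value chosen for `__GFP_MOVABLE` here is hypothetical); `gfp_is_movable()` is a made-up helper name, not a kernel API:

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative values; __GFP_MOVABLE's real bit is not shown here. */
#define __GFP_HIGHMEM    0x02u
#define __GFP_MOVABLE    0x100000u
#define GFP_HIGHUSER     (0xd0u | __GFP_HIGHMEM)
#define GFP_HIGH_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)

/* True if the caller promised the page can be migrated or reclaimed. */
bool gfp_is_movable(gfp_t flags)
{
	return (flags & __GFP_MOVABLE) != 0;
}
```

GFP_HIGH_MOVABLE is simply GFP_HIGHUSER plus one extra bit, so existing GFP_HIGHUSER callers are unaffected unless they opt in.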

An API function very similar to alloc_zeroed_user_highpage() is added for
__GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
flags used by alloc_zeroed_user_highpage() are not changed because doing so
would change the semantics of an existing API. After this patch is applied there are no
in-kernel users of alloc_zeroed_user_highpage() so it probably should be
marked deprecated if this patch is merged.
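The asm-alpha hunk below turns the arch helper into __alloc_zeroed_user_highpage(movableflags, vma, vaddr). A plausible shape for the two public wrappers is that one passes 0 and the other passes __GFP_MOVABLE. The highmem.h hunk is not shown here, so this is a user-space sketch under that assumption; `fake_alloc_page_vma()` is a hypothetical stand-in that only records the mask, and the flag values are illustrative.

```c
#include <stddef.h>

typedef unsigned int gfp_t;

/* Illustrative values only. */
#define __GFP_HIGHMEM  0x02u
#define __GFP_ZERO     0x8000u
#define __GFP_MOVABLE  0x100000u       /* hypothetical bit */
#define GFP_HIGHUSER   (0xd0u | __GFP_HIGHMEM)

/* Stand-in for alloc_page_vma(): remembers the mask it was given. */
static gfp_t last_mask;
static void *fake_alloc_page_vma(gfp_t mask, void *vma, unsigned long vaddr)
{
	(void)vma;
	(void)vaddr;
	last_mask = mask;
	return NULL;
}

/* Shared helper: the caller decides whether to add __GFP_MOVABLE. */
static void *__alloc_zeroed_user_highpage(gfp_t movableflags, void *vma,
					  unsigned long vaddr)
{
	return fake_alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags,
				   vma, vaddr);
}

/* Existing API: semantics unchanged, never sets __GFP_MOVABLE. */
void *alloc_zeroed_user_highpage(void *vma, unsigned long vaddr)
{
	return __alloc_zeroed_user_highpage(0, vma, vaddr);
}

/* New API for pages that may later be migrated or reclaimed. */
void *alloc_zeroed_user_highpage_movable(void *vma, unsigned long vaddr)
{
	return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
}
```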

Note that this patch includes a minor cleanup to the use of __GFP_ZERO
in shmem.c to keep all flag modifications to inode->mapping in the
shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
Hugh Dickins.

Additional credit goes to Christoph Lameter and Linus Torvalds for shaping
the concept. Credit to Hugh Dickins for catching issues with shmem swap
vector and ramfs allocations.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 fs/inode.c|   10 ++--
 fs/ramfs/inode.c  |1 
 include/asm-alpha/page.h  |3 +-
 include/asm-cris/page.h   |3 +-
 include/asm-h8300/page.h  |3 +-
 include/asm-i386/page.h   |3 +-
 include/asm-ia64/page.h   |5 ++--
 include/asm-m32r/page.h   |3 +-
 include/asm-s390/page.h   |3 +-
 include/asm-x86_64/page.h |3 +-
 include/linux/gfp.h   |   10 +++-
 include/linux/highmem.h   |   51 +++--
 mm/memory.c   |8 +++---
 mm/mempolicy.c|4 +--
 mm/migrate.c  |2 -
 mm/shmem.c|7 -
 mm/swap_prefetch.c|2 -
 mm/swap_state.c   |2 -
 18 files changed, 98 insertions(+), 25 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-clean/fs/inode.c linux-2.6.20-rc4-mm1-001_mark_highmovable/fs/inode.c
--- linux-2.6.20-rc4-mm1-clean/fs/inode.c	2007-01-17 17:08:26.0 +
+++ linux-2.6.20-rc4-mm1-001_mark_highmovable/fs/inode.c	2007-01-25 17:30:30.0 +
@@ -145,7 +145,7 @@ static struct inode *alloc_inode(struct 
mapping->a_ops = _aops;
mapping->host = inode;
mapping->flags = 0;
-   mapping_set_gfp_mask(mapping, GFP_HIGHUSER);
+   mapping_set_gfp_mask(mapping, GFP_HIGH_MOVABLE);
mapping->assoc_mapping = NULL;
mapping->backing_dev_info = _backing_dev_info;
 
@@ -521,7 +521,13 @@ repeat:
  * new_inode   - obtain an inode
  * @sb: superblock
  *
- * Allocates a new inode for given superblock.
+ * Allocates a new inode for given superblock. The default gfp_mask
+ * for allocations related to inode->i_mapping is GFP_HIGH_MOVABLE. If
+ * HIGHMEM pages are unsuitable or it is known that pages allocated
+ * for the page cache are not reclaimable or migratable,
+ * mapping_set_gfp_mask() must be called with suitable flags on the
+ * newly created inode's mapping
+ *
  */
 struct inode *new_inode(struct super_block *sb)
 {
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-clean/fs/ramfs/inode.c linux-2.6.20-rc4-mm1-001_mark_highmovable/fs/ramfs/inode.c
--- linux-2.6.20-rc4-mm1-clean/fs/ramfs/inode.c	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-001_mark_highmovable/fs/ramfs/inode.c	2007-01-25 17:30:30.0 +
@@ -61,6 +61,7 @@ struct inode *ramfs_get_inode(struct sup
inode->i_blocks = 0;
inode->i_mapping->a_ops = _aops;
inode->i_mapping->backing_dev_info = _backing_dev_info;
+   mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
switch (mode & S_IFMT) {
default:
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-clean/include/asm-alpha/page.h linux-2.6.20-rc4-mm1-001_mark_highmovable/include/asm-alpha/page.h
--- linux-2.6.20-rc4-mm1-clean/include/asm-alpha/page.h	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-001_mark_highmovable/include/asm-alpha/page.h	2007-01-25 17:30:30.0 +
@@ -17,7 +17,8 @@
 extern void clear_page(void *page);
 #define clear_user_page(page, vaddr, pg)   clear_page(page)
 
-#define alloc_zeroed_user_highpage(vma, vaddr) alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO, vma, vmaddr)
+#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
+   alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vmaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
 extern void copy_page(void * _to, void * _from);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff 

[PATCH 2/8] Create the ZONE_MOVABLE zone

2007-01-25 Thread Mel Gorman

This patch creates an additional zone, ZONE_MOVABLE.  This zone is only
usable by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE.
Hot-added memory continues to be placed in its existing destination, as
there is no mechanism to redirect it to a specific zone.
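The zone-selection rule stated above is what the gfp.h hunk below adds to gfp_zone(). A simplified user-space model, assuming CONFIG_HIGHMEM=y, no ZONE_DMA32, and illustrative flag values (the `__GFP_MOVABLE` bit is hypothetical):

```c
typedef unsigned int gfp_t;

#define __GFP_DMA      0x01u
#define __GFP_HIGHMEM  0x02u
#define __GFP_MOVABLE  0x100000u   /* hypothetical bit */

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE, MAX_NR_ZONES };

/* Simplified gfp_zone(): ZONE_MOVABLE is chosen only when BOTH
 * __GFP_HIGHMEM and __GFP_MOVABLE are set. */
enum zone_type gfp_zone(gfp_t flags)
{
	if (flags & __GFP_DMA)
		return ZONE_DMA;
	if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
	    (__GFP_HIGHMEM | __GFP_MOVABLE))
		return ZONE_MOVABLE;
	if (flags & __GFP_HIGHMEM)
		return ZONE_HIGHMEM;
	return ZONE_NORMAL;
}
```

Note that a request carrying __GFP_MOVABLE without __GFP_HIGHMEM falls through to ZONE_NORMAL in this model.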


Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 include/linux/gfp.h|3 
 include/linux/mm.h |1 
 include/linux/mmzone.h |   21 +++-
 mm/highmem.c   |5 
 mm/page_alloc.c|  224 +++-
 5 files changed, 247 insertions(+), 7 deletions(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/gfp.h linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/gfp.h
--- linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/gfp.h	2007-01-25 17:30:30.0 +
+++ linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/gfp.h	2007-01-25 17:32:18.0 +
@@ -101,6 +101,9 @@ static inline enum zone_type gfp_zone(gf
if (flags & __GFP_DMA32)
return ZONE_DMA32;
 #endif
+   if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
+   (__GFP_HIGHMEM | __GFP_MOVABLE))
+   return ZONE_MOVABLE;
 #ifdef CONFIG_HIGHMEM
if (flags & __GFP_HIGHMEM)
return ZONE_HIGHMEM;
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/mm.h linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mm.h
--- linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/mm.h	2007-01-17 17:08:35.0 +
+++ linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mm.h	2007-01-25 17:32:18.0 +
@@ -974,6 +974,7 @@ extern unsigned long find_max_pfn_with_a
 extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
+extern int cmdline_parse_kernelcore(char *p);
 #ifndef CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
 extern int early_pfn_to_nid(unsigned long pfn);
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/mmzone.h linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mmzone.h
--- linux-2.6.20-rc4-mm1-001_mark_highmovable/include/linux/mmzone.h	2007-01-17 17:08:35.0 +
+++ linux-2.6.20-rc4-mm1-002_create_movable_zone/include/linux/mmzone.h	2007-01-25 17:32:18.0 +
@@ -138,6 +138,7 @@ enum zone_type {
 */
ZONE_HIGHMEM,
 #endif
+   ZONE_MOVABLE,
MAX_NR_ZONES
 };
 
@@ -159,6 +160,7 @@ enum zone_type {
+ defined(CONFIG_ZONE_DMA32)\
+ 1 \
+ defined(CONFIG_HIGHMEM)   \
+   + 1 \
 )
 #if __ZONE_COUNT < 2
 #define ZONES_SHIFT 0
@@ -166,6 +168,8 @@ enum zone_type {
 #define ZONES_SHIFT 1
 #elif __ZONE_COUNT <= 4
 #define ZONES_SHIFT 2
+#elif __ZONE_COUNT <= 8
+#define ZONES_SHIFT 3
 #else
 #error ZONES_SHIFT -- too many zones configured adjust calculation
 #endif
@@ -499,10 +503,21 @@ static inline int populated_zone(struct 
return (!!zone->present_pages);
 }
 
+extern int movable_zone;
+static inline int zone_movable_is_highmem(void)
+{
+#ifdef CONFIG_HIGHMEM
+   return movable_zone == ZONE_HIGHMEM;
+#else
+   return 0;
+#endif
+}
+
 static inline int is_highmem_idx(enum zone_type idx)
 {
 #ifdef CONFIG_HIGHMEM
-   return (idx == ZONE_HIGHMEM);
+   return (idx == ZONE_HIGHMEM ||
+   (idx == ZONE_MOVABLE && zone_movable_is_highmem()));
 #else
return 0;
 #endif
@@ -522,7 +537,9 @@ static inline int is_normal_idx(enum zon
 static inline int is_highmem(struct zone *zone)
 {
 #ifdef CONFIG_HIGHMEM
-   return zone == zone->zone_pgdat->node_zones + ZONE_HIGHMEM;
+   int zone_idx = zone - zone->zone_pgdat->node_zones;
+   return zone_idx == ZONE_HIGHMEM ||
+   (zone_idx == ZONE_MOVABLE && zone_movable_is_highmem());
 #else
return 0;
 #endif
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-001_mark_highmovable/mm/highmem.c linux-2.6.20-rc4-mm1-002_create_movable_zone/mm/highmem.c
--- linux-2.6.20-rc4-mm1-001_mark_highmovable/mm/highmem.c	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-002_create_movable_zone/mm/highmem.c	2007-01-25 17:32:18.0 +
@@ -46,8 +46,11 @@ unsigned int nr_free_highpages (void)
pg_data_t *pgdat;
unsigned int pages = 0;
 
-   for_each_online_pgdat(pgdat)
+   for_each_online_pgdat(pgdat) {
pages += pgdat->node_zones[ZONE_HIGHMEM].free_pages;
+   if (zone_movable_is_highmem())
+   pages += pgdat->node_zones[ZONE_MOVABLE].free_pages;
+   }
 
return pages;
 }
diff -rup -X 

[PATCH 0/8] Create ZONE_MOVABLE to partition memory between movable and non-movable pages

2007-01-25 Thread Mel Gorman
The following 8 patches against 2.6.20-rc4-mm1 create a zone called
ZONE_MOVABLE that is only usable by allocations that specify both __GFP_HIGHMEM
and __GFP_MOVABLE. This has the effect of keeping all non-movable pages
within a single memory partition while allowing movable allocations to be
satisfied from either partition.

The size of the zone is determined by a kernelcore= parameter specified at
boot-time. This specifies how much memory is usable by non-movable allocations
and the remainder is used for ZONE_MOVABLE. Any range of pages within
ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
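As a worked illustration of the sizing rule above: kernelcore= takes a size with an optional K, M or G suffix, and whatever is left over goes to ZONE_MOVABLE. This user-space sketch mimics memparse()-style suffix handling; `parse_size()` and `movable_bytes()` are hypothetical helpers, and the clamp to zero when kernelcore exceeds total memory is an assumption of the sketch.

```c
#include <stdlib.h>

/* Parse "nn[KMG]" into bytes, memparse()-style (base 0 allows 0x...). */
unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Memory left for ZONE_MOVABLE once kernelcore is reserved. */
unsigned long long movable_bytes(unsigned long long total, const char *kernelcore)
{
	unsigned long long core = parse_size(kernelcore);

	return core >= total ? 0 : total - core;
}
```

For example, kernelcore=1G on a 4GB machine would leave about 3GB for ZONE_MOVABLE.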

When selecting a zone to take pages from for ZONE_MOVABLE, there are two
things to consider. First, only memory from the highest populated zone is
used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM
but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second,
the amount of memory usable by the kernel will be spread evenly throughout
NUMA nodes where possible. If the nodes are not of equal size, the amount
of memory usable by the kernel on some nodes may be greater than others.
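One way to realise the even spread described above is sketched below. This is a hypothetical algorithm, not the code from the page_alloc.c patch (which is not shown here). A node no larger than its even share gives all of its memory to the kernel, and the shortfall is spread over the remaining nodes. Units are pages, and `spread_kernelcore()` is a made-up name.

```c
#include <stddef.h>

/* core[i] receives node i's kernelcore pages; size[i] - core[i] would
 * become that node's ZONE_MOVABLE. */
void spread_kernelcore(const unsigned long *size, unsigned long *core,
		       size_t n, unsigned long kernelcore)
{
	size_t i, left = 0;
	unsigned long share, extra;
	int changed = 1;

	/* A node is settled once core[i] == size[i]; empty nodes start settled. */
	for (i = 0; i < n; i++) {
		core[i] = 0;
		if (size[i] > 0)
			left++;
	}

	/* Nodes no bigger than the even share become all-kernelcore. */
	while (changed && left > 0) {
		share = kernelcore / left;
		changed = 0;
		for (i = 0; i < n; i++) {
			if (core[i] == size[i] || size[i] > share)
				continue;
			core[i] = size[i];
			kernelcore -= size[i];
			left--;
			changed = 1;
		}
	}

	if (left == 0)
		return;		/* every node is entirely kernelcore */

	/* Remaining nodes exceed their share: split evenly, handing out
	 * any remainder one page per node. */
	share = kernelcore / left;
	extra = kernelcore % left;
	for (i = 0; i < n; i++) {
		if (core[i] == size[i])
			continue;
		core[i] = share + (extra ? 1 : 0);
		if (extra)
			extra--;
	}
}
```

With nodes of 100, 1000 and 1000 pages and kernelcore of 900 pages, the small node becomes entirely kernelcore and the other two get 400 pages each.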

By default, the zone is not as useful for hugetlb allocations because they
are pinned and non-migratable (currently at least). A sysctl is provided that
allows huge pages to be allocated from that zone. This means that the huge
page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
the system assuming that pages are not mlocked. Despite huge pages being
non-movable, we do not introduce additional external fragmentation of note
as huge pages are always the largest contiguous block we care about.

A lot of credit goes to Andy Whitcroft for catching a large variety of
problems during review of the patches.
-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab


[PATCH] Fix apparent typo CONFIG_USB_CDCETHER.

2007-01-25 Thread Robert P. J. Day

  Replace the apparent typo CONFIG_USB_CDCETHER with
CONFIG_USB_NET_CDCETHER.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

  i *think* this is correct, but i'll leave the final decision to
those higher up the food chain.


diff --git a/drivers/usb/core/otg_whitelist.h b/drivers/usb/core/otg_whitelist.h
index 627a5a2..7f31a49 100644
--- a/drivers/usb/core/otg_whitelist.h
+++ b/drivers/usb/core/otg_whitelist.h
@@ -31,7 +31,7 @@ static struct usb_device_id whitelist_table [] = {
 { USB_DEVICE_INFO(7, 1, 3) },
 #endif

-#ifdef CONFIG_USB_CDCETHER
+#ifdef CONFIG_USB_NET_CDCETHER
 /* Linux-USB CDC Ethernet gadget */
 { USB_DEVICE(0x0525, 0xa4a1), },
 /* Linux-USB CDC Ethernet + RNDIS gadget */

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://www.fsdev.dreamhosters.com/wiki/index.php?title=Main_Page



Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread David Woodhouse
On Thu, 2007-01-25 at 09:56 -0800, Linus Torvalds wrote:
> Broken architectures that put PCI things at some "PCI physical address 
> zero" need to map their PCI addresses to something else. It's part of why 
> we have the whole infrastructure for doing things like
> 
>   pcibios_bus_to_resource()
>   pcibios_resource_to_bus()
> 
> etc - not just because a resource having zero in "start" means that it is 
> disabled, but because normally such broken setups have *other* problems 
> too (ie they don't have a 1:1 mapping between PCI addresses and physical 
> addresses _anyway_, so they need to address translations).

You're thinking of MMIO, while the case we were discussing was PIO. My
laptop is perfectly happy to assign PIO resources from zero.

The claim that "zero is disabled" is a relatively new (and IMO broken)
thing, and doesn't seem to be universal. This bizarre new special case
certainly isn't shared by the PCMCIA code, for example, which is quite
happy to put devices at PIO address 0...

shinybook /root # cat /sys/class/pcmcia_socket/pcmcia_socket0/available_resources_io
0x - 0x007f
shinybook /root # dmesg | grep 'max PIO'
ata2: PATA max PIO0 cmd 0x0 ctl 0xE bmdma 0x0 irq 53
ata2.00: CFA, max PIO4, 500400 sectors: LBA 

I believe PCMCIA just uses the generic resource code, which also seems
to lack any knowledge of this hackish special case for zero.

> This has come up before. For example: for an IRQ, 0 means "does not 
> exist", it does _not_ mean "physical irq 0", and we test for whether a 
> device has a valid irq by doing "if (dev->irq)" rather than having some 
> insane architecture-specific "IRQ_NONE". And if you validly really have an 
> irq at the hardware level that is zero, then that just means that the irq 
> numbers you should tell the kernel should be translated some way.
>
> (On a PC, hardware irq 0 is a real irq too, but it's a _special_ irq, and 
> it is set up by architecture-specific code. So as far as the generic 
> kernel and all devices are concerned, "!dev->irq" means that the irq 
> doesn't exist or hasn't been mapped for that device yet).

So again you end up in a situation where zero is a strange special case.
It works sometimes (setup_irq()) but not other times, and has meaning in
some places but not others. Once we go tickless and I start playing with
the PC speaker audio driver again, that hackish special case is going to
be fun to play with, I'm sure.

And your IRQ numbers no longer match the labels on the IRQ lines in the
hardware documentation; you have to have some 'mapping', which should
ideally be used for user-visible displays of IRQ numbers too.

That doesn't really sound much like the 'CLEANEST SOURCE CODE' to me. 

Programmers do seem to cope with comparing file descriptors with -1.
Counting from zero is quite normal. But would you suggest that our "zero
is invalid" policy should extend to those too, just in case our kernel
hackers can't manage? Obviously userspace programs will still expect
stdin to be fd #0 but glibc can fix that up for us -- it can handle the
"should be translated in some way" requirement of which you speak, so
that we can have the special case which seems to be permeating the
kernel.
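The trade-off being argued can be shown with a toy example. The struct and helpers below are hypothetical, not kernel API. With a -1 sentinel, a device at PIO base 0 (like the ata2 "cmd 0x0" above) stays valid. With a zero sentinel it is indistinguishable from "not assigned".

```c
#include <stdbool.h>

/* Hypothetical resource descriptor where 0 is a legitimate PIO base. */
struct pio_res {
	long start;
};

#define PIO_UNASSIGNED (-1L)	/* "not assigned", like a closed fd */

/* -1 sentinel: every non-negative address, including 0, is usable. */
bool pio_valid_minus_one(const struct pio_res *r)
{
	return r->start != PIO_UNASSIGNED;
}

/* "Zero means disabled" convention: address 0 cannot be expressed. */
bool pio_valid_zero(const struct pio_res *r)
{
	return r->start != 0;
}
```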

> So there are three issues:
> 
>  - we always need a way of saying "not mapped"/"nonexistent", and using 
>the value zero is the one that GIVES THE CLEANEST SOURCE CODE! It's why 
>NULL pointers are zero too. Sure, virtual address zero is a real 
>virtual address, but that doesn't change the fact that C made 0 be 
>special on a language level. If you want to access that virtual 
>address, and use NULL as a "doesn't exist" at the same time, you need 
>to swizzle your pointer somehow.

Really, it doesn't make the code cleaner when you have to add a whole
layer of translation just because of a single misguided special case.
The "special case" of NULL isn't really a special case because we just
refrain from _mapping_ that page. We don't actually treat it any
differently.
 
>  - the x86[-64] architecture is the one that gets tested the most. So 
>everybody else should try to look like it, rather than say "we are 
>different/better/strange". Don't have any special IO instructions? 
>Tough. You'd better do "inb/outb" anyway, and map them onto your 
>memory-mapped IO somehow.

I look forward to pretending the world is little-endian and
dma-coherent, and doesn't need memory barriers :)

>  - per-architecture magic values is a bad idea. If you need a magic value 
>(and things like this _do_ need one, unless we always want to carry a 
>separate flag around saying "valid" or "not valid"), it's a lot better 
>to just say "everybody uses this value" and then have the _small_ 
>architecture-specific code work around it, than have everybody have to 
>work around a lot of architectures doing things differently.

It doesn't need to be per-architecture; it can just be -1. The only
reason it was 

Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Justin Piszcz


On Thu, 25 Jan 2007, Mark Hahn wrote:

> > Something is seriously wrong with that OOM killer.
> 
> do you know you don't have to operate in OOM-slaughter mode?
> 
> "vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode where
> the kernel tracks your "committed" memory needs, and will eventually cause
> some allocations to fail.
> this is often much nicer than the default random OOM slaughter.
> (you probably also need to adjust vm.overcommit_ratio with some knowledge of
> your MemTotal and SwapTotal.)
> 
> regards, mark hahn.

# sysctl -a | grep vm.over
vm.overcommit_ratio = 50
vm.overcommit_memory = 0

I'll have to experiment with these options, thanks for the info!

Justin.
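The "knowledge of your MemTotal and SwapTotal" above refers to how strict accounting sets its ceiling. With vm.overcommit_memory = 2, the commit limit is roughly swap plus overcommit_ratio percent of RAM. A small helper makes the arithmetic concrete. It is hypothetical and ignores huge pages and other details, so treat the result as an estimate. Inputs are the kB values from /proc/meminfo.

```c
/* Approximate CommitLimit in kB under vm.overcommit_memory = 2. */
unsigned long long commit_limit_kb(unsigned long long mem_total_kb,
				   unsigned long long swap_total_kb,
				   unsigned int overcommit_ratio)
{
	return swap_total_kb + mem_total_kb * overcommit_ratio / 100;
}
```

With the default ratio of 50 shown above, a 2GB machine with 1GB of swap would be capped at about 2GB of committed memory.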


[PATCH 6/8] x86_64 - Specify amount of kernel memory at boot time

2007-01-25 Thread Mel Gorman

This patch adds the kernelcore= parameter for x86_64.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 e820.c |1 +
 1 files changed, 1 insertion(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/x86_64/kernel/e820.c linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/x86_64/kernel/e820.c
--- linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/x86_64/kernel/e820.c	2007-01-17 17:08:01.0 +
+++ linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/x86_64/kernel/e820.c	2007-01-25 17:40:16.0 +
@@ -617,6 +617,7 @@ static int __init parse_memopt(char *p)
return 0;
 } 
 early_param("mem", parse_memopt);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 static int userdef __initdata;
 
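The one-line early_param("kernelcore", cmdline_parse_kernelcore) registration used here (and in the x86 and ppc/powerpc patches) ties a parameter name to a handler that receives the text after "=". The user-space model below shows that idea with a lookup table. It is a hypothetical sketch. The real kernel mechanism in init/main.c differs in detail, and the handlers here are stand-ins.

```c
#include <stddef.h>
#include <string.h>

struct early_param_entry {
	const char *name;
	int (*handler)(char *arg);
};

static char *last_kernelcore_arg;

/* Stand-in handlers; the real ones parse sizes with memparse(). */
static int fake_kernelcore(char *arg) { last_kernelcore_arg = arg; return 0; }
static int fake_mem(char *arg) { (void)arg; return 0; }

static const struct early_param_entry table[] = {
	{ "mem", fake_mem },
	{ "kernelcore", fake_kernelcore },
};

/* Dispatch one "name=value" token; returns -1 if nothing matched. */
int dispatch_early_param(char *token)
{
	char *eq = strchr(token, '=');
	size_t len, i;

	if (!eq)
		return -1;
	len = (size_t)(eq - token);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		/* Require an exact name match, not just a common prefix. */
		if (strlen(table[i].name) == len &&
		    strncmp(token, table[i].name, len) == 0)
			return table[i].handler(eq + 1);
	}
	return -1;
}
```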



[PATCH 7/8] ia64 - Specify amount of kernel memory at boot time

2007-01-25 Thread Mel Gorman

This patch adds the kernelcore= parameter for ia64.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 efi.c |3 +++
 1 files changed, 3 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c
--- linux-2.6.20-rc4-mm1-006_x8664_set_kernelcore/arch/ia64/kernel/efi.c	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/arch/ia64/kernel/efi.c	2007-01-25 17:42:15.0 +
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -422,6 +423,8 @@ efi_init (void)
mem_limit = memparse(cp + 4, );
} else if (memcmp(cp, "max_addr=", 9) == 0) {
max_addr = GRANULEROUNDDOWN(memparse(cp + 9, ));
+   } else if (memcmp(cp, "kernelcore=",11) == 0) {
+   cmdline_parse_kernelcore(cp+11);
} else if (memcmp(cp, "min_addr=", 9) == 0) {
min_addr = GRANULEROUNDDOWN(memparse(cp + 9, ));
} else {
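On ia64, efi_init() matches options by comparing prefixes, and 11 is strlen("kernelcore="). The user-space model below shows the same prefix-matching idea on a space-separated command line. It is a hypothetical sketch. It uses strncmp() rather than memcmp() so that a short tail of the string is never over-read, and `scan_cmdline()` and `handle_kernelcore()` are made-up names.

```c
#include <stddef.h>
#include <string.h>

static const char *seen_kernelcore;

/* Stand-in for cmdline_parse_kernelcore(): records where the value starts. */
static void handle_kernelcore(const char *val)
{
	seen_kernelcore = val;
}

void scan_cmdline(const char *cp)
{
	static const char key[] = "kernelcore=";
	const size_t keylen = sizeof(key) - 1;	/* 11, as in the patch */

	while (*cp) {
		/* Only match at the start of an option. */
		if (strncmp(cp, key, keylen) == 0)
			handle_kernelcore(cp + keylen);
		/* Skip to the next space-separated option. */
		while (*cp && *cp != ' ')
			cp++;
		while (*cp == ' ')
			cp++;
	}
}
```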


[PATCH 8/8] Add documentation for additional boot parameter and sysctl

2007-01-25 Thread Mel Gorman

Once all patches are applied, a new command-line parameter and a new sysctl
exist. This patch adds the necessary documentation.


Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 filesystems/proc.txt  |   15 +++
 kernel-parameters.txt |   16 
 sysctl/vm.txt |3 ++-
 3 files changed, 33 insertions(+), 1 deletion(-)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/filesystems/proc.txt linux-2.6.20-rc4-mm1-008_documentation/Documentation/filesystems/proc.txt
--- linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/filesystems/proc.txt	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-008_documentation/Documentation/filesystems/proc.txt	2007-01-25 18:27:28.0 +
@@ -1288,6 +1288,21 @@ nr_hugepages configures number of hugetl
 hugetlb_shm_group contains group id that is allowed to create SysV shared
 memory segment using hugetlb page.
 
+hugepages_treat_as_movable
+--
+
+This parameter is only useful when kernelcore= is specified at boot time to
+create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages
+are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero
+value written to hugepages_treat_as_movable allows huge pages to be allocated
+from ZONE_MOVABLE.
+
+Once enabled, ZONE_MOVABLE is treated as an area of memory within which the
+huge pages pool can easily grow or shrink. Assuming that no running
+applications mlock() a lot of memory, it is likely the huge pages pool
+can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value
+into nr_hugepages and triggering page reclaim.
+
 laptop_mode
 ---
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/kernel-parameters.txt linux-2.6.20-rc4-mm1-008_documentation/Documentation/kernel-parameters.txt
--- linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/kernel-parameters.txt	2007-01-17 17:07:54.0 +
+++ linux-2.6.20-rc4-mm1-008_documentation/Documentation/kernel-parameters.txt	2007-01-25 18:27:28.0 +
@@ -762,6 +762,22 @@ and is between 256 and 4096 characters. 
js= [HW,JOY] Analog joystick
See Documentation/input/joystick.txt.
 
+   kernelcore=nn[KMG]  [KNL,IA-32,IA-64,PPC,X86-64] This parameter
+   specifies the amount of memory usable by the kernel
+   for non-movable allocations.  The requested amount is
+   spread evenly throughout all nodes in the system. The
+   remaining memory in each node is used for Movable
+   pages. In the event that a node is too small to have both
+   kernelcore and Movable pages, kernelcore pages will
+   take priority and other nodes will have a larger number
+   of kernelcore pages.  The Movable zone is used for the
+   allocation of pages that may be reclaimed or moved
+   by the page migration subsystem.  This means that
+   HugeTLB pages may not be allocated from this zone.
+   Note that allocations like PTEs-from-HighMem still
+   use the HighMem zone if it exists, and the Normal
+   zone if it does not.
+
keepinitrd  [HW,ARM]
 
kstack=N[IA-32,X86-64] Print N words from the kernel stack
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/sysctl/vm.txt linux-2.6.20-rc4-mm1-008_documentation/Documentation/sysctl/vm.txt
--- linux-2.6.20-rc4-mm1-007_ia64_set_kernelcore/Documentation/sysctl/vm.txt	2007-01-17 17:07:54.0 +
+++ linux-2.6.20-rc4-mm1-008_documentation/Documentation/sysctl/vm.txt	2007-01-25 18:27:28.0 +
@@ -39,7 +39,8 @@ Currently, these files are in /proc/sys/
 
 dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
 dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump, swap_token_timeout, drop-caches:
+block_dump, swap_token_timeout, drop-caches,
+hugepages_treat_as_movable:
 
 See Documentation/filesystems/proc.txt
 
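The proc.txt text above suggests growing the pool by repeatedly writing the desired value to nr_hugepages. The retry loop can be sketched in C as follows. The helper names are hypothetical. On a patched kernel one would first write 1 to /proc/sys/vm/hugepages_treat_as_movable and then pass "/proc/sys/vm/nr_hugepages" as the path (root required). Each write makes the kernel attempt more huge page allocations, which may in turn reclaim or migrate ZONE_MOVABLE pages.

```c
#include <stdio.h>

/* Write one unsigned value to a sysctl-style file; 0 on success. */
static int write_ulong(const char *path, unsigned long v)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", v) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read one unsigned value back; 0 on success. */
static int read_ulong(const char *path, unsigned long *v)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", v) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Request `target` huge pages up to `tries` times; returns the pool
 * size last read back (0 if the file could not be used). */
unsigned long grow_hugepage_pool(const char *nr_path, unsigned long target,
				 int tries)
{
	unsigned long have = 0;

	while (tries-- > 0) {
		if (write_ulong(nr_path, target) != 0)
			break;
		if (read_ulong(nr_path, &have) != 0)
			break;
		if (have >= target)
			break;
	}
	return have;
}
```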


[PATCH 4/8] x86 - Specify amount of kernel memory at boot time

2007-01-25 Thread Mel Gorman

This patch adds the kernelcore= parameter for x86.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 setup.c |1 +
 1 files changed, 1 insertion(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/arch/i386/kernel/setup.c linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/i386/kernel/setup.c
--- linux-2.6.20-rc4-mm1-003_mark_hugepages_movable/arch/i386/kernel/setup.c	2007-01-17 17:07:57.0 +
+++ linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/i386/kernel/setup.c	2007-01-25 17:36:17.0 +
@@ -196,6 +196,7 @@ static int __init parse_mem(char *arg)
return 0;
 }
 early_param("mem", parse_mem);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 #ifdef CONFIG_PROC_VMCORE
 /* elfcorehdr= specifies the location of elf core header


[PATCH 5/8] ppc and powerpc - Specify amount of kernel memory at boot time

2007-01-25 Thread Mel Gorman

This patch adds the kernelcore= parameter for ppc and powerpc.

Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
---

 powerpc/kernel/prom.c |1 +
 ppc/mm/init.c |2 ++
 2 files changed, 3 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/powerpc/kernel/prom.c linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/powerpc/kernel/prom.c
--- linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/powerpc/kernel/prom.c	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/powerpc/kernel/prom.c	2007-01-25 17:38:17.0 +
@@ -431,6 +431,7 @@ static int __init early_parse_mem(char *
return 0;
 }
 early_param("mem", early_parse_mem);
+early_param("kernelcore", cmdline_parse_kernelcore);
 
 /*
  * The device tree may be allocated below our memory limit, or inside the
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/ppc/mm/init.c linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/ppc/mm/init.c
--- linux-2.6.20-rc4-mm1-004_x86_set_kernelcore/arch/ppc/mm/init.c	2007-01-07 05:45:51.0 +
+++ linux-2.6.20-rc4-mm1-005_ppc64_set_kernelcore/arch/ppc/mm/init.c	2007-01-25 17:38:17.0 +
@@ -214,6 +214,8 @@ void MMU_setup(void)
}
 }
 
+early_param("kernelcore", cmdline_parse_kernelcore);
+
 /*
  * MMU_init sets up the basic memory mappings for the kernel,
  * including both RAM and possibly some I/O regions,


Re: [discuss] portmapping sucks

2007-01-25 Thread Henrique de Moraes Holschuh
On Thu, 25 Jan 2007, Jan Engelhardt wrote:
> As we all know, mountd and other SUNRPC (I question this invention too) 
> services are at a fixed RPC port number (/etc/rpc) which are mapped 
> to a random TCP/UDP port, and the application doing the mappings is
> portmap. This random TCP/UDP port selection is what makes it suck.

1. This is OT here.
2. See "portreserve" in Debian for a possible solution (that nobody in
Debian paid any attention to, so it never reserves anything :p).  Other
distros (RedHat/Fedora?) might have it too.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Powermac Sound Issue with Wallstreet PowerBook

2007-01-25 Thread Jim Gifford
For the last few weeks, I've been trying to get sound working on my
PowerBook G3. I've tried various things with no luck. Does anyone
have any ideas?


dmasound_pmac: Awacs/Screamer Codec Mfct: 2 Rev 3
input: dmasound beeper as /class/input/input4
PowerMac Screamer  DMA sound driver rev 016 installed
Core driver edition 01.06 : PowerMac Built-in Sound driver edition 00.07
Write will use 4 fragments of   32768 bytes as default
Read  will use 4 fragments of   32768 bytes as default
Advanced Linux Sound Architecture Driver Version 1.0.13 (Tue Nov 28 14:07:24 2006 UTC).

PM: Adding info for platform:snd_powermac
snd: can't request rsrc  0 (Sound Control: 0xf3014000:f3014fff)

ALSA device list:
 No soundcards found.



[PATCH] Fix apparent typo "CONFIG_SERIAL_CPM_SMC".

2007-01-25 Thread Robert P. J. Day

  Replace an apparent typo of CONFIG_SERIAL_CPM_SMC with
CONFIG_SERIAL_CPM_SMC2.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/arch/ppc/platforms/mpc866ads_setup.c b/arch/ppc/platforms/mpc866ads_setup.c
index 8a0c07e..5b05d4b 100644
--- a/arch/ppc/platforms/mpc866ads_setup.c
+++ b/arch/ppc/platforms/mpc866ads_setup.c
@@ -369,7 +369,7 @@ int __init mpc866ads_init(void)
ppc_sys_device_setfunc(MPC8xx_CPM_SMC1, PPC_SYS_FUNC_UART);
 #endif

-#ifdef CONFIG_SERIAL_CPM_SMC
+#ifdef CONFIG_SERIAL_CPM_SMC2
ppc_sys_device_enable(MPC8xx_CPM_SMC2);
ppc_sys_device_setfunc(MPC8xx_CPM_SMC2, PPC_SYS_FUNC_UART);
 #endif

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://www.fsdev.dreamhosters.com/wiki/index.php?title=Main_Page



Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Mark Hahn

Something is seriously wrong with that OOM killer.


do you know you don't have to operate in OOM-slaughter mode?

"vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you 
into a mode where the kernel tracks your "committed" memory 
needs, and will eventually cause some allocations to fail.

this is often much nicer than the default random OOM slaughter.
(you probably also need to adjust vm.overcommit_ratio with 
some knowledge of your MemTotal and SwapTotal.)


regards, mark hahn.


Re: [PATCH] Fix apparent typo of "CONFIG_MT_SMP".

2007-01-25 Thread Ralf Baechle
On Thu, Jan 25, 2007 at 06:41:35PM -0500, Robert P. J. Day wrote:

>   Replace apparent typo of CONFIG_MT_SMP with CONFIG_MIPS_MT_SMP.

Good catch, thanks.

Applied,

  Ralf


[git patches] libata fixes

2007-01-25 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git upstream-linus

to receive the following updates:

 drivers/ata/ahci.c   |   57 --
 drivers/ata/ata_generic.c|6 +++-
 drivers/ata/libata-core.c|   14 +-
 drivers/ata/libata-sff.c |   15 +++---
 drivers/ata/pata_cmd64x.c|   23 +++-
 drivers/ata/pata_hpt3x2n.c   |6 ++--
 drivers/ata/pata_it821x.c|4 ++-
 drivers/ata/pata_ixp4xx_cf.c |5 ++-
 drivers/ata/pata_legacy.c|4 ++-
 drivers/ata/pata_rz1000.c|6 +++-
 drivers/ata/sata_uli.c   |3 +-
 drivers/ata/sata_via.c   |   12 -
 include/linux/libata.h   |5 ++-
 13 files changed, 113 insertions(+), 47 deletions(-)

Alan (4):
  libata cmd64x: whack into a shape that looks like the documentation
  libata hpt3xn: Hopefully sort out the DPLL logic versus the vendor code
  libata: set_mode, Fix the FIXME
  libata-sff: Don't call bmdma_stop on non DMA capable controllers

Tejun Heo (3):
  sata_via: don't diddle with ATA_NIEN in ->freeze
  ahci: improve and limit spurious interrupt messages, take#3
  libata: implement ATA_FLAG_IGN_SIMPLEX and use it in sata_uli

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index e3c7b31..2fe5a58 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -75,6 +75,7 @@ enum {
AHCI_CMD_CLR_BUSY   = (1 << 10),
 
RX_FIS_D2H_REG  = 0x40, /* offset of D2H Register FIS data */
+   RX_FIS_SDB  = 0x58, /* offset of SDB FIS data */
RX_FIS_UNK  = 0x60, /* offset of Unknown FIS data */
 
board_ahci  = 0,
@@ -202,6 +203,10 @@ struct ahci_port_priv {
dma_addr_t  cmd_tbl_dma;
void*rx_fis;
dma_addr_t  rx_fis_dma;
+   /* for NCQ spurious interrupt analysis */
+   int ncq_saw_spurious_sdb_cnt;
+   unsigned int ncq_saw_d2h:1;
+   unsigned int ncq_saw_dmas:1;
 };
 
 static u32 ahci_scr_read (struct ata_port *ap, unsigned int sc_reg);
@@ -1109,8 +1114,9 @@ static void ahci_host_intr(struct ata_port *ap)
void __iomem *mmio = ap->host->mmio_base;
void __iomem *port_mmio = ahci_port_base(mmio, ap->port_no);
struct ata_eh_info *ehi = &ap->eh_info;
+   struct ahci_port_priv *pp = ap->private_data;
u32 status, qc_active;
-   int rc;
+   int rc, known_irq = 0;
 
status = readl(port_mmio + PORT_IRQ_STAT);
writel(status, port_mmio + PORT_IRQ_STAT);
@@ -1137,17 +1143,52 @@ static void ahci_host_intr(struct ata_port *ap)
 
/* hmmm... a spurious interupt */
 
-   /* some devices send D2H reg with I bit set during NCQ command phase */
-   if (ap->sactive && (status & PORT_IRQ_D2H_REG_FIS))
+   /* if !NCQ, ignore.  No modern ATA device has broken HSM
+* implementation for non-NCQ commands.
+*/
+   if (!ap->sactive)
return;
 
-   /* ignore interim PIO setup fis interrupts */
-   if (ata_tag_valid(ap->active_tag) && (status & PORT_IRQ_PIOS_FIS))
-   return;
+   if (status & PORT_IRQ_D2H_REG_FIS) {
+   if (!pp->ncq_saw_d2h)
+   ata_port_printk(ap, KERN_INFO,
+   "D2H reg with I during NCQ, "
+   "this message won't be printed again\n");
+   pp->ncq_saw_d2h = 1;
+   known_irq = 1;
+   }
+
+   if (status & PORT_IRQ_DMAS_FIS) {
+   if (!pp->ncq_saw_dmas)
+   ata_port_printk(ap, KERN_INFO,
+   "DMAS FIS during NCQ, "
+   "this message won't be printed again\n");
+   pp->ncq_saw_dmas = 1;
+   known_irq = 1;
+   }
+
+   if (status & PORT_IRQ_SDB_FIS &&
+  pp->ncq_saw_spurious_sdb_cnt < 10) {
+   /* SDB FIS containing spurious completions might be
+* dangerous, we need to know more about them.  Print
+* more of it.
+*/
+   const u32 *f = pp->rx_fis + RX_FIS_SDB;
+
+   ata_port_printk(ap, KERN_INFO, "Spurious SDB FIS during NCQ "
+   "issue=0x%x SAct=0x%x FIS=%08x:%08x%s\n",
+   readl(port_mmio + PORT_CMD_ISSUE),
+   readl(port_mmio + PORT_SCR_ACT), f[0], f[1],
+   pp->ncq_saw_spurious_sdb_cnt < 10 ?
+   "" : ", shutting up");
+
+   pp->ncq_saw_spurious_sdb_cnt++;
+   known_irq = 1;
+   }
 
-   if (ata_ratelimit())
+   if (!known_irq)
ata_port_printk(ap, KERN_INFO, "spurious interrupt "
-   

Re: x86 instability with 2.6.1{8,9}

2007-01-25 Thread Ken Moffat
On Mon, Jan 15, 2007 at 04:29:11PM +, Ken Moffat wrote:
> 
>  Today, I've built 2.6.19.2 without highmem (the box only has 1GB,
> dunno why I'd included that in the original config) and I will
> continue to wait patiently for either a week without problems, or
> something that I can manage to note - although I think at the moment
> that the second coming of the great prophet Zarquon is more likely.
> 
 Bizarre - it panic'd again last Thursday while I was in X, but I
still didn't manage to log any output.  At the weekend, I had the
bright idea of using chattr +j on the syslog to try to journal any
data, since then it has been fine.  So, it isn't down to highmem, and
I still can't trigger it reliably, or get any trace.  Tried running
as x86_64 this morning (because cold starts on Thursdays seem
particularly problematic, perhaps it's a time/power-supply-noise
problem), then x86 from a cold start this afternoon.

 Time to hope it won't bite me too often, and move on to testing
2.6.20-rc6.

Ken
-- 
das eine Mal als Tragödie, das andere Mal als Farce


Re: [patch] i386: add option to show more code in oops reports

2007-01-25 Thread Chuck Ebbert
Andrew Morton wrote:
> On Wed, 24 Jan 2007 15:22:49 -0500
> Chuck Ebbert <[EMAIL PROTECTED]> wrote:
>
>   
>> Sometimes we may need to see more code than the default in an oops,
>> so add an option for that.
>> 
>
> spose so, but some more justification would be nice.  As would an x86_64
> version?
>   

Can't think of a way to word the justification, but I've wanted to see more
code a few times.

As for x86_64, Andi doesn't want to print any code before the failure,
and just printing more afterwards didn't seem to make much sense.
>   
>> Signed-off-by: Chuck Ebbert <[EMAIL PROTECTED]>
>> 
>
> ooh, congrats.
>   
Thanks. Looks like I'll have plenty to do here...
>   
>> --- 2.6.20-rc5-32.orig/arch/i386/kernel/traps.c
>> +++ 2.6.20-rc5-32/arch/i386/kernel/traps.c
>> @@ -94,6 +94,7 @@ asmlinkage void spurious_interrupt_bug(v
>>  asmlinkage void machine_check(void);
>>  
>>  int kstack_depth_to_print = 24;
>> +int code_bytes = 64;
>> 
>
> static scope, please.  And I think it should be unsigned.
>
>   
>>  ATOMIC_NOTIFIER_HEAD(i386die_chain);
>>  
>>  int register_die_notifier(struct notifier_block *nb)
>> @@ -324,7 +325,7 @@ void show_registers(struct pt_regs *regs
>>   */
>>  if (in_kernel) {
>>  u8 *eip;
>> -int code_bytes = 64;
>> +int code_prologue = code_bytes * 43 / 64;
>>  unsigned char c;
>>  
>>  printk("\n" KERN_EMERG "Stack: ");
>> @@ -332,7 +333,7 @@ void show_registers(struct pt_regs *regs
>>  
>>  printk(KERN_EMERG "Code: ");
>>  
>> -eip = (u8 *)regs->eip - 43;
>> +eip = (u8 *)regs->eip - code_prologue;
>>  if (eip < (u8 *)PAGE_OFFSET ||
>>  probe_kernel_address(eip, c)) {
>>  /* try starting at EIP */
>> 
>
> You missed this bit:
>
>   if (eip < (u8 *)PAGE_OFFSET ||
>   probe_kernel_address(eip, c)) {
>   /* try starting at EIP */
>   eip = (u8 *)regs->eip;
>   code_bytes = 32;
>   }
>
> Do we really want to be modifying the global variable here?
>   
Oops.
>   
>> @@ -1191,3 +1192,15 @@ static int __init kstack_setup(char *s)
>>  return 1;
>>  }
>>  __setup("kstack=", kstack_setup);
>> +
>> +static int __init code_bytes_setup(char *s)
>> +{
>> +code_bytes = simple_strtoul(s, NULL, 0);
>> +if (code_bytes < 64)
>> +code_bytes = 64;
>> +if (code_bytes > 1024)
>> +code_bytes = 1024;
>> +
>> +return 1;
>> +}
>> +__setup("code_bytes=", code_bytes_setup);
>> 
>
> I'm OK with the upper limit, but I'd suggest that we remove the lower
> limit: someone might _want_ to be able to set code_bytes=0, who knows?
>
> And if code_bytes is unsigned, the single comparison with 1024 will suffice.
>
> OTOH, why have any checks at all in there?  If the user sets
> code_bytes=0xfff0 and things break, he gets to own both pieces...
>
>   
It's multiplying the number by 43 and dividing by 64, so we need to
avoid overflow. (I couldn't think of an easy way to preserve current behavior.)



[PATCH] Fix apparent typo of "CONFIG_MT_SMP".

2007-01-25 Thread Robert P. J. Day

  Replace apparent typo of CONFIG_MT_SMP with CONFIG_MIPS_MT_SMP.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/arch/mips/mips-boards/sim/sim_setup.c b/arch/mips/mips-boards/sim/sim_setup.c
index 2659c1c..ea2066c 100644
--- a/arch/mips/mips-boards/sim/sim_setup.c
+++ b/arch/mips/mips-boards/sim/sim_setup.c
@@ -57,7 +57,7 @@ void __init plat_mem_setup(void)
board_time_init = sim_time_init;
prom_printf("Linux started...\n");

-#ifdef CONFIG_MT_SMP
+#ifdef CONFIG_MIPS_MT_SMP
sanitize_tlb_entries();
 #endif
 }

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://www.fsdev.dreamhosters.com/wiki/index.php?title=Main_Page



[PATCH] KVM: 'asm' operand has impossible constraints

2007-01-25 Thread S.Çağlar Onur
Hi;

-rc6 fails to build with the latest gcc 4.2 snapshot as follows:

CC [M]  drivers/kvm/svm.o
drivers/kvm/svm.c:206: warning: 'inject_db' defined but not used
drivers/kvm/svm.c: In function 'svm_vcpu_run':
drivers/kvm/kvm.h:560: error: 'asm' operand has impossible constraints
make[2]: *** [drivers/kvm/svm.o] Error 1
make[1]: *** [drivers/kvm] Error 2
make: *** [drivers] Error 2

According to this thread, http://lkml.org/lkml/2006/11/11/129, the solution
is to change the "g" constraint to "rm"; patch follows.

Signed-off-by: S.Çağlar Onur <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/kvm/kvm.h
===
--- linux-2.6.orig/drivers/kvm/kvm.h 2007-01-26 01:38:35.0 +0200
+++ linux-2.6/drivers/kvm/kvm.h 2007-01-26 01:37:48.0 +0200
@@ -557,7 +557,7 @@
 #ifndef load_ldt
 static inline void load_ldt(u16 sel)
 {
-   asm ("lldt %0" : : "g"(sel));
+   asm ("lldt %0" : : "rm"(sel));
 }
 #endif


-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!




Re: Juju

2007-01-25 Thread Pete Zaitcev
On Thu, 25 Jan 2007 16:18:35 -0500, Kristian Høgsberg <[EMAIL PROTECTED]> wrote:

> > I see that ORBs are always allocated with a call (like SKB) and not
> > embedded into drivers (like URBs). It's great, keep it up. Also,
> > never allow drivers to pass DMA-mapped buffers into fw_send_request
> > and friends. We made both of these mistakes in USB, and it hurts.
> 
> Oh, the ORBs are SBP-2 specific data structures, struct fw_transaction is 
> probably what corresponds to USB URBs.  This struct is defined in 
> fw-transaction.h and is available for embedding into other structs, such as 
> struct sbp2_orb in fw-sbp2.  Is that what you're suggesting against, and what 
> are the problems with this approach?

Fortunately we do not care about out-of-tree drivers, which are most
affected, you may even call it a feature ^_^. My main problem is,
we can't refcount URBs, so usbmon can't tap them and must copy.

I understand that fw_transaction points to an external buffer while
SKB does not, so any jujumon code won't be able to refcount data.
So maybe it's not important.

> As for passing DMA-mapped buffers, do you mean that fw_send_request (or code 
> below that function) should map the payload as it currently does as opposed 
> to 
> callers pre-mapping DMA buffers?

Yes, this is what I mean. It serves our networking stack well, and that
stack is pretty fast. For some reason someone decided that USB users
should be allowed to map their own buffers. This led to big trouble with
usbmon, extra fields in the URB (which is already fat...), and tying up
precious IOMMU resources (not so precious on x86, fortunately).

> > orb->rcode = rcode;
> > -   if (rcode != RCODE_COMPLETE) {
> > +   if (rcode != RCODE_COMPLETE) {  /* Huh? */
> > spin_lock_irqsave(>lock, flags);
> > list_del(>link);
> > spin_unlock_irqrestore(>lock, flags);
> > 
> > This looks like an inverted test. Who does remove ORB from the list
> > if it's completed normally?

>[...]
> The rest of the SBP-2 transaction continues with the SBP-2 device issuing a 
> FireWire read transaction to read out the ORB from host memory and then carry 
> out the instructions in the ORB.  Once the device has completed the ORB it 
> will do a status write to the status address specified in the ORB, at which 
> point the SBP-2 transaction is complete.

You know, I wanted to use this picture for a long time:
 http://www.flickr.com/photos/zaitcev/369269557/
I hope it's all right.

I guess my problem is that I don't see just why the ORB transaction callback
is being delivered before the whole I/O was done.

Now that you drew my attention to sbp2_status_write(), this looks wrong:

/* Lookup the orb corresponding to this status write. */
spin_lock_irqsave(>lock, flags);
list_for_each_entry(orb, >orb_list, link) {
if (status_get_orb_high(status) == 0 &&
status_get_orb_low(status) == orb->request_bus) {
list_del(>link);
break;
}
}
spin_unlock_irqrestore(>lock, flags);

Why is it that fw_request can't carry a pointer?

> > orb->request.response.high= 0;
> > orb->request.response.low = orb->response_bus;
> > +   /*
> > +* XXX Kristian, what exactly makes you think that DMA address
> > +* is 32 bits wide above? This is only guaranteed if device->
> > +* card->device has mask 0x. Where is that set?
> > +*/
> 
> I guess that should be in pci_probe in fw-ohci.c, but it isn't. If the 
> allocation is in physical memory above the 4G limit, will dma_map_single set 
> up a trampoline buffer below 4G if the device DMA mask is 0x?

If an IOMMU (or SWIOTLB) is not available, dma_map_single will fail.
If IOMMU is available, it will allocate a mapping so that DMA address
is constrained, failing if tables are full. At least this was my
understanding. However, I implemented that many years ago.

>  I have 
> a similar problem when mapping the buffer passed down from the SCSI stack, in 
> that I don't know for sure that these buffers are accessible within the first 
> 4G.

Please ask James, I do not remember that either.

For block layer, Jens guarantees it for you (it's controlled by
blk_queue_bounce_limit), but I am not sure about SCSI. I don't expect
it to have bounce buffers though.

> > Obvious.
> 
> Huh? What do you mean?

I meant the replacement of the tests for zero with dma_mapping_error().

> > @@ -450,16 +458,18 @@ sbp2_send_management_orb(struct fw_unit *unit, int 
> > node_id, int generation,
> >  
> > retval = 0;
> >   out:
> > -   dma_unmap_single(device->card->device, orb->base.request_bus,
> > -sizeof orb->request, DMA_TO_DEVICE);
> > dma_unmap_single(device->card->device, orb->response_bus,
> >  sizeof orb->response, DMA_FROM_DEVICE);
> > + out_orbresp:
> > +   dma_unmap_single(device->card->device, orb->base.request_bus,
> 

Re: [PATCH] libata-sff: Don't call bmdma_stop on non DMA capable controllers

2007-01-25 Thread Jeff Garzik

Alan wrote:

Fixes bogus accesses to ports 0-15 with a non DMA capable controller.
This I think should go in for 2.6.20


applied to #upstream-fixes, but it's a hack based on a misunderstanding. 
 See comments below for further work needed.




Arguably it shouldn't be called for PIO commands at all but that's a
matter for Jeff to decide 


You are getting misled by the function name.

ata_bmdma_post_internal_cmd() is the common ->post_internal_cmd() hook 
for BMDMA-like (SFF-like) controllers.  ->post_internal_cmd() hook will 
always be called, for all commands, when present.


For PIO-only controllers, simply delete the post_internal_cmd hook from 
that specific driver's ata_port_operations.  (assuming no other cleanup 
is needed)


For other SFF controllers, perhaps ata_bmdma_post_internal_cmd() should 
be revised to check the taskfile protocol (PIO, DMA, ...)?


I leave that up to your judgement, to figure out what's best.  I 
certainly AGREE that an unconditional ata_bmdma_stop() for all commands, 
for all taskfile protocols, sounds wrong.


Jeff





Re: Problems on x86_64 laptops (high-load crashes?)

2007-01-25 Thread Jiri Kosina
On Fri, 26 Jan 2007, mirek kratochvil wrote:

> I want to ask about strange behavior of linux kernel on some laptops 
> (namely recent Asus laptops with dualcore 64bit Athlons). There's a 
> weird bug when the kernel's under some kind of heavy load. It usually 
> freezes all processes which run from X11 (including X11..) - happens 
> usually when:
[ .. snip .. ]
> about hardware - this is mostly seen on Asus A6 and similar laptops. 
> A6T, A6Tc, A6Km,

What BIOS do the machines have? 04xx versions are known to be horribly 
buggy (even the-other-OS(tm) users experience strange things).

In case you have BIOS 04.., try to upgrade to 06.. version. Also, does it 
help when you boot with acpi=off kernel commandline parameter? (do you 
compile kernel with both acpi and apic support?).

-- 
Jiri Kosina


Re: Bcm43xx oops after suspend to disk

2007-01-25 Thread roucaries bastien

Do you have the log stuff that precedes this part? In particular, was there an
assertion that failed?


The oops happens on resume, and no assertion failed during this boot.
However, I usually get a lot of:
bcm43xx: ASSERTION FAILED (radio_attenuation < 10) at: drivers/net/wireless/bcm43xx/bcm43xx_phy.c:1496:bcm43xx_find_lopair()
Moreover, after the first oops I get a second oops (surely a
consequence of the first).

-
Jan 25 18:17:06 portablebastien kernel: bcm43xx: PHY connected
Jan 25 18:17:06 portablebastien kernel: bcm43xx: Microcode rev 0x118, pl 0x17 (2004-05-06  21:34:00)
Jan 25 18:17:06 portablebastien kernel: bcm43xx: Radio turned on
Jan 25 18:17:06 portablebastien kernel: bcm43xx: Chip initialized
Jan 25 18:17:06 portablebastien kernel: bcm43xx: 32-bit DMA initialized
Jan 25 18:17:06 portablebastien kernel: bcm43xx: Keys cleared
Jan 25 18:17:06 portablebastien kernel: bcm43xx: Selected 802.11 core (phytype 2)
Jan 25 18:17:06 portablebastien kernel: ADDRCONF(NETDEV_UP): eth2: link is not ready
Jan 25 18:17:08 portablebastien dhcpd: Internet Systems Consortium DHCP Server V3.0.4
Jan 25 18:17:08 portablebastien dhcpd: Copyright 2004-2006 Internet Systems Consortium.
Jan 25 18:17:08 portablebastien dhcpd: All rights reserved.
Jan 25 18:17:08 portablebastien dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Jan 25 18:17:08 portablebastien dhcpd: Internet Systems Consortium DHCP Server V3.0.4
Jan 25 18:17:08 portablebastien dhcpd: Copyright 2004-2006 Internet Systems Consortium.
Jan 25 18:17:08 portablebastien dhcpd: All rights reserved.
Jan 25 18:17:08 portablebastien dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Jan 25 18:17:08 portablebastien dhcpd: Wrote 0 deleted host decls to leases file.
Jan 25 18:17:08 portablebastien dhcpd: Wrote 0 new dynamic host decls to leases file.
Jan 25 18:17:08 portablebastien dhcpd: Wrote 1 leases to leases file.
Jan 25 18:20:06 portablebastien kernel: NET: Registered protocol family 4
Jan 25 18:20:06 portablebastien kernel: NET: Registered protocol family 5
Jan 25 18:36:53 portablebastien -- MARK --
Jan 25 18:56:53 portablebastien -- MARK --
Jan 25 19:05:41 portablebastien kernel: Disabling non-boot CPUs ...
Jan 25 19:05:41 portablebastien kernel: Cannot set affinity for irq 0
Jan 25 19:05:41 portablebastien kernel: CPU 1 is now offline
Jan 25 19:05:41 portablebastien kernel: SMP alternatives: switching to UP code
Jan 25 19:46:45 portablebastien kernel: CPU1 is down
Jan 25 19:46:45 portablebastien kernel: Stopping tasks ... done.
Jan 25 19:46:45 portablebastien kernel: Shrinking memory... done (0 pages freed)
Jan 25 19:46:45 portablebastien kernel: Freed 0 kbytes in 0.02 seconds (0.00 MB/s)
Jan 25 19:46:45 portablebastien kernel: Suspending console(s)
Jan 25 19:46:45 portablebastien kernel: bcm43xx: Suspending...
Jan 25 19:46:45 portablebastien kernel: bcm43xx: Radio turned off
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0200 (RX) max used slots: 1/64
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x02A0 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0280 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0260 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0240 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0220 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: bcm43xx: DMA-32 0x0200 (TX) max used slots: 0/512
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI interrupt for device :03:03.0 disabled

Jan 25 19:46:45 portablebastien kernel: bcm43xx: Device suspended.
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI interrupt for device :03:01.2 disabled
Jan 25 19:46:45 portablebastien kernel: ohci1394 does not fully support suspend and resume yet
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI interrupt for device :00:10.1 disabled
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI interrupt for device :00:0b.1 disabled
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI interrupt for device :00:0b.0 disabled
Jan 25 19:46:45 portablebastien kernel: swsusp: critical section:
Jan 25 19:46:45 portablebastien kernel: swsusp: Need to copy 114482 pages
Jan 25 19:46:45 portablebastien kernel: PCI: Enabling device :00:0b.0 ( -> 0002)
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI Interrupt :00:0b.0[A] -> Link [LUB0] -> GSI 23 (level, low) -> IRQ 23
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI Interrupt :00:0b.1[B] -> Link [LUB2] -> GSI 22 (level, low) -> IRQ 22
Jan 25 19:46:45 portablebastien kernel: usb usb2: root hub lost power or was reset
Jan 25 19:46:45 portablebastien kernel: ehci_hcd :00:0b.1: debug port 1
Jan 25 19:46:45 portablebastien kernel: ACPI: PCI Interrupt :00:10.1[B] -> Link [LAZA] -> GSI 21 (level, low) -> IRQ 21
Jan 25 19:46:45 portablebastien kernel: 

Re: 2.6.18-stable release plans?

2007-01-25 Thread Alistair John Strachan
On Thursday 25 January 2007 09:16, Chris Rankin wrote:
> But anyway - can someone please tell me what "Eeek! page_mapcount(page)
> went negative! (-1)" is *really* saying/implying? Because I am currently
> translating this as "I WANT TO EAT YOUR FILESYSTEMS".

Hugh already did, multiple times. If there's an external hardware event that 
corrupts memory, code executing on your CPU is no longer going to behave 
deterministically. So cases that are typically "impossible" in the design of 
the code have a chance to trigger.

You can continue to flame 2.6.19, but you're an extreme minority when it comes 
to this kind of bug and as, again, Hugh already said, almost all of the 
reports of this and similar other bugs have led to hardware problems that 
were either unchecked or difficult to detect.

Imagine this scenario. It might seem unrealistic to you, but it's not 
impossible!

First Use of Linux -> Upgrading to 2.6.19
Undetected hardware error never triggered.

Running 2.6.19
Hardware error triggers. Linux crashes.

Going back to 2.6.18
Hardware error has not yet triggered again.

Will it eat your filesystem? Maybe, but it probably won't. If, as you claim, 
the memory is tested, it could have been a single-bit error, a cosmic ray 
event, a brownout, or anything similar. It's much more likely to simply 
crash your machine, as it did.

Not running the affected kernel again is a sure way to have _nobody_ listen to 
your complaints about 2.6.19 having a real software bug, because you're 
totally unwilling to test the kernel again and see if it triggers. A single 
report is simply not enough evidence. 

Additionally, reports from other users (who may have a million different 
experimental variables involved) are also insufficient, for reasons which 
have already been explained (drivers, proprietary code, et cetera).

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


linux i386 kernel 2.4.16 2.4.22 scheduling issue?

2007-01-25 Thread Fei Liu
Hello group, I have a concern about scheduling with the 2.4.16 and 
2.4.22 i386 kernels: vmstat shows 1200+ interrupts and context switches 
per second when the machine is under load. Is this behavior 
normal? Are there any known scheduling issues with these 
kernel versions?


Thanks,

Fei



[PATCH] Fix apparent "CONFIG_64_BIT" typo.

2007-01-25 Thread Robert P. J. Day

  Fix apparent typo, where CONFIG_64_BIT should read CONFIG_64BIT.

Signed-off-by: Robert P. J. Day <[EMAIL PROTECTED]>

---

diff --git a/include/asm-um/elf-ppc.h b/include/asm-um/elf-ppc.h
index 9971113..d3b90b7 100644
--- a/include/asm-um/elf-ppc.h
+++ b/include/asm-um/elf-ppc.h
@@ -11,7 +11,7 @@ extern long elf_aux_hwcap;

 #define elf_check_arch(x) (1)

-#ifdef CONFIG_64_BIT
+#ifdef CONFIG_64BIT
 #define ELF_CLASS ELFCLASS64
 #else
 #define ELF_CLASS ELFCLASS32

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://www.fsdev.dreamhosters.com/wiki/index.php?title=Main_Page


