date:20070511

Re: [patch 3/3] clockevents: Fix resume logic - updated version

2007-05-11 Thread Andrew Morton

On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
wrote:

> > > 
> > > hm, Fedora don't seem to want to give me an RPM which contains acpidump and
> > > all the yum servers are featuring scrogged checksums.  I could build it, I
> > > guess, but there's a principle involved ;)
> > > 
> > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt.  Is that OK?
> > 
> > Yes, thanks.
> 
> Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the
> hibernation?

That didn't change the behaviour.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ALPHA: MARVEL - check for allocated memory

2007-05-11 Thread Cyrill Gorcunov

This patch adds checking for allocated memory
which is used to hold AGP info. Also some whitespace
cleanup.

Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>

---

 arch/alpha/kernel/core_marvel.c |  137 ---
 1 files changed, 71 insertions(+), 66 deletions(-)

diff --git a/arch/alpha/kernel/core_marvel.c b/arch/alpha/kernel/core_marvel.c
index 7f6a984..9f6d1a2 100644
--- a/arch/alpha/kernel/core_marvel.c
+++ b/arch/alpha/kernel/core_marvel.c
@@ -29,7 +29,7 @@
 #include "proto.h"
 #include "pci_impl.h"
 
-
+
 /*
  * Debug helpers
  */
@@ -41,13 +41,13 @@
 # define DBG_CFG(args)
 #endif
 
-
+
 /*
  * Private data
  */
 static struct io7 *io7_head = NULL;
 
-
+
 /*
  * Helper functions
  */
@@ -79,7 +79,7 @@ mk_resource_name(int pe, int port, char *str)
 {
char tmp[80];
char *name;
-   
+
sprintf(tmp, "PCI %s PE %d PORT %d", str, pe, port);
name = alloc_bootmem(strlen(tmp) + 1);
strcpy(name, tmp);
@@ -130,19 +130,19 @@ alloc_io7(unsigned int pe)
 * Insert in pe sorted order.
 */
if (NULL == io7_head)   /* empty list */
-   io7_head = io7; 
+   io7_head = io7;
else if (io7_head->pe > io7->pe) {  /* insert at head */
io7->next = io7_head;
io7_head = io7;
} else {/* insert at position */
for (insp = io7_head; insp; insp = insp->next) {
if (insp->pe == io7->pe) {
-   printk(KERN_ERR "Too many IO7s at PE %d\n", 
+   printk(KERN_ERR "Too many IO7s at PE %d\n",
   io7->pe);
return NULL;
}
 
-   if (NULL == insp->next || 
+   if (NULL == insp->next ||
insp->next->pe > io7->pe) { /* insert here */
io7->next = insp->next;
insp->next = io7;
@@ -157,7 +157,7 @@ alloc_io7(unsigned int pe)
io7_head = io7;
}
}
-   
+
return io7;
 }
 
@@ -191,7 +191,7 @@ io7_clear_errors(struct io7 *io7)
p7csrs->PO7_CRRCT_SYM.csr = -1UL;
 }
 
-
+
 /*
  * IO7 PCI, PCI/X, AGP configuration.
  */
@@ -206,11 +206,11 @@ io7_init_hose(struct io7 *io7, int port)
int i;
 
hose->index = hose_index++; /* arbitrary */
-   
+
/*
 * We don't have an isa or legacy hose, but glibc expects to be
 * able to use the bus == 0 / dev == 0 form of the iobase syscall
-* to determine information about the i/o system. Since XFree86 
+* to determine information about the i/o system. Since XFree86
 * relies on glibc's determination to tell whether or not to use
 * sparse access, we need to point the pci_isa_hose at a real hose
 * so at least that determination is correct.
@@ -249,10 +249,10 @@ io7_init_hose(struct io7 *io7, int port)
hose->mem_space->flags = IORESOURCE_MEM;
 
if (request_resource(&ioport_resource, hose->io_space) < 0)
-   printk(KERN_ERR "Failed to request IO on hose %d\n", 
+   printk(KERN_ERR "Failed to request IO on hose %d\n",
   hose->index);
if (request_resource(&iomem_resource, hose->mem_space) < 0)
-   printk(KERN_ERR "Failed to request MEM on hose %d\n", 
+   printk(KERN_ERR "Failed to request MEM on hose %d\n",
   hose->index);
 
/*
@@ -284,7 +284,7 @@ io7_init_hose(struct io7 *io7, int port)
hose->sg_isa = iommu_arena_new_node(marvel_cpuid_to_nid(io7->pe),
hose, 0x0080, 0x0080, 0);
hose->sg_isa->align_entry = 8;  /* cache line boundary */
-   csrs->POx_WBASE[0].csr = 
+   csrs->POx_WBASE[0].csr =
hose->sg_isa->dma_base | wbase_m_ena | wbase_m_sg;
csrs->POx_WMASK[0].csr = (hose->sg_isa->size - 1) & wbase_m_addr;
csrs->POx_TBASE[0].csr = virt_to_phys(hose->sg_isa->ptes);
@@ -302,7 +302,7 @@ io7_init_hose(struct io7 *io7, int port)
hose->sg_pci = iommu_arena_new_node(marvel_cpuid_to_nid(io7->pe),
hose, 0xc000, 0x4000, 0);
hose->sg_pci->align_entry = 8;  /* cache line boundary */
-   csrs->POx_WBASE[2].csr = 
+   csrs->POx_WBASE[2].csr =
hose->sg_pci->dma_base | wbase_m_ena | wbase_m_sg;
csrs->POx_WMASK[2].csr = (hose->sg_pci->size - 1) & wbase_m_addr;
csrs->POx_TBASE[2].csr = virt_to_phys(hose->sg_pci->ptes);
@@ -357,7 +357,7 @@ marvel_io7_present(gct6_node *node)
int pe;
 
if (node->type != GCT_TYPE_HOSE ||
-   node->subtype != GCT_SUBTYPE_IO_PORT_MODULE) 
+   node->subtype != GCT_SUBT

[PATCH] ALPHA: TITAN - check for allocated memory

2007-05-11 Thread Cyrill Gorcunov

This patch adds checking for allocated memory
which is used to hold AGP info. Also some whitespace
cleanup.

Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]>

---

 arch/alpha/kernel/core_titan.c |   99 +---
 1 files changed, 52 insertions(+), 47 deletions(-)

diff --git a/arch/alpha/kernel/core_titan.c b/arch/alpha/kernel/core_titan.c
index 3662fef..419dbc8 100644
--- a/arch/alpha/kernel/core_titan.c
+++ b/arch/alpha/kernel/core_titan.c
@@ -46,7 +46,7 @@ struct
 # define DBG_CFG(args)
 #endif
 
-
+
 /*
  * Routines to access TIG registers.
  */
@@ -56,21 +56,21 @@ mk_tig_addr(int offset)
return (volatile unsigned long *)(TITAN_TIG_SPACE + (offset << 6));
 }
 
-static inline u8 
+static inline u8
 titan_read_tig(int offset, u8 value)
 {
volatile unsigned long *tig_addr = mk_tig_addr(offset);
return (u8)(*tig_addr & 0xff);
 }
 
-static inline void 
+static inline void
 titan_write_tig(int offset, u8 value)
 {
volatile unsigned long *tig_addr = mk_tig_addr(offset);
*tig_addr = (unsigned long)value;
 }
 
-
+
 /*
  * Given a bus, device, and function number, compute resulting
  * configuration space address
@@ -84,7 +84,7 @@ titan_write_tig(int offset, u8 value)
  *
  * Type 1:
  *
- *  3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1 1 1 1|1 1 
+ *  3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1 1 1 1|1 1
  *  3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * | | | | | | | | | | |B|B|B|B|B|B|B|B|D|D|D|D|D|F|F|F|R|R|R|R|R|R|0|1|
@@ -95,11 +95,11 @@ titan_write_tig(int offset, u8 value)
  * 15:11   Device number (5 bits)
  * 10:8function number
  *  7:2register number
- *  
+ *
  * Notes:
- * The function number selects which function of a multi-function device 
+ * The function number selects which function of a multi-function device
  * (e.g., SCSI and Ethernet).
- * 
+ *
  * The register selects a DWORD (32 bit) register offset.  Hence it
  * doesn't get shifted by 2 bits as we want to "drop" the bottom two
  * bits.
@@ -123,7 +123,7 @@ mk_conf_addr(struct pci_bus *pbus, unsigned int device_fn, 
int where,
 
 addr = (bus << 16) | (device_fn << 8) | where;
addr |= hose->config_space_base;
-   
+
*pci_addr = addr;
DBG_CFG(("mk_conf_addr: returning pci_addr 0x%lx\n", addr));
return 0;
@@ -154,7 +154,7 @@ titan_read_config(struct pci_bus *bus, unsigned int devfn, 
int where,
return PCIBIOS_SUCCESSFUL;
 }
 
-static int 
+static int
 titan_write_config(struct pci_bus *bus, unsigned int devfn, int where,
   int size, u32 value)
 {
@@ -185,17 +185,17 @@ titan_write_config(struct pci_bus *bus, unsigned int 
devfn, int where,
return PCIBIOS_SUCCESSFUL;
 }
 
-struct pci_ops titan_pci_ops = 
+struct pci_ops titan_pci_ops =
 {
.read = titan_read_config,
.write =titan_write_config,
 };
 
-
+
 void
 titan_pci_tbi(struct pci_controller *hose, dma_addr_t start, dma_addr_t end)
 {
-   titan_pachip *pachip = 
+   titan_pachip *pachip =
  (hose->index & 1) ? TITAN_pachip1 : TITAN_pachip0;
titan_pachip_port *port;
volatile unsigned long *csr;
@@ -203,11 +203,11 @@ titan_pci_tbi(struct pci_controller *hose, dma_addr_t 
start, dma_addr_t end)
 
/* Get the right hose.  */
port = &pachip->g_port;
-   if (hose->index & 2) 
+   if (hose->index & 2)
port = &pachip->a_port;
 
/* We can invalidate up to 8 tlb entries in a go.  The flush
-  matches against <31:16> in the pci address.  
+  matches against <31:16> in the pci address.
   Note that gtlbi* and atlbi* are in the same place in the g_port
   and a_port, respectively, so the g_port offset can be used
   even if hose is an a_port */
@@ -215,7 +215,7 @@ titan_pci_tbi(struct pci_controller *hose, dma_addr_t 
start, dma_addr_t end)
if (((start ^ end) & 0x) == 0)
csr = &port->port_specific.g.gtlbiv.csr;
 
-   /* For TBIA, it doesn't matter what value we write.  For TBI, 
+   /* For TBIA, it doesn't matter what value we write.  For TBI,
   it's the shifted tag bits.  */
value = (start & 0x) >> 12;
 
@@ -249,11 +249,11 @@ titan_init_one_pachip_port(titan_pachip_port *port, int 
index)
hose->mem_space = alloc_resource();
 
/*
-* This is for userland consumption.  The 40-bit PIO bias that we 
-* use in the kernel through KSEG doesn't work in the page table 
+* This is for userland consumption.  The 40-bit PIO bias that we
+* use in the kernel through KSEG doesn't work in the page table
 * based user mappings. (43-bit KSEG sign extends the physical
 * address from bit 40 to hit the I/O bit - mapped addresses don't).
-* So make s

Re: FUTEX_CMP_REQUEUE_PI is not quite there

2007-05-11 Thread Ulrich Drepper


Andrew Morton wrote:

Well yup.  We're kind of waiting for someone to reply
to http://lkml.org/lkml/2007/5/7/129


Seems to be the same or at least related.

One comment about my first mail: this is the correct code for condvars,
despite what I wrote before.  I wasn't thinking clearly.  The internal
futex is a normal futex.  It is the job of the CMP_REQUEUE_PI call to
figure this out, select the waiter with the highest priority, and boost
the priority if necessary based on the target futex, which is always a PI
futex.


--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

Re: [RFC PATCH] kbuild: silence section mismatch warnings

2007-05-11 Thread Sam Ravnborg

On Fri, May 11, 2007 at 04:22:28PM -0500, Kumar Gala wrote:
> 
> On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote:
> 
> >- Forwarded message from Sam Ravnborg <[EMAIL PROTECTED]> -
> >
> >Forgot lkml in first mail...
> >
> > Sam
> >
> >Subject: [RFC PATCH] kbuild: silence section mismatch warnings
> >From: Sam Ravnborg <[EMAIL PROTECTED]>
> >Date: Fri, 11 May 2007 23:03:46 +0200
> >User-Agent: Mutt/1.4.2.1i
> >To: Chris Wedgwood <[EMAIL PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>,
> > "David S. Miller" <[EMAIL PROTECTED]>,
> > Russell King <[EMAIL PROTECTED]>,
> > Satyam Sharma <[EMAIL PROTECTED]>
> >Cc: [EMAIL PROTECTED]
> >
> >The following patch allows us to silence section mismatch warnings
> >in specific places.
> >There are a few legitimate places that modpost does not yet recognize
> >where references from .text to .init.text (likewise for data) are legitimate.
> >This allows us to spot those few places and annotate them so we do not
> >get false warnings that in the end would let real warnings pass.
> >
> >The annotation is simple to grep for, so reviewing all uses in a few
> >months' time is trivial. It is assumed that a few places will
> >use this to shut up the warning as a replacement for the real fix.
> >But these cases are easy to spot and to fix up.
> 
> It's unclear if you expect that some things will be tagged
> __init_refok/__initdata_refok forever or if we'll find some way to  
> fix/change the code so the things tagged no longer need it.

A few places will need the __init_refok tag forever.
But as Satyam points out it will likely be misused.

So the __init_refok is introduced to stay.
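
A minimal user-space sketch of the idea, assuming the annotation works by placing the function in a dedicated section that a checker such as modpost can recognize. The macro name `my_init_refok` and the section name are assumptions for illustration; the definition in the actual patch is not shown in this thread.

```c
/* Hypothetical stand-in for the annotation discussed above (GNU C).
 * The section name is an assumption, not copied from the patch. */
#define my_init_refok \
	__attribute__((__noinline__, __section__(".text.init.refok")))

/* In the kernel this would be an __init function whose code is
 * discarded after boot. */
static int setup_once(void)
{
	return 42;
}

/* The annotated caller: it lives in its own section, so a tool that
 * scans section-to-section references can whitelist it by name. */
my_init_refok int call_setup(void)
{
	return setup_once();
}
```

Because the annotation is a single macro, `grep my_init_refok` would find every use, which is the reviewability property the mail relies on.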

akpm pointed out in private mail that I need to update the
linker scripts too - and running out of time this weekend so that
will be later.

Sam

Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread H. Peter Anvin

Satyam Sharma wrote:
> 
> Because volatile is ill-defined? Or actually, *undefined* (well,
> implementation-defined is as good as that)? It's *so* _vague_,
> one doesn't _feel_ like using it at all!
> 

Sorry, that's just utter crap.  Linux isn't written in some mythical C
which only exists in a standards document, it is written in a particular
subset of GNU C.  "volatile" is well enough defined in that context, it
is just frequently misused.

> We already have a complete API containing optimization barriers,
> load/store/full memory barriers. With well-defined and
> well-understood semantics. Just ... _why_ use volatile?

See below.

> It will _always_ work. In fact you can't really say the same for
> volatile. We already assume the compiler _actually_ took some
> pains to stuff meaning into C's (lack of) definition of volatile and
> implement it -- but in what sense, nobody knows (the C standard
> doesn't, so what are we).

It will always work within the context of GNU C.

>> more heavy-handed as it's disabling *all* optimization such as loop
>> invariants across the barrier.
> 
> This is a legitimate criticism, I agree.

There you have it.

-hpa

Re: FUTEX_CMP_REQUEUE_PI is not quite there

2007-05-11 Thread Andrew Morton

On Fri, 11 May 2007 23:10:47 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote:

> I hooked up FUTEX_CMP_REQUEUE_PI here and got a kernel crash.

Well yup.  We're kind of waiting for someone to reply
to http://lkml.org/lkml/2007/5/7/129

Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux

2007-05-11 Thread Andrew Morton

On Fri, 11 May 2007 23:55:37 +0200
Rodolfo Giometti <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Here is my new patch with a lot of fixes.
> 
> The only issue still not fixed is the one related to:
> 
>   #define NETLINK_PPSAPI  20
> 
> I need time to resolve it.
> 
> My comments follow, and then the patch; I hope I can now come back into
> the -mm tree again! :)

Well I suppose I could toss it in there for a bit of review-and-test.  But
I'll need to drop it again because we do need to split this patch into a
series of patches, please.

You should do this earlier rather than later because it improves reviewability.

> > - This:
> > 
> > static void pps_class_release(struct class_device *cdev)
> > {
> > /* Nop??? */
> > }
> > 
> >   is a bug and it earns you a nastygram from Greg.  These objects must be
> >   dynamically allocated - this is not optional.
> 
> Would it be acceptable to define this function as void?

No, it needs to be a proper release function, like all the other ones
around the place.

This comes up again and again and again and I recently asked Greg to direct
me to (or to write) suitable documentation, and I think he did, but I lost
it.  Greg, can you remind us please?
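
A minimal user-space sketch of what "a proper release function" means, assuming a plain reference count in place of the driver model's kobject machinery. The type and function names (`struct pps_obj`, `pps_obj_put()` and so on) are hypothetical, and the counter is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dynamically allocated, refcounted object. */
struct pps_obj {
	int refcount;
	void (*release)(struct pps_obj *obj);
	int id;
};

static int released_count;	/* demonstration only */

/* The release callback frees the object instead of doing nothing. */
static void pps_obj_release(struct pps_obj *obj)
{
	released_count++;
	free(obj);
}

/* Allocates the object with one reference held by the caller. */
struct pps_obj *pps_obj_create(int id)
{
	struct pps_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;
	obj->release = pps_obj_release;
	obj->id = id;
	return obj;
}

void pps_obj_get(struct pps_obj *obj)
{
	obj->refcount++;
}

/* Drops one reference; the last put triggers release(). */
void pps_obj_put(struct pps_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}

int pps_obj_released_count(void)
{
	return released_count;
}
```

The object's lifetime is decided by whoever drops the last reference, which is why it cannot be a static object with an empty release function.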

> >   We have a bunch of code in random other drivers which is dependent upon
> >   CONFIG_PPS_CLIENT_foo.  The problem is that if a kernel was compiled with
> >   CONFIG_PPS_CLIENT_foo=n and then the pps driver is later built for that
> >   kernel, it won't actually work because lp, serial etc weren't correctly
> >   configured when _they_ were built.
> > 
> >   This sort of cross-module coupling is considered to be a bad thing, but
> >   I'm not really sure it's all that important.
> >
> > - Please split the patch up into a series of patches: one for pps core and
> >   one for each of the clients (servers?): one for lp, one for serial, etc.
> > 
> >   Try to arrange for that series of patches to build and run at each stage
> >   of application.
> > 
> >   Please don't lose my changes when you do so ;)
> > 
> >   Please review the changes I made and a) stick to the same style and b) fix
> >   up any sites which I missed.
> > 
> > - Please remove all the typedefs:
> > 
> > +typedef struct ntp_fp {
> > +typedef union pps_timeu {
> > +typedef struct pps_info {
> > +typedef struct pps_params {
> > 
> >   and just use `struct ntp_fp' everywhere.
> 
> Those typedefs are defined in PPS specifications (please, see RFC 2783).

We don't use typedefs in-kernel.  Please convert the code to use `struct
ntp_fp' everywhere.

For RFC compatibility to userspace you can do

#ifndef __KERNEL__
typedef struct ntp_fp ntp_fp_t;
...
#endif

> > - The above four structures are communicated with userspace, yes?
> > 
> >   I believe that they will not work correctly when 32-bit userspace is
> >   communicating with a 64-bit kernel.  Alignments change and sizeof(long)
> >   changes.
> > 
> >   You don't want to have to write compat code.  I suggest that you redo
> >   those structures in terms of __u32, __u64, etc.  You probably need to use
> >   attribute((packed)) too, not sure.
> > 
> >   Then let's get that part carefully reviewed (Arnd Bergmann <[EMAIL PROTECTED]>
> >   is my go-to guru on this) and please test it carefully.
> > 
> >   Yeah, you just haven't got a chance that something as huge and as complex
> >   as struct pps_netlink_msg will survive the 32->64 transition.
> 
> The same as above. These structures are fixed by RFC 2783.

Your answer has no relationship to my question.

The problem here is that under a 64-bit kernel we require that applications
which use this structure definition work correctly when they are compiled
to generate 32-bit code and when they are compiled to generate 64-bit code.

Furthermore we should aim to have the code work correctly across
different versions of the compiler, and when different compiler options are
used, and when altogether different compilers are used.

It is not clear to me that your definition is sufficiently defensive
against _any_ of these things.
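
A minimal user-space sketch of the layout concern. Both struct names are hypothetical and are not the LinuxPPS definitions; `int64_t`/`uint32_t` stand in for the kernel's `__u64`/`__u32` style of fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout whose size depends on the ABI: `long` is 4 bytes
 * in 32-bit x86 code and 8 bytes in 64-bit x86 code. */
struct ts_abi_dependent {
	long sec;
	long nsec;
};

/* Hypothetical fixed-width layout: every member has the same size and
 * offset however the code is compiled, and there is no hidden padding. */
struct ts_fixed {
	int64_t  sec;
	int32_t  nsec;
	uint32_t flags;
};

/* Compile-time checks (C11) that pin the fixed layout down. */
_Static_assert(sizeof(struct ts_fixed) == 16, "unexpected size");
_Static_assert(offsetof(struct ts_fixed, nsec) == 8, "unexpected offset");
_Static_assert(offsetof(struct ts_fixed, flags) == 12, "unexpected offset");

/* Size of the ABI-dependent struct for this particular build. */
size_t abi_dependent_size(void)
{
	return sizeof(struct ts_abi_dependent);
}

size_t fixed_size(void)
{
	return sizeof(struct ts_fixed);
}
```

Building the same file with and without `-m32` changes `abi_dependent_size()` but not `fixed_size()`, which is the property a structure shared between 32-bit userspace and a 64-bit kernel needs.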

> > - Please ensure that `make headers_check' passes OK (you'll hear from me if
> >   it doesn't ;))
> 
> Done.
> 
> > - Can we get rid of the private dbg, err and info macros?  Surely there are
> >   generic ones somewhere.
> 
> They are very useful to LinuxPPS users who can enable/disable them by
> configuration menu.

You misunderstand.  I'm not saying "remove the callsites".  I'm saying
"remove the definitions".

Because we already have things like pr_debug() and pr_info(), so new code
should use those rather than reinventing them.

Plus, we already have at least 52 different implementations of "dbg" in the
tree and your 53rd one didn't compile because it clashed with someone
else's.  This is the compiler sending us a message: "use the existing
infrastructure".   If that infrastructure is insufficient then let's
improve it.
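
A minimal user-space sketch of how a compile-time switchable debug macro can be shared rather than redefined per driver. The `my_pr_debug()` and `my_pr_info()` names are hypothetical stand-ins; in the kernel, pr_debug() and pr_info() already exist and would be used directly. The `##__VA_ARGS__` form is a GNU C extension.

```c
#include <stdio.h>

/* Hypothetical user-space stand-ins for the generic kernel helpers. */
#ifdef DEBUG
#define my_pr_debug(fmt, ...) \
	fprintf(stderr, "debug: " fmt, ##__VA_ARGS__)
#else
/* Compiled out, but the arguments are still type-checked. */
#define my_pr_debug(fmt, ...) \
	do { if (0) fprintf(stderr, "debug: " fmt, ##__VA_ARGS__); } while (0)
#endif

#define my_pr_info(fmt, ...) \
	fprintf(stderr, "info: " fmt, ##__VA_ARGS__)

/* The call sites stay in place; only the macro definitions are shared. */
int report_source(int id)
{
	my_pr_debug("registering source %d\n", id);
	return my_pr_info("source %d registered\n", id);
}
```

Compiling with `-DDEBUG` enables the debug line without touching any call site, which is the kind of enable/disable switch the LinuxPPS users were said to want.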


Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread Satyam Sharma

On 5/12/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:

Satyam Sharma wrote:
>
>> +  - Pointers to data structures in coherent memory which might be
>> modified
>> +by I/O devices can, sometimes, legitimately be volatile.  A ring
>> buffer
>> +used by a network adapter, where that adapter changes pointers to
>> +indicate which descriptors have been processed, is an example of
>> this
>> +type of situation.
>
> is a legitimate use case for volatile is still not clear to me (I
> agree with Alan's
> comment in a previous thread that this seems to be a case where a memory
> barrier would be applicable^Wbetter, actually). I could be wrong here, so
> would be nice if Peter explains why volatile is legitimate here.
>
> Otherwise, it's fine with me.
>

> I don't see why Alan's way is necessarily better;

Because volatile is ill-defined? Or actually, *undefined* (well,
implementation-defined is as good as that)? It's *so* _vague_,
one doesn't _feel_ like using it at all!

We already have a complete API containing optimization barriers,
load/store/full memory barriers. With well-defined and
well-understood semantics. Just ... _why_ use volatile?

> it should work but is

It will _always_ work. In fact you can't really say the same for
volatile. We already assume the compiler _actually_ took some
pains to stuff meaning into C's (lack of) definition of volatile and
implement it -- but in what sense, nobody knows (the C standard
doesn't, so what are we).

> more heavy-handed as it's disabling *all* optimization such as loop
> invariants across the barrier.

This is a legitimate criticism, I agree.
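
A minimal GNU C sketch of the two approaches being compared. `compiler_barrier()` uses the empty-asm-with-memory-clobber idiom; `read_once_int()` is a hypothetical per-access helper. Both constrain only the compiler and are not CPU memory barriers.

```c
/* Compiler barrier: the compiler may not keep any memory value cached
 * in a register across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical per-access alternative: only this one load is forced. */
#define read_once_int(x) (*(const volatile int *)&(x))

static int ring_head;	/* imagine a device or another context updates it */

/* Poll with the barrier: everything in memory is reloaded each pass. */
int poll_with_barrier(int limit)
{
	int spins = 0;

	while (ring_head == 0 && spins < limit) {
		compiler_barrier();
		spins++;
	}
	return spins;
}

/* Poll with a volatile access: only ring_head is reloaded each pass. */
int poll_with_volatile_access(int limit)
{
	int spins = 0;

	while (read_once_int(ring_head) == 0 && spins < limit)
		spins++;
	return spins;
}

void set_ring_head(int v)
{
	ring_head = v;
}
```

The barrier version is the heavier of the two because it discards every cached memory value, including loop invariants unrelated to `ring_head`.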

Thanks,
Satyam

FUTEX_CMP_REQUEUE_PI is not quite there

2007-05-11 Thread Ulrich Drepper

I hooked up FUTEX_CMP_REQUEUE_PI here and got a kernel crash.  No serial 
console so this is the output of the screen after the machine stopped.


This is of course on x86-64.  Compiled from a rawhide-ified upstream 
kernel from two days ago.


The situation is that we requeue from a non-PI futex to a PI futex.  We
might now actually want to change the condvar implementation to use
a PI futex internally if the mutex in use is PI, too, but this kind of
mismatch can still happen.  I can provide binaries if necessary.



There is quite a lot of output from the kernel:

BUG: at kernel/futex.c:1665 set_pi_futex_owner()

Call Trace:
 [] futex_lock_pi+0x351/0x685
 [] _spin_lock_irqsave+0x9/0xe
 [] __up_read+0x19/0x7f
 [] default_wake_function+0x0/0xe
 [] do_futex+0xa68/0x10e8
 [] sys_futex+0xee/0x10c
 [] _spin_unlock_irq+0x9/0xc
 [] system_call+0x7e/0x83

BUG: at lib/plist.c:78 plist_add()

Call Trace:
 [] plist_add+0x3a/0x90
 [] futex_lock_pi+0x387/0x685
 [] _spin_lock_irqsave+0x9/0xe
 [] __up_read+0x19/0x7f
 [] default_wake_function+0x0/0xe
 [] do_futex+0xa68/0x10e8
 [] sys_futex+0xee/0x10c
 [] _spin_unlock_irq+0x9/0xc
 [] system_call+0x7e/0x83

BUG: at kernel/futex.c:483 exit_pi_state_list()

Call Trace:
 [] exit_pi_state_list+0xbe/0x11e
 [] do_exit+0x801/0x84e
 [] complete_and_exit+0x0/0x16
 [] system_call+0x7e/0x83

list_add corruption. prev->next should be next (81001dda1cb8), but
was 81006c6e06c8. (prev=81006c6e06c8).

[ cut here ]
kernel BUG at lib/list_debug.c:33!
invalid opcode:  [1] SMP
CPU 0
Pid: 15097, comm: ld-linux-x86-64 Not tainted 2.6.21-1.3145.fc7 #1
RIP: 0010:[]  [] __list_add+0x47/0x5b
RSP: 0018:81003cc01e78  EFLAGS: 00010092
RAX: 0079 RBX: 81001dda1cb8 RCX: fca9
RDX:  RSI: 0282 RDI: 80559a50
RBP: 81001dda1cb0 R08: 00a0 R09: 0010
R10: 81000305dd00 R11:  R12: 81001dda1c88
R13: 0282 R14: 81006c6e0080 R15: 810075edac78
FS:  () GS:8059e000() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 40400eb8 CR3: 1c40f000 CR4: 26e0
Process ld-linux-x86-64 (pid: 15097, threadinfo 81003cc0, task 
81006c6e00


Stack:  81006c6e06b0 8030c7a2 81006c6e07b0 810075edac50
 81006c6e06b0 8043ac19 81006c6e06b0 810075edac40
 81006c6e06b0 8070f9f0 81006c6e07b0 81006c6e0080
Call Trace:
 [] plist_del+0x3a/0x70
 [] rt_mutex_slowunlock+0x8c/0x1cd
 [] exit_pi_state_list+0xec/0x11e
 [] do_exit+0x801/0x84e
 [] complete_and_exit+0x0/0x16
 [] system_call+0x7e/0x83


Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a 5a c3
RIP  [] __list_add+0x47/0x5b
 RSP 

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux

2007-05-11 Thread Rodolfo Giometti

Hello,

Here is my new patch with a lot of fixes.

The only issue still not fixed is the one related to:

#define NETLINK_PPSAPI  20

I need time to resolve it.

My comments follow, and then the patch; I hope I can now come back into
the -mm tree again! :)

On Thu, May 10, 2007 at 12:27:52AM -0700, [EMAIL PROTECTED] wrote:
>
> Review comments:
>
> - Running a timer once per second will make the super-low-power people upset.

The ktimer module is just for debugging purposes and is not needed
in a real working system.

> - This uses netlink?  Is that interface documented anywhere?
>
>   Please check with Dave Miller that this:
>
>   #define NETLINK_PPSAPI  20
>
>   reservation is OK.

It is not OK. To be fixed.

> - This:
>
>   if ((nlpps->tsformat != PPS_TSFMT_TSPEC) != 0 ) {
>
>   is weird.  I changed it to
>
>   if (nlpps->tsformat != PPS_TSFMT_TSPEC) {

Fixed.

> - This:
>
>   timeout += nlpps->timeout.tv_nsec/(10/HZ);
>
>   probably won't work on i386.  We use do_div() for 64/32 divides.  I'll
>   find out when I compile it.
>
>   It's nice to use NSEC_PER_SEC rather than having to count all those
>   zeroes.

Fixed.
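
A minimal user-space sketch of the conversion that review item is about, using a named constant instead of a hand-counted run of zeroes. `HZ` is assumed to be 250 here and `timespec_to_ticks()` is a hypothetical name; this is not the code from the patch. In the kernel on 32-bit targets, a 64-by-32 division would go through do_div() as the review notes.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L	/* same value as the kernel constant */
#define HZ 250				/* assumed tick rate for this sketch */

/* Converts seconds plus nanoseconds to ticks, rounding the fractional
 * part down to a whole tick. */
uint64_t timespec_to_ticks(int64_t sec, long nsec)
{
	uint64_t ticks = (uint64_t)sec * HZ;

	ticks += (uint64_t)nsec / (NSEC_PER_SEC / HZ);
	return ticks;
}
```

With these assumptions one tick is 4,000,000 ns, so 2.5 seconds becomes 2 * 250 + 125 = 625 ticks.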

> - The code uses interruptible_sleep_on_timeout().  That API is deprecated
>   and is racy.  Please convert to wait_event_interruptible_timeout().
>
>   Ditto interruptible_sleep_on()

Fixed.

> - This:
>
> memset(pps_source, 0, sizeof(struct pps_s) * PPS_MAX_SOURCES);
>
>   was unneeded.  The C startup code already did that.

Fixed.

> - All these separators:
>
> +/* --- Input function -- */
>
>   aren't typical for kernel code.  I left them in, but please consider
>   removing them all.

Fixed.

> - This:
>
>   static void pps_class_release(struct class_device *cdev)
>   {
>   /* Nop??? */
>   }
>   
>   is a bug and it earns you a nastygram from Greg.  These objects must be
>   dynamically allocated - this is not optional.

Would it be acceptable to define this function as void?

> - What's this doing in 8250.c?
>
> + if (up->port.flags & UPF_HARDPPS_CD)
> + up->ier |= UART_IER_MSI;/* enable interrupts */
>   
>   Please fully describe the reasons for this change in the changelog, and in
>   a code comment and then get the change reviewed by Russell King
>   <[EMAIL PROTECTED]>.

If the user specifies a serial port as the PPS source, we enable the IRQ
on that port.

> - Please document within the changelog the other changes to the serial code
>   and we'll ask Russell to take a look at those as well.

OK. I'll do it.

> - The Kconfig purports to support CONFIG_PPS=m.  Does that actually work?

Yes. It works...

>   We have a bunch of code in random other drivers which is dependent upon
>   CONFIG_PPS_CLIENT_foo.  The problem is that if a kernel was compiled with
>   CONFIG_PPS_CLIENT_foo=n and then the pps driver is later built for that
>   kernel, it won't actually work because lp, serial etc weren't correctly
>   configured when _they_ were built.
>
>   This sort of cross-module coupling is considered to be a bad thing, but
>   I'm not really sure it's all that important.
>
> - Please split the patch up into a series of patches: one for pps core and
>   one for each of the clients (servers?): one for lp, one for serial, etc.
>
>   Try to arrange for that series of patches to build and run at each stage
>   of application.
>   
>   Please don't lose my changes when you do so ;)
>
>   Please review the changes I made and a) stick to the same style and b) fix
>   up any sites which I missed.
>
> - Please remove all the typedefs:
>
> +typedef struct ntp_fp {
> +typedef union pps_timeu {
> +typedef struct pps_info {
> +typedef struct pps_params {
>
>   and just use `struct ntp_fp' everywhere.

Those typedefs are defined in PPS specifications (please, see RFC 2783).

> - The above four structures are communicated with userspace, yes?
> 
>   I believe that they will not work correctly when 32-bit userspace is
>   communicating with a 64-bit kernel.  Alignments change and sizeof(long)
>   changes.
>   
>   You don't want to have to write compat code.  I suggest that you redo
>   those structures in terms of __u32, __u64, etc.  You probably need to use
>   attribute((packed)) too, not sure.
> 
>   Then let's get that part carefully reviewed (Arnd Bergmann <[EMAIL PROTECTED]>
>   is my go-to guru on this) and please test it carefully.
> 
>   Yeah, you just haven't got a chance that something as huge and as complex
>   as struct pps_netlink_msg will survive the 32->64 transition.

The same as above. These structures are fixed by RFC 2783.

> - Please ensure that `make headers_check' passes OK (you'll hear from me if
>   it doesn't ;))

Done.

> - Can we get rid of the private dbg, err and info macros?  Surely there are
>   generic ones somewhere.

They are very useful to LinuxPPS users who can enable/disable them by
configuration menu.

Also I'm planning to

Re: [PATCH] mm: swap prefetch improvements

2007-05-11 Thread Paul Jackson

Con wrote:
> Hmm I'm not really sure what it takes to make it cpuset aware;
> ...
> It is numa aware to some degree. It stores the node id and when it starts 
> prefetching it only prefetches to nodes that are suitable for prefetching to 
> ...
> It would be absolutely trivial to add a check for 'number_of_cpusets' <= 1
> in  the prefetch_enabled() function. Would you like that?

Hmmm ... it seems that we are shadow boxing here ... trying to pick a solution
to solve a problem when we aren't even sure we have a problem, much less
what the problem is.

That does not usually lead to the right path.

Could you put some more effort into characterizing what problems
can arise if one has prefetch and cpusets active at the same time?

My first wild guess is that the only incompatibility would have been that
prefetch might mess up NUMA placement (get pages on wrong nodes), which
it seems you have tried to address in your current patches.  So it would
not surprise me if there was no problem here.

We may just have to lean on Nick some more, if he is the only one who
understands what the problem is, to try again to explain it to us.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311

2007-05-11 Thread Satyam Sharma

Hi Dmitry,

On 5/12/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:

On Friday 11 May 2007 20:53, Andrew Morton wrote:
> Ho hum. I suppose a suitable workaround would be to convert hdaps_mtx back
> into a semaphore. ug.

Actually I was looking for victims^Wvolunteers to test the patch below.
It gets rid of _trylock business.

Ah! You just beat me here, and your patch is definitely better.

I was wondering why this driver wanted to use a mutex (previously the
semaphore) to synchronize between process and interrupt context in the
first place. Most of the code in here uses synchronous delays so never
sleeps anyway, but then unfortunately it does a weird
repeated-waiting-hardware-status-register-check thingy in its .probe()
which meant a straightforward mutex -> spinlock wasn't possible.

So I then made a patch pushing the poll off to the keventd workqueue, when
I saw your mail that does exactly the same, but wrapped up in the
generic input-polldev infrastructure! It's barely 12 days old in mainline --
no wonder I didn't know about it. Seems to be good-looking code!

Thanks,
Satyam

Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread H. Peter Anvin

H. Peter Anvin wrote:
> 
> I don't see why Alan's way is necessarily better; it should work but is
> more heavy-handed as it's disabling *all* optimization such as loop
> invariants across the barrier.
> 

To expand on this further: the way this probably *should* be handled,
Linux-style, is with internally-volatile versions of le32_to_cpup() and
friends.  That obeys the concept that the volatility should be
associated with an operation, not a data structure, and, being related
to an I/O device, should have its endianness explicitly declared.

Right now those macros don't exist, however.
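
As a rough illustration of what such an accessor might look like, here is
a user-space sketch. The name le32_to_cpup_volatile() is made up for this
example (no such macro exists, as noted above), and the byte-wise assembly
is just one portable way to express a little-endian load; a real kernel
version would presumably build on the existing byteorder helpers.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not an existing kernel API: read a 32-bit
 * little-endian value that a device may change behind our back.
 * The volatile qualifier is attached to this one access, not to the
 * data structure that contains the field.
 */
static inline uint32_t le32_to_cpup_volatile(const volatile uint32_t *p)
{
	uint32_t raw = *p;	/* one load through the volatile pointer */
	const uint8_t *b = (const uint8_t *)&raw;

	/* Assemble from bytes so the result is host-endian on any CPU. */
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

A caller would pass the address of an ordinary, non-volatile descriptor
field, so the structure definition itself stays free of the qualifier.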

-hpa

Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread H. Peter Anvin

Satyam Sharma wrote:
> 
>> +  - Pointers to data structures in coherent memory which might be modified
>> +by I/O devices can, sometimes, legitimately be volatile.  A ring buffer
>> +used by a network adapter, where that adapter changes pointers to
>> +indicate which descriptors have been processed, is an example of this
>> +type of situation.
> 
> is a legitimate use case for volatile is still not clear to me (I
> agree with Alan's
> comment in a previous thread that this seems to be a case where a memory
> barrier would be applicable^Wbetter, actually). I could be wrong here, so
> would be nice if Peter explains why volatile is legitimate here.
> 
> Otherwise, it's fine with me.
> 

I don't see why Alan's way is necessarily better; it should work but is
more heavy-handed as it's disabling *all* optimization such as loop
invariants across the barrier.
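
To make the trade-off concrete, here is a small sketch, assuming GCC-style
inline asm. barrier() mirrors the usual compiler-barrier idiom;
READ_VOLATILE() and struct ring are hypothetical names invented for this
example and are not taken from any driver.

```c
#include <stdint.h>

/* Compiler barrier: the "memory" clobber tells GCC that any memory may
 * have changed, so every value cached in a register must be reloaded. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: force a fresh load of one object only. */
#define READ_VOLATILE(x) (*(const volatile __typeof__(x) *)&(x))

struct ring {
	uint32_t hw_done;	/* updated by the device */
	uint32_t scale;		/* loop invariant, never changes here */
};

/* Heavy-handed: after each barrier() the compiler must also reload
 * r->scale, even though only hw_done can change. */
static uint32_t wait_with_barrier(struct ring *r, uint32_t want)
{
	uint32_t spins = 0;

	while (r->hw_done < want) {
		spins += r->scale;
		barrier();
	}
	return spins;
}

/* Targeted: only hw_done is re-read; r->scale may stay in a register. */
static uint32_t wait_with_volatile(struct ring *r, uint32_t want)
{
	uint32_t spins = 0;

	while (READ_VOLATILE(r->hw_done) < want)
		spins += r->scale;
	return spins;
}
```

Both loops behave the same; the difference is only in how much the
compiler is still allowed to optimize around the access.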

-hpa

Re: [PATCH] "volatile considered harmful", take 2

2007-05-11 Thread H. Peter Anvin

pradeep singh wrote:
> 
> Sorry, for my misunderstanding but i hope Jonathan actually means
> volatile harmful only in C and not while using extended asm with gcc? Or
> does you all consider volatile while using extended asm as harmful too?
> Incidentally i came to know that using volatile in such cases may be
> still be optimized by the gcc. And the correct way is to fake a side
> effect to the gcc, which can be done using "memory" clobbering directive
> in the correct place and not "m" or "+m".
> 
> Does this means to exclude volatile from extended asm also, while using
> them in kernel?
> 

We were talking about "register", not "volatile".

-hpa

Re: [patch 05/10] Linux Kernel Markers - i386 optimized version

2007-05-11 Thread Suparna Bhattacharya

On Fri, May 11, 2007 at 10:27:29AM +0530, Ananth N Mavinakayanahalli wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox ([EMAIL PROTECTED]) wrote:
> 
> ...
> > > > * Third issue : Scalability. Changing code will stop every CPU on the
> > > >   system for a while. Compared to this, the int3-based approach will run
> > > >   through the breakpoint handler "if" one of the CPU happens to execute
> > > >   this code at the wrong time. The standard case is just an IPI (to
> > > 
> > > If I read the errata right then patching in an int3 will itself trigger
> > > the errata so anything could happen.
> > > 
> > > I believe there are other safe sequences for doing code patching - perhaps
> > > one of the Intel folk can advise ?
> 
> IIRC, when the first implementation of what exists now as kprobes was
> done (as part of the dprobes framework), this question did come up. I
> think the conclusion was that the errata applies only to multi-byte
> modifications and single-byte changes are guaranteed to be atomic.
> Given int3 on Intel is just 1-byte, we are safe.
> 
> > I'll let the Intel guys confirm this, I don't have the reference nearby
> > (I got this information by talking with the kprobe team members, and
> > they got this information directly from Intel developers) but the
> > int3 is the one special case to which the errata does not apply.
> > Otherwise, kprobes and gdb would have a big, big issue.
> 
> Perhaps Richard/Suparna can confirm.

I just dug up past discussions on this from Richard, about int3
being safe:

http://sourceware.org/ml/systemtap/2005-q3/msg00208.html
http://lkml.org/lkml/2006/9/20/30
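
For readers unfamiliar with the mechanism, the single-byte patching
discussed above can be modelled roughly as follows. This is only a
user-space sketch operating on a plain byte buffer, not on live code;
struct probe, arm_probe() and disarm_probe() are made-up names and do not
correspond to the kprobes implementation.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCC	/* x86 one-byte breakpoint instruction */

struct probe {
	uint8_t *addr;		/* first byte of the probed instruction */
	uint8_t saved;		/* original opcode byte */
};

/* Arm the probe: remember the original byte, then overwrite it with
 * int3. Only one byte is stored, which is the property the thread
 * relies on. */
static void arm_probe(struct probe *p, uint8_t *insn)
{
	p->addr = insn;
	p->saved = *insn;
	*(volatile uint8_t *)insn = INT3_OPCODE;
}

/* Disarm the probe: put the original opcode byte back. */
static void disarm_probe(struct probe *p)
{
	*(volatile uint8_t *)p->addr = p->saved;
}
```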

Regards
Suparna

> 
> Ananth

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India


Re: [PATCH] mm: swap prefetch improvements

2007-05-11 Thread Con Kolivas

On Saturday 12 May 2007 15:03, Paul Jackson wrote:
> > Swap prefetch is not cpuset aware so make the config option depend on
> > !CPUSETS.
>
> Ok.
>
> Could you explain what it means to say "swap prefetch is not cpuset aware",
> or could you give a rough idea of what it would take to make it cpuset
> aware?

Hmm, I'm not really sure what it takes to make it cpuset aware; it was Nick
who pointed out that it was not. I'm still going off your original
recommendation that there was no need to make it cpuset aware, but that it
should at least honour node placement (see below).

> I wouldn't go so far as to say that no one would ever want to prefetch and
> use cpusets at the same time, but I will grant that it's not a sufficiently
> important need that it should block a useful prefetch implementation on
> non-cpuset systems.

Thank you for agreeing with me there :)

> One case that would be useful, however, is to handle prefetch in the case
> that cpusets are configured into ones kernel, but one is not making any
> real use of them ('number_of_cpusets' <= 1).  That will actually be the
> most common case for the major distribution(s) that enable cpusets by
> default in their builds, for most arch's including the arch's popular
> on desktops.
>
> So what would it take to allow CONFIG'ing both prefetch and cpusets on,
> but having prefetch dynamically adapt to the presence of active cpuset
> usage, perhaps by basically shutting down if it can't easily do any
> better?  I could certainly entertain requests to callout to some
> prefetch routine from the cpuset code, at the critical points that
> cpusets transitioned in or out of active use.

It would be absolutely trivial to add a check for 'number_of_cpusets' <= 1 in 
the prefetch_enabled() function. Would you like that?
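
A minimal sketch of what that check might look like, written as a
user-space model. number_of_cpusets and prefetch_enabled() are the names
used in this thread; the swap_prefetch switch and the plain globals are
stand-ins assumed for the example, not the actual mm/swap_prefetch.c code.

```c
#include <stdbool.h>

/* Stand-ins for kernel state (assumed for illustration only). */
static int number_of_cpusets = 1;	/* 1: only the root cpuset exists */
static int swap_prefetch = 1;		/* sysctl-style on/off switch */

static bool prefetch_enabled(void)
{
	if (!swap_prefetch)
		return false;
	/* Back off as soon as cpusets are in active use. */
	if (number_of_cpusets > 1)
		return false;
	return true;
}
```

With a check like this, prefetching stays available on kernels that have
cpusets configured in but unused, and shuts down once a second cpuset is
created.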

> Semi-separate issue -- is it just cpusets that aren't prefetch friendly,
> or is it also mm/mempolicy (mbind, set_mempolicy) as well?
>
> For that matter, even if neither mm/mempolicy nor cpusets are used, on
> systems with multiple memory nodes (not all memory equally distant from
> all CPUs, aka NUMA), could prefetch cause some sort of shuffling of
> memory placement, which might harm the performance of an HPC (High
> Performance Computing) application with carefully tuned memory
> placement.  Granted, this -is- getting to be a corner case.  Most HPC
> apps running on NUMA hardware are making at least some use of
> mm/mempolicy or cpusets.

It is numa aware to some degree. It stores the node id and when it starts 
prefetching it only prefetches to nodes that are suitable for prefetching to 
(based on a number of arbitrary freeness arguments I invented). It uses the 
original node id it came from by allocating a page via:
alloc_pages_node(node, GFP_HIGHUSER & ~__GFP_WAIT, 0);
where "node" is the original node the swapped page came from.

Thanks for comments.

-- 
-ck

Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311

2007-05-11 Thread Dmitry Torokhov

On Friday 11 May 2007 20:53, Andrew Morton wrote:
> Ho hum.  I suppose a suitable workaround would be to convert hdaps_mtx back
> into a semaphore.  ug.

Actually I was looking for victimes^Wvolunteers to test the patch below.
It gets rid of _trylock business.

-- 
Dmitry

HWMON: hdaps - convert to use input-polldev

Switch to using input-polldev skeleton instead of implementing
polling loop by itself.

Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/hwmon/Kconfig |1 
 drivers/hwmon/hdaps.c |   55 +-
 2 files changed, 25 insertions(+), 31 deletions(-)

Index: work/drivers/hwmon/Kconfig
===
--- work.orig/drivers/hwmon/Kconfig
+++ work/drivers/hwmon/Kconfig
@@ -602,6 +602,7 @@ config SENSORS_W83627EHF
 config SENSORS_HDAPS
tristate "IBM Hard Drive Active Protection System (hdaps)"
depends on INPUT && X86
+   select INPUT_POLLDEV
default n
help
  This driver provides support for the IBM Hard Drive Active Protection
Index: work/drivers/hwmon/hdaps.c
===
--- work.orig/drivers/hwmon/hdaps.c
+++ work/drivers/hwmon/hdaps.c
@@ -28,7 +28,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -61,13 +61,12 @@
 #define INIT_TIMEOUT_MSECS 4000/* wait up to 4s for device init ... */
 #define INIT_WAIT_MSECS200 /* ... in 200ms increments */
 
-#define HDAPS_POLL_PERIOD  (HZ/20) /* poll for input every 1/20s */
+#define HDAPS_POLL_INTERVAL50  /* poll for input every 1/20s (50 ms)*/
 #define HDAPS_INPUT_FUZZ   4   /* input event threshold */
 #define HDAPS_INPUT_FLAT   4
 
-static struct timer_list hdaps_timer;
 static struct platform_device *pdev;
-static struct input_dev *hdaps_idev;
+static struct input_polled_dev *hdaps_idev;
 static unsigned int hdaps_invert;
 static u8 km_activity;
 static int rest_x;
@@ -323,24 +322,19 @@ static void hdaps_calibrate(void)
__hdaps_read_pair(HDAPS_PORT_XPOS, HDAPS_PORT_YPOS, &rest_x, &rest_y);
 }
 
-static void hdaps_mousedev_poll(unsigned long unused)
+static void hdaps_mousedev_poll(struct input_polled_dev *dev)
 {
+   struct input_dev *input_dev = dev->input;
int x, y;
 
-   /* Cannot sleep.  Try nonblockingly.  If we fail, try again later. */
-   if (mutex_trylock(&hdaps_mtx)) {
-   mod_timer(&hdaps_timer,jiffies + HDAPS_POLL_PERIOD);
-   return;
-   }
+   mutex_lock(&hdaps_mtx);
 
if (__hdaps_read_pair(HDAPS_PORT_XPOS, HDAPS_PORT_YPOS, &x, &y))
goto out;
 
-   input_report_abs(hdaps_idev, ABS_X, x - rest_x);
-   input_report_abs(hdaps_idev, ABS_Y, y - rest_y);
-   input_sync(hdaps_idev);
-
-   mod_timer(&hdaps_timer, jiffies + HDAPS_POLL_PERIOD);
+   input_report_abs(input_dev, ABS_X, x - rest_x);
+   input_report_abs(input_dev, ABS_Y, y - rest_y);
+   input_sync(input_dev);
 
 out:
mutex_unlock(&hdaps_mtx);
@@ -536,6 +530,7 @@ static struct dmi_system_id __initdata h
 
 static int __init hdaps_init(void)
 {
+   struct input_dev *idev;
int ret;
 
if (!dmi_check_system(hdaps_whitelist)) {
@@ -563,39 +558,37 @@ static int __init hdaps_init(void)
if (ret)
goto out_device;
 
-   hdaps_idev = input_allocate_device();
+   hdaps_idev = input_allocate_polled_device();
if (!hdaps_idev) {
ret = -ENOMEM;
goto out_group;
}
 
+   hdaps_idev->poll = hdaps_mousedev_poll;
+   hdaps_idev->poll_interval = HDAPS_POLL_INTERVAL;
+
/* initial calibrate for the input device */
hdaps_calibrate();
 
/* initialize the input class */
-   hdaps_idev->name = "hdaps";
-   hdaps_idev->dev.parent = &pdev->dev;
-   hdaps_idev->evbit[0] = BIT(EV_ABS);
-   input_set_abs_params(hdaps_idev, ABS_X,
+   idev = hdaps_idev->input;
+   idev->name = "hdaps";
+   idev->dev.parent = &pdev->dev;
+   idev->evbit[0] = BIT(EV_ABS);
+   input_set_abs_params(idev, ABS_X,
-256, 256, HDAPS_INPUT_FUZZ, HDAPS_INPUT_FLAT);
-   input_set_abs_params(hdaps_idev, ABS_Y,
+   input_set_abs_params(idev, ABS_Y,
-256, 256, HDAPS_INPUT_FUZZ, HDAPS_INPUT_FLAT);
 
-   ret = input_register_device(hdaps_idev);
+   ret = input_register_polled_device(hdaps_idev);
if (ret)
goto out_idev;
 
-   /* start up our timer for the input device */
-   init_timer(&hdaps_timer);
-   hdaps_timer.function = hdaps_mousedev_poll;
-   hdaps_timer.expires = jiffies + HDAPS_POLL_PERIOD;
-   add_timer(&hdaps_timer);
-
printk(KERN_INFO "hdaps: driver successfully loaded.\n");
return 0;
 
 out_idev:
-   input_free_device(hdaps_idev);
+   inp

Re: Is this a preempt issue in drivers/input/evdev.c

2007-05-11 Thread Dmitry Torokhov

Hi,

On Friday 11 May 2007 23:18, Yin,Fengwei wrote:
> 
> So if the evdev_release() is preempted at the point marked by another
> process which will open the evdev, which will make operation sequence
> like:
> 
>--evdev->open in evdev_release()
>  -> preempted
>   evdev->open++ and input_open_device()
><- reschedule 
>   input_close_device()
> 
> Should we introduce a mutex here? Or do I miss something? Thanks.
> 

Locking is completely absent in evdev. There was a patch introducing
locking in recent -mm's but it got dropped. I need to refresh it.
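
To illustrate the kind of serialization being asked about, here is a
user-space model of the open counter. It is only a sketch: struct
evdev_model and both functions are invented for this example, a pthread
mutex stands in for a kernel mutex, and this is not the patch that was
dropped from -mm.

```c
#include <pthread.h>

struct evdev_model {
	pthread_mutex_t lock;
	int open;		/* number of open handles */
	int hw_open;		/* 1 while the underlying device is open */
};

static void model_open(struct evdev_model *e)
{
	pthread_mutex_lock(&e->lock);
	if (e->open++ == 0)
		e->hw_open = 1;	/* stands in for input_open_device() */
	pthread_mutex_unlock(&e->lock);
}

static void model_release(struct evdev_model *e)
{
	pthread_mutex_lock(&e->lock);
	/* The decrement and the close happen under one lock, so a
	 * concurrent open cannot slip in between them. */
	if (--e->open == 0)
		e->hw_open = 0;	/* stands in for input_close_device() */
	pthread_mutex_unlock(&e->lock);
}
```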

-- 
Dmitry

Re: [PATCH] mm: swap prefetch improvements

2007-05-11 Thread Paul Jackson

> Swap prefetch is not cpuset aware so make the config option depend on 
> !CPUSETS.

Ok.

Could you explain what it means to say "swap prefetch is not cpuset aware",
or could you give a rough idea of what it would take to make it cpuset aware?

I wouldn't go so far as to say that no one would ever want to prefetch and
use cpusets at the same time, but I will grant that it's not a sufficiently
important need that it should block a useful prefetch implementation on
non-cpuset systems.

One case that would be useful, however, is to handle prefetch in the case
that cpusets are configured into ones kernel, but one is not making any
real use of them ('number_of_cpusets' <= 1).  That will actually be the
most common case for the major distribution(s) that enable cpusets by
default in their builds, for most arch's including the arch's popular
on desktops.

So what would it take to allow CONFIG'ing both prefetch and cpusets on,
but having prefetch dynamically adapt to the presence of active cpuset
usage, perhaps by basically shutting down if it can't easily do any
better?  I could certainly entertain requests to callout to some
prefetch routine from the cpuset code, at the critical points that
cpusets transitioned in or out of active use.

Semi-separate issue -- is it just cpusets that aren't prefetch friendly,
or is it also mm/mempolicy (mbind, set_mempolicy) as well?

For that matter, even if neither mm/mempolicy nor cpusets are used, on
systems with multiple memory nodes (not all memory equally distant from
all CPUs, aka NUMA), could prefetch cause some sort of shuffling of
memory placement, which might harm the performance of an HPC (High
Performance Computing) application with carefully tuned memory
placement.  Granted, this -is- getting to be a corner case.  Most HPC
apps running on NUMA hardware are making at least some use of
mm/mempolicy or cpusets.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

[PATCH] mm: swap prefetch improvements

2007-05-11 Thread Con Kolivas

It turns out that swap prefetch was not that hard to fix and improve
upon, and since Andrew hasn't dropped swap prefetch, here instead is a swag
of fixes and improvements, including making it depend on !CPUSETS as Nick
requested.

These changes lead to dramatic improvements.

Eg on a machine with 2GB ram and only 500MB swap:

Prefetch disabled:
./sp_tester
Ram 2060352000  Swap 522072000
Total ram to be malloced: 2321388000 bytes
Starting first malloc of 1160694000 bytes
Starting 1st read of first malloc
Touching this much ram takes 529 milliseconds
Starting second malloc of 1160694000 bytes
Completed second malloc and free
Sleeping for 300 seconds
Important part - starting reread of first malloc
Completed read of first malloc
Timed portion 6030 milliseconds


Prefetch enabled:
./sp_tester
Ram 2060352000  Swap 522072000
Total ram to be malloced: 2321388000 bytes
Starting first malloc of 1160694000 bytes
Starting 1st read of first malloc
Touching this much ram takes 528 milliseconds
Starting second malloc of 1160694000 bytes
Completed second malloc and free
Sleeping for 300 seconds
Important part - starting reread of first malloc
Completed read of first malloc
Timed portion 665 milliseconds

Note that simply touching the ram took 528 ms so the time taken for the 230MB
converted from major faults to minor faults took only 137ms instead of 5.5s.

---
Numerous improvements to swap prefetch.

It was possible for kprefetchd to go to sleep indefinitely before/after
changing the /proc value of swap prefetch. Fix that.

The cost of remove_from_swapped_list() can be removed from every page swapin
by moving it to be done entirely by kprefetchd lazily.

The call site for add_to_swapped_list need only be at one place.

Wakeups can occur much less frequently if swap prefetch is disabled.

Make it possible to enable swap prefetch explicitly via /proc when laptop_mode
is enabled by changing the value of the sysctl to 2.

The complicated iteration over every entry can be consolidated by using
list_for_each_safe.

Swap prefetch is not cpuset aware so make the config option depend on !CPUSETS.

Fix potential irq problem by converting read_lock_irq to irqsave etc.

Code style fixes.

Change the ioprio from IOPRIO_CLASS_IDLE to normal lower priority to ensure
that bio requests are not starved if other I/O begins during prefetching.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 Documentation/sysctl/vm.txt |4 -
 init/Kconfig|2 
 mm/page_io.c|2 
 mm/swap_prefetch.c  |  158 +++-
 mm/swap_state.c |2 
 mm/vmscan.c |1 
 6 files changed, 75 insertions(+), 94 deletions(-)

Index: linux-2.6.21-mm1/mm/page_io.c
===
--- linux-2.6.21-mm1.orig/mm/page_io.c  2007-02-05 22:52:04.0 +1100
+++ linux-2.6.21-mm1/mm/page_io.c   2007-05-12 14:30:52.0 +1000
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
@@ -118,6 +119,7 @@ int swap_writepage(struct page *page, st
ret = -ENOMEM;
goto out;
}
+   add_to_swapped_list(page);
if (wbc->sync_mode == WB_SYNC_ALL)
rw |= (1 << BIO_RW_SYNC);
count_vm_event(PSWPOUT);
Index: linux-2.6.21-mm1/mm/swap_state.c
===
--- linux-2.6.21-mm1.orig/mm/swap_state.c   2007-05-07 21:53:51.0 +1000
+++ linux-2.6.21-mm1/mm/swap_state.c2007-05-12 14:30:52.0 +1000
@@ -83,7 +83,6 @@ static int __add_to_swap_cache(struct pa
error = radix_tree_insert(&swapper_space.page_tree,
entry.val, page);
if (!error) {
-   remove_from_swapped_list(entry.val);
page_cache_get(page);
SetPageLocked(page);
SetPageSwapCache(page);
@@ -102,7 +101,6 @@ int add_to_swap_cache(struct page *page,
int error;
 
if (!swap_duplicate(entry)) {
-   remove_from_swapped_list(entry.val);
INC_CACHE_INFO(noent_race);
return -ENOENT;
}
Index: linux-2.6.21-mm1/mm/vmscan.c
===
--- linux-2.6.21-mm1.orig/mm/vmscan.c   2007-05-07 21:53:51.0 +1000
+++ linux-2.6.21-mm1/mm/vmscan.c2007-05-12 14:30:52.0 +1000
@@ -410,7 +410,6 @@ int remove_mapping(struct address_space 
 
if (PageSwapCache(page)) {
swp_entry_t swap = { .val = page_private(page) };
-   add_to_swapped_list(page);
__delete_from_swap_cache(page);
write_unlock_irq(&mapping->tree_lock);
swap_free(swap);
Index: linux-2.6.21-mm1/mm/swap_pref

Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread Jeff Garzik


Satyam Sharma wrote:
> On 5/11/07, Jonathan Corbet <[EMAIL PROTECTED]> wrote:
> > +  - Pointers to data structures in coherent memory which might be modified
> > +by I/O devices can, sometimes, legitimately be volatile.  A ring buffer
> > +used by a network adapter, where that adapter changes pointers to
> > +indicate which descriptors have been processed, is an example of this
> > +type of situation.
>
> is a legitimate use case for volatile is still not clear to me (I


IMO it is not.  We do /not/ want to encourage volatile use in those 
cases, and indeed, it's not necessary even if you can rationalize the 
use of the English word "volatile" to describe the situation.


Drivers work quite well without volatile in such situations.

Jeff



Re: Re: Re: [announce] Intel announces the PowerTOP utility for Linux

2007-05-11 Thread Jose Celestino

Words by Matt Mackall [Fri, May 11, 2007 at 09:39:05PM -0500]:
> On Sat, May 12, 2007 at 02:40:52AM +0100, Jose Celestino wrote:
> > Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]:
> > > On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
> > > > 
> > > > What's eating the battery life of my laptop? Why isn't it many more 
> > > > hours? Which software component causes the most power to be burned? 
> > > > These are important questions without a good answer... until now.
> > > 
> > > I get:
> > > 
> > > No detailed statistics available; please enable the CONFIG_TIMER_STATS
> > > kernel option
> > > 
> > 
> > Must run as root (rw to /proc/timer_stats is needed).
> 
> That file doesn't exist, despite CONFIG_TIMER_STATS being in
> /proc/config.gz.
> 

Then again, perhaps you have /proc/tstats instead.

If so apply this (well, you get the idea):

--- powertop/powertop.c 2007-05-12 05:01:15.0 +0100
+++ powertop_new/powertop.c 2007-05-12 05:08:46.0 +0100
@@ -212,8 +212,8 @@
 void stop_timerstats(void)
 {
FILE *file;
-   file = fopen("/proc/timer_stats","w");
-   if (!file) {
+   if (!(file = fopen("/proc/timer_stats","w")) &&
+   !(file = fopen("/proc/tstats","w")) ) {
nostats = 1;
return;
}
@@ -223,8 +223,8 @@
 void start_timerstats(void)
 {
FILE *file;
-   file = fopen("/proc/timer_stats","w");
-   if (!file) {
+   if (!(file = fopen("/proc/timer_stats","w")) &&
+   !(file = fopen("/proc/tstats","w")) ) {
nostats = 1;
return;
}
@@ -388,7 +388,7 @@
i = 0;
totalticks = 0;
if (!nostats)
-   file = popen("cat /proc/timer_stats | sort -n | tail -190", "r");
+   file = popen("(cat /proc/timer_stats 2>>/dev/null || cat /proc/tstats) | sort -n | tail -190", "r");
while (file && !feof(file) && i<190) {
char *count, *pid, *process, *func;
int cnt;


-- 
Jose Celestino

http://www.msversus.org/ ; http://techp.org/petition/show/1
http://www.vinc17.org/noswpat.en.html

"And on the trillionth day, Man created Gods." -- Thomas D. Pate

Re: [PATCH 1/2] leds:arch/sh/boards/landisk LEDs supports

2007-05-11 Thread kogiidena

To: Richard-san

I'm sorry. This is a corrected version of the patch sent yesterday.
Only the ledtrig_bitpat_default function was changed.
The patch "Custom triggers support, which are might not supported by all
LEDs" is required as a prerequisite.

LED driver for I-O DATA LANDISK and USL-5P

Signed-off-by: kogiidena <[EMAIL PROTECTED]>
---
diff -urpN OLD/drivers/leds/Kconfig NEW/drivers/leds/Kconfig
--- OLD/drivers/leds/Kconfig2007-04-28 06:49:26.0 +0900
+++ NEW/drivers/leds/Kconfig2007-05-11 21:15:28.0 +0900
@@ -94,6 +94,12 @@ config LEDS_COBALT
help
  This option enables support for the front LED on Cobalt Server

+config LEDS_LANDISK
+   tristate "LED Support for LANDISK Series"
+   depends on LEDS_CLASS && SH_LANDISK
+   help
+ This option enables support for the LED on LANDISK Series
+
 comment "LED Triggers"

 config LEDS_TRIGGERS
diff -urpN OLD/drivers/leds/Makefile NEW/drivers/leds/Makefile
--- OLD/drivers/leds/Makefile   2007-04-28 06:49:26.0 +0900
+++ NEW/drivers/leds/Makefile   2007-05-11 23:34:07.0 +0900
@@ -16,6 +16,7 @@ obj-$(CONFIG_LEDS_NET48XX)+= leds-net4
 obj-$(CONFIG_LEDS_WRAP)+= leds-wrap.o
 obj-$(CONFIG_LEDS_H1940)   += leds-h1940.o
 obj-$(CONFIG_LEDS_COBALT)  += leds-cobalt.o
+obj-$(CONFIG_LEDS_LANDISK) += leds-landisk.o

 # LED Triggers
 obj-$(CONFIG_LEDS_TRIGGER_TIMER)   += ledtrig-timer.o
diff -urpN OLD/drivers/leds/leds-landisk.c NEW/drivers/leds/leds-landisk.c
--- OLD/drivers/leds/leds-landisk.c 1970-01-01 09:00:00.0 +0900
+++ NEW/drivers/leds/leds-landisk.c 2007-05-12 11:31:47.0 +0900
@@ -0,0 +1,215 @@
+/*
+ * LEDs driver for I-O DATA DEVICE, INC. "LANDISK Series" support.
+ *
+ * Copyright (C) 2007 kogiidena
+ *
+ * Based on the drivers/leds/leds-ams-delta.c by:
+ * Copyright (C) 2006 Jonathan McDowell <[EMAIL PROTECTED]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static enum {
+   LANDISK = 0,
+   USL_5P  = 1,
+} landisk_product;
+
+static DEFINE_SPINLOCK(landisk_led_lock);
+
+static void landisk_led_set(struct led_classdev *led_cdev,
+   enum led_brightness value);
+
+static struct led_classdev landisk_leds[] = {
+   [0] = {
+  .name = "power",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [1] = {
+  .name = "status",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [2] = {
+  .name = "led1",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [3] = {
+  .name = "led2",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [4] = {
+  .name = "led3",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [5] = {
+  .name = "led4",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [6] = {
+  .name = "led5",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+   [7] = {
+  .name = "buzzer",
+  .brightness_set = landisk_led_set,
+  .default_trigger = "bitpat",
+  },
+};
+
+void ledtrig_bitpat_default(struct led_classdev *led_cdev,
+   unsigned long *delay, char *bitdata)
+{
+   int led;
+
+   led = (led_cdev - &landisk_leds[0]);
+   if ((led == 0) || (led == 1)) {
+   strcpy(bitdata, "blink");
+   }
+   if (led == 7) {
+   *delay = 250;
+   }
+
+}
+
+static void landisk_led_set(struct led_classdev *led_cdev,
+   enum led_brightness value)
+{
+   u8 tmp, bitmask;
+   unsigned long flags;
+
+   bitmask = 0x01 << (led_cdev - &landisk_leds[0]);
+
+   spin_lock_irqsave(&landisk_led_lock, flags);
+   tmp = ctrl_inb(PA_LED);
+   if (value)
+   tmp |= bitmask;
+   else
+   tmp &= ~bitmask;
+   ctrl_outb(tmp, PA_LED);
+   spin_unlock_irqrestore(&landisk_led_lock, flags);
+}
+
+static int landisk_led_probe(struct platform_device *pdev)
+{
+   int i, nr_leds;
+   int ret;
+
+   nr_leds = (landisk_product == LANDISK) ? 2 : 8;
+
+   for (i = ret = 0; ret >= 0 && i < nr_leds; i++) {
+   ret = led_classdev_register(&pdev->dev, &landisk_leds[i]);
+   }
+
+   if (ret < 0 && i > 1) {
+   nr_leds = i

Re: [PATCH 2/2] leds:arch/sh/boards/landisk LEDs supports

2007-05-11 Thread kogiidena

To: Richard-san

I'm sorry. This is also a corrected version of the patch sent yesterday.
The source was hard to read, so I cleaned it up.
There is no change to the basic function.

Add Bitpattern Trigger.
The Bitpattern trigger continuously turns an LED on and off according to
the value of "bitdata". "bitdata" is a character string made up of
the following three characters: '0' turns the LED off, '1' turns the
LED on, and 'R' repeats from the head of "bitdata".
In addition, the strings "on", "off", and "blink" can be set as
"bitdata".
The transition time between "bitdata" characters is set by "delay"
(in milliseconds).
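
The stepping rule described above can be sketched in user space as
follows. This is a simplified model of the timer callback in the patch
below, not the driver code itself: struct bitpat_state, bitpat_init() and
bitpat_step() are names made up for this example, the LED is reduced to a
flag, and only the '0'/'1'/'R' characters are covered (handling of the
"on", "off" and "blink" strings is left out).

```c
#include <string.h>

#define BITDATA_LEN 18

struct bitpat_state {
	char bitdata[BITDATA_LEN + 2];
	int cnt;		/* index of the next character to play */
	int led_on;		/* stands in for the LED brightness */
};

static void bitpat_init(struct bitpat_state *s, const char *pattern)
{
	memset(s, 0, sizeof(*s));
	/* The buffer is zeroed, so the copy stays NUL-terminated. */
	strncpy(s->bitdata, pattern, BITDATA_LEN + 1);
}

/* One timer tick. Returns 1 if the timer should be re-armed after
 * "delay", or 0 if the pattern has ended. */
static int bitpat_step(struct bitpat_state *s)
{
	char c = s->bitdata[s->cnt++];

	if (c != '0' && c != '1') {
		s->cnt = 0;	/* end of pattern: stop */
		return 0;
	}
	s->led_on = (c == '1');

	if (s->bitdata[s->cnt] == 'R')
		s->cnt = 0;	/* 'R': start over from the head */
	return 1;
}
```

With the pattern "10R" the LED toggles on, off, on, ... indefinitely,
while a pattern without a trailing 'R', such as "1", plays once and then
stops.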

Signed-off-by: kogiidena <[EMAIL PROTECTED]>
---
diff -urpN OLD/drivers/leds/Kconfig NEW/drivers/leds/Kconfig
--- OLD/drivers/leds/Kconfig2007-04-28 06:49:26.0 +0900
+++ NEW/drivers/leds/Kconfig2007-05-11 21:15:28.0 +0900
@@ -127,5 +133,19 @@ config LEDS_TRIGGER_HEARTBEAT
  load average.
  If unsure, say Y.

+config LEDS_TRIGGER_BITPAT
+   tristate "LED Bitpattern Trigger"
+   depends on LEDS_TRIGGERS
+   help
+ Bitpattern continuously turns LED on and off according to
+ the value directed "bitdata". "bitdata" is composed of
+ the character string that consists of the following three
+ characters. '0' turn off LED. '1' turn on  LED. 'R' is
+ repeated from the head of the "bitdata".
+ In addition, the character string of "on", "off", and "blink"
+ can be set to "bitdata".
+ The transition time of "bitdata" is set by "delay".
+ If unsure, say Y.
+
 endmenu

diff -urpN OLD/drivers/leds/Makefile NEW/drivers/leds/Makefile
--- OLD/drivers/leds/Makefile   2007-05-11 23:44:12.0 +0900
+++ NEW/drivers/leds/Makefile   2007-05-11 23:46:47.0 +0900
@@ -22,3 +22,4 @@ obj-$(CONFIG_LEDS_LANDISK)+= leds-land
 obj-$(CONFIG_LEDS_TRIGGER_TIMER)   += ledtrig-timer.o
 obj-$(CONFIG_LEDS_TRIGGER_IDE_DISK)+= ledtrig-ide-disk.o
 obj-$(CONFIG_LEDS_TRIGGER_HEARTBEAT)   += ledtrig-heartbeat.o
+obj-$(CONFIG_LEDS_TRIGGER_BITPAT)  += ledtrig-bitpat.o
diff -urpN OLD/drivers/leds/ledtrig-bitpat.c NEW/drivers/leds/ledtrig-bitpat.c
--- OLD/drivers/leds/ledtrig-bitpat.c   1970-01-01 09:00:00.0 +0900
+++ NEW/drivers/leds/ledtrig-bitpat.c   2007-05-12 11:29:34.0 +0900
@@ -0,0 +1,231 @@
+/*
+ * LED Bitpattern Trigger
+ *
+ * Copyright (C) 2007 kogiidena
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "leds.h"
+
+#define BITDATA_LEN 18
+
+struct bitpat_trig_data {
+   char bitdata[BITDATA_LEN + 2];
+   int cnt;
+   unsigned long delay;
+   struct timer_list timer;
+};
+
+void __attribute__ ((weak))
+ledtrig_bitpat_default(struct led_classdev *led_cdev,
+  unsigned long *delay, char *bitdata)
+{
+   /* Nothing to do. */
+}
+
+static void led_bitpat_function(unsigned long data)
+{
+   struct led_classdev *led_cdev = (struct led_classdev *)data;
+   struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data;
+   unsigned long delay = bitpat_data->delay;
+   char bitpat;
+
+   bitpat = bitpat_data->bitdata[bitpat_data->cnt++];
+
+   if (bitpat == '0' || bitpat == '1') {
+   led_set_brightness(led_cdev,
+  (bitpat == '1') ? LED_FULL : LED_OFF);
+   } else {
+   bitpat_data->cnt = 0;
+   return;
+   }
+
+   if (bitpat_data->bitdata[bitpat_data->cnt] == 'R')
+   bitpat_data->cnt = 0;
+
+   mod_timer(&bitpat_data->timer, jiffies + msecs_to_jiffies(delay));
+}
+
+static ssize_t led_delay_show(struct class_device *dev, char *buf)
+{
+   struct led_classdev *led_cdev = class_get_devdata(dev);
+   struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data;
+
+   sprintf(buf, "%lu\n", bitpat_data->delay);
+
+   return strlen(buf) + 1;
+}
+
+static ssize_t led_delay_store(struct class_device *dev, const char *buf,
+  size_t size)
+{
+   struct led_classdev *led_cdev = class_get_devdata(dev);
+   struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data;
+   int ret = -EINVAL;
+   char *after;
+   unsigned long state = simple_strtoul(buf, &after, 10);
+   size_t count = after - buf;
+
+   if (*after && isspace(*after))
+   count++;
+
+   if (count == size) {
+   bitpat_data->delay = state;
+   mod_timer(&bitpat_data->timer, jiffies + 1);
+   ret = count;
+   }
+   return ret;
+}
+
+static void led_bitdata_update(struct bitpat_trig_data *bitpat_data,
+  const char *buf)
+{
+   int i;
+   const char *s

Re: [PATCH] swsusp: Use platform mode by default

2007-05-11 Thread Len Brown

I agree that we should keep the "platform" default,
as it went in 2 releases ago (nearly 6 months) without
any reported failures until this one -- and it fixed
a longstanding issue documented on many machines.

We should debug Qi's failure like any other.
We are actually in better shape on this one than others
because we already know something that works around it.

Qi,
Please open a bug report here:

http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
in the Power-Off category.  There are some other open
poweroff bugs and maybe we'll find a common thread.
Please attach the output from acpidump and
dmesg -s64000.

thanks,
-Len

Re: libata reset-seq merge broke sata_sil on sh

2007-05-11 Thread Paul Mundt

On Fri, May 11, 2007 at 11:39:20AM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> > Bumping the hardreset delay up does indeed fix it, I've had to bump it up
> > to 1200 before it started working (at 600 it still fails):
> > 
> > [0.967379] scsi0 : sata_sil
> > [0.970425] scsi1 : sata_sil
> > [0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
> > [0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
> > [1.299353] ata1: device not ready (errno=-19), forcing hardreset
> > [2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
> > [2.837548] ata1.00: 39070080 sectors, multi 0: LBA
> > [2.842702] ata1.00: applying bridge limits
> > [2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [2.858938] ata1.00: configured for UDMA/100
> > [3.172602] ata2: SATA link down (SStatus 0 SControl 310)
> > [3.175736] scsi 0:0:0:0: Direct-Access ATA  HHD424020F7SV00  00ML PQ: 0 ANSI: 5
> > 
> > I'm not sure if it matters or not, but this is an iVDR drive, so that
> > might also have additional implications.
> 
> Don't have the flimsiest idea what an iVDR drive is but I take it that
> it's some sort of special purpose thing.  :-)
> 
http://www.ivdr.org

The GoVault appears to be a similar device, in that they're both
removable cartridges.

> Gary, IIRC, the requirement for GoVault was 3secs, right?  Paul, can you
> try to estimate the minimum required delay?  Please go down by 100ms and
> report where it breaks.
> 
800ms was the lowest it would work at, 700ms still breaks.

Re: APIC error on 32-bit kernel

2007-05-11 Thread Len Brown

> > We're trying to track down the source of a problem that occurs
> > whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4
> 
> and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.
> 
> > We can load the driver just fine, but whenever we activate the
> > network, we see APIC errors (a sample of them are shown here,
> > captured from a serial console):
> > 
> > [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk
> > [EMAIL PROTECTED] ~]# [   93.942012] process `sysctl' is using deprecated
> > sysctl (sysc.
> > [   94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> > [   94.498887] APIC error on CPU0: 00(08)
> > [   94.498534] APIC error on CPU1: 00(08)
> > [   94.550079] APIC error on CPU0: 08(08)
> > [   94.549725] APIC error on CPU1: 08(08)
> > [   94.600915] APIC error on CPU1: 08(08)
> > [   94.601276] APIC error on CPU0: 08(08)
> > [   94.652108] APIC error on CPU1: 08(08)
> > [   94.652470] APIC error on CPU0: 08(08)
> > [   94.703659] APIC error on CPU0: 08(08)
> > [   94.703305] APIC error on CPU1: 08(08)
> > [   94.754852] APIC error on CPU0: 08(40)
> > [   94.806045] APIC error on CPU0: 40(08)

/* Here is what the APIC error bits mean:
   0: Send CS error
   1: Receive CS error
   2: Send accept error
   3: Receive accept error
   4: Reserved
   5: Send illegal vector
   6: Received illegal vector
   7: Illegal register address
*/

So the 40 means the APIC got an illegal vector.
Certainly this is consistent with the fact that
the errors start when a specific device is being
used.  I assume that device is using MSI?
Curious that it is different in 32-bit and 64-bit mode.
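
The bit list above can be turned into a small decoder. This is only a
minimal sketch, not kernel code: the table restates the bit meanings
from this message, and esr_decode() is a hypothetical helper name that
does not exist in the kernel. Each two-digit hex field in a log line
such as "APIC error on CPU0: 08(40)" can be passed through it.

```c
#include <string.h>

/* Bit meanings exactly as listed in the message above (bit 4 is reserved). */
static const char *const esr_bit_names[8] = {
    "Send CS error",
    "Receive CS error",
    "Send accept error",
    "Receive accept error",
    "Reserved",
    "Send illegal vector",
    "Received illegal vector",
    "Illegal register address",
};

/*
 * Hypothetical helper: write the names of the bits set in the low byte
 * of esr into buf, separated by ", ". Returns the number of set bits
 * found (0 if len is 0). Output is truncated if buf is too small.
 */
int esr_decode(unsigned int esr, char *buf, size_t len)
{
    int bit, count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (bit = 0; bit < 8; bit++) {
        if (!(esr & (1u << bit)))
            continue;
        if (count++)
            strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, esr_bit_names[bit], len - strlen(buf) - 1);
    }
    return count;
}
```

With this table, 0x40 maps to bit 6 (received illegal vector) and 0x08
maps to bit 3 (receive accept error).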



> > [   94.805692] APIC error on CPU1: 08(08)
> > [   94.857238] APIC error on CPU0: 08(08)
> > [   94.856884] APIC error on CPU1: 08(08)
> > [   94.908432] APIC error on CPU0: 08(08)
> > [   94.908078] APIC error on CPU1: 08(08)
> > [snip, more of the same]
> > [   98.901156] APIC error on CPU1: 08(08)
> > [   98.952702] APIC error on CPU0: 08(08)
> > [   98.952349] APIC error on CPU1: 08(08)
> > [   99.003895] APIC error on CPU0: 08(08)
> > [   99.003542] APIC error on CPU1: 08(08)
> > 
> > The machine hangs for about 5-10 seconds, then spontaneously reboots
> > without further console output.
> 
> I can prompt an oops by pinging my router while the apic errors are
> scrolling by.
> 
> > 
> > This is an Asus M2V (Via K8T890) motherboard.
> > 
> > The problem does not occur on a 32-bit kernel if we boot with
> > pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> > motherboard.

pci=nomsi, works, okay...


> > We also do not see this problem on Intel-based motherboards, with
> > either 32- or 64-bit kernels.
> 
> A full raft of documentation -- including acpidump and
> linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
> (with apic=debug boot option), dmesg, and /proc/interrupts -- is
> available at http://www.hogchain.net/m2v/apic-problem/


[06Dh 109  2]  Boot Architecture Flags : 0003

For what it is worth, the bit in ACPI that is used to
disable MSI support is not set -- so as far as the BIOS
is concerned, this system should support MSI.

Is it an add-in card, or lan-on-motherboard?

-Len

Re: [PATCH] "volatile considered harmful", take 3

2007-05-11 Thread Satyam Sharma


On 5/11/07, Jonathan Corbet <[EMAIL PROTECTED]> wrote:

> Here's another version of the volatile document.  Once again, I've tried
> to address all of the comments.  There haven't really been any recent
> comments addressing the correctness of the document; people have been
> more concerned with how it's expressed.  I'm glad to see files in
> Documentation/ held to a high standard of writing, but, unless somebody
> has a factual issue this time around I would like to declare Mission
> Accomplished and move on.


The document looks good, but whether:


> +  - Pointers to data structures in coherent memory which might be modified
> +by I/O devices can, sometimes, legitimately be volatile.  A ring buffer
> +used by a network adapter, where that adapter changes pointers to
> +indicate which descriptors have been processed, is an example of this
> +type of situation.


is a legitimate use case for volatile is still not clear to me (I
agree with Alan's comment in a previous thread that this seems to be
a case where a memory barrier would be applicable^Wbetter, actually).
I could be wrong here, so it would be nice if Peter explained why
volatile is legitimate here.
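
To make the barrier alternative concrete, here is a hedged userspace
analogy rather than driver code. It assumes a hypothetical descriptor
layout (demo_desc, DEMO_DESC_DONE) and uses C11 acquire/release atomics
where kernel code would use its own barrier primitives such as rmb().
None of these names come from the document under review.

```c
#include <stdatomic.h>

#define DEMO_DESC_DONE 0x1u     /* hypothetical "processed" flag */

/* Hypothetical ring descriptor; not taken from any real driver. */
struct demo_desc {
    unsigned int len;           /* payload written by the producer */
    atomic_uint status;         /* producer sets DEMO_DESC_DONE last */
};

/* Producer side (plays the role of the adapter): publish len, then the flag. */
void demo_desc_complete(struct demo_desc *d, unsigned int len)
{
    d->len = len;
    atomic_store_explicit(&d->status, DEMO_DESC_DONE, memory_order_release);
}

/* Consumer side: returns 1 and fills *len once the descriptor is done. */
int demo_desc_poll(struct demo_desc *d, unsigned int *len)
{
    unsigned int st = atomic_load_explicit(&d->status, memory_order_acquire);

    if (!(st & DEMO_DESC_DONE))
        return 0;
    *len = d->len;  /* ordered after the status read by the acquire load */
    return 1;
}
```

The point of the sketch is that the ordering between the status read and
the payload read comes from the acquire/release pair, which is the job a
memory barrier would do in the ring-buffer case quoted above.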

Otherwise, it's fine with me.

Thanks,
Satyam

Is this a preempt issue in drivers/input/evdev.c

2007-05-11 Thread Yin,Fengwei

Hi,
When opening/closing evdev, the code that handles multi-client
operation is as follows:
static int evdev_release(...)
{
    ...
    if (!--evdev->open) {
        /* <-- preemption point discussed below */
        if (evdev->exist)
            input_close_device(...);
        else
            evdev_free(evdev);
    }
    return 0;
}

static int evdev_open(...)
{
    ...
    if (!evdev->open++ && evdev->exist) {
        error = input_open_device(...);
        if (error) {
            ...
        }
    }
    ...
    return 0;
}

So if evdev_release() is preempted at the marked point by another
process that opens the evdev, the sequence of operations becomes:

   --evdev->open in evdev_release()
 -> preempted
evdev->open++ and input_open_device()
 <- reschedule
input_close_device()

Should we introduce a mutex here? Or am I missing something? Thanks.
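
The mutex idea can be pictured with the userspace sketch below. It is
only an illustration under stated assumptions: demo_dev, demo_dev_init(),
demo_open() and demo_release() are hypothetical names, a pthread mutex
stands in for a kernel mutex, and two counters stand in for the calls to
input_open_device() and input_close_device(). The counter update and the
device call happen under one lock, so a concurrent open cannot slip in
between them.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct evdev; not the real kernel structure. */
struct demo_dev {
    pthread_mutex_t lock;   /* serializes open and release */
    int open;               /* number of clients, like evdev->open */
    int hw_opens;           /* stands in for input_open_device() calls */
    int hw_closes;          /* stands in for input_close_device() calls */
};

void demo_dev_init(struct demo_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->open = 0;
    d->hw_opens = 0;
    d->hw_closes = 0;
}

void demo_open(struct demo_dev *d)
{
    pthread_mutex_lock(&d->lock);
    if (!d->open++)         /* first client opens the device */
        d->hw_opens++;
    pthread_mutex_unlock(&d->lock);
}

void demo_release(struct demo_dev *d)
{
    pthread_mutex_lock(&d->lock);
    if (!--d->open)         /* last client closes the device */
        d->hw_closes++;
    pthread_mutex_unlock(&d->lock);
}
```

Whether a mutex is the right primitive for the real driver depends on
the contexts these paths run in, which this sketch does not model.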

Regards
Yin, Fengwei







Re: [patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread Bernd Eckenfels

In article <[EMAIL PROTECTED]> you wrote:
> However, there are a large number of applications which have registered
> ports in this range.

And some applications that request random listening ports actually query
the /etc/services file to ensure it is an "unnamed" port.

Gruss
Bernd

Re: [PATCH] spelling fixes: init/

2007-05-11 Thread Satyam Sharma


On 5/12/07, Simon Arlott <[EMAIL PROTECTED]> wrote:

> Spelling fix in init/.
>
> Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>
> ---
>  init/main.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index e8d080c..7ee2031 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -275,7 +275,7 @@ static int __init unknown_bootoption(char *param, char *val)
> return 0;
>
> /*
> -* Preemptive maintenance for "why didn't my mispelled command
> +* Preemptive maintenance for "why didn't my misspelled command
>  * line work?"


That was probably intentional.

Re: [RFC PATCH] kbuild: silence section mismatch warnings

2007-05-11 Thread Satyam Sharma

Hi Sam,

On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote:
> Following patch allow us in specific places to silence section
> mismatch warnings.

Well, I had spelled out my reservations about this earlier, but I don't feel
too strongly. Most people probably do not want / prefer to see warnings
(even if they know they're false positives), and this also helps reduce
unnecessary reports of known FPs on lkml.

> The annotation is simple to grep for so reviewing all uses in a few
> months' time is trivial. It is assumed that a few places will
> use this to shut up the warning as replacement for the real fix.
> But these cases are easy to spot and to fix up.

Yes, so we have to be careful with its use.

On 5/12/07, Kumar Gala <[EMAIL PROTECTED]> wrote:

> It's unclear if you expect that some things will be tagged
> __init_refok/__initdata_refok forever or if we'll find some way to
> fix/change the code so the things tagged no longer need it.

We'll _have_ to fix those bugs that use the whitelisting in modpost
merely to kill off a warning, need to fix binutils for some others,
and may have to live with this for still others (mm/sl*b.c suffer
from a chicken-and-egg problem, for example).

On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote:
> With this and the following two patches I have a section mismatch free
> build.
> The plan is that a section mismatch soon will graduate from a
> warning to an error.

Yes, that's only sane.

> diff --git a/include/linux/init.h b/include/linux/init.h
> [...]
> +/* modpost check for references from .text to .init.text and likewise
> + * from .data to .init.data. They are in most cases sign of bugs but

You may want to list all illegal combinations in the comment above
check_sec_ref instead, and only introduce __init{data}_refok here.

> + * in a few places this is OK. The following can be used to tell
> + * modpost that such a reference is OK.
> + * For references to .exit.text and .exit.data the same annotation
> + * will silence warnings from modpost.
> + */
> +#define __init_refok noinline __attribute__ ((__section__ (".text_initrefok")))
> +#define __initdata_refok noinline __attribute__ ((__section__ (".data_initrefok")))

Actually, for a second there I got confused you had done this the other
way round. __init_refok sounds similar to __init (almost a "variant" of
__init) so I thought you were annotating the _callees_ and not the
_callers_.

BTW, I wonder if there would be any relative merits of doing things that
way. Did you consider this "reversed" approach?

Hmmm ... we would be annotating fewer functions, for one. With
the current __init_refok-for-callers semantics, we mark the callee __init
_and_ the caller __init_refok, which is unnecessary double-work, and
will only get worse if the same __init callee is called by multiple
callers in .text.

For the case where a caller references multiple __init callees? I don't
see any relative advantage of either scheme, or is there ...

Also, it's easier to spot a function that _is_ (or should be) __init
in the code already, than see if it is being referenced from .text,
and if so, make it __init_whatever (__init_refok is an equally
good name for callee-semantics).

(looking at code in 21-mm2)

I looked at scripts/mod/modpost.c only briefly, but it seems to me
shifting the semantics of __init_refok to refer to callees could also
subsume (and make redundant) patterns #1, 2, 6, 7 and 9 of
secref_whitelist, no?

[BTW I noticed _three_ whitelisting functions in there -- could it be
possible for us to do what init_section_ref_ok and
exit_section_ref_ok do in secref_whitelist itself? Those three
whitelists are beginning to look darn ugly.]

Anyway, somehow the __init_refok-callee scheme seems saner to
me -- please do consider it. Note, in that case, however:

1. We'll have to invent separate __exit_refok and __exitdata_refok,
of course. But that's a good thing for me, compared to giving a
multiplexed definition to __init_refok to mean
caller-can-safely-call-__init && caller-can-safely-call-__exit.

2. The init section freeing code in the kernel would need to be patched
to free sections marked as __init_refok, __initdata_refok, __exit_refok
and __exitdata_refok too. But that would most likely be a trivial patch.

3. When the exception / bug is fixed, we would convert such
__init_refok annotations to __init.
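
For readers unfamiliar with section annotations, the caller-side scheme
from the patch can be sketched in plain C. This is a hedged illustration
only: DEMO_INIT and DEMO_INIT_REFOK are hypothetical stand-ins for __init
and __init_refok, the section names are made up, and nothing here frees
init memory the way the kernel does. It assumes a GCC-compatible compiler
targeting ELF.

```c
/*
 * Hypothetical stand-ins for the kernel annotations discussed above.
 * Section names are illustrative; real __init text is discarded after
 * boot, which plain userspace code does not do.
 */
#define DEMO_INIT \
    __attribute__((__section__(".demo.init.text")))
#define DEMO_INIT_REFOK \
    __attribute__((__noinline__, __section__(".demo.text_initrefok")))

/* Callee: the kind of function that would be marked __init. */
DEMO_INIT int demo_setup(int x)
{
    return x * 2;
}

/*
 * Caller-side scheme (as in the patch): the caller is placed in its own
 * section so that a checker can whitelist references from that section
 * into init text.
 */
DEMO_INIT_REFOK int demo_boot(int x)
{
    return demo_setup(x) + 1;
}
```

Under the callee-side scheme proposed in this mail, the extra annotation
would instead sit on demo_setup() and demo_boot() would stay unannotated.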

> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 113dc77..986200b 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -582,6 +582,14 @@ static int strrcmp(const char *s, const char *sub)
>
>  /**

   ^^^ should be simply /*

That's a doc-book-style comment header for a comment that's actually not
doc-book-style. Randy gets angry.

Thanks,
Satyam

Re: Re: [announce] Intel announces the PowerTOP utility for Linux

2007-05-11 Thread Matt Mackall

On Sat, May 12, 2007 at 02:40:52AM +0100, Jose Celestino wrote:
> Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]:
> > On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
> > > 
> > > What's eating the battery life of my laptop? Why isn't it many more 
> > > hours? Which software component causes the most power to be burned? 
> > > These are important questions without a good answer... until now.
> > 
> > I get:
> > 
> > No detailed statistics available; please enable the CONFIG_TIMER_STATS
> > kernel option
> > 
> 
> Must run as root (rw to /proc/timer_stats is needed).

That file doesn't exist, despite CONFIG_TIMER_STATS being in
/proc/config.gz.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH 1/12] crypto: don't pollute the global namespace with sg_next()

2007-05-11 Thread Herbert Xu

Benny Halevy <[EMAIL PROTECTED]> wrote:
>
> I was trying to say that the methods should be compatible, otherwise
> bugs can happen, and that your scheme is better since it can
> handle sglists with zero length entries that aren't the last.
> A case that might be valid after dma mapping and merging.
> If indeed this case is possible, this seems to be the right time
> to converge to your scheme.

Well right now this isn't possible because the crypto layer is not
directly hooked to any DMA code since it's (mostly) software only.

However, I completely agree that it should be converted to this new
scheme.  The only user of chaining right now is crypto/hmac.c so it
should be easy to fix.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[patch 2/2] epoll locks changes and cleanups ...

2007-05-11 Thread Davide Libenzi

Changes the rwlock to a spinlock, and drops the use-count variable.
Operations are always bound by the mutex now, so the use-count is
no longer needed. For the same reason, the rwlock can become a simple
spinlock.


Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide



Index: linux-2.6.21/fs/eventpoll.c
===
--- linux-2.6.21.orig/fs/eventpoll.c    2007-05-11 17:21:25.0 -0700
+++ linux-2.6.21/fs/eventpoll.c 2007-05-11 19:20:32.0 -0700
@@ -1,6 +1,6 @@
 /*
- *  fs/eventpoll.c ( Efficent event polling implementation )
- *  Copyright (C) 2001,...,2006 Davide Libenzi
+ *  fs/eventpoll.c (Efficent event polling implementation)
+ *  Copyright (C) 2001,...,2007 Davide Libenzi
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -44,8 +44,8 @@
  * There are three level of locking required by epoll :
  *
  * 1) epmutex (mutex)
- * 2) ep->mtx (mutes)
- * 3) ep->lock (rw_lock)
+ * 2) ep->mtx (mutex)
+ * 3) ep->lock (spinlock)
  *
  * The acquire order is the one listed above, from 1 to 3.
  * We need a spinlock (ep->lock) because we manipulate objects
@@ -140,6 +140,12 @@
/* List header used to link this structure to the eventpoll ready list */
struct list_head rdllink;
 
+   /*
+* Works together "struct eventpoll"->ovflist in keeping the
+* single linked chain of items.
+*/
+   struct epitem *next;
+
/* The file descriptor information this item refers to */
struct epoll_filefd ffd;
 
@@ -152,23 +158,11 @@
/* The "container" of this item */
struct eventpoll *ep;
 
-   /* The structure that describe the interested events and the source fd */
-   struct epoll_event event;
-
-   /*
-* Used to keep track of the usage count of the structure. This avoids
-* that the structure will desappear from underneath our processing.
-*/
-   atomic_t usecnt;
-
/* List header used to link this item to the "struct file" items list */
struct list_head fllink;
 
-   /*
-* Works together "struct eventpoll"->ovflist in keeping the
-* single linked chain of items.
-*/
-   struct epitem *next;
+   /* The structure that describe the interested events and the source fd */
+   struct epoll_event event;
 };
 
 /*
@@ -178,7 +172,7 @@
  */
 struct eventpoll {
/* Protect the this structure access */
-   rwlock_t lock;
+   spinlock_t lock;
 
/*
 * This mutex is used to ensure that files are not removed
@@ -394,78 +388,11 @@
 }
 
 /*
- * Unlink the "struct epitem" from all places it might have been hooked up.
- * This function must be called with write IRQ lock on "ep->lock".
- */
-static int ep_unlink(struct eventpoll *ep, struct epitem *epi)
-{
-   int error;
-
-   /*
-* It can happen that this one is called for an item already unlinked.
-* The check protect us from doing a double unlink ( crash ).
-*/
-   error = -ENOENT;
-   if (!ep_rb_linked(&epi->rbn))
-   goto error_return;
-
-   /*
-* Clear the event mask for the unlinked item. This will avoid item
-* notifications to be sent after the unlink operation from inside
-* the kernel->userspace event transfer loop.
-*/
-   epi->event.events = 0;
-
-   /*
-* At this point is safe to do the job, unlink the item from our rb-tree.
-* This operation togheter with the above check closes the door to
-* double unlinks.
-*/
-   ep_rb_erase(&epi->rbn, &ep->rbr);
-
-   /*
-* If the item we are going to remove is inside the ready file descriptors
-* we want to remove it from this list to avoid stale events.
-*/
-   if (ep_is_linked(&epi->rdllink))
-   list_del_init(&epi->rdllink);
-
-   error = 0;
-error_return:
-
-   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_unlink(%p, %p) = %d\n",
-current, ep, epi->ffd.file, error));
-
-   return error;
-}
-
-/*
- * Increment the usage count of the "struct epitem" making it sure
- * that the user will have a valid pointer to reference.
- */
-static void ep_use_epitem(struct epitem *epi)
-{
-   atomic_inc(&epi->usecnt);
-}
-
-/*
- * Decrement ( release ) the usage count by signaling that the user
- * has finished using the structure. It might lead to freeing the
- * structure itself if the count goes to zero.
- */
-static void ep_release_epitem(struct epitem *epi)
-{
-   if (atomic_dec_and_test(&epi->usecnt))
-   kmem_cache_free(epi_cache, epi);
-}
-
-/*
  * Removes a "struct epitem" from the eventpoll RB tree and deallocates
- * all the associated resources.
+ * all the associated resources. Must be called with "mtx" held.
  */
 static int ep

Re: [PATCH]: Fix assertion failure with MSI on sparc64

2007-05-11 Thread Michael Ellerman

On Fri, 2007-05-11 at 13:26 -0700, David Miller wrote:
> Hi Michael, I'm still working through the various regressions on
> sparc64 added by your MSI changes :-)

Hi Dave,

Guilty as charged - I did CC you on the patches though ;)

> The one I fixed the other day was a missed switch over to
> alloc_pci_dev() in the sparc64 PCI probing code which caused an OOPS
> in pci_enable_msi() because the list head of the pci dev was not
> initialized.  PowerPC's OBP firmware tree based PCI probing code
> was updated, sparc64's wasn't.

Sorry - not sure how I missed that one, it even matches
"k.alloc(.*pci_dev" - thanks for fixing it :)

> Today's find is a triggered assertion in msi_free_irqs() when the
> system doesn't support MSI, in which case arch_setup_msi_irqs() always
> returns an error.

What do you need to determine that the system can't support MSI? Could
you do that logic in arch_msi_check_device()?

> The problem is that when this happens we branch into msi_free_irqs(),
> to which you added the following assertion loop:
> 
>   list_for_each_entry(entry, &dev->msi_list, list)
>   BUG_ON(irq_has_action(entry->irq));
> 
> Well, if arch_setup_msi_irqs() fails, entry->irq will be zero and
> although that's never assigned to any normal devices we use that IRQ
> number for the timer interrupt on sparc64 so this assertion triggers.
> 
> Better to test for zero before doing the irq_has_action() assertion
> thing.

Yep, looks good - it matches the logic in arch_teardown_msi_irqs().

cheers


Acked-by: Michael Ellerman <[EMAIL PROTECTED]>

> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index e6740d1..d9cbd58 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -549,8 +549,10 @@ static int msi_free_irqs(struct pci_dev* dev)
>  {
>   struct msi_desc *entry, *tmp;
>  
> - list_for_each_entry(entry, &dev->msi_list, list)
> - BUG_ON(irq_has_action(entry->irq));
> + list_for_each_entry(entry, &dev->msi_list, list) {
> + if (entry->irq)
> + BUG_ON(irq_has_action(entry->irq));
> + }
>  
>   arch_teardown_msi_irqs(dev);
>  
-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



[patch 1/2] fix epoll single pass code and add wait-exclusive flag ...

2007-05-11 Thread Davide Libenzi

Fixes the epoll single pass code. During the unlocked event delivery
(to userspace) code, the poll callback can re-issue new events, and
we must receive them correctly. Because we loop in a lockless fashion,
want to stay O(nready), and don't want to take and drop the spinlock
for every event, the poll callback uses a secondary list to queue
events while we're inside the event delivery loop.
The rw_semaphore has been turned into a mutex.
This patch also adds the wait-exclusive flag, as suggested by Davi Arnaut.
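
The secondary-list idea can be sketched in userspace C as follows. This
is a simplified, single-threaded illustration and not the patch itself:
all demo_* names are hypothetical, locking is left out entirely, and
DEMO_UNACTIVE_PTR merely mirrors the EP_UNACTIVE_PTR sentinel that the
patch introduces. While a delivery is in progress, the callback chains
items on the overflow list instead of touching the ready list, and the
end of delivery moves them back.

```c
#include <stddef.h>

#define DEMO_UNACTIVE_PTR ((void *) -1L)    /* mirrors EP_UNACTIVE_PTR */

struct demo_item {
    struct demo_item *rdnext;   /* link on the ready list */
    struct demo_item *next;     /* overflow chain link, or DEMO_UNACTIVE_PTR */
    int on_rdlist;              /* 1 while queued on the ready list */
};

struct demo_ep {
    struct demo_item *rdlist;   /* ready list head */
    struct demo_item *ovflist;  /* DEMO_UNACTIVE_PTR when no delivery runs */
};

void demo_ep_init(struct demo_ep *ep)
{
    ep->rdlist = NULL;
    ep->ovflist = DEMO_UNACTIVE_PTR;
}

void demo_item_init(struct demo_item *it)
{
    it->rdnext = NULL;
    it->next = DEMO_UNACTIVE_PTR;
    it->on_rdlist = 0;
}

static void demo_ready_add(struct demo_ep *ep, struct demo_item *it)
{
    if (it->on_rdlist)
        return;
    it->on_rdlist = 1;
    it->rdnext = ep->rdlist;
    ep->rdlist = it;
}

/* Poll callback: an event arrived for "it". */
void demo_poll_callback(struct demo_ep *ep, struct demo_item *it)
{
    if (ep->ovflist != DEMO_UNACTIVE_PTR) {
        /* Delivery in progress: chain the item once on the overflow list. */
        if (it->next == DEMO_UNACTIVE_PTR) {
            it->next = ep->ovflist;
            ep->ovflist = it;
        }
        return;
    }
    demo_ready_add(ep, it);
}

/* Start a delivery pass: detach the ready list, activate the overflow list. */
struct demo_item *demo_delivery_begin(struct demo_ep *ep)
{
    struct demo_item *batch = ep->rdlist, *it;

    for (it = batch; it; it = it->rdnext)
        it->on_rdlist = 0;
    ep->rdlist = NULL;
    ep->ovflist = NULL;         /* empty chain, but "active" */
    return batch;
}

/* End a delivery pass: requeue everything that arrived meanwhile. */
void demo_delivery_end(struct demo_ep *ep)
{
    struct demo_item *it = ep->ovflist, *nxt;

    if (it == DEMO_UNACTIVE_PTR)
        return;                 /* no delivery was in progress */
    ep->ovflist = DEMO_UNACTIVE_PTR;
    while (it) {
        nxt = it->next;
        it->next = DEMO_UNACTIVE_PTR;
        demo_ready_add(ep, it);
        it = nxt;
    }
}
```

The delivery loop itself only walks the detached batch, so its cost
stays proportional to the number of ready items.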



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide



Index: linux-2.6.21/fs/eventpoll.c
===
--- linux-2.6.21.orig/fs/eventpoll.c    2007-05-11 14:32:31.0 -0700
+++ linux-2.6.21/fs/eventpoll.c 2007-05-11 16:33:38.0 -0700
@@ -26,7 +26,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -39,14 +38,13 @@
 #include 
 #include 
 #include 
-#include 
 
 /*
  * LOCKING:
  * There are three level of locking required by epoll :
  *
  * 1) epmutex (mutex)
- * 2) ep->sem (rw_semaphore)
+ * 2) ep->mtx (mutes)
  * 3) ep->lock (rw_lock)
  *
  * The acquire order is the one listed above, from 1 to 3.
@@ -57,20 +55,20 @@
  * a spinlock. During the event transfer loop (from kernel to
  * user space) we could end up sleeping due a copy_to_user(), so
  * we need a lock that will allow us to sleep. This lock is a
- * read-write semaphore (ep->sem). It is acquired on read during
- * the event transfer loop and in write during epoll_ctl(EPOLL_CTL_DEL)
- * and during eventpoll_release_file(). Then we also need a global
- * semaphore to serialize eventpoll_release_file() and ep_free().
- * This semaphore is acquired by ep_free() during the epoll file
+ * mutex (ep->mtx). It is acquired during the event transfer loop,
+ * during epoll_ctl(EPOLL_CTL_DEL) and during eventpoll_release_file().
+ * Then we also need a global mutex to serialize eventpoll_release_file()
+ * and ep_free().
+ * This mutex is acquired by ep_free() during the epoll file
  * cleanup path and it is also acquired by eventpoll_release_file()
  * if a file has been pushed inside an epoll set and it is then
  * close()d without a previous call toepoll_ctl(EPOLL_CTL_DEL).
- * It is possible to drop the "ep->sem" and to use the global
- * semaphore "epmutex" (together with "ep->lock") to have it working,
- * but having "ep->sem" will make the interface more scalable.
+ * It is possible to drop the "ep->mtx" and to use the global
+ * mutex "epmutex" (together with "ep->lock") to have it working,
+ * but having "ep->mtx" will make the interface more scalable.
  * Events that require holding "epmutex" are very rare, while for
- * normal operations the epoll private "ep->sem" will guarantee
- * a greater scalability.
+ * normal operations the epoll private "ep->mtx" will guarantee
+ * a better scalability.
  */
 
 #define DEBUG_EPOLL 0
@@ -102,6 +100,8 @@
 
 #define EP_MAX_EVENTS (INT_MAX / sizeof(struct epoll_event))
 
+#define EP_UNACTIVE_PTR ((void *) -1L)
+
 struct epoll_filefd {
struct file *file;
int fd;
@@ -111,7 +111,7 @@
  * Node that is linked into the "wake_task_list" member of the "struct poll_safewake".
  * It is used to keep track on all tasks that are currently inside the wake_up() code
  * to 1) short-circuit the one coming from the same task and same wait queue head
- * ( loop ) 2) allow a maximum number of epoll descriptors inclusion nesting
+ * (loop) 2) allow a maximum number of epoll descriptors inclusion nesting
  * 3) let go the ones coming from other tasks.
  */
 struct wake_task_node {
@@ -130,6 +130,48 @@
 };
 
 /*
+ * Each file descriptor added to the eventpoll interface will
+ * have an entry of this type linked to the "rbr" RB tree.
+ */
+struct epitem {
+   /* RB-Tree node used to link this structure to the eventpoll rb-tree */
+   struct rb_node rbn;
+
+   /* List header used to link this structure to the eventpoll ready list */
+   struct list_head rdllink;
+
+   /* The file descriptor information this item refers to */
+   struct epoll_filefd ffd;
+
+   /* Number of active wait queue attached to poll operations */
+   int nwait;
+
+   /* List containing poll wait queues */
+   struct list_head pwqlist;
+
+   /* The "container" of this item */
+   struct eventpoll *ep;
+
+   /* The structure that describe the interested events and the source fd */
+   struct epoll_event event;
+
+   /*
+* Used to keep track of the usage count of the structure. This avoids
+* that the structure will desappear from underneath our processing.
+*/
+   atomic_t usecnt;
+
+   /* List header used to link this item to the "struct file" items list */
+   struct list_head fllink;
+
+   /*
+* Works together "struct eventpoll"->ovflist in keeping the
+* single linked chain of items.
+

Re: [PATCH] MAINTAINERS: remove invalid list address for TPM

2007-05-11 Thread Stephen Rothwell

On Fri, 11 May 2007 09:19:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
>
> On Fri, 11 May 2007 16:42:07 +1000 Stephen Rothwell wrote:
>
> > This address bounces with "550 Unknown user".
> >
> > Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
> > ---
> >  MAINTAINERS |1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
>
> Hm, I get:
>
> Your mail to 'tpmdd-devel' with the subject
> test only
> Is being held until the list moderator can review it for approval.
> The reason it is being held:
> Post by non-member to a members-only list

Hmm, interesting.  I did get the "550 Unknown user" bounce.  I guess some
glitch in sourceforge.net's mail system?

> so what MAINTAINERS usually says in that case is "(subscribers-only)":
>
> L:[EMAIL PROTECTED] (subscribers-only)

Fine, that seems sensible.

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] update sysfs kset initialisation in PPC64 DLPAR IO driver

2007-05-11 Thread Michael Ellerman

On Fri, 2007-05-11 at 11:40 +0100, Andy Whitcroft wrote:
> Michael Ellerman wrote:
> > On Fri, 2007-05-11 at 00:16 -0700, Greg KH wrote:
> >> On Thu, May 10, 2007 at 04:54:41PM +0100, Andy Whitcroft wrote:
> >>> Greg KH wrote:
>  On Thu, May 10, 2007 at 03:00:50PM +0100, Andy Whitcroft wrote:
> > Move the rpadlpar device from "struct subsystem" to "struct kset"
> > following the changes in sysfs.
> >
> > Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]>
> > ---
> >
> > Ok, this patch seems to sort out the compile problem
> > here and indeed boots and runs kernbench.  Perhaps
> > you could confirm this is sufficient.
>  As per the discussion on the pci hotplug list, no, this doesn't seem to
>  fix the problem.  The developers there are looking into it.  If you can
>  test out patches for this, I'm sure the people there would appreciate
>  the help.
> >>> Sure anything they have for testing, send them to me ...
> >> They have the same patch that you made (I made it), yet they reported
> >> that it didn't work properly for them.
> >>
> >> Can you test your patch out on "real" hardware?
> > 
> > I tested it on real hardware, but it can't hurt for Andy to try it too I
> > guess.
> 
> To be fair I am not sure I have a clue how to test it.  Got a recipe?
> My patch was based on how other drivers seemed to be converted which is
> a concern for those drivers.
> 
> What sort of failure do you see?

Prior to the removal of struct subsystem I get two files called
'add_slot' and 'remove_slot' under /sys/bus/pci/slots/control.

With Greg's patch I get the directory /sys/bus/pci/slots/control, but
nothing under it. 

Apparently John Rose is looking into it.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



Re: [patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread H. Peter Anvin

David Miller wrote:
> 
> All ports above and including 1024 are non-privileged and available to
> anyone.
> 
> Applications which have some requirements in this area need to work
> those things out themselves.

However, there are a large number of applications which have registered
ports in this range.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] 2.6.21-git15 - Kconfig Cleanup

2007-05-11 Thread Matt LaPlante

Fix misc small issues/typos/grammar in Kconfigs for 2.6.21-git15.

Signed-off-by: Matt LaPlante <[EMAIL PROTECTED]>
--

diff -ru a/arch/arm/plat-s3c24xx/Kconfig b/arch/arm/plat-s3c24xx/Kconfig
--- a/arch/arm/plat-s3c24xx/Kconfig 2007-04-25 23:08:32.0 -0400
+++ b/arch/arm/plat-s3c24xx/Kconfig 2007-05-11 21:44:06.0 -0400
@@ -70,7 +70,7 @@
help
  Set the chunksize in Kilobytes of the CRC for checking memory
  corruption over suspend and resume. A smaller value will mean that
- the CRC data block will take more memory, but wil identify any
+ the CRC data block will take more memory, but will identify any
  faults with better precision.
 
  See 
diff -ru a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig
--- a/arch/blackfin/Kconfig 2007-05-11 20:32:24.0 -0400
+++ b/arch/blackfin/Kconfig 2007-05-11 21:33:28.0 -0400
@@ -435,100 +435,100 @@
default y
help
  If enabled interrupt entry code (STORE/RESTORE CONTEXT) is linked
- into L1 instruction memory.(less latency)
+ into L1 instruction memory. (less latency)
 
 config EXCPT_IRQ_SYSC_L1
-   bool "Locate entire ASM lowlevel excepetion / interrupt - Syscall and CPLB handler code in L1 Memory"
+   bool "Locate entire ASM lowlevel exception / interrupt - Syscall and CPLB handler code in L1 Memory"
default y
help
- If enabled entire ASM lowlevel exception and interrupt entry code (STORE/RESTORE CONTEXT) is linked
- into L1 instruction memory.(less latency)
+ If enabled, the entire ASM lowlevel exception and interrupt entry code
+ (STORE/RESTORE CONTEXT) is linked into L1 instruction memory. (less latency)
 
 config DO_IRQ_L1
bool "Locate frequently called do_irq dispatcher function in L1 Memory"
default y
help
- If enabled frequently called do_irq dispatcher function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the frequently called do_irq dispatcher function is linked
+ into L1 instruction memory. (less latency)
 
 config CORE_TIMER_IRQ_L1
bool "Locate frequently called timer_interrupt() function in L1 Memory"
default y
help
- If enabled frequently called timer_interrupt() function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the frequently called timer_interrupt() function is linked
+ into L1 instruction memory. (less latency)
 
 config IDLE_L1
bool "Locate frequently idle function in L1 Memory"
default y
help
- If enabled frequently called idle function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the frequently called idle function is linked
+ into L1 instruction memory. (less latency)
 
 config SCHEDULE_L1
bool "Locate kernel schedule function in L1 Memory"
default y
help
- If enabled frequently called kernel schedule is linked
- into L1 instruction memory.(less latency)
+ If enabled, the frequently called kernel schedule is linked
+ into L1 instruction memory. (less latency)
 
 config ARITHMETIC_OPS_L1
bool "Locate kernel owned arithmetic functions in L1 Memory"
default y
help
  If enabled arithmetic functions are linked
- into L1 instruction memory.(less latency)
+ into L1 instruction memory. (less latency)
 
 config ACCESS_OK_L1
bool "Locate access_ok function in L1 Memory"
default y
help
- If enabled access_ok function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the access_ok function is linked
+ into L1 instruction memory. (less latency)
 
 config MEMSET_L1
bool "Locate memset function in L1 Memory"
default y
help
- If enabled memset function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the memset function is linked
+ into L1 instruction memory. (less latency)
 
 config MEMCPY_L1
bool "Locate memcpy function in L1 Memory"
default y
help
- If enabled memcpy function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the memcpy function is linked
+ into L1 instruction memory. (less latency)
 
 config SYS_BFIN_SPINLOCK_L1
bool "Locate sys_bfin_spinlock function in L1 Memory"
default y
help
- If enabled sys_bfin_spinlock function is linked
- into L1 instruction memory.(less latency)
+ If enabled, the sys_bfin_spinlock function is linked
+ into L1 instruction memory. (less latency)
 
 config IP_CHECKSUM_L1
bool "Locate IP Checksum function in L1 Memory"
default n
help
- If enabled IP Checksum function is linked
-

Re: [patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread H. Peter Anvin

Mark Glines wrote:
> 
> By a one-in-a-million coincidence, this machine has a default port
> range starting with 2048, and this breaks things for me.  I'm trying to
> run both klive and nfs on this box, but klive starts first (probably
> because of the filename sort order), and claims UDP port 2049 for its
> own purposes, causing the nfs server to fail to start.
> 
> If the bind hash size is over a certain threshold, the range
> 32768-61000 is used.  If it is under a certain threshold, a range
> like (1024|2048|3072)-4999 is used, depending on exactly how small it
> is.  Thix box happened to get the 2048-4999 range, which broke nfs.
> 
> A comment just above the code that does this says, "Try to be a bit
> smarter and adjust defaults depending on available memory."  "smarter"?
> Maybe, maybe not.  Either way, it's unexpected.
> 
> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has.  So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768.  I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.
> 

Yes, that would be better.  The IANA recommended port range for dynamic
ports is 49152-65535; Linux extends this to 32768 and chops off some of
the really high ports, but keeping them in the high range is thus the
right thing to do.
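
To make the suggestion concrete, here is a minimal sketch, not the kernel's
code.  rebase_port_range() is a hypothetical helper that keeps the size of a
small-memory range such as 2048-4999 but moves its start up to 32768.  Only
the numbers 32768 and 49152-65535 come from this thread; the names and the
"keep the same size" policy are assumptions made for illustration.

```c
#include <assert.h>

#define LINUX_DYNAMIC_PORT_BASE 32768	/* start of the range on large systems */
#define IANA_DYNAMIC_PORT_MIN   49152	/* IANA dynamic range, low end */
#define IANA_DYNAMIC_PORT_MAX   65535	/* IANA dynamic range, high end */

struct port_range {
	int low;
	int high;
};

/* Hypothetical helper: keep the number of ports, move the start to 32768. */
static struct port_range rebase_port_range(int old_low, int old_high)
{
	struct port_range r;

	r.low = LINUX_DYNAMIC_PORT_BASE;
	r.high = LINUX_DYNAMIC_PORT_BASE + (old_high - old_low);
	return r;
}

/* True if the port lies inside the IANA dynamic/private range. */
static int in_iana_dynamic_range(int port)
{
	return port >= IANA_DYNAMIC_PORT_MIN && port <= IANA_DYNAMIC_PORT_MAX;
}
```

Under these assumptions the 2048-4999 range becomes 32768-35719, which no
longer contains the NFS port 2049.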

-hpa

Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>

2007-05-11 Thread Andrew Morton

On Sat, 12 May 2007 10:33:00 +0900
Paul Mundt <[EMAIL PROTECTED]> wrote:

> On Fri, May 11, 2007 at 11:39:15AM -0700, Andrew Morton wrote:
> > On Fri, 11 May 2007 09:57:50 -0700
> > [EMAIL PROTECTED] wrote:
> > 
> > > > I'll take a look at tidying up the PMB slab, getting rid of the dtor
> > > > shouldn't be terribly painful. I simply opted to do the list management
> > > > there since others were doing it for the PGD slab cache at the time that
> > > > was written.
> > > 
> > > And here's the bit for dropping pmb_cache_dtor(), moving the list
> > > management up to pmb_alloc() and pmb_free().
> > > 
> > > With this applied, we're all set for killing off slab destructors
> > > from the kernel entirely.
> > 
> > hm, this is already in Paul's git tree.
> > 
> > If we're going to slam all this into 2.6.22 then I can just tempdrop Paul's
> > tree.
> > 
> > However I think we've done enough slab work for 2.6.22 now so I'm inclined
> > to queue these changes for 2.6.23.  That would mean that the slab changes in
> > -mm have a dependency on the sh git tree which I am sure to forget about.
> > If I end up merging these changes before Paul merges his tree, sh will
> > break.  Presumably Paul will notice this ;)
> 
> I can prune it from my tree if you'd rather just bundle these together, I
> wasn't sure what the timeline for these changes were, so I opted just to
> toss the PMB rework in my git tree ahead of time.
> 
> On the other hand, if Christoph's changes are going to be queued for
> 2.6.23, the PMB changes will trickle in well before then anyways.

It looks like we'll be going the latter trickle-in way, thanks.

Re: Re: [announce] Intel announces the PowerTOP utility for Linux

2007-05-11 Thread Jose Celestino

Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]:
> On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
> > 
> > What's eating the battery life of my laptop? Why isn't it many more 
> > hours? Which software component causes the most power to be burned? 
> > These are important questions without a good answer... until now.
> 
> I get:
> 
> No detailed statistics available; please enable the CONFIG_TIMER_STATS
> kernel option
> 

Must run as root (rw to /proc/timer_stats is needed).

-- 
Jose Celestino

http://www.msversus.org/ ; http://techp.org/petition/show/1
http://www.vinc17.org/noswpat.en.html

"And on the trillionth day, Man created Gods." -- Thomas D. Pate

Re: [PATCH] spelling fixes: arch/sh/

2007-05-11 Thread Paul Mundt

On Fri, May 11, 2007 at 08:43:12PM +0100, Simon Arlott wrote:
> Spelling fixes in arch/sh/.
> 
> Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>

Applied, thanks.

Re: [PATCH] spelling fixes: arch/sh64/

2007-05-11 Thread Paul Mundt

On Fri, May 11, 2007 at 08:43:19PM +0100, Simon Arlott wrote:
> Spelling fixes in arch/sh64/.
> 
> Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>

Applied, thanks.

Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>

2007-05-11 Thread Paul Mundt

On Fri, May 11, 2007 at 11:39:15AM -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 09:57:50 -0700
> [EMAIL PROTECTED] wrote:
> 
> > > I'll take a look at tidying up the PMB slab, getting rid of the dtor
> > > shouldn't be terribly painful. I simply opted to do the list management
> > > there since others were doing it for the PGD slab cache at the time that
> > > was written.
> > 
> > And here's the bit for dropping pmb_cache_dtor(), moving the list
> > management up to pmb_alloc() and pmb_free().
> > 
> > With this applied, we're all set for killing off slab destructors
> > from the kernel entirely.
> 
> hm, this is already in Paul's git tree.
> 
> If we're going to slam all this into 2.6.22 then I can just tempdrop Paul's
> tree.
> 
> However I think we've done enough slab work for 2.6.22 now so I'm inclined
> to queue these changes for 2.6.23.  That would mean that the slab changes in
> -mm have a dependency on the sh git tree which I am sure to forget about.
> If I end up merging these changes before Paul merges his tree, sh will
> break.  Presumably Paul will notice this ;)

I can prune it from my tree if you'd rather just bundle these together.  I
wasn't sure what the timeline for these changes was, so I opted just to
toss the PMB rework in my git tree ahead of time.

On the other hand, if Christoph's changes are going to be queued for
2.6.23, the PMB changes will trickle in well before then anyways.

Re: [patch 2/2] Slab allocators: Drop support for destructors

2007-05-11 Thread Paul Mundt

On Fri, May 11, 2007 at 09:57:51AM -0700, [EMAIL PROTECTED] wrote:
> There is no user of destructors left. There is no reason why we should
> keep checking for destructors calls in the slab allocators.
> 
> The RFC for this patch was discussed at
> http://marc.info/?l=linux-kernel&m=117882364330705&w=2
> 
> Destructors were mainly used for list management which required them to take a
> spinlock. Taking a spinlock in a destructor is a bit risky since the slab
> allocators may run the destructors anytime they decide a slab is no longer
> needed.
> 
> Patch drops destructor support. Any attempt to use a destructor will BUG().
> 
> Cc: Pekka Enberg <[EMAIL PROTECTED]>
> Cc: Paul Mundt <[EMAIL PROTECTED]>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
Acked-by: Paul Mundt <[EMAIL PROTECTED]>
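
For readers following along, a rough userspace sketch of what "list
management in alloc/free instead of in a destructor" looks like.  This is
not the PMB code: toy_entry, toy_entry_alloc() and toy_entry_free() are
made-up names, and a pthread mutex stands in for the spinlock.  The point is
only that the lock is taken at call sites the owner controls, rather than
whenever the allocator decides to run a destructor.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy object kept on a global singly linked list. */
struct toy_entry {
	struct toy_entry *next;
	int id;
};

static struct toy_entry *toy_list;
static pthread_mutex_t toy_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate an entry and link it in; no constructor involved. */
static struct toy_entry *toy_entry_alloc(int id)
{
	struct toy_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->id = id;

	pthread_mutex_lock(&toy_list_lock);
	e->next = toy_list;
	toy_list = e;
	pthread_mutex_unlock(&toy_list_lock);
	return e;
}

/* Unlink the entry here, then free it; no destructor involved. */
static void toy_entry_free(struct toy_entry *e)
{
	struct toy_entry **p;

	pthread_mutex_lock(&toy_list_lock);
	for (p = &toy_list; *p; p = &(*p)->next) {
		if (*p == e) {
			*p = e->next;
			break;
		}
	}
	pthread_mutex_unlock(&toy_list_lock);
	free(e);
}

/* Count the entries currently linked in. */
static int toy_list_length(void)
{
	struct toy_entry *e;
	int n = 0;

	pthread_mutex_lock(&toy_list_lock);
	for (e = toy_list; e; e = e->next)
		n++;
	pthread_mutex_unlock(&toy_list_lock);
	return n;
}
```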

Re: [Bugme-new] [Bug 8462] New: applications under wine freezes

2007-05-11 Thread Charles Gagalac

--- Davide Libenzi wrote:

> Charles, would you mind trying the patch below
> against -git13 on your 
> machine. I tested it with wine and firefox on a 32
> bit P4 with HT and it's 
> working fine.

i applied the patch against git13.  starcraft,
pokerstars, and firefox under wine have not frozen. 
looks good.


Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Linus Torvalds

On Sat, 12 May 2007, Oleg Nesterov wrote:
> 
> However, in my opinion THAT PATCH has nothing to do with this problem.
> It just improves the code that we already have.

Sure. 

However, I think it does it THE WRONG WAY, and doesn't actually fix the 
much deeper problems with the freezer, as shown by the fact that the lock 
is *still* broken for other cases.

So, here's a summary:

 - we should not take the lock inside the function, because taking it 
   there is fundamentally wrong, and leaves all the *other* races in 
   place.

 - if you actually want to solve the other races, the lock needs to be 
   taken by the caller, in which case taking it in the callee is obviously 
   (again) wrong.

 - or then, we accept that the race wasn't fixed AT ALL, and you add other 
   code to _other_ places to handle the case where you froze the wrong 
   thread (or didn't freeze the right one).

   And I'm not making that up. Look at most of the other patches in that 
   series: they are _exactly_ about the scenario I'm outlining.

 - the whole "kernel thread vs user thread" thing is the wrong thing to 
   check in the first place, since we just should never touch kernel 
   threads in the first place, and anything that wants to freeze user 
   space should have disabled exec_usermodehelper() at a higher level

That's why I'm so unhappy. The "fix" is going in the wrong direction. Each 
fix on their own may be an "improvement", but the end result of many of 
the fixes is a total mess!

We can continue to add bandaids to something broken, until it "works". But 
the end result, while "working", is not actually any better. Quite the 
reverse - the end result of something like that is that you add all these 
magic rules and special cases.

So in the end one ugly design decision leads to broken locking, which in 
turn leads to other cases where you add more broken code, which just leads 
to a situation where nobody actually understands what the *design* is, 
because there simply *isn't* any design - it's just a hodge-podge of "but 
this fixes a bug" ad-hoc "fixes".
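
A userspace sketch of the locking point in the summary, under loud
assumptions: struct toy_task is a toy, a pthread mutex plays the role of
task_lock(), and anything that changes ->mm is assumed to take that lock.
None of these names are the kernel's.  The first function shows the pattern
being objected to (test under the lock, return the answer after unlocking);
the second shows the check and the action inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	pthread_mutex_t lock;	/* plays the role of task_lock() */
	void *mm;		/* non-NULL means "user space" in this toy */
	int frozen;
};

/* Lock taken in the callee: the answer can be stale once we return. */
static int toy_is_user_space(struct toy_task *t)
{
	int ret;

	pthread_mutex_lock(&t->lock);
	ret = (t->mm != NULL);
	pthread_mutex_unlock(&t->lock);
	return ret;		/* only a snapshot */
}

/* Check and act under one lock hold, so the test cannot go stale. */
static int toy_freeze_if_user_space(struct toy_task *t)
{
	int did_freeze = 0;

	pthread_mutex_lock(&t->lock);
	if (t->mm != NULL) {
		t->frozen = 1;	/* ->mm cannot change while the lock is held */
		did_freeze = 1;
	}
	pthread_mutex_unlock(&t->lock);
	return did_freeze;
}
```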

Linus

Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311

2007-05-11 Thread Andrew Morton

On Fri, 11 May 2007 17:53:35 -0700
Andrew Morton <[EMAIL PROTECTED]> wrote:

> And indeed that's buggy - the non-debug version of spin_lock_mutex() is not
> irq-safe.
> 
> I'd say that's pretty dumb of the mutex interface, really.  Doing a
> mutex_trylock() should be OK from all contexts.

We can fix this in a low-impact fashion by making mutex_trylock() do a
spin_trylock() on mutex->wait_lock, no?
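
Roughly what I have in mind, as a userspace toy rather than a patch: struct
toy_mutex and its functions are invented here, and a C11 atomic_flag stands
in for mutex->wait_lock.  The trylock path gives up if the inner lock is
busy instead of spinning on it, so a caller that interrupted the holder of
wait_lock on the same CPU cannot deadlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy mutex; not the kernel's struct mutex. */
struct toy_mutex {
	atomic_flag wait_lock;	/* stands in for mutex->wait_lock */
	int locked;		/* protected by wait_lock */
};

/* Never spins: returns 0 if wait_lock is busy or the mutex is taken. */
static int toy_mutex_trylock(struct toy_mutex *m)
{
	int got = 0;

	if (atomic_flag_test_and_set_explicit(&m->wait_lock,
					      memory_order_acquire))
		return 0;	/* inner lock busy: fail rather than spin */

	if (!m->locked) {
		m->locked = 1;
		got = 1;
	}
	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
	return got;
}

/* Unlock is only called from contexts that are allowed to spin. */
static void toy_mutex_unlock(struct toy_mutex *m)
{
	while (atomic_flag_test_and_set_explicit(&m->wait_lock,
						 memory_order_acquire))
		;		/* spin until the inner lock is free */
	m->locked = 0;
	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
}
```

The cost is that a trylock can fail spuriously while someone else holds the
inner lock, which is acceptable for a trylock.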


Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Rafael J. Wysocki

On Saturday, 12 May 2007 02:08, Linus Torvalds wrote:
> 
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> > 
> > things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
> 
> ->mm is not stable *regardless*!
> 
> Trivial examples:
>  - kernel thread does execve()
>  - user thread does exit().
> 
> The use "use_mm()" and "unuse_mm()" things are total red herrings.
> 
> If the freezer depends on the difference between user and kernel threads, 
> then THAT PATCH IS BUGGY. It's that simple. It tests something that simply 
> isn't stable outside the lock, and then returns that value after having 
> unlocked it.
> 
> It might as well return a random number.
> 
> > However, the return value == 0 does not change in that particular case,
> > exactly because is_user_space() takes task_lock().
> 
> As does exit_mm() etc.
> 
> That's NOT THE POINT. You cannot use the end result after releasing the 
> task lock, because the moment you release the task lock, it becomes 
> totally irrelevant, and may not be true any more.
> 
> Example (a):
>  - you ask "is_user_space(p)", it returns 1.
>  - before you actually have time to do anything about it, the task exists, 
>and (since you don't hold the lock any more) will now have a NULL 
>tsk->mm again (and would now return 0 if you called it again).

In which case we won't be freezing this task at all.

> Example (b):
>  - you ask "is_user_space(p)" and it returns 0, because it's a kernel 
>thread
>  - before you actually do anything about it (but after you released the 
>task lock), the kernel thread does an "execve(/sbin/hotplug)" and is no 
>longer a kernel thread.

This is a special case that needs special handling.

> In both cases will the caller have a return value THAT IS NO LONGER TRUE.
> 
> See? The locking was pointless. Exactly because you release the lock 
> before the user can actually do anything about the return value!
> 
> The fact that the locking protects against the very specific case of AIO 
> where the threads _stay_ user tasks and don't really change is pretty much 
> irrelevant, as far as I can see. 

Well, I disagree.  We need the locking *exactly* to avoid situations in which
the threads don't really change, but we might think that they *have changed*.
More precisely, it's needed, because without it kernel threads which execute
use_mm()/unuse_mm() might be identified as user space processes, and that
would be wrong.  The other cases are beyond the scope of this patch.

Rafael

Re: [patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread Mark Glines

On Sat, 12 May 2007 00:06:45 UTC
David Miller <[EMAIL PROTECTED]> wrote:
> All ports above and including 1024 are non-privileged and available to
> anyone.
> 
> Applications which have some requirements in this area need to work
> those things out themselves.

Hi David,

I agree completely.  My issue is that an application which doesn't care
which port it binds to (twistd, on klive's behalf) stomped on the port
of an application which cares very much about which port it binds to
(nfs).  I will gladly accept *any* solution to this problem.

I agree that it would be preferable to change the port NFS decides to
bind to.  If you have a patch to do this, I will happily apply it and
go on my merry way.

However, the world we live in does have port numbers exceeding 1024
listed in /etc/services.  What puzzles me is that, for applications
which don't care what port they get, the kernel will assign values of
32768 and above on some machines, but not others (based on their bind
hash size).  Starting from 32768 seems like very sane behavior to me,
because it minimizes the chances of a collision, and (as far as I know)
doesn't cost anything.  A configuration which stomps on a
not-entirely-unknown application like nfs *by default* isn't
necessarily a bug, but it is a worst case scenario, from the
perspective of a lowly user like me, who wants things to Just Work. :)

Is there a compelling reason not to assign random ports starting from
32768 everywhere regardless of their bind hash size, like my patch
attempts to do?  Does it consume any extra resources to do so?

Thanks,

Mark

Re: [PATCH] spelling fixes: arch/m68knommu/

2007-05-11 Thread Finn Thain



On Fri, 11 May 2007, Simon Arlott wrote:

> - *   Local routines to interrcept the standard I/O and vector handling
> - *   code. Don't include this 'till now - initialization code above needs
> + *   Local routines to intercept the standard I/O and vector handling
> + *   code. Don't include this until now - initialization code above needs
>   *  access to the real code too.

What's wrong with 'til?

> - *   Sub-architcture dependant initialization code for the Freescale
> + *   Sub-architcture dependent initialization code for the Freescale

...

> - *   Sub-architcture dependant initialization code for the Freescale
> + *   Sub-architcture dependent initialization code for the Freescale

...

> - *   Sub-architcture dependant initialization code for the Motorola
> + *   Sub-architcture dependent initialization code for the Motorola

You want "Sub-architecture".

-f

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Oleg Nesterov

On 05/12, Oleg Nesterov wrote:
> 
> Do we need freezer? Should we freeze kernel threads? I can't judge. I tried
> to read a long thread about suspend, and failed to understand it.
> 
> I personally think we can simplify things if CPU-hotplug use freezer, at 
> least.

Just a small example,

debug_smp_processor_id:

/*
 * Kernel threads bound to a single CPU can safely use
 * smp_processor_id():
 */
this_mask = cpumask_of_cpu(this_cpu);

if (cpus_equal(current->cpus_allowed, this_mask))
goto out;

This is not true with CONFIG_HOTPLUG_CPU. This becomes true if we freeze
the kernel threads from CPU_DOWN_PREPARE to CPU_DEAD.

Oleg.


Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311

2007-05-11 Thread Andrew Morton

On Fri, 11 May 2007 19:21:15 -0500
Matt Mackall <[EMAIL PROTECTED]> wrote:

> This just hit:
> 
> [7.856000] usbcore: registered new interface driver usbhid
> [7.86] BUG: at kernel/mutex.c:311 __mutex_trylock_slowpath()
> [7.868000]  [] show_trace_log_lvl+0x1a/0x30
> [7.872000]  [] show_trace+0x12/0x14
> [7.876000]  [] dump_stack+0x15/0x17
> [7.88]  [] mutex_trylock+0x56/0x15a
> [7.888000]  [] hdaps_mousedev_poll+0x10/0xcb
> [7.892000]  [] run_timer_softirq+0x10e/0x16f
> [7.896000]  [] __do_softirq+0x5d/0xc0
> [7.90]  [] do_softirq+0x6e/0xf0
> [7.904000]  [] irq_exit+0x3e/0x7b
> [7.912000]  [] do_IRQ+0x9d/0xb2
> [7.916000]  [] common_interrupt+0x2e/0x34
> [7.92]  [] printk+0x1b/0x1d
> [7.924000]  [] usb_register_driver+0xa0/0xe5
> [7.928000]  [] hid_init+0x28/0x51
> [7.932000]  [] kernel_init+0xbc/0x23e
> [7.94]  [] kernel_thread_helper+0x7/0x10
> [7.944000]  ===
> [7.948000] drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
> 
> Looks like it's triggered by the HDAPS driver.
> 

It's complaining about a mutex_trylock() being run from irq context.

And indeed that's buggy - the non-debug version of spin_lock_mutex() is not
irq-safe.

I'd say that's pretty dumb of the mutex interface, really.  Doing a
mutex_trylock() should be OK from all contexts.

This is caused by a recent semaphore->mutex conversion and it's in mainline
now.

Ho hum.  I suppose a suitable workaround would be to convert hdaps_mtx back
into a semaphore.  ug.


Re: [patch] Refine SCREEN_INFO sanity check for vgacon initialization.

2007-05-11 Thread Eric W. Biederman

Gerd Hoffmann <[EMAIL PROTECTED]> writes:

>   Hi,
>
> Checking video mode field only to see whenever SCREEN_INFO is
> initialized is not enougth, in some cases it is zero although
> a vga card is present.  Lets additionally check cols and lines.

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>
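
For anyone skimming, the logic of the refined check restated as standalone
C.  This is a sketch, not vgacon.c: the toy_ functions are invented, the
values are passed in as plain arguments instead of the ORIG_VIDEO_* macros,
and only the four VGA16 modes visible in the quoted hunk are listed (the
real list may be longer).

```c
#include <assert.h>

/* SCREEN_INFO counts as uninitialized only if all three fields are zero. */
static int toy_screen_info_initialized(unsigned int mode,
				       unsigned int lines,
				       unsigned int cols)
{
	return !(mode == 0 && lines == 0 && cols == 0);
}

/* VGA16 modes shown in the hunk; vgacon does not handle these. */
static int toy_is_vga16_mode(unsigned int mode)
{
	return mode == 0x0D ||	/* 320x200/4 */
	       mode == 0x0E ||	/* 640x200/4 */
	       mode == 0x10 ||	/* 640x350/4 */
	       mode == 0x12;	/* 640x480/4 */
}

/* Combined decision, as far as the quoted hunk goes. */
static int toy_vgacon_usable(unsigned int mode, unsigned int lines,
			     unsigned int cols)
{
	if (!toy_screen_info_initialized(mode, lines, cols))
		return 0;
	return !toy_is_vga16_mode(mode);
}
```

The case the patch cares about is mode 0 with non-zero lines and cols: the
old test rejected it, this one lets it through.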

>
> Updates f82af20e1a028e16b9bb11da081fa1148d40fa6a, should go
> into 2.6.22.
>
> please apply,
>   Gerd
>
> Refine SCREEN_INFO sanity check for vgacon initialization.
>
> Checking video mode field only to see whenever SCREEN_INFO is
> initialized is not enougth, in some cases it is zero although
> a vga card is present.  Lets additionally check cols and lines.
>
> Signed-off-by: Gerd Hoffmann <[EMAIL PROTECTED]>
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Alan <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Eric W. Biederman <[EMAIL PROTECTED]>
> ---
>  drivers/video/console/vgacon.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> ===
> --- vanilla-2.6.21-git11.orig/drivers/video/console/vgacon.c
> +++ vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> @@ -368,9 +368,14 @@ static const char *vgacon_startup(void)
>  #endif
>   }
>  
> + /* SCREEN_INFO initialized? */
> + if ((ORIG_VIDEO_MODE  == 0) &&
> + (ORIG_VIDEO_LINES == 0) &&
> + (ORIG_VIDEO_COLS  == 0))
> + goto no_vga;
> +
>   /* VGA16 modes are not handled by VGACON */
> - if ((ORIG_VIDEO_MODE == 0x00) || /* SCREEN_INFO not initialized */
> - (ORIG_VIDEO_MODE == 0x0D) ||/* 320x200/4 */
> + if ((ORIG_VIDEO_MODE == 0x0D) ||/* 320x200/4 */
>   (ORIG_VIDEO_MODE == 0x0E) ||/* 640x200/4 */
>   (ORIG_VIDEO_MODE == 0x10) ||/* 640x350/4 */
>   (ORIG_VIDEO_MODE == 0x12) ||/* 640x480/4 */

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Oleg Nesterov

On 05/11, Linus Torvalds wrote:
> 
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> > 
> > things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
> 
> ->mm is not stable *regardless*!
> 
> Trivial examples:
>  - kernel thread does execve()
>  - user thread does exit().

Yes sure. Quoting myself,
>
>  true->false means daemonize() or do_exit(), seems harmless.
>
>  false->true means exec from kernel space. That is why FREEZER_KERNEL_THREADS
>  in fact means all tasks, not only kernel threads.
>

> The use "use_mm()" and "unuse_mm()" things are total red herrings.
> 
> If the freezer depends on the difference between user and kernel threads, 
> then THAT PATCH IS BUGGY. It's that simple.

This is another story, I can't comment because I am not educated enough.

However, in my opinion THAT PATCH has nothing to do with this problem.
It just improves the code that we already have.

> > However, the return value == 0 does not change in that particular case,
> > exactly because is_user_space() takes task_lock().
> 
> As does exit_mm() etc.

Note the "in that particular case".

> See? The locking was pointless. Exactly because you release the lock 
> before the user can actually do anything about the return value!

Yes. See the "Quoting myself" above.

> Anyway, I think the whole freezer thing is broken. There's no reason to 
> freeze kernel threads. 

It is not perfect. Rafael tries to improve it.

Do we need freezer? Should we freeze kernel threads? I can't judge. I tried
to read a long thread about suspend, and failed to understand it.

I personally think we can simplify things if CPU-hotplug uses the freezer, at least.

Oleg.


Re: SLUB under lguest on i386

2007-05-11 Thread Christoph Lameter

On Fri, 11 May 2007, Oliver Xymoron wrote:

> And no sign of further progress. SLAB worked fine.

Add slub_debug to the command line. Any changes or any additional 
diagnostic output?



Re: x86 setup rewrite tree ready for flamage^W review

2007-05-11 Thread H. Peter Anvin

Kevin Winchester wrote:
> Not sure if you were looking for testing, but I fuzzed it to apply to
> 2.6.21-git and gave it a spin.  Worked just like a normal boot (which I
> assume was the point).

That would be the point, yes :)  Looking for breakage in video mode
detection, memory detection, and APM are probably the trickiest areas.

-hpa

Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>

2007-05-11 Thread Christoph Lameter

On Fri, 11 May 2007, Andrew Morton wrote:

> However I think we've done enough slab work for 2.6.22 now so I'm inclined
> to queue these changes for 2.6.23.  That would mean that the slab changes in
> -mm have a dependency on the sh git tree which I am sure to forget about.
> If I end up merging these changes before Paul merges his tree, sh will
> break.  Presumably Paul will notice this ;)

Ok. Only mm is fine for what I have planned. I want to add a 
kmem_cache_ops structure for 2.6.23. Maybe I can use the now useless dtor 
field of kmem_cache_create for this?

2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311

2007-05-11 Thread Matt Mackall

This just hit:

[7.856000] usbcore: registered new interface driver usbhid
[7.86] BUG: at kernel/mutex.c:311 __mutex_trylock_slowpath()
[7.868000]  [] show_trace_log_lvl+0x1a/0x30
[7.872000]  [] show_trace+0x12/0x14
[7.876000]  [] dump_stack+0x15/0x17
[7.88]  [] mutex_trylock+0x56/0x15a
[7.888000]  [] hdaps_mousedev_poll+0x10/0xcb
[7.892000]  [] run_timer_softirq+0x10e/0x16f
[7.896000]  [] __do_softirq+0x5d/0xc0
[7.90]  [] do_softirq+0x6e/0xf0
[7.904000]  [] irq_exit+0x3e/0x7b
[7.912000]  [] do_IRQ+0x9d/0xb2
[7.916000]  [] common_interrupt+0x2e/0x34
[7.92]  [] printk+0x1b/0x1d
[7.924000]  [] usb_register_driver+0xa0/0xe5
[7.928000]  [] hid_init+0x28/0x51
[7.932000]  [] kernel_init+0xbc/0x23e
[7.94]  [] kernel_thread_helper+0x7/0x10
[7.944000]  ===
[7.948000] drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver

Looks like it's triggered by the HDAPS driver.

-- 
Mathematics is the supreme nostalgia of our time.

Re: [announce] Intel announces the PowerTOP utility for Linux

2007-05-11 Thread Matt Mackall

On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
> 
> What's eating the battery life of my laptop? Why isn't it many more 
> hours? Which software component causes the most power to be burned? 
> These are important questions without a good answer... until now.

I get:

No detailed statistics available; please enable the CONFIG_TIMER_STATS
kernel option

with:

$ zgrep STATS /proc/config.gz 
# CONFIG_TASKSTATS is not set
# CONFIG_SCHEDSTATS is not set
CONFIG_TIMER_STATS=y

-- 
Mathematics is the supreme nostalgia of our time.

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Rafael J. Wysocki

On Saturday, 12 May 2007 01:25, Andrew Morton wrote:
> On Sat, 12 May 2007 01:22:06 +0200
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
> 
> > On Saturday, 12 May 2007 00:56, Linus Torvalds wrote:
> > > 
> > > On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> > > > 
> > > > For user space processes this condition is always true.
> > > > 
> > > > For kernel threads:
> > > > (1) the change of tsk->mm from NULL to a nonzero value is only made in
> > > > fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> > > > the task_lock(),
> > > > (2) the change of tsk->mm from a nonzero value to NULL is only made in
> > > > fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> > > > under the task_lock().
> > > > Therefore, by taking the task_lock() here we make sure that the condition
> > > > is always false when we check it for kernel threads.
> > > 
> > > Why *test* it then and return anything?
> > > 
> > > Why not just do a "task_lock(p); task_unlock(p);" with no return value? 
> > > 
> > > As it is, it sounds like either the code is buggy, or it's pointless.
> > 
> > I'm not sure what you mean.
> > 
> > We use this function (ie. kernel/power/process.c:is_user_space()) to
> > distinguish kernel threads from user space processes.  Therefore we make it
> > > always return true for user space processes and always return false for kernel
> > > threads.  In the latter case we need to use the task_lock() to ensure that the
> > result is as desired (ie. false), because otherwise it might be racing with
> > either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().
> > 
> 
> ah, OK.
> 
> static void use_mm(struct mm_struct *mm)
> {
>   struct mm_struct *active_mm;
>   struct task_struct *tsk = current;
> 
>   task_lock(tsk);
>   tsk->flags |= PF_BORROWED_MM;
>   active_mm = tsk->active_mm;
>   atomic_inc(&mm->mm_count);
>   tsk->mm = mm;
>   tsk->active_mm = mm;
>   /*
>* Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise
>* it won't work. Update it accordingly if you change it here
>*/
>   switch_mm(active_mm, mm, tsk);
>   task_unlock(tsk);
> 
> So is_user_space() requires that the state of p->mm and p->flags be
> consistent: it doesn't want to be looking at those two things in that
> three-statement window above.
> 
> Good changelogging and commenting save quite a bit of time and email.

Very true.

I have added a comment to the patch, so that we remember why the task_lock()
is there.  Please replace the original patch with this one (unless you think it's
worse ;-)).

---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

The reading of PF_BORROWED_MM in is_user_space() without task_lock() is racy. 
Fix it.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/process.c |   14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/power/process.c
===
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -8,6 +8,7 @@
 
 #undef DEBUG
 
+#include 
 #include 
 #include 
 #include 
@@ -88,7 +89,18 @@ static void cancel_freezing(struct task_
 
 static inline int is_user_space(struct task_struct *p)
 {
-   return p->mm && !(p->flags & PF_BORROWED_MM);
+   int ret;
+
+   /*
+* task_lock() is acquired to avoid evaluating the condition while the
+* state of p->mm and p->flags is not consistent, which may happen,
+* for example, if this function is executed in parallel with
+* fs/aio.c:unuse_mm()
+*/
+   task_lock(p);
+   ret = p->mm && !(p->flags & PF_BORROWED_MM);
+   task_unlock(p);
+   return ret;
 }
 
 static unsigned int try_to_freeze_tasks(int freeze_user_space)


Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Linus Torvalds

On Sat, 12 May 2007, Oleg Nesterov wrote:
> 
> things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.

->mm is not stable *regardless*!

Trivial examples:
 - kernel thread does execve()
 - user thread does exit().

The "use_mm()" and "unuse_mm()" things are total red herrings.

If the freezer depends on the difference between user and kernel threads, 
then THAT PATCH IS BUGGY. It's that simple. It tests something that simply 
isn't stable outside the lock, and then returns that value after having 
unlocked it.

It might as well return a random number.

> However, the return value == 0 does not change in that particular case,
> exactly because is_user_space() takes task_lock().

As does exit_mm() etc.

That's NOT THE POINT. You cannot use the end result after releasing the 
task lock, because the moment you release the task lock, it becomes 
totally irrelevant, and may not be true any more.

Example (a):
 - you ask "is_user_space(p)", it returns 1.
 - before you actually have time to do anything about it, the task exits, 
   and (since you don't hold the lock any more) will now have a NULL 
   tsk->mm again (and would now return 0 if you called it again).

Example (b):
 - you ask "is_user_space(p)" and it returns 0, because it's a kernel 
   thread
 - before you actually do anything about it (but after you released the 
   task lock), the kernel thread does an "execve(/sbin/hotplug)" and is no 
   longer a kernel thread.

In both cases will the caller have a return value THAT IS NO LONGER TRUE.

See? The locking was pointless. Exactly because you release the lock 
before the user can actually do anything about the return value!

The fact that the locking protects against the very specific case of AIO 
where the threads _stay_ user tasks and don't really change is pretty much 
irrelevant, as far as I can see. 
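
Examples (a) and (b) can be sketched in user space. This is only a minimal
model, assuming a toy task structure guarded by a pthread mutex in place of
task_lock(); every toy_* name is hypothetical and none of this is kernel
code. It shows a locked read whose result goes stale as soon as the lock is
dropped, next to a variant that tests and acts inside one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; all names are illustrative. */
struct toy_task {
	pthread_mutex_t lock;	/* plays the role of task_lock() */
	void *mm;		/* NULL models a kernel thread */
	int marked;		/* set by toy_mark() when an action ran */
};

/* Locked read: the answer is only guaranteed while the lock is held. */
static int toy_is_user_space(struct toy_task *p)
{
	int ret;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	ret = p->mm != NULL;
	pthread_mutex_unlock(&p->lock);
	return ret;		/* may already be stale for the caller */
}

/* Models the task dropping its mm on exit, under the same lock. */
static void toy_exit_mm(struct toy_task *p)
{
	pthread_mutex_lock(&p->lock);
	p->mm = NULL;
	pthread_mutex_unlock(&p->lock);
}

/* Example action applied to a task that is still a user task. */
static void toy_mark(struct toy_task *p)
{
	p->marked = 1;
}

/*
 * Test and act inside one critical section, so the state cannot change
 * between the check and the action. Returns 1 if the action ran.
 */
static int toy_act_if_user_space(struct toy_task *p,
				 void (*act)(struct toy_task *))
{
	int acted = 0;

	pthread_mutex_lock(&p->lock);
	if (p->mm != NULL) {
		act(p);
		acted = 1;
	}
	pthread_mutex_unlock(&p->lock);
	return acted;
}
```

If a caller saves the result of toy_is_user_space() and toy_exit_mm() runs
afterwards, the saved value still says "user task" although the task no
longer has an mm. toy_act_if_user_space() does not have that gap, at the
cost of holding the lock across the action.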

Anyway, I think the whole freezer thing is broken. There's no reason to 
freeze kernel threads. 

If you want to freeze user processes, go ahead. But then you need a lock 
to make sure that new processes don't *become* user processes (ie you need 
to disable hotplug). 

And if you want to protect against cpufreq, do so. But don't try to say 
that you want to freeze all kernel threads. Just protect against cpufreq 
threads. Don't make all the other threads that have *zero* interest in 
freezing have to worry about it.

Linus

Re: [PATCH] spelling fixes: arch/powerpc/

2007-05-11 Thread Paul Mackerras

Simon Arlott writes:

> Spelling fixes in arch/powerpc/.

> - /* Retreive CPU related informations from the flat tree
> + /* Retreive CPU related information from the flat tree
   ^^
You missed one. :)

> - /* Clear the freeze bit, and reenable the interrupt.
> + /* Clear the freeze bit, and re-enable the interrupt.

reenable -> re-enable is a bit marginal, but OK.

>   /* Ok, now let's get cracking. You may ask me why I just didn't match
>* the iic host from the iic OF node, but that way I'm still compatible
> -  * with really really old old firmwares for which we don't have a node
> +  * with really really old old firmware for which we don't have a node

I think "firmwares" here was meaning to imply there were more than one
instance or release of firmware which had the property described, and
your change loses that.

> -  * differenciate them all and since that hack was there for a long
> +  * differentiate them all and since that hack was there for a long

I haven't been too strict about the Franglais in the past, but OK. :)

Paul.

Re: [patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread David Miller

From: Mark Glines <[EMAIL PROTECTED]>
Date: Fri, 11 May 2007 17:01:35 -0700

> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has.  So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768.  I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.

All ports above and including 1024 are non-privileged and available to
anyone.

Applications which have some requirements in this area need to work
those things out themselves.

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Oleg Nesterov

I hope Rafael will correct me if I am wrong,

On 05/12, Oleg Nesterov wrote:
>
> On 05/11, Linus Torvalds wrote:
> >
> > On Sat, 12 May 2007, Oleg Nesterov wrote:
> > > 
> > > without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.
> > 
> > Let me explain it one more time:
> >  - shouldn't the *caller* protect this?
> >
> > Afaik, there's two situations:
> >  - either things don't change (in which case you don't need locking at 
> >all, since things are statically one way or the other)
> >  - or things change (in which case the caller can't rely on the return 
> >value anyway, since they might change *after* you release the lock)
> 
> things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
> 
> However, the return value == 0 does not change in that particular case,
> exactly because is_user_space() takes task_lock().

Probably there is some misunderstanding. This patch doesn't claim it solves
all problems. Before this patch we have

static inline int is_user_space(struct task_struct *p)
{
return p->mm && !(p->flags & PF_BORROWED_MM);
}

and this is clearly racy wrt use_mm(), which sets this PF_BORROWED_MM bit.
So this is just a little improvement, nothing more.

Oleg.


[patch] ip_local_port_range sysctl has annoying default

2007-05-11 Thread Mark Glines

On a powerpc machine (kurobox) I have here with 128M of RAM, the default
value of /proc/sys/net/ipv4/ip_local_port_range is:
2048    4999

This setting affects the port assigned to an application by default
when the application doesn't specify a port to use, like, for instance,
an outgoing connection.  It affects both TCP and UDP.  The default
values for this sysctl vary depending on the size of the tcp bind hash,
which in turn, varies depending on the size of the system RAM (I think).

By a one-in-a-million coincidence, this machine has a default port
range starting with 2048, and this breaks things for me.  I'm trying to
run both klive and nfs on this box, but klive starts first (probably
because of the filename sort order), and claims UDP port 2049 for its
own purposes, causing the nfs server to fail to start.

If the bind hash size is over a certain threshold, the range
32768-61000 is used.  If it is under a certain threshold, a range
like (1024|2048|3072)-4999 is used, depending on exactly how small it
is.  This box happened to get the 2048-4999 range, which broke nfs.
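
The selection described above can be sketched as a small stand-alone
function. This is a rough model, not the kernel code: "order" stands for the
bind hash allocation order computed in tcp_init(), the struct and function
names are hypothetical, and the 1024-4999 fallback for order == 3 is an
assumption taken from the "(1024|2048|3072)-4999" description rather than
from the diff.

```c
#include <assert.h>

/* Hypothetical container for the two sysctl_local_port_range[] values. */
struct port_range {
	int low;
	int high;
};

/*
 * Default range before the patch, as described in the message.
 * Assumption: the range starts out as 1024-4999 and is only adjusted
 * for large (order >= 4) or small (order < 3) bind hashes.
 */
static struct port_range old_default_range(int order)
{
	struct port_range r = { 1024, 4999 };

	assert(order >= 0);
	if (order >= 4) {
		r.low = 32768;
		r.high = 61000;
	} else if (order < 3) {
		r.low = 1024 * (3 - order);
	}
	return r;
}
```

With order == 1 this gives 2048-4999, the range reported above, and UDP port
2049 falls inside it.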

A comment just above the code that does this says, "Try to be a bit
smarter and adjust defaults depending on available memory."  "smarter"?
Maybe, maybe not.  Either way, it's unexpected.

Following the principle of least astonishment, I think it seems better
to use high, out-of-the-way port numbers regardless of how much RAM the
system has.  So, the following patch changes this behavior slightly.
The system still picks a dynamic range depending on the bind hash size,
but now, all ranges start with 32768.  I suppose another reasonable way
to do this would be to end all ranges with 61000, or something like
that.

It also seems funny to me that this would be in tcp_init(), when it
affects both TCP and UDP.  But hey, it is where it is.

Signed-off-by: Mark Glines <[EMAIL PROTECTED]>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..4431b87 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2464,14 +2464,14 @@ void __init tcp_init(void)
(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
order++)
;
+   sysctl_local_port_range[0] = 32768;
if (order >= 4) {
-   sysctl_local_port_range[0] = 32768;
sysctl_local_port_range[1] = 61000;
tcp_death_row.sysctl_max_tw_buckets = 18;
sysctl_tcp_max_orphans = 4096 << (order - 4);
sysctl_max_syn_backlog = 1024;
} else if (order < 3) {
-   sysctl_local_port_range[0] = 1024 * (3 - order);
+   sysctl_local_port_range[1] = 32768 + (1024 * order);
tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
sysctl_tcp_max_orphans >>= (3 - order);
sysctl_max_syn_backlog = 128;

Re: x86 setup rewrite tree ready for flamage^W review

2007-05-11 Thread Kevin Winchester

H. Peter Anvin wrote:
> Hello all,
>
> I believe the x86 setup tree is now finished.  I will turn it into a
> "clean patchset" later this week, but I wanted to get flamed^W feedback
> on it first.
>
> The git tree is at:
>
> http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=summary
> git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-newsetup.git
> ...
>
> ... and a flat patch at ...
>
> http://www.kernel.org/pub/linux/kernel/people/hpa/newsetup-36f021b5.patch
>
>   -hpa
Not sure if you were looking for testing, but I fuzzed it to apply to
2.6.21-git and gave it a spin.  Worked just like a normal boot (which I
assume was the point).

[0.00] Linux version 2.6.21-g0a3fd051-dirty ([EMAIL PROTECTED]) (gcc
version 4.1.2 (Gentoo 4.1.2)) #9 PREEMPT Fri May 11 20:50:02 ADT 2007
[0.00] Command line: root=/dev/sda3
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f800 (usable)
[0.00]  BIOS-e820: 0009f800 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 1fef (usable)
[0.00]  BIOS-e820: 1fef - 1fef3000 (ACPI NVS)
[0.00]  BIOS-e820: 1fef3000 - 1ff0 (ACPI data)
[0.00]  BIOS-e820: fec0 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 130800) 1 entries of
256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000F77D0, 0014 (r0 VIAK8T)
[0.00] ACPI: RSDT 1FEF3040, 0034 (r1 VIAK8T AWRDACPI 42302E31
AWRD0)
[0.00] ACPI: FACP 1FEF30C0, 0074 (r1 VIAK8T AWRDACPI 42302E31
AWRD0)
[0.00] ACPI: DSDT 1FEF3180, 4F8A (r1 VIAK8T AWRDACPI 1000
MSFT  10E)
[0.00] ACPI: FACS 1FEF, 0040
[0.00] ACPI: BOOT 1FEF8180, 0028 (r1 VIAK8T AWRDACPI 42302E31
AWRD0)
[0.00] ACPI: SSDT 1FEF82C0, 00B5 (r1 PTLTD  POWERNOW1 
LTP1)
[0.00] ACPI: APIC 1FEF8200, 0068 (r1 VIAK8T AWRDACPI 42302E31
AWRD0)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 130800) 1 entries of
256 used
[0.00] Zone PFN ranges:
[0.00]   DMA 0 -> 4096
[0.00]   DMA324096 ->  1048576
[0.00]   Normal1048576 ->  1048576
[0.00] early_node_map[2] active PFN ranges
[0.00] 0:0 ->  159
[0.00] 0:  256 ->   130800
[0.00] On node 0 totalpages: 130703
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 1356 pages reserved
[0.00]   DMA zone: 2587 pages, LIFO batch:0
[0.00]   DMA32 zone: 1732 pages used for memmap
[0.00]   DMA32 zone: 124972 pages, LIFO batch:31
[0.00]   Normal zone: 0 pages used for memmap
[0.00] ACPI: PM-Timer IO Port: 0x4008
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
[0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[0.00] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] ACPI: IRQ9 used by override.
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Allocating PCI resources starting at 2000 (gap:
1ff0:ded0)
[0.00] Built 1 zonelists.  Total pages: 127559
[0.00] Kernel command line: root=/dev/sda3
[0.00] Initializing CPU#0
[0.00] PID hash table entries: 2048 (order: 11, 16384 bytes)
[   13.327158] time.c: Detected 1838.853 MHz processor.
[   13.328328] Console: colour VGA+ 80x25
[   13.331709] Dentry cache hash table entries: 65536 (order: 7, 524288
bytes)
[   13.332048] Inode-cache hash table entries: 32768 (order: 6, 262144
bytes)
[   13.332193] Checking aperture...
[   13.332264] CPU 0: aperture @ e000 size 128 MB
[   13.337619] Memory: 509000k/523200k available (3246k kernel code,
13476k reserved, 1225k d

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Rafael J. Wysocki

On Saturday, 12 May 2007 01:29, Linus Torvalds wrote:
> 
> On Sat, 12 May 2007, Rafael J. Wysocki wrote:
> >
> > We use this function (ie. kernel/power/process.c:is_user_space()) to
> > distinguish kernel threads from user space processes.  Therefore we make it
> > always return true for user space processes and always return false for kernel
> > threads.  In the latter case we need to use the task_lock() to ensure that the
> > result is as desired (ie. false), because otherwise it might be racing with
> > either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().
> 
> But there is no race protection in the *caller*, so if it can ever return 
> one or the other, what protects it from changing once the caller returns?
> 
> And if the value can change (because some thread uses "use_mm()"), then 
> the caller cannot rely on the value that got returned.

The value cannot change because of that.  There only is a small window inside
unuse_mm() (or use_mm()) in which the value may be wrong.  Namely:

static void unuse_mm(struct mm_struct *mm)
{
struct task_struct *tsk = current;

task_lock(tsk);
tsk->flags &= ~PF_BORROWED_MM;
---
--- If is_user_space() without the task_lock() is called right here, it will
--- return 'true', although it should return 'false'.
---
tsk->mm = NULL;
/* active_mm is still 'mm' */
enter_lazy_tlb(mm, tsk);
task_unlock(tsk);
}

IOW, quoting Andrew, "is_user_space() requires that the state of p->mm and
p->flags be consistent".
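
The window can be reproduced without any threads by splitting the update
into its two stores and evaluating the condition in between. A minimal
sketch, assuming a toy structure and a made-up flag value in place of
task_struct and PF_BORROWED_MM (all toy_* and TOY_* names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BORROWED_MM 0x1u	/* made-up value standing in for PF_BORROWED_MM */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct toy_task {
	void *mm;
	unsigned int flags;
};

/* The is_user_space() condition, evaluated with no lock held. */
static int toy_check(const struct toy_task *p)
{
	assert(p != NULL);
	return p->mm != NULL && !(p->flags & TOY_BORROWED_MM);
}

/* First store of unuse_mm(): the flag is cleared while mm is still set. */
static void toy_unuse_step1(struct toy_task *p)
{
	p->flags &= ~TOY_BORROWED_MM;
}

/* Second store of unuse_mm(): mm goes back to NULL. */
static void toy_unuse_step2(struct toy_task *p)
{
	p->mm = NULL;
}
```

For a kernel thread that has borrowed an mm, toy_check() returns 0 before
toy_unuse_step1() and 0 after toy_unuse_step2(), but 1 between the two. A
reader that takes the same lock as the writer can never run between the two
steps, so it never sees that intermediate answer.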

> So you might as well not return any value at all, since the returned value 
> is apparently meaningless once the lock has been released.

No, it is not meaningless.

Rafael

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Oleg Nesterov

On 05/11, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> > 
> > without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.
> 
> Let me explain it one more time:
>  - shouldn't the *caller* protect this?
>
> Afaik, there's two situations:
>  - either things don't change (in which case you don't need locking at 
>all, since things are statically one way or the other)
>  - or things change (in which case the caller can't rely on the return 
>value anyway, since they might change *after* you release the lock)

things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.

However, the return value == 0 does not change in that particular case,
exactly because is_user_space() takes task_lock().

Oleg.


Re: [PATCH][RESEND] PIE randomization

2007-05-11 Thread Jiri Kosina

On Fri, 11 May 2007, Andrew Morton wrote:

> I could reverse-engineer that info from the patch, I guess, but I'd 
> prefer to go in the opposite direction: you tell us what the patch is 
> trying to do, then we look at it and see if we agree that it is in fact 
> doing that.

I've just quickly looked at the patch and it seems fine - it's using 
mmap()'s randomization functionality in such a way that it maps the 
main executable of (specially compiled/linked) ET_DYN binaries onto a 
random address (in cases in which mmap() is allowed to perform a 
randomization). Which is what we want, I'd guess.

Jan, would you care to update the patch with proper Changelog entry?

However, I seem to get "soft" hang on boot with this patch, approximately 
at the time the init should be executed. The system is not completely 
stuck - interrupts are delivered, keyboard is working, alt-sysrq-t dumps 
proper output, but userspace doesn't seem to get started. This happens on 
i386, didn't try on other archs.

-- 
Jiri Kosina

Re: [PATCH] Use boot based time for process start time and boot time in /proc

2007-05-11 Thread Tomas Janousek

Hi,

On Fri, May 11, 2007 at 12:51:32PM -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 10:45:31 +0200
> Tomas Janousek <[EMAIL PROTECTED]> wrote:
> 
> > Hello,
> > 
> > On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> > > Tomas Janousek <[EMAIL PROTECTED]> wrote:
> > > > @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> > > > unsigned long jif;
> > > > cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> > > > u64 sum = 0;
> > > > +   struct timespec boottime;
> > > >  
> > > > user = nice = system = idle = iowait =
> > > > irq = softirq = steal = cputime64_zero;
> > > > -   jif = - wall_to_monotonic.tv_sec;
> > > > -   if (wall_to_monotonic.tv_nsec)
> > > > -   --jif;
> > > > +   getboottime(&boottime);
> > > > +   jif = boottime.tv_sec;
> > > > +   if (boottime.tv_nsec)
> > > > +   ++jif;
> > > >
> > > 

> > getboottime(&boottime);
> > jif = boottime.tv_sec;
> > -   if (boottime.tv_nsec)
> > -   ++jif;
> >  

> So we've gone from --jif to ++jif to no change at all.
> 
> Are you sure that this net removal of --jif is correct?

Yes.

Let's say wall_to_monotonic = { -10, 50 }  (which is { -9, -50 }, and
the original code would result in - (- 10) - 1 == 9).

getboottime() calls set_normalized_timespec on { - (-10), - (50) }, which
results in { 10 - 1, - 50 + 100 } = { 9, 50 }.
tv_sec == 9  =>  correct.
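
The same arithmetic can be checked with a small stand-alone sketch. It
assumes a nanosecond part of 500000000 (an illustrative value chosen here,
not one taken from the message), uses hypothetical toy_* names, and models
only the negate-and-normalize step, ignoring anything else getboottime()
may add.

```c
#include <assert.h>

#define TOY_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct toy_timespec {
	long tv_sec;
	long tv_nsec;
};

/* Bring tv_nsec into [0, TOY_NSEC_PER_SEC), adjusting tv_sec to match. */
static struct toy_timespec toy_normalize(long sec, long nsec)
{
	struct toy_timespec ts;

	while (nsec >= TOY_NSEC_PER_SEC) {
		nsec -= TOY_NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += TOY_NSEC_PER_SEC;
		--sec;
	}
	ts.tv_sec = sec;
	ts.tv_nsec = nsec;
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < TOY_NSEC_PER_SEC);
	return ts;
}

/* Old show_stat() logic: negate the seconds, round down by hand. */
static long toy_old_boot_sec(struct toy_timespec wall_to_monotonic)
{
	long jif = -wall_to_monotonic.tv_sec;

	if (wall_to_monotonic.tv_nsec)
		--jif;
	return jif;
}

/* New logic: negate both fields, normalize, take tv_sec unchanged. */
static long toy_new_boot_sec(struct toy_timespec wall_to_monotonic)
{
	return toy_normalize(-wall_to_monotonic.tv_sec,
			     -wall_to_monotonic.tv_nsec).tv_sec;
}
```

For wall_to_monotonic = { -10, 500000000 } both functions return 9, and for
{ -10, 0 } both return 10. Normalization already folds the rounding into
tv_sec, so neither --jif nor ++jif is needed on top of it.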

-- 
TJ. (Brno, CZ), BaseOS, Red Hat

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Linus Torvalds



On Sat, 12 May 2007, Oleg Nesterov wrote:
> 
> without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.

Let me explain it one more time:
 - shouldn't the *caller* protect this?

Afaik, there's two situations:
 - either things don't change (in which case you don't need locking at 
   all, since things are statically one way or the other)
 - or things change (in which case the caller can't rely on the return 
   value anyway, since they might change *after* you release the lock)
ie what's up? Is there a third case?

Linus

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Linus Torvalds

On Sat, 12 May 2007, Rafael J. Wysocki wrote:
>
> We use this function (ie. kernel/power/process.c:is_user_space()) to
> distinguish kernel threads from user space processes.  Therefore we make it
> always return true for user space processes and always return false for kernel
> threads.  In the latter case we need to use the task_lock() to ensure that the
> result is as desired (ie. false), because otherwise it might be racing with
> either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().

But there is no race protection in the *caller*, so if it can ever return 
one or the other, what protects it from changing once the caller returns?

And if the value can change (because some thread uses "use_mm()"), then 
the caller cannot rely on the value that got returned.

So you might as well not return any value at all, since the returned value 
is apparently meaningless once the lock has been released.

In other words: "The lock, it does nothing".

Linus

Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Andrew Morton

On Sat, 12 May 2007 01:22:06 +0200
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> On Saturday, 12 May 2007 00:56, Linus Torvalds wrote:
> > 
> > On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> > > 
> > > For user space processes this condition is always true.
> > > 
> > > For kernel threads:
> > > (1) the change of tsk->mm from NULL to a nonzero value is only made in
> > > fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> > > the task_lock(),
> > > (2) the change of tsk->mm from a nonzero value to NULL is only made in
> > > fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> > > under the task_lock().
> > > Therefore, by taking the task_lock() here we make sure that the condition
> > > is alyways false when we check it for kernel threads.
> > 
> > Why *test* it then and return anything?
> > 
> > Why not just doa "task_lock(p); task_unlock(p);" with no return value? 
> > 
> > As it is, it sounds like either the code is buggy, or it's pointless.
> 
> I'm not sure what you mean.
> 
> We use this function (ie. kernel/power/process.c:is_user_space()) to
> distinguish kernel threads from user space processes.  Therefore we make it
> always return true for user space processes and always return false for kernel
> threads.  In the latter case we need to use the task_lock() to ensure that the
> result is as desired (ie. false), because otherwise it might be racing with
> either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().
> 

ah, OK.

static void use_mm(struct mm_struct *mm)
{
struct mm_struct *active_mm;
struct task_struct *tsk = current;

task_lock(tsk);
tsk->flags |= PF_BORROWED_MM;
active_mm = tsk->active_mm;
atomic_inc(&mm->mm_count);
tsk->mm = mm;
tsk->active_mm = mm;
/*
 * Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise
 * it won't work. Update it accordingly if you change it here
 */
switch_mm(active_mm, mm, tsk);
task_unlock(tsk);

So is_user_space() requires that the state of p->mm and p->flags be
consistent: it doesn't want to be looking at those two things in that
three-statement window above.
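
The consistency requirement can be illustrated with a small user-space
sketch.  It is not the kernel code: a pthread mutex stands in for
task_lock(), "struct task" is a hypothetical two-field stand-in for
task_struct, the flag value is arbitrary, and the condition tested in
is_user_space() is assumed to be "mm is set and PF_BORROWED_MM is clear".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_BORROWED_MM 0x1u   /* arbitrary value, for illustration only */

/* Hypothetical stand-in for the two task_struct fields discussed here. */
struct task {
    pthread_mutex_t lock;     /* plays the role of task_lock() */
    void *mm;
    unsigned int flags;
};

/* A kernel thread borrows an mm: flag and pointer change under one lock. */
static void use_mm(struct task *t, void *mm)
{
    assert(mm != NULL);
    pthread_mutex_lock(&t->lock);
    t->flags |= PF_BORROWED_MM;
    t->mm = mm;
    pthread_mutex_unlock(&t->lock);
}

/* The borrow ends: both fields are reset under the same lock. */
static void unuse_mm(struct task *t)
{
    pthread_mutex_lock(&t->lock);
    t->flags &= ~PF_BORROWED_MM;
    t->mm = NULL;
    pthread_mutex_unlock(&t->lock);
}

/* Reads both fields under the lock, so it never sees a half-done update. */
static bool is_user_space(struct task *t)
{
    bool ret;

    pthread_mutex_lock(&t->lock);
    ret = t->mm != NULL && !(t->flags & PF_BORROWED_MM);
    pthread_mutex_unlock(&t->lock);
    return ret;
}
```

In this sketch a kernel thread yields false before, during and after the
borrow, so the answer does not go stale once the lock is dropped; the lock
only keeps the reader out of the window in which one field has changed and
the other has not.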

Good changelogging and commenting save quite a bit of time and email.


Re: [PATCH] module_author: don't advice putting in an email address

2007-05-11 Thread Krzysztof Halasa

Rene Herman <[EMAIL PROTECTED]> writes:

> /* Author, ideally of form NAME [, NAME ]*[ and NAME ]
>
> After my trivial patch, it says:
>
> /* Author, ideally of form NAME[, NAME]*[ and NAME] */

I think I would put something like this:

/* Author, of form NAME[, NAME]*[ and NAME]
 * If you have a permanent email address and are prepared for
   maintaining/supporting the module, you may want to provide
   the address as well */

The wording isn't the best I suppose.

I.e., the change would mean providing the address is not strictly
required and the person should think when adding it, that's all.
-- 
Krzysztof Halasa

Re: [PATCH] swsusp: Use platform mode by default

2007-05-11 Thread Pavel Machek

Hi!

> > Just to clarify, the change in question isn't new.  It was introduced by the
> > commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
> > request and with Pavel's acceptance.
> 
> Ok, if it's that old, we might as well leave it in. Clearly there weren't many 
> regressions, and this isn't a case of other monsters lurking behind a lack 
> of testers.

Ok, so what is the result?

"platform" is the correct default, because that is what the spec says.

Both were default in recent history, and neither is too horrible. So
I'd prefer "platform" to be default, as it is correct.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH] libata: add human-readable error value decoding

2007-05-11 Thread Jeff Garzik


Robert Hancock wrote:
> The ATA ones are more of a pain in that regard than SCSI though - SCSI 
> has all distinct error codes for different errors, whereas ATA has 
> bitmasks for everything..


That should not affect implementation.  Either way, a table-driven 
approach can easily work.


I favor decoding the SError status bits, but your names were far too 
long.  "ProtocolErr" should be "Proto".  "10B8BErr" should be "10b8b". 
HostInternalErr to HostInt.  PHYInternalErr to PHYInt.  etc.
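
A table-driven decoder along those lines might look like the sketch below.
It is only an illustration: struct serr_name, serr_names[] and serr_decode()
are hypothetical names, and the bit positions in the table are placeholders
that would have to be checked against the SERR_* definitions in libata.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct serr_name {
    uint32_t bit;
    const char *name;
};

/* Placeholder bit positions; the short names follow the ones suggested above. */
static const struct serr_name serr_names[] = {
    { 1u << 10, "Proto" },
    { 1u << 11, "HostInt" },
    { 1u << 17, "PHYInt" },
    { 1u << 19, "10b8b" },
    { 1u << 25, "UnrecogFIS" },
    { 1u << 26, "DevExchanged" },
};

/*
 * Write the space-separated names of all set bits into buf.
 * Returns the number of characters written (excluding the NUL).
 * A name that does not fit is dropped whole rather than cut short.
 */
static size_t serr_decode(uint32_t serror, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(serr_names) / sizeof(serr_names[0]); i++) {
        int n;

        if (!(serror & serr_names[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", serr_names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* discard the partial name */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Since each table row is a mask rather than a distinct code, the same loop
handles ATA-style bitmasks; adding or renaming an error only touches the
table.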


Jeff



Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Oleg Nesterov

On 05/11, Linus Torvalds wrote:
>
> On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> > 
> > Therefore, by taking the task_lock() here we make sure that the condition
> > is always false when we check it for kernel threads.
> 
> Why *test* it then and return anything?
> 
> Why not just do a "task_lock(p); task_unlock(p);" with no return value? 

because we should not freeze a kernel thread at FREEZER_USER_SPACE stage?

without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.

Oleg.


Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Rafael J. Wysocki

On Saturday, 12 May 2007 00:56, Linus Torvalds wrote:
> 
> On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> > 
> > For user space processes this condition is always true.
> > 
> > For kernel threads:
> > (1) the change of tsk->mm from NULL to a nonzero value is only made in
> > fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> > the task_lock(),
> > (2) the change of tsk->mm from a nonzero value to NULL is only made in
> > fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> > under the task_lock().
> > Therefore, by taking the task_lock() here we make sure that the condition
> > is always false when we check it for kernel threads.
> 
> Why *test* it then and return anything?
> 
> Why not just do a "task_lock(p); task_unlock(p);" with no return value? 
> 
> As it is, it sounds like either the code is buggy, or it's pointless.

I'm not sure what you mean.

We use this function (ie. kernel/power/process.c:is_user_space()) to
distinguish kernel threads from user space processes.  Therefore we make it
always return true for user space processes and always return false for kernel
threads.  In the latter case we need to use the task_lock() to ensure that the
result is as desired (ie. false), because otherwise it might be racing with
either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().

Rafael

Re: [PATCH] libata: add human-readable error value decoding

2007-05-11 Thread Robert Hancock


Tejun Heo wrote:

Chuck Ebbert wrote:

Robert Hancock wrote:

+  ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "",
+  ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "",
+  ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" );

I'm not really convinced whether this is necessary.  The human readable
form is also a bit cryptic and can get quite long.  So, mild NACK from
me.


It certainly seems useful when debugging hotplug issues or random SATA
problems which end up being caused by communication problems. Without
this output, Joe User stands no chance of figuring out what's going on,
and neither does Joe libata Developer unless they really care to dig
through the spec and count bits to figure out what they mean. At least
with this you can see that there was a CRC error, etc. and go from that..


Why not just document the error messages?

And the scsi ones too, I can't seem to find what the sense codes mean.


They are well documented elsewhere - the standard documents.  For sense
codes, t10.org.  For SError bits, t13.org.  You can get drafts free of
charge.


The ATA ones are more of a pain in that regard than SCSI though - SCSI 
has all distinct error codes for different errors, whereas ATA has 
bitmasks for everything..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


[announce] Intel announces the PowerTOP utility for Linux

2007-05-11 Thread Arjan van de Ven



What's eating the battery life of my laptop? Why isn't it many more 
hours? Which software component causes the most power to be burned? 
These are important questions without a good answer... until now.


The Linux 2.6.21 kernel introduces the so-called tickless-idle 
feature. This feature allows the processor to be really idle for long 
periods of time, rather than having to wake up every millisecond for 
the timer tick. Current processors save a lot of power if they are 
idle for long periods, which translates into a longer battery life for 
your laptop, or a lower energy bill for your datacenter. However, a 
Linux system consists of more software than just the kernel, and there 
are many tunables involved. It's not easy to see what is going on, and 
as a result the behavior is sometimes far from optimal, and a lot of 
power is wasted.


Intel is proud to announce the PowerTOP tool 
(http://www.linuxpowertop.org), a program that collects the various 
pieces of information from your system and presents an overview of how 
well your laptop is doing in terms of power savings. In addition, 
PowerTOP will provide an indication of which tunables and software 
components are the biggest offenders in slurping up your battery time. 
PowerTOP will update its display frequently so that you can directly 
see the impact of any changes you are making.


A typical Linux distribution has many components that wake the 
processor up frequently for no good reason. In our testing with 
PowerTOP, we have seen many cases where with some simple fixes, the 
battery life of typical laptops was increased by one hour or more!


We are providing fixes for several of the issues we identified, and we 
encourage the Linux community to help us in this quest to get the 
maximum battery life out of your (hopefully Intel based) laptops. Try 
the PowerTOP tool, join the mailing list or the IRC channel and 
provide feedback, problem reports or fixes!


Website:  http://www.linuxpowertop.org
IRC:  irc.oftc.net#powertop channel
Mailing list: http://www.bughost.org/mailman/listinfo/power

-mm git tree

2007-05-11 Thread J. Bruce Fields

The git tree at

git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git

could be set up in a simpler way:

$ git ls-remote git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
fc4b5be9e651d3e71b54541e0315fc82211b42b5  refs/heads/option_export
59a1fe35614c3c937a4e8cb6e4a45f1d05544d9d  refs/heads/v2.6.13-mm1
e3602088f81f66655ec6c62320d5c56839ffc02b  refs/heads/v2.6.13-mm2
...
05230bd16821e2ec80321d72e97e7a2b1a07c6f2  refs/tags/master
...
5e1302f173f63c5c57c5de8b44152c30ae2a72c4  refs/tags/v2.6.13-mm1
59a1fe35614c3c937a4e8cb6e4a45f1d05544d9d  refs/tags/v2.6.13-mm1^{}
a06c5a7b36cfb30345a9476cbaff02955483c4ca  refs/tags/v2.6.13-mm2
e3602088f81f66655ec6c62320d5c56839ffc02b  refs/tags/v2.6.13-mm2^{}
...

Would it be possible to remove the branches that exist for each
individual version, and to change the "master" tag to a branch?

Since git gives tag names priority over head names, fetching the above
tag makes "master" refer to it instead of any local branch named
"master".

(I get particularly bizarre behavior with current git; after:

git remote add mm git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
git fetch mm

when I check out "master", it sets HEAD to refs/heads/master, but the
index and working tree to refs/tags/master.)

I think it may have been set up this way with the idea that a branch
should only ever move "forward" in history, whereas tags could move
around freely.

But that's not really right--for something like -mm that's continually
rewritten and rebased, it makes sense to have a "master" branch that
skips around.  The default git-remote setup on recent git is prepared to
deal with this.

And having a repository with 101 branches and counting, none of which
ever change, is awkward--if nothing else it makes the output of
"git-branch -r" a little hard to read.

--b.

[patch] x86_64: use signalfd and timerfd compat syscalls

2007-05-11 Thread Heiko Carstens

From: Heiko Carstens <[EMAIL PROTECTED]>

Looks like these two are wired up in a wrong way.

Cc: Davide Libenzi <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---
 arch/x86_64/ia32/ia32entry.S |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -716,7 +716,7 @@ ia32_sys_call_table:
.quad sys_getcpu
.quad sys_epoll_pwait
.quad compat_sys_utimensat  /* 320 */
-   .quad sys_signalfd
-   .quad sys_timerfd
+   .quad compat_sys_signalfd
+   .quad compat_sys_timerfd
.quad sys_eventfd
-ia32_syscall_end:  
+ia32_syscall_end:

[patch] compat signalfd and timerfd are cond syscalls.

2007-05-11 Thread Heiko Carstens

From: Heiko Carstens <[EMAIL PROTECTED]>

Add missing cond_syscall statements for compat_sys_signalfd
and compat_sys_timerfd.

Cc: Davide Libenzi <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---

Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -145,4 +145,6 @@ cond_syscall(sys_ioprio_get);
 /* New file descriptors */
 cond_syscall(sys_signalfd);
 cond_syscall(sys_timerfd);
+cond_syscall(compat_sys_signalfd);
+cond_syscall(compat_sys_timerfd);
 cond_syscall(sys_eventfd);

Re: [PATCH 1/2] scalable rw_mutex

2007-05-11 Thread Oleg Nesterov

On 05/11, Peter Zijlstra wrote:
> 
> +static inline int __rw_mutex_read_trylock(struct rw_mutex *rw_mutex)
> +{
> + preempt_disable();
> + if (likely(!__rw_mutex_reader_slow(rw_mutex))) {

--- WINDOW ---

> + percpu_counter_mod(&rw_mutex->readers, 1);
> + preempt_enable();
> + return 1;
> + }
> + preempt_enable();
> + return 0;
> +}
>
> [...snip...]
>
> +void rw_mutex_write_lock_nested(struct rw_mutex *rw_mutex, int subclass)
> +{
> [...snip...]
> +
> + /*
> +  * block new readers
> +  */
> + __rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_SLOW);
> + /*
> +  * wait for all readers to go away
> +  */
> + wait_event(rw_mutex->wait_queue,
> + (percpu_counter_sum(&rw_mutex->readers) == 0));
> +}

This looks a bit suspicious, can't mutex_write_lock() set RW_MUTEX_READER_SLOW
and find percpu_counter_sum() == 0 in that WINDOW above?
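
The interleaving in question can be replayed deterministically in user
space.  The sketch below is a single-threaded model, not the patch itself:
struct rw_sim and the helper names are hypothetical, a plain int replaces
the per-cpu counter, and the steps are simply executed in the suspicious
order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag and one plain counter. */
struct rw_sim {
    bool reader_slow;   /* models RW_MUTEX_READER_SLOW */
    int readers;        /* models the summed per-cpu reader count */
};

/* Reader fast path, step 1: sample the status flag. */
static bool reader_check_fast(const struct rw_sim *m)
{
    return !m->reader_slow;
}

/* Reader fast path, step 2: account for the reader. */
static void reader_count_inc(struct rw_sim *m)
{
    m->readers++;
}

/* Writer: block new readers, then check whether any reader is counted. */
static bool writer_sees_no_readers(struct rw_sim *m)
{
    m->reader_slow = true;
    return m->readers == 0;
}

/* Returns true if both sides end up believing they hold the lock. */
static bool replay_window(void)
{
    struct rw_sim m = { false, 0 };
    bool reader_ok, writer_ok;

    reader_ok = reader_check_fast(&m);       /* reader passes the check   */
    writer_ok = writer_sees_no_readers(&m);  /* writer runs in the WINDOW */
    if (reader_ok)
        reader_count_inc(&m);                /* reader proceeds anyway    */

    assert(m.readers == 1);                  /* counted, but too late     */
    return reader_ok && writer_ok;
}
```

In this model replay_window() returns true, i.e. the reader and the writer
are both inside; whether the real code is exposed depends on what else
orders the flag check against the counter update.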


> +void rw_mutex_read_unlock(struct rw_mutex *rw_mutex)
> +{
> + rwsem_release(&rw_mutex->dep_map, 1, _RET_IP_);
> +
> + percpu_counter_mod(&rw_mutex->readers, -1);
> + if (unlikely(__rw_mutex_reader_slow(rw_mutex)) &&
> + percpu_counter_sum(&rw_mutex->readers) == 0)
> + wake_up_all(&rw_mutex->wait_queue);
> +}

The same. __rw_mutex_status_set()->wmb() in rw_mutex_write_lock below
is not enough. percpu_counter_mod() doesn't take fbc->lock if < FBC_BATCH,
so we don't have a proper serialization.

write_lock() sets RW_MUTEX_READER_SLOW, finds percpu_counter_sum() != 0,
and sleeps. rw_mutex_read_unlock() decrements cpu-local var, does not
see RW_MUTEX_READER_SLOW and skips wake_up_all().


> +void rw_mutex_write_lock_nested(struct rw_mutex *rw_mutex, int subclass)
> +{
> + might_sleep();
> + rwsem_acquire(&rw_mutex->dep_map, subclass, 0, _RET_IP_);
> +
> + mutex_lock_nested(&rw_mutex->write_mutex, subclass);
> + mutex_lock_nested(&rw_mutex->read_mutex, subclass);
> +
> + /*
> +  * block new readers
> +  */
> + __rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_SLOW);
> + /*
> +  * wait for all readers to go away
> +  */
> + wait_event(rw_mutex->wait_queue,
> + (percpu_counter_sum(&rw_mutex->readers) == 0));
> +}
> +
> +void rw_mutex_write_unlock(struct rw_mutex *rw_mutex)
> +{
> + int waiters;
> +
> + rwsem_release(&rw_mutex->dep_map, 1, _RET_IP_);
> +
> + /*
> +  * let the readers rip
> +  */
> + __rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_FAST);
> + waiters = atomic_read(&rw_mutex->read_waiters);
> + mutex_unlock(&rw_mutex->read_mutex);
> + /*
> +  * wait for at least 1 reader to get through
> +  */
> + if (waiters) {
> + wait_event(rw_mutex->wait_queue,
> + (atomic_read(&rw_mutex->read_waiters) < waiters));
> + }
> + /*
> +  * before we let the writers rip
> +  */
> + mutex_unlock(&rw_mutex->write_mutex);
> +}

Looks like we can have only one task on rw_mutex->wait_queue, and it holds
->write_mutex. Can't we use just a "task_struct *write_waiter" instead of
->wait_queue ? This makes rw_mutex smaller.

Oleg.


Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way

2007-05-11 Thread Linus Torvalds



On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> 
> For user space processes this condition is always true.
> 
> For kernel threads:
> (1) the change of tsk->mm from NULL to a nonzero value is only made in
> fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> the task_lock(),
> (2) the change of tsk->mm from a nonzero value to NULL is only made in
> fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> under the task_lock().
> Therefore, by taking the task_lock() here we make sure that the condition
> is always false when we check it for kernel threads.

Why *test* it then and return anything?

Why not just do a "task_lock(p); task_unlock(p);" with no return value? 

As it is, it sounds like either the code is buggy, or it's pointless.

Linus

Re: [patch 6/7] Add common orderly_poweroff()

2007-05-11 Thread Randy Dunlap

On Thu, 10 May 2007 16:57:14 -0700 Jeremy Fitzhardinge wrote:

> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2208,3 +2208,61 @@ asmlinkage long sys_getcpu(unsigned __us
> +
> +/**
> + * Trigger an orderly system poweroff

* orderly_poweroff - Trigger an orderly system poweroff

> + * @force: force poweroff if command execution fails
> + *
> + * This may be called from any context to trigger a system shutdown.
> + * If the orderly shutdown fails, it will force an immediate shutdown.
> + */
> +int orderly_poweroff(bool force)
> +{
> + int argc;
> + char **argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
> + static char *envp[] = {
> + "HOME=/",
> + "PATH=/sbin:/bin:/usr/sbin:/usr/bin",
> + NULL
> + };
> + int ret = -ENOMEM;
> + struct subprocess_info *info;
> +
> + if (argv == NULL) {
> + printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
> +__func__, poweroff_cmd);
> + goto out;
> + }
> +
> + info = call_usermodehelper_setup(argv[0], argv, envp);
> + if (info == NULL) {
> + argv_free(argv);
> + goto out;
> + }
> +
> + call_usermodehelper_setcleanup(info, argv_cleanup);
> +
> + ret = call_usermodehelper_exec(info, -1);
> +
> +  out:
> + if (ret && force) {
> + printk(KERN_WARNING "Failed to start orderly shutdown: "
> +"forcing the issue\n");
> +
> + /* I guess this should try to kick off some daemon to
> +sync and poweroff asap.  Or not even bother syncing
> +if we're doing an emergency shutdown? */
> + emergency_sync();
> + kernel_power_off();
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(orderly_poweroff);


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [patch 4/7] add argv_split()

2007-05-11 Thread Randy Dunlap

On Thu, 10 May 2007 16:57:12 -0700 Jeremy Fitzhardinge wrote:

> --- /dev/null
> +++ b/lib/argv_split.c
> @@ -0,0 +1,159 @@
> +
> +/**
> + * argv_free - free an argv
> + *

extra "blank" line.

> + * @argv - the argument vector to be freed
> + *
> + * Frees an argv and the strings it points to.
> + */
> +void argv_free(char **argv)
> +{
> + char **p;
> + for (p = argv; *p; p++)
> + kfree(*p);
> +
> + kfree(argv);
> +}
> +EXPORT_SYMBOL(argv_free);

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [bisect] NFS regression breaks X

2007-05-11 Thread Jeff Garzik



ACK -- this regression was fixed by Trond's recent NFS bugfix push upstream.

Jeff




Re: [PATCH] swsusp: Use platform mode by default

2007-05-11 Thread Linus Torvalds



On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> 
> Just to clarify, the change in question isn't new.  It was introduced by the
> commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
> request and with Pavel's acceptance.

Ok, if it's that old, we might as well leave it in. Clearly there weren't many 
regressions, and this isn't a case of other monsters lurking behind a lack 
of testers.

Linus

Re: [PATCH][RESEND] PIE randomization

2007-05-11 Thread Ulrich Drepper


On 5/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

> erm, I was being funny.  If you randomize a binary it won't run any more.
> cp /dev/random /bin/login.  Oh well.
>
> My point is, we're not being told what is being randomized here.  Is it the
> virtual starting address of the main executable mmap?  Of the shared
> libraries also?  Is it the stack location?  What?


PIE = Position Independent Executable, that's how I named them.

These are not regular executables, they are basically DSOs but usually
compiled with -fpie/-fPIE instead of -fpic/-fPIC and linked with -pie
instead of -shared to allow the compiler and linker to perform more
optimizations.

See section 5 in

 http://people.redhat.com/drepper/nonselsec.pdf

Jan unfortunately cited Ingo's document, which doesn't really explain it.

NFS spews warnings on x86-64

2007-05-11 Thread Jeff Garzik


Current git, on Fedora 6/x86-64:

fs/nfs/read.c: In function ‘nfs_return_empty_page’:
fs/nfs/read.c:82: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)

fs/nfs/read.c: In function ‘nfs_readpage_truncate_uninitialised_page’:
fs/nfs/read.c:106: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)
fs/nfs/read.c:109: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)

fs/nfs/read.c: In function ‘nfs_readpage_async’:
fs/nfs/read.c:133: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)

fs/nfs/read.c: In function ‘readpage_async_filler’:
fs/nfs/read.c:535: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)

fs/nfs/write.c: In function ‘nfs_mark_uptodate’:
fs/nfs/write.c:171: warning: ‘memclear_highpage_flush’ is deprecated 
(declared at include/linux/highmem.h:115)

fs/nfs/nfs4xdr.c: In function ‘decode_close’:
fs/nfs/nfs4xdr.c:2900: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_lock’:
fs/nfs/nfs4xdr.c:3189: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_locku’:
fs/nfs/nfs4xdr.c:3212: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_open’:
fs/nfs/nfs4xdr.c:3278: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_open_confirm’:
fs/nfs/nfs4xdr.c:3305: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_open_downgrade’:
fs/nfs/nfs4xdr.c:3318: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’

fs/nfs/nfs4xdr.c: In function ‘decode_setclientid’:
fs/nfs/nfs4xdr.c:3593: warning: format ‘%u’ expects type ‘unsigned int’, 
but argument 4 has type ‘long unsigned int’


Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

2007-05-11 Thread Jeff Garzik


Tejun Heo wrote:

> +   if (class == ATA_DEV_ATA)
> +   class = ATA_DEV_ATAPI;
> +   else
> +   class = ATA_DEV_ATA;



the 'else' branch is obviously redundant


Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2

2007-05-11 Thread Jeff Garzik


Tejun Heo wrote:

> It seems the world isn't as frank as we thought and some devices lie
> about who they are.  Fallback to the other IDENTIFY if IDENTIFY is
> aborted by the device.  As this is the strategy used by IDE for a long
> time, it shouldn't cause too much problem.
>
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> Cc: William Thompson <[EMAIL PROTECTED]>
> ---
> Updated to fallback iff IDENTIFY is aborted by the device.


applied



[patch 0/2] [PATCH] input: correctly handle keys without hardware release event

2007-05-11 Thread Giel de Nijs

Hi,

This patch adds a soft release key mask to input_dev, to enable keyboard
drivers to determine which keys never generate a hardware release event and
hence add a release event after every press event of such keys. The mask is
controlled by ioctls.

The Fn+F? key combinations of Dell Latitude series laptops (and possibly other
Dells or other brands) only generate a key press event and never a key release
event, which is most probably a hardware flaw (or feature?). Due to this flaw,
combinations like Fn+F1 for hibernate and Fn+F3 for showing battery status
cannot be used. Ubuntu has probably fixed this by patching the X input layer
and HAL, but other distributions (like Debian) cannot use these keys. This
patch adds a generic method to signal if keys with certain scancodes never
generate release events, so the keyboard driver can add those events right
after a key press event.
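
The idea can be sketched in plain C outside the kernel.  Everything below is
hypothetical and only meant to show the mechanism: struct soft_kbd, the
helper names and the table size are made up for this example and are not
the interface added by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SCANCODES 512u   /* hypothetical table size */
#define WORD_BITS    (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS   ((NR_SCANCODES + WORD_BITS - 1) / WORD_BITS)

struct key_event {
    unsigned int code;
    int value;              /* 1 = press, 0 = release */
};

/* One bit per scancode: set if the key never sends a hardware release. */
struct soft_kbd {
    unsigned long softrel[MASK_WORDS];
};

static void soft_release_set(struct soft_kbd *k, unsigned int code)
{
    if (code < NR_SCANCODES)
        k->softrel[code / WORD_BITS] |= 1ul << (code % WORD_BITS);
}

static bool soft_release_test(const struct soft_kbd *k, unsigned int code)
{
    return code < NR_SCANCODES &&
           ((k->softrel[code / WORD_BITS] >> (code % WORD_BITS)) & 1ul);
}

/*
 * Emit the press event and, for keys flagged in the mask, a release
 * event right behind it.  Returns the number of events written to out.
 */
static size_t handle_press(const struct soft_kbd *k, unsigned int code,
                           struct key_event out[2])
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = (struct key_event){ code, 1 };
    if (soft_release_test(k, code))
        out[n++] = (struct key_event){ code, 0 };
    return n;
}
```

A tool in the style of setkeycodes would then only need to flip bits in the
mask for the affected scancodes; keys that do send a release are left
untouched.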

The ioctls used to read and write to this bitmask might be used in a program
like setkeycodes, which is normally used to map certain scancodes to keycodes.
With a command line option, this program could also set the soft release bit
for a certain scancode if desired. Patches for setkeycodes and getkeycodes
against the Debian console-tools can be found at
http://giel.operation0.org/keyboard-soft-release

This patch also uses the infrastructure for generating release events for
KEY_HANGEUL and KEY_HANJA, something which was already done in atkbd.c.

See also this thread: http://thread.gmane.org/gmane.linux.kernel/401378

Greetings,
Giel
-- 

