Re: [patch 3/3] clockevents: Fix resume logic - updated version
On Fri, 11 May 2007 23:09:15 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > > > > > > hm, Fedora don't seem to want to give me an RPM which contains acpidump > > > and > > > all the yum servers are featuring scrogged checksums. I could build it, I > > > guess, but there's a principle involved ;) > > > > > > http://userweb.kernel.org/~akpm/dsdt is /proc/acpi/dsdt. Is that OK? > > > > Yes, thanks. > > Hmm, have you tried to do 'echo shutdown > /sys/power/disk' before the > hibernation? That didn't change the behaviour. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ALPHA: MARVEL - check for allocated memory
This patch adds checking for allocated memory which is used to hold AGP info. Also some whitespace cleanup. Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]> --- arch/alpha/kernel/core_marvel.c | 137 --- 1 files changed, 71 insertions(+), 66 deletions(-) diff --git a/arch/alpha/kernel/core_marvel.c b/arch/alpha/kernel/core_marvel.c index 7f6a984..9f6d1a2 100644 --- a/arch/alpha/kernel/core_marvel.c +++ b/arch/alpha/kernel/core_marvel.c @@ -29,7 +29,7 @@ #include "proto.h" #include "pci_impl.h" - + /* * Debug helpers */ @@ -41,13 +41,13 @@ # define DBG_CFG(args) #endif - + /* * Private data */ static struct io7 *io7_head = NULL; - + /* * Helper functions */ @@ -79,7 +79,7 @@ mk_resource_name(int pe, int port, char *str) { char tmp[80]; char *name; - + sprintf(tmp, "PCI %s PE %d PORT %d", str, pe, port); name = alloc_bootmem(strlen(tmp) + 1); strcpy(name, tmp); @@ -130,19 +130,19 @@ alloc_io7(unsigned int pe) * Insert in pe sorted order. */ if (NULL == io7_head) /* empty list */ - io7_head = io7; + io7_head = io7; else if (io7_head->pe > io7->pe) { /* insert at head */ io7->next = io7_head; io7_head = io7; } else {/* insert at position */ for (insp = io7_head; insp; insp = insp->next) { if (insp->pe == io7->pe) { - printk(KERN_ERR "Too many IO7s at PE %d\n", + printk(KERN_ERR "Too many IO7s at PE %d\n", io7->pe); return NULL; } - if (NULL == insp->next || + if (NULL == insp->next || insp->next->pe > io7->pe) { /* insert here */ io7->next = insp->next; insp->next = io7; @@ -157,7 +157,7 @@ alloc_io7(unsigned int pe) io7_head = io7; } } - + return io7; } @@ -191,7 +191,7 @@ io7_clear_errors(struct io7 *io7) p7csrs->PO7_CRRCT_SYM.csr = -1UL; } - + /* * IO7 PCI, PCI/X, AGP configuration. 
*/ @@ -206,11 +206,11 @@ io7_init_hose(struct io7 *io7, int port) int i; hose->index = hose_index++; /* arbitrary */ - + /* * We don't have an isa or legacy hose, but glibc expects to be * able to use the bus == 0 / dev == 0 form of the iobase syscall -* to determine information about the i/o system. Since XFree86 +* to determine information about the i/o system. Since XFree86 * relies on glibc's determination to tell whether or not to use * sparse access, we need to point the pci_isa_hose at a real hose * so at least that determination is correct. @@ -249,10 +249,10 @@ io7_init_hose(struct io7 *io7, int port) hose->mem_space->flags = IORESOURCE_MEM; if (request_resource(&ioport_resource, hose->io_space) < 0) - printk(KERN_ERR "Failed to request IO on hose %d\n", + printk(KERN_ERR "Failed to request IO on hose %d\n", hose->index); if (request_resource(&iomem_resource, hose->mem_space) < 0) - printk(KERN_ERR "Failed to request MEM on hose %d\n", + printk(KERN_ERR "Failed to request MEM on hose %d\n", hose->index); /* @@ -284,7 +284,7 @@ io7_init_hose(struct io7 *io7, int port) hose->sg_isa = iommu_arena_new_node(marvel_cpuid_to_nid(io7->pe), hose, 0x0080, 0x0080, 0); hose->sg_isa->align_entry = 8; /* cache line boundary */ - csrs->POx_WBASE[0].csr = + csrs->POx_WBASE[0].csr = hose->sg_isa->dma_base | wbase_m_ena | wbase_m_sg; csrs->POx_WMASK[0].csr = (hose->sg_isa->size - 1) & wbase_m_addr; csrs->POx_TBASE[0].csr = virt_to_phys(hose->sg_isa->ptes); @@ -302,7 +302,7 @@ io7_init_hose(struct io7 *io7, int port) hose->sg_pci = iommu_arena_new_node(marvel_cpuid_to_nid(io7->pe), hose, 0xc000, 0x4000, 0); hose->sg_pci->align_entry = 8; /* cache line boundary */ - csrs->POx_WBASE[2].csr = + csrs->POx_WBASE[2].csr = hose->sg_pci->dma_base | wbase_m_ena | wbase_m_sg; csrs->POx_WMASK[2].csr = (hose->sg_pci->size - 1) & wbase_m_addr; csrs->POx_TBASE[2].csr = virt_to_phys(hose->sg_pci->ptes); @@ -357,7 +357,7 @@ marvel_io7_present(gct6_node *node) int pe; if (node->type != 
GCT_TYPE_HOSE || - node->subtype != GCT_SUBTYPE_IO_PORT_MODULE) + node->subtype != GCT_SUBT
[PATCH] ALPHA: TITAN - check for allocated memory
This patch adds checking for allocated memory which is used to hold AGP info. Also some whitespace cleanup. Signed-off-by: Cyrill Gorcunov <[EMAIL PROTECTED]> --- arch/alpha/kernel/core_titan.c | 99 +--- 1 files changed, 52 insertions(+), 47 deletions(-) diff --git a/arch/alpha/kernel/core_titan.c b/arch/alpha/kernel/core_titan.c index 3662fef..419dbc8 100644 --- a/arch/alpha/kernel/core_titan.c +++ b/arch/alpha/kernel/core_titan.c @@ -46,7 +46,7 @@ struct # define DBG_CFG(args) #endif - + /* * Routines to access TIG registers. */ @@ -56,21 +56,21 @@ mk_tig_addr(int offset) return (volatile unsigned long *)(TITAN_TIG_SPACE + (offset << 6)); } -static inline u8 +static inline u8 titan_read_tig(int offset, u8 value) { volatile unsigned long *tig_addr = mk_tig_addr(offset); return (u8)(*tig_addr & 0xff); } -static inline void +static inline void titan_write_tig(int offset, u8 value) { volatile unsigned long *tig_addr = mk_tig_addr(offset); *tig_addr = (unsigned long)value; } - + /* * Given a bus, device, and function number, compute resulting * configuration space address @@ -84,7 +84,7 @@ titan_write_tig(int offset, u8 value) * * Type 1: * - * 3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1 1 1 1|1 1 + * 3 3|3 3 2 2|2 2 2 2|2 2 2 2|1 1 1 1|1 1 1 1|1 1 * 3 2|1 0 9 8|7 6 5 4|3 2 1 0|9 8 7 6|5 4 3 2|1 0 9 8|7 6 5 4|3 2 1 0 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * | | | | | | | | | | |B|B|B|B|B|B|B|B|D|D|D|D|D|F|F|F|R|R|R|R|R|R|0|1| @@ -95,11 +95,11 @@ titan_write_tig(int offset, u8 value) * 15:11 Device number (5 bits) * 10:8function number * 7:2register number - * + * * Notes: - * The function number selects which function of a multi-function device + * The function number selects which function of a multi-function device * (e.g., SCSI and Ethernet). - * + * * The register selects a DWORD (32 bit) register offset. Hence it * doesn't get shifted by 2 bits as we want to "drop" the bottom two * bits. 
@@ -123,7 +123,7 @@ mk_conf_addr(struct pci_bus *pbus, unsigned int device_fn, int where, addr = (bus << 16) | (device_fn << 8) | where; addr |= hose->config_space_base; - + *pci_addr = addr; DBG_CFG(("mk_conf_addr: returning pci_addr 0x%lx\n", addr)); return 0; @@ -154,7 +154,7 @@ titan_read_config(struct pci_bus *bus, unsigned int devfn, int where, return PCIBIOS_SUCCESSFUL; } -static int +static int titan_write_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value) { @@ -185,17 +185,17 @@ titan_write_config(struct pci_bus *bus, unsigned int devfn, int where, return PCIBIOS_SUCCESSFUL; } -struct pci_ops titan_pci_ops = +struct pci_ops titan_pci_ops = { .read = titan_read_config, .write =titan_write_config, }; - + void titan_pci_tbi(struct pci_controller *hose, dma_addr_t start, dma_addr_t end) { - titan_pachip *pachip = + titan_pachip *pachip = (hose->index & 1) ? TITAN_pachip1 : TITAN_pachip0; titan_pachip_port *port; volatile unsigned long *csr; @@ -203,11 +203,11 @@ titan_pci_tbi(struct pci_controller *hose, dma_addr_t start, dma_addr_t end) /* Get the right hose. */ port = &pachip->g_port; - if (hose->index & 2) + if (hose->index & 2) port = &pachip->a_port; /* We can invalidate up to 8 tlb entries in a go. The flush - matches against <31:16> in the pci address. + matches against <31:16> in the pci address. Note that gtlbi* and atlbi* are in the same place in the g_port and a_port, respectively, so the g_port offset can be used even if hose is an a_port */ @@ -215,7 +215,7 @@ titan_pci_tbi(struct pci_controller *hose, dma_addr_t start, dma_addr_t end) if (((start ^ end) & 0x) == 0) csr = &port->port_specific.g.gtlbiv.csr; - /* For TBIA, it doesn't matter what value we write. For TBI, + /* For TBIA, it doesn't matter what value we write. For TBI, it's the shifted tag bits. 
*/ value = (start & 0x) >> 12; @@ -249,11 +249,11 @@ titan_init_one_pachip_port(titan_pachip_port *port, int index) hose->mem_space = alloc_resource(); /* -* This is for userland consumption. The 40-bit PIO bias that we -* use in the kernel through KSEG doesn't work in the page table +* This is for userland consumption. The 40-bit PIO bias that we +* use in the kernel through KSEG doesn't work in the page table * based user mappings. (43-bit KSEG sign extends the physical * address from bit 40 to hit the I/O bit - mapped addresses don't). -* So make s
Re: FUTEX_CMP_REQUEUE_PI is not quite there
Andrew Morton wrote:
> Well yup. We're kind of waiting for someone to reply to
> http://lkml.org/lkml/2007/5/7/129

Seems to be the same or at least related.

One comment about my first mail: this is the correct code for condvars, despite what I wrote before. I wasn't thinking clearly. The internal futex is a normal futex. It is the job of the CMP_REQUEUE_PI call to figure this out, select the waiter with the highest priority, and boost the priority if necessary based on the target futex, which always is a PI futex.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [RFC PATCH] kbuild: silence section mismatch warnings
On Fri, May 11, 2007 at 04:22:28PM -0500, Kumar Gala wrote:
>
> On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote:
>
> >- Forwarded message from Sam Ravnborg <[EMAIL PROTECTED]> -
> >
> >Forgot lkml in first mail...
> >
> >   Sam
> >
> >Subject: [RFC PATCH] kbuild: silence section mismatch warnings
> >From: Sam Ravnborg <[EMAIL PROTECTED]>
> >Date: Fri, 11 May 2007 23:03:46 +0200
> >User-Agent: Mutt/1.4.2.1i
> >To: Chris Wedgwood <[EMAIL PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>,
> >   "David S. Miller" <[EMAIL PROTECTED]>,
> >   Russell King <[EMAIL PROTECTED]>,
> >   Satyam Sharma <[EMAIL PROTECTED]>
> >Cc: [EMAIL PROTECTED]
> >
> >The following patch allows us to silence section mismatch warnings
> >in specific places.
> >There are a few legitimate places that modpost does not yet recognize
> >where references from .text to .init.text (likewise for data) are
> >legitimate.
> >This allows us to spot the few places and annotate them so we do not
> >get false warnings that in the end will let real warnings pass.
> >
> >The annotation is simple to grep for, so reviewing all uses in a few
> >months' time is trivial. It is assumed that a few places will
> >use this to shut up the warning as a replacement for the real fix.
> >But these cases are easy to spot and to fix up.
>
> It's unclear if you expect that some things will be tagged
> __init_refok/__initdata_refok forever or if we'll find some way to
> fix/change the code so the things tagged no longer need it.

A few places will need the __init_refok tag forever. But as Satyam points out, it will likely be misused. So __init_refok is introduced to stay.

akpm pointed out in private mail that I need to update the linker scripts too, and I am running out of time this weekend, so that will come later.

   Sam
Re: [PATCH] "volatile considered harmful", take 3
Satyam Sharma wrote:
>
> Because volatile is ill-defined? Or actually, *undefined* (well,
> implementation-defined is as good as that)? It's *so* _vague_,
> one doesn't _feel_ like using it at all!
>

Sorry, that's just utter crap. Linux isn't written in some mythical C which only exists in a standards document; it is written in a particular subset of GNU C. "volatile" is well enough defined in that context; it is just frequently misused.

> We already have a complete API containing optimization barriers,
> load/store/full memory barriers. With well-defined and
> well-understood semantics. Just ... _why_ use volatile?

See below.

> It will _always_ work. In fact you can't really say the same for
> volatile. We already assume the compiler _actually_ took some
> pains to stuff meaning into C's (lack of) definition of volatile and
> implement it -- but in what sense, nobody knows (the C standard
> doesn't, so what are we).

It will always work within the context of GNU C.

>> more heavy-handed as it's disabling *all* optimization such as loop
>> invariants across the barrier.
>
> This is a legitimate criticism, I agree.

There you have it.

   -hpa
Re: FUTEX_CMP_REQUEUE_PI is not quite there
On Fri, 11 May 2007 23:10:47 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote:

> I hooked up FUTEX_CMP_REQUEUE_PI here and got a kernel crash.

Well yup. We're kind of waiting for someone to reply to http://lkml.org/lkml/2007/5/7/129
Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux
On Fri, 11 May 2007 23:55:37 +0200 Rodolfo Giometti <[EMAIL PROTECTED]> wrote:

> Hello,
>
> here is my new patch with a lot of fixes.
>
> The only issue still not fixed is the one related to:
>
>    #define NETLINK_PPSAPI 20
>
> I need time to resolve it.
>
> My comments follow, then the patch; I hope I can now come back into
> the -mm tree again! :)

Well, I suppose I could toss it in there for a bit of review-and-test. But I'll need to drop it again because we do need to split this patch into a series of patches, please. You should do this earlier rather than later because it improves reviewability.

> > - This:
> >
> >    static void pps_class_release(struct class_device *cdev)
> >    {
> >            /* Nop??? */
> >    }
> >
> >   is a bug and it earns you a nastygram from Greg. These objects must be
> >   dynamically allocated - this is not optional.
>
> Would it be acceptable to define this function as void?

No, it needs to be a proper release function, like all the other ones around the place.

This comes up again and again and again and I recently asked Greg to direct me to (or to write) suitable documentation, and I think he did, but I lost it. Greg, can you remind us please?

> >   We have a bunch of code in random other drivers which is dependent upon
> >   CONFIG_PPS_CLIENT_foo. The problem is that if a kernel was compiled with
> >   CONFIG_PPS_CLIENT_foo=n and then the pps driver is later built for that
> >   kernel, it won't actually work because lp, serial etc weren't correctly
> >   configured when _they_ were built.
> >
> >   This sort of cross-module coupling is considered to be a bad thing, but
> >   I'm not really sure it's all that important.
> >
> > - Please split the patch up into a series of patches: one for pps core and
> >   one for each of the clients (servers?): one for lp, one for serial, etc.
> >
> >   Try to arrange for that series of patches to build and run at each stage
> >   of application.
> >
> >   Please don't lose my changes when you do so ;)
> >
> >   Please review the changes I made and a) stick to the same style and b) fix
> >   up any sites which I missed.
> >
> > - Please remove all the typedefs:
> >
> >    +typedef struct ntp_fp {
> >    +typedef union pps_timeu {
> >    +typedef struct pps_info {
> >    +typedef struct pps_params {
> >
> >   and just use `struct ntp_fp' everywhere.
>
> Those typedefs are defined in the PPS specification (please see RFC 2783).

We don't use typedefs in-kernel. Please convert the code to use `struct ntp_fp' everywhere. For RFC compatibility to userspace you can do

    #ifndef __KERNEL__
    typedef struct ntp_fp ntp_fp_t;
    ...
    #endif

> > - The above four structures are communicated with userspace, yes?
> >
> >   I believe that they will not work correctly when 32-bit userspace is
> >   communicating with a 64-bit kernel. Alignments change and sizeof(long)
> >   changes.
> >
> >   You don't want to have to write compat code. I suggest that you redo
> >   those structures in terms of __u32, __u64, etc. You probably need to use
> >   __attribute__((packed)) too, not sure.
> >
> >   Then let's get that part carefully reviewed (Arnd Bergmann <[EMAIL PROTECTED]>
> >   is my go-to guru on this) and please test it carefully.
> >
> >   Yeah, you just haven't got a chance that something as huge and as complex
> >   as struct pps_netlink_msg will survive the 32->64 transition.
>
> The same as above. These structures are fixed by RFC 2783.

Your answer has no relationship to my question.

The problem here is that under a 64-bit kernel we require that applications which use this structure definition work correctly when they are compiled to generate 32-bit code and when they are compiled to generate 64-bit code.

Furthermore we should aim to have the code work correctly across different versions of the compiler, and when different compiler options are used, and when altogether different compilers are used.

It is not clear to me that your definition is sufficiently defensive against _any_ of these things.

> > - Please ensure that `make headers_check' passes OK (you'll hear from me if
> >   it doesn't ;))
>
> Done.
>
> > - Can we get rid of the private dbg, err and info macros? Surely there are
> >   generic ones somewhere.
>
> They are very useful to LinuxPPS users who can enable/disable them by
> configuration menu.

You misunderstand. I'm not saying "remove the callsites". I'm saying "remove the definitions". Because we already have things like pr_debug() and pr_info(), so new code should use those rather than reinventing them.

Plus, we already have at least 52 different implementations of "dbg" in the tree and your 53rd one didn't compile because it clashed with someone else's. This is the compiler sending us a message: "use the existing infrastructure". If that infrastructure is insufficient then let's improve it.
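
The 32-bit/64-bit concern raised in this review can be illustrated with a small user-space sketch. It is not the LinuxPPS or RFC 2783 layout: `struct pps_ts_fixed`, `struct pps_ts_native` and their field names are hypothetical, and the `<stdint.h>` types stand in for the kernel's __s64/__s32/__u32. The only point is that fixed-width members with explicit padding give one layout for every ABI, whereas members of type long do not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width timestamp; NOT the RFC 2783 definition. */
struct pps_ts_fixed {
	int64_t  sec;    /* seconds: 64 bits on every ABI */
	int32_t  nsec;   /* nanoseconds: 32 bits on every ABI */
	uint32_t flags;  /* explicit member instead of implicit tail padding */
};

/* For contrast: the size follows sizeof(long), so it depends on the ABI. */
struct pps_ts_native {
	long sec;
	long nsec;
};

/* Compile-time checks: the build fails if the fixed layout ever drifts. */
_Static_assert(sizeof(struct pps_ts_fixed) == 16, "unexpected size");
_Static_assert(offsetof(struct pps_ts_fixed, nsec) == 8, "unexpected offset");
_Static_assert(offsetof(struct pps_ts_fixed, flags) == 12, "unexpected offset");
```

With gcc on x86 Linux, for example, building this with -m32 and then -m64 leaves the fixed structure at 16 bytes in both cases, while the long-based one changes size. That difference is what forces compat code when a 32-bit process talks to a 64-bit kernel.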
Re: [PATCH] "volatile considered harmful", take 3
On 5/12/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> Satyam Sharma wrote:
> >
> >> + - Pointers to data structures in coherent memory which might be modified
> >> +   by I/O devices can, sometimes, legitimately be volatile. A ring buffer
> >> +   used by a network adapter, where that adapter changes pointers to
> >> +   indicate which descriptors have been processed, is an example of this
> >> +   type of situation.
> >
> > is a legitimate use case for volatile is still not clear to me (I agree
> > with Alan's comment in a previous thread that this seems to be a case
> > where a memory barrier would be applicable^Wbetter, actually). I could be
> > wrong here, so would be nice if Peter explains why volatile is legitimate
> > here.
> >
> > Otherwise, it's fine with me.
>
> I don't see why Alan's way is necessarily better;

Because volatile is ill-defined? Or actually, *undefined* (well, implementation-defined is as good as that)? It's *so* _vague_, one doesn't _feel_ like using it at all!

We already have a complete API containing optimization barriers, load/store/full memory barriers. With well-defined and well-understood semantics. Just ... _why_ use volatile?

> it should work but is

It will _always_ work. In fact you can't really say the same for volatile. We already assume the compiler _actually_ took some pains to stuff meaning into C's (lack of) definition of volatile and implement it -- but in what sense, nobody knows (the C standard doesn't, so what are we).

> more heavy-handed as it's disabling *all* optimization such as loop
> invariants across the barrier.

This is a legitimate criticism, I agree.

Thanks,
Satyam
FUTEX_CMP_REQUEUE_PI is not quite there
I hooked up FUTEX_CMP_REQUEUE_PI here and got a kernel crash. No serial console, so this is the output of the screen after the machine stopped. This is of course on x86-64. Compiled from a rawhide-ified upstream kernel from two days ago.

The situation is that we requeue from a non-PI futex to a PI futex. We might now actually want to change the condvar implementation to use a PI futex internally if the mutex in use is PI, too, but this kind of mismatch can still happen.

I can provide binaries if necessary. There is quite a lot of output from the kernel:

BUG: at kernel/futex.c:1665 set_pi_futex_owner()

Call Trace:
 [] futex_lock_pi+0x351/0x685
 [] _spin_lock_irqsave+0x9/0xe
 [] __up_read+0x19/0x7f
 [] default_wake_function+0x0/0xe
 [] do_futex+0xa68/0x10e8
 [] sys_futex+0xee/0x10c
 [] _spin_unlock_irq+0x9/0xc
 [] system_call+0x7e/0x83

BUG: at lib/plist.c:78 plist_add()

Call Trace:
 [] plist_add+0x3a/0x90
 [] futex_lock_pi+0x387/0x685
 [] _spin_lock_irqsave+0x9/0xe
 [] __up_read+0x19/0x7f
 [] default_wake_function+0x0/0xe
 [] do_futex+0xa68/0x10e8
 [] sys_futex+0xee/0x10c
 [] _spin_unlock_irq+0x9/0xc
 [] system_call+0x7e/0x83

BUG: at kernel/futex.c:483 exit_pi_state_list()

Call Trace:
 [] exit_pi_state_list+0xbe/0x11e
 [] do_exit+0x801/0x84e
 [] complete_and_exit+0x0/0x16
 [] system_call+0x7e/0x83

list_add corruption. prev->next should be next (81001dda1cb8), but was 81006c6e06c8. (prev=81006c6e06c8).
[ cut here ]
kernel BUG at lib/list_debug.c:33!
invalid opcode: [1] SMP
CPU 0
Pid: 15097, comm: ld-linux-x86-64 Not tainted 2.6.21-1.3145.fc7 #1
RIP: 0010:[] [] __list_add+0x47/0x5b
RSP: 0018:81003cc01e78 EFLAGS: 00010092
RAX: 0079 RBX: 81001dda1cb8 RCX: fca9
RDX: RSI: 0282 RDI: 80559a50
RBP: 81001dda1cb0 R08: 00a0 R09: 0010
R10: 81000305dd00 R11: R12: 81001dda1c88
R13: 0282 R14: 81006c6e0080 R15: 810075edac78
FS: () GS:8059e000() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 40400eb8 CR3: 1c40f000 CR4: 26e0
Process ld-linux-x86-64 (pid: 15097, threadinfo 81003cc0, task 81006c6e00
Stack: 81006c6e06b0 8030c7a2 81006c6e07b0 810075edac50
 81006c6e06b0 8043ac19 81006c6e06b0 810075edac40
 81006c6e06b0 8070f9f0 81006c6e07b0 81006c6e0080

Call Trace:
 [] plist_del+0x3a/0x70
 [] rt_mutex_slowunlock+0x8c/0x1cd
 [] exit_pi_state_list+0xec/0x11e
 [] do_exit+0x801/0x84e
 [] complete_and_exit+0x0/0x16
 [] system_call+0x7e/0x83

Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a 5a c3
RIP [] __list_add+0x47/0x5b
 RSP

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux
Hello,

here is my new patch with a lot of fixes.

The only issue still not fixed is the one related to:

   #define NETLINK_PPSAPI 20

I need time to resolve it.

My comments follow, then the patch; I hope I can now come back into the -mm tree again! :)

On Thu, May 10, 2007 at 12:27:52AM -0700, [EMAIL PROTECTED] wrote:
>
> Review comments:
>
> - Running a timer once per second will make the super-low-power people upset.

The ktimer module is just for debugging purposes and is not needed in a real working system.

> - This uses netlink? Is that interface documented anywhere?
>
>   Please check with Dave Miller that this:
>
>    #define NETLINK_PPSAPI 20
>
>   reservation is OK.

It is not OK. To be fixed.

> - This:
>
>    if ((nlpps->tsformat != PPS_TSFMT_TSPEC) != 0 ) {
>
>   is weird. I changed it to
>
>    if (nlpps->tsformat != PPS_TSFMT_TSPEC) {

Fixed.

> - This:
>
>    timeout += nlpps->timeout.tv_nsec/(10/HZ);
>
>   probably won't work on i386. We use do_div() for 64/32 divides. I'll
>   find out when I compile it.
>
>   It's nice to use NSEC_PER_SEC rather than having to count all those
>   zeroes.

Fixed.

> - The code uses interruptible_sleep_on_timeout(). That API is deprecated
>   and is racy. Please convert to wait_event_interruptible_timeout().
>
>   Ditto interruptible_sleep_on()

Fixed.

> - This:
>
>    memset(pps_source, 0, sizeof(struct pps_s) * PPS_MAX_SOURCES);
>
>   was unneeded. The C startup code already did that.

Fixed.

> - All these separators:
>
>    +/* --- Input function -- +*/
>
>   aren't typical for kernel code. I left them in, but please consider
>   removing them all.

Fixed.

> - This:
>
>    static void pps_class_release(struct class_device *cdev)
>    {
>            /* Nop??? */
>    }
>
>   is a bug and it earns you a nastygram from Greg. These objects must be
>   dynamically allocated - this is not optional.

Would it be acceptable to define this function as void?

> - What's this doing in 8250.c?
>
>    + if (up->port.flags & UPF_HARDPPS_CD)
>    +         up->ier |= UART_IER_MSI;        /* enable interrupts */
>
>   Please fully describe the reasons for this change in the changelog, and in
>   a code comment and then get the change reviewed by Russell King
>   <[EMAIL PROTECTED]>.

If the user specifies a serial port as a PPS source, we enable the IRQ on that port.

> - Please document within the changelog the other changes to the serial code
>   and we'll ask Russell to take a look at those as well.

OK. I'll do it.

> - The Kconfig purports to support CONFIG_PPS=m. Does that actually work?

Yes. It works...

>   We have a bunch of code in random other drivers which is dependent upon
>   CONFIG_PPS_CLIENT_foo. The problem is that if a kernel was compiled with
>   CONFIG_PPS_CLIENT_foo=n and then the pps driver is later built for that
>   kernel, it won't actually work because lp, serial etc weren't correctly
>   configured when _they_ were built.
>
>   This sort of cross-module coupling is considered to be a bad thing, but
>   I'm not really sure it's all that important.
>
> - Please split the patch up into a series of patches: one for pps core and
>   one for each of the clients (servers?): one for lp, one for serial, etc.
>
>   Try to arrange for that series of patches to build and run at each stage
>   of application.
>
>   Please don't lose my changes when you do so ;)
>
>   Please review the changes I made and a) stick to the same style and b) fix
>   up any sites which I missed.
>
> - Please remove all the typedefs:
>
>    +typedef struct ntp_fp {
>    +typedef union pps_timeu {
>    +typedef struct pps_info {
>    +typedef struct pps_params {
>
>   and just use `struct ntp_fp' everywhere.

Those typedefs are defined in the PPS specification (please see RFC 2783).

> - The above four structures are communicated with userspace, yes?
>
>   I believe that they will not work correctly when 32-bit userspace is
>   communicating with a 64-bit kernel. Alignments change and sizeof(long)
>   changes.
>
>   You don't want to have to write compat code. I suggest that you redo
>   those structures in terms of __u32, __u64, etc. You probably need to use
>   __attribute__((packed)) too, not sure.
>
>   Then let's get that part carefully reviewed (Arnd Bergmann <[EMAIL PROTECTED]>
>   is my go-to guru on this) and please test it carefully.
>
>   Yeah, you just haven't got a chance that something as huge and as complex
>   as struct pps_netlink_msg will survive the 32->64 transition.

The same as above. These structures are fixed by RFC 2783.

> - Please ensure that `make headers_check' passes OK (you'll hear from me if
>   it doesn't ;))

Done.

> - Can we get rid of the private dbg, err and info macros? Surely there are
>   generic ones somewhere.

They are very useful to LinuxPPS users, who can enable/disable them from the configuration menu. Also I'm planning to
Re: [PATCH] mm: swap prefetch improvements
Con wrote:
> Hmm I'm not really sure what it takes to make it cpuset aware;
> ...
> It is numa aware to some degree. It stores the node id and when it starts
> prefetching it only prefetches to nodes that are suitable for prefetching to
> ...
> It would be absolutely trivial to add a check for 'number_of_cpusets' <= 1
> in the prefetch_enabled() function. Would you like that?

Hmmm ... it seems that we are shadow boxing here ... trying to pick a solution to solve a problem when we aren't even sure we have a problem, much less what the problem is. That does not usually lead to the right path.

Could you put some more effort into characterizing what problems can arise if one has prefetch and cpusets active at the same time?

My first wild guess is that the only incompatibility would have been that prefetch might mess up NUMA placement (get pages on wrong nodes), which it seems you have tried to address in your current patches. So it would not surprise me if there was no problem here.

We may just have to lean on Nick some more, if he is the only one who understands what the problem is, to try again to explain it to us.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311
Hi Dmitry,

On 5/12/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> On Friday 11 May 2007 20:53, Andrew Morton wrote:
> > Ho hum. I suppose a suitable workaround would be to convert hdaps_mtx back
> > into a semaphore. ug.
>
> Actually I was looking for victimes^Wvolunteers to test the patch below.
> It gets rid of _trylock business.

Ah! You just beat me here, and your patch is definitely better. I was wondering why this driver wanted to use a mutex (previously the semaphore) to synchronize between process and interrupt context in the first place.

Most of the code in here uses synchronous delays so never sleeps anyway, but then unfortunately it does a weird repeated-waiting-hardware-status-register-check thingy in its .probe(), which meant a straightforward mutex -> spinlock conversion wasn't possible.

So I then made a patch pushing the poll off to the keventd workqueue, when I saw your mail that does exactly the same, but wrapped up in the generic input-polldev infrastructure! It's barely 12 days old in mainline -- no wonder I didn't know about it. Seems to be good-looking code!

Thanks,
Satyam
Re: [PATCH] "volatile considered harmful", take 3
H. Peter Anvin wrote:
>
> I don't see why Alan's way is necessarily better; it should work but is
> more heavy-handed as it's disabling *all* optimization such as loop
> invariants across the barrier.
>

To expand on this further: the way this probably *should* be handled, Linux-style, is with internally-volatile versions of le32_to_cpup() and friends. That obeys the concept that the volatility should be associated with an operation, not a data structure, and, being related to an I/O device, should have its endianness explicitly declared.

Right now those macros don't exist, however.

   -hpa
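
A minimal user-space sketch of what such an "internally-volatile" accessor might look like follows. The name `le32_to_cpup_volatile` is purely hypothetical (the message above notes that no such macros existed), and `<stdint.h>` types stand in for the kernel's __le32/u32. The volatility is attached to the single load inside the helper, not to the data structure, and the conversion from little-endian to CPU order is explicit.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (the name is an assumption): read a little-endian
 * 32-bit value with exactly one volatile load, then convert to CPU order.
 */
static inline uint32_t le32_to_cpup_volatile(const volatile uint32_t *p)
{
	uint32_t raw = *p;           /* the single volatile access */
	unsigned char b[4];

	memcpy(b, &raw, sizeof(b));  /* bytes in memory order */
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}
```

A descriptor field declared as a plain 32-bit word can then be polled through this helper, so every call performs a fresh load, while ordinary accesses to the rest of the structure remain open to optimization.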
Re: [PATCH] "volatile considered harmful", take 3
Satyam Sharma wrote:
>
>> + - Pointers to data structures in coherent memory which might be modified
>> +   by I/O devices can, sometimes, legitimately be volatile. A ring buffer
>> +   used by a network adapter, where that adapter changes pointers to
>> +   indicate which descriptors have been processed, is an example of this
>> +   type of situation.
>
> is a legitimate use case for volatile is still not clear to me (I
> agree with Alan's comment in a previous thread that this seems to be a
> case where a memory barrier would be applicable^Wbetter, actually). I
> could be wrong here, so would be nice if Peter explains why volatile is
> legitimate here.
>
> Otherwise, it's fine with me.
>

I don't see why Alan's way is necessarily better; it should work but is more heavy-handed as it's disabling *all* optimization such as loop invariants across the barrier.

-hpa
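To make the trade-off concrete, here is a small userspace sketch contrasting the two styles. It assumes GCC/Clang inline-asm syntax; struct ring_desc and both poll functions are hypothetical names, and the barrier shown is a compiler-only barrier in the style of the kernel's barrier(). Neither variant orders accesses against real hardware, so this only illustrates the optimization cost being discussed, not a driver-ready polling loop.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax): forces all cached memory values to be reloaded. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical descriptor: the device is assumed to set `done` when it has finished. */
struct ring_desc {
	uint32_t done;
	uint32_t len;
};

/* Barrier style: everything the compiler cached from memory is discarded on each pass. */
static unsigned int poll_with_barrier(struct ring_desc *d, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (!d->done && spins < max_spins) {
		compiler_barrier();
		spins++;
	}
	return spins;
}

/* Per-access style: only the load of `done` is forced; other values may stay in registers. */
static unsigned int poll_with_volatile_read(struct ring_desc *d, unsigned int max_spins)
{
	unsigned int spins = 0;

	while (!*(volatile uint32_t *)&d->done && spins < max_spins)
		spins++;
	return spins;
}
```

Both functions return the number of spins performed, so a caller can tell a completed descriptor (fewer than max_spins) from a timeout.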
Re: [PATCH] "volatile considered harmful", take 2
pradeep singh wrote:
>
> Sorry, for my misunderstanding but i hope Jonathan actually means
> volatile harmful only in C and not while using extended asm with gcc? Or
> does you all consider volatile while using extended asm as harmful too?
> Incidentally i came to know that using volatile in such cases may be
> still be optimized by the gcc. And the correct way is to fake a side
> effect to the gcc, which can be done using "memory" clobbering directive
> in the correct place and not "m" or "+m".
>
> Does this means to exclude volatile from extended asm also, while using
> them in kernel?
>

We were talking about "register", not "volatile".

-hpa
Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
On Fri, May 11, 2007 at 10:27:29AM +0530, Ananth N Mavinakayanahalli wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox ([EMAIL PROTECTED]) wrote:
> > ...
> > > > * Third issue : Scalability. Changing code will stop every CPU on the
> > > > system for a while. Compared to this, the int3-based approach will run
> > > > through the breakpoint handler "if" one of the CPU happens to execute
> > > > this code at the wrong time. The standard case is just an IPI (to
> > >
> > > If I read the errata right then patching in an int3 will itself trigger
> > > the errata so anything could happen.
> > >
> > > I believe there are other safe sequences for doing code patching - perhaps
> > > one of the Intel folk can advise ?
>
> IIRC, when the first implementation of what exists now as kprobes was
> done (as part of the dprobes framework), this question did come up. I
> think the conclusion was that the errata applies only to multi-byte
> modifications and single-byte changes are guaranteed to be atomic.
> Given int3 on Intel is just 1-byte, we are safe.
>
> > I'll let the Intel guys confirm this, I don't have the reference nearby
> > (I got this information by talking with the kprobe team members, and
> > they got this information directly from Intel developers) but the
> > int3 is the one special case to which the errata does not apply.
> > Otherwise, kprobes and gdb would have a big, big issue.
>
> Perhaps Richard/Suparna can confirm.

I just tried digging up past discussions on this from Richard, about int3 being safe:

http://sourceware.org/ml/systemtap/2005-q3/msg00208.html
http://lkml.org/lkml/2006/9/20/30

Regards
Suparna

>
> Ananth

--
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India
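As a rough illustration of the single-byte argument quoted above, the sketch below arms and disarms a breakpoint by touching only the first opcode byte of an instruction held in an ordinary buffer. struct byte_probe and both functions are hypothetical names invented here; real kprobes code also has to deal with page permissions, instruction boundaries and other CPUs, none of which is modelled.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCC	/* x86 one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one patched location. */
struct byte_probe {
	volatile uint8_t *addr;	/* first byte of the instruction being replaced */
	uint8_t saved;		/* original opcode byte, kept for restoring */
};

/* Save the original first byte, then overwrite it with a single one-byte store. */
static void byte_probe_arm(struct byte_probe *p, uint8_t *insn)
{
	p->addr = insn;
	p->saved = *p->addr;
	*p->addr = INT3_OPCODE;
}

/* Put the original opcode byte back, again with a single one-byte store. */
static void byte_probe_disarm(struct byte_probe *p)
{
	*p->addr = p->saved;
}
```

The point of the sketch is only that the modification never spans more than one byte, which is the property the quoted discussion relies on.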
Re: [PATCH] mm: swap prefetch improvements
On Saturday 12 May 2007 15:03, Paul Jackson wrote:
> > Swap prefetch is not cpuset aware so make the config option depend on
> > !CPUSETS.
>
> Ok.
>
> Could you explain what it means to say "swap prefetch is not cpuset aware",
> or could you give a rough idea of what it would take to make it cpuset
> aware?

Hmm I'm not really sure what it takes to make it cpuset aware; it was Nick that pointed out that it was not, so I'm not sure and still going off your original recommendation that there was no need to make it cpuset aware but at least honour node placement (see below).

> I wouldn't go so far as to say that no one would ever want to prefetch and
> use cpusets at the same time, but I will grant that it's not a sufficiently
> important need that it should block a useful prefetch implementation on
> non-cpuset systems.

Thank you for agreeing with me there :)

> One case that would be useful, however, is to handle prefetch in the case
> that cpusets are configured into ones kernel, but one is not making any
> real use of them ('number_of_cpusets' <= 1). That will actually be the
> most common case for the major distribution(s) that enable cpusets by
> default in their builds, for most arch's including the arch's popular
> on desktops.
>
> So what would it take to allow CONFIG'ing both prefetch and cpusets on,
> but having prefetch dynamically adapt to the presence of active cpuset
> usage, perhaps by basically shutting down if it can't easily do any
> better? I could certainly entertain requests to callout to some
> prefetch routine from the cpuset code, at the critical points that
> cpusets transitioned in or out of active use.

It would be absolutely trivial to add a check for 'number_of_cpusets' <= 1 in the prefetch_enabled() function. Would you like that?

> Semi-separate issue -- is it just cpusets that aren't prefetch friendly,
> or is it also mm/mempolicy (mbind, set_mempolicy) as well?
>
> For that matter, even if neither mm/mempolicy nor cpusets are used, on
> systems with multiple memory nodes (not all memory equally distant from
> all CPUs, aka NUMA), could prefetch cause some sort of shuffling of
> memory placement, which might harm the performance of an HPC (High
> Performance Computing) application with carefully tuned memory
> placement. Granted, this -is- getting to be a corner case. Most HPC
> apps running on NUMA hardware are making at least some use of
> mm/mempolicy or cpusets.

It is NUMA aware to some degree. It stores the node id and when it starts prefetching it only prefetches to nodes that are suitable for prefetching to (based on a number of arbitrary freeness arguments I invented). It uses the original node id it came from by allocating a page via:

	alloc_pages_node(node, GFP_HIGHUSER & ~__GFP_WAIT, 0);

where "node" is the original node the swapped page came from.

Thanks for comments.

--
-ck
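For what it's worth, the 'number_of_cpusets' <= 1 check discussed above could look roughly like the standalone sketch below. Only the names number_of_cpusets and prefetch_enabled() come from the thread; the real body of prefetch_enabled() is not shown anywhere in these messages, so swap_prefetch_sysctl, cpusets_in_active_use() and the logic around them are assumptions made purely for illustration.

```c
#include <stdbool.h>

/* Stand-ins for kernel state so the sketch builds on its own. */
static int number_of_cpusets = 1;	/* name taken from the thread */
static int swap_prefetch_sysctl = 1;	/* hypothetical: 0 = off, nonzero = on */

/* More than one cpuset means somebody is actively partitioning the machine. */
static bool cpusets_in_active_use(void)
{
	return number_of_cpusets > 1;
}

/* Sketch only: prefetch runs when the sysctl allows it and cpusets are idle. */
static bool prefetch_enabled(void)
{
	if (!swap_prefetch_sysctl)
		return false;
	if (cpusets_in_active_use())
		return false;	/* back off rather than guess at placement */
	return true;
}
```

With a check of this shape, a kernel built with both options would prefetch as usual until a second cpuset is created, and quietly stop after that.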
Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311
On Friday 11 May 2007 20:53, Andrew Morton wrote: > Ho hum. I suppose a suitable workaround would be to convert hdaps_mtx back > into a semaphore. ug. Actually I was looking for victimes^Wvolunteers to test the patch below. It gets rid of _trylock business. -- Dmitry HWMON: hdaps - convert to use input-polldev Switch to using input-polldev skeleton instead of implementing polling loop by itself. Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]> --- drivers/hwmon/Kconfig |1 drivers/hwmon/hdaps.c | 55 +- 2 files changed, 25 insertions(+), 31 deletions(-) Index: work/drivers/hwmon/Kconfig === --- work.orig/drivers/hwmon/Kconfig +++ work/drivers/hwmon/Kconfig @@ -602,6 +602,7 @@ config SENSORS_W83627EHF config SENSORS_HDAPS tristate "IBM Hard Drive Active Protection System (hdaps)" depends on INPUT && X86 + select INPUT_POLLDEV default n help This driver provides support for the IBM Hard Drive Active Protection Index: work/drivers/hwmon/hdaps.c === --- work.orig/drivers/hwmon/hdaps.c +++ work/drivers/hwmon/hdaps.c @@ -28,7 +28,7 @@ #include #include -#include +#include #include #include #include @@ -61,13 +61,12 @@ #define INIT_TIMEOUT_MSECS 4000/* wait up to 4s for device init ... */ #define INIT_WAIT_MSECS200 /* ... 
in 200ms increments */ -#define HDAPS_POLL_PERIOD (HZ/20) /* poll for input every 1/20s */ +#define HDAPS_POLL_INTERVAL50 /* poll for input every 1/20s (50 ms)*/ #define HDAPS_INPUT_FUZZ 4 /* input event threshold */ #define HDAPS_INPUT_FLAT 4 -static struct timer_list hdaps_timer; static struct platform_device *pdev; -static struct input_dev *hdaps_idev; +static struct input_polled_dev *hdaps_idev; static unsigned int hdaps_invert; static u8 km_activity; static int rest_x; @@ -323,24 +322,19 @@ static void hdaps_calibrate(void) __hdaps_read_pair(HDAPS_PORT_XPOS, HDAPS_PORT_YPOS, &rest_x, &rest_y); } -static void hdaps_mousedev_poll(unsigned long unused) +static void hdaps_mousedev_poll(struct input_polled_dev *dev) { + struct input_dev *input_dev = dev->input; int x, y; - /* Cannot sleep. Try nonblockingly. If we fail, try again later. */ - if (mutex_trylock(&hdaps_mtx)) { - mod_timer(&hdaps_timer,jiffies + HDAPS_POLL_PERIOD); - return; - } + mutex_lock(&hdaps_mtx); if (__hdaps_read_pair(HDAPS_PORT_XPOS, HDAPS_PORT_YPOS, &x, &y)) goto out; - input_report_abs(hdaps_idev, ABS_X, x - rest_x); - input_report_abs(hdaps_idev, ABS_Y, y - rest_y); - input_sync(hdaps_idev); - - mod_timer(&hdaps_timer, jiffies + HDAPS_POLL_PERIOD); + input_report_abs(input_dev, ABS_X, x - rest_x); + input_report_abs(input_dev, ABS_Y, y - rest_y); + input_sync(input_dev); out: mutex_unlock(&hdaps_mtx); @@ -536,6 +530,7 @@ static struct dmi_system_id __initdata h static int __init hdaps_init(void) { + struct input_dev *idev; int ret; if (!dmi_check_system(hdaps_whitelist)) { @@ -563,39 +558,37 @@ static int __init hdaps_init(void) if (ret) goto out_device; - hdaps_idev = input_allocate_device(); + hdaps_idev = input_allocate_polled_device(); if (!hdaps_idev) { ret = -ENOMEM; goto out_group; } + hdaps_idev->poll = hdaps_mousedev_poll; + hdaps_idev->poll_interval = HDAPS_POLL_INTERVAL; + /* initial calibrate for the input device */ hdaps_calibrate(); /* initialize the input class */ - 
hdaps_idev->name = "hdaps"; - hdaps_idev->dev.parent = &pdev->dev; - hdaps_idev->evbit[0] = BIT(EV_ABS); - input_set_abs_params(hdaps_idev, ABS_X, + idev = hdaps_idev->input; + idev->name = "hdaps"; + idev->dev.parent = &pdev->dev; + idev->evbit[0] = BIT(EV_ABS); + input_set_abs_params(idev, ABS_X, -256, 256, HDAPS_INPUT_FUZZ, HDAPS_INPUT_FLAT); - input_set_abs_params(hdaps_idev, ABS_Y, + input_set_abs_params(idev, ABS_Y, -256, 256, HDAPS_INPUT_FUZZ, HDAPS_INPUT_FLAT); - ret = input_register_device(hdaps_idev); + ret = input_register_polled_device(hdaps_idev); if (ret) goto out_idev; - /* start up our timer for the input device */ - init_timer(&hdaps_timer); - hdaps_timer.function = hdaps_mousedev_poll; - hdaps_timer.expires = jiffies + HDAPS_POLL_PERIOD; - add_timer(&hdaps_timer); - printk(KERN_INFO "hdaps: driver successfully loaded.\n"); return 0; out_idev: - input_free_device(hdaps_idev); + inp
Re: Is this a preempt issue in drivers/input/evdev.c
Hi,

On Friday 11 May 2007 23:18, Yin,Fengwei wrote:
>
> So if the evdev_release() is preempted at the point marked by another
> process which will open the evdev, which will make operation sequence
> like:
>
>   --evdev->open in evdev_release()
>        -> preempted
>           evdev->open++ and input_open_device()
>        <- reschedule
>   input_close_device()
>
> Should we introduce a mutex here? Or do I miss something? Thanks.
>

Locking is completely absent in evdev. There was a patch introducing locking in recent -mm's but it got dropped. I need to refresh it.

--
Dmitry
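The race in the quoted sequence comes from the open count and the device open/close calls not being updated as one unit. A minimal userspace sketch of the usual remedy, using a pthread mutex, is shown below. This is not the dropped -mm patch; struct dev_state, dev_open() and dev_release() are hypothetical names, and the hw_opened flag merely stands in for input_open_device() and input_close_device().

```c
#include <pthread.h>

/* Hypothetical device state; `open` mirrors evdev->open in the quoted text. */
struct dev_state {
	pthread_mutex_t lock;
	int open;	/* number of current openers */
	int hw_opened;	/* 1 while the underlying device is open */
};

/* First opener brings the device up; count and device state change together. */
static void dev_open(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	if (d->open++ == 0)
		d->hw_opened = 1;	/* stands in for input_open_device() */
	pthread_mutex_unlock(&d->lock);
}

/* Last closer shuts the device down; no opener can slip in between the two steps. */
static void dev_release(struct dev_state *d)
{
	pthread_mutex_lock(&d->lock);
	if (--d->open == 0)
		d->hw_opened = 0;	/* stands in for input_close_device() */
	pthread_mutex_unlock(&d->lock);
}
```

Because the decrement and the close happen under the same lock as the increment and the open, the interleaving drawn in the quoted message cannot occur.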
Re: [PATCH] mm: swap prefetch improvements
> Swap prefetch is not cpuset aware so make the config option depend on
> !CPUSETS.

Ok.

Could you explain what it means to say "swap prefetch is not cpuset aware", or could you give a rough idea of what it would take to make it cpuset aware?

I wouldn't go so far as to say that no one would ever want to prefetch and use cpusets at the same time, but I will grant that it's not a sufficiently important need that it should block a useful prefetch implementation on non-cpuset systems.

One case that would be useful, however, is to handle prefetch in the case that cpusets are configured into one's kernel, but one is not making any real use of them ('number_of_cpusets' <= 1). That will actually be the most common case for the major distribution(s) that enable cpusets by default in their builds, for most arch's including the arch's popular on desktops.

So what would it take to allow CONFIG'ing both prefetch and cpusets on, but having prefetch dynamically adapt to the presence of active cpuset usage, perhaps by basically shutting down if it can't easily do any better? I could certainly entertain requests to callout to some prefetch routine from the cpuset code, at the critical points that cpusets transitioned in or out of active use.

Semi-separate issue -- is it just cpusets that aren't prefetch friendly, or is it also mm/mempolicy (mbind, set_mempolicy) as well?

For that matter, even if neither mm/mempolicy nor cpusets are used, on systems with multiple memory nodes (not all memory equally distant from all CPUs, aka NUMA), could prefetch cause some sort of shuffling of memory placement, which might harm the performance of an HPC (High Performance Computing) application with carefully tuned memory placement? Granted, this -is- getting to be a corner case. Most HPC apps running on NUMA hardware are making at least some use of mm/mempolicy or cpusets.

--
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
[PATCH] mm: swap prefetch improvements
It turns out that swap prefetch was not that hard to fix and improve upon, and since Andrew hasn't dropped swap prefetch, here instead are a swag of fixes and improvements, including making it depend on !CPUSETS as Nick requested.

These changes lead to dramatic improvements. Eg on a machine with 2GB ram and only 500MB swap:

Prefetch disabled:
./sp_tester
Ram 2060352000 Swap 522072000
Total ram to be malloced: 2321388000 bytes
Starting first malloc of 1160694000 bytes
Starting 1st read of first malloc
Touching this much ram takes 529 milliseconds
Starting second malloc of 1160694000 bytes
Completed second malloc and free
Sleeping for 300 seconds
Important part - starting reread of first malloc
Completed read of first malloc
Timed portion 6030 milliseconds

Prefetch enabled:
./sp_tester
Ram 2060352000 Swap 522072000
Total ram to be malloced: 2321388000 bytes
Starting first malloc of 1160694000 bytes
Starting 1st read of first malloc
Touching this much ram takes 528 milliseconds
Starting second malloc of 1160694000 bytes
Completed second malloc and free
Sleeping for 300 seconds
Important part - starting reread of first malloc
Completed read of first malloc
Timed portion 665 milliseconds

Note that simply touching the ram took 528 ms, so the 230MB converted from major faults to minor faults took only 137ms (665 - 528) instead of 5.5s (6030 - 529).

---
Numerous improvements to swap prefetch.

It was possible for kprefetchd to go to sleep indefinitely before/after changing the /proc value of swap prefetch. Fix that.

The cost of remove_from_swapped_list() can be removed from every page swapin by moving it to be done entirely by kprefetchd lazily.

The call site for add_to_swapped_list need only be at one place.

Wakeups can occur much less frequently if swap prefetch is disabled.

Make it possible to enable swap prefetch explicitly via /proc when laptop_mode is enabled by changing the value of the sysctl to 2.
The complicated iteration over every entry can be consolidated by using list_for_each_safe. Swap prefetch is not cpuset aware so make the config option depend on !CPUSETS. Fix potential irq problem by converting read_lock_irq to irqsave etc. Code style fixes. Change the ioprio from IOPRIO_CLASS_IDLE to normal lower priority to ensure that bio requests are not starved if other I/O begins during prefetching. Signed-off-by: Con Kolivas <[EMAIL PROTECTED]> --- Documentation/sysctl/vm.txt |4 - init/Kconfig|2 mm/page_io.c|2 mm/swap_prefetch.c | 158 +++- mm/swap_state.c |2 mm/vmscan.c |1 6 files changed, 75 insertions(+), 94 deletions(-) Index: linux-2.6.21-mm1/mm/page_io.c === --- linux-2.6.21-mm1.orig/mm/page_io.c 2007-02-05 22:52:04.0 +1100 +++ linux-2.6.21-mm1/mm/page_io.c 2007-05-12 14:30:52.0 +1000 @@ -17,6 +17,7 @@ #include #include #include +#include #include static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index, @@ -118,6 +119,7 @@ int swap_writepage(struct page *page, st ret = -ENOMEM; goto out; } + add_to_swapped_list(page); if (wbc->sync_mode == WB_SYNC_ALL) rw |= (1 << BIO_RW_SYNC); count_vm_event(PSWPOUT); Index: linux-2.6.21-mm1/mm/swap_state.c === --- linux-2.6.21-mm1.orig/mm/swap_state.c 2007-05-07 21:53:51.0 +1000 +++ linux-2.6.21-mm1/mm/swap_state.c2007-05-12 14:30:52.0 +1000 @@ -83,7 +83,6 @@ static int __add_to_swap_cache(struct pa error = radix_tree_insert(&swapper_space.page_tree, entry.val, page); if (!error) { - remove_from_swapped_list(entry.val); page_cache_get(page); SetPageLocked(page); SetPageSwapCache(page); @@ -102,7 +101,6 @@ int add_to_swap_cache(struct page *page, int error; if (!swap_duplicate(entry)) { - remove_from_swapped_list(entry.val); INC_CACHE_INFO(noent_race); return -ENOENT; } Index: linux-2.6.21-mm1/mm/vmscan.c === --- linux-2.6.21-mm1.orig/mm/vmscan.c 2007-05-07 21:53:51.0 +1000 +++ linux-2.6.21-mm1/mm/vmscan.c2007-05-12 14:30:52.0 +1000 @@ -410,7 +410,6 @@ int remove_mapping(struct address_space if 
(PageSwapCache(page)) { swp_entry_t swap = { .val = page_private(page) }; - add_to_swapped_list(page); __delete_from_swap_cache(page); write_unlock_irq(&mapping->tree_lock); swap_free(swap); Index: linux-2.6.21-mm1/mm/swap_pref
Re: [PATCH] "volatile considered harmful", take 3
Satyam Sharma wrote:
> On 5/11/07, Jonathan Corbet <[EMAIL PROTECTED]> wrote:
>> + - Pointers to data structures in coherent memory which might be modified
>> +   by I/O devices can, sometimes, legitimately be volatile. A ring buffer
>> +   used by a network adapter, where that adapter changes pointers to
>> +   indicate which descriptors have been processed, is an example of this
>> +   type of situation.
>
> is a legitimate use case for volatile is still not clear to me (I

IMO it is not. We do /not/ want to encourage volatile use in those cases, and indeed, it's not necessary even if you can rationalize the use of the English word "volatile" to describe the situation.

Drivers work quite well without volatile in such situations.

Jeff
Re: Re: Re: [announce] Intel announces the PowerTOP utility for Linux
Words by Matt Mackall [Fri, May 11, 2007 at 09:39:05PM -0500]: > On Sat, May 12, 2007 at 02:40:52AM +0100, Jose Celestino wrote: > > Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]: > > > On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote: > > > > > > > > What's eating the battery life of my laptop? Why isn't it many more > > > > hours? Which software component causes the most power to be burned? > > > > These are important questions without a good answer... until now. > > > > > > I get: > > > > > > No detailed statistics available; please enable the CONFIG_TIMER_STATS > > > kernel option > > > > > > > Must run as root (rw to /proc/timer_stats is needed). > > That file doesn't exist, despite CONFIG_TIMER_STATS being in > /proc/config.gz. > Then again, perhaps you have /proc/tstats instead. If so apply this (well, you get the idea): --- powertop/powertop.c 2007-05-12 05:01:15.0 +0100 +++ powertop_new/powertop.c 2007-05-12 05:08:46.0 +0100 @@ -212,8 +212,8 @@ void stop_timerstats(void) { FILE *file; - file = fopen("/proc/timer_stats","w"); - if (!file) { + if (!(file = fopen("/proc/timer_stats","w")) && + !(file = fopen("/proc/stats","w")) ) { nostats = 1; return; } @@ -223,8 +223,8 @@ void start_timerstats(void) { FILE *file; - file = fopen("/proc/timer_stats","w"); - if (!file) { + if (!(file = fopen("/proc/timer_stats","w")) && + !(file = fopen("/proc/stats","w")) ) { nostats = 1; return; } @@ -388,7 +388,7 @@ i = 0; totalticks = 0; if (!nostats) - file = popen("cat /proc/timer_stats | sort -n | tail -190", "r"); + file = popen("cat /proc/timer_stats 2>>/dev/null|| cat /proc/tstats | sort -n | tail -190", "r"); while (file && !feof(file) && i<190) { char *count, *pid, *process, *func; int cnt; -- Jose Celestino http://www.msversus.org/ ; http://techp.org/petition/show/1 http://www.vinc17.org/noswpat.en.html "And on the trillionth day, Man created Gods." -- Thomas D. 
Pate - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
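The fallback idea in the patch above (try one /proc file, then another) can also be written as a small helper that walks a list of candidate paths. The sketch below is only an illustration of that pattern, not part of powertop; fopen_first() is a made-up name, and which paths to pass in is up to the caller.

```c
#include <stdio.h>

/*
 * Try each candidate path in order and return the first stream that opens.
 * The candidate list must be NULL-terminated. Returns NULL if none opens.
 */
static FILE *fopen_first(const char *const *paths, const char *mode)
{
	for (; *paths; paths++) {
		FILE *file = fopen(*paths, mode);

		if (file)
			return file;
	}
	return NULL;
}
```

A caller in the spirit of the patch would pass a list such as "/proc/timer_stats" followed by "/proc/tstats" and set nostats when the helper returns NULL.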
Re: [PATCH 1/2] leds:arch/sh/boards/landisk LEDs supports
To: Richard-san I'm sorry. The patch sent yesterday is corrected. Only the ledtrig_bitpat_default function was changed. The patch of "Custom triggers support, which are might not supported by all LEDs" is necessary. LED driver of I-O DATA LANDISK and USL-5P Signed-off-by: kogiidena <[EMAIL PROTECTED]> --- diff -urpN OLD/drivers/leds/Kconfig NEW/drivers/leds/Kconfig --- OLD/drivers/leds/Kconfig2007-04-28 06:49:26.0 +0900 +++ NEW/drivers/leds/Kconfig2007-05-11 21:15:28.0 +0900 @@ -94,6 +94,12 @@ config LEDS_COBALT help This option enables support for the front LED on Cobalt Server +config LEDS_LANDISK + tristate "LED Support for LANDISK Series" + depends on LEDS_CLASS && SH_LANDISK + help + This option enables support for the LED on LANDISK Series + comment "LED Triggers" config LEDS_TRIGGERS diff -urpN OLD/drivers/leds/Makefile NEW/drivers/leds/Makefile --- OLD/drivers/leds/Makefile 2007-04-28 06:49:26.0 +0900 +++ NEW/drivers/leds/Makefile 2007-05-11 23:34:07.0 +0900 @@ -16,6 +16,7 @@ obj-$(CONFIG_LEDS_NET48XX)+= leds-net4 obj-$(CONFIG_LEDS_WRAP)+= leds-wrap.o obj-$(CONFIG_LEDS_H1940) += leds-h1940.o obj-$(CONFIG_LEDS_COBALT) += leds-cobalt.o +obj-$(CONFIG_LEDS_LANDISK) += leds-landisk.o # LED Triggers obj-$(CONFIG_LEDS_TRIGGER_TIMER) += ledtrig-timer.o diff -urpN OLD/drivers/leds/leds-landisk.c NEW/drivers/leds/leds-landisk.c --- OLD/drivers/leds/leds-landisk.c 1970-01-01 09:00:00.0 +0900 +++ NEW/drivers/leds/leds-landisk.c 2007-05-12 11:31:47.0 +0900 @@ -0,0 +1,215 @@ +/* + * LEDs driver for I-O DATA DEVICE, INC. "LANDISK Series" support. + * + * Copyright (C) 2007 kogiidena + * + * Based on the drivers/leds/leds-ams-delta.c by: + * Copyright (C) 2006 Jonathan McDowell <[EMAIL PROTECTED]> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ +#include +#include +#include +#include +#include +#include + +static enum { + LANDISK = 0, + USL_5P = 1, +} landisk_product; + +static DEFINE_SPINLOCK(landisk_led_lock); + +static void landisk_led_set(struct led_classdev *led_cdev, + enum led_brightness value); + +static struct led_classdev landisk_leds[] = { + [0] = { + .name = "power", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [1] = { + .name = "status", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [2] = { + .name = "led1", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [3] = { + .name = "led2", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [4] = { + .name = "led3", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [5] = { + .name = "led4", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [6] = { + .name = "led5", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, + [7] = { + .name = "buzzer", + .brightness_set = landisk_led_set, + .default_trigger = "bitpat", + }, +}; + +void ledtrig_bitpat_default(struct led_classdev *led_cdev, + unsigned long *delay, char *bitdata) +{ + int led; + + led = (led_cdev - &landisk_leds[0]); + if ((led == 0) || (led == 1)) { + strcpy(bitdata, "blink"); + } + if (led == 7) { + *delay = 250; + } + +} + +static void landisk_led_set(struct led_classdev *led_cdev, + enum led_brightness value) +{ + u8 tmp, bitmask; + unsigned long flags; + + bitmask = 0x01 << (led_cdev - &landisk_leds[0]); + + spin_lock_irqsave(&landisk_led_lock, flags); + tmp = ctrl_inb(PA_LED); + if (value) + tmp |= bitmask; + else + tmp &= ~bitmask; + ctrl_outb(tmp, PA_LED); + spin_unlock_irqrestore(&landisk_led_lock, flags); +} + +static int landisk_led_probe(struct platform_device *pdev) +{ + int i, nr_leds; + int ret; + + nr_leds = (landisk_product == LANDISK) ? 
2 : 8; + + for (i = ret = 0; ret >= 0 && i < nr_leds; i++) { + ret = led_classdev_register(&pdev->dev, &landisk_leds[i]); + } + + if (ret < 0 && i > 1) { + nr_leds = i
Re: [PATCH 2/2] leds:arch/sh/boards/landisk LEDs supports
To: Richard-san I'm sorry. The patch sent yesterday is corrected, too. Because the source had not been read easily, it cleaned it. There is no change for the basic function. Add Bitpattern Trigger. Bitpattern continuously turns LED on and off according to the value directed "bitdata". "bitdata" is composed of the character string that consists of the following three characters. '0' turn off LED. '1' turn on LED. 'R' is repeated from the head of the "bitdata". In addition, the character string of "on", "off", and "blink" can be set to "bitdata". The transition time of "bitdata" is set by "delay". Signed-off-by: kogiidena <[EMAIL PROTECTED]> --- diff -urpN OLD/drivers/leds/Kconfig NEW/drivers/leds/Kconfig --- OLD/drivers/leds/Kconfig2007-04-28 06:49:26.0 +0900 +++ NEW/drivers/leds/Kconfig2007-05-11 21:15:28.0 +0900 @@ -127,5 +133,19 @@ config LEDS_TRIGGER_HEARTBEAT load average. If unsure, say Y. +config LEDS_TRIGGER_BITPAT + tristate "LED Bitpattern Trigger" + depends on LEDS_TRIGGERS + help + Bitpattern continuously turns LED on and off according to + the value directed "bitdata". "bitdata" is composed of + the character string that consists of the following three + characters. '0' turn off LED. '1' turn on LED. 'R' is + repeated from the head of the "bitdata". + In addition, the character string of "on", "off", and "blink" + can be set to "bitdata". + The transition time of "bitdata" is set by "delay". + If unsure, say Y. 
+ endmenu diff -urpN OLD/drivers/leds/Makefile NEW/drivers/leds/Makefile --- OLD/drivers/leds/Makefile 2007-05-11 23:44:12.0 +0900 +++ NEW/drivers/leds/Makefile 2007-05-11 23:46:47.0 +0900 @@ -22,3 +22,4 @@ obj-$(CONFIG_LEDS_LANDISK)+= leds-land obj-$(CONFIG_LEDS_TRIGGER_TIMER) += ledtrig-timer.o obj-$(CONFIG_LEDS_TRIGGER_IDE_DISK)+= ledtrig-ide-disk.o obj-$(CONFIG_LEDS_TRIGGER_HEARTBEAT) += ledtrig-heartbeat.o +obj-$(CONFIG_LEDS_TRIGGER_BITPAT) += ledtrig-bitpat.o diff -urpN OLD/drivers/leds/ledtrig-bitpat.c NEW/drivers/leds/ledtrig-bitpat.c --- OLD/drivers/leds/ledtrig-bitpat.c 1970-01-01 09:00:00.0 +0900 +++ NEW/drivers/leds/ledtrig-bitpat.c 2007-05-12 11:29:34.0 +0900 @@ -0,0 +1,231 @@ +/* + * LED Bitpattern Trigger + * + * Copyright (C) 2007 kogiidena + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "leds.h" + +#define BITDATA_LEN 18 + +struct bitpat_trig_data { + char bitdata[BITDATA_LEN + 2]; + int cnt; + unsigned long delay; + struct timer_list timer; +}; + +void __attribute__ ((weak)) +ledtrig_bitpat_default(struct led_classdev *led_cdev, + unsigned long *delay, char *bitdata) +{ + /* Nothing to do. */ +} + +static void led_bitpat_function(unsigned long data) +{ + struct led_classdev *led_cdev = (struct led_classdev *)data; + struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data; + unsigned long delay = bitpat_data->delay; + char bitpat; + + bitpat = bitpat_data->bitdata[bitpat_data->cnt++]; + + if (bitpat == '0' || bitpat == '1') { + led_set_brightness(led_cdev, + (bitpat == '1') ? 
LED_FULL : LED_OFF); + } else { + bitpat_data->cnt = 0; + return; + } + + if (bitpat_data->bitdata[bitpat_data->cnt] == 'R') + bitpat_data->cnt = 0; + + mod_timer(&bitpat_data->timer, jiffies + msecs_to_jiffies(delay)); +} + +static ssize_t led_delay_show(struct class_device *dev, char *buf) +{ + struct led_classdev *led_cdev = class_get_devdata(dev); + struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data; + + sprintf(buf, "%lu\n", bitpat_data->delay); + + return strlen(buf) + 1; +} + +static ssize_t led_delay_store(struct class_device *dev, const char *buf, + size_t size) +{ + struct led_classdev *led_cdev = class_get_devdata(dev); + struct bitpat_trig_data *bitpat_data = led_cdev->trigger_data; + int ret = -EINVAL; + char *after; + unsigned long state = simple_strtoul(buf, &after, 10); + size_t count = after - buf; + + if (*after && isspace(*after)) + count++; + + if (count == size) { + bitpat_data->delay = state; + mod_timer(&bitpat_data->timer, jiffies + 1); + ret = count; + } + return ret; +} + +static void led_bitdata_update(struct bitpat_trig_data *bitpat_data, + const char *buf) +{ + int i; + const char *s
Re: [PATCH] swsusp: Use platform mode by default
I agree that we should keep the "platform" default, as it went in 2 releases ago (nearly 6 months) without any reported failures until this one -- and it fixed a longstanding issue documented on many machines.

We should debug Qi's failure like any other. We are actually in better shape on this one than others because we already know something that works around it.

Qi,
Please open a bug report here:
http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
in the Power-Off category. There are some other open poweroff bugs and maybe we'll find a common thread.

Please attach the output from acpidump and dmesg -s64000.

thanks,
-Len
Re: libata reset-seq merge broke sata_sil on sh
On Fri, May 11, 2007 at 11:39:20AM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> > Bumping the hardreset delay up does indeed fix it, I've had to bump it up
> > to 1200 before it started working (at 600 it still fails):
> >
> > [0.967379] scsi0 : sata_sil
> > [0.970425] scsi1 : sata_sil
> > [0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
> > [0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
> > [1.299353] ata1: device not ready (errno=-19), forcing hardreset
> > [2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
> > [2.837548] ata1.00: 39070080 sectors, multi 0: LBA
> > [2.842702] ata1.00: applying bridge limits
> > [2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [2.858938] ata1.00: configured for UDMA/100
> > [3.172602] ata2: SATA link down (SStatus 0 SControl 310)
> > [3.175736] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
> >
> > I'm not sure if it matters or not, but this is an iVDR drive, so that
> > might also have additional implications.
>
> Don't have the flimsiest idea what an iVDR drive is but I take it that
> it's some sort of special purpose thing. :-)
>

http://www.ivdr.org

The GoVault appears to be a similar device, in that they're both removable cartridges.

> Gary, IIRC, the requirement for GoVault was 3secs, right? Paul, can you
> try to estimate the minimum required delay? Please go down by 100ms and
> report where it breaks.
>

800ms was the lowest it would work at, 700ms still breaks.
Re: APIC error on 32-bit kernel
> > We're trying to track down the source of a problem that occurs > > whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4 > > and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others. > > > We can load the driver just fine, but whenever we activate the > > network, we see APIC errors (a sample of them are shown here, > > captured from a serial console): > > > > [EMAIL PROTECTED] ~]# echo 8 > /proc/sys/kernel/printk > > [EMAIL PROTECTED] ~]# [ 93.942012] process `sysctl' is using deprecated > > sysctl (sysc. > > [ 94.396609] atl1: eth0 link is up 1000 Mbps full duplex > > [ 94.498887] APIC error on CPU0: 00(08) > > [ 94.498534] APIC error on CPU1: 00(08) > > [ 94.550079] APIC error on CPU0: 08(08) > > [ 94.549725] APIC error on CPU1: 08(08) > > [ 94.600915] APIC error on CPU1: 08(08) > > [ 94.601276] APIC error on CPU0: 08(08) > > [ 94.652108] APIC error on CPU1: 08(08) > > [ 94.652470] APIC error on CPU0: 08(08) > > [ 94.703659] APIC error on CPU0: 08(08) > > [ 94.703305] APIC error on CPU1: 08(08) > > [ 94.754852] APIC error on CPU0: 08(40) > > [ 94.806045] APIC error on CPU0: 40(08) /* Here is what the APIC error bits mean: 0: Send CS error 1: Receive CS error 2: Send accept error 3: Receive accept error 4: Reserved 5: Send illegal vector 6: Received illegal vector 7: Illegal register address */ So the 40 means the APIC got an illegal vector. Certainly this is consistent with the fact that the errors start when a specific device is being used. I assume that device is using MSI? Curious that it is different in 32-bit and 64-bit mode. 
> > [ 94.805692] APIC error on CPU1: 08(08) > > [ 94.857238] APIC error on CPU0: 08(08) > > [ 94.856884] APIC error on CPU1: 08(08) > > [ 94.908432] APIC error on CPU0: 08(08) > > [ 94.908078] APIC error on CPU1: 08(08) > > [snip, more of the same] > > [ 98.901156] APIC error on CPU1: 08(08) > > [ 98.952702] APIC error on CPU0: 08(08) > > [ 98.952349] APIC error on CPU1: 08(08) > > [ 99.003895] APIC error on CPU0: 08(08) > > [ 99.003542] APIC error on CPU1: 08(08) > > > > The machine hangs for about 5-10 seconds, then spontaneously reboots > > without further console output. > > I can prompt an oops by pinging my router while the apic errors are > scrolling by. > > > > > This is an Asus M2V (Via K8T890) motherboard. > > > > The problem does not occur on a 32-bit kernel if we boot with > > pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same > > motherboard. pci=nomsi, works, okay... > > We also do not see this problem on Intel-based motherboards, with > > either 32- or 64-bit kernels. > > A full raft of documentation -- including acpidump and > linux-firmware-kit output, console capture, kernel config, lspci -vvxxx > (with apic=debug boot option), dmesg, and /proc/interrupts -- is > available at http://www.hogchain.net/m2v/apic-problem/ [06Dh 109 2] Boot Architecture Flags : 0003 for what it is worth, the bit in ACPI that is used to disable MSI support is not set -- so as far as the BIOS is concerned, this system should support MSI. Is it an add-in card, or lan-on-motherboard? -Len - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
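For readers decoding the log by hand, the bit table quoted in the reply above can be turned into a small lookup. This is a userspace sketch, not kernel code: the function names esr_lowest_bit_name() and esr_print() are made up for illustration, and the bit names are copied from the comment quoted in the message.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit names copied from the comment quoted in the message above. */
static const char *const esr_bit_names[8] = {
	"Send CS error",
	"Receive CS error",
	"Send accept error",
	"Receive accept error",
	"Reserved",
	"Send illegal vector",
	"Received illegal vector",
	"Illegal register address",
};

/* Return the name of the lowest set bit in an 8-bit value, or NULL if none. */
const char *esr_lowest_bit_name(unsigned int esr)
{
	int bit;

	for (bit = 0; bit < 8; bit++)
		if (esr & (1u << bit))
			return esr_bit_names[bit];
	return NULL;
}

/* Print every set bit of the value; returns how many bits were set. */
int esr_print(unsigned int esr)
{
	int bit, count = 0;

	for (bit = 0; bit < 8; bit++) {
		if (esr & (1u << bit)) {
			printf("bit %d: %s\n", bit, esr_bit_names[bit]);
			count++;
		}
	}
	return count;
}
```

With this table, the 0x40 seen in the log maps to bit 6 (received illegal vector) and the recurring 0x08 maps to bit 3 (receive accept error).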
Re: [PATCH] "volatile considered harmful", take 3
On 5/11/07, Jonathan Corbet <[EMAIL PROTECTED]> wrote: Here's another version of the volatile document. Once again, I've tried to address all of the comments. There haven't really been any recent comments addressing the correctness of the document; people have been more concerned with how it's expressed. I'm glad to see files in Documentation/ held to a high standard of writing, but, unless somebody has a factual issue this time around I would like to declare Mission Accomplished and move on. The document looks good, but whether: + - Pointers to data structures in coherent memory which might be modified +by I/O devices can, sometimes, legitimately be volatile. A ring buffer +used by a network adapter, where that adapter changes pointers to +indicate which descriptors have been processed, is an example of this +type of situation. is a legitimate use case for volatile is still not clear to me (I agree with Alan's comment in a previous thread that this seems to be a case where a memory barrier would be applicable^Wbetter, actually). I could be wrong here, so would be nice if Peter explains why volatile is legitimate here. Otherwise, it's fine with me. Thanks, Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
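To make the barrier alternative mentioned above concrete, here is a userspace analogy only, written with C11 atomics rather than kernel primitives. It does not settle whether volatile is legitimate for the ring buffer case; it just shows explicit release/acquire ordering on an index. The struct and function names are hypothetical, and a real driver would use the kernel's own barrier and DMA interfaces instead.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8

/* Hypothetical single-producer, single-consumer descriptor ring. */
struct ring {
	uint32_t desc[RING_SIZE];
	_Atomic uint32_t done;	/* count of completed descriptors */
};

/* Producer side (stands in for the device): fill a slot, then publish it. */
void ring_complete(struct ring *r, uint32_t value)
{
	uint32_t idx = atomic_load_explicit(&r->done, memory_order_relaxed);

	r->desc[idx % RING_SIZE] = value;
	/* Release: the descriptor write is visible before the new index. */
	atomic_store_explicit(&r->done, idx + 1, memory_order_release);
}

/*
 * Consumer side: 'seen' is how many descriptors the caller has processed.
 * Returns 1 and stores the next descriptor in *out if one is ready, else 0.
 */
int ring_poll(struct ring *r, uint32_t seen, uint32_t *out)
{
	/* Acquire: pairs with the release store in ring_complete(). */
	uint32_t done = atomic_load_explicit(&r->done, memory_order_acquire);

	if (seen == done)
		return 0;
	*out = r->desc[seen % RING_SIZE];
	return 1;
}
```

The ordering requirement is stated at the point of access, not attached to the type of the data, which is the property the barrier argument relies on.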
Is this a preempt issue in drivers/input/evdev.c
Hi,

When opening/closing evdev, the following code handles multi-client operation:

static int evdev_release(...)
{
	...
	if (!--evdev->open) {	/* <-- marked point */
		if (evdev->exist)
			input_close_device(...);
		else
			evdev_free(evdev);
	}
	return 0;
}

static int evdev_open(...)
{
	...
	if (!evdev->open++ && evdev->exist) {
		error = input_open_device(...);
		if (error) {
			...
		}
	}
	...
	return 0;
}

If evdev_release() is preempted at the marked point by another process that opens the evdev, the sequence of operations becomes:

	--evdev->open in evdev_release()
		-> preempted
	evdev->open++ and input_open_device()
		<- rescheduled
	input_close_device()

Should we introduce a mutex here? Or am I missing something?

Thanks.

Regards
Yin, Fengwei
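As a rough illustration of the mutex idea raised in this message, the userspace model below holds one lock across both the counter update and the open/close action, so a release can no longer be interleaved with an open. It is a sketch under assumptions, not a proposed patch: struct fake_evdev and its functions are hypothetical stand-ins, with pthread mutexes in place of kernel locking and a flag in place of input_open_device()/input_close_device().

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for struct evdev; not the kernel type. */
struct fake_evdev {
	pthread_mutex_t lock;
	int open;	/* number of clients holding the device open */
	int hw_opened;	/* 1 while the underlying device is open */
};

#define FAKE_EVDEV_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

void fake_evdev_open(struct fake_evdev *e)
{
	pthread_mutex_lock(&e->lock);
	if (e->open++ == 0)
		e->hw_opened = 1;	/* stands in for input_open_device() */
	pthread_mutex_unlock(&e->lock);
}

void fake_evdev_release(struct fake_evdev *e)
{
	pthread_mutex_lock(&e->lock);
	if (--e->open == 0)
		e->hw_opened = 0;	/* stands in for input_close_device() */
	pthread_mutex_unlock(&e->lock);
}
```

Because the decrement and the close happen under the same lock, an open arriving in between has to wait, and the device cannot end up closed while a client still counts as having it open.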
Re: [patch] ip_local_port_range sysctl has annoying default
In article <[EMAIL PROTECTED]> you wrote: > However, there are a large number of applications which have registered > ports in this range. And some applications that request random listening ports actually query the /etc/services file to ensure it is an "unnamed" port. Gruss Bernd
Re: [PATCH] spelling fixes: init/
On 5/12/07, Simon Arlott <[EMAIL PROTECTED]> wrote: Spelling fix in init/. Signed-off-by: Simon Arlott <[EMAIL PROTECTED]> --- init/main.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/init/main.c b/init/main.c index e8d080c..7ee2031 100644 --- a/init/main.c +++ b/init/main.c @@ -275,7 +275,7 @@ static int __init unknown_bootoption(char *param, char *val) return 0; /* -* Preemptive maintenance for "why didn't my mispelled command +* Preemptive maintenance for "why didn't my misspelled command * line work?" That was probably intentional.
Re: [RFC PATCH] kbuild: silence section mismatch warnings
Hi Sam, On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote: > Following patch allow us in specific places to silence section > mismatch warnings. Well, I had spelled out my reservations about this earlier, but I don't feel too strongly. Most people probably do not want / prefer to see warnings (even if they know they're false positives), and this also helps reduce unnecessary reports of known FPs on lkml. > The annotation is simple to grep for so revieing all uses in a few > months time are trivial. It is assumed that a few places will > use this to shut up the warning as replacement for the real fix. > But these cases are esay to spot and to fix up. Yes, so we have to be careful with its use. On 5/12/07, Kumar Gala <[EMAIL PROTECTED]> wrote: Its unclear if you expect that some things will be tagged __init_refok/__initdata_refok forever or if we'll find some way to fix/change the code so the things tagged no longer need it. We'll _have_ to fix those bugs that use the whitelisting in modpost merely to kill off a warning, need to fix binutils for some others, and may have to live with this for still others (mm/sl*b.c suffer from a chicken-and-egg problem, for example). On May 11, 2007, at 4:08 PM, Sam Ravnborg wrote: With this and the following two patches I have a section mismatch free build. The plan is that a section mismatch soon will graduate from a warning to an error. Yes, that's only sane. diff --git a/include/linux/init.h b/include/linux/init.h [...] +/* modpost check for references from .text to .init.text and likewise + * from .data to .init.data. They are in most cases sign of bugs but You may want to list all illegal combinations in the comment above check_sec_ref instead, and only introduce __init{data}_refok here. + * in a few places this is OK. The following can be used to tell + * modpost that such a reference is OK. + * For references to .exit.text and .exit.data the same annotation + * will silence warnings from modpost. 
+ */ +#define __init_refok noinline __attribute__ ((__section__ (".text_initrefok"))) +#define __initdata_refok noinline __attribute__ ((__section__ (".data_initrefok"))) Actually, for a second there I got confused you had done this the other way round. __init_refok sounds similar to __init (almost a "variant" of __init) so I thought you were annotating the _callees_ and not the _callers_. BTW, I wonder if there would be any relative merits of doing things that way. Did you consider this "reversed" approach? Hmmm ... we would be annotating lesser functions, for one. With the current __init_refok-for-callers semantics, we mark the callee __init _and_ the caller __init_refok, which is unnecessary double-work, and will only get worse if the same __init callee is called by multiple callers in .text. For the case where a caller references multiple __init callees? I don't see any relative advantage of either scheme, or is there ... Also, it's easier to spot a function that _is_ (or should be) __init in the code already, than see if it is being referenced from .text, and if so, make it __init_whatever (__init_refok is an equally good name for callee-semantics). (looking at code in 21-mm2) I looked at scripts/mod/modpost.c only briefly, but it seems to me shifting the semantics of __init_refok to refer to callees could also subsume (and make redundant) patterns #1, 2, 6, 7 and 9 of secref_whitelist, no? [BTW I noticed _three_ whitelisting functions in there -- could it be possible for us to do what init_section_ref_ok and exit_section_ref_ok do in secref_whitelist itself? Those three whitelists are beginning to look darn ugly.] Anyway, somehow the __init_refok-callee scheme seems saner to me -- please do consider it. Note, in that case, however: 1. We'll have to invent separate __exit_refok and __exitdata_refok, of course. 
But that's a good thing for me, compared to giving a multiplexed definition to __init_refok to mean caller-can-safely-call-__init && caller-can-safely-call-__exit. 2. The init section freeing code in the kernel would need to be patched to free sections marked as __init_refok, __initdata_refok, __exit_refok and __exitdata_refok too. But that would most likely be a trivial patch. 3. When the exception / bug is fixed, we would convert such __init_refok annotations to __init. diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 113dc77..986200b 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -582,6 +582,14 @@ static int strrcmp(const char *s, const char *sub) /** ^^^ should be simply /* That's a doc-book-style comment header for a comment that's actually not doc-book-style. Randy gets angry. Thanks, Satyam - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [announce] Intel announces the PowerTOP utility for Linux
On Sat, May 12, 2007 at 02:40:52AM +0100, Jose Celestino wrote: > Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]: > > On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote: > > > > > > What's eating the battery life of my laptop? Why isn't it many more > > > hours? Which software component causes the most power to be burned? > > > These are important questions without a good answer... until now. > > > > I get: > > > > No detailed statistics available; please enable the CONFIG_TIMER_STATS > > kernel option > > > > Must run as root (rw to /proc/timer_stats is needed). That file doesn't exist, despite CONFIG_TIMER_STATS being in /proc/config.gz. -- Mathematics is the supreme nostalgia of our time.
Re: [PATCH 1/12] crypto: don't pollute the global namespace with sg_next()
Benny Halevy <[EMAIL PROTECTED]> wrote: > > I was trying to say that the methods should be compatible, otherwise > bugs can happen, and that your scheme is better since it can > handle sglists with zero length entries that aren't the last. > A case that might be valid after dma mapping and merging. > If indeed this case is possible, this seems to be the right time > to converge to your scheme. Well right now this isn't possible because the crypto layer is not directly hooked to any DMA code since it's (mostly) software only. However, I completely agree that it should be converted to this new scheme. The only user of chaining right now is crypto/hmac.c so it should be easy to fix. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[patch 2/2] epoll locks changes and cleanups ...
Changes the rwlock to a spinlock, and drops the use-count variable. Operations are always bound by the mutex now, so the use-count is no more needed. For the same reason, the rwlock can become a simple spinlock. Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]> - Davide Index: linux-2.6.21/fs/eventpoll.c === --- linux-2.6.21.orig/fs/eventpoll.c2007-05-11 17:21:25.0 -0700 +++ linux-2.6.21/fs/eventpoll.c 2007-05-11 19:20:32.0 -0700 @@ -1,6 +1,6 @@ /* - * fs/eventpoll.c ( Efficent event polling implementation ) - * Copyright (C) 2001,...,2006 Davide Libenzi + * fs/eventpoll.c (Efficent event polling implementation) + * Copyright (C) 2001,...,2007 Davide Libenzi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -44,8 +44,8 @@ * There are three level of locking required by epoll : * * 1) epmutex (mutex) - * 2) ep->mtx (mutes) - * 3) ep->lock (rw_lock) + * 2) ep->mtx (mutex) + * 3) ep->lock (spinlock) * * The acquire order is the one listed above, from 1 to 3. * We need a spinlock (ep->lock) because we manipulate objects @@ -140,6 +140,12 @@ /* List header used to link this structure to the eventpoll ready list */ struct list_head rdllink; + /* +* Works together "struct eventpoll"->ovflist in keeping the +* single linked chain of items. +*/ + struct epitem *next; + /* The file descriptor information this item refers to */ struct epoll_filefd ffd; @@ -152,23 +158,11 @@ /* The "container" of this item */ struct eventpoll *ep; - /* The structure that describe the interested events and the source fd */ - struct epoll_event event; - - /* -* Used to keep track of the usage count of the structure. This avoids -* that the structure will desappear from underneath our processing. 
-*/ - atomic_t usecnt; - /* List header used to link this item to the "struct file" items list */ struct list_head fllink; - /* -* Works together "struct eventpoll"->ovflist in keeping the -* single linked chain of items. -*/ - struct epitem *next; + /* The structure that describe the interested events and the source fd */ + struct epoll_event event; }; /* @@ -178,7 +172,7 @@ */ struct eventpoll { /* Protect the this structure access */ - rwlock_t lock; + spinlock_t lock; /* * This mutex is used to ensure that files are not removed @@ -394,78 +388,11 @@ } /* - * Unlink the "struct epitem" from all places it might have been hooked up. - * This function must be called with write IRQ lock on "ep->lock". - */ -static int ep_unlink(struct eventpoll *ep, struct epitem *epi) -{ - int error; - - /* -* It can happen that this one is called for an item already unlinked. -* The check protect us from doing a double unlink ( crash ). -*/ - error = -ENOENT; - if (!ep_rb_linked(&epi->rbn)) - goto error_return; - - /* -* Clear the event mask for the unlinked item. This will avoid item -* notifications to be sent after the unlink operation from inside -* the kernel->userspace event transfer loop. -*/ - epi->event.events = 0; - - /* -* At this point is safe to do the job, unlink the item from our rb-tree. -* This operation togheter with the above check closes the door to -* double unlinks. -*/ - ep_rb_erase(&epi->rbn, &ep->rbr); - - /* -* If the item we are going to remove is inside the ready file descriptors -* we want to remove it from this list to avoid stale events. -*/ - if (ep_is_linked(&epi->rdllink)) - list_del_init(&epi->rdllink); - - error = 0; -error_return: - - DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_unlink(%p, %p) = %d\n", -current, ep, epi->ffd.file, error)); - - return error; -} - -/* - * Increment the usage count of the "struct epitem" making it sure - * that the user will have a valid pointer to reference. 
- */ -static void ep_use_epitem(struct epitem *epi) -{ - atomic_inc(&epi->usecnt); -} - -/* - * Decrement ( release ) the usage count by signaling that the user - * has finished using the structure. It might lead to freeing the - * structure itself if the count goes to zero. - */ -static void ep_release_epitem(struct epitem *epi) -{ - if (atomic_dec_and_test(&epi->usecnt)) - kmem_cache_free(epi_cache, epi); -} - -/* * Removes a "struct epitem" from the eventpoll RB tree and deallocates - * all the associated resources. + * all the associated resources. Must be called with "mtx" held. */ static int ep
Re: [PATCH]: Fix assertion failure with MSI on sparc64
On Fri, 2007-05-11 at 13:26 -0700, David Miller wrote: > Hi Michael, I'm still working through the various regressions on > sparc64 added by your MSI changes :-) Hi Dave, Guilty as charged - I did CC you on the patches though ;) > The one I fixed the other day was a missed switch over to > alloc_pci_dev() in the sparc64 PCI probing code which caused an OOPS > in pci_enable_msi() because the list head of the pci dev was not > initialized. PowerPC's OBP firmware tree based PCI probing code > was updated, sparc64's wasnt. Sorry - not sure how I missed that one, it even matches "k.alloc(.*pci_dev" - thanks for fixing it :) > Today's find is a triggered assertion in msi_free_irqs() when the > system doesn't support MSI, in which case arch_setup_msi_irqs() always > returns an error. What do you need to determine that the system can't support MSI? Could you do that logic in arch_msi_check_device()? > The problem is that when this happens we branch into msi_free_irqs(), > to which you added the following assertion loop: > > list_for_each_entry(entry, &dev->msi_list, list) > BUG_ON(irq_has_action(entry->irq)); > > Well, if arch_setup_msi_irqs() fails, entry->irq will be zero and > although that's never assigned to any normal devices we use that IRQ > number for the timer interrupt on sparc64 so this assertion triggers. > > Better to test for zero before doing the irq_has_action() assertion > thing. Yep, looks good - it matches the logic in arch_teardown_msi_irqs(). cheers Acked-by: Michael Ellerman <[EMAIL PROTECTED]> > Signed-off-by: David S. 
Miller <[EMAIL PROTECTED]> > > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c > index e6740d1..d9cbd58 100644 > --- a/drivers/pci/msi.c > +++ b/drivers/pci/msi.c > @@ -549,8 +549,10 @@ static int msi_free_irqs(struct pci_dev* dev) > { > struct msi_desc *entry, *tmp; > > - list_for_each_entry(entry, &dev->msi_list, list) > - BUG_ON(irq_has_action(entry->irq)); > + list_for_each_entry(entry, &dev->msi_list, list) { > + if (entry->irq) > + BUG_ON(irq_has_action(entry->irq)); > + } > > arch_teardown_msi_irqs(dev); > -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part
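The effect of the fix above (skip entries whose irq was never assigned before asserting on them) can be modelled in a few lines of userspace C. This is a sketch only: struct fake_msi_desc and count_busy_irqs() are hypothetical, the has_action field stands in for irq_has_action(), and a counter replaces BUG_ON() so the behaviour can be observed without crashing.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msi_desc; only two fields are modelled. */
struct fake_msi_desc {
	unsigned int irq;	/* 0 means the arch setup never assigned one */
	int has_action;		/* stands in for irq_has_action(irq) */
};

/*
 * Count the entries that would trip the assertion, skipping entries with
 * irq == 0 in the same way the patched loop does.
 */
size_t count_busy_irqs(const struct fake_msi_desc *entries, size_t n)
{
	size_t i, busy = 0;

	for (i = 0; i < n; i++) {
		if (entries[i].irq == 0)
			continue;	/* setup failed: nothing to check */
		if (entries[i].has_action)
			busy++;
	}
	return busy;
}
```

An entry with irq 0 and an action (the sparc64 timer case described above) is ignored, while a genuinely assigned irq that still has an action is still caught.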
[patch 1/2] fix epoll single pass code and add wait-exclusive flag ...
Fixes the epoll single pass code. During the unlocked event delivery (to userspace) code, the poll callback can re-issue new events, and we must receive them correctly. Since we loop in a lockless fashion, we want to be O(nready), and we don't want to flash on/off the spinlock for every event, we have the poll callback to use a secondary list to queue events while we're inside the event delivery loop. The rw_semaphore has been turned into a mutex. This patch also adds the wait-exclusive flag, as suggested by Davi Arnaut. Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]> - Davide Index: linux-2.6.21/fs/eventpoll.c === --- linux-2.6.21.orig/fs/eventpoll.c2007-05-11 14:32:31.0 -0700 +++ linux-2.6.21/fs/eventpoll.c 2007-05-11 16:33:38.0 -0700 @@ -26,7 +26,6 @@ #include #include #include -#include #include #include #include @@ -39,14 +38,13 @@ #include #include #include -#include /* * LOCKING: * There are three level of locking required by epoll : * * 1) epmutex (mutex) - * 2) ep->sem (rw_semaphore) + * 2) ep->mtx (mutes) * 3) ep->lock (rw_lock) * * The acquire order is the one listed above, from 1 to 3. @@ -57,20 +55,20 @@ * a spinlock. During the event transfer loop (from kernel to * user space) we could end up sleeping due a copy_to_user(), so * we need a lock that will allow us to sleep. This lock is a - * read-write semaphore (ep->sem). It is acquired on read during - * the event transfer loop and in write during epoll_ctl(EPOLL_CTL_DEL) - * and during eventpoll_release_file(). Then we also need a global - * semaphore to serialize eventpoll_release_file() and ep_free(). - * This semaphore is acquired by ep_free() during the epoll file + * mutex (ep->mtx). It is acquired during the event transfer loop, + * during epoll_ctl(EPOLL_CTL_DEL) and during eventpoll_release_file(). + * Then we also need a global mutex to serialize eventpoll_release_file() + * and ep_free(). 
+ * This mutex is acquired by ep_free() during the epoll file * cleanup path and it is also acquired by eventpoll_release_file() * if a file has been pushed inside an epoll set and it is then * close()d without a previous call toepoll_ctl(EPOLL_CTL_DEL). - * It is possible to drop the "ep->sem" and to use the global - * semaphore "epmutex" (together with "ep->lock") to have it working, - * but having "ep->sem" will make the interface more scalable. + * It is possible to drop the "ep->mtx" and to use the global + * mutex "epmutex" (together with "ep->lock") to have it working, + * but having "ep->mtx" will make the interface more scalable. * Events that require holding "epmutex" are very rare, while for - * normal operations the epoll private "ep->sem" will guarantee - * a greater scalability. + * normal operations the epoll private "ep->mtx" will guarantee + * a better scalability. */ #define DEBUG_EPOLL 0 @@ -102,6 +100,8 @@ #define EP_MAX_EVENTS (INT_MAX / sizeof(struct epoll_event)) +#define EP_UNACTIVE_PTR ((void *) -1L) + struct epoll_filefd { struct file *file; int fd; @@ -111,7 +111,7 @@ * Node that is linked into the "wake_task_list" member of the "struct poll_safewake". * It is used to keep track on all tasks that are currently inside the wake_up() code * to 1) short-circuit the one coming from the same task and same wait queue head - * ( loop ) 2) allow a maximum number of epoll descriptors inclusion nesting + * (loop) 2) allow a maximum number of epoll descriptors inclusion nesting * 3) let go the ones coming from other tasks. */ struct wake_task_node { @@ -130,6 +130,48 @@ }; /* + * Each file descriptor added to the eventpoll interface will + * have an entry of this type linked to the "rbr" RB tree. 
+ */ +struct epitem { + /* RB-Tree node used to link this structure to the eventpoll rb-tree */ + struct rb_node rbn; + + /* List header used to link this structure to the eventpoll ready list */ + struct list_head rdllink; + + /* The file descriptor information this item refers to */ + struct epoll_filefd ffd; + + /* Number of active wait queue attached to poll operations */ + int nwait; + + /* List containing poll wait queues */ + struct list_head pwqlist; + + /* The "container" of this item */ + struct eventpoll *ep; + + /* The structure that describe the interested events and the source fd */ + struct epoll_event event; + + /* +* Used to keep track of the usage count of the structure. This avoids +* that the structure will desappear from underneath our processing. +*/ + atomic_t usecnt; + + /* List header used to link this item to the "struct file" items list */ + struct list_head fllink; + + /* +* Works together "struct eventpoll"->ovflist in keeping the +* single linked chain of items. +
Re: [PATCH] MAINTAINERS: remove invalid list address for TPM
On Fri, 11 May 2007 09:19:32 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: > > On Fri, 11 May 2007 16:42:07 +1000 Stephen Rothwell wrote: > > > This address bounces with "550 Unknown user". > > > > Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]> > > --- > > MAINTAINERS |1 - > > 1 files changed, 0 insertions(+), 1 deletions(-) > > Hm, I get: > > Your mail to 'tpmdd-devel' with the subject > test only > Is being held until the list moderator can review it for approval. > The reason it is being held: > Post by non-member to a members-only list Hmm, interesting. I did get the "550 Unknown user" bounce. I guess some glitch in sourceforge.net's mail system? > so what MAINTAINERS usually says in that case is "(subscribers-only)": > > L:[EMAIL PROTECTED] (subscribers-only) Fine, that seems sensible. -- Cheers, Stephen Rothwell[EMAIL PROTECTED] http://www.canb.auug.org.au/~sfr/ pgpOows0ZBJSo.pgp Description: PGP signature
Re: [PATCH] update sysfs kset initialisation in PPC64 DLPAR IO driver
On Fri, 2007-05-11 at 11:40 +0100, Andy Whitcroft wrote: > Michael Ellerman wrote: > > On Fri, 2007-05-11 at 00:16 -0700, Greg KH wrote: > >> On Thu, May 10, 2007 at 04:54:41PM +0100, Andy Whitcroft wrote: > >>> Greg KH wrote: > On Thu, May 10, 2007 at 03:00:50PM +0100, Andy Whitcroft wrote: > > Move the rpadlpar device from "struct subsystem" to "struct kset" > > following the changes in sysfs. > > > > Signed-off-by: Andy Whitcroft <[EMAIL PROTECTED]> > > --- > > > > Ok, this patch seems to sort out the compile problem > > here and indeed boots and runs kernbench. Perhaps > > you could confirm this is sufficient. > As per the discussion on the pci hotplug list, no, this doesn't seem to > fix the problem. The developers there are looking into it. If you can > test out patches for this, I'm sure the people there would appreciate > the help. > >>> Sure anything they have for testing, send them to me ... > >> They have the same patch that you made (I made it), yet they reported > >> that it didn't work properly for them. > >> > >> Can you test your patch out on "real" hardware? > > > > I tested it on real hardware, but it can't hurt for Andy to try it too I > > guess. > > To be fair I am not sure I have a clue how to test it. Got a recipe? > My patch was based on how other drivers seemed to be converted which is > a concern for those drivers. > > What sort of failure do you see? Prior to the removal of struct subysystem I get two files called 'add_slot' and 'remove_slot' under /sys/bus/pci/slots/control. With Greg's patch I get the directory /sys/bus/pci/slots/control, but nothing under it. Apparently John Rose is looking into it. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part
Re: [patch] ip_local_port_range sysctl has annoying default
David Miller wrote: > > All ports above and including 1024 are non-privileged and available to > anyone. > > Applications which have some requirements in this area need to work > those things out themselves. However, there are a large number of applications which have registered ports in this range. -hpa
[PATCH] 2.6.21-git15 - Kconfig Cleanup
Fix misc small issues/typos/grammar in Kconfigs for 2.6.21-git15. Signed-off-by: Matt LaPlante <[EMAIL PROTECTED]> -- diff -ru a/arch/arm/plat-s3c24xx/Kconfig b/arch/arm/plat-s3c24xx/Kconfig --- a/arch/arm/plat-s3c24xx/Kconfig 2007-04-25 23:08:32.0 -0400 +++ b/arch/arm/plat-s3c24xx/Kconfig 2007-05-11 21:44:06.0 -0400 @@ -70,7 +70,7 @@ help Set the chunksize in Kilobytes of the CRC for checking memory corruption over suspend and resume. A smaller value will mean that - the CRC data block will take more memory, but wil identify any + the CRC data block will take more memory, but will identify any faults with better precision. See diff -ru a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig --- a/arch/blackfin/Kconfig 2007-05-11 20:32:24.0 -0400 +++ b/arch/blackfin/Kconfig 2007-05-11 21:33:28.0 -0400 @@ -435,100 +435,100 @@ default y help If enabled interrupt entry code (STORE/RESTORE CONTEXT) is linked - into L1 instruction memory.(less latency) + into L1 instruction memory. (less latency) config EXCPT_IRQ_SYSC_L1 - bool "Locate entire ASM lowlevel excepetion / interrupt - Syscall and CPLB handler code in L1 Memory" + bool "Locate entire ASM lowlevel exception / interrupt - Syscall and CPLB handler code in L1 Memory" default y help - If enabled entire ASM lowlevel exception and interrupt entry code (STORE/RESTORE CONTEXT) is linked - into L1 instruction memory.(less latency) + If enabled, the entire ASM lowlevel exception and interrupt entry code + (STORE/RESTORE CONTEXT) is linked into L1 instruction memory. (less latency) config DO_IRQ_L1 bool "Locate frequently called do_irq dispatcher function in L1 Memory" default y help - If enabled frequently called do_irq dispatcher function is linked - into L1 instruction memory.(less latency) + If enabled, the frequently called do_irq dispatcher function is linked + into L1 instruction memory. 
(less latency) config CORE_TIMER_IRQ_L1 bool "Locate frequently called timer_interrupt() function in L1 Memory" default y help - If enabled frequently called timer_interrupt() function is linked - into L1 instruction memory.(less latency) + If enabled, the frequently called timer_interrupt() function is linked + into L1 instruction memory. (less latency) config IDLE_L1 bool "Locate frequently idle function in L1 Memory" default y help - If enabled frequently called idle function is linked - into L1 instruction memory.(less latency) + If enabled, the frequently called idle function is linked + into L1 instruction memory. (less latency) config SCHEDULE_L1 bool "Locate kernel schedule function in L1 Memory" default y help - If enabled frequently called kernel schedule is linked - into L1 instruction memory.(less latency) + If enabled, the frequently called kernel schedule is linked + into L1 instruction memory. (less latency) config ARITHMETIC_OPS_L1 bool "Locate kernel owned arithmetic functions in L1 Memory" default y help If enabled arithmetic functions are linked - into L1 instruction memory.(less latency) + into L1 instruction memory. (less latency) config ACCESS_OK_L1 bool "Locate access_ok function in L1 Memory" default y help - If enabled access_ok function is linked - into L1 instruction memory.(less latency) + If enabled, the access_ok function is linked + into L1 instruction memory. (less latency) config MEMSET_L1 bool "Locate memset function in L1 Memory" default y help - If enabled memset function is linked - into L1 instruction memory.(less latency) + If enabled, the memset function is linked + into L1 instruction memory. (less latency) config MEMCPY_L1 bool "Locate memcpy function in L1 Memory" default y help - If enabled memcpy function is linked - into L1 instruction memory.(less latency) + If enabled, the memcpy function is linked + into L1 instruction memory. 
(less latency) config SYS_BFIN_SPINLOCK_L1 bool "Locate sys_bfin_spinlock function in L1 Memory" default y help - If enabled sys_bfin_spinlock function is linked - into L1 instruction memory.(less latency) + If enabled, the sys_bfin_spinlock function is linked + into L1 instruction memory. (less latency) config IP_CHECKSUM_L1 bool "Locate IP Checksum function in L1 Memory" default n help - If enabled IP Checksum function is linked -
Re: [patch] ip_local_port_range sysctl has annoying default
Mark Glines wrote:
>
> By a one-in-a-million coincidence, this machine has a default port
> range starting with 2048, and this breaks things for me. I'm trying to
> run both klive and nfs on this box, but klive starts first (probably
> because of the filename sort order), and claims UDP port 2049 for its
> own purposes, causing the nfs server to fail to start.
>
> If the bind hash size is over a certain threshold, the range
> 32768-61000 is used. If it is under a certain threshold, a range
> like (1024|2048|3072)-4999 is used, depending on exactly how small it
> is. This box happened to get the 2048-4999 range, which broke nfs.
>
> A comment just above the code that does this says, "Try to be a bit
> smarter and adjust defaults depending on available memory." "smarter"?
> Maybe, maybe not. Either way, it's unexpected.
>
> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has. So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768. I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.
>

Yes, that would be better. The IANA recommended port range for dynamic
ports is 49152-65535; Linux extends this down to 32768 and chops off some
of the really high ports, but keeping them in the high range is thus the
right thing to do.

	-hpa
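For anyone wanting to check whether a given box is exposed to the collision described above, here is a minimal user-space sketch. It assumes the range is available as text in the "LOW HIGH" form that /proc/sys/net/ipv4/ip_local_port_range prints; the helper names parse_port_range() and port_is_ephemeral() are made up for illustration and are not kernel interfaces.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse a "LOW HIGH" pair, as read from the
 * ip_local_port_range sysctl file, into two integers.
 */
static bool parse_port_range(const char *text, int *low, int *high)
{
	return sscanf(text, "%d %d", low, high) == 2 && *low <= *high;
}

/*
 * Hypothetical helper: true if the kernel could hand out `port` to an
 * application that did not ask for a specific port.
 */
static bool port_is_ephemeral(int low, int high, int port)
{
	return port >= low && port <= high;
}
```

With the 2048-4999 range from the report, port_is_ephemeral(low, high, 2049) is true, so nfs can lose its port to whichever service starts first; with a range starting at 32768 it is false.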
Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>
On Sat, 12 May 2007 10:33:00 +0900 Paul Mundt <[EMAIL PROTECTED]> wrote:

> On Fri, May 11, 2007 at 11:39:15AM -0700, Andrew Morton wrote:
> > On Fri, 11 May 2007 09:57:50 -0700
> > [EMAIL PROTECTED] wrote:
> >
> > > > I'll take a look at tidying up the PMB slab, getting rid of the dtor
> > > > shouldn't be terribly painful. I simply opted to do the list management
> > > > there since others were doing it for the PGD slab cache at the time that
> > > > was written.
> > >
> > > And here's the bit for dropping pmb_cache_dtor(), moving the list
> > > management up to pmb_alloc() and pmb_free().
> > >
> > > With this applied, we're all set for killing off slab destructors
> > > from the kernel entirely.
> >
> > hm, this is already in Paul's git tree.
> >
> > If we're going to slam all this into 2.6.22 then I can just tempdrop Paul's
> > tree.
> >
> > However I think we've done enough slab work for 2.6.22 now so I'm inclined
> > to queue these changes for 2.6.23. That would mean that the slab changes in
> > -mm have a dependency on the sh git tree which I am sure to forget about.
> > If I end up merging these changes before Paul merges his tree, sh will
> > break. Presumably Paul will notice this ;)
>
> I can prune it from my tree if you'd rather just bundle these together, I
> wasn't sure what the timeline for these changes were, so I opted just to
> toss the PMB rework in my git tree ahead of time.
>
> On the other hand, if Christoph's changes are going to be queued for
> 2.6.23, the PMB changes will trickle in well before then anyways.

It looks like we'll be going the latter trickle-in way, thanks.
Re: Re: [announce] Intel announces the PowerTOP utility for Linux
Words by Matt Mackall [Fri, May 11, 2007 at 07:17:19PM -0500]:
> On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
> >
> > What's eating the battery life of my laptop? Why isn't it many more
> > hours? Which software component causes the most power to be burned?
> > These are important questions without a good answer... until now.
>
> I get:
>
> No detailed statistics available; please enable the CONFIG_TIMER_STATS
> kernel option
>

Must run as root (rw to /proc/timer_stats is needed).

--
Jose Celestino
http://www.msversus.org/ ; http://techp.org/petition/show/1
http://www.vinc17.org/noswpat.en.html

"And on the trillionth day, Man created Gods." -- Thomas D. Pate
Re: [PATCH] spelling fixes: arch/sh/
On Fri, May 11, 2007 at 08:43:12PM +0100, Simon Arlott wrote:
> Spelling fixes in arch/sh/.
>
> Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>

Applied, thanks.
Re: [PATCH] spelling fixes: arch/sh64/
On Fri, May 11, 2007 at 08:43:19PM +0100, Simon Arlott wrote:
> Spelling fixes in arch/sh64/.
>
> Signed-off-by: Simon Arlott <[EMAIL PROTECTED]>

Applied, thanks.
Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>
On Fri, May 11, 2007 at 11:39:15AM -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 09:57:50 -0700
> [EMAIL PROTECTED] wrote:
>
> > > I'll take a look at tidying up the PMB slab, getting rid of the dtor
> > > shouldn't be terribly painful. I simply opted to do the list management
> > > there since others were doing it for the PGD slab cache at the time that
> > > was written.
> >
> > And here's the bit for dropping pmb_cache_dtor(), moving the list
> > management up to pmb_alloc() and pmb_free().
> >
> > With this applied, we're all set for killing off slab destructors
> > from the kernel entirely.
>
> hm, this is already in Paul's git tree.
>
> If we're going to slam all this into 2.6.22 then I can just tempdrop Paul's
> tree.
>
> However I think we've done enough slab work for 2.6.22 now so I'm inclined
> to queue these changes for 2.6.23. That would mean that the slab changes in
> -mm have a dependency on the sh git tree which I am sure to forget about.
> If I end up merging these changes before Paul merges his tree, sh will
> break. Presumably Paul will notice this ;)

I can prune it from my tree if you'd rather just bundle these together. I
wasn't sure what the timeline for these changes was, so I opted just to
toss the PMB rework in my git tree ahead of time.

On the other hand, if Christoph's changes are going to be queued for
2.6.23, the PMB changes will trickle in well before then anyways.
Re: [patch 2/2] Slab allocators: Drop support for destructors
On Fri, May 11, 2007 at 09:57:51AM -0700, [EMAIL PROTECTED] wrote:
> There is no user of destructors left. There is no reason why we should
> keep checking for destructors calls in the slab allocators.
>
> The RFC for this patch was discussed at
> http://marc.info/?l=linux-kernel&m=117882364330705&w=2
>
> Destructors were mainly used for list management which required them to take a
> spinlock. Taking a spinlock in a destructor is a bit risky since the slab
> allocators may run the destructors anytime they decide a slab is no longer
> needed.
>
> Patch drops destructor support. Any attempt to use a destructor will BUG().
>
> Cc: Pekka Enberg <[EMAIL PROTECTED]>
> Cc: Paul Mundt <[EMAIL PROTECTED]>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
Acked-by: Paul Mundt <[EMAIL PROTECTED]>
Re: [Bugme-new] [Bug 8462] New: applications under wine freezes
--- Davide Libenzi wrote:
> Charles, would you mind trying the patch below against -git13 on your
> machine. I tested it with wine and firefox on a 32 bit P4 with HT and
> it's working fine.

I applied the patch against git13. StarCraft, PokerStars, and Firefox
under wine have not frozen. Looks good.
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Sat, 12 May 2007, Oleg Nesterov wrote:
>
> However, in my opinion THAT PATCH has nothing to do with this problem.
> It just improves the code that we already have.

Sure. However, I think it does it THE WRONG WAY, and doesn't actually fix
the much deeper problems with the freezer, as shown by the fact that the
lock is *still* broken for other cases.

So, here's a summary:

 - we should not take the lock inside the function, because taking it
   there is fundamentally wrong, and leaves all the *other* races in place.

 - if you actually want to solve the other races, the lock needs to be
   taken by the caller, in which case taking it in the callee is obviously
   (again) wrong.

 - or then, we accept that the race wasn't fixed AT ALL, and you add other
   code to _other_ places to handle the case where you froze the wrong
   thread (or didn't freeze the right one).

   And I'm not making that up. Look at most of the other patches in that
   series: they are _exactly_ about the scenario I'm outlining.

 - the whole "kernel thread vs user thread" thing is the wrong thing to
   check in the first place, since we just should never touch kernel
   threads in the first place, and anything that wants to freeze user
   space should have disabled exec_usermodehelper() at a higher level

That's why I'm so unhappy. The "fix" is going in the wrong direction. Each
fix on its own may be an "improvement", but the end result of many of
the fixes is a total mess!

We can continue to add bandaids to something broken, until it "works". But
the end result, while "working", is not actually any better. Quite the
reverse - the end result of something like that is that you add all these
magic rules and special cases.

So in the end one ugly design decision leads to broken locking, which in
turn leads to other cases where you add more broken code, which just leads
to a situation where nobody actually understands what the *design* is,
because there simply *isn't* any design - it's just a hodge-podge of "but
this fixes a bug" ad-hoc "fixes".

		Linus
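The callee-locked versus caller-locked distinction in the summary above can be illustrated with a small user-space sketch. This is not the freezer code: struct task, its fields, and both function names are hypothetical stand-ins, and a pthread mutex plays the role of task_lock().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct task {
	pthread_mutex_t lock;
	void *mm;	/* NULL stands for "kernel thread" */
	bool frozen;
};

/*
 * Callee-locked check: the read itself is consistent, but the answer
 * can be stale as soon as the lock is dropped.
 */
static bool is_user_space_racy(struct task *t)
{
	pthread_mutex_lock(&t->lock);
	bool ret = t->mm != NULL;
	pthread_mutex_unlock(&t->lock);
	return ret;	/* t->mm may already have changed here */
}

/*
 * Caller-locked pattern: the check and the action it guards share one
 * critical section, so the decision is still valid when it is used.
 */
static bool freeze_if_user_space(struct task *t)
{
	pthread_mutex_lock(&t->lock);
	bool user = t->mm != NULL;
	if (user)
		t->frozen = true;
	pthread_mutex_unlock(&t->lock);
	return user;
}
```

The sketch only shows why the lock has to span both the test and the use of its result; it says nothing about whether kernel threads should be frozen in the first place, which is the larger objection raised in the mail.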
Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311
On Fri, 11 May 2007 17:53:35 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> And indeed that's buggy - the non-debug version of spin_lock_mutex() is not
> irq-safe.
>
> I'd say that's pretty dumb of the mutex interface, really. Doing a
> mutex_trylock() should be OK from all contexts.

We can fix this in a low-impact fashion by making mutex_trylock() do a
spin_trylock() on mutex->wait_lock, no?
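A rough user-space sketch of that idea, using C11 atomics: the trylock path only ever *tries* the internal lock, so it cannot spin against a holder it interrupted. struct toy_mutex and its functions are hypothetical and are not the kernel's mutex implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy mutex; not the kernel's struct mutex. */
struct toy_mutex {
	atomic_flag wait_lock;	/* internal lock guarding `locked` */
	bool locked;
};

#define TOY_MUTEX_INIT { ATOMIC_FLAG_INIT, false }

/* Never spins: fails if either the internal lock or the mutex is busy. */
static bool toy_mutex_trylock(struct toy_mutex *m)
{
	if (atomic_flag_test_and_set_explicit(&m->wait_lock,
					      memory_order_acquire))
		return false;	/* wait_lock is held: give up, do not wait */

	bool got = !m->locked;
	if (got)
		m->locked = true;

	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
	return got;
}

/* Unlock may spin on the internal lock; only trylock must not. */
static void toy_mutex_unlock(struct toy_mutex *m)
{
	while (atomic_flag_test_and_set_explicit(&m->wait_lock,
						 memory_order_acquire))
		;
	m->locked = false;
	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
}
```

One consequence of this design is that a trylock can fail even though the mutex itself is free, simply because the internal lock was momentarily held; callers of a trylock have to cope with failure anyway.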
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Saturday, 12 May 2007 02:08, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> >
> > things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
>
> ->mm is not stable *regardless*!
>
> Trivial examples:
>  - kernel thread does execve()
>  - user thread does exit().
>
> The "use_mm()" and "unuse_mm()" things are total red herrings.
>
> If the freezer depends on the difference between user and kernel threads,
> then THAT PATCH IS BUGGY. It's that simple. It tests something that simply
> isn't stable outside the lock, and then returns that value after having
> unlocked it.
>
> It might as well return a random number.
>
> > However, the return value == 0 does not change in that particular case,
> > exactly because is_user_space() takes task_lock().
>
> As does exit_mm() etc.
>
> That's NOT THE POINT. You cannot use the end result after releasing the
> task lock, because the moment you release the task lock, it becomes
> totally irrelevant, and may not be true any more.
>
> Example (a):
>  - you ask "is_user_space(p)", it returns 1.
>  - before you actually have time to do anything about it, the task exits,
>    and (since you don't hold the lock any more) will now have a NULL
>    tsk->mm again (and would now return 0 if you called it again).

In which case we won't be freezing this task at all.

> Example (b):
>  - you ask "is_user_space(p)" and it returns 0, because it's a kernel
>    thread
>  - before you actually do anything about it (but after you released the
>    task lock), the kernel thread does an "execve(/sbin/hotplug)" and is no
>    longer a kernel thread.

This is a special case that needs special handling.

> In both cases will the caller have a return value THAT IS NO LONGER TRUE.
>
> See? The locking was pointless. Exactly because you release the lock
> before the user can actually do anything about the return value!
>
> The fact that the locking protects against the very specific case of AIO
> where the threads _stay_ user tasks and don't really change is pretty much
> irrelevant, as far as I can see.

Well, I disagree.

We need the locking *exactly* to avoid situations in which the threads don't
really change, but we might think that they *have changed*. More precisely,
it's needed, because without it kernel threads which execute
use_mm()/unuse_mm() might be identified as user space processes, and that
would be wrong.

The other cases are beyond the scope of this patch.

Rafael
Re: [patch] ip_local_port_range sysctl has annoying default
On Sat, 12 May 2007 00:06:45 UTC David Miller <[EMAIL PROTECTED]> wrote:
> All ports above and including 1024 are non-privileged and available to
> anyone.
>
> Applications which have some requirements in this area need to work
> those things out themselves.

Hi David,

I agree completely. My issue is that an application which doesn't care
which port it binds to (twistd, on klive's behalf) stomped on the port of
an application which cares very much about which port it binds to (nfs).
I will gladly accept *any* solution to this problem.

I agree that it would be preferable to change the port NFS decides to
bind to. If you have a patch to do this, I will happily apply it and go
on my merry way. However, the world we live in does have port numbers
exceeding 1024 listed in /etc/services.

What I'd like to know is, for applications which don't care what port
they get, the kernel will assign values of 32768 and above on some
machines, but not others. (Based on their bind hash size.) Starting from
32768 seems like very sane behavior to me, because it minimizes the
chances of a collision, and (as far as I know) doesn't cost anything.

A configuration which stomps on a not-entirely-unknown application like
nfs *by default* isn't necessarily a bug, but it is a worst case
scenario, from the perspective of a lowly user like me, who wants things
to Just Work. :)

Is there a compelling reason not to assign random ports starting from
32768 everywhere regardless of their bind hash size, like my patch
attempts to do? Does it consume any extra resources to do so?

Thanks,

Mark
Re: [PATCH] spelling fixes: arch/m68knommu/
On Fri, 11 May 2007, Simon Arlott wrote:
> - * Local routines to interrcept the standard I/O and vector handling
> - * code. Don't include this 'till now - initialization code above needs
> + * Local routines to intercept the standard I/O and vector handling
> + * code. Don't include this until now - initialization code above needs
>   * access to the real code too.

What's wrong with 'til?

> - * Sub-architcture dependant initialization code for the Freescale
> + * Sub-architcture dependent initialization code for the Freescale
...
> - * Sub-architcture dependant initialization code for the Freescale
> + * Sub-architcture dependent initialization code for the Freescale
...
> - * Sub-architcture dependant initialization code for the Motorola
> + * Sub-architcture dependent initialization code for the Motorola

You want "Sub-architecture".

-f
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On 05/12, Oleg Nesterov wrote:
>
> Do we need freezer? Should we freeze kernel threads? I can't judge. I tried
> to read a long thread about suspend, and failed to understand it.
>
> I personally think we can simplify things if CPU-hotplug use freezer, at
> least.

Just a small example, debug_smp_processor_id:

	/*
	 * Kernel threads bound to a single CPU can safely use
	 * smp_processor_id():
	 */
	this_mask = cpumask_of_cpu(this_cpu);

	if (cpus_equal(current->cpus_allowed, this_mask))
		goto out;

This is not true with CONFIG_HOTPLUG_CPU. This becomes true if we freeze
the kernel threads from CPU_DOWN_PREPARE to CPU_DEAD.

Oleg.
Re: 2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311
On Fri, 11 May 2007 19:21:15 -0500 Matt Mackall <[EMAIL PROTECTED]> wrote:

> This just hit:
>
> [7.856000] usbcore: registered new interface driver usbhid
> [7.86] BUG: at kernel/mutex.c:311 __mutex_trylock_slowpath()
> [7.868000] [] show_trace_log_lvl+0x1a/0x30
> [7.872000] [] show_trace+0x12/0x14
> [7.876000] [] dump_stack+0x15/0x17
> [7.88] [] mutex_trylock+0x56/0x15a
> [7.888000] [] hdaps_mousedev_poll+0x10/0xcb
> [7.892000] [] run_timer_softirq+0x10e/0x16f
> [7.896000] [] __do_softirq+0x5d/0xc0
> [7.90] [] do_softirq+0x6e/0xf0
> [7.904000] [] irq_exit+0x3e/0x7b
> [7.912000] [] do_IRQ+0x9d/0xb2
> [7.916000] [] common_interrupt+0x2e/0x34
> [7.92] [] printk+0x1b/0x1d
> [7.924000] [] usb_register_driver+0xa0/0xe5
> [7.928000] [] hid_init+0x28/0x51
> [7.932000] [] kernel_init+0xbc/0x23e
> [7.94] [] kernel_thread_helper+0x7/0x10
> [7.944000] ===
> [7.948000] drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver
>
> Looks like it's triggered by the HDAPS driver.
>

It's complaining about a mutex_trylock() being run from irq context.

And indeed that's buggy - the non-debug version of spin_lock_mutex() is not
irq-safe.

I'd say that's pretty dumb of the mutex interface, really. Doing a
mutex_trylock() should be OK from all contexts.

This is caused by a recent semaphore->mutex conversion and it's in mainline
now. Ho hum.

I suppose a suitable workaround would be to convert hdaps_mtx back into a
semaphore. ug.
Re: [patch] Refine SCREEN_INFO sanity check for vgacon initialization.
Gerd Hoffmann <[EMAIL PROTECTED]> writes:

> Hi,
>
> Checking video mode field only to see whenever SCREEN_INFO is
> initialized is not enough, in some cases it is zero although
> a vga card is present. Lets additionally check cols and lines.

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

>
> Updates f82af20e1a028e16b9bb11da081fa1148d40fa6a, should go
> into 2.6.22.
>
> please apply,
>   Gerd
>
> Refine SCREEN_INFO sanity check for vgacon initialization.
>
> Checking video mode field only to see whenever SCREEN_INFO is
> initialized is not enough, in some cases it is zero although
> a vga card is present. Lets additionally check cols and lines.
>
> Signed-off-by: Gerd Hoffmann <[EMAIL PROTECTED]>
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Andi Kleen <[EMAIL PROTECTED]>
> Cc: Alan <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>
> Cc: Eric W. Biederman <[EMAIL PROTECTED]>
> ---
>  drivers/video/console/vgacon.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> ===
> --- vanilla-2.6.21-git11.orig/drivers/video/console/vgacon.c
> +++ vanilla-2.6.21-git11/drivers/video/console/vgacon.c
> @@ -368,9 +368,14 @@ static const char *vgacon_startup(void)
>  #endif
>  	}
>
> +	/* SCREEN_INFO initialized? */
> +	if ((ORIG_VIDEO_MODE == 0) &&
> +	    (ORIG_VIDEO_LINES == 0) &&
> +	    (ORIG_VIDEO_COLS == 0))
> +		goto no_vga;
> +
>  	/* VGA16 modes are not handled by VGACON */
> -	if ((ORIG_VIDEO_MODE == 0x00) ||	/* SCREEN_INFO not initialized */
> -	    (ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
> +	if ((ORIG_VIDEO_MODE == 0x0D) ||	/* 320x200/4 */
>  	    (ORIG_VIDEO_MODE == 0x0E) ||	/* 640x200/4 */
>  	    (ORIG_VIDEO_MODE == 0x10) ||	/* 640x350/4 */
>  	    (ORIG_VIDEO_MODE == 0x12) ||	/* 640x480/4 */
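The effect of the hunk can be restated as two standalone predicates. This is only an illustrative sketch: struct screen_fields and both function names are hypothetical, and the VGA16 list contains just the four modes visible in the quoted hunk (the real condition may continue past it).

```c
#include <stdbool.h>

/* Hypothetical subset of the boot-time screen info fields. */
struct screen_fields {
	unsigned char mode;
	unsigned char lines;
	unsigned char cols;
};

/*
 * After the patch: treat the structure as uninitialized only when mode,
 * lines and cols are all zero, not when mode alone is zero.
 */
static bool screen_info_uninitialized(const struct screen_fields *s)
{
	return s->mode == 0 && s->lines == 0 && s->cols == 0;
}

/* The VGA16 modes shown in the hunk, which vgacon does not handle. */
static bool is_vga16_mode(unsigned char mode)
{
	return mode == 0x0D ||	/* 320x200/4 */
	       mode == 0x0E ||	/* 640x200/4 */
	       mode == 0x10 ||	/* 640x350/4 */
	       mode == 0x12;	/* 640x480/4 */
}
```

A card that reports mode 0 with, say, 80 columns and 25 lines is now treated as present, which is the case the old mode-only test got wrong.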
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On 05/11, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> >
> > things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
>
> ->mm is not stable *regardless*!
>
> Trivial examples:
>  - kernel thread does execve()
>  - user thread does exit().

Yes sure. Quoting myself,

	> true->false means daemonize() or do_exit(), seems harmless.
	>
	> false->true means exec from kernel space. That is why FREEZER_KERNEL_THREADS
	> in fact means all tasks, not only kernel threads.

> The "use_mm()" and "unuse_mm()" things are total red herrings.
>
> If the freezer depends on the difference between user and kernel threads,
> then THAT PATCH IS BUGGY. It's that simple.

This is another story, I can't comment because I am not educated enough.

However, in my opinion THAT PATCH has nothing to do with this problem.
It just improves the code that we already have.

> > However, the return value == 0 does not change in that particular case,
> > exactly because is_user_space() takes task_lock().
>
> As does exit_mm() etc.

Note the "in that particular case".

> See? The locking was pointless. Exactly because you release the lock
> before the user can actually do anything about the return value!

Yes. See the "Quoting myself" above.

> Anyway, I think the whole freezer thing is broken. There's no reason to
> freeze kernel threads.

It is not perfect. Rafael tries to improve it.

Do we need freezer? Should we freeze kernel threads? I can't judge. I tried
to read a long thread about suspend, and failed to understand it.

I personally think we can simplify things if CPU-hotplug use freezer, at
least.

Oleg.
Re: SLUB under lguest on i386
On Fri, 11 May 2007, Oliver Xymoron wrote:

> And no sign of further progress. SLAB worked fine.

Add slub_debug to the command line. Any changes or any additional
diagnostic output?
Re: x86 setup rewrite tree ready for flamage^W review
Kevin Winchester wrote:
> Not sure if you were looking for testing, but I fuzzed it to apply to
> 2.6.21-git and gave it a spin. Worked just like a normal boot (which I
> assume was the point).

That would be the point, yes :)

Video mode detection, memory detection, and APM are probably the trickiest
areas to look for breakage in.

	-hpa
Re: [patch 1/2] From: Paul Mundt <[EMAIL PROTECTED]>
On Fri, 11 May 2007, Andrew Morton wrote:

> However I think we've done enough slab work for 2.6.22 now so I'm inclined
> to queue these changes for 2.6.23. That would mean that the slab changes in
> -mm have a dependency on the sh git tree which I am sure to forget about.
> If I end up merging these changes before Paul merges his tree, sh will
> break. Presumably Paul will notice this ;)

Ok. Only mm is fine for what I have planned. I want to add a
kmem_cache_ops structure for 2.6.23. Maybe I can use the now useless dtor
field of kmem_cache_create for this?
2.6.21-mm2: HDAPS? BUG: at kernel/mutex.c:311
This just hit:

[7.856000] usbcore: registered new interface driver usbhid
[7.86] BUG: at kernel/mutex.c:311 __mutex_trylock_slowpath()
[7.868000] [] show_trace_log_lvl+0x1a/0x30
[7.872000] [] show_trace+0x12/0x14
[7.876000] [] dump_stack+0x15/0x17
[7.88] [] mutex_trylock+0x56/0x15a
[7.888000] [] hdaps_mousedev_poll+0x10/0xcb
[7.892000] [] run_timer_softirq+0x10e/0x16f
[7.896000] [] __do_softirq+0x5d/0xc0
[7.90] [] do_softirq+0x6e/0xf0
[7.904000] [] irq_exit+0x3e/0x7b
[7.912000] [] do_IRQ+0x9d/0xb2
[7.916000] [] common_interrupt+0x2e/0x34
[7.92] [] printk+0x1b/0x1d
[7.924000] [] usb_register_driver+0xa0/0xe5
[7.928000] [] hid_init+0x28/0x51
[7.932000] [] kernel_init+0xbc/0x23e
[7.94] [] kernel_thread_helper+0x7/0x10
[7.944000] ===
[7.948000] drivers/hid/usbhid/hid-core.c: v2.6:USB HID core driver

Looks like it's triggered by the HDAPS driver.

--
Mathematics is the supreme nostalgia of our time.
Re: [announce] Intel announces the PowerTOP utility for Linux
On Fri, May 11, 2007 at 04:07:18PM -0700, Arjan van de Ven wrote:
>
> What's eating the battery life of my laptop? Why isn't it many more
> hours? Which software component causes the most power to be burned?
> These are important questions without a good answer... until now.

I get:

No detailed statistics available; please enable the CONFIG_TIMER_STATS
kernel option

with:

$ zgrep STATS /proc/config.gz
# CONFIG_TASKSTATS is not set
# CONFIG_SCHEDSTATS is not set
CONFIG_TIMER_STATS=y

--
Mathematics is the supreme nostalgia of our time.
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Saturday, 12 May 2007 01:25, Andrew Morton wrote:
> On Sat, 12 May 2007 01:22:06 +0200
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:
>
> > On Saturday, 12 May 2007 00:56, Linus Torvalds wrote:
> > >
> > > On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> > > >
> > > > For user space processes this condition is always true.
> > > >
> > > > For kernel threads:
> > > > (1) the change of tsk->mm from NULL to a nonzero value is only made in
> > > > fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> > > > the task_lock(),
> > > > (2) the change of tsk->mm from a nonzero value to NULL is only made in
> > > > fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> > > > under the task_lock().
> > > > Therefore, by taking the task_lock() here we make sure that the condition
> > > > is always false when we check it for kernel threads.
> > >
> > > Why *test* it then and return anything?
> > >
> > > Why not just do a "task_lock(p); task_unlock(p);" with no return value?
> > >
> > > As it is, it sounds like either the code is buggy, or it's pointless.
> >
> > I'm not sure what you mean.
> >
> > We use this function (ie. kernel/power/process.c:is_user_space()) to
> > distinguish kernel threads from user space processes. Therefore we make it
> > always return true for user space processes and always return false for kernel
> > threads. In the latter case we need to use the task_lock() to ensure that the
> > result is as desired (ie. false), because otherwise it might be racing with
> > either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().
> >
>
> ah, OK.
>
> static void use_mm(struct mm_struct *mm)
> {
> 	struct mm_struct *active_mm;
> 	struct task_struct *tsk = current;
>
> 	task_lock(tsk);
> 	tsk->flags |= PF_BORROWED_MM;
> 	active_mm = tsk->active_mm;
> 	atomic_inc(&mm->mm_count);
> 	tsk->mm = mm;
> 	tsk->active_mm = mm;
> 	/*
> 	 * Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise
> 	 * it won't work. Update it accordingly if you change it here
> 	 */
> 	switch_mm(active_mm, mm, tsk);
> 	task_unlock(tsk);
>
> So is_user_space() requires that the state of p->mm and p->flags be
> consistent: it doesn't want to be looking at those two things in that
> three-statement window above.
>
> Good changelogging and commenting save quite a bit of time and email.

Very true.

I have added a comment to the patch, so that we remember why the
task_lock() is there. Please replace the original patch with this one
(unless you think it's worse ;-)).

---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

The reading of PF_BORROWED_MM in is_user_space() without task_lock() is
racy. Fix it.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
---
 kernel/power/process.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/power/process.c
===
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -8,6 +8,7 @@
 
 #undef DEBUG
 
+#include
 #include
 #include
 #include
@@ -88,7 +89,18 @@ static void cancel_freezing(struct task_
 
 static inline int is_user_space(struct task_struct *p)
 {
-	return p->mm && !(p->flags & PF_BORROWED_MM);
+	int ret;
+
+	/*
+	 * task_lock() is acquired to avoid evaluating the condition while the
+	 * state of p->mm and p->flags is not consistent, which may happen,
+	 * for example, if this function is executed in parallel with
+	 * fs/aio.c:unuse_mm()
+	 */
+	task_lock(p);
+	ret = p->mm && !(p->flags & PF_BORROWED_MM);
+	task_unlock(p);
+	return ret;
 }
 
 static unsigned int try_to_freeze_tasks(int freeze_user_space)
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Sat, 12 May 2007, Oleg Nesterov wrote:
>
> things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.

->mm is not stable *regardless*!

Trivial examples:
 - kernel thread does execve()
 - user thread does exit().

The "use_mm()" and "unuse_mm()" things are total red herrings. If the
freezer depends on the difference between user and kernel threads, then THAT
PATCH IS BUGGY.

It's that simple. It tests something that simply isn't stable outside the
lock, and then returns that value after having unlocked it. It might as well
return a random number.

> However, the return value == 0 does not change in that particular case,
> exactly because is_user_space() takes task_lock().

As does exit_mm() etc.

That's NOT THE POINT.

You cannot use the end result after releasing the task lock, because the
moment you release the task lock, it becomes totally irrelevant, and may not
be true any more.

Example (a):
 - you ask "is_user_space(p)", it returns 1.
 - before you actually have time to do anything about it, the task exits,
   and (since you don't hold the lock any more) will now have a NULL
   tsk->mm again (and would now return 0 if you called it again).

Example (b):
 - you ask "is_user_space(p)" and it returns 0, because it's a kernel
   thread
 - before you actually do anything about it (but after you released the
   task lock), the kernel thread does an "execve(/sbin/hotplug)" and is
   no longer a kernel thread.

In both cases the caller will have a return value THAT IS NO LONGER TRUE.

See? The locking was pointless. Exactly because you release the lock before
the user can actually do anything about the return value!

The fact that the locking protects against the very specific case of AIO
where the threads _stay_ user tasks and don't really change is pretty much
irrelevant, as far as I can see.

Anyway, I think the whole freezer thing is broken. There's no reason to
freeze kernel threads. If you want to freeze user processes, go ahead. But
then you need a lock to make sure that new processes don't *become* user
processes (ie you need to disable hotplug).

And if you want to protect against cpufreq, do so. But don't try to say
that you want to freeze all kernel threads. Just protect against cpufreq
threads. Don't make all the other threads that have *zero* interest in
freezing have to worry about it.

		Linus
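The check-then-act problem described above can be sketched with a small user-space model. This is not kernel code: struct task, model_lock(), model_unlock(), model_exit(), is_user_space_snapshot() and freeze_if_user_space() are hypothetical names, the flag value is illustrative, and the "lock" is only a counter so that the example stays single-threaded and deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_BORROWED_MM 0x1u	/* illustrative value, not the kernel's */

/* Hypothetical stand-in for the few task fields discussed in the thread. */
struct task {
	void *mm;		/* NULL for a kernel thread */
	unsigned int flags;
	int lock_depth;		/* models task_lock(): > 0 means "held" */
	bool frozen;
};

void model_lock(struct task *p)
{
	p->lock_depth++;
}

void model_unlock(struct task *p)
{
	assert(p->lock_depth > 0);
	p->lock_depth--;
}

/* Shape of the patched helper: the answer is computed under the lock... */
bool is_user_space_snapshot(struct task *p)
{
	bool ret;

	model_lock(p);
	ret = p->mm != NULL && !(p->flags & PF_BORROWED_MM);
	model_unlock(p);
	return ret;	/* ...but it is only a snapshot once the lock is dropped */
}

/* Models exit(): the task gives up its mm. */
void model_exit(struct task *p)
{
	model_lock(p);
	p->mm = NULL;
	model_unlock(p);
}

/* Alternative shape: check and act inside a single critical section. */
bool freeze_if_user_space(struct task *p)
{
	bool acted = false;

	model_lock(p);
	if (p->mm != NULL && !(p->flags & PF_BORROWED_MM)) {
		p->frozen = true;
		acted = true;
	}
	model_unlock(p);
	return acted;
}
```

A caller that stores the result of is_user_space_snapshot() and then lets model_exit() run is left holding a value that no longer matches the task, which is example (a) in miniature. Whether the freezer can actually hold the lock across its decision is a separate question that this sketch does not answer.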
Re: [PATCH] spelling fixes: arch/powerpc/
Simon Arlott writes:

> Spelling fixes in arch/powerpc/.

> - /* Retreive CPU related informations from the flat tree
> + /* Retreive CPU related information from the flat tree
           ^^
You missed one. :)

> - /* Clear the freeze bit, and reenable the interrupt.
> + /* Clear the freeze bit, and re-enable the interrupt.

reenable -> re-enable is a bit marginal, but OK.

>   /* Ok, now let's get cracking. You may ask me why I just didn't match
>    * the iic host from the iic OF node, but that way I'm still compatible
> -  * with really really old old firmwares for which we don't have a node
> +  * with really really old old firmware for which we don't have a node

I think "firmwares" here was meant to imply there was more than one
instance or release of firmware which had the property described, and
your change loses that.

> -  * differenciate them all and since that hack was there for a long
> +  * differentiate them all and since that hack was there for a long

I haven't been too strict about the Franglais in the past, but OK. :)

Paul.
Re: [patch] ip_local_port_range sysctl has annoying default
From: Mark Glines <[EMAIL PROTECTED]>
Date: Fri, 11 May 2007 17:01:35 -0700

> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has. So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768. I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.

All ports above and including 1024 are non-privileged and available to
anyone.

Applications which have some requirements in this area need to work
those things out themselves.
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
I hope Rafael will correct me if I am wrong,

On 05/12, Oleg Nesterov wrote:
>
> On 05/11, Linus Torvalds wrote:
> >
> > On Sat, 12 May 2007, Oleg Nesterov wrote:
> > >
> > > without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.
> >
> > Let me explain it one more time:
> >  - shouldn't the *caller* protect this?
> >
> > Afaik, there's two situations:
> >  - either things don't change (in which case you don't need locking at
> >    all, since things are statically one way or the other)
> >  - or things change (in which case the caller can't rely on the return
> >    value anyway, since they might change *after* you release the lock)
>
> things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.
>
> However, the return value == 0 does not change in that particular case,
> exactly because is_user_space() takes task_lock().

Probably there is some misunderstanding. This patch doesn't claim it solves
all problems. Before this patch we have

	static inline int is_user_space(struct task_struct *p)
	{
		return p->mm && !(p->flags & PF_BORROWED_MM);
	}

and this is clearly racy wrt use_mm(), which sets the PF_BORROWED_MM bit.

So this is just a little improvement, nothing more.

Oleg.
[patch] ip_local_port_range sysctl has annoying default
On a powerpc machine (kurobox) I have here with 128M of RAM, the default
value of /proc/sys/net/ipv4/ip_local_port_range is:

	2048	4999

This setting affects the port assigned to an application by default when
the application doesn't specify a port to use, for instance for an
outgoing connection. It affects both TCP and UDP. The default values for
this sysctl vary depending on the size of the tcp bind hash, which in
turn varies depending on the size of the system RAM (I think).

By a one-in-a-million coincidence, this machine has a default port range
starting with 2048, and this breaks things for me. I'm trying to run
both klive and nfs on this box, but klive starts first (probably because
of the filename sort order), and claims UDP port 2049 for its own
purposes, causing the nfs server to fail to start.

If the bind hash size is over a certain threshold, the range 32768-61000
is used. If it is under a certain threshold, a range like
(1024|2048|3072)-4999 is used, depending on exactly how small it is.
This box happened to get the 2048-4999 range, which broke nfs.

A comment just above the code that does this says, "Try to be a bit
smarter and adjust defaults depending on available memory." "smarter"?
Maybe, maybe not. Either way, it's unexpected.

Following the principle of least astonishment, I think it seems better
to use high, out-of-the-way port numbers regardless of how much RAM the
system has. So, the following patch changes this behavior slightly.
The system still picks a dynamic range depending on the bind hash size,
but now, all ranges start with 32768. I suppose another reasonable way
to do this would be to end all ranges with 61000, or something like
that.

It also seems funny to me that this would be in tcp_init(), when it
affects both TCP and UDP. But hey, it is where it is.
Signed-off-by: Mark Glines <[EMAIL PROTECTED]>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..4431b87 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2464,14 +2464,14 @@ void __init tcp_init(void)
 			(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
 	     order++)
 		;
+	sysctl_local_port_range[0] = 32768;
 	if (order >= 4) {
-		sysctl_local_port_range[0] = 32768;
 		sysctl_local_port_range[1] = 61000;
 		tcp_death_row.sysctl_max_tw_buckets = 18;
 		sysctl_tcp_max_orphans = 4096 << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
-		sysctl_local_port_range[0] = 1024 * (3 - order);
+		sysctl_local_port_range[1] = 32768 + (1024 * order);
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
 		sysctl_tcp_max_orphans >>= (3 - order);
 		sysctl_max_syn_backlog = 128;
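To see what the hunk above does to the range for each value of "order", the two versions of the branch can be modelled in user space. This is only a sketch: struct port_range, range_before_patch() and range_after_patch() are hypothetical names, and the compiled-in default of 1024-4999 is an assumption inferred from the "(1024|2048|3072)-4999" ranges quoted in the message, not something the patch itself shows.

```c
#include <assert.h>

struct port_range {
	int low;
	int high;
};

/* Assumed default in effect before tcp_init() adjusts anything. */
static const struct port_range default_range = { 1024, 4999 };

/* Range selection as in the lines the patch removes. */
struct port_range range_before_patch(int order)
{
	struct port_range r = default_range;

	assert(order >= 0);
	if (order >= 4) {
		r.low = 32768;
		r.high = 61000;
	} else if (order < 3) {
		r.low = 1024 * (3 - order);
	}
	return r;
}

/* Range selection as in the lines the patch adds. */
struct port_range range_after_patch(int order)
{
	struct port_range r = default_range;

	assert(order >= 0);
	r.low = 32768;
	if (order >= 4)
		r.high = 61000;
	else if (order < 3)
		r.high = 32768 + (1024 * order);
	return r;
}
```

Under these assumptions order 1 reproduces the 2048-4999 range reported for the kurobox, and the patched code gives 32768-33792 for the same order. Two cases may deserve a second look: order 0 yields 32768-32768, and order 3 is touched by neither branch, so its upper bound stays at whatever the default is.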
Re: x86 setup rewrite tree ready for flamage^W review
H. Peter Anvin wrote: > Hello all, > > I believe the x86 setup tree is now finished. I will turn it into a > "clean patchset" later this week, but I wanted to get flamed^W feedback > on it first. > > The git tree is at: > > http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-newsetup.git;a=summary > git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-newsetup.git > ... > > ... and a flat patch at ... > > http://www.kernel.org/pub/linux/kernel/people/hpa/newsetup-36f021b5.patch > > -hpa > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > > Not sure if you were looking for testing, but I fuzzed it to apply to 2.6.21-git and gave it a spin. Worked just like a normal boot (which I assume was the point). [0.00] Linux version 2.6.21-g0a3fd051-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2 (Gentoo 4.1.2)) #9 PREEMPT Fri May 11 20:50:02 ADT 2007 [0.00] Command line: root=/dev/sda3 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009f800 (usable) [0.00] BIOS-e820: 0009f800 - 000a (reserved) [0.00] BIOS-e820: 000f - 0010 (reserved) [0.00] BIOS-e820: 0010 - 1fef (usable) [0.00] BIOS-e820: 1fef - 1fef3000 (ACPI NVS) [0.00] BIOS-e820: 1fef3000 - 1ff0 (ACPI data) [0.00] BIOS-e820: fec0 - 0001 (reserved) [0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used [0.00] Entering add_active_range(0, 256, 130800) 1 entries of 256 used [0.00] end_pfn_map = 1048576 [0.00] DMI 2.3 present. 
[0.00] ACPI: RSDP 000F77D0, 0014 (r0 VIAK8T) [0.00] ACPI: RSDT 1FEF3040, 0034 (r1 VIAK8T AWRDACPI 42302E31 AWRD0) [0.00] ACPI: FACP 1FEF30C0, 0074 (r1 VIAK8T AWRDACPI 42302E31 AWRD0) [0.00] ACPI: DSDT 1FEF3180, 4F8A (r1 VIAK8T AWRDACPI 1000 MSFT 10E) [0.00] ACPI: FACS 1FEF, 0040 [0.00] ACPI: BOOT 1FEF8180, 0028 (r1 VIAK8T AWRDACPI 42302E31 AWRD0) [0.00] ACPI: SSDT 1FEF82C0, 00B5 (r1 PTLTD POWERNOW1 LTP1) [0.00] ACPI: APIC 1FEF8200, 0068 (r1 VIAK8T AWRDACPI 42302E31 AWRD0) [0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used [0.00] Entering add_active_range(0, 256, 130800) 1 entries of 256 used [0.00] Zone PFN ranges: [0.00] DMA 0 -> 4096 [0.00] DMA324096 -> 1048576 [0.00] Normal1048576 -> 1048576 [0.00] early_node_map[2] active PFN ranges [0.00] 0:0 -> 159 [0.00] 0: 256 -> 130800 [0.00] On node 0 totalpages: 130703 [0.00] DMA zone: 56 pages used for memmap [0.00] DMA zone: 1356 pages reserved [0.00] DMA zone: 2587 pages, LIFO batch:0 [0.00] DMA32 zone: 1732 pages used for memmap [0.00] DMA32 zone: 124972 pages, LIFO batch:31 [0.00] Normal zone: 0 pages used for memmap [0.00] ACPI: PM-Timer IO Port: 0x4008 [0.00] ACPI: Local APIC address 0xfee0 [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) [0.00] Processor #0 (Bootup-CPU) [0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled) [0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) [0.00] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) [0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0]) [0.00] IOAPIC[0]: apic_id 2, address 0xfec0, GSI 0-23 [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level) [0.00] ACPI: IRQ0 used by override. [0.00] ACPI: IRQ2 used by override. [0.00] ACPI: IRQ9 used by override. [0.00] Setting APIC routing to flat [0.00] Using ACPI (MADT) for SMP configuration information [0.00] Allocating PCI resources starting at 2000 (gap: 1ff0:ded0) [0.00] Built 1 zonelists. 
Total pages: 127559 [0.00] Kernel command line: root=/dev/sda3 [0.00] Initializing CPU#0 [0.00] PID hash table entries: 2048 (order: 11, 16384 bytes) [ 13.327158] time.c: Detected 1838.853 MHz processor. [ 13.328328] Console: colour VGA+ 80x25 [ 13.331709] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) [ 13.332048] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) [ 13.332193] Checking aperture... [ 13.332264] CPU 0: aperture @ e000 size 128 MB [ 13.337619] Memory: 509000k/523200k available (3246k kernel code, 13476k reserved, 1225k d
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Saturday, 12 May 2007 01:29, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Rafael J. Wysocki wrote:
> >
> > We use this function (ie. kernel/power/process.c:is_user_space()) to
> > distinguish kernel threads from user space processes. Therefore we make it
> > always return true for user space processes and always return false for
> > kernel threads. In the latter case we need to use the task_lock() to ensure
> > that the result is as desired (ie. false), because otherwise it might be
> > racing with either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().
>
> But there is no race protection in the *caller*, so if it can ever return
> one or the other, what protects it from changing once the caller returns?
>
> And if the value can change (because some thread uses "use_mm()"), then
> the caller cannot rely on the value that got returned.

The value cannot change because of that. There is only a small window inside
unuse_mm() (or use_mm()) in which the value may be wrong. Namely:

static void unuse_mm(struct mm_struct *mm)
{
	struct task_struct *tsk = current;

	task_lock(tsk);
	tsk->flags &= ~PF_BORROWED_MM;
---
--- If is_user_space() without the task_lock() is called right here, it will
--- return 'true', although it should return 'false'.
---
	tsk->mm = NULL;
	/* active_mm is still 'mm' */
	enter_lazy_tlb(mm, tsk);
	task_unlock(tsk);
}

IOW, quoting Andrew, "is_user_space() requires that the state of p->mm and
p->flags be consistent".

> So you might as well not return any value at all, since the returned value
> is apparently meaningless once the lock has been released.

No, it is not meaningless.

Rafael
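The window pointed out in unuse_mm() can be walked through step by step in a user-space model. This is a sketch only: struct task_model and the two unuse_mm_step*() helpers are hypothetical, the flag value is illustrative, and the two stores of unuse_mm() are split into separate functions so that the intermediate state can be observed without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_BORROWED_MM 0x1u	/* illustrative value, not the kernel's */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct task_model {
	void *mm;
	unsigned int flags;
};

/* The predicate as it was before the patch, evaluated without any lock. */
bool is_user_space_unlocked(const struct task_model *p)
{
	return p->mm != NULL && !(p->flags & PF_BORROWED_MM);
}

/* First store in unuse_mm(): the flag is cleared, mm is still set. */
void unuse_mm_step1(struct task_model *p)
{
	p->flags &= ~PF_BORROWED_MM;
}

/* Second store in unuse_mm(): mm is reset. */
void unuse_mm_step2(struct task_model *p)
{
	p->mm = NULL;
}
```

Starting from a kernel thread with a borrowed mm, the unlocked predicate reads false before step 1, true between the two steps, and false again after step 2. Taking task_lock() in the reader keeps it from observing the middle state; it does not address what happens to the answer after the lock is released, which is the other half of the discussion.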
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On 05/11, Linus Torvalds wrote:
>
> On Sat, 12 May 2007, Oleg Nesterov wrote:
> >
> > without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.
>
> Let me explain it one more time:
>  - shouldn't the *caller* protect this?
>
> Afaik, there's two situations:
>  - either things don't change (in which case you don't need locking at
>    all, since things are statically one way or the other)
>  - or things change (in which case the caller can't rely on the return
>    value anyway, since they might change *after* you release the lock)

things change, ->mm is not stable if the kernel thread does use_mm/unuse_mm.

However, the return value == 0 does not change in that particular case,
exactly because is_user_space() takes task_lock().

Oleg.
Re: [PATCH][RESEND] PIE randomization
On Fri, 11 May 2007, Andrew Morton wrote:

> I could reverse-engineer that info from the patch, I guess, but I'd
> prefer to go in the opposite direction: you tell us what the patch is
> trying to do, then we look at it and see if we agree that it is in fact
> doing that.

I've just quickly looked at the patch and it seems fine - it's using
mmap()'s randomization functionality in such a way that it maps the
main executable of (specially compiled/linked) ET_DYN binaries onto a
random address (in cases in which mmap() is allowed to perform a
randomization). Which is what we want, I'd guess.

Jan, would you care to update the patch with a proper changelog entry?

However, I seem to get a "soft" hang on boot with this patch, approximately
at the time init should be executed. The system is not completely
stuck - interrupts are delivered, the keyboard is working, alt-sysrq-t dumps
proper output, but userspace doesn't seem to get started. This happens on
i386; I didn't try other archs.

--
Jiri Kosina
Re: [PATCH] Use boot based time for process start time and boot time in /proc
Hi,

On Fri, May 11, 2007 at 12:51:32PM -0700, Andrew Morton wrote:
> On Fri, 11 May 2007 10:45:31 +0200
> Tomas Janousek <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > On Thu, May 10, 2007 at 04:40:47PM -0700, Andrew Morton wrote:
> > > Tomas Janousek <[EMAIL PROTECTED]> wrote:
> > > > @@ -445,12 +445,14 @@ static int show_stat(struct seq_file *p, void *v)
> > > >  	unsigned long jif;
> > > >  	cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
> > > >  	u64 sum = 0;
> > > > +	struct timespec boottime;
> > > >
> > > >  	user = nice = system = idle = iowait =
> > > >  		irq = softirq = steal = cputime64_zero;
> > > > -	jif = - wall_to_monotonic.tv_sec;
> > > > -	if (wall_to_monotonic.tv_nsec)
> > > > -		--jif;
> > > > +	getboottime(&boottime);
> > > > +	jif = boottime.tv_sec;
> > > > +	if (boottime.tv_nsec)
> > > > +		++jif;
> > >
> >
> >  	getboottime(&boottime);
> >  	jif = boottime.tv_sec;
> > -	if (boottime.tv_nsec)
> > -		++jif;
>
> So we've gone from --jif to ++jif to no change at all.
>
> Are you sure that this net removal of --jif is correct?

Yes.

Let's say wall_to_monotonic = { -10, 50 } (which is { -9, -50 }, and the
original code would result in - (- 10) - 1 == 9).

The getboottime calls set_normalized_timespec on { - (-10), - (50) } which
results in { 10 - 1, - 50 + 100 } = { 9, 50 }. tv_sec == 9 => correct.

--
TJ. (Brno, CZ), BaseOS, Red Hat
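The argument above can be checked with a small user-space model of the two calculations. This is a sketch under stated assumptions: struct ts_model, normalize(), boot_sec_old() and boot_sec_new() are hypothetical names, normalize() is only modelled on the behaviour the message attributes to set_normalized_timespec(), any sleep-time adjustment is ignored, and the test values (a full nanosecond field such as 500000000) are chosen for illustration rather than taken from the message.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct ts_model {
	long tv_sec;
	long tv_nsec;
};

/* Bring nsec into [0, NSEC_PER_SEC), carrying into or borrowing from sec. */
struct ts_model normalize(long sec, long nsec)
{
	struct ts_model ts;

	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	ts.tv_sec = sec;
	ts.tv_nsec = nsec;
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts;
}

/* Old show_stat() logic: negate the seconds, subtract one if nsec != 0. */
long boot_sec_old(struct ts_model wall_to_monotonic)
{
	long jif = -wall_to_monotonic.tv_sec;

	if (wall_to_monotonic.tv_nsec)
		--jif;
	return jif;
}

/* New logic: negate both fields, normalize, and use tv_sec as it is. */
long boot_sec_new(struct ts_model wall_to_monotonic)
{
	return normalize(-wall_to_monotonic.tv_sec,
			 -wall_to_monotonic.tv_nsec).tv_sec;
}
```

For an input of { -10, 500000000 } both functions return 9, and for { -10, 0 } both return 10, so in this model the normalization already performs the borrow that the old "--jif" did by hand.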
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Sat, 12 May 2007, Oleg Nesterov wrote:
>
> without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.

Let me explain it one more time:
 - shouldn't the *caller* protect this?

Afaik, there's two situations:
 - either things don't change (in which case you don't need locking at
   all, since things are statically one way or the other)
 - or things change (in which case the caller can't rely on the return
   value anyway, since they might change *after* you release the lock)

ie what's up? Is there a third case?

		Linus
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Sat, 12 May 2007, Rafael J. Wysocki wrote:
>
> We use this function (ie. kernel/power/process.c:is_user_space()) to
> distinguish kernel threads from user space processes. Therefore we make it
> always return true for user space processes and always return false for kernel
> threads. In the latter case we need to use the task_lock() to ensure that the
> result is as desired (ie. false), because otherwise it might be racing with
> either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().

But there is no race protection in the *caller*, so if it can ever return
one or the other, what protects it from changing once the caller returns?

And if the value can change (because some thread uses "use_mm()"), then
the caller cannot rely on the value that got returned.

So you might as well not return any value at all, since the returned value
is apparently meaningless once the lock has been released.

In other words: "The lock, it does nothing".

		Linus
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Sat, 12 May 2007 01:22:06 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote: > On Saturday, 12 May 2007 00:56, Linus Torvalds wrote: > > > > On Fri, 11 May 2007, Rafael J. Wysocki wrote: > > > > > > For user space processes this condition is always true. > > > > > > For kernel threads: > > > (1) the change of tsk->mm from NULL to a nonzero value is only made in > > > fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under > > > the task_lock(), > > > (2) the change of tsk->mm from a nonzero value to NULL is only made in > > > fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM > > > under the task_lock(). > > > Therefore, by taking the task_lock() here we make sure that the condition > > > is alyways false when we check it for kernel threads. > > > > Why *test* it then and return anything? > > > > Why not just doa "task_lock(p); task_unlock(p);" with no return value? > > > > As it is, it sounds like either the code is buggy, or it's pointless. > > I'm not sure what you mean. > > We use this function (ie. kernel/power/process.c:is_user_space()) to > distinguish kernel threads from user space processes. Therefore we make it > always return true for user space processes and always return false for kernel > threads. In the latter case we need to use the task_lock() to ensure that the > result is as desired (ie. false), because otherwise it might be racing with > either fs/aio.c:use_mm() or fs/aio.c:unuse_mm(). > ah, OK. static void use_mm(struct mm_struct *mm) { struct mm_struct *active_mm; struct task_struct *tsk = current; task_lock(tsk); tsk->flags |= PF_BORROWED_MM; active_mm = tsk->active_mm; atomic_inc(&mm->mm_count); tsk->mm = mm; tsk->active_mm = mm; /* * Note that on UML this *requires* PF_BORROWED_MM to be set, otherwise * it won't work. 
Update it accordingly if you change it here */ switch_mm(active_mm, mm, tsk); task_unlock(tsk); So is_user_space() requires that the state of p->mm and p->flags be consistent: it doesn't want to be looking at those two things in that three-statement window above. Good changelogging and commenting save quite a bit of time and email. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] module_author: don't advice putting in an email address
Rene Herman <[EMAIL PROTECTED]> writes:

> /* Author, ideally of form NAME [, NAME ]*[ and NAME ]
>
> After my trivial patch, it says:
>
> /* Author, ideally of form NAME[, NAME]*[ and NAME] */

I think I would put something like this:

/* Author, of form NAME[, NAME]*[ and NAME]
 * If you have a permanent email address and are prepared for
 * maintaining/supporting the module, you may want to provide
 * the address as well */

The wording isn't the best, I suppose.

I.e., the change would mean that providing the address is not strictly
required and that the person should think before adding it, that's all.

--
Krzysztof Halasa
Re: [PATCH] swsusp: Use platform mode by default
Hi!

> > Just to clarify, the change in question isn't new. It was introduced by the
> > commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
> > request and with Pavel's acceptance.
>
> Ok, if it's that old, we might as well leave it in. Clearly there weren't many
> regressions, and this isn't a case of other monsters lurking behind a lack
> of testers.

Ok, so what is the result?

"platform" is the correct default, because it is what the spec says. Both
were default in recent history, and neither is too horrible. So I'd prefer
"platform" to be the default, as it is correct.

							Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] libata: add human-readable error value decoding
Robert Hancock wrote:
> The ATA ones are more of a pain in that regard than SCSI though - SCSI
> has all distinct error codes for different errors, whereas ATA has
> bitmasks for everything..

That should not affect implementation. Either way, a table-driven
approach can easily work.

I favor decoding the SError status bits, but your names were far too long.

"ProtocolErr" should be "Proto". "10B8BErr" should be "10b8b".
HostInternalErr to HostInt. PHYInternalErr to PHYInt. etc.

	Jeff
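A table-driven decoder of the kind suggested above could look roughly like the following user-space sketch. Everything in it is an assumption made for illustration: struct serr_name, serr_names[] and serr_decode() are hypothetical, the bit positions are intended to follow the SError register layout but should be checked against the SATA specification or the libata headers before use, and "DevExch" is an invented abbreviation in the style of the short names proposed in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One table row per SError bit that should be reported. */
struct serr_name {
	unsigned int bit;
	const char *name;
};

/* Illustrative subset; bit positions are assumptions, see the note above. */
static const struct serr_name serr_names[] = {
	{ 1u << 10, "Proto" },
	{ 1u << 11, "HostInt" },
	{ 1u << 17, "PHYInt" },
	{ 1u << 19, "10b8b" },
	{ 1u << 25, "UnrecogFIS" },
	{ 1u << 26, "DevExch" },
};

/*
 * Write the space-separated names of all set bits into buf.
 * Returns the number of characters written, not counting the NUL.
 * A name that does not fit completely is left out.
 */
size_t serr_decode(unsigned int serror, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(buf != NULL);
	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(serr_names) / sizeof(serr_names[0]); i++) {
		int n;

		if (!(serror & serr_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", serr_names[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partially written name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

With this shape, adding or renaming a bit is a one-line change to the table, and the bitmask nature of the register only affects the loop, which tests every row instead of looking up a single code.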
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On 05/11, Linus Torvalds wrote:
>
> On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> >
> > Therefore, by taking the task_lock() here we make sure that the condition
> > is always false when we check it for kernel threads.
>
> Why *test* it then and return anything?
>
> Why not just do a "task_lock(p); task_unlock(p);" with no return value?

because we should not freeze a kernel thread at FREEZER_USER_SPACE stage?

without task_lock() we can see "p->mm != NULL" but not PF_BORROWED_MM.

Oleg.
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Saturday, 12 May 2007 00:56, Linus Torvalds wrote:
>
> On Fri, 11 May 2007, Rafael J. Wysocki wrote:
> >
> > For user space processes this condition is always true.
> >
> > For kernel threads:
> > (1) the change of tsk->mm from NULL to a nonzero value is only made in
> >     fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
> >     the task_lock(),
> > (2) the change of tsk->mm from a nonzero value to NULL is only made in
> >     fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
> >     under the task_lock().
> > Therefore, by taking the task_lock() here we make sure that the condition
> > is always false when we check it for kernel threads.
>
> Why *test* it then and return anything?
>
> Why not just do a "task_lock(p); task_unlock(p);" with no return value?
>
> As it is, it sounds like either the code is buggy, or it's pointless.

I'm not sure what you mean.

We use this function (ie. kernel/power/process.c:is_user_space()) to
distinguish kernel threads from user space processes. Therefore we make it
always return true for user space processes and always return false for kernel
threads. In the latter case we need to use the task_lock() to ensure that the
result is as desired (ie. false), because otherwise it might be racing with
either fs/aio.c:use_mm() or fs/aio.c:unuse_mm().

Rafael
Re: [PATCH] libata: add human-readable error value decoding
Tejun Heo wrote: Chuck Ebbert wrote: Robert Hancock wrote: + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "", + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "", + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" ); I'm not really convinced whether this is necessary. The human readable form is also a bit cryptic and can get quite long. So, mild NACK from me. It certainly seems useful when debugging hotplug issues or random SATA problems which end up being caused by communication problems. Without this output, Joe User stands no chance of figuring out what's going on, and neither does Joe libata Developer unless they really care to dig through the spec and count bits to figure out what they mean. At least with this you can see that there was a CRC error, etc. and go from that.. Why not just document the error messages? And the scsi ones too, I can't seem to find what the sense codes mean. They are well documented elsewhere - the standard documents. For sense codes, t10.org. For SError bits, t13.org. You can get drafts free of charge. The ATA ones are more of a pain in that regard than SCSI though - SCSI has all distinct error codes for different errors, whereas ATA has bitmasks for everything.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[announce] Intel announces the PowerTOP utility for Linux
What's eating the battery life of my laptop? Why isn't it many more hours? Which software component causes the most power to be burned? These are important questions without a good answer... until now. The Linux 2.6.21 kernel introduces the so called tickless-idle feature. This feature allows the processor to be really idle for long periods of time, rather than having to wake up every millisecond for the timer tick. Current processors save a lot of power if they are idle for long periods, which translates into a longer battery life for your laptop, or a lower energy bill for your datacenter. However, a Linux system consists of more software than just the kernel, and there are many tunables involved. It's not easy to see what is going on, and as a result the behavior is sometimes far from optimal, and a lot of power is wasted. Intel is proud to announce the PowerTOP tool (http://www.linuxpowertop.org), a program that collects the various pieces of information from your system and presents an overview of how well your laptop is doing in terms of power savings. In addition, PowerTOP will provide an indication of which tunables and software components are the biggest offenders in slurping up your battery time. PowerTOP will update it's display frequently so that you can directly see the impact of any changes you are making. A typical Linux distribution has many components that wake the processor up frequently for no good reason. In our testing with PowerTOP, we have seen many cases where with some simple fixes, the battery life of typical laptops was increased by one hour or more! We are providing fixes for several of the issues we identified, and we encourage the Linux community to help us in this quest to get the maximum battery life out of your (hopefully Intel based) laptops. Try the PowerTOP tool, join the mailing list or the IRC channel and provide feedback, problem reports or fixes! 
Website: http://www.linuxpowertop.org IRC: irc.oftc.net#powertop channel Mailing list: http://www.bughost.org/mailman/listinfo/power
-mm git tree
The git tree at git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git could be set up in a simpler way:

$ git ls-remote git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
fc4b5be9e651d3e71b54541e0315fc82211b42b5  refs/heads/option_export
59a1fe35614c3c937a4e8cb6e4a45f1d05544d9d  refs/heads/v2.6.13-mm1
e3602088f81f66655ec6c62320d5c56839ffc02b  refs/heads/v2.6.13-mm2
...
05230bd16821e2ec80321d72e97e7a2b1a07c6f2  refs/tags/master
...
5e1302f173f63c5c57c5de8b44152c30ae2a72c4  refs/tags/v2.6.13-mm1
59a1fe35614c3c937a4e8cb6e4a45f1d05544d9d  refs/tags/v2.6.13-mm1^{}
a06c5a7b36cfb30345a9476cbaff02955483c4ca  refs/tags/v2.6.13-mm2
e3602088f81f66655ec6c62320d5c56839ffc02b  refs/tags/v2.6.13-mm2^{}
...

Would it be possible to remove the branches that exist for each individual version, and to change the "master" tag to a branch?

Since git gives tag names priority over head names, fetching the above tag makes "master" refer to it instead of any local branch named "master". (I get particularly bizarre behavior with current git; after:

git remote add mm git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
git fetch mm

when I check out "master", it sets HEAD to refs/heads/master, but the index and working tree to refs/tags/master.)

I think it may have been set up this way with the idea that a branch should only ever move "forward" in history, whereas tags could move around freely. But that's not really right--for something like -mm that's continually rewritten and rebased, it makes sense to have a "master" branch that skips around. The default git-remote setup on recent git is prepared to deal with this.

And having a repository with 101 branches and counting, none of which ever change, is awkward--if nothing else it makes the output of "git-branch -r" a little hard to read.

--b.
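Editorial sketch of why a tag named "master" shadows a branch of the same name: a short ref name is tried against a fixed list of prefixes, and the first match wins. The code below is a toy userspace model, not git's implementation. The rule order is written as I understand git's revision documentation, and the function name and the flat list of known refs are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Lookup rules in the order assumed for short ref names. */
struct ref_rule {
	const char *prefix;
	const char *suffix;
};

static const struct ref_rule ref_rules[] = {
	{ "",              ""      },
	{ "refs/",         ""      },
	{ "refs/tags/",    ""      },	/* tags are tried ...          */
	{ "refs/heads/",   ""      },	/* ... before branch heads     */
	{ "refs/remotes/", ""      },
	{ "refs/remotes/", "/HEAD" },
};

/*
 * Resolve a short name against a list of full ref names.
 * Returns 0 and stores the winning full name in out, or -1 if none match.
 */
int resolve_short_name(const char *name, const char *const *known,
		       size_t nknown, char *out, size_t outlen)
{
	size_t r, k;

	for (r = 0; r < sizeof(ref_rules) / sizeof(ref_rules[0]); r++) {
		snprintf(out, outlen, "%s%s%s",
			 ref_rules[r].prefix, name, ref_rules[r].suffix);
		for (k = 0; k < nknown; k++)
			if (strcmp(out, known[k]) == 0)
				return 0;
	}
	return -1;
}
```

With both refs/heads/master and refs/tags/master present, this model picks the tag, which matches the behavior described in the message.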
[patch] x86_64: use signalfd and timerfd compat syscalls
From: Heiko Carstens <[EMAIL PROTECTED]>

Looks like these two are wired up in a wrong way.

Cc: Davide Libenzi <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---
 arch/x86_64/ia32/ia32entry.S |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -716,7 +716,7 @@ ia32_sys_call_table:
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
 	.quad compat_sys_utimensat	/* 320 */
-	.quad sys_signalfd
-	.quad sys_timerfd
+	.quad compat_sys_signalfd
+	.quad compat_sys_timerfd
 	.quad sys_eventfd
-ia32_syscall_end:
+ia32_syscall_end:
[patch] compat signalfd and timerfd are cond syscalls.
From: Heiko Carstens <[EMAIL PROTECTED]>

Add missing cond_syscall statements for compat_sys_signalfd and compat_sys_timerfd.

Cc: Davide Libenzi <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---
Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -145,4 +145,6 @@ cond_syscall(sys_ioprio_get);
 /* New file descriptors */
 cond_syscall(sys_signalfd);
 cond_syscall(sys_timerfd);
+cond_syscall(compat_sys_signalfd);
+cond_syscall(compat_sys_timerfd);
 cond_syscall(sys_eventfd);
Re: [PATCH 1/2] scalable rw_mutex
On 05/11, Peter Zijlstra wrote:
>
> +static inline int __rw_mutex_read_trylock(struct rw_mutex *rw_mutex)
> +{
> +	preempt_disable();
> +	if (likely(!__rw_mutex_reader_slow(rw_mutex))) {

			--- WINDOW ---

> +		percpu_counter_mod(&rw_mutex->readers, 1);
> +		preempt_enable();
> +		return 1;
> +	}
> +	preempt_enable();
> +	return 0;
> +}
>
> [...snip...]
>
> +void rw_mutex_write_lock_nested(struct rw_mutex *rw_mutex, int subclass)
> +{
> [...snip...]
> +
> +	/*
> +	 * block new readers
> +	 */
> +	__rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_SLOW);
> +	/*
> +	 * wait for all readers to go away
> +	 */
> +	wait_event(rw_mutex->wait_queue,
> +			(percpu_counter_sum(&rw_mutex->readers) == 0));
> +}

This looks a bit suspicious, can't mutex_write_lock() set RW_MUTEX_READER_SLOW and find percpu_counter_sum() == 0 in that WINDOW above?

> +void rw_mutex_read_unlock(struct rw_mutex *rw_mutex)
> +{
> +	rwsem_release(&rw_mutex->dep_map, 1, _RET_IP_);
> +
> +	percpu_counter_mod(&rw_mutex->readers, -1);
> +	if (unlikely(__rw_mutex_reader_slow(rw_mutex)) &&
> +			percpu_counter_sum(&rw_mutex->readers) == 0)
> +		wake_up_all(&rw_mutex->wait_queue);
> +}

The same. __rw_mutex_status_set()->wmb() in rw_mutex_write_lock below is not enough. percpu_counter_mod() doesn't take fbc->lock if < FBC_BATCH, so we don't have a proper serialization.

write_lock() sets RW_MUTEX_READER_SLOW, finds percpu_counter_sum() != 0, and sleeps.

rw_mutex_read_unlock() decrements cpu-local var, does not see RW_MUTEX_READER_SLOW and skips wake_up_all().
> +void rw_mutex_write_lock_nested(struct rw_mutex *rw_mutex, int subclass)
> +{
> +	might_sleep();
> +	rwsem_acquire(&rw_mutex->dep_map, subclass, 0, _RET_IP_);
> +
> +	mutex_lock_nested(&rw_mutex->write_mutex, subclass);
> +	mutex_lock_nested(&rw_mutex->read_mutex, subclass);
> +
> +	/*
> +	 * block new readers
> +	 */
> +	__rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_SLOW);
> +	/*
> +	 * wait for all readers to go away
> +	 */
> +	wait_event(rw_mutex->wait_queue,
> +			(percpu_counter_sum(&rw_mutex->readers) == 0));
> +}
> +
> +void rw_mutex_write_unlock(struct rw_mutex *rw_mutex)
> +{
> +	int waiters;
> +
> +	rwsem_release(&rw_mutex->dep_map, 1, _RET_IP_);
> +
> +	/*
> +	 * let the readers rip
> +	 */
> +	__rw_mutex_status_set(rw_mutex, RW_MUTEX_READER_FAST);
> +	waiters = atomic_read(&rw_mutex->read_waiters);
> +	mutex_unlock(&rw_mutex->read_mutex);
> +	/*
> +	 * wait for at least 1 reader to get through
> +	 */
> +	if (waiters) {
> +		wait_event(rw_mutex->wait_queue,
> +			(atomic_read(&rw_mutex->read_waiters) < waiters));
> +	}
> +	/*
> +	 * before we let the writers rip
> +	 */
> +	mutex_unlock(&rw_mutex->write_mutex);
> +}

Looks like we can have only one task on rw_mutex->wait_queue, and it holds ->write_mutex. Can't we use just a "task_struct *write_waiter" instead of ->wait_queue ? This makes rw_mutex smaller.

Oleg.
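Editorial sketch of the first window Oleg describes: the interleaving can be replayed deterministically in a single thread by executing the reader and writer steps in the problematic order. This is a toy model, not the patch's code. The struct, the two-slot counter array and the function names are hypothetical stand-ins for the status flag and the per-CPU reader counter.

```c
/* Toy model of the rw_mutex state touched in the read-trylock window. */
struct rw_model {
	int reader_slow;	/* stands in for RW_MUTEX_READER_SLOW */
	int readers[2];		/* stands in for the per-CPU reader counter */
};

static int readers_sum(const struct rw_model *m)
{
	return m->readers[0] + m->readers[1];
}

/*
 * Replay: the reader tests the flag, the writer then sets the flag and
 * sums the counter, and only afterwards does the reader increment.
 * Returns 1 if both sides believe they own the lock at the end.
 */
int replay_trylock_window(void)
{
	struct rw_model m = { 0, { 0, 0 } };
	int reader_saw_fast, writer_proceeds, reader_holds = 0;

	/* reader: if (likely(!__rw_mutex_reader_slow(rw_mutex))) */
	reader_saw_fast = !m.reader_slow;

	/* --- WINDOW --- writer: block new readers, then check the sum */
	m.reader_slow = 1;
	writer_proceeds = (readers_sum(&m) == 0);

	/* reader: percpu_counter_mod(&rw_mutex->readers, 1); return 1; */
	if (reader_saw_fast) {
		m.readers[0]++;
		reader_holds = 1;
	}

	return writer_proceeds && reader_holds;
}
```

In this ordering the writer sees a zero sum before the reader's increment lands, so the model ends with a reader and a writer both inside, which is the outcome the review question is asking about.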
Re: [PATCH 1/7] Freezer: Read PF_BORROWED_MM in a nonracy way
On Fri, 11 May 2007, Rafael J. Wysocki wrote:
>
> For user space processes this condition is always true.
>
> For kernel threads:
> (1) the change of tsk->mm from NULL to a nonzero value is only made in
>     fs/aio.c:use_mm() along with the setting of PF_BORROWED_MM under
>     the task_lock(),
> (2) the change of tsk->mm from a nonzero value to NULL is only made in
>     fs/aio.c:unuse_mm() along with the resetting of PF_BORROWED_MM
>     under the task_lock().
> Therefore, by taking the task_lock() here we make sure that the condition
> is always false when we check it for kernel threads.

Why *test* it then and return anything? Why not just do a "task_lock(p); task_unlock(p);" with no return value?

As it is, it sounds like either the code is buggy, or it's pointless.

Linus
Re: [patch 6/7] Add common orderly_poweroff()
On Thu, 10 May 2007 16:57:14 -0700 Jeremy Fitzhardinge wrote:

> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2208,3 +2208,61 @@ asmlinkage long sys_getcpu(unsigned __us
> +
> +/**
> + * Trigger an orderly system poweroff

 * orderly_poweroff - Trigger an orderly system poweroff

> + * @force: force poweroff if command execution fails
> + *
> + * This may be called from any context to trigger a system shutdown.
> + * If the orderly shutdown fails, it will force an immediate shutdown.
> + */
> +int orderly_poweroff(bool force)
> +{
> +	int argc;
> +	char **argv = argv_split(GFP_ATOMIC, poweroff_cmd, &argc);
> +	static char *envp[] = {
> +		"HOME=/",
> +		"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
> +		NULL
> +	};
> +	int ret = -ENOMEM;
> +	struct subprocess_info *info;
> +
> +	if (argv == NULL) {
> +		printk(KERN_WARNING "%s failed to allocate memory for \"%s\"\n",
> +		       __func__, poweroff_cmd);
> +		goto out;
> +	}
> +
> +	info = call_usermodehelper_setup(argv[0], argv, envp);
> +	if (info == NULL) {
> +		argv_free(argv);
> +		goto out;
> +	}
> +
> +	call_usermodehelper_setcleanup(info, argv_cleanup);
> +
> +	ret = call_usermodehelper_exec(info, -1);
> +
> +  out:
> +	if (ret && force) {
> +		printk(KERN_WARNING "Failed to start orderly shutdown: "
> +		       "forcing the issue\n");
> +
> +		/* I guess this should try to kick off some daemon to
> +		   sync and poweroff asap.  Or not even bother syncing
> +		   if we're doing an emergency shutdown? */
> +		emergency_sync();
> +		kernel_power_off();
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(orderly_poweroff);

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [patch 4/7] add argv_split()
On Thu, 10 May 2007 16:57:12 -0700 Jeremy Fitzhardinge wrote:

> --- /dev/null
> +++ b/lib/argv_split.c
> @@ -0,0 +1,159 @@
> +
> +/**
> + * argv_free - free an argv
> + *

extra "blank" line.

> + * @argv - the argument vector to be freed
> + *
> + * Frees an argv and the strings it points to.
> + */
> +void argv_free(char **argv)
> +{
> +	char **p;
> +	for (p = argv; *p; p++)
> +		kfree(*p);
> +
> +	kfree(argv);
> +}
> +EXPORT_SYMBOL(argv_free);

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
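Editorial sketch of the split/free pairing that the quoted argv_free() belongs to, written as a userspace analogue with malloc()/free() in place of the kernel allocators. Only argv_free() appears in the quoted hunk, so the splitting logic here is an assumption: whitespace-separated words with no quoting. The _sketch names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Count whitespace-separated words in s. */
static int count_words(const char *s)
{
	int n = 0, in_word = 0;

	for (; *s; s++) {
		if (isspace((unsigned char)*s)) {
			in_word = 0;
		} else if (!in_word) {
			in_word = 1;
			n++;
		}
	}
	return n;
}

/* Free a NULL-terminated vector and every string it points to. */
void argv_free_sketch(char **argv)
{
	char **p;

	if (argv == NULL)
		return;
	for (p = argv; *p; p++)
		free(*p);
	free(argv);
}

/*
 * Split str on whitespace into a NULL-terminated vector.
 * Returns NULL on allocation failure; *argcp (if non-NULL) gets the count.
 */
char **argv_split_sketch(const char *str, int *argcp)
{
	int argc = count_words(str);
	char **argv = calloc((size_t)argc + 1, sizeof(*argv));
	int i = 0;

	if (argv == NULL)
		return NULL;

	while (*str) {
		const char *start;
		size_t len;

		while (*str && isspace((unsigned char)*str))
			str++;
		if (!*str)
			break;
		start = str;
		while (*str && !isspace((unsigned char)*str))
			str++;
		len = (size_t)(str - start);

		argv[i] = malloc(len + 1);
		if (argv[i] == NULL) {
			/* calloc zeroed the tail, so the free loop stops here */
			argv_free_sketch(argv);
			return NULL;
		}
		memcpy(argv[i], start, len);
		argv[i][len] = '\0';
		i++;
	}

	if (argcp)
		*argcp = argc;
	return argv;
}
```

Allocating the vector zeroed means a partially built result is always safe to hand to the free routine, which keeps the error path to a single call.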
Re: [bisect] NFS regression breaks X
ACK -- this regression was fixed by Trond's recent NFS bugfix push upstream. Jeff
Re: [PATCH] swsusp: Use platform mode by default
On Fri, 11 May 2007, Rafael J. Wysocki wrote:
>
> Just to clarify, the change in question isn't new. It was introduced by the
> commit 9185cfa92507d07ac787bc73d06c4eec7239 before 2.6.20, at Seife's
> request and with Pavel's acceptance.

Ok, if it's that old, we might as well leave it in. Clearly there weren't many regressions, and this isn't a case of other monsters lurking behind a lack of testers.

Linus
Re: [PATCH][RESEND] PIE randomization
On 5/11/07, Andrew Morton <[EMAIL PROTECTED]> wrote: erm, I was being funny. If you randomize a binary it won't run any more. cp /dev/random /bin/login. Oh well. My point is, we're not being told what is being randomized here. Is it the virtual starting address of the main executable mmap? Of the shared libraries also? Is it the stack location? What?

PIE = Position Independent Executable, that's how I named them. These are not regular executables, they are basically DSOs but usually compiled with -fpie/-fPIE instead of -fpic/-fPIC and linked with -pie instead of -shared to allow the compiler and linker to perform more optimizations. See section 5 in http://people.redhat.com/drepper/nonselsec.pdf

Jan unfortunately referenced Ingo's document, which doesn't really explain it.
NFS spews warnings on x86-64
Current git, on Fedora 6/x86-64:

fs/nfs/read.c: In function ‘nfs_return_empty_page’:
fs/nfs/read.c:82: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/read.c: In function ‘nfs_readpage_truncate_uninitialised_page’:
fs/nfs/read.c:106: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/read.c:109: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/read.c: In function ‘nfs_readpage_async’:
fs/nfs/read.c:133: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/read.c: In function ‘readpage_async_filler’:
fs/nfs/read.c:535: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/write.c: In function ‘nfs_mark_uptodate’:
fs/nfs/write.c:171: warning: ‘memclear_highpage_flush’ is deprecated (declared at include/linux/highmem.h:115)
fs/nfs/nfs4xdr.c: In function ‘decode_close’:
fs/nfs/nfs4xdr.c:2900: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_lock’:
fs/nfs/nfs4xdr.c:3189: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_locku’:
fs/nfs/nfs4xdr.c:3212: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_open’:
fs/nfs/nfs4xdr.c:3278: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_open_confirm’:
fs/nfs/nfs4xdr.c:3305: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_open_downgrade’:
fs/nfs/nfs4xdr.c:3318: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
fs/nfs/nfs4xdr.c: In function ‘decode_setclientid’:
fs/nfs/nfs4xdr.c:3593: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
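Editorial sketch for the second group of warnings: they appear when a value that is 'long unsigned int' on x86-64 is printed with '%u'. The usual remedies are to match the specifier to the argument type, or to use '%zu' when the value is a size_t. The functions below are hypothetical userspace illustrations and are not taken from fs/nfs/nfs4xdr.c; which argument triggers the warning there is not shown in the report.

```c
#include <stddef.h>
#include <stdio.h>

/* Print an unsigned long with the specifier that matches its type. */
int format_ulong_sketch(char *buf, size_t len, unsigned long value)
{
	return snprintf(buf, len, "value=%lu", value);
}

/* Print a size_t portably: %zu is correct on both 32- and 64-bit. */
int format_size_sketch(char *buf, size_t len, size_t nbytes)
{
	return snprintf(buf, len, "bytes=%zu", nbytes);
}
```

On 32-bit builds unsigned int and unsigned long often have the same width, which would explain why a mismatch like this can go unnoticed until a 64-bit compile.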
Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2
Tejun Heo wrote:
+	if (class == ATA_DEV_ATA)
+		class = ATA_DEV_ATAPI;
+	else
+		class = ATA_DEV_ATA;

the 'else' branch is obviously redundant
Re: [PATCH] libata: fallback to the other IDENTIFY on device error, take#2
Tejun Heo wrote:
It seems the world isn't as frank as we thought and some devices lie about who they are. Fallback to the other IDENTIFY if IDENTIFY is aborted by the device. As this is the strategy used by IDE for a long time, it shouldn't cause too much problem.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
Cc: William Thompson <[EMAIL PROTECTED]>
---
Updated to fallback iff IDENTIFY is aborted by the device.

applied
[patch 0/2] [PATCH] input: correctly handle keys without hardware release event
Hi,

This patch adds a soft release key mask to input_dev, to enable keyboard drivers to determine which keys never generate a hardware release event and hence add a release event after every press event of such keys. The mask is controlled by ioctls.

The Fn+F? key combinations of Dell Latitude series laptops (and possibly other Dells or other brands) only generate a key press event and never a key release event, which is most probably a hardware flaw (or feature?). Due to this flaw, combinations like Fn+F1 for hibernate and Fn+F3 for showing battery status cannot be used. Ubuntu has probably fixed this by patching the X input layer and HAL, but other distributions (like Debian) cannot use these keys.

This patch adds a generic method to signal if keys with certain scancodes never generate release events, so the keyboard driver can add those events right after a key press event.

The ioctls used to read and write to this bitmask might be used in a program like setkeycodes, which is normally used to map certain scancodes to keycodes. With a command line option, this program could also set the soft release bit for a certain scancode if desired. Patches for setkeycodes and getkeycodes against the Debian console-tools can be found at http://giel.operation0.org/keyboard-soft-release

This patch also uses the infrastructure for generating release events for KEY_HANGEUL and KEY_HANJA, something which was already done in atkbd.c.

See also this thread: http://thread.gmane.org/gmane.linux.kernel/401378

Greetings,
Giel
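Editorial sketch of the mechanism described in the cover letter: a per-scancode bitmask marks keys that never send a hardware release, and the driver synthesizes a release right after each press of such a key. This is a userspace model only. The struct, the function names and the 512-scancode limit are hypothetical and are not taken from the patch or from input_dev.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_SCANCODE 512		/* hypothetical table size */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
	((SKETCH_MAX_SCANCODE + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* One bit per scancode: set means "no hardware release event". */
struct sketch_kbd {
	unsigned long softrelease[SKETCH_WORDS];
};

/* A key event: value 1 is a press, value 0 is a release. */
struct sketch_event {
	unsigned int scancode;
	int value;
};

void sketch_set_soft_release(struct sketch_kbd *kbd, unsigned int sc)
{
	if (sc < SKETCH_MAX_SCANCODE)
		kbd->softrelease[sc / SKETCH_WORD_BITS] |=
			1UL << (sc % SKETCH_WORD_BITS);
}

int sketch_needs_soft_release(const struct sketch_kbd *kbd, unsigned int sc)
{
	if (sc >= SKETCH_MAX_SCANCODE)
		return 0;
	return (kbd->softrelease[sc / SKETCH_WORD_BITS] >>
		(sc % SKETCH_WORD_BITS)) & 1UL;
}

/*
 * Turn one hardware press into the events to report: the press itself,
 * followed by a synthesized release if the key is in the mask.
 * Returns the number of events stored in out (1 or 2).
 */
size_t sketch_report_press(const struct sketch_kbd *kbd, unsigned int sc,
			   struct sketch_event out[2])
{
	size_t n = 0;

	out[n].scancode = sc;
	out[n].value = 1;
	n++;
	if (sketch_needs_soft_release(kbd, sc)) {
		out[n].scancode = sc;
		out[n].value = 0;
		n++;
	}
	return n;
}
```

In this model a setkeycodes-style tool would flip bits through sketch_set_soft_release(); in the patch that role is played by the ioctls the cover letter mentions.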