Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
On Fri, Apr 27 2007, Linus Torvalds wrote: > So I do believe that we could probably do something about the IO > scheduling _too_: > > - break up large write requests (yeah, it will make for worse IO >throughput, but if we make it configurable, and especially with >controllers that don't have insane overheads per command, the >difference between 128kB requests and 16MB requests is probably not >really even noticeable - SCSI things with large per-command overheads >are just stupid) > >Generating huge requests will automatically mean that they are >"unbreakable" from an IO scheduler perspective, so it's bad for latency >for other requests once they've started. Overlooked this one initially... We actually don't generate huge requests, exactly because of that. Even if the device can do large requests (most SATA disks today can do 32meg), we default to 512kB as the largest one that we will build due to file system requests. It's trivial to reduce that limit, see /sys/block/<dev>/queue/max_sectors_kb. That controls the maximum per-request size. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
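To put Jens's numbers side by side: with the default 512 kB cap, a 16 MB write cannot become one unbreakable request; it goes out as many capped requests that the scheduler can interleave with other I/O. The sketch below is only the ceiling-division arithmetic and assumes every request is filled to the cap; real splitting also depends on alignment, segment limits and merging, and the helper name `nr_requests_for()` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of requests needed to carry `bytes` of
 * contiguous write data when each request is capped at `max_sectors_kb`
 * kilobytes (the value exposed in /sys/block/<dev>/queue/max_sectors_kb).
 * Assumes every request is filled to the cap; real splitting also depends
 * on alignment, segment limits and merging.
 */
static uint64_t nr_requests_for(uint64_t bytes, uint64_t max_sectors_kb)
{
	uint64_t cap = max_sectors_kb * 1024;	/* cap in bytes */

	if (cap == 0)
		return 0;			/* treat a zero cap as invalid input */
	return (bytes + cap - 1) / cap;		/* ceiling division */
}
```

For example, a 16 MB write at the default 512 kB cap needs 32 requests; lowering max_sectors_kb to 128 raises that to 128, which is the latency-for-throughput trade Linus describes.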
Re: [PATCH 00/16] raid acceleration and asynchronous offload api for 2.6.22
Dan Williams wrote: I am pleased to release this latest spin of the raid acceleration patches for merge consideration. This release aims to address all pending review items including MD bug fixes and async_tx api changes from Neil, and concerns on channel management from Chris and others. Data integrity tests using home-grown scripts and 'iozone -V' are passing. I am open to suggestions for additional testing criteria. Do you have performance numbers? -- SUSE Labs, Novell Inc.
Re: [patch 14/22] pollfs: pollable futex
Eric Dumazet wrote: > Davi Arnaut wrote: >> Eric Dumazet wrote: >>> Davi Arnaut wrote: Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains a given value. There can be only one futex wait per file descriptor. However, it can be rearmed (possibly at a different address) anytime. The pollable futex approach is far superior (send and receive events from userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same time. Building block for pollable semaphores and user-defined events. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> + +struct futex_event { + union { + void __user *addr; + u64 padding; + }; + int val; +}; >>> Hum... Here we might have a problem with 64 bit futexes, or private futexes >>> >>> So I believe this interface is not well defined and not expandable: in case >>> of >>> future additions to futexes, an old application compiled with an old >>> pollable >>> futex_event type might fail. >>> >> Hmm, how about: >> >> struct futex_event { >> union { >> void __user *addr; >> u64 padding; >> }; >> union { >> int val; >> s64 val64; >> }; >> /* whatever room is necessary for future improvements */ >> }; >> >> I haven't been keeping up with 64 bit or private futexes. What else >> could probably go wrong? > > Well, that's the point : This interface is like an ioctl() one : pretty bad > if > not properly designed :) I was merely mirroring the futex syscall arguments for FUTEX_WAIT. Will those change? I hope not :) > You probably need to stick one field containing one command or version > number, > something like that. I'm a bit skeptical that we need versioning for such a simple operation (command) as FUTEX_WAIT that takes an address and a value. 
> > > struct futex_event { > int type; > union { > void __user *addr; > u64 padding; > }; > union { > int val; > s64 val64; > }; > }; > > #define FUTEX_EVENT_SHARED32 1 > #define FUTEX_EVENT_SHARED64 2 > #define FUTEX_EVENT_PRIVATE32 (128|1) > #define FUTEX_EVENT_PRIVATE64 (128|2) I will take a look at the private futexes patches before commenting further. > ... > > Also, you should take care of alignment constraints (a 32bit user program > might run on a 64bit kernel) > Compat code? Or futex alignment constraints? -- Davi Arnaut
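On the "compat code or alignment?" question: one way to avoid compat translation altogether is to give the record a layout that is identical on 32- and 64-bit ABIs. The sketch below assumes a user-space rendering with `<stdint.h>` types; `struct futex_event_fixed` and `futex_event_make()` are hypothetical names, not part of the posted patch. The reason it matters: on i386 a `u64` inside a struct is only 4-byte aligned, so the posted union-plus-`int` layout is 12 bytes there but 16 bytes on x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical fixed-layout event record: fixed-width fields, explicit
 * tail padding and 8-byte struct alignment, so a 32-bit program and a
 * 64-bit kernel agree on size and offsets without compat code.
 */
struct futex_event_fixed {
	uint64_t addr;		/* user address, always carried as 64 bits */
	int32_t  val;		/* expected futex value */
	uint32_t pad;		/* explicit tail padding, must be zero */
} __attribute__((aligned(8)));

_Static_assert(sizeof(struct futex_event_fixed) == 16,
	       "event size must be identical on 32- and 64-bit ABIs");
_Static_assert(offsetof(struct futex_event_fixed, val) == 8,
	       "val must directly follow the 64-bit address");

/* Pack a user pointer into the fixed-width field, zeroing the padding. */
static struct futex_event_fixed futex_event_make(const void *uaddr, int32_t val)
{
	struct futex_event_fixed ev = {
		.addr = (uint64_t)(uintptr_t)uaddr,
		.val  = val,
		.pad  = 0,
	};
	return ev;
}
```

The cost is that every consumer must widen and narrow the address explicitly, but the ABI no longer depends on how each architecture aligns 64-bit members.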
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, 1 May 2007 23:41:34 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> wrote: > compiler: define __maybe_unused > > Define __maybe_unused to apply to both functions or variables as > __attribute__((unused)). This will not emit a compile-time warning when > a function or variable is declared but unreferenced. > > We eventually want to change the name of __attribute_used__ to __used. > > Signed-off-by: David Rientjes <[EMAIL PROTECTED]> > --- > include/linux/compiler-gcc.h | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h > --- a/include/linux/compiler-gcc.h > +++ b/include/linux/compiler-gcc.h > @@ -37,3 +37,4 @@ > #define noinline __attribute__((noinline)) > #define __attribute_pure__ __attribute__((pure)) > #define __attribute_const__ __attribute__((__const__)) > +#define __maybe_unused __attribute__((unused)) Seems sane to me. We'd need a definition in compiler-intel.h too. I don't know if ICC implements __attribute__((unused)) - probably it does. I guess we can get by without any commentary describing __maybe_unused, but I think __used would need one - it's pretty obscure. [EMAIL PROTECTED] @code{used} attribute. [EMAIL PROTECTED] used +This attribute, attached to a function, means that code must be emitted +for the function even if it appears that the function is not referenced. +This is useful, for example, when the function is referenced only in +inline assembly.
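A minimal sketch of how the two attributes behave in practice, assuming GCC or a compatible compiler: `__used` forces a function to be emitted even with no C callers, while `__maybe_unused` only silences the "defined but not used" warning. The functions below are hypothetical illustrations, not kernel code.

```c
/*
 * The guards avoid clashing with a C library that already defines
 * these names.
 */
#ifndef __used
#define __used		__attribute__((used))	/* emit even if unreferenced */
#endif
#ifndef __maybe_unused
#define __maybe_unused	__attribute__((unused))	/* silence unused warnings */
#endif

/*
 * Hypothetical: imagine this is called only from inline assembly.
 * Without __used, gcc >= 3.4 may drop the static function entirely
 * because no C code references it.
 */
static void __used asm_only_helper(void)
{
}

/* Hypothetical helper that is only called when SKETCH_DEBUG is defined. */
static int __maybe_unused debug_double(int x)
{
	return 2 * x;
}

int sketch_compute(int x)
{
	/* Only read when SKETCH_DEBUG is defined; no warning otherwise. */
	int tmp __maybe_unused = x + 1;

#ifdef SKETCH_DEBUG
	return debug_double(tmp);
#else
	return x;
#endif
}
```

Built with `-Wall` and without `SKETCH_DEBUG`, neither `debug_double()` nor `tmp` triggers a warning, and `asm_only_helper()` is still emitted.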
Re: Natsemi DP83815 driver spamming
On 5/1/07, Rafal Bilski <[EMAIL PROTECTED]> wrote: 2.6.21.1 is the first kernel I'm using on this device. Earlier it was a WindowsCE terminal. It is a hardware fault. Commenting out the code is my way to avoid "wakeup" messages in the log, but I don't want to change anything in the vanilla kernel. I'm lucky that the NIC is working at all. I'm not sure what the right answer is. The code was designed to do the right thing, and yet in your case it's broken. Does it need to be a build option to work around broken hardware? Yuck.
Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Chris Wright wrote: > It simply maps directly to the patch queue. We do go back and fold > things in and that should probably be done again, I agree. > Yeah, I've folded them all up now. Tracking xen-unstable is going to be trickier though. J
Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
* Herbert Xu ([EMAIL PROTECTED]) wrote: > Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > > === > > --- a/drivers/net/xen-netfront.c > > +++ b/drivers/net/xen-netfront.c > > @@ -1213,10 +1213,10 @@ static int netif_poll(struct net_device > > Any reason why xen-netfront isn't just in a single patch? It makes > it a bit hard to review having it scattered around like this. It simply maps directly to the patch queue. We do go back and fold things in and that should probably be done again, I agree. thanks, -chris
Re: [patch 01/10] compiler: define __attribute_unused__
Andrew Morton wrote: On Tue, 1 May 2007 22:53:52 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> wrote: On Wed, 2 May 2007, Alexey Dobriyan wrote: On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote: +#define __attribute_unused__ __attribute__((unused)) Suggest __unused which is shorter and looks compiler-neutral. So you would also suggest renaming __attribute_used__ and all 48 of its uses to __used? Or __needed or __unneeded. None of them mean much to me and I'd be forever going back to the definition to work out what was intended. We're still in search of a name, IMO. But once we have it, yeah, we should update all present users. We can do that over time: retain the old and new definitions for a while. maybe_unused? The used attribute IMO is a bit easier to parse, so I don't think that needs to be renamed. Regarding the used vs needed thing, I don't think needed adds very much and deviates from gcc terminology. Presumably if something is used it is needed, and vice versa; similarly for unused. -- SUSE Labs, Novell Inc.
Re: [patch] CFS scheduler, -v8
* Mike Galbraith <[EMAIL PROTECTED]> wrote: > > As usual, any sort of feedback, bugreport, fix and suggestion is > > more than welcome, > > Greetings, > > I noticed a (harmless) bounds warning triggered by the reduction in > size of array->bitmap. Patchlet below. thanks, applied! Your patch should also speed up task selection of RT tasks a bit. (the patch removes ~40 bytes of code). And on 64-bit we now fit into 2x 64-bit bitmap and thus are now down to two Find-First-Set instructions. Nice :) Ingo
Re: [patch 01/10] compiler: define __attribute_unused__
On Wed, 2 May 2007, Rusty Russell wrote: > OTOH, your point about "__unneeded" is well taken. "__needed" and > "__optional" perhaps? But their feature is *exactly* that they don't > look like the gcc attributes, hence avoid their semantic screwage. > Hmm, __optional doesn't sound appropriate either. Since this is going to be defined to be __attribute__ ((unused)), it can apply to both functions and variables. It should be applied to a function if it truly is unreferenced within the tree (and there are several examples of this in current HEAD) and we don't want to use __needed because it still emits the function code even though it suppresses the warning. So saying a function that has no callers is "__optional" makes no sense since its code isn't going to be emitted in gcc >=3.4. What's your opinion of my __needed and __maybe_unused idea such as the following? compiler: define __maybe_unused Define __maybe_unused to apply to both functions or variables as __attribute__((unused)). This will not emit a compile-time warning when a function or variable is declared but unreferenced. We eventually want to change the name of __attribute_used__ to __used. Signed-off-by: David Rientjes <[EMAIL PROTECTED]> --- include/linux/compiler-gcc.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -37,3 +37,4 @@ #define noinline __attribute__((noinline)) #define __attribute_pure__ __attribute__((pure)) #define __attribute_const__ __attribute__((__const__)) +#define __maybe_unused __attribute__((unused))
Re: [patch 14/22] pollfs: pollable futex
Davi Arnaut wrote: Eric Dumazet wrote: Davi Arnaut wrote: Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains a given value. There can be only one futex wait per file descriptor. However, it can be rearmed (possibly at a different address) anytime. The pollable futex approach is far superior (send and receive events from userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same time. Building block for pollable semaphores and user-defined events. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/Makefile | 1 fs/pollfs/futex.c | 154 + init/Kconfig | 7 ++ 3 files changed, 162 insertions(+) Index: linux-2.6/fs/pollfs/Makefile === --- linux-2.6.orig/fs/pollfs/Makefile +++ linux-2.6/fs/pollfs/Makefile @@ -3,3 +3,4 @@ pollfs-y := file.o pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o pollfs-$(CONFIG_POLLFS_TIMER) += timer.o +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o Index: linux-2.6/fs/pollfs/futex.c === --- /dev/null +++ linux-2.6/fs/pollfs/futex.c @@ -0,0 +1,154 @@ +/* + * pollable futex + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct futex_event { + union { + void __user *addr; + u64 padding; + }; + int val; +}; Hum... Here we might have a problem with 64 bit futexes, or private futexes So I believe this interface is not well defined and not expandable: in case of future additions to futexes, an old application compiled with an old pollable futex_event type might fail. Hmm, how about: struct futex_event { union { void __user *addr; u64 padding; }; union { int val; s64 val64; }; /* whatever room is necessary for future improvements */ }; I haven't been keeping up with 64 bit or private futexes. What else could probably go wrong? 
Well, that's the point : This interface is like an ioctl() one : pretty bad if not properly designed :) You probably need to stick one field containing one command or version number, something like that. struct futex_event { int type; union { void __user *addr; u64 padding; }; union { int val; s64 val64; }; }; #define FUTEX_EVENT_SHARED32 1 #define FUTEX_EVENT_SHARED64 2 #define FUTEX_EVENT_PRIVATE32 (128|1) #define FUTEX_EVENT_PRIVATE64 (128|2) ... Also, you should take care of alignment constraints (a 32bit user program might run on a 64bit kernel)
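A sketch of the versioned-record idea, assuming a user-space rendering of Eric's struct (`u64`/`s64` replaced by `<stdint.h>` types, `__user` dropped). The check functions and their names are hypothetical; the point is that dispatching on `type` and rejecting unknown values lets new futex kinds be added later without an old binary's record being misread. Alignment is checked against the futex width, in the same spirit as the kernel already rejecting a misaligned 32-bit futex address with `EINVAL`.

```c
#include <errno.h>
#include <stdint.h>

/* Type values as proposed in the thread. */
#define FUTEX_EVENT_SHARED32	1
#define FUTEX_EVENT_SHARED64	2
#define FUTEX_EVENT_PRIVATE32	(128 | 1)
#define FUTEX_EVENT_PRIVATE64	(128 | 2)

/* User-space rendering of the proposed record. */
struct futex_event {
	int type;
	union {
		void *addr;
		uint64_t padding;
	};
	union {
		int val;
		int64_t val64;
	};
};

/*
 * Hypothetical validation step: dispatch on ->type and refuse anything
 * unknown instead of guessing what an old or new binary meant.
 */
static int futex_event_check(const struct futex_event *ev)
{
	uintptr_t a = (uintptr_t)ev->addr;

	switch (ev->type) {
	case FUTEX_EVENT_SHARED32:
	case FUTEX_EVENT_PRIVATE32:
		return (a % sizeof(int32_t)) ? -EINVAL : 0;
	case FUTEX_EVENT_SHARED64:
	case FUTEX_EVENT_PRIVATE64:
		return (a % sizeof(int64_t)) ? -EINVAL : 0;
	default:
		return -EINVAL;		/* unknown type */
	}
}

/* The "private" flag is bit 7 (value 128), per the defines above. */
static int futex_event_is_private(const struct futex_event *ev)
{
	return (ev->type & 128) != 0;
}
```

This keeps Davi's single-command simplicity for today's FUTEX_WAIT while leaving room for the 64-bit and private variants Eric mentions.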
Re: [patch] CFS scheduler, -v8
On Tue, 2007-05-01 at 23:22 +0200, Ingo Molnar wrote: > i'm pleased to announce release -v8 of the CFS scheduler patchset. (The > main goal of CFS is to implement "desktop scheduling" with as high > quality as technically possible.) ... > As usual, any sort of feedback, bugreport, fix and suggestion is more > than welcome, Greetings, I noticed a (harmless) bounds warning triggered by the reduction in size of array->bitmap. Patchlet below. -Mike CC kernel/sched.o kernel/sched_rt.c: In function ‘load_balance_start_rt’: include/asm-generic/bitops/sched.h:30: warning: array subscript is above array bounds kernel/sched_rt.c: In function ‘pick_next_task_rt’: include/asm-generic/bitops/sched.h:30: warning: array subscript is above array bounds --- linux-2.6.21-cfs.v8/include/asm-generic/bitops/sched.h.org 2007-05-02 07:16:47.0 +0200 +++ linux-2.6.21-cfs.v8/include/asm-generic/bitops/sched.h 2007-05-02 07:20:45.0 +0200 @@ -6,28 +6,23 @@ /* * Every architecture must define this function. It's the fastest - * way of searching a 140-bit bitmap where the first 100 bits are - * unlikely to be set. It's guaranteed that at least one of the 140 - * bits is cleared. + * way of searching a 100-bit bitmap. It's guaranteed that at least + * one of the 100 bits is cleared. 
*/ static inline int sched_find_first_bit(const unsigned long *b) { #if BITS_PER_LONG == 64 - if (unlikely(b[0])) + if (b[0]) return __ffs(b[0]); - if (likely(b[1])) - return __ffs(b[1]) + 64; - return __ffs(b[2]) + 128; + return __ffs(b[1]) + 64; #elif BITS_PER_LONG == 32 - if (unlikely(b[0])) + if (b[0]) return __ffs(b[0]); - if (unlikely(b[1])) + if (b[1]) return __ffs(b[1]) + 32; - if (unlikely(b[2])) + if (b[2]) return __ffs(b[2]) + 64; - if (b[3]) - return __ffs(b[3]) + 96; - return __ffs(b[4]) + 128; + return __ffs(b[3]) + 96; #else #error BITS_PER_LONG not defined #endif
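For readers without the asm-generic headers at hand, here is a GCC/Clang-specific sketch of what the patched function computes for a 100-bit bitmap, using `__builtin_ctzl()` in place of `__ffs()` and deriving the word count from `sizeof(unsigned long)`. The `sketch_` names are hypothetical. Note that the code, like the patched original, only works if at least one bit is *set*, since count-trailing-zeros of 0 is undefined.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
#define SKETCH_NR_PRIO		100	/* bitmap size after the patch */
#define SKETCH_BITMAP_LONGS \
	((SKETCH_NR_PRIO + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Set bit nr in the bitmap (non-atomic, in the spirit of __set_bit). */
static void sketch_set_bit(size_t nr, unsigned long *b)
{
	b[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/*
 * Scan the words in order and count trailing zeros in the first nonzero
 * one. The caller must guarantee that at least one bit is set.
 */
static int sketch_find_first_bit(const unsigned long *b)
{
	size_t i;

	for (i = 0; i < SKETCH_BITMAP_LONGS - 1; i++)
		if (b[i])
			return (int)(i * SKETCH_BITS_PER_LONG) + __builtin_ctzl(b[i]);
	/* Last word: nonzero by the caller's guarantee. */
	return (int)(i * SKETCH_BITS_PER_LONG) + __builtin_ctzl(b[i]);
}
```

On a 64-bit build `SKETCH_BITMAP_LONGS` is 2, matching Ingo's note that the search is now down to two find-first-set steps; on 32-bit it is 4, matching the `__ffs(b[3]) + 96` tail of the patch.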
Re: Natsemi DP83815 driver spamming
[...] > >> With code commented out I have 1 error / 3 transmitted packets from >> DP83815C. I have 1 error / 10 transmitted packets to DP83815C. Maybe >> it works at all because I have a short cable, only 10m long. >> I don't remember any errors with plain 2.6.21.1. Sorry, I meant transmission errors, but of course the log was full of "wakeup" messages. > Well, I certainly haven't changed anything in there. If the behavior > has changed in recent kernels, check the rest of the diffs. > > Tim 2.6.21.1 is the first kernel I'm using on this device. Earlier it was a WindowsCE terminal. It is a hardware fault. Commenting out the code is my way to avoid "wakeup" messages in the log, but I don't want to change anything in the vanilla kernel. I'm lucky that the NIC is working at all. Thank you, Rafał
Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote: > === > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -1213,10 +1213,10 @@ static int netif_poll(struct net_device Any reason why xen-netfront isn't just in a single patch? It makes it a bit hard to review having it scattered around like this. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, 2007-05-01 at 23:06 -0700, David Rientjes wrote: > On Wed, 2 May 2007, Rusty Russell wrote: > > > Adding this macro doesn't give us anything that simply saying > > "__attribute__((unused))" doesn't give. But it does add a layer of > > kernel-specific indirection. > > > > That's obviously true since we're defining __attribute_unused__ to be > __attribute__((unused)). Hi David, I'm horribly familiar with this issue, BTW, so we don't need so many words 8) > The patched version makes this: > > int type __attribute_unused__ = 0; > > which definitely tells you that you're using a compiler attribute that > will be attached to that automatic. In your case: > > int type __unneeded = 0; > > doesn't say anything in this case. It doesn't resemble any attribute that > a programmer might be familiar with and begs the question of why we've > declared it if it's truly "unneeded"? Your version makes one wonder why they didn't use "__attribute__((unused))". Obviously the __attribute_unused__ macro exists for a reason, so they wonder what's the difference between that and the attribute? The answer: nothing. OTOH, your point about "__unneeded" is well taken. "__needed" and "__optional" perhaps? But their feature is *exactly* that they don't look like the gcc attributes, hence avoid their semantic screwage. > By the way, there are tons of these instances where __attribute__((used)) > needs to be added in driver code to suppress unreferenced warnings. Sure; historically we refactor around it. But warnings are now so commonplace few people care 8( Cheers, Rusty.
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, May 01, 2007 at 10:53:52PM -0700, David Rientjes wrote: >On Wed, 2 May 2007, Alexey Dobriyan wrote: > >> On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote: >> > +#define __attribute_unused__ __attribute__((unused)) >> >> Suggest __unused which is shorter and looks compiler-neutral. >> > >So you would also suggest renaming __attribute_used__ and all 48 of its >uses to __used? I suggest. ;-p '__attribute_unused__' is really long. I would prefer '__attribute__((unused))', since your macro doesn't let me type many fewer characters.
Re: [PATCH] edd: Switch to refcounting PCI APIs
On Mon, 23 Apr 2007 14:52:55 +0100 Alan Cox <[EMAIL PROTECTED]> wrote: > Signed-off-by: Alan Cox <[EMAIL PROTECTED]> > > diff -u --new-file --recursive --exclude-from /usr/src/exclude > linux.vanilla-2.6.21-rc6-mm1/drivers/firmware/edd.c > linux-2.6.21-rc6-mm1/drivers/firmware/edd.c > --- linux.vanilla-2.6.21-rc6-mm1/drivers/firmware/edd.c 2007-04-12 > 14:14:43.0 +0100 > +++ linux-2.6.21-rc6-mm1/drivers/firmware/edd.c 2007-04-23 > 11:50:57.185158272 +0100 > @@ -669,7 +669,7 @@ > struct edd_info *info = edd_dev_get_info(edev); > > if (edd_dev_is_type(edev, "PCI")) { > - return pci_find_slot(info->params.interface_path.pci.bus, > + return pci_get_slot(info->params.interface_path.pci.bus, > > PCI_DEVFN(info->params.interface_path.pci.slot, > info->params.interface_path.pci. > function)); > @@ -682,9 +682,12 @@ > { > > struct pci_dev *pci_dev = edd_get_pci_dev(edev); > + int ret; > if (!pci_dev) > return 1; > - return sysfs_create_link(&edev->kobj,&pci_dev->dev.kobj,"pci_dev"); > + ret = sysfs_create_link(&edev->kobj,&pci_dev->dev.kobj,"pci_dev"); > + pci_dev_put(pci_dev); > + return ret; > } > This escaped notice: drivers/firmware/edd.c: In function 'edd_get_pci_dev': drivers/firmware/edd.c:673: warning: passing argument 1 of 'pci_get_slot' makes pointer from integer without a cast But this didn't: Calling initcall 0xc0534e00: edd_init+0x0/0x2c0() BIOS EDD facility v0.16 2004-Jun-25, 6 devices found BUG: unable to handle kernel NULL pointer dereference at virtual address 0014 printing eip: c029ed16 *pde = Oops: [#1] SMP Modules linked in: CPU:1 EIP:0060:[]Not tainted VLI EFLAGS: 00010286 (2.6.21-mm1 #2) EIP is at pci_get_slot+0x26/0x90 eax: c04e9280 ebx: ecx: 0204 edx: 0001 esi: 0020 edi: c0499789 ebp: c242ff30 esp: c242ff18 ds: 007b es: 007b fs: 00d8 gs: ss: 0068 Process swapper (pid: 1, ti=c242e000 task=c242d550 task.ti=c242e000) Stack: c04ff358 000d c242ff30 c01b8b6f c326ec0c c0568ac1 c242ff70 c053505e c326ec18 c04a2d9a 0081 0006 0001 c326ec18 c0568a92 c0568a92 
c242ffe0 c05185c2 Call Trace: [] show_trace_log_lvl+0x1a/0x30 [] show_stack_log_lvl+0xa9/0xd0 [] show_registers+0x1e9/0x2f0 [] die+0x10f/0x240 [] do_page_fault+0x2d9/0x610 [] error_code+0x72/0x78 [] edd_init+0x25e/0x2c0 [] kernel_init+0x122/0x2f0 [] kernel_thread_helper+0x7/0x14 === Code: 5d c3 8d 76 00 55 89 e5 56 89 d6 53 89 c3 83 ec 10 89 e0 25 00 e0 ff ff f7 40 14 00 ff ff 0f 75 46 b8 80 92 4e c0 e8 2a 7b e9 ff <8b> 43 14 8d 4b 14 eb 04 89 f6 89 d0 8b 10 0f 18 02 90 39 c8 74 EIP: [] pci_get_slot+0x26/0x90 SS:ESP 0068:c242ff18
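The gcc warning and the oops appear to be the same bug: `pci_get_slot()` expects a `struct pci_bus *`, but the patch passes the BIOS-reported bus *number*, so a bus number of 0 would be dereferenced as a NULL pointer, which is consistent with the fault at virtual address 0014. Later kernels do this lookup by bus number (`pci_get_bus_and_slot()`, and later still `pci_get_domain_bus_and_slot()`); check the tree you are patching before relying on either. The user-space mock below only illustrates the get/put discipline the patch introduces: every successful lookup takes a reference and the caller drops it after creating the link. All `mock_` names are hypothetical; `PCI_DEVFN()` matches the kernel's definition.

```c
#include <stddef.h>

/* Kernel encoding of slot/function into devfn. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical stand-in for struct pci_dev with a reference count. */
struct mock_pci_dev {
	unsigned int bus;
	unsigned int devfn;
	int refcount;
};

static struct mock_pci_dev mock_devices[] = {
	{ .bus = 0, .devfn = PCI_DEVFN(4, 0), .refcount = 1 },
	{ .bus = 1, .devfn = PCI_DEVFN(0, 1), .refcount = 1 },
};

/* Look up by bus *number* and devfn, taking a reference on a match. */
static struct mock_pci_dev *mock_get_bus_and_slot(unsigned int bus,
						  unsigned int devfn)
{
	for (size_t i = 0; i < sizeof(mock_devices) / sizeof(mock_devices[0]); i++) {
		if (mock_devices[i].bus == bus && mock_devices[i].devfn == devfn) {
			mock_devices[i].refcount++;
			return &mock_devices[i];
		}
	}
	return NULL;
}

/* Drop a reference taken by the lookup (NULL-safe, like pci_dev_put). */
static void mock_dev_put(struct mock_pci_dev *dev)
{
	if (dev)
		dev->refcount--;
}

/* Same shape as the patched symlink helper: get, use, put. */
static int mock_create_symlink(unsigned int bus, unsigned int slot,
			       unsigned int func)
{
	struct mock_pci_dev *dev = mock_get_bus_and_slot(bus, PCI_DEVFN(slot, func));
	int ret;

	if (!dev)
		return 1;	/* no device: same early return as the patch */
	ret = 0;		/* stand-in for sysfs_create_link() */
	mock_dev_put(dev);	/* drop the lookup reference */
	return ret;
}
```

After a successful call the device's reference count is back where it started, which is the property the `pci_dev_put()` added by the patch is meant to guarantee.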
[PATCH 16/16] iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver
Adds the platform device definitions and the architecture specific support routines (i.e. register initialization and descriptor formats) for the iop-adma driver. Changelog: * add support for > 1k zero sum buffer sizes * added dma/aau platform devices to iq80321 and iq80332 setup * fixed the calculation in iop_desc_is_aligned * support xor buffer sizes larger than 16MB * fix places where software descriptors are assumed to be contiguous, only hardware descriptors are contiguous for up to a PAGE_SIZE buffer size * convert to async_tx * add interrupt support * add platform devices for 80219 boards * do not call platform register macros in driver code * remove switch() statements for compatible register offsets/layouts * change over to bitmap based capabilities Cc: Russell King <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- arch/arm/mach-iop32x/glantank.c|2 arch/arm/mach-iop32x/iq31244.c |5 arch/arm/mach-iop32x/iq80321.c |3 arch/arm/mach-iop32x/n2100.c |2 arch/arm/mach-iop33x/iq80331.c |3 arch/arm/mach-iop33x/iq80332.c |3 arch/arm/plat-iop/Makefile |2 arch/arm/plat-iop/adma.c | 216 include/asm-arm/arch-iop32x/adma.h |5 include/asm-arm/arch-iop33x/adma.h |5 include/asm-arm/hardware/iop3xx-adma.h | 893 include/asm-arm/hardware/iop3xx.h | 68 -- 12 files changed, 1147 insertions(+), 60 deletions(-) diff --git a/arch/arm/mach-iop32x/glantank.c b/arch/arm/mach-iop32x/glantank.c index 45f4f13..2e0099b 100644 --- a/arch/arm/mach-iop32x/glantank.c +++ b/arch/arm/mach-iop32x/glantank.c @@ -180,6 +180,8 @@ static void __init glantank_init_machine(void) platform_device_register(&iop3xx_i2c1_device); platform_device_register(&glantank_flash_device); platform_device_register(&glantank_serial_device); + platform_device_register(&iop3xx_dma_0_channel); + platform_device_register(&iop3xx_dma_1_channel); pm_power_off = glantank_power_off; } diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c index 60e7430..c0d077c 100644 --- 
a/arch/arm/mach-iop32x/iq31244.c +++ b/arch/arm/mach-iop32x/iq31244.c @@ -295,9 +295,14 @@ static void __init iq31244_init_machine(void) platform_device_register(&iop3xx_i2c1_device); platform_device_register(&iq31244_flash_device); platform_device_register(&iq31244_serial_device); + platform_device_register(&iop3xx_dma_0_channel); + platform_device_register(&iop3xx_dma_1_channel); if (is_ep80219()) pm_power_off = ep80219_power_off; + + if (!is_80219()) + platform_device_register(&iop3xx_aau_channel); } static int __init force_ep80219_setup(char *str) diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c index 361c70c..474ec2a 100644 --- a/arch/arm/mach-iop32x/iq80321.c +++ b/arch/arm/mach-iop32x/iq80321.c @@ -180,6 +180,9 @@ static void __init iq80321_init_machine(void) platform_device_register(&iop3xx_i2c1_device); platform_device_register(&iq80321_flash_device); platform_device_register(&iq80321_serial_device); + platform_device_register(&iop3xx_dma_0_channel); + platform_device_register(&iop3xx_dma_1_channel); + platform_device_register(&iop3xx_aau_channel); } MACHINE_START(IQ80321, "Intel IQ80321") diff --git a/arch/arm/mach-iop32x/n2100.c b/arch/arm/mach-iop32x/n2100.c index 5f07344..8e6fe13 100644 --- a/arch/arm/mach-iop32x/n2100.c +++ b/arch/arm/mach-iop32x/n2100.c @@ -245,6 +245,8 @@ static void __init n2100_init_machine(void) platform_device_register(&iop3xx_i2c0_device); platform_device_register(&n2100_flash_device); platform_device_register(&n2100_serial_device); + platform_device_register(&iop3xx_dma_0_channel); + platform_device_register(&iop3xx_dma_1_channel); pm_power_off = n2100_power_off; diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c index 1a9e361..b4d12bf 100644 --- a/arch/arm/mach-iop33x/iq80331.c +++ b/arch/arm/mach-iop33x/iq80331.c @@ -135,6 +135,9 @@ static void __init iq80331_init_machine(void) platform_device_register(&iop33x_uart0_device); platform_device_register(&iop33x_uart1_device); 
platform_device_register(&iq80331_flash_device); + platform_device_register(&iop3xx_dma_0_channel); + platform_device_register(&iop3xx_dma_1_channel); + platform_device_register(&iop3xx_aau_channel); } MACHINE_START(IQ80331, "Intel IQ80331") diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c index 96d6f0f..2abb2d8 100644 --- a/arch/arm/mach-iop33x/iq80332.c +++ b/arch/arm/mach-iop33x/iq80332.c @@ -135,6 +135,9 @@ static void __init iq80332_init_machine(void) platform_device_register(&iop33x_uar
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, 1 May 2007, David Rientjes wrote: > The patched version makes this: > > int type __attribute_unused__ = 0; > > which definitely tells you that you're using a compiler attribute that > will be attached to that automatic. In your case: > > int type __unneeded = 0; > > doesn't say anything in this case. It doesn't resemble any attribute that > a programmer might be familiar with and begs the question of why we've > declared it if it's truly "unneeded"? > One possible way to remedy this situation is with __needed and __maybe_unneeded. __needed would be defined to __attribute__ ((used)), which would apply to functions only and specify that its code needs to be emitted anyway even though it appears to be unreferenced. This is needed for gcc >=3.4. In gcc <3.4, this gets defined to be __attribute__ ((unused)) to simply suppress the warning. So now all functions that are unreferenced except in inline assembly get __needed appended. __maybe_unneeded would be defined to __attribute__ ((unused)). It can apply to either functions or variables to suppress the warning if they are unused.
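A sketch of David's proposal with the gcc-version split spelled out, assuming the compiler defines `__GNUC__`/`__GNUC_MINOR__` (gcc and clang both do). `SKETCH_NEEDED_IS_USED` and the example declarations are hypothetical additions for illustration only.

```c
#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)
/* gcc >= 3.4 may discard unreferenced static functions: force emission. */
# define __needed		__attribute__((used))
# define SKETCH_NEEDED_IS_USED	1
#else
/* Per the proposal, older gcc only needs the warning suppressed. */
# define __needed		__attribute__((unused))
# define SKETCH_NEEDED_IS_USED	0
#endif

/* Suppress "defined but not used" for either functions or variables. */
#define __maybe_unneeded	__attribute__((unused))

/* Hypothetical: referenced only from inline assembly elsewhere. */
static void __needed called_from_asm(void)
{
}

/* Hypothetical: only touched in some configurations. */
static int __maybe_unneeded spare_counter;
```

The split keeps the call sites identical across compiler versions; only the header decides whether "needed" means "emit it" or merely "don't warn".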
[PATCH 15/16] iop13xx: Surface the iop13xx adma units to the iop-adma driver
Adds the platform device definitions and the architecture specific support routines (i.e. register initialization and descriptor formats) for the iop-adma driver. Changelog: * added 'descriptor pool size' to the platform data * add base support for buffer sizes larger than 16MB (hw max) * build error fix from Kirill A. Shutemov * rebase for async_tx changes * add interrupt support * do not call platform register macros in driver code Cc: Russell King <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- arch/arm/mach-iop13xx/setup.c | 208 include/asm-arm/arch-iop13xx/adma.h| 545 include/asm-arm/arch-iop13xx/iop13xx.h | 34 +- 3 files changed, 766 insertions(+), 21 deletions(-) diff --git a/arch/arm/mach-iop13xx/setup.c b/arch/arm/mach-iop13xx/setup.c index 9a46bcd..662d1e2 100644 --- a/arch/arm/mach-iop13xx/setup.c +++ b/arch/arm/mach-iop13xx/setup.c @@ -25,6 +25,7 @@ #include #include #include +#include #define IOP13XX_UART_XTAL 4000 #define IOP13XX_SETUP_DEBUG 0 @@ -236,6 +237,129 @@ static unsigned long iq8134x_probe_flash_size(void) } #endif +/* ADMA Channels */ +static struct resource iop13xx_adma_0_resources[] = { + [0] = { + .start = IOP13XX_ADMA_PHYS_BASE(0), + .end = IOP13XX_ADMA_UPPER_PA(0), + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = IRQ_IOP13XX_ADMA0_EOT, + .end = IRQ_IOP13XX_ADMA0_EOT, + .flags = IORESOURCE_IRQ + }, + [2] = { + .start = IRQ_IOP13XX_ADMA0_EOC, + .end = IRQ_IOP13XX_ADMA0_EOC, + .flags = IORESOURCE_IRQ + }, + [3] = { + .start = IRQ_IOP13XX_ADMA0_ERR, + .end = IRQ_IOP13XX_ADMA0_ERR, + .flags = IORESOURCE_IRQ + } +}; + +static struct resource iop13xx_adma_1_resources[] = { + [0] = { + .start = IOP13XX_ADMA_PHYS_BASE(1), + .end = IOP13XX_ADMA_UPPER_PA(1), + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = IRQ_IOP13XX_ADMA1_EOT, + .end = IRQ_IOP13XX_ADMA1_EOT, + .flags = IORESOURCE_IRQ + }, + [2] = { + .start = IRQ_IOP13XX_ADMA1_EOC, + .end = IRQ_IOP13XX_ADMA1_EOC, + .flags = IORESOURCE_IRQ + }, + [3] = { + .start 
= IRQ_IOP13XX_ADMA1_ERR, + .end = IRQ_IOP13XX_ADMA1_ERR, + .flags = IORESOURCE_IRQ + } +}; + +static struct resource iop13xx_adma_2_resources[] = { + [0] = { + .start = IOP13XX_ADMA_PHYS_BASE(2), + .end = IOP13XX_ADMA_UPPER_PA(2), + .flags = IORESOURCE_MEM, + }, + [1] = { + .start = IRQ_IOP13XX_ADMA2_EOT, + .end = IRQ_IOP13XX_ADMA2_EOT, + .flags = IORESOURCE_IRQ + }, + [2] = { + .start = IRQ_IOP13XX_ADMA2_EOC, + .end = IRQ_IOP13XX_ADMA2_EOC, + .flags = IORESOURCE_IRQ + }, + [3] = { + .start = IRQ_IOP13XX_ADMA2_ERR, + .end = IRQ_IOP13XX_ADMA2_ERR, + .flags = IORESOURCE_IRQ + } +}; + +static u64 iop13xx_adma_dmamask = DMA_64BIT_MASK; +static struct iop_adma_platform_data iop13xx_adma_0_data = { + .hw_id = 0, + .pool_size = PAGE_SIZE, +}; + +static struct iop_adma_platform_data iop13xx_adma_1_data = { + .hw_id = 1, + .pool_size = PAGE_SIZE, +}; + +static struct iop_adma_platform_data iop13xx_adma_2_data = { + .hw_id = 2, + .pool_size = PAGE_SIZE, +}; + +/* The ids are fixed up later in iop13xx_platform_init */ +static struct platform_device iop13xx_adma_0_channel = { + .name = "iop-adma", + .id = 0, + .num_resources = 4, + .resource = iop13xx_adma_0_resources, + .dev = { + .dma_mask = &iop13xx_adma_dmamask, + .coherent_dma_mask = DMA_64BIT_MASK, + .platform_data = (void *) &iop13xx_adma_0_data, + }, +}; + +static struct platform_device iop13xx_adma_1_channel = { + .name = "iop-adma", + .id = 0, + .num_resources = 4, + .resource = iop13xx_adma_1_resources, + .dev = { + .dma_mask = &iop13xx_adma_dmamask, + .coherent_dma_mask = DMA_64BIT_MASK, + .platform_data = (void *) &iop13xx_adma_1_data, + }, +}; + +static struct platform_device iop13xx_adma_2_channel = { + .name = "iop-adma", + .id = 0, + .num_resources = 4, + .resource = iop13xx_adma_2_resources, + .dev = { + .dma_mask = &iop13xx_adma_dmamask, + .coherent_dma_mask = DMA_64BIT_MASK, + .platform_data = (void *) &iop13xx_adma_2_data, + }, +}; + void __init iop13xx_map_io(void) { /* Initialize the Sta
[PATCH 13/16] md: remove raid5 compute_block and compute_parity5
replaced by raid5_run_ops Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 124 1 files changed, 0 insertions(+), 124 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index c9b91e3..74ce354 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1501,130 +1501,6 @@ static void copy_data(int frombio, struct bio *bio, } \ } while(0) - -static void compute_block(struct stripe_head *sh, int dd_idx) -{ - int i, count, disks = sh->disks; - void *ptr[MAX_XOR_BLOCKS], *dest, *p; - - PRINTK("compute_block, stripe %llu, idx %d\n", - (unsigned long long)sh->sector, dd_idx); - - dest = page_address(sh->dev[dd_idx].page); - memset(dest, 0, STRIPE_SIZE); - count = 0; - for (i = disks ; i--; ) { - if (i == dd_idx) - continue; - p = page_address(sh->dev[i].page); - if (test_bit(R5_UPTODATE, &sh->dev[i].flags)) - ptr[count++] = p; - else - printk(KERN_ERR "compute_block() %d, stripe %llu, %d" - " not present\n", dd_idx, - (unsigned long long)sh->sector, i); - - check_xor(); - } - if (count) - xor_block(count, STRIPE_SIZE, dest, ptr); - set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags); -} - -static void compute_parity5(struct stripe_head *sh, int method) -{ - raid5_conf_t *conf = sh->raid_conf; - int i, pd_idx = sh->pd_idx, disks = sh->disks, count; - void *ptr[MAX_XOR_BLOCKS], *dest; - struct bio *chosen; - - PRINTK("compute_parity5, stripe %llu, method %d\n", - (unsigned long long)sh->sector, method); - - count = 0; - dest = page_address(sh->dev[pd_idx].page); - switch(method) { - case READ_MODIFY_WRITE: - BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags)); - for (i=disks ; i-- ;) { - if (i==pd_idx) - continue; - if (sh->dev[i].towrite && - test_bit(R5_UPTODATE, &sh->dev[i].flags)) { - ptr[count++] = page_address(sh->dev[i].page); - chosen = sh->dev[i].towrite; - sh->dev[i].towrite = NULL; - - if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags)) - wake_up(&conf->wait_for_overlap); - - BUG_ON(sh->dev[i].written); - sh->dev[i].written = 
chosen; - check_xor(); - } - } - break; - case RECONSTRUCT_WRITE: - memset(dest, 0, STRIPE_SIZE); - for (i= disks; i-- ;) - if (i!=pd_idx && sh->dev[i].towrite) { - chosen = sh->dev[i].towrite; - sh->dev[i].towrite = NULL; - - if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags)) - wake_up(&conf->wait_for_overlap); - - BUG_ON(sh->dev[i].written); - sh->dev[i].written = chosen; - } - break; - case CHECK_PARITY: - break; - } - if (count) { - xor_block(count, STRIPE_SIZE, dest, ptr); - count = 0; - } - - for (i = disks; i--;) - if (sh->dev[i].written) { - sector_t sector = sh->dev[i].sector; - struct bio *wbi = sh->dev[i].written; - while (wbi && wbi->bi_sector < sector + STRIPE_SECTORS) { - copy_data(1, wbi, sh->dev[i].page, sector); - wbi = r5_next_bio(wbi, sector); - } - - set_bit(R5_LOCKED, &sh->dev[i].flags); - set_bit(R5_UPTODATE, &sh->dev[i].flags); - } - - switch(method) { - case RECONSTRUCT_WRITE: - case CHECK_PARITY: - for (i=disks; i--;) - if (i != pd_idx) { - ptr[count++] = page_address(sh->dev[i].page); - check_xor(); - } - break; - case READ_MODIFY_WRITE: - for (i = disks; i--;) - if (sh->dev[i].written) { - ptr[count++] = page_address(sh->dev[i].page); - check_xor(); - }
[PATCH 14/16] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor, pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy operations.

Changelog:
* fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few slots to be requested, eventually leading to data corruption
* enabled the slot allocation routine to attempt to free slots before returning -ENOMEM
* switched the cleanup routine to solely use the software chain and the status register to determine if a descriptor is complete. This is necessary to support other IOP engines that do not have status writeback capability
* make the driver iop generic
* modified the allocation routines to understand allocating a group of slots for a single operation
* added a null xor initialization operation for the xor-only channel on iop3xx
* support xor operations on buffers larger than the hardware maximum
* split the do_* routines into separate prep, src/dest set, and submit stages
* added async_tx support (dependent operations initiation at cleanup time)
* simplified group handling
* added interrupt support (callbacks via tasklets)
* brought the pending depth in line with ioat (i.e.
4 descriptors) * drop dma mapping methods, suggested by Chris Leech * don't use inline in C files, Adrian Bunk * remove static tasklet declarations * make iop_adma_alloc_slots easier to read and remove chances for a corrupted descriptor chain * fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt * convert capabilities over to dma_cap_mask_t Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/Kconfig |8 drivers/dma/Makefile|1 drivers/dma/iop-adma.c | 1464 +++ include/asm-arm/hardware/iop_adma.h | 121 +++ 4 files changed, 1594 insertions(+), 0 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 292ddad..1c2ae4e 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -40,4 +40,12 @@ config INTEL_IOATDMA default m ---help--- Enable support for the Intel(R) I/OAT DMA engine. + +config INTEL_IOP_ADMA +tristate "Intel IOP ADMA support" +depends on DMA_ENGINE && (ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX) +default m +---help--- + Enable support for the Intel(R) IOP Series RAID engines. + endmenu diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index 6a99341..8ebf10d 100644 --- a/drivers/dma/Makefile +++ b/drivers/dma/Makefile @@ -1,4 +1,5 @@ obj-$(CONFIG_DMA_ENGINE) += dmaengine.o obj-$(CONFIG_NET_DMA) += iovlock.o obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o +obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o obj-$(CONFIG_ASYNC_TX_DMA) += async_tx.o xor.o diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c new file mode 100644 index 000..0d85f12 --- /dev/null +++ b/drivers/dma/iop-adma.c @@ -0,0 +1,1464 @@ +/* + * Copyright(c) 2006 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. 
+ * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 + * Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * The full GNU General Public License is included in this distribution in the + * file called COPYING. + */ + +/* + * This driver supports the asynchrounous DMA copy and RAID engines available + * on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define to_iop_adma_chan(chan) container_of(chan, struct iop_adma_chan, common) +#define to_iop_adma_device(dev) container_of(dev, struct iop_adma_device, common) +#define tx_to_iop_adma_slot(tx) container_of(tx, struct iop_adma_desc_slot, async_tx) + +/** + * iop_adma_free_slots - flags descriptor slots for reuse + * @slot: Slot to free + * Caller must hold &iop_chan->lock while calling this function + */ +static void iop_adma_free_slots(struct iop_adma_desc_slot *slot) +{ + int stride = slot->slots_per_op; + + while (stride--) { + slot->slots_per_op = 0; + slot = list_entry(slot->slot_node.next, + struct iop_adma_desc_slot, + slot_node); + } +} + +static dma_cookie_t +iop_adma_run_tx_complete_actions(struct iop_adma_desc_slot *desc, + stru
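iop_adma_free_slots() above releases a whole group of descriptor slots by reading the group size from the first slot and clearing `slots_per_op` on that many consecutive slots. The following is a minimal user-space sketch of that behaviour, assuming an array in place of the driver's `list_head` chain; `struct slot_pool_model`, `NSLOTS`, and `free_slots_model` are hypothetical names, and the sketch says nothing about how groups are allocated.

```c
#include <assert.h>

#define NSLOTS 8 /* hypothetical pool size for the model */

/* Array stand-in for the driver's slot list; slots_per_op == 0 means free. */
struct slot_pool_model {
	int slots_per_op[NSLOTS];
};

/* Models iop_adma_free_slots(): the first slot of a group records how many
 * slots the operation occupies, and that many consecutive slots are released.
 * The real driver steps with list_entry(slot->slot_node.next, ...). */
static void free_slots_model(struct slot_pool_model *pool, int first)
{
	int stride = pool->slots_per_op[first];
	int idx = first;

	while (stride--) {
		/* assumption: a group never runs past the end of the pool */
		assert(idx < NSLOTS);
		pool->slots_per_op[idx++] = 0;
	}
}
```

Freeing the group that starts at slot 2 of `{0, 0, 3, 3, 3, 1, 0, 0}` clears slots 2 through 4 and leaves the single-slot operation at slot 5 untouched.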
[PATCH 12/16] md: move raid5 io requests to raid5_run_ops
handle_stripe now only updates the state of stripes. All execution of operations is moved to raid5_run_ops. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 68 1 files changed, 10 insertions(+), 58 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 1966713..c9b91e3 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -2388,6 +2388,8 @@ static void handle_stripe5(struct stripe_head *sh) PRINTK("Read_old block %d for r-m-w\n", i); set_bit(R5_LOCKED, &dev->flags); set_bit(R5_Wantread, &dev->flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; locked++; } else { set_bit(STRIPE_DELAYED, &sh->state); @@ -2408,6 +2410,8 @@ static void handle_stripe5(struct stripe_head *sh) PRINTK("Read_old block %d for Reconstruct\n", i); set_bit(R5_LOCKED, &dev->flags); set_bit(R5_Wantread, &dev->flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; locked++; } else { set_bit(STRIPE_DELAYED, &sh->state); @@ -2506,6 +2510,8 @@ static void handle_stripe5(struct stripe_head *sh) set_bit(R5_LOCKED, &dev->flags); set_bit(R5_Wantwrite, &dev->flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; clear_bit(STRIPE_DEGRADED, &sh->state); locked++; set_bit(STRIPE_INSYNC, &sh->state); @@ -2527,12 +2533,16 @@ static void handle_stripe5(struct stripe_head *sh) dev = &sh->dev[failed_num]; if (!test_bit(R5_ReWrite, &dev->flags)) { set_bit(R5_Wantwrite, &dev->flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; set_bit(R5_ReWrite, &dev->flags); set_bit(R5_LOCKED, &dev->flags); locked++; } else { /* let's read it back */ set_bit(R5_Wantread, &dev->flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; set_bit(R5_LOCKED, &dev->flags); locked++; } @@ -2642,64 +2652,6 @@ static void handle_stripe5(struct stripe_head *sh) test_bit(BIO_UPTODATE, &bi->bi_flags) ? 
0 : -EIO); } - for (i=disks; i-- ;) { - int rw; - struct bio *bi; - mdk_rdev_t *rdev; - if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags)) - rw = WRITE; - else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) - rw = READ; - else - continue; - - bi = &sh->dev[i].req; - - bi->bi_rw = rw; - if (rw == WRITE) - bi->bi_end_io = raid5_end_write_request; - else - bi->bi_end_io = raid5_end_read_request; - - rcu_read_lock(); - rdev = rcu_dereference(conf->disks[i].rdev); - if (rdev && test_bit(Faulty, &rdev->flags)) - rdev = NULL; - if (rdev) - atomic_inc(&rdev->nr_pending); - rcu_read_unlock(); - - if (rdev) { - if (syncing || expanding || expanded) - md_sync_acct(rdev->bdev, STRIPE_SECTORS); - - bi->bi_bdev = rdev->bdev; - PRINTK("for %llu schedule op %ld on disc %d\n", - (unsigned long long)sh->sector, bi->bi_rw, i); - atomic_inc(&sh->count); - bi->bi_sector = sh->sector + rdev->data_offset; - bi->bi_flags = 1 << BIO_UPTODATE; - bi->bi_vcnt = 1; - bi->bi_max_vecs
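Each site in this patch that marks a device R5_Wantread or R5_Wantwrite uses `if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) sh->ops.count++;`, so `ops.count` grows by one no matter how many devices want IO, and get_stripe_work() later consumes the bit with `test_and_clear_bit`. The sketch below models that counting in user space; the non-atomic bit helpers, `OP_IO_BIT`, and the struct and function names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number standing in for STRIPE_OP_IO. */
enum { OP_IO_BIT = 0 };

struct ops_io_model {
	unsigned long pending; /* models sh->ops.pending */
	int count;             /* models sh->ops.count */
};

/* Non-atomic stand-ins for test_and_set_bit()/test_and_clear_bit(). */
static bool tas_bit(int nr, unsigned long *word)
{
	bool old = (*word >> nr) & 1UL;
	*word |= 1UL << nr;
	return old;
}

static bool tac_bit(int nr, unsigned long *word)
{
	bool old = (*word >> nr) & 1UL;
	*word &= ~(1UL << nr);
	return old;
}

/* handle_stripe5 side: called for every device marked Wantread/Wantwrite. */
static void request_io(struct ops_io_model *ops)
{
	if (!tas_bit(OP_IO_BIT, &ops->pending))
		ops->count++; /* only the first request bumps the counter */
}

/* get_stripe_work side: IO is consumed in one step, with no ack/complete. */
static bool take_io(struct ops_io_model *ops)
{
	bool have_io = tac_bit(OP_IO_BIT, &ops->pending);

	if (have_io)
		ops->count--;
	assert(ops->count >= 0); /* mirrors BUG_ON(sh->ops.count < 0) */
	return have_io;
}
```

Three calls to `request_io()` leave `count == 1`; one `take_io()` returns true and brings it back to zero.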
[PATCH 10/16] md: satisfy raid5 read requests via raid5_run_ops
Use raid5_run_ops to carry out the memory copies for a raid5 read request. Changelog: * cleanup to_read and to_fill accounting * do not fail reads that have reached the cache Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 61 ++-- 1 files changed, 30 insertions(+), 31 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index f8a4522..6bde174 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1998,7 +1998,7 @@ static void handle_stripe5(struct stripe_head *sh) int i; int syncing, expanding, expanded; int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0; - int compute=0, req_compute=0, non_overwrite=0; + int to_fill=0, compute=0, req_compute=0, non_overwrite=0; int failed_num=0; struct r5dev *dev; unsigned long pending=0; @@ -2022,37 +2022,29 @@ static void handle_stripe5(struct stripe_head *sh) dev = &sh->dev[i]; clear_bit(R5_Insync, &dev->flags); - PRINTK("check %d: state 0x%lx read %p write %p written %p\n", - i, dev->flags, dev->toread, dev->towrite, dev->written); - /* maybe we can reply to a read */ - if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread) { - struct bio *rbi, *rbi2; - PRINTK("Return read for disc %d\n", i); - spin_lock_irq(&conf->device_lock); - rbi = dev->toread; - dev->toread = NULL; - if (test_and_clear_bit(R5_Overlap, &dev->flags)) - wake_up(&conf->wait_for_overlap); - spin_unlock_irq(&conf->device_lock); - while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) { - copy_data(0, rbi, dev->page, dev->sector); - rbi2 = r5_next_bio(rbi, dev->sector); - spin_lock_irq(&conf->device_lock); - if (--rbi->bi_phys_segments == 0) { - rbi->bi_next = return_bi; - return_bi = rbi; - } - spin_unlock_irq(&conf->device_lock); - rbi = rbi2; - } - } + PRINTK("check %d: state 0x%lx toread %p read %p write %p written %p\n", + i, dev->flags, dev->toread, dev->read, dev->towrite, dev->written); + + /* maybe we can request a biofill operation +* +* new wantfill requests are only permitted while 
+* STRIPE_OP_BIOFILL is clear +*/ + if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread && + !test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending)) + set_bit(R5_Wantfill, &dev->flags); /* now count some things */ if (test_bit(R5_LOCKED, &dev->flags)) locked++; if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++; + + if (test_bit(R5_Wantfill, &dev->flags)) + to_fill++; + else if (dev->toread) + to_read++; + if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1); - if (dev->toread) to_read++; if (dev->towrite) { to_write++; if (!test_bit(R5_OVERWRITE, &dev->flags)) @@ -2073,9 +2065,13 @@ static void handle_stripe5(struct stripe_head *sh) set_bit(R5_Insync, &dev->flags); } rcu_read_unlock(); + + if (to_fill && !test_and_set_bit(STRIPE_OP_BIOFILL, &sh->ops.pending)) + sh->ops.count++; + PRINTK("locked=%d uptodate=%d to_read=%d" - " to_write=%d failed=%d failed_num=%d\n", - locked, uptodate, to_read, to_write, failed, failed_num); + " to_write=%d to_fill=%d failed=%d failed_num=%d\n", + locked, uptodate, to_read, to_write, to_fill, failed, failed_num); /* check if the array has lost two devices and, if so, some requests might * need to be failed */ @@ -2127,9 +2123,12 @@ static void handle_stripe5(struct stripe_head *sh) bi = bi2; } - /* fail any reads if this device is non-operational */ - if (!test_bit(R5_Insync, &sh->dev[i].flags) || - test_bit(R5_ReadError, &sh->dev[i].flags)) { + /* fail any reads if this device is non-operational and +* the data has not reached the cache yet. +*/ +
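The new read path sets R5_Wantfill only on devices whose data is up to date and that have a pending read, and only while STRIPE_OP_BIOFILL is clear. Wantfill devices count toward `to_fill` and the remaining readers toward `to_read`, and a single BIOFILL operation is then requested for the stripe. Below is a minimal user-space sketch of that bookkeeping; the struct and function names are hypothetical and plain booleans stand in for the kernel's flag bits.

```c
#include <assert.h>
#include <stdbool.h>

struct fill_dev_model {
	bool uptodate; /* R5_UPTODATE */
	bool toread;   /* dev->toread != NULL */
	bool wantfill; /* R5_Wantfill */
};

struct fill_stripe_model {
	bool biofill_pending; /* STRIPE_OP_BIOFILL in sh->ops.pending */
	int ops_count;        /* sh->ops.count */
};

/* Returns to_fill; reads that cannot be satisfied from the cache yet are
 * reported through *to_read. */
static int scan_reads(struct fill_stripe_model *sh, struct fill_dev_model *dev,
		      int disks, int *to_read)
{
	int to_fill = 0;

	*to_read = 0;
	for (int i = disks; i--; ) {
		/* new wantfill requests only while no biofill is in flight */
		if (dev[i].uptodate && dev[i].toread && !sh->biofill_pending)
			dev[i].wantfill = true;

		if (dev[i].wantfill)
			to_fill++;
		else if (dev[i].toread)
			(*to_read)++;
	}

	/* request one biofill operation for the whole stripe */
	if (to_fill && !sh->biofill_pending) {
		sh->biofill_pending = true;
		sh->ops_count++;
	}
	return to_fill;
}
```

If a second device becomes up to date while the first biofill is still pending, it stays in `to_read` until a later pass, which matches the "new wantfill requests are only permitted while STRIPE_OP_BIOFILL is clear" comment in the patch.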
[PATCH 11/16] md: use async_tx and raid5_run_ops for raid5 expansion operations
The parity calculation for an expansion operation is the same as the calculation performed at the end of a write with the caveat that all blocks in the stripe are scheduled to be written. An expansion operation is identified as a stripe with the POSTXOR flag set and the BIODRAIN flag not set. The bulk copy operation to the new stripe is handled inline by async_tx. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 48 1 files changed, 36 insertions(+), 12 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 6bde174..1966713 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -2538,18 +2538,32 @@ static void handle_stripe5(struct stripe_head *sh) } } - if (expanded && test_bit(STRIPE_EXPANDING, &sh->state)) { - /* Need to write out all blocks after computing parity */ - sh->disks = conf->raid_disks; - sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks); - compute_parity5(sh, RECONSTRUCT_WRITE); + /* Finish postxor operations initiated by the expansion +* process +*/ + if (test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete) && + !test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending)) { + + clear_bit(STRIPE_EXPANDING, &sh->state); + + clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending); + clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack); + clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete); + for (i= conf->raid_disks; i--;) { - set_bit(R5_LOCKED, &sh->dev[i].flags); - locked++; set_bit(R5_Wantwrite, &sh->dev[i].flags); + if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending)) + sh->ops.count++; } - clear_bit(STRIPE_EXPANDING, &sh->state); - } else if (expanded) { + } + + if (expanded && test_bit(STRIPE_EXPANDING, &sh->state) && + !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) { + /* Need to write out all blocks after computing parity */ + sh->disks = conf->raid_disks; + sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks); + locked += handle_write_operations5(sh, 0, 1); + } else if (expanded && 
!test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) { clear_bit(STRIPE_EXPAND_READY, &sh->state); atomic_dec(&conf->reshape_stripes); wake_up(&conf->wait_for_overlap); @@ -2560,6 +2574,7 @@ static void handle_stripe5(struct stripe_head *sh) /* We have read all the blocks in this stripe and now we need to * copy some of them into a target stripe for expand. */ + struct dma_async_tx_descriptor *tx = NULL; clear_bit(STRIPE_EXPAND_SOURCE, &sh->state); for (i=0; i< sh->disks; i++) if (i != sh->pd_idx) { @@ -2583,9 +2598,12 @@ static void handle_stripe5(struct stripe_head *sh) release_stripe(sh2); continue; } - memcpy(page_address(sh2->dev[dd_idx].page), - page_address(sh->dev[i].page), - STRIPE_SIZE); + + /* place all the copies on one channel */ + tx = async_memcpy(sh2->dev[dd_idx].page, + sh->dev[i].page, 0, 0, STRIPE_SIZE, + ASYNC_TX_DEP_ACK, tx, NULL, NULL); + set_bit(R5_Expanded, &sh2->dev[dd_idx].flags); set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags); for (j=0; j<conf->raid_disks; j++) @@ -2597,6 +2615,12 @@ static void handle_stripe5(struct stripe_head *sh) set_bit(STRIPE_HANDLE, &sh2->state); } release_stripe(sh2); + + /* done submitting copies, wait for them to complete */ + if (i + 1 >= sh->disks) { + async_tx_ack(tx); + dma_wait_for_async_tx(tx); + } } }
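The description above identifies an expansion as a stripe whose POSTXOR flag is set while BIODRAIN is not, and handle_write_operations5() (in the write-operations patch) skips BIODRAIN in its reconstruct-write branch when `expand` is set. The sketch below models only those two rules with hypothetical bit masks (`OPM_BIODRAIN`, `OPM_POSTXOR`) and function names; it does not model the locking or the write-out of every block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit masks standing in for the STRIPE_OP_* bits. */
enum {
	OPM_BIODRAIN = 1u << 0,
	OPM_POSTXOR  = 1u << 1,
};

/* Ops requested by the reconstruct-write branch of
 * handle_write_operations5(); the drain step is skipped on an expand. */
static unsigned rcw_request(bool expand, int *count)
{
	unsigned ops = OPM_POSTXOR;

	assert(count);
	*count += 1;
	if (!expand) {
		ops |= OPM_BIODRAIN;
		*count += 1;
	}
	return ops;
}

/* A completed POSTXOR with no BIODRAIN pending identifies an expansion. */
static bool is_expand_postxor(unsigned pending, unsigned complete)
{
	return (complete & OPM_POSTXOR) && !(pending & OPM_BIODRAIN);
}
```

For an expand, only POSTXOR is requested (one op counted), so its completion is classified as an expansion; an ordinary reconstruct-write requests both ops and is not.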
[PATCH 09/16] md: move raid5 parity checks to raid5_run_ops
handle_stripe sets STRIPE_OP_CHECK to request a check operation in raid5_run_ops. If raid5_run_ops is able to perform the check with a dma engine the parity will be preserved in memory removing the need to re-read it from disk, as is necessary in the synchronous case. 'Repair' operations re-use the same logic as compute block, with the caveat that the results of the compute block are immediately written back to the parity disk. To differentiate these operations the STRIPE_OP_MOD_REPAIR_PD flag is added. Changelog: * remove test_and_set/test_and_clear BUG_ONs, Neil Brown Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 80 1 files changed, 61 insertions(+), 19 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 844bd9b..f8a4522 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -2430,32 +2430,74 @@ static void handle_stripe5(struct stripe_head *sh) locked += handle_write_operations5(sh, rcw == 0, 0); } - /* maybe we need to check and possibly fix the parity for this stripe -* Any reads will already have been scheduled, so we just see if enough data -* is available + /* 1/ Maybe we need to check and possibly fix the parity for this stripe. +*Any reads will already have been scheduled, so we just see if enough data +*is available. 
+* 2/ Hold off parity checks while parity dependent operations are in flight +*(conflicting writes are protected by the 'locked' variable) */ - if (syncing && locked == 0 && - !test_bit(STRIPE_INSYNC, &sh->state)) { + if ((syncing && locked == 0 && !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) && + !test_bit(STRIPE_INSYNC, &sh->state)) || + test_bit(STRIPE_OP_CHECK, &sh->ops.pending) || + test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) { + set_bit(STRIPE_HANDLE, &sh->state); - if (failed == 0) { - BUG_ON(uptodate != disks); - compute_parity5(sh, CHECK_PARITY); - uptodate--; - if (page_is_zero(sh->dev[sh->pd_idx].page)) { - /* parity is correct (on disc, not in buffer any more) */ - set_bit(STRIPE_INSYNC, &sh->state); - } else { - conf->mddev->resync_mismatches += STRIPE_SECTORS; - if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) - /* don't try to repair!! */ + /* Take one of the following actions: +* 1/ start a check parity operation if (uptodate == disks) +* 2/ finish a check parity operation and act on the result +* 3/ skip to the writeback section if we previously +*initiated a recovery operation +*/ + if (failed == 0 && !test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) { + if (!test_and_set_bit(STRIPE_OP_CHECK, &sh->ops.pending)) { + BUG_ON(uptodate != disks); + clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags); + sh->ops.count++; + uptodate--; + } else if (test_and_clear_bit(STRIPE_OP_CHECK, &sh->ops.complete)) { + clear_bit(STRIPE_OP_CHECK, &sh->ops.ack); + clear_bit(STRIPE_OP_CHECK, &sh->ops.pending); + + if (sh->ops.zero_sum_result == 0) + /* parity is correct (on disc, not in buffer any more) */ set_bit(STRIPE_INSYNC, &sh->state); else { - compute_block(sh, sh->pd_idx); - uptodate++; + conf->mddev->resync_mismatches += STRIPE_SECTORS; + if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) + /* don't try to repair!! 
*/ + set_bit(STRIPE_INSYNC, &sh->state); + else { + set_bit(STRIPE_OP_COMPUTE_BLK, + &sh->ops.pending); + set_bit(STRIPE_OP_MOD_REPAIR_PD, + &sh->ops.pending); + set_bit(R5_Wantcompute, + &sh->dev[sh-
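The check operation reports `sh->ops.zero_sum_result`, which is zero when the XOR of every block in the stripe (data plus parity) is zero, that is, when the parity is consistent. The removed synchronous path XORed into the parity page and then tested it with page_is_zero(), which is why the parity had to be re-read afterwards. The sketch below shows a zero-sum check that leaves all buffers untouched; `xor_zero_sum` is a hypothetical name and byte-wise XOR stands in for the engine's xor_zero_sum operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 when the XOR of all blocks (data + parity) is zero at every
 * offset, i.e. parity is consistent; nonzero otherwise. No buffer is
 * modified, so the parity block stays valid in memory. */
static int xor_zero_sum(const uint8_t *const *blocks, int nblocks, size_t len)
{
	assert(nblocks > 0);
	for (size_t off = 0; off < len; off++) {
		uint8_t acc = 0;

		for (int b = 0; b < nblocks; b++)
			acc ^= blocks[b][off];
		if (acc)
			return 1; /* mismatch: would count toward resync_mismatches */
	}
	return 0;
}
```

A nonzero result corresponds to the branch that adds STRIPE_SECTORS to `resync_mismatches` and, unless MD_RECOVERY_CHECK is set, schedules a repair.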
[PATCH 08/16] md: move raid5 compute block operations to raid5_run_ops
handle_stripe sets STRIPE_OP_COMPUTE_BLK to request servicing from raid5_run_ops. It also sets a flag for the block being computed to let other parts of handle_stripe submit dependent operations. raid5_run_ops guarantees that the compute operation completes before any dependent operation starts. Changelog: * remove the req_compute BUG_ON Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 126 +++- 1 files changed, 94 insertions(+), 32 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 03a435d..844bd9b 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1998,7 +1998,7 @@ static void handle_stripe5(struct stripe_head *sh) int i; int syncing, expanding, expanded; int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0; - int non_overwrite = 0; + int compute=0, req_compute=0, non_overwrite=0; int failed_num=0; struct r5dev *dev; unsigned long pending=0; @@ -2050,8 +2050,8 @@ static void handle_stripe5(struct stripe_head *sh) /* now count some things */ if (test_bit(R5_LOCKED, &dev->flags)) locked++; if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++; + if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1); - if (dev->toread) to_read++; if (dev->towrite) { to_write++; @@ -2206,31 +2206,83 @@ static void handle_stripe5(struct stripe_head *sh) * parity, or to satisfy requests * or to load a block that is being partially written. 
*/ - if (to_read || non_overwrite || (syncing && (uptodate < disks)) || expanding) { - for (i=disks; i--;) { - dev = &sh->dev[i]; - if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) && - (dev->toread || -(dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) || -syncing || -expanding || -(failed && (sh->dev[failed_num].toread || -(sh->dev[failed_num].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags - ) - ) { - /* we would like to get this block, possibly -* by computing it, but we might not be able to + if (to_read || non_overwrite || (syncing && (uptodate + compute < disks)) || expanding || + test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending)) { + + /* Clear completed compute operations. Parity recovery +* (STRIPE_OP_MOD_REPAIR_PD) implies a write-back which is handled +* later on in this routine +*/ + if (test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete) && + !test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) { + clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete); + clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.ack); + clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending); + } + + /* look for blocks to read/compute, skip this if a compute +* is already in flight, or if the stripe contents are in the +* midst of changing due to a write +*/ + if (!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) && + !test_bit(STRIPE_OP_PREXOR, &sh->ops.pending) && + !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) { + for (i=disks; i--;) { + dev = &sh->dev[i]; + + /* don't schedule compute operations or reads on +* the parity block while a check is in flight */ - if (uptodate == disks-1) { - PRINTK("Computing block %d\n", i); - compute_block(sh, i); - uptodate++; - } else if (test_bit(R5_Insync, &dev->flags)) { - set_bit(R5_LOCKED, &dev->flags); - set_bit(R5_Wantread, &dev->flags); - locked++; - PRINTK("Reading block %d (sync=%d)\n", - i, syncing); + if ((i == sh->pd_idx) && test_bit(STRIPE_OP_CHECK, &sh->ops.pending)) + continue; + +
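A compute-block operation rebuilds one block of a RAID-5 stripe as the XOR of all the other blocks, which is what the synchronous compute_block() (removed later in the series) did with memset() followed by xor_block(). Below is a minimal user-space sketch of that reconstruction; `compute_block_model` is a hypothetical name, and a byte loop stands in for the kernel's xor_block() or an offload engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rebuild block dd_idx as the XOR of every other block in the stripe
 * (data and parity alike), as compute_block() did synchronously. */
static void compute_block_model(uint8_t **blocks, int disks, int dd_idx,
				size_t len)
{
	assert(dd_idx >= 0 && dd_idx < disks);
	memset(blocks[dd_idx], 0, len);
	for (int i = disks; i--; ) {
		if (i == dd_idx)
			continue;
		for (size_t off = 0; off < len; off++)
			blocks[dd_idx][off] ^= blocks[i][off];
	}
}
```

This only yields correct data when every other block is up to date, which is why the old code computed a block only when `uptodate == disks-1`.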
[PATCH 07/16] md: move write operations to raid5_run_ops
handle_stripe sets STRIPE_OP_PREXOR, STRIPE_OP_BIODRAIN, and STRIPE_OP_POSTXOR to request a write to the stripe cache. raid5_run_ops is then triggered and executes the request outside the stripe lock.

Changelog:
* make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil Brown
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown

Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 151 +--- 1 files changed, 130 insertions(+), 21 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 14e9f6a..03a435d 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1807,7 +1807,74 @@ static void compute_block_2(struct stripe_head *sh, int dd_idx1, int dd_idx2) } } +static int handle_write_operations5(struct stripe_head *sh, int rcw, int expand) +{ + int i, pd_idx = sh->pd_idx, disks = sh->disks; + int locked=0; + + if (rcw) { + /* skip the drain operation on an expand */ + if (!expand) { + set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending); + sh->ops.count++; + } + + set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending); + sh->ops.count++; + + for (i=disks ; i-- ;) { + struct r5dev *dev = &sh->dev[i]; + + if (dev->towrite) { + set_bit(R5_LOCKED, &dev->flags); + if (!expand) + clear_bit(R5_UPTODATE, &dev->flags); + locked++; + } + } + } else { + BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) || + test_bit(R5_Wantcompute, &sh->dev[pd_idx].flags))); + + set_bit(STRIPE_OP_PREXOR, &sh->ops.pending); + set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending); + set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending); + + sh->ops.count += 3; + + for (i=disks ; i-- ;) { + struct r5dev *dev = &sh->dev[i]; + if (i==pd_idx) + continue; + /* For a read-modify write there may be blocks that are +* locked for reading while others are ready to be written +* so we distinguish these blocks by the R5_Wantprexor bit +*/ + if (dev->towrite && + (test_bit(R5_UPTODATE, &dev->flags) || + test_bit(R5_Wantcompute, &dev->flags))) { + set_bit(R5_Wantprexor, &dev->flags);
+ set_bit(R5_LOCKED, &dev->flags); + clear_bit(R5_UPTODATE, &dev->flags); + locked++; + } + } + } + + /* keep the parity disk locked while asynchronous operations +* are in flight +*/ + set_bit(R5_LOCKED, &sh->dev[pd_idx].flags); + clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags); + locked++; + + PRINTK("%s: stripe %llu locked: %d pending: %lx\n", + __FUNCTION__, (unsigned long long)sh->sector, + locked, sh->ops.pending); + + return locked; +} /* * Each stripe/dev can have one or more bion attached. @@ -2170,8 +2237,67 @@ static void handle_stripe5(struct stripe_head *sh) set_bit(STRIPE_HANDLE, &sh->state); } - /* now to consider writing and what else, if anything should be read */ - if (to_write) { + /* Now we check to see if any write operations have recently +* completed +*/ + + /* leave prexor set until postxor is done, allows us to distinguish +* a rmw from a rcw during biodrain +*/ + if (test_bit(STRIPE_OP_PREXOR, &sh->ops.complete) && + test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) { + + clear_bit(STRIPE_OP_PREXOR, &sh->ops.complete); + clear_bit(STRIPE_OP_PREXOR, &sh->ops.ack); + clear_bit(STRIPE_OP_PREXOR, &sh->ops.pending); + + for (i=disks; i--;) + clear_bit(R5_Wantprexor, &sh->dev[i].flags); + } + + /* if only POSTXOR is set then this is an 'expand' postxor */ + if (test_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete) && + test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) { + + clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete); + clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.ack); + clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending); + + clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete); + clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack); + clear_bit(STRIPE_OP
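handle_write_operations5() requests PREXOR, BIODRAIN, and POSTXOR for a read-modify-write, but only BIODRAIN and POSTXOR for a reconstruct-write. The arithmetic behind that split is that XOR-ing the old data out of the parity (prexor), copying the new data in (biodrain), and XOR-ing the new data back in (postxor) gives the same parity as recomputing it from every data block. The sketch below checks that equivalence on tiny buffers; `BLK`, `xor_into`, `rmw_update`, and `rcw_parity` are hypothetical names and the block size is an assumption, not STRIPE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 4 /* hypothetical block size; real stripes use STRIPE_SIZE */

static void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < BLK; i++)
		dst[i] ^= src[i];
}

/* Read-modify-write on one data block: PREXOR, BIODRAIN, POSTXOR. */
static void rmw_update(uint8_t *parity, uint8_t *data, const uint8_t *new_data)
{
	xor_into(parity, data);      /* PREXOR: remove the old data from parity */
	memcpy(data, new_data, BLK); /* BIODRAIN: copy the new data into the cache page */
	xor_into(parity, data);      /* POSTXOR: fold the new data into parity */
}

/* Reconstruct-write: after BIODRAIN, POSTXOR recomputes parity from
 * every data block. */
static void rcw_parity(uint8_t *parity, uint8_t (*data)[BLK], int ndata)
{
	assert(ndata > 0);
	memset(parity, 0, BLK);
	for (int d = 0; d < ndata; d++)
		xor_into(parity, data[d]);
}
```

The read-modify-write path only needs the old copy of the blocks being written plus the old parity, while reconstruct-write needs every data block; handle_stripe5 picks whichever requires fewer reads.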
[PATCH 06/16] md: use raid5_run_ops for stripe cache operations
Each stripe has three flag variables to reflect the state of operations (pending, ack, and complete):
-pending: set to request servicing in raid5_run_ops
-ack: set to reflect that raid5_run_ops has seen this request
-complete: set when the operation is complete and it is ok for handle_stripe5 to clear 'pending' and 'ack'.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 65 +--- 1 files changed, 56 insertions(+), 9 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 0251bca..14e9f6a 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -126,6 +126,7 @@ static void __release_stripe(raid5_conf_t *conf, struct stripe_head *sh) } md_wakeup_thread(conf->mddev->thread); } else { + BUG_ON(sh->ops.pending); if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { atomic_dec(&conf->preread_active_stripes); if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD) @@ -225,7 +226,8 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int BUG_ON(atomic_read(&sh->count) != 0); BUG_ON(test_bit(STRIPE_HANDLE, &sh->state)); - + BUG_ON(sh->ops.pending || sh->ops.ack || sh->ops.complete); + CHECK_DEVLOCK(); PRINTK("init_stripe called, stripe %llu\n", (unsigned long long)sh->sector); @@ -241,11 +243,11 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int for (i = sh->disks; i--; ) { struct r5dev *dev = &sh->dev[i]; - if (dev->toread || dev->towrite || dev->written || + if (dev->toread || dev->read || dev->towrite || dev->written || test_bit(R5_LOCKED, &dev->flags)) { - printk("sector=%llx i=%d %p %p %p %d\n", + printk("sector=%llx i=%d %p %p %p %p %d\n", (unsigned long long)sh->sector, i, dev->toread, - dev->towrite, dev->written, + dev->read, dev->towrite, dev->written, test_bit(R5_LOCKED, &dev->flags)); BUG(); } @@ -325,6 +327,43 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector return sh; } +/* check_op() ensures that we only
dequeue an operation once */ +#define check_op(op) do {\ + if (test_bit(op, &sh->ops.pending) &&\ + !test_bit(op, &sh->ops.complete)) {\ + if (test_and_set_bit(op, &sh->ops.ack))\ + clear_bit(op, &pending);\ + else\ + ack++;\ + } else\ + clear_bit(op, &pending);\ +} while(0) + +/* find new work to run, do not resubmit work that is already + * in flight + */ +static unsigned long get_stripe_work(struct stripe_head *sh) +{ + unsigned long pending; + int ack = 0; + + pending = sh->ops.pending; + + check_op(STRIPE_OP_BIOFILL); + check_op(STRIPE_OP_COMPUTE_BLK); + check_op(STRIPE_OP_PREXOR); + check_op(STRIPE_OP_BIODRAIN); + check_op(STRIPE_OP_POSTXOR); + check_op(STRIPE_OP_CHECK); + if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending)) + ack++; + + sh->ops.count -= ack; + BUG_ON(sh->ops.count < 0); + + return pending; +} + static int raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error); static int @@ -1878,7 +1917,6 @@ static int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks) *schedule a write of some buffers *return confirmation of parity correctness * - * Parity calculations are done inside the stripe lock * buffers are taken off read_list or write_list, and bh_cache buffers * get BH_Lock set before the stripe lock is released. 
* @@ -1896,10 +1934,11 @@ static void handle_stripe5(struct stripe_head *sh) int non_overwrite = 0; int failed_num=0; struct r5dev *dev; + unsigned long pending=0; - PRINTK("handling stripe %llu, cnt=%d, pd_idx=%d\n", - (unsigned long long)sh->sector, atomic_read(&sh->count), - sh->pd_idx); + PRINTK("handling stripe %llu, state=%#lx cnt=%d, pd_idx=%d ops=%lx:%lx:%lx\n", + (unsigned long long)sh->sector, sh->state, atomic_read(&sh->count), + sh->pd_idx, sh->ops.pending, sh->ops.ack, sh->ops.complete); spin_lock(&sh->lock); clear_bit(STRIPE_HANDLE, &sh->state); @@ -2349,8 +2388,14 @@ static void handle_stripe5(struct stripe_head *sh) } } + if (sh->ops.count) + pending = get_stripe_work(sh); + spin_unlock(&sh->lock)
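The commit message describes a per-stripe pending/ack/complete protocol. The minimal userspace sketch below illustrates the dequeue-once rule that check_op()/get_stripe_work() implement. It is not the kernel code. The bit operations are plain, non-atomic masks, and the OP_* names and retire_op() are hypothetical. STRIPE_OP_IO, which the patch handles separately, is left out.

```c
#include <assert.h>

/* Hypothetical operation bit numbers; the patch uses STRIPE_OP_* constants. */
enum {
	OP_BIOFILL,
	OP_COMPUTE_BLK,
	OP_PREXOR,
	OP_BIODRAIN,
	OP_POSTXOR,
	OP_CHECK,
	OP_NR /* number of tracked operations */
};

/* Simplified, non-atomic model of the per-stripe ops state. */
struct stripe_ops {
	unsigned long pending;  /* set to request servicing */
	unsigned long ack;      /* set once the runner has seen the request */
	unsigned long complete; /* set when the operation has finished */
	int count;              /* requests not yet acknowledged */
};

static int has_bit(unsigned long word, int nr)
{
	return (word >> nr) & 1UL;
}

/*
 * Return the operations to start now.  An operation is handed out only if
 * it is pending, not complete, and not already acknowledged; setting its
 * ack bit ensures it is never dequeued twice.
 */
static unsigned long get_stripe_work_sketch(struct stripe_ops *ops)
{
	unsigned long work = 0;
	int acked = 0;

	for (int op = 0; op < OP_NR; op++) {
		if (has_bit(ops->pending, op) && !has_bit(ops->complete, op) &&
		    !has_bit(ops->ack, op)) {
			ops->ack |= 1UL << op;
			work |= 1UL << op;
			acked++;
		}
	}
	ops->count -= acked;
	assert(ops->count >= 0); /* plays the role of BUG_ON(count < 0) */
	return work;
}

/*
 * After completion the handler may clear 'pending' and 'ack'.  In this
 * sketch 'complete' is cleared at the same time (an assumption), leaving
 * all three bits zero, as init_stripe()'s BUG_ON expects.
 */
static void retire_op(struct stripe_ops *ops, int op)
{
	unsigned long keep = ~(1UL << op);

	ops->pending &= keep;
	ops->ack &= keep;
	ops->complete &= keep;
}
```

If get_stripe_work_sketch() is called twice without an intervening completion, it returns each request only once. The patch relies on this property to avoid resubmitting work that is already in flight.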
[PATCH 05/16] md: add raid5_run_ops and support routines
Prepare the raid5 implementation to use async_tx for running stripe operations: * biofill (copy data into request buffers to satisfy a read request) * compute block (generate a missing block in the cache from the other blocks) * prexor (subtract existing data as part of the read-modify-write process) * biodrain (copy data out of request buffers to satisfy a write request) * postxor (recalculate parity for new data that has entered the cache) * check (verify that the parity is correct) * io (submit i/o to the member disks) Changelog: * removed ops_complete_biodrain in favor of ops_complete_postxor and ops_complete_write. * removed the workqueue * call bi_end_io for reads in ops_complete_biofill * explicitly handle the 2-disk raid5 case (xor becomes memcpy) * fix race between async engines and bi_end_io call for reads, Neil Brown * remove unnecessary spin_lock from ops_complete_biofill * remove test_and_set/test_and_clear BUG_ONs, Neil Brown * remove explicit interrupt handling Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 539 include/linux/raid/raid5.h | 63 + 2 files changed, 599 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index ab8702d..0251bca 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -52,6 +52,7 @@ #include "raid6.h" #include +#include /* * Stripe cache @@ -324,6 +325,544 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector return sh; } +static int +raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error); +static int +raid5_end_write_request (struct bio *bi, unsigned int bytes_done, int error); + +static void ops_run_io(struct stripe_head *sh) +{ + raid5_conf_t *conf = sh->raid_conf; + int i, disks = sh->disks; + + might_sleep(); + + for (i=disks; i-- ;) { + int rw; + struct bio *bi; + mdk_rdev_t *rdev; + if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags)) + rw = WRITE; + else if (test_and_clear_bit(R5_Wantread, 
&sh->dev[i].flags)) + rw = READ; + else + continue; + + bi = &sh->dev[i].req; + + bi->bi_rw = rw; + if (rw == WRITE) + bi->bi_end_io = raid5_end_write_request; + else + bi->bi_end_io = raid5_end_read_request; + + rcu_read_lock(); + rdev = rcu_dereference(conf->disks[i].rdev); + if (rdev && test_bit(Faulty, &rdev->flags)) + rdev = NULL; + if (rdev) + atomic_inc(&rdev->nr_pending); + rcu_read_unlock(); + + if (rdev) { + if (test_bit(STRIPE_SYNCING, &sh->state) || + test_bit(STRIPE_EXPAND_SOURCE, &sh->state) || + test_bit(STRIPE_EXPAND_READY, &sh->state)) + md_sync_acct(rdev->bdev, STRIPE_SECTORS); + + bi->bi_bdev = rdev->bdev; + PRINTK("%s: for %llu schedule op %ld on disc %d\n", + __FUNCTION__, (unsigned long long)sh->sector, + bi->bi_rw, i); + atomic_inc(&sh->count); + bi->bi_sector = sh->sector + rdev->data_offset; + bi->bi_flags = 1 << BIO_UPTODATE; + bi->bi_vcnt = 1; + bi->bi_max_vecs = 1; + bi->bi_idx = 0; + bi->bi_io_vec = &sh->dev[i].vec; + bi->bi_io_vec[0].bv_len = STRIPE_SIZE; + bi->bi_io_vec[0].bv_offset = 0; + bi->bi_size = STRIPE_SIZE; + bi->bi_next = NULL; + if (rw == WRITE && + test_bit(R5_ReWrite, &sh->dev[i].flags)) + atomic_add(STRIPE_SECTORS, &rdev->corrected_errors); + generic_make_request(bi); + } else { + if (rw == WRITE) + set_bit(STRIPE_DEGRADED, &sh->state); + PRINTK("skip op %ld on disc %d for sector %llu\n", + bi->bi_rw, i, (unsigned long long)sh->sector); + clear_bit(R5_LOCKED, &sh->dev[i].flags); + set_bit(STRIPE_HANDLE, &sh->state); + } + } +} + +static struct dma_async_tx_descriptor * +async_copy_data(int frombio, struct bio *bio, struct page *page, sector_t sector, + struct dma_async_tx_descriptor *tx) +{ + struct bio_vec *bvl; + struct page *bio_page; + int i; + int page_offset; + +
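The stripe operations listed in this patch reduce to XOR arithmetic on blocks. The userspace sketch below shows three of them:

- full parity generation;
- compute-block reconstruction of a missing block;
- the prexor/postxor read-modify-write update.

It also covers the 2-disk case, where parity is a plain copy of the single data block. The block size, function names and layout are all illustrative assumptions, not raid5.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 8 /* hypothetical tiny block size, for illustration only */

/* dest ^= src, byte by byte: a software stand-in for an xor engine. */
static void xor_into(unsigned char *dest, const unsigned char *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dest[i] ^= src[i];
}

/* Full parity (the postxor result): parity = data[0] ^ ... ^ data[n - 1].
 * With n == 1, the 2-disk raid5 case, this degenerates to a memcpy. */
static void compute_parity(unsigned char *parity, unsigned char data[][BLK], int n)
{
	assert(n >= 1);
	memcpy(parity, data[0], BLK);
	for (int i = 1; i < n; i++)
		xor_into(parity, data[i], BLK);
}

/* Compute block: rebuild data[missing] from parity and the other blocks. */
static void compute_block(unsigned char *out, const unsigned char *parity,
			  unsigned char data[][BLK], int n, int missing)
{
	memcpy(out, parity, BLK);
	for (int i = 0; i < n; i++)
		if (i != missing)
			xor_into(out, data[i], BLK);
}

/* Read-modify-write: prexor subtracts the old data from the parity, and
 * postxor adds the new data once biodrain has copied it into the cache. */
static void rmw_update(unsigned char *parity, const unsigned char *old_data,
		       const unsigned char *new_data)
{
	xor_into(parity, old_data, BLK); /* prexor */
	xor_into(parity, new_data, BLK); /* postxor */
}
```

The read-modify-write path touches only the old data, the new data and the parity block. That is why it can be cheaper than recomputing parity from every data block in a wide stripe.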
[PATCH 04/16] dmaengine: add the async_tx api
The async_tx api provides methods for describing a chain of asynchronous bulk memory transfers/transforms with support for inter-transactional dependencies.  It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations.  Code that is written to the api can optimize for asynchronous operation and the api will fit the chain of operations to the available offload resources.

Currently the raid5 implementation in the MD raid456 driver has been converted to the async_tx api.  A driver for the offload engines on the Intel Xscale series of I/O processors, iop-adma, is provided.  With the iop-adma driver and async_tx, raid456 is able to offload copy, xor, and xor-zero-sum operations to hardware engines.

On iop342, tiobench showed higher throughput for sequential writes (20 - 30% improvement) and sequential reads to a degraded array (40 - 55% improvement).  For the other cases performance was roughly equal, +/- a few percentage points.  On an x86-smp platform the performance of the async_tx implementation (in synchronous mode) was also +/- a few percentage points of the original implementation.  According to 'top', CPU utilization was positively affected in the offload case, but exact measurements have yet to be taken.

The tiobench command line used for testing was:
	tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Xor operations are handled by async_tx; to this end, xor.c is moved into drivers/dma and is changed to take an explicit destination address and a series of sources to match the hardware engine implementation.

When CONFIG_DMA_ENGINE is not set, the asynchronous path is compiled away.
Changelog: * fixed a leftover debug print * don't allow callbacks in async_interrupt_cond * fixed xor_block changes * fixed usage of ASYNC_TX_XOR_DROP_DEST * drop dma mapping methods, suggested by Chris Leech * printk warning fixups from Andrew Morton * don't use inline in C files, Adrian Bunk * select the API when MD is enabled * BUG_ON xor source counts <= 1 * implicitly handle hardware concerns like channel switching and interrupts, Neil Brown * remove the per operation type list, and distribute operation capabilities evenly amongst the available channels * simplify async_tx_find_channel to optimize the fast path Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/Makefile |1 drivers/dma/Kconfig | 15 + drivers/dma/Makefile |1 drivers/dma/async_tx.c | 889 ++ drivers/dma/xor.c| 153 drivers/md/Kconfig |3 drivers/md/Makefile |6 drivers/md/raid5.c | 52 +-- drivers/md/xor.c | 154 include/linux/async_tx.h | 173 + include/linux/raid/xor.h |5 11 files changed, 1263 insertions(+), 189 deletions(-) diff --git a/drivers/Makefile b/drivers/Makefile index 3a718f5..2e8de9e 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_I2C) += i2c/ obj-$(CONFIG_W1) += w1/ obj-$(CONFIG_HWMON)+= hwmon/ obj-$(CONFIG_PHONE)+= telephony/ +obj-$(CONFIG_ASYNC_TX_DMA) += dma/ obj-$(CONFIG_MD) += md/ obj-$(CONFIG_BT) += bluetooth/ obj-$(CONFIG_ISDN) += isdn/ diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 30d021d..292ddad 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -7,8 +7,8 @@ menu "DMA Engine support" config DMA_ENGINE bool "Support for DMA engines" ---help--- - DMA engines offload copy operations from the CPU to dedicated - hardware, allowing the copies to happen asynchronously. + DMA engines offload bulk memory operations from the CPU to dedicated + hardware, allowing the operations to happen asynchronously. 
comment "DMA Clients" @@ -22,6 +22,16 @@ config NET_DMA Since this is the main user of the DMA engine, it should be enabled; say Y here. +config ASYNC_TX_DMA + tristate "Asynchronous Bulk Memory Transfers/Transforms API" + ---help--- + This enables the async_tx management layer for dma engines. + Subsystems coded to this API will use offload engines for bulk + memory operations where present. Software implementations are + called when a dma engine is not present or fails to allocate + memory to carry out the transaction. + Current subsystems ported to async_tx: MD_RAID4,5 + comment "DMA Devices" config INTEL_IOATDMA @@ -30,5 +40,4 @@ config INTEL_IOATDMA default m ---help--- Enable support for the Intel(R) I/OAT DMA engine. - endmenu diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index bdcfdbd..6a99341 100644 --- a/drivers/dma/Makefile +++ b/drivers/dma/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_DMA_ENGINE) += d
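The Kconfig help in this patch says software implementations are called when no dma engine is present or it fails to allocate memory for the transaction. The plain C sketch below shows a rough version of that fallback. struct engine, xor_offload_or_sync() and the helper functions are hypothetical and do not reproduce the async_tx API. The engine is assumed to complete synchronously, so that one callback path covers both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical offload engine: xor_submit may be NULL (no engine) or may
 * fail (e.g. descriptor allocation), signalled by a non-zero return.  For
 * simplicity the engine is assumed to finish before xor_submit returns. */
struct engine {
	int (*xor_submit)(unsigned char *dest, unsigned char **srcs,
			  int src_cnt, size_t len);
};

typedef void (*done_fn)(void *arg);

/* Software path: dest = srcs[0] ^ srcs[1] ^ ... ^ srcs[src_cnt - 1]. */
static void xor_sync(unsigned char *dest, unsigned char **srcs, int src_cnt,
		     size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char v = 0;

		for (int s = 0; s < src_cnt; s++)
			v ^= srcs[s][i];
		dest[i] = v;
	}
}

/* Try the engine first; fall back to xor_sync() when there is no engine or
 * it refuses the request.  Returns 1 if offloaded, 0 if done in software. */
static int xor_offload_or_sync(struct engine *eng, unsigned char *dest,
			       unsigned char **srcs, int src_cnt, size_t len,
			       done_fn done, void *arg)
{
	int offloaded = 0;

	assert(src_cnt > 1); /* cf. "BUG_ON xor source counts <= 1" */
	if (eng && eng->xor_submit &&
	    eng->xor_submit(dest, srcs, src_cnt, len) == 0)
		offloaded = 1;
	else
		xor_sync(dest, srcs, src_cnt, len);

	if (done)
		done(arg); /* completion callback runs on either path */
	return offloaded;
}

/* Example engine that always fails, modelling an allocation failure. */
static int engine_nomem(unsigned char *dest, unsigned char **srcs,
			int src_cnt, size_t len)
{
	(void)dest; (void)srcs; (void)src_cnt; (void)len;
	return -1;
}

/* Example completion callback: counts completions in *arg. */
static void count_done(void *arg)
{
	(*(int *)arg)++;
}
```

Callers see the same result and the same completion callback whichever path ran. That is the property the help text describes for subsystems coded to the API.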
[PATCH 03/16] ARM: Add drivers/dma to arch/arm/Kconfig
Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/Kconfig |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e7baca2..74077e3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -997,6 +997,8 @@ source "drivers/mmc/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
 
 source "fs/Kconfig"
Re: [patch 14/22] pollfs: pollable futex
Eric Dumazet wrote:
> Davi Arnaut a écrit :
>> Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains
>> a given value. There can be only one futex wait per file descriptor. However,
>> it can be rearmed (possibly at a different address) anytime.
>>
>> The pollable futex approach is far superior (send and receive events from
>> userspace or kernel) to eventfd and fixes (supercedes) FUTEX_FD at the same
>> time.
>>
>> Building block for pollable semaphores and user-defined events.
>>
>> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
>>
>> ---
>>  fs/pollfs/Makefile |    1
>>  fs/pollfs/futex.c  |  154 +
>>  init/Kconfig       |    7 ++
>>  3 files changed, 162 insertions(+)
>>
>> Index: linux-2.6/fs/pollfs/Makefile
>> ===
>> --- linux-2.6.orig/fs/pollfs/Makefile
>> +++ linux-2.6/fs/pollfs/Makefile
>> @@ -3,3 +3,4 @@ pollfs-y := file.o
>>
>>  pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
>>  pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
>> +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
>> Index: linux-2.6/fs/pollfs/futex.c
>> ===
>> --- /dev/null
>> +++ linux-2.6/fs/pollfs/futex.c
>> @@ -0,0 +1,154 @@
>> +/*
>> + * pollable futex
>> + *
>> + * Copyright (C) 2007 Davi E. M. Arnaut
>> + *
>> + * Licensed under the GNU GPL. See the file COPYING for details.
>> + */
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +struct futex_event {
>> +	union {
>> +		void __user *addr;
>> +		u64 padding;
>> +	};
>> +	int val;
>> +};
>
> Hum... Here we might have a problem with 64 bit futexes, or private futexes
>
> So I believe this interface is not well defined and not expandable: in case of
> future additions to futexes, an old application compiled with an old pollable
> futex_event type might fail.
>

Hmm, how about:

struct futex_event {
	union {
		void __user *addr;
		u64 padding;
	};
	union {
		int val;
		s64 val64;
	};
	/* whatever room is necessary for future improvements */
};

I haven't been keeping up with 64 bit or private futexes. What else could probably go wrong?

-- 
Davi Arnaut
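Eric's concern about the struct futex_event layout can be checked with a userspace compiler. In the sketch below, uint64_t/int64_t stand in for u64/s64 and __user is dropped; both are assumptions needed to build outside the kernel. The posted layout has ABI-dependent tail padding: it is 16 bytes on x86-64 and 12 bytes on i386, so the same source yields different struct sizes. The double-union layout proposed in the reply is 16 bytes with val at offset 8 on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins: uint64_t/int64_t replace the kernel's u64/s64 and
 * the __user annotation is dropped.  Anonymous unions require C11. */

/* Layout as posted in the patch. */
struct futex_event_v1 {
	union {
		void *addr;
		uint64_t padding;
	};
	int val;
};

/* Layout proposed in the reply. */
struct futex_event_v2 {
	union {
		void *addr;
		uint64_t padding;
	};
	union {
		int val;
		int64_t val64;
	};
};

/* v2 has no hidden tail padding: 16 bytes with val at offset 8 on common
 * ABIs, whatever the alignment of uint64_t. */
_Static_assert(sizeof(struct futex_event_v2) == 16, "unexpected v2 size");
_Static_assert(offsetof(struct futex_event_v2, val) == 8, "unexpected v2 offset");

/* Bytes of compiler-inserted padding after v1's val.  This depends on the
 * ABI's alignment of uint64_t: 4 on x86-64, 0 on i386. */
static size_t futex_event_v1_tail_padding(void)
{
	return sizeof(struct futex_event_v1) -
	       (offsetof(struct futex_event_v1, val) + sizeof(int));
}
```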
[PATCH 02/16] dmaengine: move channel management to the client
This effectively makes channels a shared resource rather than tying them to a specific client. dmaengine now assumes that clients will internally track how many channels they need and dmaengine will learn if the client cares about a channel at dma_event_callback time. This also enables a client to ignore a channel if it does not meet extra client specific constraints beyond simple base capabilities. This patch also fixes up the NET_DMA client to use the new mechanism. Cc: Chris Leech <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 206 ++--- drivers/dma/ioatdma.c |1 drivers/dma/ioatdma.h |3 - include/linux/dmaengine.h | 46 +- net/core/dev.c| 106 --- 5 files changed, 198 insertions(+), 164 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 8a49103..1a26ce3 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -37,8 +37,8 @@ * Each device has a channels list, which runs unlocked but is never modified * once the device is registered, it's just setup by the driver. * - * Each client has a channels list, it's only modified under the client->lock - * and in an RCU callback, so it's safe to read under rcu_read_lock(). + * Each client is responsible for keeping track of the channels it uses. See + * the definition of dma_event_callback in dmaengine.h. * * Each device has a kref, which is initialized to 1 when the device is * registered. A kref_put is done for each class_device registered. When the @@ -51,10 +51,12 @@ * references to finish. * * Each channel has an open-coded implementation of Rusty Russell's "bigref," - * with a kref and a per_cpu local_t. A single reference is set when on an - * ADDED event, and removed with a REMOVE event. Net DMA client takes an - * extra reference per outstanding transaction. The relase function does a - * kref_put on the device. -ChrisL + * with a kref and a per_cpu local_t. 
A dma_chan_get is called when a client + * signals that it wants to use a channel, and dma_chan_put is called when + * a channel is removed or a client using it is unregesitered. A client can + * take extra references per outstanding transaction, as is the case with + * the NET DMA client. The release function does a kref_put on the device. + * -ChrisL, DanW */ #include @@ -102,8 +104,18 @@ static ssize_t show_bytes_transferred(struct class_device *cd, char *buf) static ssize_t show_in_use(struct class_device *cd, char *buf) { struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev); + int in_use = 0; + + if (unlikely(chan->slow_ref) && atomic_read(&chan->refcount.refcount) > 1) + in_use = 1; + else { + if (local_read(&(per_cpu_ptr(chan->local, + get_cpu())->refcount)) > 0) + in_use = 1; + put_cpu(); + } - return sprintf(buf, "%d\n", (chan->client ? 1 : 0)); + return sprintf(buf, "%d\n", in_use); } static struct class_device_attribute dma_class_attrs[] = { @@ -129,42 +141,50 @@ static struct class dma_devclass = { /* --- client and device registration --- */ +#define dma_async_chan_satisfies_mask(chan, mask) __dma_async_chan_satisfies_mask((chan), &(mask)) +static int __dma_async_chan_satisfies_mask(struct dma_chan *chan, dma_cap_mask_t *want) +{ + dma_cap_mask_t has; + + bitmap_and(has.bits, want->bits, chan->device->cap_mask.bits, DMA_TX_TYPE_END); + return bitmap_equal(want->bits, has.bits, DMA_TX_TYPE_END); +} + /** - * dma_client_chan_alloc - try to allocate a channel to a client + * dma_client_chan_alloc - try to allocate channels to a client * @client: &dma_client * * Called with dma_list_mutex held. 
*/ -static struct dma_chan *dma_client_chan_alloc(struct dma_client *client) +static void dma_client_chan_alloc(struct dma_client *client) { struct dma_device *device; struct dma_chan *chan; - unsigned long flags; int desc; /* allocated descriptor count */ + int ack; /* client has taken a reference to this channel */ - /* Find a channel, any DMA engine will do */ - list_for_each_entry(device, &dma_device_list, global_node) { + /* Find a channel */ + list_for_each_entry(device, &dma_device_list, global_node) list_for_each_entry(chan, &device->channels, device_node) { - if (chan->client) + if (!dma_async_chan_satisfies_mask(chan, client->cap_mask)) continue; desc = chan->device->device_alloc_chan_resources(chan); if (desc >= 0) { - kref_get(&device->refcount); - kref_init(&chan->refcount); -
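The new dma_async_chan_satisfies_mask() accepts a channel only when every capability the client asks for is present. Below is a single-word userspace sketch of that test. It also includes a helper that collects every matching channel, in the way dma_client_chan_alloc() walks all channels. The CAP_* names and find_channels() are illustrative only; they are not dmaengine API.

```c
#include <assert.h>

/* Hypothetical capability bits, in the spirit of the patch's DMA_MEMCPY,
 * DMA_XOR, ... capability mask. */
enum { CAP_MEMCPY, CAP_XOR, CAP_ZERO_SUM, CAP_MEMSET, CAP_INTERRUPT };
#define CAP(x) (1UL << (x))

/* A channel satisfies a mask iff every wanted bit is present, i.e.
 * (want & has) == want: the single-word form of the patch's
 * bitmap_and() + bitmap_equal() test. */
static int chan_satisfies_mask(unsigned long has, unsigned long want)
{
	return (want & has) == want;
}

/* Collect the indices of all channels that satisfy 'want' into out[]
 * (which must hold nchans entries) and return how many were found. */
static int find_channels(const unsigned long *chan_caps, int nchans,
			 unsigned long want, int *out)
{
	int n = 0;

	assert(nchans >= 0);
	for (int i = 0; i < nchans; i++)
		if (chan_satisfies_mask(chan_caps[i], want))
			out[n++] = i;
	return n;
}
```

A client that only needs xor therefore matches any channel whose mask includes xor, even if that channel also offers other operations.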
Re: [patch 0/3] Clocksource / clockevent updates
On Wed, 02 May 2007 08:09:29 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote: > On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote: > > On Mon, 30 Apr 2007 10:43:31 - > > Thomas Gleixner <[EMAIL PROTECTED]> wrote: > > > > > Andrew, > > > > > > please pick up the following updates to clocksource / clockevents: > > > > > > - Fixups to the resume logic > > > - Keep TSC stable, when lapic_timer_c2_ok is set > > > > > > > Should we be targetting these at 2.6.20.x? > > 2.6.21.x ? > > Hmm. They should get some testing first, but otherwise yes. > OK, I added the cc. The second patch won't apply to 2.6.21 when we get around to it, but it'll be pretty simple to repair. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 01/16] dmaengine: add base support for the async_tx api
In preparation for the async_tx (dmaengine client) API this patch: 1/ introduces struct dma_async_tx_descriptor as a common field for all dmaengine software descriptors. The primary role of this structure is to enable callbacks at transaction completion time, and support transaction chains that span multiple channels 2/ converts the device_memcpy_* methods into separate prep, set src/dest, and submit stages 3/ adds support for capabilities beyond memcpy (xor, memset, xor zero sum, completion interrupts). place holders for future capabilities are also included 4/ converts ioatdma to the new semantics Changelog: * drop dma mapping methods, suggested by Chris Leech * fix ioat_dma_dependency_added, also caught by Andrew Morton * fix dma_sync_wait, change from Andrew Morton * uninline large functions, change from Andrew Morton * add tx->callback = NULL to dmaengine calls to interoperate with async_tx calls * hookup ioat_tx_submit * convert channel capabilities to a 'cpumask_t like' bitmap Cc: Chris Leech <[EMAIL PROTECTED]> Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 182 + drivers/dma/ioatdma.c | 248 - drivers/dma/ioatdma.h |8 + include/linux/dmaengine.h | 245 4 files changed, 454 insertions(+), 229 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 322ee29..8a49103 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -59,6 +59,7 @@ #include #include +#include #include #include #include @@ -66,6 +67,7 @@ #include #include #include +#include static DEFINE_MUTEX(dma_list_mutex); static LIST_HEAD(dma_device_list); @@ -165,6 +167,24 @@ static struct dma_chan *dma_client_chan_alloc(struct dma_client *client) return NULL; } +enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie) +{ + enum dma_status status; + unsigned long dma_sync_wait_timeout = jiffies + msecs_to_jiffies(5000); + + dma_async_issue_pending(chan); + do { + status = dma_async_is_tx_complete(chan, cookie, NULL, 
NULL); + if (time_after_eq(jiffies, dma_sync_wait_timeout)) { + printk(KERN_ERR "dma_sync_wait_timeout!\n"); + return DMA_ERROR; + } + } while (status == DMA_IN_PROGRESS); + + return status; +} +EXPORT_SYMBOL(dma_sync_wait); + /** * dma_chan_cleanup - release a DMA channel's resources * @kref: kernel reference structure that contains the DMA channel device @@ -322,6 +342,28 @@ int dma_async_device_register(struct dma_device *device) if (!device) return -ENODEV; + /* validate device routines */ + BUG_ON(dma_has_cap(DMA_MEMCPY, device->cap_mask) && + !device->device_prep_dma_memcpy); + BUG_ON(dma_has_cap(DMA_XOR, device->cap_mask) && + !device->device_prep_dma_xor); + BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) && + !device->device_prep_dma_zero_sum); + BUG_ON(dma_has_cap(DMA_MEMSET, device->cap_mask) && + !device->device_prep_dma_memset); + BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) && + !device->device_prep_dma_interrupt); + + BUG_ON(!device->device_alloc_chan_resources); + BUG_ON(!device->device_free_chan_resources); + BUG_ON(!device->device_tx_submit); + BUG_ON(!device->device_set_dest); + BUG_ON(!device->device_set_src); + BUG_ON(!device->device_dependency_added); + BUG_ON(!device->device_is_tx_complete); + BUG_ON(!device->device_issue_pending); + BUG_ON(!device->dev); + init_completion(&device->done); kref_init(&device->refcount); device->dev_id = id++; @@ -397,6 +439,146 @@ void dma_async_device_unregister(struct dma_device *device) } EXPORT_SYMBOL(dma_async_device_unregister); +/** + * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses + * @chan: DMA channel to offload copy to + * @dest: destination address (virtual) + * @src: source address (virtual) + * @len: length + * + * Both @dest and @src must be mappable to a bus address according to the + * DMA mapping API rules for streaming mappings. + * Both @dest and @src must stay memory resident (kernel memory or locked + * user space pages). 
+ */ +dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan, +void *dest, void *src, size_t len) +{ + struct dma_device *dev = chan->device; + struct dma_async_tx_descriptor *tx; + dma_addr_t addr; + dma_cookie_t cookie; + int cpu; + + tx = dev->device_prep_dma_memcpy(chan, len, 0); + if (!tx) + return -ENOMEM; + + tx->ack = 1; + tx->callback = NULL; + addr = dma_map_single(dev->dev, src, len,
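dma_sync_wait() issues pending work and then polls the completion status until it leaves DMA_IN_PROGRESS or a 5000 ms deadline passes. The sketch below is a minimal userspace analogue. It assumes POSIX clock_gettime(CLOCK_MONOTONIC) in place of jiffies and uses a hypothetical fake channel in place of a real device. The TX_* status names are stand-ins for enum dma_status.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical status codes standing in for enum dma_status. */
enum tx_status { TX_SUCCESS, TX_IN_PROGRESS, TX_ERROR };

typedef enum tx_status (*status_fn)(void *ctx);

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll until the transaction leaves TX_IN_PROGRESS or timeout_ms passes. */
static enum tx_status sync_wait(status_fn is_complete, void *ctx, long timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;

	for (;;) {
		enum tx_status status = is_complete(ctx);

		if (status != TX_IN_PROGRESS)
			return status;   /* checked before the deadline */
		if (now_ms() >= deadline)
			return TX_ERROR; /* cf. the patch's timeout message */
	}
}

/* Fake channel for illustration: reports TX_IN_PROGRESS polls_left times,
 * then TX_SUCCESS; a negative polls_left never completes. */
struct fake_chan {
	int polls_left;
};

static enum tx_status fake_is_complete(void *ctx)
{
	struct fake_chan *c = ctx;

	if (c->polls_left == 0)
		return TX_SUCCESS;
	if (c->polls_left > 0)
		c->polls_left--;
	return TX_IN_PROGRESS;
}
```

There is one deliberate difference from the patch. This sketch checks the status before the deadline, so a transaction that completes on the very last poll is reported as finished rather than as a timeout. The patch checks the deadline first.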
[PATCH 00/16] raid acceleration and asynchronous offload api for 2.6.22
I am pleased to release this latest spin of the raid acceleration patches for merge consideration.  This release aims to address all pending review items including MD bug fixes and async_tx api changes from Neil, and concerns on channel management from Chris and others.

Data integrity tests using home grown scripts and 'iozone -V' are passing.  I am open to suggestions for additional testing criteria.  I have also verified that git bisect is not broken by this set.

The short log below highlights the most recent changes.  The patches will be sent as a reply to this message, and they are also available via git:

	git pull git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus

Additional comments and feedback welcome.

Thanks,
Dan

--
01/16: dmaengine: add base support for the async_tx api
	* convert channel capabilities to a 'cpumask_t like' bitmap
02/16: dmaengine: move channel management to the client
	* this patch is new to this series
03/16: ARM: Add drivers/dma to arch/arm/Kconfig
04/16: dmaengine: add the async_tx api
	* remove the per operation type list, and distribute operation capabilities evenly amongst the available channels
	* simplify async_tx_find_channel to optimize the fast path
05/16: md: add raid5_run_ops and support routines
	* explicitly handle the 2-disk raid5 case (xor becomes memcpy)
	* fix race between async engines and bi_end_io call for reads, Neil Brown
	* remove unnecessary spin_lock from ops_complete_biofill
	* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
	* remove explicit interrupt handling, Neil Brown
06/16: md: use raid5_run_ops for stripe cache operations
07/16: md: move write operations to raid5_run_ops
	* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
08/16: md: move raid5 compute block operations to raid5_run_ops
	* remove the req_compute BUG_ON
09/16: md: move raid5 parity checks to raid5_run_ops
	* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
10/16: md: satisfy raid5 read requests via raid5_run_ops
	* cleanup to_read and to_fill accounting
	* do not fail reads that have reached the cache
11/16: md: use async_tx and raid5_run_ops for raid5 expansion operations
12/16: md: move raid5 io requests to raid5_run_ops
13/16: md: remove raid5 compute_block and compute_parity5
14/16: dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
	* fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt
	* convert capabilities over to dma_cap_mask_t
15/16: iop13xx: Surface the iop13xx adma units to the iop-adma driver
16/16: iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver

(previous release: http://marc.info/?l=linux-raid&m=117463257423193&w=2)
Re: [ckrm-tech] [PATCH 1/9] Containers (V9): Basic container framework
Balbir wrote:
> Would it be possible to extract those test cases and integrate them
> with a testing framework like LTP? Do you have any regression test
> suite for cpusets that can be made available publicly so that
> any changes to cpusets can be validated?

There are essentially two sorts of cpuset regression tests of interest.  I have one such test, and the batch scheduler developers have various tests of their batch schedulers.

 1) Testing batch schedulers against cpusets:

    I doubt that the batch scheduler developers would be able to extract a cpuset test from their tests, or be able to share it if they did.  Their tests tend to be large tests of batch schedulers, and only incidentally test cpusets -- if we break cpusets, in sometimes even subtle ways that they happen to depend on, we break them.

    Sometimes there is no way to guess exactly what sorts of changes will break their code; we'll just have to schedule at least one run through one or more of them that rely heavily on cpusets before a change as big as rebasing cpusets on containers is reasonably safe.  This test cycle won't be all that easy, so I'd wait until we are pretty close to what we think should be taken into the mainline kernel.

    I suppose I will have to be the one co-ordinating this test, as I am the only one I know with a presence in both camps.

    Once this test is done, from then forward, if we break them, we'll just have to deal with it as we do now, when the breakage shows up well downstream from the main kernel tree, at the point that a major batch scheduler release runs into a major distribution release containing the breakage.

    There is no practical way that I can see, on an ongoing basis, to continue testing for such breakage with every minor change to cpuset-related code in the kernel.  Any breakage found this way is dealt with by changes in user-level code.

    Once again, I have bcc'd one or more developers of batch schedulers, so they can see what nonsense I am spouting about them now ;).
 2) Testing cpusets with a specific test.

    There I can do better.  Attached is the cpuset regression test I use.  It requires at least 4 cpus and 2 memory nodes to do anything useful.  It is copyright by SGI, released under GPL license.

    This regression test is the primary cpuset test upon which I relied during the development of cpusets, and upon which I continue to rely.  Except for one subtle race condition in the test itself, it has not changed in the last two to three years.

    This test requires no user-level code not found in an ordinary distro.  It does require the taskset and numactl commands, for the purposes of testing certain interactions with them.  It assumes that there are no other cpusets currently set up in the system that happen to conflict with the ones it creates.  See further comments within the test script itself.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401

[Attachment: cpuset_test (binary data)]
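The stated prerequisites (at least 4 CPUs and 2 memory nodes) could be checked before running a test like this. The small sketch below assumes glibc's sysconf(_SC_NPROCESSORS_ONLN) and the usual /sys/devices/system/node/nodeN sysfs layout. Counting node directories only approximates the number of memory nodes, since a node may have CPUs but no memory. The function names are hypothetical and not part of the attached test.

```c
#include <assert.h>
#include <glob.h>
#include <unistd.h>

/* The requirement stated for the cpuset regression test. */
static int cpuset_test_prereqs_met(long ncpus, long nnodes)
{
	return ncpus >= 4 && nnodes >= 2;
}

/* Online CPUs as reported by sysconf (a glibc/Linux extension). */
static long online_cpus(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Count NUMA nodes via sysfs; assumes the usual
 * /sys/devices/system/node/nodeN layout and returns 0 if it is absent. */
static long memory_nodes(void)
{
	glob_t g;
	long n = 0;

	if (glob("/sys/devices/system/node/node[0-9]*", 0, NULL, &g) == 0) {
		n = (long)g.gl_pathc;
		globfree(&g);
	}
	return n;
}
```

A wrapper script might call cpuset_test_prereqs_met(online_cpus(), memory_nodes()) and skip the run on smaller machines instead of reporting spurious failures.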
Re: [patch 01/10] compiler: define __attribute_unused__
On Wed, 2 May 2007, Rusty Russell wrote:

> Adding this macro doesn't give us anything that simply saying
> "__attribute__((unused))" doesn't give.  But it does add a layer of
> kernel-specific indirection.
> 

That's obviously true since we're defining __attribute_unused__ to be __attribute__((unused)).  We were trying to clean up the misconception that the current __attribute_used__ was created to suppress warnings when, in fact, that was not its purpose.  It was created to emit the code for a function that appeared to be unreferenced and only suppressed warnings as a side-effect in gcc <3.4.

> If we're going to get kernel-specific, I'd prefer to see:
> 
> __needed: suppress warning and don't discard,

That would be the current definition of __attribute_used__ (i.e. we're saying that we use the function in inline assembly even though it appears we don't use it at all).

> __unneeded: suppress warning and might discard.
> 

That would be the patched definition of __attribute_unused__.

So let's go back to the problem this was initially supposed to fix from arch/i386/pci/init.c:

	static __init int pci_access_init(void)
	{
		int type = 0;

	#ifdef CONFIG_PCI_DIRECT
		type = pci_direct_probe();
	#endif
	#ifdef CONFIG_PCI_MMCONFIG
		pci_mmcfg_init(type);
	#endif
		...

and type is unreferenced for the remainder of the function.  Obviously we could add

	#if defined(CONFIG_PCI_DIRECT) || defined(CONFIG_PCI_MMCONFIG)

before the declaration of 'type', but that becomes sloppy pretty quickly.

The patched version makes this:

	int type __attribute_unused__ = 0;

which definitely tells you that you're using a compiler attribute that will be attached to that automatic.  In your case:

	int type __unneeded = 0;

doesn't say anything in this case.  It doesn't resemble any attribute that a programmer might be familiar with and begs the question of why we've declared it if it's truly "unneeded"?

By the way, there are tons of these instances where __attribute__((used)) needs to be added in driver code to suppress unreferenced warnings.

		David
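The distinction being argued about can be seen in a few lines of userspace C. 'unused' only silences the "unused" warning, and the compiler remains free to discard the object. 'used' forces the compiler to emit a function even when no C code references it. This sketch uses hypothetical macro names (MAYBE_UNUSED, KEEP_EMITTED) rather than the kernel's, to avoid clashing with names a C library may already define. The probe helpers are stubs standing in for the real PCI functions.

```c
#include <assert.h>

/* Hypothetical names standing in for the kernel macros under discussion. */
#define MAYBE_UNUSED __attribute__((unused)) /* role of __attribute_unused__ */
#define KEEP_EMITTED __attribute__((used))   /* role of __attribute_used__ */

/* Stub stand-ins for the probe helpers; MAYBE_UNUSED keeps
 * -Wunused-function quiet when neither config option is defined. */
static MAYBE_UNUSED int pci_direct_probe(void) { return 1; }
static MAYBE_UNUSED void pci_mmcfg_init(int type) { (void)type; }

/* Emitted even though nothing in C calls it, e.g. because inline
 * assembly references the symbol. */
static KEEP_EMITTED void called_only_from_asm(void) { }

/* Mirrors the pci_access_init() pattern quoted in the message.  With
 * CONFIG_PCI_DIRECT and CONFIG_PCI_MMCONFIG both undefined, 'type' is
 * never read, and the attribute suppresses the unused-variable warning. */
static int pci_access_init_sketch(void)
{
	int type MAYBE_UNUSED = 0;

#ifdef CONFIG_PCI_DIRECT
	type = pci_direct_probe();
#endif
#ifdef CONFIG_PCI_MMCONFIG
	pci_mmcfg_init(type);
#endif
	return 0;
}
```

Building this with -Wall and neither CONFIG_ symbol defined should produce no unused-variable or unused-function warnings. Removing MAYBE_UNUSED from 'type' brings the variable warning back.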
Re: [patch 0/3] Clocksource / clockevent updates
On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote:
> On Mon, 30 Apr 2007 10:43:31 -
> Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> 
> > Andrew,
> > 
> > please pick up the following updates to clocksource / clockevents:
> > 
> > - Fixups to the resume logic
> > - Keep TSC stable, when lapic_timer_c2_ok is set
> > 
> 
> Should we be targetting these at 2.6.20.x?

2.6.21.x ?

Hmm. They should get some testing first, but otherwise yes.

	tglx
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, 1 May 2007 22:53:52 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> wrote:
> On Wed, 2 May 2007, Alexey Dobriyan wrote:
> > On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
> > > +#define __attribute_unused__ __attribute__((unused))
> >
> > Suggest __unused which is shorter and looks compiler-neutral.
>
> So you would also suggest renaming __attribute_used__ and all 48 of its
> uses to __used?

Or __needed or __unneeded. None of them mean much to me, and I'd be forever going back to the definition to work out what was intended. We're still in search of a name, IMO.

But once we have it, yeah, we should update all present users. We can do that over time: retain the old and new definitions for a while.
Re: [patch 00/22] pollfs: filesystem abstraction for pollable objects
On Wed, 02 May 2007 02:22:35 -0300 Davi Arnaut <[EMAIL PROTECTED]> wrote:
> This patch set introduces a new file system for the delivery of pollable
> events through file descriptors. To the detriment of debuggability, pollable
> objects are a nice adjunct to nonblocking/epoll/event-based servers.
>
> The pollfs filesystem abstraction provides better mechanisms needed for
> creating and maintaining pollable objects. Also the pollable futex approach
> is far superior (send and receive events from userspace or kernel) to eventfd
> and fixes (supersedes) FUTEX_FD at the same time.
>
> The (non) blocking and object size (user <-> kernel) semantics are handled
> internally, decoupling the core filesystem from the "subsystems" (mere push
> and pop operations).
>
> Currently implemented waitable "objects" are: signals, futexes, AIO blocks
> and timers.

Well, that throws a spanner in the signalfd works. The code _looks_ nice and simple and clean from a quick scan.

David, could you provide some feedback please?

The patches are stunningly free of comments, but you used to do that to me pretty often, so my sympathy is limited ;)
Re: [patch 14/22] pollfs: pollable futex
Davi Arnaut wrote:
> Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains
> a given value. There can be only one futex wait per file descriptor. However,
> it can be rearmed (possibly at a different address) anytime.
>
> The pollable futex approach is far superior (send and receive events from
> userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same time.
>
> Building block for pollable semaphores and user-defined events.
>
> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
> --- fs/pollfs/Makefile |1 fs/pollfs/futex.c | 154 + init/Kconfig |7 ++ 3 files changed, 162 insertions(+) Index: linux-2.6/fs/pollfs/Makefile === --- linux-2.6.orig/fs/pollfs/Makefile +++ linux-2.6/fs/pollfs/Makefile @@ -3,3 +3,4 @@ pollfs-y := file.o pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o pollfs-$(CONFIG_POLLFS_TIMER) += timer.o +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o Index: linux-2.6/fs/pollfs/futex.c === --- /dev/null +++ linux-2.6/fs/pollfs/futex.c @@ -0,0 +1,154 @@ +/* + * pollable futex + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct futex_event { + union { + void __user *addr; + u64 padding; + }; + int val; +};

Hmm... Here we might have a problem with 64-bit futexes, or private futexes.

So I believe this interface is not well defined and not expandable: in case of future additions to futexes, an old application compiled with an old pollable futex_event type might fail.
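To make the layout question concrete, here is a userspace mirror of the `futex_event` record from the patch (with `u64` spelled `uint64_t` and `__user` dropped). The union widens the pointer to 64 bits so `val` sits at offset 8 for both 32-bit and 64-bit callers, but the total size still depends on the ABI: x86_64 rounds the struct up to 16 bytes, while the i386 ABI aligns 64-bit members to 4 bytes and gives 12. The two helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the patch's record (u64 -> uint64_t). */
struct futex_event {
	union {
		void *addr;		/* futex address to wait on */
		uint64_t padding;	/* widens 'addr' to 64 bits everywhere */
	};
	int val;			/* expected value, as with FUTEX_WAIT */
};

/* Hypothetical helpers exposing the layout a caller would hand in. */
static size_t futex_event_val_offset(void)
{
	return offsetof(struct futex_event, val);
}

static size_t futex_event_size(void)
{
	return sizeof(struct futex_event);
}
```

Since the pollfs write path divides the byte count by a fixed record size, any change to this struct, or a 12-versus-16-byte mismatch between 32-bit and 64-bit callers, would change how many records the kernel thinks it was given.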
Re: Natsemi DP83815 driver spaming
>> > > * 2) check for sudden death of the NIC:
>> > > *    It seems that a reference set for this chip went out with incorrect info,
>> > > *    and there exist boards that aren't quite right. An unexpected voltage
>> > > *    drop can cause the PHY to get itself in a weird state (basically reset).
>> > > *    NOTE: this only seems to affect revC chips.
>>
>> > Code commented out and NIC is working OK. Strange.
>> > eth0: DSPCFG accepted after 0 usec.
>> > eth0: link up.
>> > eth0: Setting full-duplex based on negotiated link capability.
>> > dspcfg = 0x np->dspcfg = 0x5060
>>
>> Oh, that's entertaining. I have to confess that I've never seen an that
>> triggered the workaround before - adding the maintainer, Tim Hockin, who
>> may be able to shed some light on the expected behaviour here?
>
> It's been quite a while since I dealt with this issue, so I am going
> on faulty memory. A particular reference design for this chip had bad
> resistor values, or something similar. That caused the chip to get
> very very confused and need a reset.

Can you send me documentation? I can't find anything in the datasheet. I will replace the bad resistors with correct ones.

> So the driver is finding your chip to be hosed over and over again.
> dspcfg = 0x00 is bad. I'd be very surprised if you don't get
> other weirdness - bad performance or noise or who knows what.

No. It is much better. Far fewer packets need to be retransmitted. I was blaming w3cache.tkdami.net earlier.

> You could take out the error message and just let the driver do its
> thing, or you can try to run with that logic removed. But I'd measure
> both and see what they do. Specifically - look for packet errors.

With the code commented out I have 1 error per 3 transmitted packets from the DP83815C, and 1 error per 10 transmitted packets to the DP83815C. Maybe it only works at all because I have a short cable, only 10 m long. I don't remember any errors with plain 2.6.21.1.

> Tim

Rafał
Re: [patch 01/10] compiler: define __attribute_unused__
On Wed, 2 May 2007, Alexey Dobriyan wrote:
> On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
> > +#define __attribute_unused__ __attribute__((unused))
>
> Suggest __unused which is shorter and looks compiler-neutral.

So you would also suggest renaming __attribute_used__ and all 48 of its uses to __used?
[patch 02/22] pollfs: file system operations
The key feature of the pollfs file operations is to internally handle pollable (waitable) resources as files without exporting complex and bug-prone underlying (VFS) implementation details. All resource handlers are required to implement the read, write, poll, release operations and must not block. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/Makefile|1 fs/pollfs/Makefile |2 fs/pollfs/file.c | 238 + init/Kconfig |6 + 4 files changed, 247 insertions(+) Index: linux-2.6/fs/pollfs/file.c === --- /dev/null +++ linux-2.6/fs/pollfs/file.c @@ -0,0 +1,238 @@ +/* + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define POLLFS_MAGIC 0x9a6afcd + +MODULE_LICENSE("GPL"); + +/* pollfs vfsmount entry */ +static struct vfsmount *pfs_mnt; + +/* pollfs file operations */ +static const struct file_operations pfs_fops; + +static inline ssize_t +pfs_read_nonblock(const struct pfs_operations *fops, void *data, + void __user *obj, size_t nr) +{ + ssize_t count = 0, res = 0; + + do { + res = fops->read(data, obj); + if (res) + break; + count++; + obj += fops->rsize; + } while (--nr); + + if (count) + return count * fops->rsize; + else if (res) + return res; + else + return -EAGAIN; +} + +static inline ssize_t +pfs_read_block(const struct pfs_operations *fops, void *data, + wait_queue_head_t *wait, void __user *obj, size_t nr) +{ + ssize_t count; + + do { + count = pfs_read_nonblock(fops, data, obj, nr); + if (count != -EAGAIN) + break; + count = wait_event_interruptible((*wait), fops->poll(data)); + } while (!count); + + return count; +} + +static ssize_t pfs_read(struct file *filp, char __user *buf, + size_t count, loff_t * pos) +{ + size_t nevents = count; + struct pfs_file *pfs = filp->private_data; + const struct pfs_operations *fops = pfs->fops; + + if (fops->rsize) + nevents /= 
fops->rsize; + else + nevents = 1; + + if (!nevents) + return -EINVAL; + + if (filp->f_flags & O_NONBLOCK) + return pfs_read_nonblock(fops, pfs->data, buf, nevents); + else + return pfs_read_block(fops, pfs->data, pfs->wait, buf, nevents); +} + +static ssize_t pfs_write(struct file *filp, const char __user *buf, +size_t count, loff_t * ppos) +{ + ssize_t res = 0; + size_t nevents = count; + struct pfs_file *pfs = filp->private_data; + const struct pfs_operations *fops = pfs->fops; + + if (fops->wsize) + nevents /= fops->wsize; + else + nevents = 1; + + if (!nevents) + return -EINVAL; + + count = 0; + + do { + res = fops->write(pfs->data, buf); + if (res) + break; + count++; + buf += fops->wsize; + } while (--nevents); + + if (count) + return count * fops->wsize; + else if (res) + return res; + else + return 0; +} + +static unsigned int pfs_poll(struct file *filp, struct poll_table_struct *wait) +{ + int ret = 0; + struct pfs_file *pfs = filp->private_data; + + poll_wait(filp, pfs->wait, wait); + + if (pfs->fops->poll) + ret = pfs->fops->poll(pfs->data); + else + ret = POLLIN; + + return ret; +} + +static int pfs_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pfs_file *pfs = filp->private_data; + + return (pfs->fops->mmap) ? pfs->fops->mmap(pfs->data, vma) : -ENODEV; +} + +static int pfs_release(struct inode *inode, struct file *filp) +{ + struct pfs_file *pfs = filp->private_data; + + return pfs->fops->release(pfs->data); +} + +static const struct file_operations pfs_fops = { + .poll = pfs_poll, + .mmap = pfs_mmap, + .read = pfs_read, + .write = pfs_write, + .release = pfs_release +}; + +long pfs_open(struct pfs_file *pfs) +{ + int fd; + struct file *filp; + const struct pfs_operations *fops = pfs->fops; + + if (IS_ERR(pfs_mnt)) + return -ENOSYS; + + if (!fops->poll || (!fops->read || !fops->write)) + return -EINVAL; + + fd = get_unused_fd(); + if (fd < 0) + return -ENFILE; + + filp = get_empty_filp(); +
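The return-value contract of `pfs_read()` and `pfs_read_nonblock()` above can be modelled in userspace: read up to `nr` fixed-size records, stop at the first record the handler refuses, return the bytes copied if there are any, and otherwise return the handler's error (or `-EAGAIN`). This is a sketch under assumptions: `struct event_src`, `src_read_one()` and `batch_read()` are hypothetical, `memcpy()` stands in for `copy_to_user()`, and each record is a plain `int`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define SRC_CAP 8

/* Hypothetical event source: a fixed-capacity queue of int records. */
struct event_src {
	int events[SRC_CAP];
	size_t head;	/* next record to hand out */
	size_t tail;	/* one past the last queued record */
};

/* Per-record handler in the style of fops->read: 0 or -errno. */
static int src_read_one(struct event_src *src, void *out)
{
	if (src->head == src->tail)
		return -EAGAIN;		/* nothing queued right now */
	memcpy(out, &src->events[src->head++], sizeof(int));
	return 0;
}

/* Same shape as pfs_read() + pfs_read_nonblock() in the patch. */
static ssize_t batch_read(struct event_src *src, char *buf, size_t len)
{
	const size_t rsize = sizeof(int);
	size_t nr = len / rsize, count = 0;
	int res = 0;

	if (nr == 0)
		return -EINVAL;		/* buffer smaller than one record */

	while (count < nr) {
		res = src_read_one(src, buf + count * rsize);
		if (res)
			break;
		count++;
	}

	if (count)
		return (ssize_t)(count * rsize);	/* partial success wins */
	return res ? res : -EAGAIN;
}
```

A blocking reader wraps this in a loop that sleeps until the source's poll condition is true whenever the result is `-EAGAIN`, which is what `pfs_read_block()` does with `wait_event_interruptible()`.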
[PATCH] x86_64: O_EXCL on /dev/mcelog
From: Tim Hockin <[EMAIL PROTECTED]>

Background: /dev/mcelog is a clear-on-read interface. It is currently possible for multiple users to open and read() the device. Users are protected from each other during any one read, but not across reads.

Description: This patch adds support for O_EXCL to /dev/mcelog. If a user opens the device with O_EXCL, no other user may open the device (EBUSY). Likewise, any user that tries to open the device with O_EXCL while another user has the device will fail (EBUSY).

Result: Applications can get exclusive access to /dev/mcelog. Applications that do not care will be unchanged.

Alternatives: A simpler choice would be to only allow one open() at all, regardless of O_EXCL.

Testing: I wrote an application that opens /dev/mcelog with O_EXCL and observed that any other app that tried to open /dev/mcelog would fail until the exclusive app had closed the device.

Caveats: None.

Patch: This patch is against 2.6.21-rc7.

Signed-off-by: Tim Hockin <[EMAIL PROTECTED]>

---

This is the first version of this patch. The simpler alternative of only one open() sounds better to me, but it would be a net change in behavior.

diff -pruN linux-2.6.20+th/arch/x86_64/kernel/mce.c linux-2.6.20+th1.5/arch/x86_64/kernel/mce.c --- linux-2.6.20+th/arch/x86_64/kernel/mce.c2007-04-27 14:19:08.0 -0700 +++ linux-2.6.20+th1.5/arch/x86_64/kernel/mce.c 2007-05-01 21:53:10.0 -0700 @@ -465,6 +465,40 @@ void __cpuinit mcheck_init(struct cpuinf * Character device to read and clear the MCE log. */ +static DEFINE_SPINLOCK(mce_state_lock); +static int open_count; /* #times opened */ +static int open_exclu; /* already open exclusive?
*/ + +static int mce_open(struct inode *inode, struct file *file) +{ + spin_lock(&mce_state_lock); + + if (open_exclu || (open_count && (file->f_flags & O_EXCL))) { + spin_unlock(&mce_state_lock); + return -EBUSY; + } + + if (file->f_flags & O_EXCL) + open_exclu = 1; + open_count++; + + spin_unlock(&mce_state_lock); + + return 0; +} + +static int mce_release(struct inode *inode, struct file *file) +{ + spin_lock(&mce_state_lock); + + open_count--; + open_exclu = 0; + + spin_unlock(&mce_state_lock); + + return 0; +} + static void collect_tscs(void *data) { unsigned long *cpu_tsc = (unsigned long *)data; @@ -553,6 +587,8 @@ static int mce_ioctl(struct inode *i, st } static const struct file_operations mce_chrdev_ops = { + .open = mce_open, + .release = mce_release, .read = mce_read, .ioctl = mce_ioctl, };
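The open/release bookkeeping in this patch can be exercised outside the kernel. The sketch below copies the same two counters and the same admission test, with a pthread mutex standing in for the spinlock and an `int flags` argument standing in for `file->f_flags`; `model_open()` and `model_release()` are hypothetical names, not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_count;	/* current number of opens */
static int open_exclu;	/* nonzero while held exclusively */

/* Same admission rule as mce_open(): refuse any open while someone holds
 * the device exclusively, and refuse O_EXCL while anyone has it open. */
static int model_open(int flags)
{
	int ret = 0;

	pthread_mutex_lock(&state_lock);
	if (open_exclu || (open_count && (flags & O_EXCL))) {
		ret = -EBUSY;
	} else {
		if (flags & O_EXCL)
			open_exclu = 1;
		open_count++;
	}
	pthread_mutex_unlock(&state_lock);
	return ret;
}

/* Same as mce_release(): clearing open_exclu unconditionally is safe,
 * because while it is set the exclusive holder is the only opener. */
static void model_release(void)
{
	pthread_mutex_lock(&state_lock);
	open_count--;
	open_exclu = 0;
	pthread_mutex_unlock(&state_lock);
}
```

From userspace, the exclusive mode would be requested with `open("/dev/mcelog", O_RDONLY | O_EXCL)`, and a competing open would fail with `EBUSY`.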
[patch 06/22] pollfs: export the plsignal system call
Export the new plsignal syscall prototype. While there, make it "conditional". Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/syscalls.h |2 ++ kernel/sys_ni.c |1 + 2 files changed, 3 insertions(+) Index: linux-2.6/include/linux/syscalls.h === --- linux-2.6.orig/include/linux/syscalls.h +++ linux-2.6/include/linux/syscalls.h @@ -605,4 +605,6 @@ asmlinkage long sys_getcpu(unsigned __us int kernel_execve(const char *filename, char *const argv[], char *const envp[]); +asmlinkage long sys_plsignal(const sigset_t __user * set); + #endif Index: linux-2.6/kernel/sys_ni.c === --- linux-2.6.orig/kernel/sys_ni.c +++ linux-2.6/kernel/sys_ni.c @@ -112,6 +112,7 @@ cond_syscall(sys_vm86old); cond_syscall(sys_vm86); cond_syscall(compat_sys_ipc); cond_syscall(compat_sys_sysctl); +cond_syscall(sys_plsignal); /* arch-specific weak syscall entries */ cond_syscall(sys_pciconfig_read); --
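Making the syscall "conditional" means, roughly, that `cond_syscall()` turns the named entry point into a weak alias of `sys_ni_syscall()`, which returns `-ENOSYS`, so the syscall table still links when pollfs is configured out and a real definition overrides it when it is built in. A userspace sketch of the same idea with GCC's `weak` and `alias` attributes (ELF targets only); the `demo_` names are hypothetical.

```c
#include <errno.h>

/* Stand-in for sys_ni_syscall(): unimplemented calls end up here. */
long demo_ni_syscall(void)
{
	return -ENOSYS;
}

/* Rough equivalent of cond_syscall(demo_sys_plsignal): a weak alias
 * that a strong definition linked in elsewhere would override. */
long demo_sys_plsignal(void) __attribute__((weak, alias("demo_ni_syscall")));
```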
[patch 04/22] pollfs: pollable signal
Retrieve multiple per-process signals through a file descriptor. The mask of signals can be changed at any time. Also, the compat code can be kept very simple. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/Makefile |2 fs/pollfs/signal.c | 144 + init/Kconfig |7 ++ 3 files changed, 153 insertions(+) Index: linux-2.6/fs/pollfs/signal.c === --- /dev/null +++ linux-2.6/fs/pollfs/signal.c @@ -0,0 +1,144 @@ +/* + * sigtimedwait4, retrieve multiple signals with one call. + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct pfs_signal { + sigset_t set; + spinlock_t lock; + struct task_struct *task; + struct pfs_file file; +}; + +static void inline sigset_adjust(sigset_t *set) +{ + /* SIGKILL and SIGSTOP cannot be caught, blocked, or ignored */ + sigdelsetmask(set, sigmask(SIGKILL) | sigmask(SIGSTOP)); + + /* Signals we don't want to dequeue */ + signotset(set); +} + +static ssize_t read(struct pfs_signal *evs, siginfo_t __user *infoup) +{ + int signo; + siginfo_t info; + + signo = dequeue_signal_lock(evs->task, &evs->set, &info); + if (!signo) + return -EAGAIN; + + if (copy_siginfo_to_user(infoup, &info)) + return -EFAULT; + + return 0; +} + +static ssize_t write(struct pfs_signal *evs, const sigset_t __user *uset) +{ + sigset_t set; + + if (copy_from_user(&set, uset, sizeof(sigset_t))) + return -EFAULT; + + sigset_adjust(&set); + + spin_lock_irq(&evs->lock); + sigemptyset(&evs->set); + sigorsets(&evs->set, &evs->set, &set); + spin_unlock_irq(&evs->lock); + + return 0; +} + +static int poll(struct pfs_signal *evs) +{ + int ret = 0; + sigset_t pending; + unsigned long flags; + + rcu_read_lock(); + + if (!lock_task_sighand(evs->task, &flags)) + goto out_unlock; + + sigorsets(&pending, &evs->task->pending.signal, + &evs->task->signal->shared_pending.signal); + + 
unlock_task_sighand(evs->task, &flags); + + spin_lock_irqsave(&evs->lock, flags); + signandsets(&pending, &pending, &evs->set); + spin_unlock_irqrestore(&evs->lock, flags); + + if (!sigisemptyset(&pending)) + ret = POLLIN; + +out_unlock: + rcu_read_unlock(); + + return ret; +} + +static int release(struct pfs_signal *evs) +{ + put_task_struct(evs->task); + kfree(evs); + + return 0; +} + +static const struct pfs_operations signal_ops = { + .read = PFS_READ(read, struct pfs_signal, siginfo_t), + .write = PFS_WRITE(write, struct pfs_signal, sigset_t), + .poll = PFS_POLL(poll, struct pfs_signal), + .release= PFS_RELEASE(release, struct pfs_signal), + .rsize = sizeof(siginfo_t), + .wsize = sizeof(sigset_t), +}; + +asmlinkage long sys_plsignal(const sigset_t __user *uset) +{ + long error; + struct pfs_signal *evs; + + evs = kmalloc(sizeof(*evs), GFP_KERNEL); + if (!evs) + return -ENOMEM; + + if (copy_from_user(&evs->set, uset, sizeof(sigset_t))) { + kfree(evs); + return -EFAULT; + } + + spin_lock_init(&evs->lock); + + evs->task = current; + get_task_struct(current); + + sigset_adjust(&evs->set); + + evs->file.data = evs; + evs->file.fops = &signal_ops; + evs->file.wait = &evs->task->sigwait; + + error = pfs_open(&evs->file); + if (error < 0) + release(evs); + + return error; +} Index: linux-2.6/fs/pollfs/Makefile === --- linux-2.6.orig/fs/pollfs/Makefile +++ linux-2.6/fs/pollfs/Makefile @@ -1,2 +1,4 @@ obj-$(CONFIG_POLLFS) += pollfs.o pollfs-y := file.o + +pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o Index: linux-2.6/init/Kconfig === --- linux-2.6.orig/init/Kconfig +++ linux-2.6/init/Kconfig @@ -469,6 +469,13 @@ config POLLFS help Pollfs support +config POLLFS_SIGNAL + bool "Enable pollfs signal" if EMBEDDED + default y + depends on POLLFS + help +Pollable signal support + config SHMEM bool "Use full shmem filesystem" if EMBEDDED default y --
[patch 05/22] pollfs: pollable signal compat code
Compat handlers for the pollable signal operations. Later, the compat operations can operate on a per-call basis. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/signal.c | 85 + 1 file changed, 85 insertions(+) Index: linux-2.6/fs/pollfs/signal.c === --- linux-2.6.orig/fs/pollfs/signal.c +++ linux-2.6/fs/pollfs/signal.c @@ -16,6 +16,7 @@ #include #include #include +#include struct pfs_signal { sigset_t set; @@ -48,6 +49,24 @@ static ssize_t read(struct pfs_signal *e return 0; } +#ifdef CONFIG_COMPAT +static ssize_t compat_read(struct pfs_signal *evs, + struct compat_siginfo __user *infoup) +{ + int signo; + siginfo_t info; + + signo = dequeue_signal_lock(evs->task, &evs->set, &info); + if (!signo) + return -EAGAIN; + + if (copy_siginfo_to_user32(infoup, &info)) + return -EFAULT; + + return 0; +} +#endif + static ssize_t write(struct pfs_signal *evs, const sigset_t __user *uset) { sigset_t set; @@ -65,6 +84,28 @@ static ssize_t write(struct pfs_signal * return 0; } +#ifdef CONFIG_COMPAT +static ssize_t compat_write(struct pfs_signal *evs, + const compat_sigset_t __user *uset) +{ + sigset_t set; + compat_sigset_t cset; + + if (copy_from_user(&cset, uset, sizeof(compat_sigset_t))) + return -EFAULT; + + sigset_from_compat(&set, &cset); + sigset_adjust(&set); + + spin_lock_irq(&evs->lock); + sigemptyset(&evs->set); + sigorsets(&evs->set, &evs->set, &set); + spin_unlock_irq(&evs->lock); + + return 0; +} +#endif + static int poll(struct pfs_signal *evs) { int ret = 0; @@ -142,3 +183,47 @@ asmlinkage long sys_plsignal(const sigse return error; } + +#ifdef CONFIG_COMPAT +static const struct pfs_operations compat_signal_ops = { + /* .read= PFS_READ(compat_read, struct pfs_signal, struct compat_siginfo), */ + .write = PFS_WRITE(compat_write, struct pfs_signal, compat_sigset_t), + .poll = PFS_POLL(poll, struct pfs_signal), + .release= PFS_RELEASE(release, struct pfs_signal), + /* .rsize = sizeof(compat_siginfo_t), */ + .wsize = sizeof(sigset_t) +}; +
+asmlinkage long compat_plsignal(const compat_sigset_t __user *uset) +{ + long error; + compat_sigset_t cset; + struct pfs_signal *evs; + + if (copy_from_user(&cset, uset, sizeof(compat_sigset_t))) + return -EFAULT; + + evs = kmalloc(sizeof(*evs), GFP_KERNEL); + if (!evs) + return -ENOMEM; + + spin_lock_init(&evs->lock); + + evs->task = current; + get_task_struct(current); + + sigset_from_compat(&evs->set, &cset); + sigset_adjust(&evs->set); + + evs->file.data = evs; + evs->file.fops = &compat_signal_ops; + evs->file.wait = &evs->task->sigwait; + + error = pfs_open(&evs->file); + + if (error < 0) + release(evs); + + return error; +} +#endif --
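The compat path exists because a 32-bit process lays out its signal mask as an array of 32-bit words, while the native `sigset_t` on x86_64 uses 64-bit words. A sketch of the packing that `sigset_from_compat()` performs, as I understand the kernel helper of this era: each native word takes the lower-numbered compat word in its low half. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Pack pairs of 32-bit compat words into 64-bit native words,
 * lower-numbered compat word in the low half. */
static void sigwords_from_compat(uint64_t *native, const uint32_t *compat,
				 int native_words)
{
	for (int i = 0; i < native_words; i++)
		native[i] = (uint64_t)compat[2 * i] |
			    ((uint64_t)compat[2 * i + 1] << 32);
}
```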
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, 2007-05-01 at 21:28 -0700, David Rientjes wrote:
> For all supported versions of gcc (major version 3 and above), functions
> and variables may be declared with __attribute__((unused)) to suppress
> warnings if they are declared but unused.

Adding this macro doesn't give us anything that simply saying "__attribute__((unused))" doesn't give. But it does add a layer of kernel-specific indirection.

If we're going to get kernel-specific, I'd prefer to see:

	__needed: suppress warning and don't discard,
	__unneeded: suppress warning and might discard.

For me this fits better with how I think.

Rusty.
[patch 17/22] pollfs: x86_64, wire up the plfutex system call
Make the plfutex syscall available to user-space on x86_64. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/x86_64/ia32/ia32entry.S |1 + include/asm-x86_64/unistd.h |4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86_64/ia32/ia32entry.S === --- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S +++ linux-2.6/arch/x86_64/ia32/ia32entry.S @@ -721,4 +721,5 @@ ia32_sys_call_table: .quad sys_epoll_pwait .quad sys_plsignal /* 320 */ .quad sys_pltimer + .quad sys_plfutex ia32_syscall_end: Index: linux-2.6/include/asm-x86_64/unistd.h === --- linux-2.6.orig/include/asm-x86_64/unistd.h +++ linux-2.6/include/asm-x86_64/unistd.h @@ -623,8 +623,10 @@ __SYSCALL(__NR_move_pages, sys_move_page __SYSCALL(__NR_plsignal, sys_plsignal) #define __NR_pltimer 281 __SYSCALL(__NR_pltimer, sys_pltimer) +#define __NR_plfutex 282 +__SYSCALL(__NR_plfutex, sys_plfutex) -#define __NR_syscall_max __NR_pltimer +#define __NR_syscall_max __NR_plfutex #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR --
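Raising `__NR_syscall_max` matters because, on x86_64 at this time, the native syscall table is built by including `unistd.h` with `__SYSCALL(nr, sym)` expanded to a designated initializer, and the table is sized `__NR_syscall_max + 1`, with slots that have no entry falling back to `sys_ni_syscall()`. A self-contained model of that pattern; the `fake_*` handlers, `SYSCALL_LIST`, and `dispatch()` are hypothetical, and a NULL check stands in for the `sys_ni_syscall` default.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*sys_call_ptr_t)(void);

/* Hypothetical handlers that just report their own number. */
static long fake_plsignal(void) { return 280; }
static long fake_pltimer(void)  { return 281; }
static long fake_plfutex(void)  { return 282; }

/* X-macro list shaped like the __SYSCALL() lines in unistd.h. */
#define SYSCALL_LIST(X)		\
	X(280, fake_plsignal)	\
	X(281, fake_pltimer)	\
	X(282, fake_plfutex)

/* Must track the highest number above, as __NR_syscall_max does. */
#define NR_SYSCALL_MAX 282

#define TABLE_ENTRY(nr, sym) [nr] = sym,
static const sys_call_ptr_t call_table[NR_SYSCALL_MAX + 1] = {
	SYSCALL_LIST(TABLE_ENTRY)
};

static long dispatch(unsigned long nr)
{
	/* Out-of-range or empty slots behave like sys_ni_syscall(). */
	if (nr > NR_SYSCALL_MAX || call_table[nr] == NULL)
		return -ENOSYS;
	return call_table[nr]();
}
```

If `NR_SYSCALL_MAX` were not raised along with the new entry, the initializer for 282 would fall outside the array.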
[patch 13/22] pollfs: asynchronous futex wait
Break apart and export the futex_wait function in order to be able to associate (wait for) a futex with other resources. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/futex.h | 80 ++ kernel/futex.c| 130 ++ 2 files changed, 118 insertions(+), 92 deletions(-) Index: linux-2.6/kernel/futex.c === --- linux-2.6.orig/kernel/futex.c +++ linux-2.6/kernel/futex.c @@ -55,81 +55,6 @@ #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8) /* - * Futexes are matched on equal values of this key. - * The key type depends on whether it's a shared or private mapping. - * Don't rearrange members without looking at hash_futex(). - * - * offset is aligned to a multiple of sizeof(u32) (== 4) by definition. - * We set bit 0 to indicate if it's an inode-based key. - */ -union futex_key { - struct { - unsigned long pgoff; - struct inode *inode; - int offset; - } shared; - struct { - unsigned long address; - struct mm_struct *mm; - int offset; - } private; - struct { - unsigned long word; - void *ptr; - int offset; - } both; -}; - -/* - * Priority Inheritance state: - */ -struct futex_pi_state { - /* -* list of 'owned' pi_state instances - these have to be -* cleaned up in do_exit() if the task exits prematurely: -*/ - struct list_head list; - - /* -* The PI object: -*/ - struct rt_mutex pi_mutex; - - struct task_struct *owner; - atomic_t refcount; - - union futex_key key; -}; - -/* - * We use this hashed waitqueue instead of a normal wait_queue_t, so - * we can wake only the relevant ones (hashed queues may be shared). - * - * A futex_q has a woken state, just like tasks have TASK_RUNNING. - * It is considered woken when list_empty(&q->list) || q->lock_ptr == 0. - * The order of wakup is always to make the first condition true, then - * wake up q->waiters, then make the second condition true. 
- */ -struct futex_q { - struct list_head list; - wait_queue_head_t waiters; - - /* Which hash list lock to use: */ - spinlock_t *lock_ptr; - - /* Key which the futex is hashed on: */ - union futex_key key; - - /* For fd, sigio sent using these: */ - int fd; - struct file *filp; - - /* Optional priority inheritance state: */ - struct futex_pi_state *pi_state; - struct task_struct *task; -}; - -/* * Split the global futex_lock into every hash list lock. */ struct futex_hash_bucket { @@ -904,8 +829,6 @@ queue_lock(struct futex_q *q, int fd, st q->fd = fd; q->filp = filp; - init_waitqueue_head(&q->waiters); - get_key_refs(&q->key); hb = hash_futex(&q->key); q->lock_ptr = &hb->lock; @@ -938,6 +861,7 @@ static void queue_me(struct futex_q *q, { struct futex_hash_bucket *hb; + init_waitqueue_head(&q->waiters); hb = queue_lock(q, fd, filp); __queue_me(q, hb); } @@ -1002,24 +926,22 @@ static void unqueue_me_pi(struct futex_q drop_key_refs(&q->key); } -static int futex_wait(u32 __user *uaddr, u32 val, unsigned long time) +int futex_wait_queue(struct futex_q *q, u32 __user *uaddr, u32 val) { struct task_struct *curr = current; - DECLARE_WAITQUEUE(wait, curr); struct futex_hash_bucket *hb; - struct futex_q q; u32 uval; int ret; - q.pi_state = NULL; + q->pi_state = NULL; retry: down_read(&curr->mm->mmap_sem); - ret = get_futex_key(uaddr, &q.key); + ret = get_futex_key(uaddr, &q->key); if (unlikely(ret != 0)) goto out_release_sem; - hb = queue_lock(&q, -1, NULL); + hb = queue_lock(q, -1, NULL); /* * Access the page AFTER the futex is queued. @@ -1044,7 +966,7 @@ static int futex_wait(u32 __user *uaddr, ret = get_futex_value_locked(&uval, uaddr); if (unlikely(ret)) { - queue_unlock(&q, hb); + queue_unlock(q, hb); /* * If we would have faulted, release mmap_sem, fault it in and @@ -1063,14 +985,37 @@ static int futex_wait(u32 __user *uaddr, goto out_unlock_release_sem; /* Only actually queue if *uaddr contained val. 
*/ - __queue_me(&q, hb); + __queue_me(q, hb); /* * Now the futex is queued and we have checked the data, we -* don't want to hold mmap_sem while we sleep. +* don't want to hold mmap_sem while we (might) sleep. */ up_read(&curr->mm->mmap_sem); + return 0; + + out_unlock_release_sem: + queue_unlock(q, hb); + + out_release_sem: + up_rea
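The rule in `futex_wait_queue()` above ("Only actually queue if *uaddr contained val") is the same one userspace sees from `FUTEX_WAIT`: the kernel takes the hash-bucket lock, reads the word, and refuses to sleep if it no longer holds the expected value. A minimal sketch using the real `futex(2)` system call through `syscall(2)`; the `futex_wait_on()` wrapper name is an assumption, since glibc provides no futex wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Returns 0 after a wakeup, or -1 with errno set: EAGAIN when *uaddr
 * no longer holds 'expected', ETIMEDOUT when 'timeout' expires. */
static long futex_wait_on(uint32_t *uaddr, uint32_t expected,
			  const struct timespec *timeout)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout,
		       NULL, 0);
}
```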
[patch 09/22] pollfs: pollable hrtimers
Per file descriptor high-resolution timers. A classic unix file interface for the POSIX timer_(create|settime|gettime|delete) family of functions. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/Makefile |1 fs/pollfs/timer.c | 198 + init/Kconfig |7 + 3 files changed, 206 insertions(+) Index: linux-2.6/fs/pollfs/timer.c === --- /dev/null +++ linux-2.6/fs/pollfs/timer.c @@ -0,0 +1,198 @@ +/* + * pollable timers + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct pfs_timer { + wait_queue_head_t wait; + ktime_t interval; + spinlock_t lock; + unsigned long overruns; + struct hrtimer timer; + struct pfs_file file; +}; + +struct hrtimerspec { + int flags; + clockid_t clock; + struct itimerspec expr; +}; + +static ssize_t read(struct pfs_timer *evs, struct itimerspec __user *uspec) +{ + int ret = -EAGAIN; + ktime_t remaining = {}; + unsigned long overruns = 0; + struct itimerspec spec = {}; + struct hrtimer *timer = &evs->timer; + + spin_lock_irq(&evs->lock); + + if (!evs->overruns) + goto out_unlock; + + if (hrtimer_active(timer)) + remaining = hrtimer_get_remaining(timer); + else if (evs->interval.tv64 > 0) + overruns = hrtimer_forward(timer, hrtimer_cb_get_time(timer), + evs->interval); + + ret = -EOVERFLOW; + if (overruns > (ULONG_MAX - evs->overruns)) + goto out_unlock; + else + evs->overruns += overruns; + + if (remaining.tv64 > 0) + spec.it_value = ktime_to_timespec(remaining); + + spec.it_interval = ktime_to_timespec(evs->interval); + + ret = 0; + +out_unlock: + spin_unlock_irq(&evs->lock); + + if (ret) + return ret; + + if (copy_to_user(uspec, &spec, sizeof(spec))) + return -EFAULT; + + return 0; +} + +static enum hrtimer_restart timer_fn(struct hrtimer *timer) +{ + struct pfs_timer *evs = container_of(timer, struct pfs_timer, timer); + unsigned long flags; + + 
spin_lock_irqsave(&evs->lock, flags); + /* timer tick, interval has elapsed */ + if (!evs->overruns++) + wake_up_all(&evs->wait); + spin_unlock_irqrestore(&evs->lock, flags); + + return HRTIMER_NORESTART; +} + +static inline void rearm_timer(struct pfs_timer *evs, struct hrtimerspec *spec) +{ + struct hrtimer *timer = &evs->timer; + enum hrtimer_mode mode = HRTIMER_MODE_REL; + + if (spec->flags & TIMER_ABSTIME) + mode = HRTIMER_MODE_ABS; + + do { + spin_lock_irq(&evs->lock); + if (hrtimer_try_to_cancel(timer) >= 0) + break; + spin_unlock_irq(&evs->lock); + cpu_relax(); + } while (1); + + hrtimer_init(timer, spec->clock, mode); + + timer->function = timer_fn; + timer->expires = timespec_to_ktime(spec->expr.it_value); + evs->interval = timespec_to_ktime(spec->expr.it_interval); + + if (timer->expires.tv64) + hrtimer_start(timer, timer->expires, mode); + + spin_unlock_irq(&evs->lock); +} + +static inline int spec_invalid(const struct hrtimerspec *spec) +{ + if (spec->clock != CLOCK_REALTIME && spec->clock != CLOCK_MONOTONIC) + return 1; + + if (!timespec_valid(&spec->expr.it_value) || + !timespec_valid(&spec->expr.it_interval)) + return 1; + + return 0; +} + +static ssize_t write(struct pfs_timer *evs, +const struct hrtimerspec __user *uspec) +{ + struct hrtimerspec spec; + + if (copy_from_user(&spec, uspec, sizeof(spec))) + return -EFAULT; + + if (spec_invalid(&spec)) + return -EINVAL; + + rearm_timer(evs, &spec); + + return 0; +} + +static int poll(struct pfs_timer *evs) +{ + int ret; + + ret = evs->overruns ? 
POLLIN : 0; + + return ret; +} + +static int release(struct pfs_timer *evs) +{ + hrtimer_cancel(&evs->timer); + kfree(evs); + + return 0; +} + +static const struct pfs_operations timer_ops = { + .read = PFS_READ(read, struct pfs_timer, struct itimerspec), + .write = PFS_WRITE(write, struct pfs_timer, struct hrtimerspec), + .poll = PFS_POLL(poll, struct pfs_timer), + .release = PFS_RELEASE(release, struct pfs_timer), + .rsize = sizeof(struct itimerspec), + .wsize = sizeof(struct hrtimerspec), +}; + +asmlinkage long sys_pltimer(void) +{ + long error; + struct pf
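The read() path above calls hrtimer_forward() to count how many periods of an interval timer elapsed while nobody was reading. Below is a minimal user-space model of that arithmetic. It assumes plain int64_t nanosecond timestamps and ignores ktime_t and locking. model_forward() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the overrun arithmetic done by
 * hrtimer_forward(): advance *expires by whole intervals until it lies
 * strictly after now, and return how many intervals were skipped.
 */
static uint64_t model_forward(int64_t *expires, int64_t now, int64_t interval)
{
	uint64_t orun;

	assert(interval > 0);

	if (now < *expires)
		return 0;	/* not expired yet: no overruns */

	/* the expiry at *expires itself, plus each further full interval up to now */
	orun = (uint64_t)((now - *expires) / interval) + 1;
	*expires += (int64_t)orun * interval;

	return orun;
}
```

Take a timer that expired at t=100 with a 10 ns interval. Reading it at t=125 yields 3 overruns and moves the expiry to 130. The patch adds a count like this to evs->overruns and returns -EOVERFLOW instead of letting the unsigned long wrap.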
[patch 08/22] pollfs: x86_64, wire up the plsignal system call
Make the plsignal syscall available to user-space on x86_64. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/x86_64/ia32/ia32entry.S |3 ++- include/asm-x86_64/unistd.h |4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) Index: linux-2.6/include/asm-x86_64/unistd.h === --- linux-2.6.orig/include/asm-x86_64/unistd.h +++ linux-2.6/include/asm-x86_64/unistd.h @@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync __SYSCALL(__NR_vmsplice, sys_vmsplice) #define __NR_move_pages 279 __SYSCALL(__NR_move_pages, sys_move_pages) +#define __NR_plsignal 280 +__SYSCALL(__NR_plsignal, sys_plsignal) -#define __NR_syscall_max __NR_move_pages +#define __NR_syscall_max __NR_plsignal #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR Index: linux-2.6/arch/x86_64/ia32/ia32entry.S === --- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S +++ linux-2.6/arch/x86_64/ia32/ia32entry.S @@ -714,9 +714,10 @@ ia32_sys_call_table: .quad compat_sys_get_robust_list .quad sys_splice .quad sys_sync_file_range - .quad sys_tee + .quad sys_tee /* 315 */ .quad compat_sys_vmsplice .quad compat_sys_move_pages .quad sys_getcpu .quad sys_epoll_pwait + .quad sys_plsignal /* 320 */ ia32_syscall_end:
[patch 03/22] pollfs: asynchronously wait for a signal
Add a wait queue to the task_struct in order to be able to associate (wait for) a signal with other resources. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/init_task.h |1 + include/linux/sched.h |1 + kernel/fork.c |1 + kernel/signal.c |5 + 4 files changed, 8 insertions(+) Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h +++ linux-2.6/include/linux/sched.h @@ -939,6 +939,7 @@ struct task_struct { sigset_t blocked, real_blocked; sigset_t saved_sigmask; /* To be restored with TIF_RESTORE_SIGMASK */ struct sigpending pending; + wait_queue_head_t sigwait; unsigned long sas_ss_sp; size_t sas_ss_size; Index: linux-2.6/include/linux/init_task.h === --- linux-2.6.orig/include/linux/init_task.h +++ linux-2.6/include/linux/init_task.h @@ -134,6 +134,7 @@ extern struct group_info init_groups; .list = LIST_HEAD_INIT(tsk.pending.list), \ .signal = {{0}}}, \ .blocked= {{0}},\ + .sigwait= __WAIT_QUEUE_HEAD_INITIALIZER(tsk.sigwait), \ .alloc_lock = __SPIN_LOCK_UNLOCKED(tsk.alloc_lock), \ .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ Index: linux-2.6/kernel/fork.c === --- linux-2.6.orig/kernel/fork.c +++ linux-2.6/kernel/fork.c @@ -1034,6 +1034,7 @@ static struct task_struct *copy_process( clear_tsk_thread_flag(p, TIF_SIGPENDING); init_sigpending(&p->pending); + init_waitqueue_head(&p->sigwait); p->utime = cputime_zero; p->stime = cputime_zero; Index: linux-2.6/kernel/signal.c === --- linux-2.6.orig/kernel/signal.c +++ linux-2.6/kernel/signal.c @@ -224,6 +224,8 @@ fastcall void recalc_sigpending_tsk(stru set_tsk_thread_flag(t, TIF_SIGPENDING); else clear_tsk_thread_flag(t, TIF_SIGPENDING); + + wake_up_interruptible_sync(&t->sigwait); } void recalc_sigpending(void) @@ -759,6 +761,7 @@ static int send_signal(int sig, struct s info->si_code >= 0))); if (q) { list_add_tail(&q->list, &signals->list); + wake_up_interruptible_sync(&t->sigwait); switch ((unsigned long) info) { case (unsigned long) 
SEND_SIG_NOINFO: q->info.si_signo = sig; @@ -1404,6 +1407,7 @@ int send_sigqueue(int sig, struct sigque list_add_tail(&q->list, &p->pending.list); sigaddset(&p->pending.signal, sig); + wake_up_interruptible_sync(&p->sigwait); if (!sigismember(&p->blocked, sig)) signal_wake_up(p, sig == SIGKILL); @@ -1453,6 +1457,7 @@ send_group_sigqueue(int sig, struct sigq list_add_tail(&q->list, &p->signal->shared_pending.list); sigaddset(&p->signal->shared_pending.signal, sig); + wake_up_interruptible_sync(&p->sigwait); __group_complete_signal(sig, p); out: spin_unlock_irqrestore(&p->sighand->siglock, flags);
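The hunks above pair every enqueue of a signal with a wake-up on the new per-task sigwait queue. Below is a single-threaded model of that pattern. It assumes a wait queue can be reduced to a list of wake callbacks. Real kernel wait queues also handle locking, sleeping and exclusive waiters. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 8

/* Hypothetical single-threaded wait queue: a list of wake callbacks. */
struct model_waitq {
	void (*wake[MODEL_MAX_WAITERS])(void *ctx);
	void *ctx[MODEL_MAX_WAITERS];
	size_t n;
};

/* Rough analogue of poll_wait(): register interest in the queue. */
static void model_add_waiter(struct model_waitq *q, void (*fn)(void *), void *ctx)
{
	assert(q->n < MODEL_MAX_WAITERS);
	q->wake[q->n] = fn;
	q->ctx[q->n] = ctx;
	q->n++;
}

/* Rough analogue of wake_up_interruptible_sync(): notify every waiter. */
static void model_wake_up(struct model_waitq *q)
{
	for (size_t i = 0; i < q->n; i++)
		q->wake[i](q->ctx[i]);
}

struct model_task {
	unsigned long pending;		/* bit n set: signal n is pending */
	struct model_waitq sigwait;	/* plays the role of task->sigwait */
};

/* Sender side, mirroring the send_signal() hunk: queue, then wake. */
static void model_send_signal(struct model_task *t, int sig)
{
	t->pending |= 1UL << sig;
	model_wake_up(&t->sigwait);
}

/* Example waiter: an event loop marking a descriptor readable. */
static void model_mark_ready(void *ctx)
{
	*(int *)ctx = 1;
}
```

A pollable signal descriptor would presumably register on task->sigwait from its poll method. That way, an epoll set is notified as soon as a signal is queued for the task.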
[patch 21/22] pollfs: x86, wire up the plaio system call
Make the plaio syscall available to user-space on x86. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/i386/kernel/syscall_table.S |1 + include/asm-i386/unistd.h |3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/include/asm-i386/unistd.h === --- linux-2.6.orig/include/asm-i386/unistd.h +++ linux-2.6/include/asm-i386/unistd.h @@ -328,10 +328,11 @@ #define __NR_plsignal 320 #define __NR_pltimer 321 #define __NR_plfutex 322 +#define __NR_plaio 323 #ifdef __KERNEL__ -#define NR_syscalls 323 +#define NR_syscalls 324 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6/arch/i386/kernel/syscall_table.S === --- linux-2.6.orig/arch/i386/kernel/syscall_table.S +++ linux-2.6/arch/i386/kernel/syscall_table.S @@ -322,3 +322,4 @@ ENTRY(sys_call_table) .long sys_plsignal /* 320 */ .long sys_pltimer .long sys_plfutex + .long sys_plaio
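Each wire-up patch has to keep two files in step. The __NR_* number in unistd.h must equal the entry's position in sys_call_table, and NR_syscalls must be one past the highest number. Below is a small model of the i386 tail of the table as of this patch, using hypothetical MODEL_* names. The _Static_assert catches, at compile time, a table that has fallen out of step with the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Numbers as they appear in the i386 unistd.h hunks of this series. */
#define MODEL_NR_epoll_pwait	319
#define MODEL_NR_plsignal	320
#define MODEL_NR_pltimer	321
#define MODEL_NR_plfutex	322
#define MODEL_NR_plaio		323
#define MODEL_NR_syscalls	324	/* one past the highest number */

/* Hypothetical slice of sys_call_table: entry i handles syscall MODEL_FIRST + i. */
#define MODEL_FIRST MODEL_NR_epoll_pwait
static const char *const model_table_tail[] = {
	"sys_epoll_pwait",	/* 319 */
	"sys_plsignal",		/* 320 */
	"sys_pltimer",		/* 321 */
	"sys_plfutex",		/* 322 */
	"sys_plaio",		/* 323 */
};

/* Fails to compile if a .long entry is added without bumping the count. */
_Static_assert(MODEL_FIRST + sizeof(model_table_tail) / sizeof(model_table_tail[0])
	       == MODEL_NR_syscalls, "syscall table and NR_syscalls out of step");

static const char *model_lookup(int nr)
{
	if (nr < MODEL_FIRST || nr >= MODEL_NR_syscalls)
		return NULL;	/* the kernel answers -ENOSYS for out-of-range numbers */
	return model_table_tail[nr - MODEL_FIRST];
}
```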
[patch 19/22] pollfs: pollable aio
Submit, retrieve, or poll aio requests for completion through a file descriptor. The user supplies an aio_context_t that is used to fetch a reference to the kioctx. Once the file descriptor is closed, the reference is decremented. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/Makefile |1 fs/pollfs/aio.c | 103 + init/Kconfig |7 +++ 3 files changed, 111 insertions(+) Index: linux-2.6/fs/pollfs/Makefile === --- linux-2.6.orig/fs/pollfs/Makefile +++ linux-2.6/fs/pollfs/Makefile @@ -4,3 +4,4 @@ pollfs-y := file.o pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o pollfs-$(CONFIG_POLLFS_TIMER) += timer.o pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o +pollfs-$(CONFIG_POLLFS_AIO) += aio.o Index: linux-2.6/fs/pollfs/aio.c === --- /dev/null +++ linux-2.6/fs/pollfs/aio.c @@ -0,0 +1,103 @@ +/* + * pollable aio + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct pfs_aio { + struct kioctx *ioctx; + struct pfs_file file; +}; + +static ssize_t read(struct pfs_aio *evs, struct io_event __user *uioevt) +{ + int ret; + + ret = sys_io_getevents(evs->ioctx->user_id, 0, 1, uioevt, NULL); + + if (!ret) + ret = -EAGAIN; + else if (ret > 0) + ret = 0; + + return ret; +} + +static ssize_t write(struct pfs_aio *evs, const struct iocb __user *uiocb) +{ + struct iocb iocb; + + if (copy_from_user(&iocb, uiocb, sizeof(iocb))) + return -EFAULT; + + return io_submit_one(evs->ioctx, uiocb, &iocb); +} + +static int poll(struct pfs_aio *evs) +{ + int ret; + + ret = aio_ring_empty(evs->ioctx) ?
0 : POLLIN; + + return ret; +} + +static int release(struct pfs_aio *evs) +{ + put_ioctx(evs->ioctx); + + kfree(evs); + + return 0; +} + +static const struct pfs_operations aio_ops = { + .read = PFS_READ(read, struct pfs_aio, struct io_event), + .write = PFS_WRITE(write, struct pfs_aio, struct iocb), + .poll = PFS_POLL(poll, struct pfs_aio), + .release = PFS_RELEASE(release, struct pfs_aio), + .rsize = sizeof(struct io_event), + .wsize = sizeof(struct iocb), +}; + +asmlinkage long sys_plaio(aio_context_t ctx) +{ + long error; + struct pfs_aio *evs; + struct kioctx *ioctx = lookup_ioctx(ctx); + + if (!ioctx) + return -EINVAL; + + evs = kzalloc(sizeof(*evs), GFP_KERNEL); + if (!evs) { + put_ioctx(ioctx); + return -ENOMEM; + } + + evs->ioctx = ioctx; + + evs->file.data = evs; + evs->file.fops = &aio_ops; + evs->file.wait = &ioctx->wait; + + error = pfs_open(&evs->file); + + if (error < 0) + release(evs); + + return error; +} Index: linux-2.6/init/Kconfig === --- linux-2.6.orig/init/Kconfig +++ linux-2.6/init/Kconfig @@ -490,6 +490,13 @@ config POLLFS_FUTEX help Pollable futex support +config POLLFS_AIO + bool "Enable pollfs aio" if EMBEDDED + default y + depends on POLLFS + help +Pollable aio support + config SHMEM bool "Use full shmem filesystem" if EMBEDDED default y
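read() above asks sys_io_getevents() for at most one event with min_nr = 0. That should let the call return without sleeping. read() then folds the result into a read-style status. Below, that mapping is isolated as a hypothetical helper; model_aio_read_status() is not part of the patch.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper mirroring read() in fs/pollfs/aio.c:
 * no completed event means "try again", one event means success,
 * and a negative errno from io_getevents is passed through.
 */
static long model_aio_read_status(long nr_events)
{
	if (nr_events == 0)
		return -EAGAIN;	/* ring empty; poll() reports 0, not POLLIN */
	if (nr_events > 0)
		return 0;	/* one io_event was copied to the user buffer */
	return nr_events;	/* error, e.g. -EFAULT */
}
```

poll() reports POLLIN only when aio_ring_empty() is false. Together with this mapping, that gives the usual nonblocking contract: wait for readiness, then read until -EAGAIN.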
[patch 10/22] pollfs: export the pltimer system call
Export the new pltimer syscall prototype. While there, make it "conditional". Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/syscalls.h |2 ++ kernel/sys_ni.c |1 + 2 files changed, 3 insertions(+) Index: linux-2.6/include/linux/syscalls.h === --- linux-2.6.orig/include/linux/syscalls.h +++ linux-2.6/include/linux/syscalls.h @@ -607,4 +607,6 @@ int kernel_execve(const char *filename, asmlinkage long sys_plsignal(const sigset_t __user * set); +asmlinkage long sys_pltimer(void); + #endif Index: linux-2.6/kernel/sys_ni.c === --- linux-2.6.orig/kernel/sys_ni.c +++ linux-2.6/kernel/sys_ni.c @@ -113,6 +113,7 @@ cond_syscall(sys_vm86); cond_syscall(compat_sys_ipc); cond_syscall(compat_sys_sysctl); cond_syscall(sys_plsignal); +cond_syscall(sys_pltimer); /* arch-specific weak syscall entries */ cond_syscall(sys_pciconfig_read);
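cond_syscall() makes the new entry point a weak alias of sys_ni_syscall. Kernels built without CONFIG_POLLFS_TIMER therefore still link, and the call fails with -ENOSYS. In kernels of this era the alias is created with inline assembler directives. The sketch below reproduces the effect with GCC/Clang attributes instead. It assumes an ELF target and uses hypothetical model_* names.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the kernel's fallback for unconfigured system calls. */
long model_sys_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Weak alias: calls resolve to model_sys_ni_syscall() unless some other
 * object file provides a strong model_sys_pltimer() at link time.
 */
long model_sys_pltimer(void) __attribute__((weak, alias("model_sys_ni_syscall")));
```

Linking in a strong definition of model_sys_pltimer, the way timer.o does for the real symbol, would override the alias without any change at the call site.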
Re: SOME STUFF ABOUT REISER4 To Mr Hopper
From: [EMAIL PROTECTED] Date: Tue, 01 May 2007 21:55:59 -0700 > Hi Jeff, it seems that lkml has contacted both of my email accounts and > crippled them. Actually we aren't blocking your email address, rather we are blocking emails with lots of caps in them because that is what small children use when they first start using a computer. So if you stop using caps lock so much, your postings might start going through.
[patch 07/22] pollfs: x86, wire up the plsignal system call
Make the plsignal syscall available to user-space on x86. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/i386/kernel/syscall_table.S |1 + include/asm-i386/unistd.h |3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/include/asm-i386/unistd.h === --- linux-2.6.orig/include/asm-i386/unistd.h +++ linux-2.6/include/asm-i386/unistd.h @@ -325,10 +325,11 @@ #define __NR_move_pages 317 #define __NR_getcpu 318 #define __NR_epoll_pwait 319 +#define __NR_plsignal 320 #ifdef __KERNEL__ -#define NR_syscalls 320 +#define NR_syscalls 321 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6/arch/i386/kernel/syscall_table.S === --- linux-2.6.orig/arch/i386/kernel/syscall_table.S +++ linux-2.6/arch/i386/kernel/syscall_table.S @@ -319,3 +319,4 @@ ENTRY(sys_call_table) .long sys_move_pages .long sys_getcpu .long sys_epoll_pwait + .long sys_plsignal /* 320 */
[patch 16/22] pollfs: x86, wire up the plfutex system call
Make the plfutex syscall available to user-space on x86. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/i386/kernel/syscall_table.S |1 + include/asm-i386/unistd.h |3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/include/asm-i386/unistd.h === --- linux-2.6.orig/include/asm-i386/unistd.h +++ linux-2.6/include/asm-i386/unistd.h @@ -327,10 +327,11 @@ #define __NR_epoll_pwait 319 #define __NR_plsignal 320 #define __NR_pltimer 321 +#define __NR_plfutex 322 #ifdef __KERNEL__ -#define NR_syscalls 322 +#define NR_syscalls 323 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6/arch/i386/kernel/syscall_table.S === --- linux-2.6.orig/arch/i386/kernel/syscall_table.S +++ linux-2.6/arch/i386/kernel/syscall_table.S @@ -321,3 +321,4 @@ ENTRY(sys_call_table) .long sys_epoll_pwait .long sys_plsignal /* 320 */ .long sys_pltimer + .long sys_plfutex
[patch 18/22] pollfs: check if an AIO event ring is empty
The aio_ring_empty() function returns true if the AIO event ring has no elements, false otherwise. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/aio.c | 17 + include/linux/aio.h |1 + 2 files changed, 18 insertions(+) Index: linux-2.6/fs/aio.c === --- linux-2.6.orig/fs/aio.c +++ linux-2.6/fs/aio.c @@ -1004,6 +1004,23 @@ put_rq: return ret; } +int fastcall aio_ring_empty(struct kioctx *ioctx) +{ + struct aio_ring_info *info = &ioctx->ring_info; + struct aio_ring *ring; + unsigned long flags; + int ret = 0; + + spin_lock_irqsave(&ioctx->ctx_lock, flags); + ring = kmap_atomic(info->ring_pages[0], KM_IRQ1); + if (ring->head == ring->tail) + ret = 1; + kunmap_atomic(ring, KM_IRQ1); + spin_unlock_irqrestore(&ioctx->ctx_lock, flags); + + return ret; +} + /* aio_read_evt * Pull an event off of the ioctx's event ring. Returns the number of * events fetched (0 or 1 ;-) Index: linux-2.6/include/linux/aio.h === --- linux-2.6.orig/include/linux/aio.h +++ linux-2.6/include/linux/aio.h @@ -202,6 +202,7 @@ extern unsigned aio_max_size; extern ssize_t FASTCALL(wait_on_sync_kiocb(struct kiocb *iocb)); extern int FASTCALL(aio_put_req(struct kiocb *iocb)); +extern int FASTCALL(aio_ring_empty(struct kioctx *ioctx)); extern void FASTCALL(kick_iocb(struct kiocb *iocb)); extern int FASTCALL(aio_complete(struct kiocb *iocb, long res, long res2)); extern void FASTCALL(__put_ioctx(struct kioctx *ctx));
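aio_ring_empty() relies on the usual ring-buffer convention. The producer advances tail, the consumer advances head, and the ring is empty exactly when the two are equal. Below is a self-contained model of that convention, with a hypothetical capacity of 4 and hypothetical model_* names. It keeps one slot unused so that a full ring can be told apart from an empty one, as the AIO ring's free-slot calculation does.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RING_NR 4	/* hypothetical capacity; one slot stays unused */

/* Hypothetical model of the head/tail indices kept in struct aio_ring. */
struct model_ring {
	unsigned head;	/* next slot the consumer reads */
	unsigned tail;	/* next slot the producer writes */
	long events[MODEL_RING_NR];
};

/* Same test as aio_ring_empty(): empty when head == tail. */
static bool model_ring_empty(const struct model_ring *r)
{
	return r->head == r->tail;
}

static bool model_ring_full(const struct model_ring *r)
{
	return (r->tail + 1) % MODEL_RING_NR == r->head;
}

static bool model_ring_push(struct model_ring *r, long ev)
{
	if (model_ring_full(r))
		return false;
	r->events[r->tail] = ev;
	r->tail = (r->tail + 1) % MODEL_RING_NR;
	return true;
}

static bool model_ring_pop(struct model_ring *r, long *ev)
{
	if (model_ring_empty(r))
		return false;
	*ev = r->events[r->head];
	r->head = (r->head + 1) % MODEL_RING_NR;
	return true;
}
```

Before comparing the indices, the real function takes ctx_lock and maps the first ring page with kmap_atomic(). This is needed because the ring lives in pages that are also mapped into user space.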
[patch 11/22] pollfs: x86, wire up the pltimer system call
Make the pltimer syscall available to user-space on x86. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/i386/kernel/syscall_table.S |1 + include/asm-i386/unistd.h |3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/include/asm-i386/unistd.h === --- linux-2.6.orig/include/asm-i386/unistd.h +++ linux-2.6/include/asm-i386/unistd.h @@ -326,10 +326,11 @@ #define __NR_getcpu 318 #define __NR_epoll_pwait 319 #define __NR_plsignal 320 +#define __NR_pltimer 321 #ifdef __KERNEL__ -#define NR_syscalls 321 +#define NR_syscalls 322 #define __ARCH_WANT_IPC_PARSE_VERSION #define __ARCH_WANT_OLD_READDIR Index: linux-2.6/arch/i386/kernel/syscall_table.S === --- linux-2.6.orig/arch/i386/kernel/syscall_table.S +++ linux-2.6/arch/i386/kernel/syscall_table.S @@ -320,3 +320,4 @@ ENTRY(sys_call_table) .long sys_getcpu .long sys_epoll_pwait .long sys_plsignal /* 320 */ + .long sys_pltimer
[patch 14/22] pollfs: pollable futex
Asynchronously wait for a FUTEX_WAKE operation on a futex, provided it still contains a given value. There can be only one futex wait per file descriptor. However, it can be rearmed (possibly at a different address) at any time. The pollable futex approach is far superior (send and receive events from userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same time. Building block for pollable semaphores and user-defined events. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- fs/pollfs/Makefile |1 fs/pollfs/futex.c | 154 + init/Kconfig |7 ++ 3 files changed, 162 insertions(+) Index: linux-2.6/fs/pollfs/Makefile === --- linux-2.6.orig/fs/pollfs/Makefile +++ linux-2.6/fs/pollfs/Makefile @@ -3,3 +3,4 @@ pollfs-y := file.o pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o pollfs-$(CONFIG_POLLFS_TIMER) += timer.o +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o Index: linux-2.6/fs/pollfs/futex.c === --- /dev/null +++ linux-2.6/fs/pollfs/futex.c @@ -0,0 +1,154 @@ +/* + * pollable futex + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + * Licensed under the GNU GPL. See the file COPYING for details.
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct futex_event { + union { + void __user *addr; + u64 padding; + }; + int val; +}; + +struct pfs_futex { + struct futex_q q; + struct futex_event fevt; + struct mutex mutex; + unsigned volatile queued; + struct pfs_file file; +}; + +static ssize_t read(struct pfs_futex *evs, struct futex_event __user *ufevt) +{ + int ret; + struct futex_event fevt; + + mutex_lock(&evs->mutex); + + fevt = evs->fevt; + + ret = -EAGAIN; + + if (!evs->queued) + ret = -EINVAL; + else if (list_empty(&evs->q.list)) + ret = futex_wait_unqueue(&evs->q); + + switch (ret) { + case 1: + ret = -EAGAIN; + case 0: + evs->queued = 0; + } + + mutex_unlock(&evs->mutex); + + if (ret < 0) + return ret; + + if (copy_to_user(ufevt, &fevt, sizeof(fevt))) + return -EFAULT; + + return 0; +} + +static ssize_t write(struct pfs_futex *evs, +const struct futex_event __user *ufevt) +{ + int ret; + struct futex_event fevt; + + if (copy_from_user(&fevt, ufevt, sizeof(fevt))) + return -EFAULT; + + mutex_lock(&evs->mutex); + + if (evs->queued) + futex_wait_unqueue(&evs->q); + + ret = futex_wait_queue(&evs->q, fevt.addr, fevt.val); + + if (ret) + evs->queued = 0; + else { + evs->queued = 1; + evs->fevt = fevt; + } + + mutex_unlock(&evs->mutex); + + return ret; +} + +static int poll(struct pfs_futex *evs) +{ + int ret; + + while (!mutex_trylock(&evs->mutex)) + cpu_relax(); + + ret = evs->queued && list_empty(&evs->q.list) ? 
POLLIN : 0; + + mutex_unlock(&evs->mutex); + + return ret; +} + +static int release(struct pfs_futex *evs) +{ + if (evs->queued) + futex_wait_unqueue(&evs->q); + + mutex_destroy(&evs->mutex); + + kfree(evs); + + return 0; +} + +static const struct pfs_operations futex_ops = { + .read = PFS_READ(read, struct pfs_futex, struct futex_event), + .write = PFS_WRITE(write, struct pfs_futex, struct futex_event), + .poll = PFS_POLL(poll, struct pfs_futex), + .release = PFS_RELEASE(release, struct pfs_futex), + .rsize = sizeof(struct futex_event), + .wsize = sizeof(struct futex_event), +}; + +asmlinkage long sys_plfutex(void) +{ + long error; + struct pfs_futex *evs; + + evs = kzalloc(sizeof(*evs), GFP_KERNEL); + if (!evs) + return -ENOMEM; + + mutex_init(&evs->mutex); + init_waitqueue_head(&evs->q.waiters); + + evs->file.data = evs; + evs->file.fops = &futex_ops; + evs->file.wait = &evs->q.waiters; + + error = pfs_open(&evs->file); + + if (error < 0) + release(evs); + + return error; +} Index: linux-2.6/init/Kconfig === --- linux-2.6.orig/init/Kconfig +++ linux-2.6/init/Kconfig @@ -483,6 +483,13 @@ config POLLFS_TIMER help Pollable timer support +config POLLFS_FUTEX + bool "Enable pollfs futex" if EMBEDDED + default y + depends on POLLFS && FUTEX + help +Pollable futex support + config SHMEM bool "Use full shmem filesystem" if EMBEDDED default y
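The futex descriptor goes through three steps. write() arms a wait only if the futex word still holds the expected value. poll() reports POLLIN once a FUTEX_WAKE has dequeued the waiter. read() then consumes that wake-up and disarms the descriptor. Below is a single-threaded model of that state machine. It assumes the arm step fails with -EWOULDBLOCK on a value mismatch, as FUTEX_WAIT does. The model_* names are hypothetical, and no real futex is involved.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pollable futex descriptor. */
struct model_plfutex {
	const int *addr;	/* futex word being watched */
	int val;		/* value it had to hold when armed */
	bool queued;		/* like evs->queued */
	bool woken;		/* stands in for list_empty(&evs->q.list) */
};

/* write(): drop any previous wait, then arm only if *addr still holds val. */
static int model_plfutex_write(struct model_plfutex *f, const int *addr, int val)
{
	f->queued = false;
	f->woken = false;
	if (*addr != val)
		return -EWOULDBLOCK;
	f->addr = addr;
	f->val = val;
	f->queued = true;
	return 0;
}

/* FUTEX_WAKE on the same address: mark the waiter as woken. */
static void model_futex_wake(struct model_plfutex *f, const int *addr)
{
	if (f->queued && f->addr == addr)
		f->woken = true;
}

/* poll(): readable once the armed wait has been woken. */
static int model_plfutex_poll(const struct model_plfutex *f)
{
	return f->queued && f->woken ? POLLIN : 0;
}

/* read(): -EINVAL if never armed, -EAGAIN if not woken, else disarm. */
static int model_plfutex_read(struct model_plfutex *f)
{
	if (!f->queued)
		return -EINVAL;
	if (!f->woken)
		return -EAGAIN;
	f->queued = false;
	return 0;
}
```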
[patch 15/22] pollfs: export the plfutex system call
Export the new plfutex syscall prototype. While there, make it "conditional". Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/syscalls.h |2 ++ kernel/sys_ni.c |1 + 2 files changed, 3 insertions(+) Index: linux-2.6/include/linux/syscalls.h === --- linux-2.6.orig/include/linux/syscalls.h +++ linux-2.6/include/linux/syscalls.h @@ -609,4 +609,6 @@ asmlinkage long sys_plsignal(const sigse asmlinkage long sys_pltimer(void); +asmlinkage long sys_plfutex(void); + #endif Index: linux-2.6/kernel/sys_ni.c === --- linux-2.6.orig/kernel/sys_ni.c +++ linux-2.6/kernel/sys_ni.c @@ -114,6 +114,7 @@ cond_syscall(compat_sys_ipc); cond_syscall(compat_sys_sysctl); cond_syscall(sys_plsignal); cond_syscall(sys_pltimer); +cond_syscall(sys_plfutex); /* arch-specific weak syscall entries */ cond_syscall(sys_pciconfig_read);
[patch 22/22] pollfs: x86_64, wire up the plaio system call
Make the plaio syscall available to user-space on x86_64. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/x86_64/ia32/ia32entry.S |1 + include/asm-x86_64/unistd.h |4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86_64/ia32/ia32entry.S === --- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S +++ linux-2.6/arch/x86_64/ia32/ia32entry.S @@ -722,4 +722,5 @@ ia32_sys_call_table: .quad sys_plsignal /* 320 */ .quad sys_pltimer .quad sys_plfutex + .quad sys_plaio ia32_syscall_end: Index: linux-2.6/include/asm-x86_64/unistd.h === --- linux-2.6.orig/include/asm-x86_64/unistd.h +++ linux-2.6/include/asm-x86_64/unistd.h @@ -625,8 +625,10 @@ __SYSCALL(__NR_plsignal, sys_plsignal) __SYSCALL(__NR_pltimer, sys_pltimer) #define __NR_plfutex 282 __SYSCALL(__NR_plfutex, sys_plfutex) +#define __NR_plaio 283 +__SYSCALL(__NR_plaio, sys_plaio) -#define __NR_syscall_max __NR_plfutex +#define __NR_syscall_max __NR_plaio #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR
[patch 20/22] pollfs: export the plaio system call
Export the new plaio syscall prototype. While there, make it "conditional". Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/syscalls.h |2 ++ kernel/sys_ni.c |1 + 2 files changed, 3 insertions(+) Index: linux-2.6/include/linux/syscalls.h === --- linux-2.6.orig/include/linux/syscalls.h +++ linux-2.6/include/linux/syscalls.h @@ -611,4 +611,6 @@ asmlinkage long sys_pltimer(void); asmlinkage long sys_plfutex(void); +asmlinkage long sys_plaio(aio_context_t ctx); + #endif Index: linux-2.6/kernel/sys_ni.c === --- linux-2.6.orig/kernel/sys_ni.c +++ linux-2.6/kernel/sys_ni.c @@ -115,6 +115,7 @@ cond_syscall(compat_sys_sysctl); cond_syscall(sys_plsignal); cond_syscall(sys_pltimer); cond_syscall(sys_plfutex); +cond_syscall(sys_plaio); /* arch-specific weak syscall entries */ cond_syscall(sys_pciconfig_read);
[patch 12/22] pollfs: x86_64, wire up the pltimer system call
Make the pltimer syscall available to user-space on x86_64. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- arch/x86_64/ia32/ia32entry.S |1 + include/asm-x86_64/unistd.h |4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) Index: linux-2.6/arch/x86_64/ia32/ia32entry.S === --- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S +++ linux-2.6/arch/x86_64/ia32/ia32entry.S @@ -720,4 +720,5 @@ ia32_sys_call_table: .quad sys_getcpu .quad sys_epoll_pwait .quad sys_plsignal /* 320 */ + .quad sys_pltimer ia32_syscall_end: Index: linux-2.6/include/asm-x86_64/unistd.h === --- linux-2.6.orig/include/asm-x86_64/unistd.h +++ linux-2.6/include/asm-x86_64/unistd.h @@ -621,8 +621,10 @@ __SYSCALL(__NR_vmsplice, sys_vmsplice) __SYSCALL(__NR_move_pages, sys_move_pages) #define __NR_plsignal 280 __SYSCALL(__NR_plsignal, sys_plsignal) +#define __NR_pltimer 281 +__SYSCALL(__NR_pltimer, sys_pltimer) -#define __NR_syscall_max __NR_plsignal +#define __NR_syscall_max __NR_pltimer #ifndef __NO_STUBS #define __ARCH_WANT_OLD_READDIR
[patch 01/22] pollfs: kernel-side API header
Add the pollfs_fs.h header, which contains the kernel-side declarations and auxiliary macros for type safety checks. Those macros can be simplified later. Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]> --- include/linux/pollfs_fs.h | 57 ++ 1 file changed, 57 insertions(+) Index: linux-2.6/include/linux/pollfs_fs.h === --- /dev/null +++ linux-2.6/include/linux/pollfs_fs.h @@ -0,0 +1,57 @@ +/* + * pollfs, a naive filesystem for pollable (waitable) files (objects) + * + * Copyright (C) 2007 Davi E. M. Arnaut + * + */ + +#ifndef _LINUX_POLL_FS_H +#define _LINUX_POLL_FS_H + +#ifdef __KERNEL__ + +#include +#include +#include + +#define PFS_CHECK_CALLBACK_1(f, a) (void*) \ + (sizeof((f)((typeof(a *))0))) + +#define PFS_CHECK_CALLBACK_2(f, a, b) (void*) \ + (sizeof((f)((typeof(a *))0, (typeof(b*))0))) + +#define PFS_WRITE(func, type, utype) \ + (ssize_t (*)(void *, const void __user *)) \ + (0 ? PFS_CHECK_CALLBACK_2(func, type, utype) : func) + +#define PFS_READ(func, type, utype)\ + (ssize_t (*)(void *, void __user *))\ + (0 ? PFS_CHECK_CALLBACK_2(func, type, utype) : func) + +#define PFS_POLL(func, type) \ + (int (*)(void *))(0 ? PFS_CHECK_CALLBACK_1(func, type) : func) + +#define PFS_RELEASE(func, type) \ + (int (*)(void *))(0 ? PFS_CHECK_CALLBACK_1(func, type) : func) + +struct pfs_operations { + ssize_t (*read)(void *, void __user *); + ssize_t (*write)(void *, const void __user *); + int (*mmap)(void *, struct vm_area_struct *); + int (*poll)(void *); + int (*release)(void *); + size_t rsize; + size_t wsize; +}; + +struct pfs_file { + void *data; + wait_queue_head_t *wait; + const struct pfs_operations *fops; +}; + +long pfs_open(struct pfs_file *pfs); + +#endif /* __KERNEL __ */ + +#endif /* _LINUX_POLLFS_FS_H */
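The PFS_* macros rely on a "0 ? check : func" expression. The unevaluated sizeof() in the check arm makes the compiler type-check the callback before it is cast to the generic pointer type. The patch notes that the macros can be simplified. Below is one possible alternative, not the author's: generate a small wrapper with the generic signature. The argument types are then checked at the call inside the wrapper, and no call goes through a cast function pointer. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Generic, type-erased operations table, shaped like struct pfs_operations. */
struct model_ops {
	ssize_t (*read)(void *obj, void *ubuf);
	size_t rsize;
};

/*
 * Hypothetical replacement for PFS_READ(): define a wrapper whose signature
 * already matches the table. A callback with the wrong argument types
 * triggers an incompatible-pointer diagnostic at the inner call.
 */
#define MODEL_DEFINE_READ(name, func, type, utype)		\
	static ssize_t name(void *obj, void *ubuf)		\
	{							\
		return func((type *)obj, (utype *)ubuf);	\
	}

struct model_timer { long ticks; };
struct model_result { long value; };

/* A typed callback, in the style of read() in fs/pollfs/timer.c. */
static ssize_t model_timer_read(struct model_timer *t, struct model_result *out)
{
	out->value = t->ticks;
	return 0;
}

MODEL_DEFINE_READ(model_timer_read_thunk, model_timer_read,
		  struct model_timer, struct model_result)

static const struct model_ops model_timer_ops = {
	.read = model_timer_read_thunk,
	.rsize = sizeof(struct model_result),
};
```

The cost is one extra small function per callback. The gain is that the table stores pointers of exactly the type it calls through.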
[patch 00/22] pollfs: filesystem abstraction for pollable objects
This patch set introduces a new file system for delivering pollable events through file descriptors. At some cost to debuggability, pollable objects are a nice adjunct to nonblocking/epoll/event-based servers. The pollfs filesystem abstraction provides better mechanisms for creating and maintaining pollable objects. Also, the pollable futex approach (send and receive events from userspace or kernel) is far superior to eventfd and fixes (supersedes) FUTEX_FD at the same time. The (non)blocking and object size (user <-> kernel) semantics are handled internally, decoupling the core filesystem from the "subsystems" (mere push and pop operations). Currently implemented waitable "objects" are: signals, futexes, AIO blocks and timers. More details in each patch. http://haxent.com/~davi/pollfs/ Comments are welcome. -- Davi Arnaut
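From user space, each pollfs object is meant to look like any other readable descriptor: wait for readiness with poll() or epoll, then read one fixed-size record. Below is a sketch of that consumer step. It assumes reads return whole records. A pipe stands in for a pollfs descriptor, and struct model_event is a hypothetical record type.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical fixed-size record, standing in for e.g. struct itimerspec. */
struct model_event {
	int kind;
	long payload;
};

/*
 * Wait up to timeout_ms for fd to become readable, then read exactly one
 * record. Returns 1 if a record was read, 0 on timeout, -1 on error or
 * a short read.
 */
static int model_wait_event(int fd, struct model_event *ev, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int n = poll(&pfd, 1, timeout_ms);

	if (n <= 0)
		return n;	/* 0: timed out, -1: poll() failed */
	if (!(pfd.revents & POLLIN))
		return -1;
	return read(fd, ev, sizeof(*ev)) == (ssize_t)sizeof(*ev) ? 1 : -1;
}
```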
Re: [patch] CFS scheduler, -v8
On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote: > Authors of this paper proposed a scheduler: Earliest Eligible Virtual > Deadline First (EEVDF). EEVDF uses exactly the same method as CFS to > track the execution of each running task. The only difference between > EEVDF and CFS is that EEVDF tries to be _deadline_ fair while CFS is > _start-time_ fair. Scheduling based on deadline gives a better response > time bound and seems to be more fair. > In the following part of this email, I will try to explain the > similarities and differences between EEVDF and CFS. Hopefully, this > might provide you with some useful information w.r.t. your current work > on CFS. Any chance you could write a patch to convert CFS to EEVDF? People may have an easier time understanding code than theoretical explanations. (I guess I could do it if sufficiently pressed.) -- wli
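The EEVDF paper states the core selection rule in three parts. A task is eligible once the system virtual time V reaches its virtual eligible time. Its virtual deadline is the eligible time plus its request divided by its weight. The scheduler runs the eligible task with the earliest deadline. Below is a simplified model of just that selection step, with hypothetical names. It does not model how V advances or how requests are renewed, and it is not CFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task state for a simplified EEVDF model. */
struct model_task {
	const char *name;
	double ve;	/* virtual eligible time */
	double vd;	/* virtual deadline */
};

/* Virtual deadline for a request of the given length at the given weight. */
static double model_deadline(double ve, double request, double weight)
{
	return ve + request / weight;
}

/* Among tasks with ve <= V, pick the earliest vd; NULL if none is eligible. */
static const struct model_task *model_eevdf_pick(const struct model_task *t,
						 size_t n, double V)
{
	const struct model_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (t[i].ve > V)
			continue;	/* not yet eligible */
		if (!best || t[i].vd < best->vd)
			best = &t[i];
	}
	return best;
}
```

Eligibility is what separates this from plain earliest-deadline-first. A task with the earliest deadline is still skipped until V catches up with its eligible time.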
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
On Tuesday, May 01, 2007, Jesse Barnes wrote: > > I'm testing it now on my 965... > > Bah... never mind Robert, I see you're doing this already in > pci_mmcfg_reject_broken. I'm about to reboot & test now. Ok, I've tested a bit on my 965 (after re-adding my old patch to support it) and the new checks are more complete, but my BIOS still appears to be buggy. The extended config space (as defined by the register) is at 0xf000 (full value is 0xf003 indicating 128M enabled). The ACPI MCFG table has this space reserved according to Robert's new code, but the machine hangs due to the address space aliasing Olivier mentioned a while back. I don't have a PCIe card to test with (or any devices that require extended config space that I know of) so I can't really tell if Windows supports PCIe on this platform, but if it does I don't see how it would w/o having a full bridge driver and sophisticated address space allocation built in. I'm going to try updating my BIOS, but if that doesn't solve this problem, I'm not sure what we can do about it. Should pci_mmcfg_insert_resources check for conflicts? Should we just blacklist certain boards? I can try pinging our BIOS folks about this board to see what was intended, but I'm sure this won't be the only board we have problems with, so we'll need to address it generically somehow. Thanks, Jesse
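One possible building block for the conflict check Jesse asks about is a containment test: does the MMCONFIG window lie wholly inside one reserved region? The window size follows from the bus count, at 1 MiB of config space per bus. Below is a sketch with hypothetical names and example addresses. It does not reproduce Robert's pci_mmcfg_reject_broken() logic. In the report, the window was reserved and the machine still hung, so containment alone cannot catch the aliasing problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved region, e.g. from an ACPI motherboard resource. */
struct model_region {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* 4 KiB per function x 8 functions x 32 devices = 1 MiB per bus. */
static uint64_t model_mmcfg_size(unsigned nr_buses)
{
	return (uint64_t)nr_buses << 20;
}

/* True if [base, base + size) lies entirely inside one reserved region. */
static bool model_mmcfg_reserved(uint64_t base, uint64_t size,
				 const struct model_region *r, size_t n)
{
	assert(size > 0);
	for (size_t i = 0; i < n; i++)
		if (base >= r[i].start && base + size - 1 <= r[i].end)
			return true;
	return false;
}
```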
[PATCH]: linux-2.6.21-uc0 (MMU-less updates)
Hi All,

An update of the uClinux (MMU-less) code against 2.6.21. A lot of cleanups, and a few bug fixes. Ahead are more changes to finalize platform device support for the new style ColdFire serial driver, and switching to the generic irq code.

http://www.uclinux.org/pub/uClinux/uClinux-2.6.x/linux-2.6.21-uc0.patch.gz

Change log:

. Arcturus UC5272 and UC5282 board support             David Wu
. use THREAD_SIZE for stack manipulation               Philippe De Muyter
. remove dead code from setup.c                        Greg Ungerer
. remove dead cache code from mm                       Greg Ungerer
. remove useless is_in_rom()                           Greg Ungerer
. consolidate fixed bootparam code                     Greg Ungerer
. no need to preserve THREAD_SR in resume              Philippe De Muyter
. implement irq_regs in interrupt service              Greg Ungerer
. remove machine specific irq code                     Greg Ungerer
. fix timer step count for ColdFire                    Philippe De Muyter
. add chip select mappings for cobra5329               Thomas Brinker
. remove old machine specific clock defines            Greg Ungerer
. improve readability of fec driver code               Philippe De Muyter
. do not read ICR before writing in fec driver         Philippe De Muyter
. fix INIT_WORK usage in fec driver                    Greg Ungerer
. remove legacy PM code in 68328 serial driver         Greg Ungerer
. fix errno reporting in binfmt_flat loader            Philippe De Muyter
. create hw_irq.h for m68knommu                        Greg Ungerer
. fix CLOCK_TICK_RATE for m68knommu                    Philippe De Muyter
. add expand_stack() function to nommu                 Greg Ungerer
. move to platform device setup for 520x               Greg Ungerer
. move to platform device setup for 5249               Greg Ungerer
. new style serial driver for ColdFire UART            Greg Ungerer
. add QSPI defines for 528x ColdFire parts             David Wu
. improve SoC device defines for 523x ColdFire         Thomas Brinker

Regards
Greg

Greg Ungerer -- Chief Software Dude            EMAIL: [EMAIL PROTECTED]
SnapGear -- a division of Secure Computing     PHONE: +61 7 3435 2888
825 Stanley St,                                FAX: +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia            WEB: http://www.SnapGear.com
Re: [PATCH 18/36] Use menuconfig objects II - MMC
Jan Engelhardt wrote: > > If it works, no problem. Just put your sign-off somewhere > and let Andrew (or the appropriate subsys maintainer) have it :) > Well, the appropriate subsys maintainer would be me. :) Has it been decided that this is the way to go? I have no strong feelings either way. Rgds -- -- Pierre Ossman Linux kernel, MMC maintainer http://www.kernel.org PulseAudio, core developer http://pulseaudio.org rdesktop, core developer http://www.rdesktop.org
keyboard and mouse regressions as of dc87c398 plus cfs-v7
I just tried out git as of commit dc87c398 plus Ingo's cfs v6 patch. I'll try out w/o the patch to confirm later this morning.

The kernel log shows no indication of a problem, but the (PS/2) mice do not work, and the keyboard repeat is *very* slow and will not speed up if changed with xset(1x).

From dmesg(1):

[ 25.448092] PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[ 25.460470] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 25.466816] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 25.473262] mice: PS/2 mouse device common for all mice
[ 25.484605] input: AT Translated Set 2 keyboard as /class/input/input3
[ 25.497339] input: PC Speaker as /class/input/input4

but no /class/input entry gets generated for the mice.

/proc/interrupts shows 5 ints for irq 12, and that does not increment no matter what is done to the trackpoint, syn pad or external PS/2 mouse.

My last known good version was 0f851021c0f91e5073fa89f26b5ac68e23df8e11 plus the rt patch.

To get dc87c398 plus cfs-v7 I cloned, checked out v2.6.21, applied the cfs-v7 patch and then pulled in master.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6
Re: [patch 01/10] compiler: define __attribute_unused__
On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote: > +#define __attribute_unused__ __attribute__((unused)) I suggest __unused, which is shorter and looks compiler-neutral.
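Here is a sketch of how the macro from this series and the shorter spelling suggested above could sit together, with a fallback for compilers that lack GCC attributes. The #else branch and the two example functions are assumptions, not part of the posted patch. One caveat: names beginning with a double underscore are reserved for the implementation, and some C library headers (BSD's <sys/cdefs.h>, for instance) already define __unused, so the short name can clash outside the kernel.

```c
/* Expand to the GCC attribute only where the compiler understands it. */
#if defined(__GNUC__)
# define __attribute_unused__ __attribute__((unused))
#else
# define __attribute_unused__
#endif

/* The shorter alias suggested in the reply. */
#define __unused __attribute_unused__

/* Kept around for debugging; the attribute silences -Wunused-function. */
static int __unused debug_double(int x)
{
	return 2 * x;
}

/* An ordinary helper with no attribute, for comparison. */
static int add_one(int x)
{
	return x + 1;
}
```

The attribute only affects diagnostics. A function marked this way still compiles and can be called normally.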
Re: [patch] CFS scheduler, -v8
Hi Ting, On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote: > > Hi, Ingo > > My name is Ting Yang, a graduate student from UMASS. I am currently > studying the Linux scheduler and virtual memory manager to solve some > page swapping problems. I am very excited about the new CFS scheduler. > After I read through your code, I think that you might be interested in > reading this paper: > > "A Proportional Share Resource Allocation Algorithm for Real-Time, > Time-Shared Systems", by Ion Stoica. You can find the paper here: > http://citeseer.ist.psu.edu/37752.html > > The authors of this paper proposed a scheduler: Earliest Eligible Virtual > Deadline First (EEVDF). EEVDF uses exactly the same method as CFS to > track the execution of each running task. The only difference between > EEVDF and CFS is that EEVDF tries to be _deadline_ fair while CFS is > _start-time_ fair. Scheduling based on deadline gives a better response > time bound and seems to be fairer. > > In the following part of this email, I will try to explain the > similarities and differences between EEVDF and CFS. Hopefully, this > might provide you with some useful information w.r.t. your current work > on CFS. (...) Thanks very much for this very clear explanation. Now I realize that some of the principles I've had in mind for a long time already exist and are documented! That's what I called sorting by job completion time in the past, which might not have been clear to everyone. Now that you have put words to all those concepts, it's clearer ;-) > The decoupling of weight w_i and timeslice l_i is important. Generally > speaking, weight determines throughput and timeslice determines the > responsiveness of a task. I 100% agree. That's the problem we have with nice today. Some people want to use nice to assign more CPU to tasks (as has always been done) and others want to use nice to get better interactivity (nice as in being nice in a queue and letting the old woman go before you).
IMHO, the two concepts are opposed. Either you're a CPU hog OR you get quick responsiveness. > In normal situations, high priority tasks > usually need more CPU capacity within a short period of time (bursty, such > as keyboard, mouse move, X updates, daemons, etc), and need to be > processed as quickly as possible (responsiveness and interactivity). > Following the analysis above, we know that for higher priority tasks we > should give a _higher weight_ to ensure their CPU throughput, and at the > same time give a _smaller timeslice_ to ensure better responsiveness. > This is a bit counter-intuitive compared with the current Linux > implementation: a smaller nice value leads to a higher weight and a larger > timeslice. We have an additional problem in Linux, and not the least: it already exists and is deployed everywhere, so we cannot break existing setups. More specifically, we don't want to play with the nice values of processes such as X. That's why I think that monitoring the amount of the time-slice (l_i) consumed by the task is important. I proposed to conserve the unused part of l_i as a credit (and conversely the credit can be negative if the time-slice has been over-used). This credit would serve two purposes: - reassign the unused part of l_i over the next time-slices to get the fairest share of CPU between tasks - use it as an interactivity key to sort the tasks. Basically, if we call u_i the unused CPU cycles, you can sort based on (d_i - u_i) instead of just d_i, and the less hungry tasks will reach the CPU faster than others. (...) > Based on my understanding, adopting something like EEVDF in CFS should > not be very difficult given their similarities, although I do not have > any idea how this impacts load balancing on SMP. Is this worth > a try? I think that if you have time to spend on this, everyone would like to see the difference.
All the work on the scheduler is more or less experimental and several people are exchanging ideas right now, so it should be the right moment. You seem to understand both approaches very well, and it's likely that it will not take you too much time :-) > Sorry for such a long email :-) It was worth it, thanks! Willy
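To make the two selection rules discussed above concrete, here is a minimal user-space sketch in C. It assumes the usual EEVDF formulation from the cited paper: a task is eligible once the system virtual time V reaches its virtual eligible time ve_i, and its virtual deadline is d_i = ve_i + l_i / w_i. The struct, the field names and the use_credit switch for Willy's (d_i - u_i) key are illustrative simplifications, not CFS code. The sketch also shows Ting's point: both a larger weight and a smaller slice pull the deadline earlier.

```c
#include <stddef.h>

/* Illustrative task model; field names are not taken from CFS. */
struct task {
	const char *name;
	double weight;   /* w_i: relative share of the CPU */
	double slice;    /* l_i: requested timeslice */
	double ve;       /* virtual eligible time */
	double credit;   /* u_i: unused part of earlier slices (may be negative) */
};

/* Virtual deadline d_i = ve_i + l_i / w_i. */
static double vdeadline(const struct task *t)
{
	return t->ve + t->slice / t->weight;
}

/*
 * EEVDF selection: among tasks that are eligible at virtual time V
 * (ve_i <= V), return the one with the earliest virtual deadline.
 * With use_credit set, rank by d_i - u_i instead.
 * Returns NULL if no task is eligible.
 */
static const struct task *pick_next(const struct task *tasks, size_t n,
				    double V, int use_credit)
{
	const struct task *best = NULL;
	double best_key = 0.0;

	for (size_t i = 0; i < n; i++) {
		const struct task *t = &tasks[i];
		double key;

		if (t->ve > V)
			continue;                /* not eligible yet */
		key = vdeadline(t) - (use_credit ? t->credit : 0.0);
		if (best == NULL || key < best_key) {
			best = t;
			best_key = key;
		}
	}
	return best;
}
```

For example, with equal eligible times, a task with weight 2 and slice 2 gets deadline 1, while a task with weight 1 and slice 10 gets deadline 10. The first task therefore runs first, unless the second has accumulated enough unused credit and the credit-adjusted key is in use.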
Re: per-thread rusage
Alan Cox wrote: I just so happen to think we should implement a variety of CPU resource limits beyond what we now do, so this, too, interests me. Agreed - and make them all 64-bit while doing the cleanup. One thing several Unixen have that we don't for 32-bit boxes is a proper set of 64-bit resource handling for memory/file etc. We could also start using the CPU facilities to enforce some of the really interesting real time process ones (like main memory bandwidth) that at the moment we have no control over and can lead to very unfair behaviour. Alan Hi, Alan, Thanks for bringing this up. There are a couple of patches posted to lkml for RSS control (an unmapped page cache controller is under development). http://lwn.net/Articles/223829/ and the new enhanced version by Pavel at http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6456480.html We would appreciate any feedback to help us move the work forward and make the code ready for acceptance. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL
Crypto Update for 2.6.22
Hi:

Here is the crypto update for 2.6.22:

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git

or

master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6.git

Summary:

* Added API for asynchronous block ciphers.
* Small clean-ups.

Herbert Xu (9):
      [CRYPTO] api: Proc functions should be marked as unused
      [CRYPTO] api: Add async block cipher interface
      [CRYPTO] tcrypt: Use async blkcipher interface
      [CRYPTO] templates: Pass type/mask when creating instances
      [CRYPTO] api: Add async blkcipher type
      [CRYPTO] cryptomgr: Fix parsing of nested templates
      [CRYPTO] api: Do not remove users unless new algorithm matches
      [CRYPTO] cryptd: Add software async crypto daemon
      [CRYPTO] api: Add ablkcipher_request_set_tfm

Simon Arlott (1):
      [CRYPTO] padlock: Remove pointless padlock module

 crypto/Kconfig          |   13 +
 crypto/Makefile         |    2
 crypto/ablkcipher.c     |   83 ++
 crypto/algapi.c         |  169 +
 crypto/blkcipher.c      |   72 -
 crypto/cbc.c            |   11 +
 crypto/cryptd.c         |  375
 crypto/cryptomgr.c      |   66 +---
 crypto/ecb.c            |   11 +
 crypto/hash.c           |    2
 crypto/hmac.c           |   11 +
 crypto/lrw.c            |   11 +
 crypto/pcbc.c           |   11 +
 crypto/tcrypt.c         |  121 ++-
 crypto/xcbc.c           |   12 +
 drivers/crypto/Kconfig  |   16 --
 drivers/crypto/Makefile |    1
 include/crypto/algapi.h |   84 ++
 include/linux/crypto.h  |  236 +-
 19 files changed, 1166 insertions(+), 141 deletions(-)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP
On Wed, 2007-02-28 at 18:34 +0100, Jan-Bernd Themann wrote:
> This patch provides a functionality that allows parallel
> RX processing on multiple RX queues by using dummy netdevices.
>
>
> Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
> ---
> @@ -1789,6 +1798,22 @@ static void ehea_xmit3(struct sk_buff *s
>  	dev_kfree_skb(skb);
>  }
>
> +static inline int ehea_hash_skb(struct sk_buff *skb, int num_qps)
> +{
> +	struct tcphdr *tcp;
> +	u32 tmp;
> +
> +	if ((skb->protocol == htons(ETH_P_IP)) &&
> +	    (skb->nh.iph->protocol == IPPROTO_TCP)) {

This breaks the build; it looks like skb->nh went away: b0e380b1d8a8e0aca215df97702f99815f05c094

/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1806: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1809: error: 'struct sk_buff' has no member named 'nh'

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
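The quoted patch spreads TCP traffic over num_qps RX queues by hashing each packet. Its hash body is cut off in the quote, so what follows is only a hypothetical user-space sketch of the general idea; the function name and mixing steps are my own. (In kernels after the sk_buff header rework, code like this reaches the IP header through the ip_hdr(skb) accessor rather than skb->nh.iph.) The property that matters is that every packet of one TCP connection maps to the same queue, so per-flow ordering is preserved.

```c
#include <stdint.h>

/*
 * Hypothetical flow-to-queue mapping (not the ehea code): the same
 * address/port 4-tuple always yields the same queue index.
 * num_qps must be positive.
 */
static int flow_to_queue(uint32_t saddr, uint32_t daddr,
			 uint16_t sport, uint16_t dport, int num_qps)
{
	uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

	h ^= h >> 16;   /* fold high bits down so the modulo sees them */
	h ^= h >> 8;
	return (int)(h % (uint32_t)num_qps);
}
```

A real driver would probably prefer a stronger hash (for example a Jenkins-style mix) to spread similar flows more evenly; the XOR folding here is kept deliberately simple.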
Re: Question about Reiser4 (how to boot it?)
Hi Jeff, it seems that lkml has contacted both of my email accounts and crippled them. I can no longer receive email from lkml on this account. I can neither receive nor send email to lkml from my other account. They have also just deleted the 4 emails I sent to lkml from the page (http://lkml.org/lkml/2007/4/30/). This included one to you. In case you didn't get it, here it is again. --- I used GRUB and the kernel and initrd from a separate partition to boot a Reiser4 installation. -- [EMAIL PROTECTED] -- http://www.fastmail.fm - Does exactly what it says on the tin
Re: per-thread rusage
On 5/1/07, Theodore Tso <[EMAIL PROTECTED]> wrote: > The question is should we use setrlimit() to set the per-thread CPU limit, given that we would need some separate interface to set the signal that should be sent. Is there any reason why we should have the interface specify whether the signal should be directed to a specified process or kernel thread-id, perhaps using the si_pid field in the siginfo_t to specify which thread had exceeded its CPU limit. Or would this be overkill? The more I think about it, the more complex it gets. There is a problem with delivering the signal to the receiving process itself: it is out of time and cannot perform the cleanup operation anymore. You could grant it a grace period, but how long should that be? Some of the cleanup handlers might take a long time. If you don't enforce the CPU limit, then it doesn't have to be in the kernel, and you might as well use CLOCK_THREAD_CPUTIME_ID and create a timer. This should already work today. If not, it must be fixed. Delivering the timeout signal to another thread isn't really possible either, since the cleanup code might access thread-local data, which wouldn't work because it's not the canceled thread's data that is accessed. I don't have a good answer right now as to whether enforced CPU limits can be implemented at all. But it seems that for your purposes a timer with the CPU clock might be sufficient. > Do you think this is something that we could get standardized into an upcoming Posix/Posix Threads standard? Regardless of whether a solution can be found, it's too late for the next revision. The deadline for new features is long gone.
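Ulrich's suggestion of a timer on the thread CPU clock can be tried with existing POSIX calls. This is a minimal sketch, assuming Linux with glibc (older glibc needs -lrt for timer_create()). The function name, the 50ms budget and the use of SIGUSR1 are illustrative choices. With SIGEV_SIGNAL the signal is process-directed, so in a multithreaded program it may land on any thread; steering it at one thread needs the Linux-specific SIGEV_THREAD_ID, which is exactly the delivery question raised in this thread. Nothing here enforces the limit: the thread merely notices that its budget is gone and stops.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t budget_expired;

static void on_budget_expired(int sig)
{
	(void)sig;
	budget_expired = 1;   /* async-signal-safe: only set a flag */
}

/*
 * Spin until the calling thread has consumed about `ms` milliseconds of
 * CPU time, as measured by a one-shot timer on CLOCK_THREAD_CPUTIME_ID.
 * Returns 0 on success, -1 on bad input or if setup fails.
 */
static int run_with_cpu_budget(long ms, int signo)
{
	struct sigaction sa;
	struct sigevent sev;
	struct itimerspec its;
	timer_t tid;

	if (ms <= 0)
		return -1;                 /* a zero it_value would disarm the timer */

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = on_budget_expired;
	sigemptyset(&sa.sa_mask);
	if (sigaction(signo, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof sev);
	sev.sigev_notify = SIGEV_SIGNAL;   /* process-directed signal */
	sev.sigev_signo = signo;
	if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &tid) == -1)
		return -1;

	memset(&its, 0, sizeof its);       /* it_interval stays zero: one-shot */
	its.it_value.tv_sec = ms / 1000;
	its.it_value.tv_nsec = (ms % 1000) * 1000000L;
	budget_expired = 0;
	if (timer_settime(tid, 0, &its, NULL) == -1) {
		timer_delete(tid);
		return -1;
	}

	while (!budget_expired)
		;                          /* the "work": burn CPU on this thread */

	timer_delete(tid);
	return 0;
}
```

For example, run_with_cpu_budget(50, SIGUSR1) returns once the calling thread has used roughly 50ms of CPU time. Wall-clock time spent blocked or descheduled does not count.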
Re: SOME STUFF ABOUT REISER4 To Mr Hopper
On Wed, 25 Apr 2007 09:29:19 -0400, "Jeff Garzik" <[EMAIL PROTECTED]> said: > Please fix your caps lock key. Thanks. > > Jeff Hi Jeff, it seems that lkml has contacted both of my email accounts and crippled them. I can no longer receive email from lkml on this account. I can neither receive nor send email to lkml from my other account. They have also just deleted the 4 emails I sent to lkml from the page (http://lkml.org/lkml/2007/4/30/). This included one to you. In case you didn't get it, here it is again. --- Please fix your attitude. -- [EMAIL PROTECTED] -- http://www.fastmail.fm - The way an email service should be
Re: Question about Reiser4
Hi Edward, it seems that lkml has contacted both of my email accounts and crippled them. I can no longer receive email from lkml on this account. I can neither receive nor send email to lkml from my other account. They have also just deleted the 4 emails I sent to lkml from the page (http://lkml.org/lkml/2007/4/30/). This included one to you. In case you didn't get it, here it is again (since you still haven't answered this one).

-

On Wed, 25 Apr 2007 19:03:12 +0400, "Edward Shishkin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
>
> >
> >As I understand it, the default Reiser4 DOES NOT USE any compression at
> >all, not even tail compression,
>
> ^tail compression^tail conversion
> Reiser4 does use tail conversion by default.
>
> > but saves space by eliminating block
> >alignment wastage (tail compression is an option).
> >
> >So let's LOSE the statistics that involve compression. The results now
> >look like this:
> >
> >.-----------------------------.
> >| FILESYSTEM   | TIME | DISK  |
> >| TYPE         |(secs)| USAGE |
> >.-----------------------------.
> >| REISER4      | 3462 |  692  |
> >| EXT2         | 4092 |  816  |
> >| JFS          | 4225 |  806  |
> >| EXT4         | 4408 |  816  |
> >| EXT3         | 4421 |  816  |
> >| XFS          | 4625 |  779  |
> >| REISER3      | 6178 |  793  |
> >| FAT32        |12342 |  988  |
> >| NTFS-3g      |10414 |  772  |
> >.-----------------------------.
> >
> >These results are still EXTREMELY GOOD for REISER4.
> >
> >
>
> Everything is not so simple in the science of testing..
> Would you please change direction of your activity to stressing
> instead of benchmarking? Caught oopses would have great value..
> OK?
>
> Regards,
> Edward.
>

Tail conversion is NOT compression, so what exactly is your point? By "tail compression" I mean plugin ctail40, but since I was never able to get it to work, maybe it's not tail compression at all.
-- [EMAIL PROTECTED] -- http://www.fastmail.fm - Faster than the air-speed velocity of an unladen european swallow
Re: [PATCH (v2)] crypto: Remove pointless padlock module
On Sun, Apr 29, 2007 at 09:01:10AM +0100, Simon Arlott wrote: > > Well that's mostly the point - it shouldn't get compiled in - ever, > but it also has other modules depending on it in Kconfig that > shouldn't need to be modules. Patch applied. Thanks! -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: Why ask Sun for ZFS while we have ReiserFS4 !?
Hi Andrew, it seems that lkml has contacted both of my email accounts and crippled them. I can no longer receive email from lkml on this account. I can neither receive nor send email to lkml from my other account. They have also just deleted the 4 emails I sent to lkml from the page (http://lkml.org/lkml/2007/4/30/). This included one to you. In case you didn't get it, here it is again.

Yeah, why do you need ZFS while we have ReiserFS4?

REISER4 - THE BEST FILESYSTEM EVER.

You can read more here:
http://linuxhelp.150m.com/resources/fs-benchmarks.htm
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

.-----------------------------.
| FILESYSTEM   | TIME | DISK  |
| TYPE         |(secs)| USAGE |
.-----------------------------.
| REISER4 lzo  | 1938 |  278  |
| REISER4 gzip | 2295 |  213  |
.-----------------------------.
| REISER4      | 3462 |  692  |
| EXT2         | 4092 |  816  |
| JFS          | 4225 |  806  |
| EXT4         | 4408 |  816  |
| EXT3         | 4421 |  816  |
| XFS          | 4625 |  779  |
| REISER3      | 6178 |  793  |
| FAT32        |12342 |  988  |
| NTFS-3g      |10414 |  772  |
.-----------------------------.

The TIME column measures the time taken to complete the bonnie++ benchmarking test (run with the parameters bonnie++ -n128:128k:0). The top two results use Reiser4 with compression. Since bonnie++ writes test files that are almost all zeros, compression speeds things up dramatically. That this is not the case in real-world examples can be seen below, where compression does not speed things up. More importantly, however, it does not slow things down either.

The DISK USAGE column measures the amount of disk used to store 655MB of raw data (which was 3 different copies of the Linux kernel sources).

OR LOOK AT THE FULL RESULTS:

.---------------------------------------------------------------.
| File          | Disk  | Copy  | Copy  | Tar   | Unzip | Del   |
| System        | Usage | 655MB | 655MB | Gzip  | UnTar | 2.5   |
| Type          | (MB)  |  (1)  |  (2)  | 655MB | 655MB | Gig   |
.---------------------------------------------------------------.
| REISER4 gzip  |   213 |   148 |    68 |    83 |    48 |    70 |
| REISER4 lzo   |   278 |   138 |    56 |    80 |    34 |    84 |
| REISER4 tails |   673 |   148 |    63 |    78 |    33 |    65 |
| REISER4       |   692 |   148 |    55 |    67 |    25 |    56 |
| NTFS3g        |   772 |  1333 |  1426 |   585 |   767 |   194 |
| NTFS          |   779 |   781 |   173 |     X |     X |     X |
| REISER3       |   793 |   184 |    98 |    85 |    63 |    22 |
| XFS           |   799 |   220 |   173 |   119 |    90 |   106 |
| JFS           |   806 |   228 |   202 |    95 |    97 |   127 |
| EXT4 extents  |   806 |   162 |    55 |    69 |    36 |    32 |
| EXT4 default  |   816 |   174 |    70 |    74 |    42 |    50 |
| EXT3          |   816 |   182 |    74 |    73 |    43 |    51 |
| EXT2          |   816 |   201 |    82 |    73 |    39 |    67 |
| FAT32         |   988 |   253 |   158 |   118 |    81 |    95 |
.---------------------------------------------------------------.

Each test was performed 5 times and the average value recorded.

Disk Usage: The amount of disk used to store the data (which was 3 different copies of the Linux kernel sources). The raw data (without filesystem meta-data, block alignment wastage, etc) was 655MB.
Copy 655MB (1): Copy the data over a partition boundary.
Copy 655MB (2): Copy the data within a partition.
Tar Gzip 655MB: Tar and Gzip the data.
Unzip UnTar 655MB: UnGzip and UnTar the data.
Del 2.5 Gig: Delete everything just written (about 2.5 Gig).

http://lkml.org/lkml/2007/4/9/4

-- [EMAIL PROTECTED] -- http://www.fastmail.fm - A fast, anti-spam email service.
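The Disk Usage figures can be read against the stated 655MB of raw data. A small worked example (the helper name is ours): 692MB for plain Reiser4 is about 5.6% above the raw size, 816MB for ext2, ext3 and ext4 default is about 24.6% above it, and 988MB for FAT32 is about 50.8% above it. The compressed Reiser4 rows come out below the raw size because the data itself is compressed.

```c
/*
 * Space used beyond the raw data, as a percentage of the raw size.
 * A negative result means the stored form is smaller than the raw data.
 */
static double overhead_pct(double usage_mb, double raw_mb)
{
	return (usage_mb - raw_mb) / raw_mb * 100.0;
}
```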
Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8
On Tue, Apr 17, 2007 at 01:25:49PM -0400, John Anthony Kazos Jr. wrote: > From: John Anthony Kazos Jr. <[EMAIL PROTECTED]> > > Convert the subdirectory "crypto" to UTF-8. The files changed are > and . > > Signed-off-by: John Anthony Kazos Jr. <[EMAIL PROTECTED]> Thanks. Could you fix up include/linux/crypto.h as well? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] Remove unnecessary irq disabling
On Tue, May 01, 2007 at 07:59:21PM -0400, Mark Lord wrote: > Glauber de Oliveira Costa wrote: > >RR asks us if it is really necessary to disable interrupts in > >setup_secondary_APIC_clock(). The answer is no, since setup_APIC_timer() > >starts by saving irq flags, which also disables them. > > > >Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]> > > > >--- a/arch/x86_64/kernel/apic.c > >+++ b/arch/x86_64/kernel/apic.c > >@@ -875,9 +875,7 @@ void __init setup_boot_APIC_clock (void) > > > > void __cpuinit setup_secondary_APIC_clock(void) > > { > >-local_irq_disable(); /* FIXME: Do we need this? --RR */ > > setup_APIC_timer(calibration_result); > >-local_irq_enable(); > > } > > > > void disable_APIC_timer(void) > > Okay, I'll bite: before the patch, this code would exit > with interrupts *enabled*, always. Now it does not. > yeah, you have a point. The disable is unnecessary, but maybe the enable is not. However, > What does that break, or was it already broken and this fixes it? I think neither. This function is only called at early bootup, (start_secondary() ), and most of its callees have interrupts off anyway. But maybe we do lose something. Andi, do you have a word on this? -- Glauber de Oliveira Costa Red Hat Inc. "Free as in Freedom"
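Mark's question comes down to how save/restore differs from an unconditional enable. Below is a toy user-space model of that difference. All names are hypothetical, and it only mimics the semantics of local_irq_save()/local_irq_restore(); it is not kernel code. It shows that the old sequence always returned with interrupts on, while the patched one returns them in whatever state the caller had.

```c
#include <stdbool.h>

/* Toy stand-in for one CPU's interrupt-enable flag. */
static bool irqs_enabled;

static void toy_irq_disable(void) { irqs_enabled = false; }
static void toy_irq_enable(void)  { irqs_enabled = true; }

/* Like local_irq_save(): remember the current state, then disable. */
static bool toy_irq_save(void)
{
	bool was_enabled = irqs_enabled;

	irqs_enabled = false;
	return was_enabled;
}

/* Like local_irq_restore(): put back exactly the saved state. */
static void toy_irq_restore(bool was_enabled)
{
	irqs_enabled = was_enabled;
}

/* Models setup_APIC_timer(): program the timer with interrupts off. */
static void toy_setup_timer(void)
{
	bool flags = toy_irq_save();
	/* ... timer programming would happen here ... */
	toy_irq_restore(flags);
}

/* Before the patch: always exits with interrupts enabled. */
static void toy_secondary_clock_old(void)
{
	toy_irq_disable();
	toy_setup_timer();
	toy_irq_enable();
}

/* After the patch: exits in the same state it was entered in. */
static void toy_secondary_clock_new(void)
{
	toy_setup_timer();
}
```

So the patch only changes behaviour for a caller that enters with interrupts disabled. That matches Glauber's point that the sole caller runs early in start_secondary(), where interrupts are off anyway.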
Re: per-thread rusage
On Tue, May 01, 2007 at 05:17:28PM -0700, Ulrich Drepper wrote: > We have, in principle: setrlimit. We jump through hoops at the moment > to make RLIMIT_CPU a per-process facility. This is all nice. All you > need to do is to add resources RLIMIT_*_THREAD (e.g., > RLIMIT_CPU_THREAD) and additionally do accounting on a per-thread > basis. Indeed; in fact it would be easier to do per-thread accounting than our current per-process accounting, as you note. > The thread library also cannot simply hijack the SIGXCPU signal; > the application might want to use it. So what would be additionally > needed is a method to specify what signal to send. The default > might just as well be SIGXCPU but this must be changeable. The question is should we use setrlimit() to set the per-thread CPU limit, given that we would need some separate interface to set the signal that should be sent. Is there any reason why we should have the interface specify whether the signal should be directed to a specified process or kernel thread-id, perhaps using the si_pid field in the siginfo_t to specify which thread had exceeded its CPU limit. Or would this be overkill? > The thread cancellation must appear like any other cancellation, > perhaps with a special status value (PTHREAD_CANCELED_XCPU instead of > PTHREAD_CANCELED). But that's a userlevel detail. Yep, I agree that thread cancellation is the right thing to happen at the Posix Threads level. Do you think this is something that we could get standardized into an upcoming Posix/Posix Threads standard? - Ted
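Per-thread CPU time is already tracked by the kernel and exposed through the POSIX per-thread CPU clock, which is part of why per-thread accounting looks easy to build on. Here is a minimal sketch (helper names are hypothetical) that reads the calling thread's CPU time next to the whole process's. In a single-threaded program the two should be nearly equal; with several busy threads the process clock grows faster than any one thread's.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* CPU time consumed on the given clock, in seconds, or -1.0 on error. */
static double cpu_seconds(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return -1.0;
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Spin for a little while so both clocks have something to show. */
static void burn_cpu(void)
{
	volatile unsigned long sink = 0;

	for (unsigned long i = 0; i < 20000000UL; i++)
		sink += i;
}
```

Typical use is cpu_seconds(CLOCK_THREAD_CPUTIME_ID) for the calling thread and cpu_seconds(CLOCK_PROCESS_CPUTIME_ID) for the whole process.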
[patch 07/10] mips: excite: use __attribute_unused__
Replace variable instances of __attribute__((unused)) with __attribute_unused__.

Cc: Ralf Baechle <[EMAIL PROTECTED]>
Cc: Thomas Koeller <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/mips/basler/excite/excite_device.c | 16
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/mips/basler/excite/excite_device.c b/arch/mips/basler/excite/excite_device.c
--- a/arch/mips/basler/excite/excite_device.c
+++ b/arch/mips/basler/excite/excite_device.c
@@ -68,7 +68,7 @@ enum {
 
 
 static struct resource
-	excite_ctr_resource __attribute__((unused)) = {
+	excite_ctr_resource __attribute_unused__ = {
 		.name		= "GPI counters",
 		.start		= 0,
 		.end		= 5,
@@ -77,7 +77,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_gpislice_resource __attribute__((unused)) = {
+	excite_gpislice_resource __attribute_unused__ = {
 		.name		= "GPI slices",
 		.start		= 0,
 		.end		= 1,
@@ -86,7 +86,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_mdio_channel_resource __attribute__((unused)) = {
+	excite_mdio_channel_resource __attribute_unused__ = {
 		.name		= "MDIO channels",
 		.start		= 0,
 		.end		= 1,
@@ -95,7 +95,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_fifomem_resource __attribute__((unused)) = {
+	excite_fifomem_resource __attribute_unused__ = {
 		.name		= "FIFO memory",
 		.start		= 0,
 		.end		= 767,
@@ -104,7 +104,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_scram_resource __attribute__((unused)) = {
+	excite_scram_resource __attribute_unused__ = {
 		.name		= "Scratch RAM",
 		.start		= EXCITE_PHYS_SCRAM,
 		.end		= EXCITE_PHYS_SCRAM + EXCITE_SIZE_SCRAM - 1,
@@ -113,7 +113,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_fpga_resource __attribute__((unused)) = {
+	excite_fpga_resource __attribute_unused__ = {
 		.name		= "System FPGA",
 		.start		= EXCITE_PHYS_FPGA,
 		.end		= EXCITE_PHYS_FPGA + EXCITE_SIZE_FPGA - 1,
@@ -122,7 +122,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_nand_resource __attribute__((unused)) = {
+	excite_nand_resource __attribute_unused__ = {
 		.name		= "NAND flash control",
 		.start		= EXCITE_PHYS_NAND,
 		.end		= EXCITE_PHYS_NAND + EXCITE_SIZE_NAND - 1,
@@ -131,7 +131,7 @@ static struct resource
 		.sibling	= NULL,
 		.child		= NULL
 	},
-	excite_titan_resource __attribute__((unused)) = {
+	excite_titan_resource __attribute_unused__ = {
 		.name		= "TITAN registers",
 		.start		= EXCITE_PHYS_TITAN,
 		.end		= EXCITE_PHYS_TITAN + EXCITE_SIZE_TITAN - 1,
[patch 03/10] sh: dma: use __attribute_unused__
There is no such thing as labeling a variable as __attribute__((used)). Since ts_shift is not referenced in inline assembly, we assume that we're simply suppressing a warning here if the variable is declared but unreferenced.

Cc: Paul Mundt <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-sh/cpu-sh3/dma.h        | 2 +-
 include/asm-sh/cpu-sh4/dma-sh7780.h | 2 +-
 include/asm-sh/cpu-sh4/dma.h        | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-sh/cpu-sh3/dma.h b/include/asm-sh/cpu-sh3/dma.h
--- a/include/asm-sh/cpu-sh3/dma.h
+++ b/include/asm-sh/cpu-sh3/dma.h
@@ -26,7 +26,7 @@ enum {
 	XMIT_SZ_128BIT,
 };
 
-static unsigned int ts_shift[] __attribute__ ((used)) = {
+static unsigned int ts_shift[] __attribute_unused__ = {
 	[XMIT_SZ_8BIT]		= 0,
 	[XMIT_SZ_16BIT]		= 1,
 	[XMIT_SZ_32BIT]		= 2,
diff --git a/include/asm-sh/cpu-sh4/dma-sh7780.h b/include/asm-sh/cpu-sh4/dma-sh7780.h
--- a/include/asm-sh/cpu-sh4/dma-sh7780.h
+++ b/include/asm-sh/cpu-sh4/dma-sh7780.h
@@ -28,7 +28,7 @@ enum {
 /*
  * The DMA count is defined as the number of bytes to transfer.
  */
-static unsigned int __attribute__ ((used)) ts_shift[] = {
+static unsigned int ts_shift[] __attribute_unused__ = {
 	[XMIT_SZ_8BIT]		= 0,
 	[XMIT_SZ_16BIT]		= 1,
 	[XMIT_SZ_32BIT]		= 2,
diff --git a/include/asm-sh/cpu-sh4/dma.h b/include/asm-sh/cpu-sh4/dma.h
--- a/include/asm-sh/cpu-sh4/dma.h
+++ b/include/asm-sh/cpu-sh4/dma.h
@@ -53,7 +53,7 @@ enum {
 /*
  * The DMA count is defined as the number of bytes to transfer.
  */
-static unsigned int ts_shift[] __attribute__ ((used)) = {
+static unsigned int ts_shift[] __attribute_unused__ = {
 	[XMIT_SZ_64BIT]		= 3,
 	[XMIT_SZ_8BIT]		= 0,
 	[XMIT_SZ_16BIT]		= 1,
[patch 05/10] frv: gdb: use __attribute_unused__
Replace function instances of __attribute__((unused)) with __attribute_unused__ to suppress warnings.

Cc: David Howells <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/frv/kernel/gdb-stub.c | 12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/frv/kernel/gdb-stub.c b/arch/frv/kernel/gdb-stub.c
--- a/arch/frv/kernel/gdb-stub.c
+++ b/arch/frv/kernel/gdb-stub.c
@@ -1195,7 +1195,7 @@ static void gdbstub_check_breakpoint(void)
 /*
  *
  */
-static void __attribute__((unused)) gdbstub_show_regs(void)
+static void __attribute_unused__ gdbstub_show_regs(void)
 {
 	unsigned long *reg;
 	int loop;
@@ -1223,7 +1223,7 @@ static void __attribute__((unused)) gdbstub_show_regs(void)
 /*
  * dump debugging regs
  */
-static void __attribute__((unused)) gdbstub_dump_debugregs(void)
+static void __attribute_unused__ gdbstub_dump_debugregs(void)
 {
 	gdbstub_printk("DCR%08lx ", __debug_status.dcr);
 	gdbstub_printk("BRR%08lx\n", __debug_status.brr);
@@ -2079,25 +2079,25 @@ void gdbstub_exit(int status)
  * GDB wants to call malloc() and free() to allocate memory for calling kernel
  * functions directly from its command line
  */
-static void *malloc(size_t size) __attribute__((unused));
+static void *malloc(size_t size) __attribute_unused__;
 static void *malloc(size_t size)
 {
 	return kmalloc(size, GFP_ATOMIC);
 }
 
-static void free(void *p) __attribute__((unused));
+static void free(void *p) __attribute_unused__;
 static void free(void *p)
 {
 	kfree(p);
 }
 
-static uint32_t ___get_HSR0(void) __attribute__((unused));
+static uint32_t ___get_HSR0(void) __attribute_unused__;
 static uint32_t ___get_HSR0(void)
 {
 	return __get_HSR(0);
 }
 
-static uint32_t ___set_HSR0(uint32_t x) __attribute__((unused));
+static uint32_t ___set_HSR0(uint32_t x) __attribute_unused__;
 static uint32_t ___set_HSR0(uint32_t x)
 {
 	__set_HSR(0, x);
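A minimal user-space sketch of the declare-then-define pattern used for the gdb-callable helpers above, assuming gcc or clang. debug_malloc() and debug_free() are hypothetical names, and malloc()/free() stand in for kmalloc()/kfree(). The attribute is written on the prototype and carries over to the definition that follows it.

```c
#include <stdlib.h>

/* Local stand-in for the macro introduced in patch 01/10 (compiler-gcc.h). */
#ifndef __attribute_unused__
#define __attribute_unused__ __attribute__((unused))
#endif

/*
 * Hypothetical helpers meant to be invoked by hand from a debugger's
 * command line, never from C code.  The attribute on the prototype
 * applies to the definition below, so gcc does not warn that the
 * static function is defined but not used.
 */
static void *debug_malloc(size_t size) __attribute_unused__;
static void *debug_malloc(size_t size)
{
	return malloc(size);	/* the frv stub uses kmalloc(size, GFP_ATOMIC) */
}

static void debug_free(void *p) __attribute_unused__;
static void debug_free(void *p)
{
	free(p);		/* the frv stub uses kfree(p) */
}
```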
[patch 08/10] mips: tlbex: use __attribute_unused__
Replace function instances of __attribute__((unused)) with __attribute_unused__.

Cc: Ralf Baechle <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/mips/mm/tlbex.c | 36 ++--
 1 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -35,24 +35,24 @@
 #include
 #include
 
-static __init int __attribute__((unused)) r45k_bvahwbug(void)
+static __init int __attribute_unused__ r45k_bvahwbug(void)
 {
 	/* XXX: We should probe for the presence of this bug, but we don't. */
 	return 0;
 }
 
-static __init int __attribute__((unused)) r4k_250MHZhwbug(void)
+static __init int __attribute_unused__ r4k_250MHZhwbug(void)
 {
 	/* XXX: We should probe for the presence of this bug, but we don't. */
 	return 0;
 }
 
-static __init int __attribute__((unused)) bcm1250_m3_war(void)
+static __init int __attribute_unused__ bcm1250_m3_war(void)
 {
 	return BCM1250_M3_WAR;
 }
 
-static __init int __attribute__((unused)) r1_llsc_war(void)
+static __init int __attribute_unused__ r1_llsc_war(void)
 {
 	return R1_LLSC_WAR;
 }
@@ -511,18 +511,18 @@ L_LA(_r3000_write_probe_fail)
 #define i_ehb(buf) i_sll(buf, 0, 0, 3)
 
 #ifdef CONFIG_64BIT
-static __init int __attribute__((unused)) in_compat_space_p(long addr)
+static __init int __attribute_unused__ in_compat_space_p(long addr)
 {
 	/* Is this address in 32bit compat space? */
 	return (((addr) & 0xL) == 0xL);
 }
 
-static __init int __attribute__((unused)) rel_highest(long val)
+static __init int __attribute_unused__ rel_highest(long val)
 {
 	return val + 0x800080008000L) >> 48) & 0x) ^ 0x8000) - 0x8000;
 }
 
-static __init int __attribute__((unused)) rel_higher(long val)
+static __init int __attribute_unused__ rel_higher(long val)
 {
 	return val + 0x80008000L) >> 32) & 0x) ^ 0x8000) - 0x8000;
 }
@@ -556,8 +556,8 @@ static __init void i_LA_mostly(u32 **buf, unsigned int rs, long addr)
 	i_lui(buf, rs, rel_hi(addr));
 }
 
-static __init void __attribute__((unused)) i_LA(u32 **buf, unsigned int rs,
-						 long addr)
+static __init void __attribute_unused__ i_LA(u32 **buf, unsigned int rs,
+					     long addr)
 {
 	i_LA_mostly(buf, rs, addr);
 	if (rel_lo(addr))
@@ -636,8 +636,8 @@ static __init void copy_handler(struct reloc *rel, struct label *lab,
 	move_labels(lab, first, end, off);
 }
 
-static __init int __attribute__((unused)) insn_has_bdelay(struct reloc *rel,
-							   u32 *addr)
+static __init int __attribute_unused__ insn_has_bdelay(struct reloc *rel,
+						       u32 *addr)
 {
 	for (; rel->lab != label_invalid; rel++) {
 		if (rel->addr == addr
@@ -650,15 +650,15 @@ static __init int __attribute__((unused)) insn_has_bdelay(struct reloc *rel,
 }
 
 /* convenience functions for labeled branches */
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_bltz(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
 	r_mips_pc16(r, *p, l);
 	i_bltz(p, reg, 0);
 }
 
-static void __init __attribute__((unused)) il_b(u32 **p, struct reloc **r,
-						 enum label_id l)
+static void __init __attribute_unused__ il_b(u32 **p, struct reloc **r,
+					     enum label_id l)
 {
 	r_mips_pc16(r, *p, l);
 	i_b(p, 0);
@@ -671,7 +671,7 @@ static void __init il_beqz(u32 **p, struct reloc **r, unsigned int reg,
 	i_beqz(p, reg, 0);
 }
 
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_beqzl(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
 	r_mips_pc16(r, *p, l);
@@ -692,7 +692,7 @@ static void __init il_bgezl(u32 **p, struct reloc **r, unsigned int reg,
 	i_bgezl(p, reg, 0);
 }
 
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_bgez(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
 	r_mips_pc16(r, *p, l);
@@ -810,7 +810,7 @@ static __initdata u32 final_handler[64];
  *
  * As if we MIPS hackers wouldn't know how to nop pipelines happy ...
  */
-static __init void __attribute__((unused)) build_tlb_probe_entry(u32 **p)
+static __init void __attribute_unused__ build_tlb_probe_entry(u32 **p)
 {
 	switch (current_cpu_data.cputype) {
 	/* Found by experiment: R4600 v2.0 needs this, too. */
@@ -1098,7 +1098,7 @@ build_get_pgd_vmalloc64(u32 **p, struct label **l, struct reloc **r,
  * TMP and PTR are scratch.
  * TMP will be clobbered, PTR will hold the pgd entry.
  */
-static __init void __attribute__((unused)
[patch 04/10] scsi: fix ambiguous gdthtable definition
Labeling a variable as __attribute_used__ is ambiguous: it means __attribute__((unused)) for gcc <3.4 and __attribute__((used)) for gcc >=3.4. There is no such thing as labeling a variable as __attribute__((used)).

We assume that we're simply suppressing a warning here if gdthtable[] is declared but unreferenced.

Cc: Achim Leubner <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 drivers/scsi/gdth.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -876,7 +876,7 @@ static int __init gdth_search_pci(gdth_pci_str *pcistr)
 /* Vortex only makes RAID controllers.
  * We do not really want to specify all 550 ids here, so wildcard match.
  */
-static struct pci_device_id gdthtable[] __attribute_used__ = {
+static struct pci_device_id gdthtable[] __attribute_unused__ = {
 	{PCI_VENDOR_ID_VORTEX,PCI_ANY_ID,PCI_ANY_ID, PCI_ANY_ID},
 	{PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC,PCI_ANY_ID,PCI_ANY_ID},
 	{PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC_XSCALE,PCI_ANY_ID,PCI_ANY_ID},
[patch 09/10] powerpc: ps3: use __attribute_unused__
Replace function instances of __attribute__ ((unused)) with __attribute_unused__.

Cc: Geoff Levand <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/powerpc/platforms/ps3/interrupt.c | 4 ++--
 arch/powerpc/platforms/ps3/time.c      | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/interrupt.c b/arch/powerpc/platforms/ps3/interrupt.c
--- a/arch/powerpc/platforms/ps3/interrupt.c
+++ b/arch/powerpc/platforms/ps3/interrupt.c
@@ -434,7 +434,7 @@ static void _dump_64_bmp(const char *header, const u64 *p, unsigned cpu,
 		*p & 0x);
 }
 
-static void __attribute__ ((unused)) _dump_256_bmp(const char *header,
+static void __attribute_unused__ _dump_256_bmp(const char *header,
 	const u64 *p, unsigned cpu, const char* func, int line)
 {
 	pr_debug("%s:%d: %s %u {%016lx:%016lx:%016lx:%016lx}\n",
@@ -453,7 +453,7 @@ static void _dump_bmp(struct ps3_private* pd, const char* func, int line)
 }
 
 #define dump_mask(_x) _dump_mask(_x, __func__, __LINE__)
-static void __attribute__ ((unused)) _dump_mask(struct ps3_private* pd,
+static void __attribute_unused__ _dump_mask(struct ps3_private* pd,
 	const char* func, int line)
 {
 	unsigned long flags;
diff --git a/arch/powerpc/platforms/ps3/time.c b/arch/powerpc/platforms/ps3/time.c
--- a/arch/powerpc/platforms/ps3/time.c
+++ b/arch/powerpc/platforms/ps3/time.c
@@ -39,7 +39,7 @@ static void _dump_tm(const struct rtc_time *tm, const char* func, int line)
 }
 
 #define dump_time(_a) _dump_time(_a, __func__, __LINE__)
-static void __attribute__ ((unused)) _dump_time(int time, const char* func,
+static void __attribute_unused__ _dump_time(int time, const char* func,
 	int line)
 {
 	struct rtc_time tm;
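The ps3 helpers are reached only through wrapper macros such as dump_mask() and dump_time(), which append __func__ and __LINE__. A small user-space sketch of that arrangement follows; dump_value(), do_dump_value() and the last_dump buffer (used here instead of pr_debug()) are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Local stand-in for the macro introduced in patch 01/10. */
#ifndef __attribute_unused__
#define __attribute_unused__ __attribute__((unused))
#endif

/* Hypothetical sink used in place of pr_debug(). */
static char last_dump[128];

/*
 * Reached only through dump_value().  If every dump_value() call site is
 * commented out, the attribute keeps gcc from warning that this static
 * function is defined but not used.
 */
static void __attribute_unused__ do_dump_value(int value, const char *func,
					       int line)
{
	snprintf(last_dump, sizeof(last_dump), "%s:%d: value=%d",
		 func, line, value);
}

/* Record the caller's function name and line, as dump_mask() does. */
#define dump_value(_v) do_dump_value(_v, __func__, __LINE__)
```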
[patch 10/10] i386 mmzone: use __attribute_unused__
Replace automatic variable instances of __attribute__ ((unused)) with __attribute_unused__.

Cc: Andy Whitcroft <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-i386/mmzone.h | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-i386/mmzone.h b/include/asm-i386/mmzone.h
--- a/include/asm-i386/mmzone.h
+++ b/include/asm-i386/mmzone.h
@@ -122,21 +122,21 @@ static inline int pfn_valid(int pfn)
 	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
 #define alloc_bootmem_node(pgdat, x) \
 ({ \
-	struct pglist_data __attribute__ ((unused))\
+	struct pglist_data __attribute_unused__\
 		*__alloc_bootmem_node__pgdat = (pgdat); \
 	__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES,\
 		__pa(MAX_DMA_ADDRESS)); \
 })
 #define alloc_bootmem_pages_node(pgdat, x) \
 ({ \
-	struct pglist_data __attribute__ ((unused))\
+	struct pglist_data __attribute_unused__\
 		*__alloc_bootmem_node__pgdat = (pgdat); \
 	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, \
 		__pa(MAX_DMA_ADDRESS)) \
 })
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 ({ \
-	struct pglist_data __attribute__ ((unused))\
+	struct pglist_data __attribute_unused__\
 		*__alloc_bootmem_node__pgdat = (pgdat); \
 	__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0); \
 })
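These macros always allocate from NODE_DATA(0), yet they still assign the caller's pgdat to a local, presumably so the argument is evaluated and type-checked once before being ignored. A hedged user-space sketch of the same shape with a GNU statement expression (gcc or clang only); struct node_data, alloc_on_node() and alloc_node() are hypothetical stand-ins for struct pglist_data, __alloc_bootmem_node() and alloc_bootmem_node().

```c
#include <stddef.h>

/* Local stand-in for the macro introduced in patch 01/10. */
#ifndef __attribute_unused__
#define __attribute_unused__ __attribute__((unused))
#endif

struct node_data { int id; };		/* hypothetical pglist_data stand-in */

static struct node_data node0 = { 0 };

/* Hypothetical allocator: returns a value encoding the node and size. */
static size_t alloc_on_node(struct node_data *nd, size_t size)
{
	return (size_t)nd->id * 1000 + size;
}

/*
 * pgdat is evaluated exactly once into a local of the right pointer type,
 * then ignored in favour of node 0; the attribute keeps -Wunused-variable
 * quiet.  The value of the whole ({ ... }) is the last expression.
 */
#define alloc_node(pgdat, x)						\
({									\
	struct node_data __attribute_unused__				\
		*__alloc_node_pgdat = (pgdat);				\
	alloc_on_node(&node0, (x));					\
})
```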
[patch 06/10] i386: voyager: use __attribute_unused__
Replace automatic variable instances of __attribute__((unused)) with __attribute_unused__ in mca_nmi_hook().

Cc: James Bottomley <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/i386/mach-voyager/voyager_basic.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/i386/mach-voyager/voyager_basic.c b/arch/i386/mach-voyager/voyager_basic.c
--- a/arch/i386/mach-voyager/voyager_basic.c
+++ b/arch/i386/mach-voyager/voyager_basic.c
@@ -292,8 +292,8 @@ machine_emergency_restart(void)
 void
 mca_nmi_hook(void)
 {
-	__u8 dumpval __attribute__((unused)) = inb(0xf823);
-	__u8 swnmi __attribute__((unused)) = inb(0xf813);
+	__u8 dumpval __attribute_unused__ = inb(0xf823);
+	__u8 swnmi __attribute_unused__ = inb(0xf813);
 
 	/* FIXME: assume dump switch pressed */
 	/* check to see if the dump switch was pressed */
[patch 01/10] compiler: define __attribute_unused__
For all supported versions of gcc (major version 3 and above), functions and variables may be declared with __attribute__((unused)) to suppress warnings if they are declared but unused.

This should not be confused with __attribute__((used)), which specifies that the function's code shall still be emitted even if it appears to be unreferenced; it is normally needed when the function is referenced only from inline assembly. For gcc 3.4 and later, unreferenced static variables and functions are not emitted, so __attribute__((used)) is necessary to force them to be output. Earlier versions of gcc can simply use __attribute__((unused)) to suppress warnings about such variables: we do not require any special classification to ensure they are emitted.

We introduce __attribute_unused__ for functions and variables that may, due to preprocessor macros, go unreferenced and should not produce a compile warning when they do.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/compiler-gcc.h | 16
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,3 +37,19 @@
 #define noinline	__attribute__((noinline))
 #define __attribute_pure__	__attribute__((pure))
 #define __attribute_const__	__attribute__((__const__))
+
+/*
+ * __attribute_unused__ shall be used for functions or variables to suppress
+ * warnings when they may be declared but, due to preprocessor macros,
+ * commenting, etc., go unreferenced.
+ *
+ * In contrast, __attribute_used__ shall be used only for functions.  gcc <3.4
+ * emits code for static functions that are unreferenced and outputs a warning.
+ * __attribute_used__ will correctly suppress this warning.  gcc >=3.4 does not
+ * emit code for static functions that are unreferenced (and thus there is no
+ * warning), but __attribute_used__ forces the function code to be output.  Use
+ * __attribute_unused__ to suppress warnings about functions being unused or
+ * __attribute_used__ to ensure code is emitted when it is referenced only in
+ * inline assembly.
+ */
+#define __attribute_unused__	__attribute__((unused))
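A user-space sketch contrasting the two definitions the comment above describes: the unconditional "unused" macro added by this patch, and a version-gated "used" macro that degrades to "unused" on older compilers, which is the ambiguity the gdth patch points out. SKETCH_UNUSED and SKETCH_USED are hypothetical names (chosen to avoid clashing with glibc's own __attribute_used__), the 3.4 threshold is taken from the text, and this is not a copy of the kernel's compiler-gcc*.h headers.

```c
/* Unconditional, as patch 01/10 defines __attribute_unused__. */
#define SKETCH_UNUSED __attribute__((unused))

/*
 * Version-gated: "used" on gcc >= 3.4, "unused" before that, following
 * the threshold given in the comment above (sketch only).
 */
#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)
#define SKETCH_USED __attribute__((used))
#else
#define SKETCH_USED __attribute__((unused))
#endif

/* Possibly unreferenced: no warning, and the compiler may drop it. */
static int debug_helper(void) SKETCH_UNUSED;
static int debug_helper(void) { return 1; }

/* Meant to be reached only from inline assembly: kept in the object file. */
static int asm_only_helper(void) SKETCH_USED;
static int asm_only_helper(void) { return 2; }
```

With neither function called and optimization enabled (for example gcc -O2 -c), one would expect nm to list asm_only_helper but not debug_helper, which is the emission difference the comment describes.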
[patch 02/10] i386 pci: type may be unused
In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is unreferenced.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/i386/pci/init.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/i386/pci/init.c b/arch/i386/pci/init.c
--- a/arch/i386/pci/init.c
+++ b/arch/i386/pci/init.c
@@ -6,7 +6,7 @@ in the right sequence from here. */
 static __init int pci_access_init(void)
 {
-	int type = 0;
+	int type __attribute_unused__ = 0;
 
 #ifdef CONFIG_PCI_DIRECT
 	type = pci_direct_probe();
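A minimal sketch of the pattern this patch addresses: a local that is only read inside an #ifdef block. CONFIG_FEATURE_PROBE, probe_hw(), configure_hw() and access_init() are hypothetical stand-ins for the PCI config symbols, pci_direct_probe() and pci_access_init(). The block compiles with the symbol either defined or undefined.

```c
/* Local stand-in for the macro introduced in patch 01/10. */
#ifndef __attribute_unused__
#define __attribute_unused__ __attribute__((unused))
#endif

#ifdef CONFIG_FEATURE_PROBE		/* hypothetical config symbol */
static int probe_hw(void) { return 1; }	/* hypothetical probe */
static int configure_hw(int type) { return type == 1 ? 0 : -1; }
#endif

static int access_init(void)
{
	/*
	 * Read only inside the #ifdef below.  Without the attribute, a build
	 * with CONFIG_FEATURE_PROBE undefined warns that 'type' is unused.
	 */
	int type __attribute_unused__ = 0;
	int status = 0;

#ifdef CONFIG_FEATURE_PROBE
	type = probe_hw();
	status = configure_hw(type);
#endif
	return status;
}
```

Both configurations return 0 here; the attribute only changes which warnings gcc emits, not the generated code.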