Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)

2007-05-01 Thread Jens Axboe
On Fri, Apr 27 2007, Linus Torvalds wrote:
> So I do believe that we could probably do something about the IO 
> scheduling _too_:
> 
>  - break up large write requests (yeah, it will make for worse IO
>    throughput, but if we make it configurable, and especially with
>    controllers that don't have insane overheads per command, the
>    difference between 128kB requests and 16MB requests is probably not
>    really even noticeable - SCSI things with large per-command overheads
>    are just stupid)
>
>    Generating huge requests will automatically mean that they are
>    "unbreakable" from an IO scheduler perspective, so it's bad for latency
>    for other requests once they've started.

Overlooked this one initially... We actually don't generate huge
requests, exactly because of that. Even if the device can do large
requests (most SATA disks today can do 32meg), we default to 512kB as
the largest one that we will build due to file system requests. It's
trivial to reduce that limit, see /sys/block//queue/max_sectors_kb.
That controls the maximum per-request size.
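
As a rough illustration of what that limit means, the sketch below counts how many requests one large write is split into for a given max_sectors_kb value. The helper name is made up for this example, and it ignores the merging and alignment that the real block layer performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: number of requests needed
 * to cover `bytes` when each request is capped at `max_kb` kilobytes
 * (the unit used by max_sectors_kb).
 */
static uint64_t demo_requests_needed(uint64_t bytes, uint32_t max_kb)
{
	uint64_t cap = (uint64_t)max_kb * 1024;

	assert(max_kb != 0);
	return (bytes + cap - 1) / cap;	/* round up to whole requests */
}
```

With the default of 512, a 16MB write becomes 32 requests; lowering the limit to 128 turns it into 128 smaller ones, each of which the IO scheduler can interleave with other work.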

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/16] raid acceleration and asynchronous offload api for 2.6.22

2007-05-01 Thread Nick Piggin

Dan Williams wrote:

> I am pleased to release this latest spin of the raid acceleration
> patches for merge consideration.  This release aims to address all
> pending review items including MD bug fixes and async_tx api changes
> from Neil, and concerns on channel management from Chris and others.
>
> Data integrity tests using home grown scripts and 'iozone -V' are
> passing.  I am open to suggestions for additional testing criteria.


Do you have performance numbers?

--
SUSE Labs, Novell Inc.


Re: [patch 14/22] pollfs: pollable futex

2007-05-01 Thread Davi Arnaut
Eric Dumazet wrote:
> Davi Arnaut wrote:
>> Eric Dumazet wrote:
>>> Davi Arnaut wrote:
>>>> Asynchronously wait for FUTEX_WAKE operation on a futex if it still
>>>> contains a given value. There can be only one futex wait per file
>>>> descriptor. However, it can be rearmed (possibly at a different address)
>>>> anytime.
>>>>
>>>> The pollable futex approach is far superior (send and receive events from
>>>> userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the
>>>> same time.
>>>>
>>>> Building block for pollable semaphores and user-defined events.
>>>>
>>>> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
>>>>
>>>> +
>>>> +struct futex_event {
>>>> +	union {
>>>> +		void __user *addr;
>>>> +		u64 padding;
>>>> +	};
>>>> +	int val;
>>>> +};
>>> Hum... Here we might have a problem with 64 bit futexes, or private futexes
>>>
>>> So I believe this interface is not well defined and not expandable: in case
>>> of future additions to futexes, an old application compiled with an old
>>> pollable futex_event type might fail.
>>>
>> Hmm, how about:
>>
>> struct futex_event {
>>  union {
>>  void __user *addr;
>>  u64 padding;
>>  };
>>  union {
>>  int val;
>>  s64 val64;
>>  };
>>  /* whatever room is necessary for future improvements */
>> };
>>
>> I haven't been keeping up with 64 bit or private futexes. What else
>> could probably go wrong?
> 
> Well, that's the point: this interface is like an ioctl() one: pretty bad
> if not properly designed :)

I was merely mirroring the futex syscall arguments for FUTEX_WAIT. Will
those change? I hope not :)
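
For reference, a minimal userspace sketch of the FUTEX_WAIT arguments being mirrored: an address and the value it is expected to hold. The wrapper names are invented here, and the raw syscall is used because glibc provides no futex() wrapper. It only exercises the "value already changed" path, where the kernel refuses to sleep and returns EAGAIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper: sleep on *addr only if it still contains `expected`. */
static long demo_futex_wait(uint32_t *addr, uint32_t expected)
{
	assert(addr != NULL);
	return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/*
 * Returns 1 when the kernel declined to block because *addr no longer
 * matched `expected` (errno == EAGAIN), 0 otherwise.
 */
static int demo_wait_would_not_block(uint32_t *addr, uint32_t expected)
{
	errno = 0;
	return demo_futex_wait(addr, expected) == -1 && errno == EAGAIN;
}
```

Calling demo_wait_would_not_block() with a value that does match would block until a FUTEX_WAKE arrives, so the sketch should only be tried with a mismatching value.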

> You probably need to stick one field containing one command or version
> number, something like that.

I'm a bit skeptical that we need versioning for such a simple operation
(command) as FUTEX_WAIT that takes an address and a value.

> 
> 
> struct futex_event {
>   int type;
>   union {
>   void __user *addr;
>   u64 padding;
>   };
>   union {
>   int val;
>   s64 val64;
>   };
> };
> 
> #define  FUTEX_EVENT_SHARED32  1
> #define  FUTEX_EVENT_SHARED64  2
> #define  FUTEX_EVENT_PRIVATE32 (128|1)
> #define  FUTEX_EVENT_PRIVATE64 (128|2)

I will take a look at the private futexes patches before commenting further.

> ...
> 
> Also, you should take care of alignment constraints (a 32bit user program
> might run on a 64bit kernel)
>

Compat code? Or futex alignment constraints?
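
One way to sidestep both questions would be a layout built only from fixed-width fields with explicit padding, so that a 32-bit program and a 64-bit kernel agree on every offset. On i386 a u64 inside a struct is only 4-byte aligned, so "int type" followed by a u64 union puts the union at offset 4 there and at offset 8 on x86_64. The sketch below is an assumption-laden illustration, not part of the patch: the struct name, the reserved field and the helper functions are invented, while the FUTEX_EVENT_* values are the ones proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FUTEX_EVENT_SHARED32  1
#define FUTEX_EVENT_SHARED64  2
#define FUTEX_EVENT_PRIVATE32 (128|1)
#define FUTEX_EVENT_PRIVATE64 (128|2)

/* Hypothetical fixed layout: same offsets for 32-bit and 64-bit callers. */
struct demo_futex_event {
	uint32_t type;
	uint32_t reserved;	/* explicit padding, expected to be zero */
	uint64_t addr;		/* user address carried as a 64-bit integer */
	int64_t  val;		/* 32-bit futex values stored sign-extended */
};

/* Compile-time checks that no hidden padding sneaks in. */
static_assert(offsetof(struct demo_futex_event, addr) == 8, "addr offset");
static_assert(offsetof(struct demo_futex_event, val) == 16, "val offset");
static_assert(sizeof(struct demo_futex_event) == 24, "struct size");

/* Bit 7 marks a private futex in the proposed type encoding. */
static int demo_event_is_private(uint32_t type)
{
	return (type & 128) != 0;
}

/* Bit 1 marks a 64-bit futex value in the proposed type encoding. */
static int demo_event_is_64bit(uint32_t type)
{
	return (type & 2) != 0;
}
```

With such a layout no compat translation of the structure would be needed; only the pointer itself has to be converted on the kernel side.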

--
Davi Arnaut



Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Andrew Morton
On Tue, 1 May 2007 23:41:34 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> wrote:

> compiler: define __maybe_unused
> 
> Define __maybe_unused to apply to both functions or variables as
> __attribute__((unused)).  This will not emit a compile-time warning when
> a function or variable is declared but unreferenced.
> 
> We eventually want to change the name of __attribute_used__ to __used.
> 
> Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
> ---
>  include/linux/compiler-gcc.h |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -37,3 +37,4 @@
>  #define  noinline			__attribute__((noinline))
>  #define __attribute_pure__		__attribute__((pure))
>  #define __attribute_const__		__attribute__((__const__))
> +#define __maybe_unused			__attribute__((unused))

Seems sane to me.  We'd need a definition in compiler-intel.h too.  I don't
know if ICC implements __attribute__((unused)) - probably it does.

I guess we can get by without any commentary describing __maybe_unused, but
I think __used would need one - it's pretty obscure.


[EMAIL PROTECTED] @code{used} attribute.
[EMAIL PROTECTED] used
+This attribute, attached to a function, means that code must be emitted
+for the function even if it appears that the function is not referenced.
+This is useful, for example, when the function is referenced only in
+inline assembly.
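
To make the difference between the two attributes concrete, here is a small user-space sketch. The DEMO_* macro names and all functions are invented for illustration (they stand in for __maybe_unused and __used, whose final names were still being discussed), and the example assumes GCC or a compatible compiler.

```c
#include <assert.h>

#define DEMO_MAYBE_UNUSED	__attribute__((unused))
#define DEMO_USED		__attribute__((used))

/* Only called when DEMO_TRACE is defined; 'unused' silences the warning
 * in builds where it has no caller, and the code may then be dropped. */
static DEMO_MAYBE_UNUSED int demo_trace(int v)
{
	return v;
}

/* No C caller at all (imagine a reference from inline assembly only);
 * 'used' tells the compiler to emit the code regardless. */
static DEMO_USED int demo_asm_only_entry(void)
{
	return 42;
}

/* Sum of 1..n; `last` is only read in DEMO_TRACE builds. */
int demo_sum_to(int n)
{
	int i, sum = 0;
	int last DEMO_MAYBE_UNUSED = 0;

	assert(n >= 0);
	for (i = 1; i <= n; i++) {
		sum += i;
		last = i;
	}
#ifdef DEMO_TRACE
	demo_trace(last);
#endif
	return sum;
}
```

In other words, "unused" is a statement about warnings, while "used" is a statement about code emission, which is why the second one deserves the comment.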



Re: Natsemi DP83815 driver spaming

2007-05-01 Thread Tim Hockin

On 5/1/07, Rafal Bilski <[EMAIL PROTECTED]> wrote:


> 2.6.21.1 is first kernel which I'm using at this device. Earlier it was
> WindowsCE terminal. It is hardware fault. Commenting out the code is my
> way to avoid "wakeup" messages in log, but I don't want to change anything
> in vanilla kernel. I'm lucky that NIC is working at all.


I'm not sure what the right answer is.  The code was designed to do
the right thing, and yet in your case it's broken.  Does it need to be
a build option to work around broken hardware?  Yuck.


Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

2007-05-01 Thread Jeremy Fitzhardinge
Chris Wright wrote:
> It simply maps directly to the patch queue.  We do go back and fold
> things in and that should probably be done again, I agree.
>   

Yeah, I've folded them all up now.  Tracking xen-unstable is going to be
trickier though.

J


Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

2007-05-01 Thread Chris Wright
* Herbert Xu ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> > ===
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -1213,10 +1213,10 @@ static int netif_poll(struct net_device 
> 
> Any reason why xen-netfront isn't just in a single patch? It makes
> it a bit hard to review having it scattered around like this.

It simply maps directly to the patch queue.  We do go back and fold
things in and that should probably be done again, I agree.

thanks,
-chris


Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Nick Piggin

Andrew Morton wrote:
> On Tue, 1 May 2007 22:53:52 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> wrote:
>
>> On Wed, 2 May 2007, Alexey Dobriyan wrote:
>>
>>> On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
>>>
>>>> +#define __attribute_unused__   __attribute__((unused))
>>>
>>> Suggest __unused which is shorter and looks compiler-neutral.
>>
>> So you would also suggest renaming __attribute_used__ and all 48 of its
>> uses to __used?
>
> Or __needed or __unneeded.  None of them mean much to me and I'd be forever
> going back to the definition to work out what was intended.
>
> We're still in search of a name, IMO.  But once we have it, yeah, we should
> update all present users.  We can do that over time: retain the old and new
> definitions for a while.


maybe_unused?

The used attribute IMO is a bit easier to parse, so I don't think that
needs to be renamed.

Regarding the used vs needed thing, I don't think needed adds very much
and deviates from gcc terminology. Presumably if something is used it is
needed, and vice versa; similarly for unused.

--
SUSE Labs, Novell Inc.


Re: [patch] CFS scheduler, -v8

2007-05-01 Thread Ingo Molnar

* Mike Galbraith <[EMAIL PROTECTED]> wrote:

> > As usual, any sort of feedback, bugreport, fix and suggestion is 
> > more than welcome,
> 
> Greetings,
> 
> I noticed a (harmless) bounds warning triggered by the reduction in 
> size of array->bitmap.  Patchlet below.

thanks, applied! Your patch should also speed up task selection of RT 
tasks a bit. (the patch removes ~40 bytes of code). And on 64-bit we now 
fit into 2x 64-bit bitmap and thus are now down to two Find-First-Set 
instructions. Nice :)
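
For readers without the tree at hand, the 64-bit path after the patch can be approximated in user space as below. This is a sketch, not the kernel code: the demo_* names are invented, uint64_t stands in for a 64-bit unsigned long, and the GCC builtin __builtin_ctzll() plays the role of __ffs().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for __ffs(): index of the lowest set bit.
 * Like __ffs(), the result is undefined for w == 0. */
static inline int demo_ffs64(uint64_t w)
{
	return __builtin_ctzll(w);
}

/* 100 priority bits fit in two 64-bit words, so at most two
 * find-first-set operations are ever needed. */
static inline int demo_sched_find_first_bit(const uint64_t b[2])
{
	assert(b[0] || b[1]);	/* caller guarantees at least one set bit */
	if (b[0])
		return demo_ffs64(b[0]);
	return demo_ffs64(b[1]) + 64;
}
```

The old 140-bit layout needed a third word on 64-bit, and with it a third branch.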

Ingo


Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread David Rientjes
On Wed, 2 May 2007, Rusty Russell wrote:

> OTOH, your point about "__unneeded" is well taken.  "__needed" and
> "__optional" perhaps?  But their feature is *exactly* that they don't
> look like the gcc attributes, hence avoid their semantic screwage.
> 

Hmm, __optional doesn't sound appropriate either.  Since this is going to 
be defined to be __attribute__ ((unused)), it can apply to both functions 
and variables.  It should be applied to a function if it truly is 
unreferenced within the tree (and there are several examples of this in
current HEAD) and we don't want to use __needed because it still emits the
function code even though it suppresses the warning.  So saying a function 
that has no callers is "__optional" makes no sense since its code isn't 
going to be emitted in gcc >=3.4.

What's your opinion of my __needed and __maybe_unused idea such as the 
following?



compiler: define __maybe_unused

Define __maybe_unused to apply to both functions or variables as
__attribute__((unused)).  This will not emit a compile-time warning when
a function or variable is declared but unreferenced.

We eventually want to change the name of __attribute_used__ to __used.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/compiler-gcc.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,3 +37,4 @@
 #define  noinline			__attribute__((noinline))
 #define __attribute_pure__		__attribute__((pure))
 #define __attribute_const__		__attribute__((__const__))
+#define __maybe_unused			__attribute__((unused))


Re: [patch 14/22] pollfs: pollable futex

2007-05-01 Thread Eric Dumazet

Davi Arnaut wrote:
> Eric Dumazet wrote:
>> Davi Arnaut wrote:
>>> Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains
>>> a given value. There can be only one futex wait per file descriptor. However,
>>> it can be rearmed (possibly at a different address) anytime.
>>>
>>> The pollable futex approach is far superior (send and receive events from
>>> userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same
>>> time.
>>>
>>> Building block for pollable semaphores and user-defined events.
>>>
>>> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
>>>
>>> ---
>>>  fs/pollfs/Makefile |1 
>>>  fs/pollfs/futex.c  |  154 +
>>>  init/Kconfig   |7 ++
>>>  3 files changed, 162 insertions(+)
>>>
>>> Index: linux-2.6/fs/pollfs/Makefile
>>> ===
>>> --- linux-2.6.orig/fs/pollfs/Makefile
>>> +++ linux-2.6/fs/pollfs/Makefile
>>> @@ -3,3 +3,4 @@ pollfs-y := file.o
>>>  
>>>  pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
>>>  pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
>>> +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
>>> Index: linux-2.6/fs/pollfs/futex.c
>>> ===
>>> --- /dev/null
>>> +++ linux-2.6/fs/pollfs/futex.c
>>> @@ -0,0 +1,154 @@
>>> +/*
>>> + * pollable futex
>>> + *
>>> + * Copyright (C) 2007 Davi E. M. Arnaut
>>> + *
>>> + * Licensed under the GNU GPL. See the file COPYING for details.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +struct futex_event {
>>> +	union {
>>> +		void __user *addr;
>>> +		u64 padding;
>>> +	};
>>> +	int val;
>>> +};
>>
>> Hum... Here we might have a problem with 64 bit futexes, or private futexes
>>
>> So I believe this interface is not well defined and not expandable: in case of
>> future additions to futexes, an old application compiled with an old pollable
>> futex_event type might fail.
>
> Hmm, how about:
>
> struct futex_event {
> 	union {
> 		void __user *addr;
> 		u64 padding;
> 	};
> 	union {
> 		int val;
> 		s64 val64;
> 	};
> 	/* whatever room is necessary for future improvements */
> };
>
> I haven't been keeping up with 64 bit or private futexes. What else
> could probably go wrong?

Well, that's the point: this interface is like an ioctl() one: pretty bad if
not properly designed :)

You probably need to stick one field containing one command or version number,
something like that.



struct futex_event {
	int type;
	union {
		void __user *addr;
		u64 padding;
	};
	union {
		int val;
		s64 val64;
	};
};

#define  FUTEX_EVENT_SHARED32  1
#define  FUTEX_EVENT_SHARED64  2
#define  FUTEX_EVENT_PRIVATE32 (128|1)
#define  FUTEX_EVENT_PRIVATE64 (128|2)

...

Also, you should take care of alignment constraints (a 32bit user program
might run on a 64bit kernel)




Re: [patch] CFS scheduler, -v8

2007-05-01 Thread Mike Galbraith
On Tue, 2007-05-01 at 23:22 +0200, Ingo Molnar wrote:
> i'm pleased to announce release -v8 of the CFS scheduler patchset. (The 
> main goal of CFS is to implement "desktop scheduling" with as high 
> quality as technically possible.)
...

> As usual, any sort of feedback, bugreport, fix and suggestion is more 
> than welcome,

Greetings,

I noticed a (harmless) bounds warning triggered by the reduction in size
of array->bitmap.  Patchlet below.

-Mike

  CC  kernel/sched.o
kernel/sched_rt.c: In function ‘load_balance_start_rt’:
include/asm-generic/bitops/sched.h:30: warning: array subscript is above array bounds
kernel/sched_rt.c: In function ‘pick_next_task_rt’:
include/asm-generic/bitops/sched.h:30: warning: array subscript is above array bounds

--- linux-2.6.21-cfs.v8/include/asm-generic/bitops/sched.h.org  2007-05-02 07:16:47.0 +0200
+++ linux-2.6.21-cfs.v8/include/asm-generic/bitops/sched.h  2007-05-02 07:20:45.0 +0200
@@ -6,28 +6,23 @@
 
 /*
  * Every architecture must define this function. It's the fastest
- * way of searching a 140-bit bitmap where the first 100 bits are
- * unlikely to be set. It's guaranteed that at least one of the 140
- * bits is cleared.
+ * way of searching a 100-bit bitmap.  It's guaranteed that at least
+ * one of the 100 bits is cleared.
  */
 static inline int sched_find_first_bit(const unsigned long *b)
 {
 #if BITS_PER_LONG == 64
-	if (unlikely(b[0]))
+	if (b[0])
 		return __ffs(b[0]);
-	if (likely(b[1]))
-		return __ffs(b[1]) + 64;
-	return __ffs(b[2]) + 128;
+	return __ffs(b[1]) + 64;
 #elif BITS_PER_LONG == 32
-	if (unlikely(b[0]))
+	if (b[0])
 		return __ffs(b[0]);
-	if (unlikely(b[1]))
+	if (b[1])
 		return __ffs(b[1]) + 32;
-	if (unlikely(b[2]))
+	if (b[2])
 		return __ffs(b[2]) + 64;
-	if (b[3])
-		return __ffs(b[3]) + 96;
-	return __ffs(b[4]) + 128;
+	return __ffs(b[3]) + 96;
 #else
 #error BITS_PER_LONG not defined
 #endif




Re: Natsemi DP83815 driver spaming

2007-05-01 Thread Rafał Bilski
[...]
> 
>> With code commented out I have 1 error / 3 transmitted packets from
>> DP83815C. I have 1 error / 10 transmitted packets to DP83815C. Maybe
>> it works at all because I have short cable, only 10m long.
>> I don't remember any errors with plain 2.6.21.1.
Sorry. I meant transmission errors, but of course the log was full of "wakeup"
messages.
> Well, I certainly haven't changed anything in there.  If the behavior
> has changed in recent kernels, check the rest of the diffs.
> 
> Tim

2.6.21.1 is first kernel which I'm using at this device. Earlier it was 
WindowsCE terminal. It is hardware fault. Commenting out the code is my 
way to avoid "wakeup" messages in log, but I don't want to change anything 
in vanilla kernel. I'm lucky that NIC is working at all.

Thank You
Rafał





Re: [patch 31/32] xen: --- drivers/net/xen-netfront.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

2007-05-01 Thread Herbert Xu
Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> ===
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1213,10 +1213,10 @@ static int netif_poll(struct net_device 

Any reason why xen-netfront isn't just in a single patch? It makes
it a bit hard to review having it scattered around like this.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Rusty Russell
On Tue, 2007-05-01 at 23:06 -0700, David Rientjes wrote:
> On Wed, 2 May 2007, Rusty Russell wrote:
> 
> > Adding this macro doesn't give us anything that simply saying
> > "__attribute__((unused))" doesn't give.  But it does add a layer of
> > kernel-specific indirection.
> > 
> 
> That's obviously true since we're defining __attribute_unused__ to be 
> __attribute__((unused)).

Hi David,

I'm horribly familiar with this issue, BTW, so we don't need so many
words 8)

> The patched version makes this:
> 
>   int type __attribute_unused__ = 0;
>
> which definitely tells you that you're using a compiler attribute that 
> will be attached to that automatic.  In your case:
> 
>   int type __unneeded = 0;
> 
> doesn't say anything in this case.  It doesn't resemble any attribute that 
> a programmer might be familiar with and begs the question of why we've 
> declared it if it's truly "unneeded"?

Your version makes one wonder why they didn't use
"__attribute__((unused))".  Obviously the __attribute_unused__ macro
exists for a reason, so they wonder what's the difference between that
and the attribute?  The answer: nothing.

OTOH, your point about "__unneeded" is well taken.  "__needed" and
"__optional" perhaps?  But their feature is *exactly* that they don't
look like the gcc attributes, hence avoid their semantic screwage.

> By the way, there are tons of these instances where __attribute__((used)) 
> needs to be added in driver code to suppress unreferenced warnings.

Sure; historically we refactor around it.  But warnings are now so
commonplace few people care 8(

Cheers,
Rusty.




Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread WANG Cong
On Tue, May 01, 2007 at 10:53:52PM -0700, David Rientjes wrote:
>On Wed, 2 May 2007, Alexey Dobriyan wrote:
>
>> On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
>> > +#define __attribute_unused__  __attribute__((unused))
>> 
>> Suggest __unused which is shorter and looks compiler-neutral.
>> 
>
>So you would also suggest renaming __attribute_used__ and all 48 of its 
>uses to __used?

I would suggest that. ;-p
'__attribute_unused__' is really long. I would prefer
'__attribute__((unused))', since your macro doesn't save me many
characters of typing.



Re: [PATCH] edd: Switch to refcounting PCI APIs

2007-05-01 Thread Andrew Morton
On Mon, 23 Apr 2007 14:52:55 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:

> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> 
> diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.21-rc6-mm1/drivers/firmware/edd.c linux-2.6.21-rc6-mm1/drivers/firmware/edd.c
> --- linux.vanilla-2.6.21-rc6-mm1/drivers/firmware/edd.c   2007-04-12 14:14:43.0 +0100
> +++ linux-2.6.21-rc6-mm1/drivers/firmware/edd.c   2007-04-23 11:50:57.185158272 +0100
> @@ -669,7 +669,7 @@
>   struct edd_info *info = edd_dev_get_info(edev);
>  
>   if (edd_dev_is_type(edev, "PCI")) {
> - return pci_find_slot(info->params.interface_path.pci.bus,
> + return pci_get_slot(info->params.interface_path.pci.bus,
>                      PCI_DEVFN(info->params.interface_path.pci.slot,
>                                info->params.interface_path.pci.function));
> @@ -682,9 +682,12 @@
>  {
>  
>   struct pci_dev *pci_dev = edd_get_pci_dev(edev);
> + int ret;
>   if (!pci_dev)
>   return 1;
> - return sysfs_create_link(&edev->kobj,&pci_dev->dev.kobj,"pci_dev");
> + ret = sysfs_create_link(&edev->kobj,&pci_dev->dev.kobj,"pci_dev");
> + pci_dev_put(pci_dev);
> + return ret;
>  }
> 

This escaped notice:

 
drivers/firmware/edd.c: In function 'edd_get_pci_dev':
drivers/firmware/edd.c:673: warning: passing argument 1 of 'pci_get_slot' makes pointer from integer without a cast



But this didn't:

Calling initcall 0xc0534e00: edd_init+0x0/0x2c0()
BIOS EDD facility v0.16 2004-Jun-25, 6 devices found
BUG: unable to handle kernel NULL pointer dereference at virtual address 0014
 printing eip:
c029ed16
*pde = 
Oops:  [#1]
SMP 
Modules linked in:
CPU:1
EIP:0060:[]Not tainted VLI
EFLAGS: 00010286   (2.6.21-mm1 #2)
EIP is at pci_get_slot+0x26/0x90
eax: c04e9280   ebx:    ecx: 0204   edx: 0001
esi: 0020   edi: c0499789   ebp: c242ff30   esp: c242ff18
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 1, ti=c242e000 task=c242d550 task.ti=c242e000)
Stack: c04ff358 000d c242ff30 c01b8b6f c326ec0c c0568ac1 c242ff70 c053505e 
   c326ec18 c04a2d9a 0081 0006   0001  
   c326ec18 c0568a92 c0568a92    c242ffe0 c05185c2 
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa9/0xd0
 [] show_registers+0x1e9/0x2f0
 [] die+0x10f/0x240
 [] do_page_fault+0x2d9/0x610
 [] error_code+0x72/0x78
 [] edd_init+0x25e/0x2c0
 [] kernel_init+0x122/0x2f0
 [] kernel_thread_helper+0x7/0x14
 ===
Code: 5d c3 8d 76 00 55 89 e5 56 89 d6 53 89 c3 83 ec 10 89 e0 25 00 e0 ff ff 
f7 40 14 00 ff ff 0f 75 46 b8 80 92 4e c0 e8 2a 7b e9 ff <8b> 43 14 8d 4b 14 eb 
04 89 f6 89 d0 8b 10 0f 18 02 90 39 c8 74 
EIP: [] pci_get_slot+0x26/0x90 SS:ESP 0068:c242ff18
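
The warning and the oops appear to share one cause: pci_find_slot() took a bus number, whereas pci_get_slot() takes a struct pci_bus pointer and returns the device with a reference held. Passing the integer bus number straight through means a small integer gets dereferenced as a pointer, which would explain a fault just above NULL. Below is a user-space model of that contract; every demo_* name and structure is invented for illustration and none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dev {
	unsigned int devfn;
	int refcount;
};

struct demo_bus {
	unsigned char number;
	struct demo_dev *devs;
	size_t ndevs;
};

/* Translate a bus *number* into a bus *structure*; this is the step the
 * converted caller skipped. */
static struct demo_bus *demo_find_bus(struct demo_bus *buses, size_t n,
				      unsigned char number)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (buses[i].number == number)
			return &buses[i];
	return NULL;
}

/* Models the pci_get_slot() contract: takes a bus pointer and returns
 * the matching device with an extra reference, or NULL. */
static struct demo_dev *demo_get_slot(struct demo_bus *bus, unsigned int devfn)
{
	size_t i;

	if (!bus)	/* the model checks; the oops shows the kernel did not */
		return NULL;
	for (i = 0; i < bus->ndevs; i++) {
		if (bus->devs[i].devfn == devfn) {
			bus->devs[i].refcount++;
			return &bus->devs[i];
		}
	}
	return NULL;
}

/* Models pci_dev_put(): drop the reference taken by demo_get_slot(). */
static void demo_dev_put(struct demo_dev *dev)
{
	if (dev) {
		assert(dev->refcount > 0);
		dev->refcount--;
	}
}
```

The second hunk of the patch (take the reference, create the sysfs link, then put) follows this get/put pairing; only the bus argument in the first hunk needs the extra lookup.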



[PATCH 16/16] iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver

2007-05-01 Thread Dan Williams
Adds the platform device definitions and the architecture specific support
routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* add support for > 1k zero sum buffer sizes
* added dma/aau platform devices to iq80321 and iq80332 setup
* fixed the calculation in iop_desc_is_aligned
* support xor buffer sizes larger than 16MB
* fix places where software descriptors are assumed to be contiguous, only
  hardware descriptors are contiguous for up to a PAGE_SIZE buffer size
* convert to async_tx
* add interrupt support
* add platform devices for 80219 boards
* do not call platform register macros in driver code
* remove switch() statements for compatible register offsets/layouts
* change over to bitmap based capabilities

Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop32x/glantank.c|2 
 arch/arm/mach-iop32x/iq31244.c |5 
 arch/arm/mach-iop32x/iq80321.c |3 
 arch/arm/mach-iop32x/n2100.c   |2 
 arch/arm/mach-iop33x/iq80331.c |3 
 arch/arm/mach-iop33x/iq80332.c |3 
 arch/arm/plat-iop/Makefile |2 
 arch/arm/plat-iop/adma.c   |  216 
 include/asm-arm/arch-iop32x/adma.h |5 
 include/asm-arm/arch-iop33x/adma.h |5 
 include/asm-arm/hardware/iop3xx-adma.h |  893 
 include/asm-arm/hardware/iop3xx.h  |   68 --
 12 files changed, 1147 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mach-iop32x/glantank.c b/arch/arm/mach-iop32x/glantank.c
index 45f4f13..2e0099b 100644
--- a/arch/arm/mach-iop32x/glantank.c
+++ b/arch/arm/mach-iop32x/glantank.c
@@ -180,6 +180,8 @@ static void __init glantank_init_machine(void)
platform_device_register(&iop3xx_i2c1_device);
platform_device_register(&glantank_flash_device);
platform_device_register(&glantank_serial_device);
+   platform_device_register(&iop3xx_dma_0_channel);
+   platform_device_register(&iop3xx_dma_1_channel);
 
pm_power_off = glantank_power_off;
 }
diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c
index 60e7430..c0d077c 100644
--- a/arch/arm/mach-iop32x/iq31244.c
+++ b/arch/arm/mach-iop32x/iq31244.c
@@ -295,9 +295,14 @@ static void __init iq31244_init_machine(void)
platform_device_register(&iop3xx_i2c1_device);
platform_device_register(&iq31244_flash_device);
platform_device_register(&iq31244_serial_device);
+   platform_device_register(&iop3xx_dma_0_channel);
+   platform_device_register(&iop3xx_dma_1_channel);
 
if (is_ep80219())
pm_power_off = ep80219_power_off;
+
+   if (!is_80219())
+   platform_device_register(&iop3xx_aau_channel);
 }
 
 static int __init force_ep80219_setup(char *str)
diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
index 361c70c..474ec2a 100644
--- a/arch/arm/mach-iop32x/iq80321.c
+++ b/arch/arm/mach-iop32x/iq80321.c
@@ -180,6 +180,9 @@ static void __init iq80321_init_machine(void)
platform_device_register(&iop3xx_i2c1_device);
platform_device_register(&iq80321_flash_device);
platform_device_register(&iq80321_serial_device);
+   platform_device_register(&iop3xx_dma_0_channel);
+   platform_device_register(&iop3xx_dma_1_channel);
+   platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ80321, "Intel IQ80321")
diff --git a/arch/arm/mach-iop32x/n2100.c b/arch/arm/mach-iop32x/n2100.c
index 5f07344..8e6fe13 100644
--- a/arch/arm/mach-iop32x/n2100.c
+++ b/arch/arm/mach-iop32x/n2100.c
@@ -245,6 +245,8 @@ static void __init n2100_init_machine(void)
platform_device_register(&iop3xx_i2c0_device);
platform_device_register(&n2100_flash_device);
platform_device_register(&n2100_serial_device);
+   platform_device_register(&iop3xx_dma_0_channel);
+   platform_device_register(&iop3xx_dma_1_channel);
 
pm_power_off = n2100_power_off;
 
diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
index 1a9e361..b4d12bf 100644
--- a/arch/arm/mach-iop33x/iq80331.c
+++ b/arch/arm/mach-iop33x/iq80331.c
@@ -135,6 +135,9 @@ static void __init iq80331_init_machine(void)
platform_device_register(&iop33x_uart0_device);
platform_device_register(&iop33x_uart1_device);
platform_device_register(&iq80331_flash_device);
+   platform_device_register(&iop3xx_dma_0_channel);
+   platform_device_register(&iop3xx_dma_1_channel);
+   platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ80331, "Intel IQ80331")
diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
index 96d6f0f..2abb2d8 100644
--- a/arch/arm/mach-iop33x/iq80332.c
+++ b/arch/arm/mach-iop33x/iq80332.c
@@ -135,6 +135,9 @@ static void __init iq80332_init_machine(void)
platform_device_register(&iop33x_uar

Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread David Rientjes
On Tue, 1 May 2007, David Rientjes wrote:

> The patched version makes this:
> 
>   int type __attribute_unused__ = 0;
> 
> which definitely tells you that you're using a compiler attribute that 
> will be attached to that automatic.  In your case:
> 
>   int type __unneeded = 0;
> 
> doesn't say anything in this case.  It doesn't resemble any attribute that 
> a programmer might be familiar with and begs the question of why we've 
> declared it if it's truly "unneeded"?
> 

One possible way to remedy this situation is with __needed and 
__maybe_unneeded.

__needed would be defined to __attribute__ ((used)), which would apply to 
functions only and specify that its code needs to be emitted anyway even 
though it appears to be unreferenced.  This is needed for gcc >=3.4.  In 
gcc <3.4, this gets defined to be __attribute__ ((unused)) to simply 
suppress the warning.  So now all functions that are unreferenced except 
in inline assembly get __needed appended.

__maybe_unneeded would be defined to __attribute__ ((unused)).  It can 
apply to either functions or variables to suppress the warning if they are 
unused.
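
To make the proposal concrete, here is a minimal sketch of how the two
macros could be defined and used.  The version test, the helper names
(asm_only_helper, check_status) and the placement of the definitions are
assumptions for illustration only; nothing here is taken from an existing
kernel header.

```c
#include <assert.h>

/*
 * Hypothetical definitions following the proposal above; these are
 * not taken from any existing kernel header.
 */
#if defined(__GNUC__) && \
	(__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
#define __needed __attribute__((used))	/* keep the code even if unreferenced */
#else
#define __needed __attribute__((unused))	/* older gcc: just silence the warning */
#endif

#define __maybe_unneeded __attribute__((unused))

/* Stand-in for a function that is referenced only from inline assembly. */
static int __needed asm_only_helper(void)
{
	return 42;
}

static int check_status(int status)
{
	/* Only read by assert(), so it is unused when NDEBUG is defined. */
	int ok __maybe_unneeded = (status == 0);

	assert(ok);
	return status;
}
```

The second function shows the typical __maybe_unneeded case: a variable
that is referenced in some configurations and not in others.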


[PATCH 15/16] iop13xx: Surface the iop13xx adma units to the iop-adma driver

2007-05-01 Thread Dan Williams
Adds the platform device definitions and the architecture specific
support routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* added 'descriptor pool size' to the platform data
* add base support for buffer sizes larger than 16MB (hw max)
* build error fix from Kirill A. Shutemov
* rebase for async_tx changes
* add interrupt support
* do not call platform register macros in driver code
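
As a rough illustration of the "buffer sizes larger than 16MB" item, the
sketch below shows one way a long transfer could be split into pieces that
each fit the hardware limit.  It treats the limit as exactly 16MB, and the
names HW_MAX_BYTE_COUNT, descriptors_needed and chunk_len are hypothetical;
the arch support code in this patch may organise the split differently.

```c
#include <assert.h>
#include <stddef.h>

/* 16MB hardware limit from the changelog; the macro name is hypothetical. */
#define HW_MAX_BYTE_COUNT (16UL * 1024 * 1024)

/* Number of hardware descriptors needed to cover len bytes (round up). */
static size_t descriptors_needed(size_t len)
{
	return (len + HW_MAX_BYTE_COUNT - 1) / HW_MAX_BYTE_COUNT;
}

/* Byte count carried by descriptor idx of a len-byte transfer. */
static size_t chunk_len(size_t len, size_t idx)
{
	size_t off = idx * HW_MAX_BYTE_COUNT;

	assert(off < len);
	return (len - off < HW_MAX_BYTE_COUNT) ? len - off : HW_MAX_BYTE_COUNT;
}
```

Every descriptor except possibly the last one carries a full 16MB; the
last one carries the remainder.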

Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop13xx/setup.c  |  208 
 include/asm-arm/arch-iop13xx/adma.h|  545 
 include/asm-arm/arch-iop13xx/iop13xx.h |   34 +-
 3 files changed, 766 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-iop13xx/setup.c b/arch/arm/mach-iop13xx/setup.c
index 9a46bcd..662d1e2 100644
--- a/arch/arm/mach-iop13xx/setup.c
+++ b/arch/arm/mach-iop13xx/setup.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IOP13XX_UART_XTAL 4000
 #define IOP13XX_SETUP_DEBUG 0
@@ -236,6 +237,129 @@ static unsigned long iq8134x_probe_flash_size(void)
 }
 #endif
 
+/* ADMA Channels */
+static struct resource iop13xx_adma_0_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(0),
+   .end = IOP13XX_ADMA_UPPER_PA(0),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA0_EOT,
+   .end = IRQ_IOP13XX_ADMA0_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA0_EOC,
+   .end = IRQ_IOP13XX_ADMA0_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA0_ERR,
+   .end = IRQ_IOP13XX_ADMA0_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static struct resource iop13xx_adma_1_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(1),
+   .end = IOP13XX_ADMA_UPPER_PA(1),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA1_EOT,
+   .end = IRQ_IOP13XX_ADMA1_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA1_EOC,
+   .end = IRQ_IOP13XX_ADMA1_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA1_ERR,
+   .end = IRQ_IOP13XX_ADMA1_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static struct resource iop13xx_adma_2_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(2),
+   .end = IOP13XX_ADMA_UPPER_PA(2),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA2_EOT,
+   .end = IRQ_IOP13XX_ADMA2_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA2_EOC,
+   .end = IRQ_IOP13XX_ADMA2_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA2_ERR,
+   .end = IRQ_IOP13XX_ADMA2_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static u64 iop13xx_adma_dmamask = DMA_64BIT_MASK;
+static struct iop_adma_platform_data iop13xx_adma_0_data = {
+   .hw_id = 0,
+   .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_1_data = {
+   .hw_id = 1,
+   .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_2_data = {
+   .hw_id = 2,
+   .pool_size = PAGE_SIZE,
+};
+
+/* The ids are fixed up later in iop13xx_platform_init */
+static struct platform_device iop13xx_adma_0_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_0_resources,
+   .dev = {
+   .dma_mask = &iop13xx_adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) &iop13xx_adma_0_data,
+   },
+};
+
+static struct platform_device iop13xx_adma_1_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_1_resources,
+   .dev = {
+   .dma_mask = &iop13xx_adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) &iop13xx_adma_1_data,
+   },
+};
+
+static struct platform_device iop13xx_adma_2_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_2_resources,
+   .dev = {
+   .dma_mask = &iop13xx_adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) &iop13xx_adma_2_data,
+   },
+};
+
 void __init iop13xx_map_io(void)
 {
/* Initialize the Sta

[PATCH 13/16] md: remove raid5 compute_block and compute_parity5

2007-05-01 Thread Dan Williams
replaced by raid5_run_ops

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  124 
 1 files changed, 0 insertions(+), 124 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c9b91e3..74ce354 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1501,130 +1501,6 @@ static void copy_data(int frombio, struct bio *bio,
   }   \
} while(0)
 
-
-static void compute_block(struct stripe_head *sh, int dd_idx)
-{
-   int i, count, disks = sh->disks;
-   void *ptr[MAX_XOR_BLOCKS], *dest, *p;
-
-   PRINTK("compute_block, stripe %llu, idx %d\n", 
-   (unsigned long long)sh->sector, dd_idx);
-
-   dest = page_address(sh->dev[dd_idx].page);
-   memset(dest, 0, STRIPE_SIZE);
-   count = 0;
-   for (i = disks ; i--; ) {
-   if (i == dd_idx)
-   continue;
-   p = page_address(sh->dev[i].page);
-   if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
-   ptr[count++] = p;
-   else
-   printk(KERN_ERR "compute_block() %d, stripe %llu, %d"
-   " not present\n", dd_idx,
-   (unsigned long long)sh->sector, i);
-
-   check_xor();
-   }
-   if (count)
-   xor_block(count, STRIPE_SIZE, dest, ptr);
-   set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
-}
-
-static void compute_parity5(struct stripe_head *sh, int method)
-{
-   raid5_conf_t *conf = sh->raid_conf;
-   int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
-   void *ptr[MAX_XOR_BLOCKS], *dest;
-   struct bio *chosen;
-
-   PRINTK("compute_parity5, stripe %llu, method %d\n",
-   (unsigned long long)sh->sector, method);
-
-   count = 0;
-   dest = page_address(sh->dev[pd_idx].page);
-   switch(method) {
-   case READ_MODIFY_WRITE:
-   BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
-   for (i=disks ; i-- ;) {
-   if (i==pd_idx)
-   continue;
-   if (sh->dev[i].towrite &&
-   test_bit(R5_UPTODATE, &sh->dev[i].flags)) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   chosen = sh->dev[i].towrite;
-   sh->dev[i].towrite = NULL;
-
-   if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
-   wake_up(&conf->wait_for_overlap);
-
-   BUG_ON(sh->dev[i].written);
-   sh->dev[i].written = chosen;
-   check_xor();
-   }
-   }
-   break;
-   case RECONSTRUCT_WRITE:
-   memset(dest, 0, STRIPE_SIZE);
-   for (i= disks; i-- ;)
-   if (i!=pd_idx && sh->dev[i].towrite) {
-   chosen = sh->dev[i].towrite;
-   sh->dev[i].towrite = NULL;
-
-   if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
-   wake_up(&conf->wait_for_overlap);
-
-   BUG_ON(sh->dev[i].written);
-   sh->dev[i].written = chosen;
-   }
-   break;
-   case CHECK_PARITY:
-   break;
-   }
-   if (count) {
-   xor_block(count, STRIPE_SIZE, dest, ptr);
-   count = 0;
-   }
-   
-   for (i = disks; i--;)
-   if (sh->dev[i].written) {
-   sector_t sector = sh->dev[i].sector;
-   struct bio *wbi = sh->dev[i].written;
-   while (wbi && wbi->bi_sector < sector + STRIPE_SECTORS) {
-   copy_data(1, wbi, sh->dev[i].page, sector);
-   wbi = r5_next_bio(wbi, sector);
-   }
-
-   set_bit(R5_LOCKED, &sh->dev[i].flags);
-   set_bit(R5_UPTODATE, &sh->dev[i].flags);
-   }
-
-   switch(method) {
-   case RECONSTRUCT_WRITE:
-   case CHECK_PARITY:
-   for (i=disks; i--;)
-   if (i != pd_idx) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   check_xor();
-   }
-   break;
-   case READ_MODIFY_WRITE:
-   for (i = disks; i--;)
-   if (sh->dev[i].written) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   check_xor();
-   }

[PATCH 14/16] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-05-01 Thread Dan Williams
This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor,
pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy
operations.

Changelog:
* fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few
slots to be requested eventually leading to data corruption
* enabled the slot allocation routine to attempt to free slots before
returning -ENOMEM
* switched the cleanup routine to solely use the software chain and the
status register to determine if a descriptor is complete.  This is
necessary to support other IOP engines that do not have status writeback
capability
* make the driver iop generic
* modified the allocation routines to understand allocating a group of
slots for a single operation
* added a null xor initialization operation for the xor only channel on
iop3xx
* support xor operations on buffers larger than the hardware maximum
* split the do_* routines into separate prep, src/dest set, submit stages
* added async_tx support (dependent operations initiation at cleanup time)
* simplified group handling
* added interrupt support (callbacks via tasklets)
* brought the pending depth in line with ioat (i.e. 4 descriptors)
* drop dma mapping methods, suggested by Chris Leech
* don't use inline in C files, Adrian Bunk
* remove static tasklet declarations
* make iop_adma_alloc_slots easier to read and remove chances for a
corrupted descriptor chain
* fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt
* convert capabilities over to dma_cap_mask_t

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/Kconfig |8 
 drivers/dma/Makefile|1 
 drivers/dma/iop-adma.c  | 1464 +++
 include/asm-arm/hardware/iop_adma.h |  121 +++
 4 files changed, 1594 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 292ddad..1c2ae4e 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -40,4 +40,12 @@ config INTEL_IOATDMA
default m
---help---
  Enable support for the Intel(R) I/OAT DMA engine.
+
+config INTEL_IOP_ADMA
+tristate "Intel IOP ADMA support"
+depends on DMA_ENGINE && (ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX)
+default m
+---help---
+  Enable support for the Intel(R) IOP Series RAID engines.
+
 endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 6a99341..8ebf10d 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
 obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
+obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o
 obj-$(CONFIG_ASYNC_TX_DMA) += async_tx.o xor.o
diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
new file mode 100644
index 000..0d85f12
--- /dev/null
+++ b/drivers/dma/iop-adma.c
@@ -0,0 +1,1464 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports the asynchronous DMA copy and RAID engines available
+ * on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define to_iop_adma_chan(chan) container_of(chan, struct iop_adma_chan, common)
+#define to_iop_adma_device(dev) container_of(dev, struct iop_adma_device, common)
+#define tx_to_iop_adma_slot(tx) container_of(tx, struct iop_adma_desc_slot, async_tx)
+
+/**
+ * iop_adma_free_slots - flags descriptor slots for reuse
+ * @slot: Slot to free
+ * Caller must hold &iop_chan->lock while calling this function
+ */
+static void iop_adma_free_slots(struct iop_adma_desc_slot *slot)
+{
+   int stride = slot->slots_per_op;
+
+   while (stride--) {
+   slot->slots_per_op = 0;
+   slot = list_entry(slot->slot_node.next,
+   struct iop_adma_desc_slot,
+   slot_node);
+   }
+}
+
+static dma_cookie_t
+iop_adma_run_tx_complete_actions(struct iop_adma_desc_slot *desc,
+   stru

[PATCH 12/16] md: move raid5 io requests to raid5_run_ops

2007-05-01 Thread Dan Williams
handle_stripe now only updates the state of stripes.  All execution of
operations is moved to raid5_run_ops.
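
The recurring idiom in the hunks below is "request the I/O operation, but
only count it once per stripe".  A userspace sketch of that idiom follows.
The struct layout, the bit number and the non-atomic test_and_set_bit()
stand-in are assumptions for illustration; the kernel version is atomic
and operates on sh->ops.

```c
#include <assert.h>

/* Hypothetical stand-ins: the bit number and struct layout are assumptions. */
enum { STRIPE_OP_IO = 0 };

struct stripe_ops {
	unsigned long pending;	/* one bit per requested operation type */
	int count;		/* number of distinct operations requested */
};

/* Non-atomic userspace stand-in for the kernel's test_and_set_bit(). */
static int test_and_set_bit(int nr, unsigned long *addr)
{
	unsigned long mask;
	int old;

	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * 8));
	mask = 1UL << nr;
	old = (*addr & mask) != 0;
	*addr |= mask;
	return old;
}

/* Request I/O servicing for a stripe; count the request only once. */
static void request_io(struct stripe_ops *ops)
{
	if (!test_and_set_bit(STRIPE_OP_IO, &ops->pending))
		ops->count++;
}
```

However many devices in the stripe want a read or a write, ops->count
goes up by one, so a single pass of raid5_run_ops can service them all.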

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   68 
 1 files changed, 10 insertions(+), 58 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1966713..c9b91e3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2388,6 +2388,8 @@ static void handle_stripe5(struct stripe_head *sh)
PRINTK("Read_old block %d for r-m-w\n", i);
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantread, &dev->flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
locked++;
} else {
set_bit(STRIPE_DELAYED, &sh->state);
@@ -2408,6 +2410,8 @@ static void handle_stripe5(struct stripe_head *sh)
PRINTK("Read_old block %d for Reconstruct\n", i);
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantread, &dev->flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
locked++;
} else {
set_bit(STRIPE_DELAYED, &sh->state);
@@ -2506,6 +2510,8 @@ static void handle_stripe5(struct stripe_head *sh)
 
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
clear_bit(STRIPE_DEGRADED, &sh->state);
locked++;
set_bit(STRIPE_INSYNC, &sh->state);
@@ -2527,12 +2533,16 @@ static void handle_stripe5(struct stripe_head *sh)
dev = &sh->dev[failed_num];
if (!test_bit(R5_ReWrite, &dev->flags)) {
set_bit(R5_Wantwrite, &dev->flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
set_bit(R5_ReWrite, &dev->flags);
set_bit(R5_LOCKED, &dev->flags);
locked++;
} else {
/* let's read it back */
set_bit(R5_Wantread, &dev->flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
set_bit(R5_LOCKED, &dev->flags);
locked++;
}
@@ -2642,64 +2652,6 @@ static void handle_stripe5(struct stripe_head *sh)
  test_bit(BIO_UPTODATE, &bi->bi_flags)
? 0 : -EIO);
}
-   for (i=disks; i-- ;) {
-   int rw;
-   struct bio *bi;
-   mdk_rdev_t *rdev;
-   if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags))
-   rw = WRITE;
-   else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
-   rw = READ;
-   else
-   continue;
- 
-   bi = &sh->dev[i].req;
- 
-   bi->bi_rw = rw;
-   if (rw == WRITE)
-   bi->bi_end_io = raid5_end_write_request;
-   else
-   bi->bi_end_io = raid5_end_read_request;
- 
-   rcu_read_lock();
-   rdev = rcu_dereference(conf->disks[i].rdev);
-   if (rdev && test_bit(Faulty, &rdev->flags))
-   rdev = NULL;
-   if (rdev)
-   atomic_inc(&rdev->nr_pending);
-   rcu_read_unlock();
- 
-   if (rdev) {
-   if (syncing || expanding || expanded)
-   md_sync_acct(rdev->bdev, STRIPE_SECTORS);
-
-   bi->bi_bdev = rdev->bdev;
-   PRINTK("for %llu schedule op %ld on disc %d\n",
-   (unsigned long long)sh->sector, bi->bi_rw, i);
-   atomic_inc(&sh->count);
-   bi->bi_sector = sh->sector + rdev->data_offset;
-   bi->bi_flags = 1 << BIO_UPTODATE;
-   bi->bi_vcnt = 1;
-   bi->bi_max_vecs 

[PATCH 10/16] md: satisfy raid5 read requests via raid5_run_ops

2007-05-01 Thread Dan Williams
Use raid5_run_ops to carry out the memory copies for a raid5 read request.

Changelog:
* cleanup to_read and to_fill accounting
* do not fail reads that have reached the cache

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   61 ++--
 1 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f8a4522..6bde174 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1998,7 +1998,7 @@ static void handle_stripe5(struct stripe_head *sh)
int i;
int syncing, expanding, expanded;
int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
-   int compute=0, req_compute=0, non_overwrite=0;
+   int to_fill=0, compute=0, req_compute=0, non_overwrite=0;
int failed_num=0;
struct r5dev *dev;
unsigned long pending=0;
@@ -2022,37 +2022,29 @@ static void handle_stripe5(struct stripe_head *sh)
dev = &sh->dev[i];
clear_bit(R5_Insync, &dev->flags);
 
-   PRINTK("check %d: state 0x%lx read %p write %p written %p\n",
-   i, dev->flags, dev->toread, dev->towrite, dev->written);
-   /* maybe we can reply to a read */
-   if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread) {
-   struct bio *rbi, *rbi2;
-   PRINTK("Return read for disc %d\n", i);
-   spin_lock_irq(&conf->device_lock);
-   rbi = dev->toread;
-   dev->toread = NULL;
-   if (test_and_clear_bit(R5_Overlap, &dev->flags))
-   wake_up(&conf->wait_for_overlap);
-   spin_unlock_irq(&conf->device_lock);
-   while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
-   copy_data(0, rbi, dev->page, dev->sector);
-   rbi2 = r5_next_bio(rbi, dev->sector);
-   spin_lock_irq(&conf->device_lock);
-   if (--rbi->bi_phys_segments == 0) {
-   rbi->bi_next = return_bi;
-   return_bi = rbi;
-   }
-   spin_unlock_irq(&conf->device_lock);
-   rbi = rbi2;
-   }
-   }
+   PRINTK("check %d: state 0x%lx toread %p read %p write %p written %p\n",
+   i, dev->flags, dev->toread, dev->read, dev->towrite, dev->written);
+
+   /* maybe we can request a biofill operation
+*
+* new wantfill requests are only permitted while
+* STRIPE_OP_BIOFILL is clear
+*/
+   if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread &&
+   !test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
+   set_bit(R5_Wantfill, &dev->flags);
 
/* now count some things */
if (test_bit(R5_LOCKED, &dev->flags)) locked++;
if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
+
+   if (test_bit(R5_Wantfill, &dev->flags))
+   to_fill++;
+   else if (dev->toread)
+   to_read++;
+
if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1);
 
-   if (dev->toread) to_read++;
if (dev->towrite) {
to_write++;
if (!test_bit(R5_OVERWRITE, &dev->flags))
@@ -2073,9 +2065,13 @@ static void handle_stripe5(struct stripe_head *sh)
set_bit(R5_Insync, &dev->flags);
}
rcu_read_unlock();
+
+   if (to_fill && !test_and_set_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
+   sh->ops.count++;
+
PRINTK("locked=%d uptodate=%d to_read=%d"
-   " to_write=%d failed=%d failed_num=%d\n",
-   locked, uptodate, to_read, to_write, failed, failed_num);
+   " to_write=%d to_fill=%d failed=%d failed_num=%d\n",
+   locked, uptodate, to_read, to_write, to_fill, failed, failed_num);
/* check if the array has lost two devices and, if so, some requests might
 * need to be failed
 */
@@ -2127,9 +2123,12 @@ static void handle_stripe5(struct stripe_head *sh)
bi = bi2;
}
 
-   /* fail any reads if this device is non-operational */
-   if (!test_bit(R5_Insync, &sh->dev[i].flags) ||
-   test_bit(R5_ReadError, &sh->dev[i].flags)) {
+   /* fail any reads if this device is non-operational and
+* the data has not reached the cache yet.
+*/
+   

[PATCH 11/16] md: use async_tx and raid5_run_ops for raid5 expansion operations

2007-05-01 Thread Dan Williams
The parity calculation for an expansion operation is the same as the
calculation performed at the end of a write with the caveat that all blocks
in the stripe are scheduled to be written.  An expansion operation is
identified as a stripe with the POSTXOR flag set and the BIODRAIN flag not
set.

The bulk copy operation to the new stripe is handled inline by async_tx.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   48 
 1 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6bde174..1966713 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2538,18 +2538,32 @@ static void handle_stripe5(struct stripe_head *sh)
}
}
 
-   if (expanded && test_bit(STRIPE_EXPANDING, &sh->state)) {
-   /* Need to write out all blocks after computing parity */
-   sh->disks = conf->raid_disks;
-   sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
-   compute_parity5(sh, RECONSTRUCT_WRITE);
+   /* Finish postxor operations initiated by the expansion
+* process
+*/
+   if (test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete) &&
+   !test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending)) {
+
+   clear_bit(STRIPE_EXPANDING, &sh->state);
+
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+
for (i= conf->raid_disks; i--;) {
-   set_bit(R5_LOCKED, &sh->dev[i].flags);
-   locked++;
set_bit(R5_Wantwrite, &sh->dev[i].flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
}
-   clear_bit(STRIPE_EXPANDING, &sh->state);
-   } else if (expanded) {
+   }
+
+   if (expanded && test_bit(STRIPE_EXPANDING, &sh->state) &&
+   !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
+   /* Need to write out all blocks after computing parity */
+   sh->disks = conf->raid_disks;
+   sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
+   locked += handle_write_operations5(sh, 0, 1);
+   } else if (expanded && !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
clear_bit(STRIPE_EXPAND_READY, &sh->state);
atomic_dec(&conf->reshape_stripes);
wake_up(&conf->wait_for_overlap);
@@ -2560,6 +2574,7 @@ static void handle_stripe5(struct stripe_head *sh)
/* We have read all the blocks in this stripe and now we need to
 * copy some of them into a target stripe for expand.
 */
+   struct dma_async_tx_descriptor *tx = NULL;
clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
for (i=0; i< sh->disks; i++)
if (i != sh->pd_idx) {
@@ -2583,9 +2598,12 @@ static void handle_stripe5(struct stripe_head *sh)
release_stripe(sh2);
continue;
}
-   memcpy(page_address(sh2->dev[dd_idx].page),
-  page_address(sh->dev[i].page),
-  STRIPE_SIZE);
+
+   /* place all the copies on one channel */
+   tx = async_memcpy(sh2->dev[dd_idx].page,
+   sh->dev[i].page, 0, 0, STRIPE_SIZE,
+   ASYNC_TX_DEP_ACK, tx, NULL, NULL);
+
set_bit(R5_Expanded, &sh2->dev[dd_idx].flags);
set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
for (j=0; j<conf->raid_disks; j++)
@@ -2597,6 +2615,12 @@ static void handle_stripe5(struct stripe_head *sh)
set_bit(STRIPE_HANDLE, &sh2->state);
}
release_stripe(sh2);
+
+   /* done submitting copies, wait for them to complete */
+   if (i + 1 >= sh->disks) {
+   async_tx_ack(tx);
+   dma_wait_for_async_tx(tx);
+   }
}
}
 


[PATCH 09/16] md: move raid5 parity checks to raid5_run_ops

2007-05-01 Thread Dan Williams
handle_stripe sets STRIPE_OP_CHECK to request a check operation in
raid5_run_ops.  If raid5_run_ops is able to perform the check with a
dma engine the parity will be preserved in memory removing the need to
re-read it from disk, as is necessary in the synchronous case.

'Repair' operations re-use the same logic as compute block, with the caveat
that the results of the compute block are immediately written back to the
parity disk.  To differentiate these operations the STRIPE_OP_MOD_REPAIR_PD
flag is added.
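
For readers unfamiliar with the zero-sum check, here is a small userspace
sketch of the idea: XOR every block of the stripe, parity included, and
treat an all-zero result as "parity is in sync".  BLOCK_SIZE, xor_into and
xor_zero_sum are hypothetical names; the real check is issued through
async_tx and may run on a dma engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8	/* tiny stand-in for STRIPE_SIZE */

/* XOR one block into an accumulator, byte by byte. */
static void xor_into(unsigned char *dst, const unsigned char *src)
{
	size_t i;

	for (i = 0; i < BLOCK_SIZE; i++)
		dst[i] ^= src[i];
}

/*
 * Returns 0 when the XOR of all blocks (data plus parity) is zero,
 * i.e. the parity matches the data; returns 1 on a mismatch.
 */
static int xor_zero_sum(unsigned char blocks[][BLOCK_SIZE], int nblocks)
{
	unsigned char acc[BLOCK_SIZE];
	size_t i;
	int b;

	assert(nblocks > 0);
	memset(acc, 0, sizeof(acc));
	for (b = 0; b < nblocks; b++)
		xor_into(acc, blocks[b]);

	for (i = 0; i < BLOCK_SIZE; i++)
		if (acc[i])
			return 1;
	return 0;
}
```

Note that the check leaves the blocks themselves untouched, which is why
the parity does not have to be re-read afterwards.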

Changelog:
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   80 
 1 files changed, 61 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 844bd9b..f8a4522 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2430,32 +2430,74 @@ static void handle_stripe5(struct stripe_head *sh)
locked += handle_write_operations5(sh, rcw == 0, 0);
}
 
-   /* maybe we need to check and possibly fix the parity for this stripe
-* Any reads will already have been scheduled, so we just see if enough data
-* is available
+   /* 1/ Maybe we need to check and possibly fix the parity for this stripe.
+*Any reads will already have been scheduled, so we just see if enough data
+*is available.
+* 2/ Hold off parity checks while parity dependent operations are in flight
+*(conflicting writes are protected by the 'locked' variable)
 */
-   if (syncing && locked == 0 &&
-   !test_bit(STRIPE_INSYNC, &sh->state)) {
+   if ((syncing && locked == 0 && !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) &&
+   !test_bit(STRIPE_INSYNC, &sh->state)) ||
+   test_bit(STRIPE_OP_CHECK, &sh->ops.pending) ||
+   test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+
set_bit(STRIPE_HANDLE, &sh->state);
-   if (failed == 0) {
-   BUG_ON(uptodate != disks);
-   compute_parity5(sh, CHECK_PARITY);
-   uptodate--;
-   if (page_is_zero(sh->dev[sh->pd_idx].page)) {
-   /* parity is correct (on disc, not in buffer any more) */
-   set_bit(STRIPE_INSYNC, &sh->state);
-   } else {
-   conf->mddev->resync_mismatches += STRIPE_SECTORS;
-   if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
-   /* don't try to repair!! */
+   /* Take one of the following actions:
+* 1/ start a check parity operation if (uptodate == disks)
+* 2/ finish a check parity operation and act on the result
+* 3/ skip to the writeback section if we previously
+*initiated a recovery operation
+*/
+   if (failed == 0 && !test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+   if (!test_and_set_bit(STRIPE_OP_CHECK, &sh->ops.pending)) {
+   BUG_ON(uptodate != disks);
+   clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+   sh->ops.count++;
+   uptodate--;
+   } else if (test_and_clear_bit(STRIPE_OP_CHECK, &sh->ops.complete)) {
+   clear_bit(STRIPE_OP_CHECK, &sh->ops.ack);
+   clear_bit(STRIPE_OP_CHECK, &sh->ops.pending);
+
+   if (sh->ops.zero_sum_result == 0)
+   /* parity is correct (on disc, not in buffer any more) */
set_bit(STRIPE_INSYNC, &sh->state);
else {
-   compute_block(sh, sh->pd_idx);
-   uptodate++;
+   conf->mddev->resync_mismatches += STRIPE_SECTORS;
+   if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+   /* don't try to repair!! */
+   set_bit(STRIPE_INSYNC, &sh->state);
+   else {
+   set_bit(STRIPE_OP_COMPUTE_BLK,
+   &sh->ops.pending);
+   set_bit(STRIPE_OP_MOD_REPAIR_PD,
+   &sh->ops.pending);
+   set_bit(R5_Wantcompute,
+   
&sh->dev[sh-

[PATCH 08/16] md: move raid5 compute block operations to raid5_run_ops

2007-05-01 Thread Dan Williams
handle_stripe sets STRIPE_OP_COMPUTE_BLK to request servicing from
raid5_run_ops.  It also sets a flag for the block being computed to let
other parts of handle_stripe submit dependent operations.  raid5_run_ops
guarantees that the compute operation completes before any dependent
operation starts.
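
As background, the compute operation itself is plain XOR reconstruction:
zero the target block and XOR every other block of the stripe into it.
The userspace sketch below mirrors the shape of the old synchronous
compute_block(); BLOCK_SIZE and the array-of-blocks layout are assumptions
made to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 8	/* tiny stand-in for STRIPE_SIZE */

/*
 * Rebuild blocks[target] from all the other blocks of the stripe.
 * Works for a data block or for the parity block alike.
 */
static void compute_block(unsigned char blocks[][BLOCK_SIZE], int nblocks,
			  int target)
{
	size_t i;
	int b;

	assert(target >= 0 && target < nblocks);
	memset(blocks[target], 0, BLOCK_SIZE);
	for (b = 0; b < nblocks; b++) {
		if (b == target)
			continue;
		for (i = 0; i < BLOCK_SIZE; i++)
			blocks[target][i] ^= blocks[b][i];
	}
}
```

Anything that consumes the rebuilt block must wait for this to finish,
which is the ordering guarantee described above.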

Changelog:
* remove the req_compute BUG_ON

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  126 +++-
 1 files changed, 94 insertions(+), 32 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 03a435d..844bd9b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1998,7 +1998,7 @@ static void handle_stripe5(struct stripe_head *sh)
int i;
int syncing, expanding, expanded;
int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
-   int non_overwrite = 0;
+   int compute=0, req_compute=0, non_overwrite=0;
int failed_num=0;
struct r5dev *dev;
unsigned long pending=0;
@@ -2050,8 +2050,8 @@ static void handle_stripe5(struct stripe_head *sh)
/* now count some things */
if (test_bit(R5_LOCKED, &dev->flags)) locked++;
if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
+   if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1);
 
-   
if (dev->toread) to_read++;
if (dev->towrite) {
to_write++;
@@ -2206,31 +2206,83 @@ static void handle_stripe5(struct stripe_head *sh)
 * parity, or to satisfy requests
 * or to load a block that is being partially written.
 */
-   if (to_read || non_overwrite || (syncing && (uptodate < disks)) || expanding) {
-   for (i=disks; i--;) {
-   dev = &sh->dev[i];
-   if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
-   (dev->toread ||
-(dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
-syncing ||
-expanding ||
-(failed && (sh->dev[failed_num].toread ||
-(sh->dev[failed_num].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags
-   )
-   ) {
-   /* we would like to get this block, possibly
-* by computing it, but we might not be able to
+   if (to_read || non_overwrite || (syncing && (uptodate + compute < disks)) || expanding ||
+   test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending)) {
+
+   /* Clear completed compute operations.  Parity recovery
+* (STRIPE_OP_MOD_REPAIR_PD) implies a write-back which is handled
+* later on in this routine
+*/
+   if (test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete) &&
+   !test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+   clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete);
+   clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.ack);
+   clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending);
+   }
+
+   /* look for blocks to read/compute, skip this if a compute
+* is already in flight, or if the stripe contents are in the
+* midst of changing due to a write
+*/
+   if (!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) &&
+   !test_bit(STRIPE_OP_PREXOR, &sh->ops.pending) &&
+   !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
+   for (i=disks; i--;) {
+   dev = &sh->dev[i];
+
+   /* don't schedule compute operations or reads on
+* the parity block while a check is in flight
 */
-   if (uptodate == disks-1) {
-   PRINTK("Computing block %d\n", i);
-   compute_block(sh, i);
-   uptodate++;
-   } else if (test_bit(R5_Insync, &dev->flags)) {
-   set_bit(R5_LOCKED, &dev->flags);
-   set_bit(R5_Wantread, &dev->flags);
-   locked++;
-   PRINTK("Reading block %d (sync=%d)\n", 
-   i, syncing);
+   if ((i == sh->pd_idx) && test_bit(STRIPE_OP_CHECK, &sh->ops.pending))
+   continue;
+
+ 

[PATCH 07/16] md: move write operations to raid5_run_ops

2007-05-01 Thread Dan Williams
handle_stripe sets STRIPE_OP_PREXOR, STRIPE_OP_BIODRAIN, STRIPE_OP_POSTXOR
to request a write to the stripe cache.  raid5_run_ops is triggered to run
and executes the request outside the stripe lock.
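
To show why the two write paths request different operations, here is a
sketch of the parity arithmetic with one byte standing in for each block.
A reconstruct-write (rcw) recomputes parity from all data blocks after
the new data has been drained in, so it needs no prexor.  A
read-modify-write (rmw) first XORs the old data out of the parity
(prexor), then XORs the new data in (postxor).  parity_rcw and parity_rmw
are hypothetical helper names, and this is my reading of the stages
rather than a copy of the driver code.

```c
#include <assert.h>

/* Reconstruct-write: parity is the XOR of every (new) data block. */
static unsigned char parity_rcw(const unsigned char *data, int ndata)
{
	unsigned char p = 0;
	int i;

	assert(ndata > 0);
	for (i = 0; i < ndata; i++)
		p ^= data[i];
	return p;
}

/* Read-modify-write: update the existing parity for one changed block. */
static unsigned char parity_rmw(unsigned char parity,
				unsigned char old_data,
				unsigned char new_data)
{
	parity ^= old_data;	/* prexor: remove the old contribution */
	/* biodrain would copy new_data into the stripe cache here */
	parity ^= new_data;	/* postxor: add the new contribution */
	return parity;
}
```

Both paths end with the same parity; rmw simply touches fewer blocks when
only a small part of the stripe is being written.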

Changelog:
* make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil
  Brown
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  151 +---
 1 files changed, 130 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 14e9f6a..03a435d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1807,7 +1807,74 @@ static void compute_block_2(struct stripe_head *sh, int dd_idx1, int dd_idx2)
}
 }
 
+static int handle_write_operations5(struct stripe_head *sh, int rcw, int expand)
+{
+   int i, pd_idx = sh->pd_idx, disks = sh->disks;
+   int locked=0;
+
+   if (rcw) {
+   /* skip the drain operation on an expand */
+   if (!expand) {
+   set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+   sh->ops.count++;
+   }
+
+   set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+   sh->ops.count++;
+
+   for (i=disks ; i-- ;) {
+   struct r5dev *dev = &sh->dev[i];
+
+   if (dev->towrite) {
+   set_bit(R5_LOCKED, &dev->flags);
+   if (!expand)
+   clear_bit(R5_UPTODATE, &dev->flags);
+   locked++;
+   }
+   }
+   } else {
+   BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) ||
+   test_bit(R5_Wantcompute, &sh->dev[pd_idx].flags)));
+
+   set_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+   set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+   set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+
+   sh->ops.count += 3;
+
+   for (i=disks ; i-- ;) {
+   struct r5dev *dev = &sh->dev[i];
+   if (i==pd_idx)
+   continue;
 
+   /* For a read-modify write there may be blocks that are
+* locked for reading while others are ready to be written
+* so we distinguish these blocks by the R5_Wantprexor bit
+*/
+   if (dev->towrite &&
+   (test_bit(R5_UPTODATE, &dev->flags) ||
+   test_bit(R5_Wantcompute, &dev->flags))) {
+   set_bit(R5_Wantprexor, &dev->flags);
+   set_bit(R5_LOCKED, &dev->flags);
+   clear_bit(R5_UPTODATE, &dev->flags);
+   locked++;
+   }
+   }
+   }
+
+   /* keep the parity disk locked while asynchronous operations
+* are in flight
+*/
+   set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
+   clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+   locked++;
+
+   PRINTK("%s: stripe %llu locked: %d pending: %lx\n",
+   __FUNCTION__, (unsigned long long)sh->sector,
+   locked, sh->ops.pending);
+
+   return locked;
+}
 
 /*
  * Each stripe/dev can have one or more bion attached.
@@ -2170,8 +2237,67 @@ static void handle_stripe5(struct stripe_head *sh)
set_bit(STRIPE_HANDLE, &sh->state);
}
 
-   /* now to consider writing and what else, if anything should be read */
-   if (to_write) {
+   /* Now we check to see if any write operations have recently
+* completed
+*/
+
+   /* leave prexor set until postxor is done, allows us to distinguish
+* a rmw from a rcw during biodrain
+*/
+   if (test_bit(STRIPE_OP_PREXOR, &sh->ops.complete) &&
+   test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+   clear_bit(STRIPE_OP_PREXOR, &sh->ops.complete);
+   clear_bit(STRIPE_OP_PREXOR, &sh->ops.ack);
+   clear_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+
+   for (i=disks; i--;)
+   clear_bit(R5_Wantprexor, &sh->dev[i].flags);
+   }
+
+   /* if only POSTXOR is set then this is an 'expand' postxor */
+   if (test_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete) &&
+   test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+   clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete);
+   clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.ack);
+   clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+   clear_bit(STRIPE_OP

[PATCH 06/16] md: use raid5_run_ops for stripe cache operations

2007-05-01 Thread Dan Williams
Each stripe has three flag variables to reflect the state of operations
(pending, ack, and complete).
-pending: set to request servicing in raid5_run_ops
-ack: set to reflect that raid5_run_ops has seen this request
-complete: set when the operation is complete and it is ok for handle_stripe5
to clear 'pending' and 'ack'.
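
The handshake between the three flag words can be sketched in user space.
This is an illustration under assumptions, not the patch's implementation:
struct stripe_ops and the four helper names are hypothetical, and plain
bitwise operations stand in for the kernel's atomic set_bit/test_and_set_bit
and the stripe lock.

```c
#include <assert.h>

/* Simplified, non-atomic stand-in for the per-stripe operation state. */
struct stripe_ops {
	unsigned long pending;	/* requested by handle_stripe5 */
	unsigned long ack;	/* seen by raid5_run_ops */
	unsigned long complete;	/* finished, safe to clear */
};

/* handle_stripe5 side: ask for an operation to be serviced. */
static void request_op(struct stripe_ops *ops, int op)
{
	ops->pending |= 1UL << op;
}

/*
 * raid5_run_ops side: returns 1 if the caller should start the operation,
 * 0 if it was never requested, is already in flight, or already finished.
 */
static int claim_op(struct stripe_ops *ops, int op)
{
	unsigned long bit = 1UL << op;

	if (!(ops->pending & bit) || (ops->complete & bit))
		return 0;
	if (ops->ack & bit)
		return 0;	/* already dequeued once, do not resubmit */
	ops->ack |= bit;
	return 1;
}

/* Completion callback: mark an acknowledged operation as done. */
static void complete_op(struct stripe_ops *ops, int op)
{
	assert(ops->ack & (1UL << op));
	ops->complete |= 1UL << op;
}

/* handle_stripe5 side: retire a finished operation; returns 1 if retired. */
static int retire_op(struct stripe_ops *ops, int op)
{
	unsigned long bit = 1UL << op;

	if (!(ops->complete & bit))
		return 0;
	ops->complete &= ~bit;
	ops->ack &= ~bit;
	ops->pending &= ~bit;
	return 1;
}
```

The second claim_op call on the same operation returns 0, which is the
"only dequeue an operation once" property that check_op() provides in the
diff.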

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   65 +---
 1 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0251bca..14e9f6a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -126,6 +126,7 @@ static void __release_stripe(raid5_conf_t *conf, struct stripe_head *sh)
}
md_wakeup_thread(conf->mddev->thread);
} else {
+   BUG_ON(sh->ops.pending);
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
atomic_dec(&conf->preread_active_stripes);
if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
@@ -225,7 +226,8 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int
 
BUG_ON(atomic_read(&sh->count) != 0);
BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
-   
+   BUG_ON(sh->ops.pending || sh->ops.ack || sh->ops.complete);
+
CHECK_DEVLOCK();
PRINTK("init_stripe called, stripe %llu\n", 
(unsigned long long)sh->sector);
@@ -241,11 +243,11 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int
for (i = sh->disks; i--; ) {
struct r5dev *dev = &sh->dev[i];
 
-   if (dev->toread || dev->towrite || dev->written ||
+   if (dev->toread || dev->read || dev->towrite || dev->written ||
test_bit(R5_LOCKED, &dev->flags)) {
-   printk("sector=%llx i=%d %p %p %p %d\n",
+   printk("sector=%llx i=%d %p %p %p %p %d\n",
   (unsigned long long)sh->sector, i, dev->toread,
-  dev->towrite, dev->written,
+  dev->read, dev->towrite, dev->written,
   test_bit(R5_LOCKED, &dev->flags));
BUG();
}
@@ -325,6 +327,43 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector
return sh;
 }
 
+/* check_op() ensures that we only dequeue an operation once */
+#define check_op(op) do {\
+   if (test_bit(op, &sh->ops.pending) &&\
+   !test_bit(op, &sh->ops.complete)) {\
+   if (test_and_set_bit(op, &sh->ops.ack))\
+   clear_bit(op, &pending);\
+   else\
+   ack++;\
+   } else\
+   clear_bit(op, &pending);\
+} while(0)
+
+/* find new work to run, do not resubmit work that is already
+ * in flight
+ */
+static unsigned long get_stripe_work(struct stripe_head *sh)
+{
+   unsigned long pending;
+   int ack = 0;
+
+   pending = sh->ops.pending;
+
+   check_op(STRIPE_OP_BIOFILL);
+   check_op(STRIPE_OP_COMPUTE_BLK);
+   check_op(STRIPE_OP_PREXOR);
+   check_op(STRIPE_OP_BIODRAIN);
+   check_op(STRIPE_OP_POSTXOR);
+   check_op(STRIPE_OP_CHECK);
+   if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
+   ack++;
+
+   sh->ops.count -= ack;
+   BUG_ON(sh->ops.count < 0);
+
+   return pending;
+}
+
 static int
 raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error);
 static int
@@ -1878,7 +1917,6 @@ static int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks)
  *schedule a write of some buffers
  *return confirmation of parity correctness
  *
- * Parity calculations are done inside the stripe lock
  * buffers are taken off read_list or write_list, and bh_cache buffers
  * get BH_Lock set before the stripe lock is released.
  *
@@ -1896,10 +1934,11 @@ static void handle_stripe5(struct stripe_head *sh)
int non_overwrite = 0;
int failed_num=0;
struct r5dev *dev;
+   unsigned long pending=0;
 
-   PRINTK("handling stripe %llu, cnt=%d, pd_idx=%d\n",
-   (unsigned long long)sh->sector, atomic_read(&sh->count),
-   sh->pd_idx);
+   PRINTK("handling stripe %llu, state=%#lx cnt=%d, pd_idx=%d ops=%lx:%lx:%lx\n",
+  (unsigned long long)sh->sector, sh->state, atomic_read(&sh->count),
+  sh->pd_idx, sh->ops.pending, sh->ops.ack, sh->ops.complete);
 
spin_lock(&sh->lock);
clear_bit(STRIPE_HANDLE, &sh->state);
@@ -2349,8 +2388,14 @@ static void handle_stripe5(struct stripe_head *sh)
}
}
 
+   if (sh->ops.count)
+   pending = get_stripe_work(sh);
+
spin_unlock(&sh->lock)

[PATCH 05/16] md: add raid5_run_ops and support routines

2007-05-01 Thread Dan Williams
Prepare the raid5 implementation to use async_tx for running stripe
operations:
* biofill (copy data into request buffers to satisfy a read request)
* compute block (generate a missing block in the cache from the other
blocks)
* prexor (subtract existing data as part of the read-modify-write process)
* biodrain (copy data out of request buffers to satisfy a write request)
* postxor (recalculate parity for new data that has entered the cache)
* check (verify that the parity is correct)
* io (submit i/o to the member disks)
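
The prexor, biodrain and postxor steps rely on XOR being its own inverse:
subtracting the old data from parity and adding the new data gives the same
parity as recomputing it from every data block.  A minimal user-space sketch
of that identity follows; the helper names (xor_into, rmw_update, parity_ok)
are hypothetical, and ordinary byte loops stand in for whatever async_tx
would dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* dest ^= src, one byte at a time */
static void xor_into(unsigned char *dest, const unsigned char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dest[i] ^= src[i];
}

/* Read-modify-write of a single data block and its parity block. */
static void rmw_update(unsigned char *parity, unsigned char *block,
		       const unsigned char *new_data, size_t len)
{
	assert(parity != block);
	xor_into(parity, block, len);	/* prexor: subtract the old data */
	memcpy(block, new_data, len);	/* biodrain: new data enters the cache */
	xor_into(parity, block, len);	/* postxor: add the new data */
}

/* check: returns 1 if parity equals the XOR of all data blocks, else 0 */
static int parity_ok(const unsigned char *parity,
		     unsigned char *const *blocks, size_t nblocks, size_t len)
{
	size_t i, b;

	for (i = 0; i < len; i++) {
		unsigned char acc = 0;

		for (b = 0; b < nblocks; b++)
			acc ^= blocks[b][i];
		if (acc != parity[i])
			return 0;
	}
	return 1;
}
```

After rmw_update on any one block, parity_ok still holds for the whole
stripe, without the other data blocks having been read.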

Changelog:
* removed ops_complete_biodrain in favor of ops_complete_postxor and
ops_complete_write.
* removed the workqueue
* call bi_end_io for reads in ops_complete_biofill
* explicitly handle the 2-disk raid5 case (xor becomes memcpy)
* fix race between async engines and bi_end_io call for reads, Neil Brown
* remove unnecessary spin_lock from ops_complete_biofill
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
* remove explicit interrupt handling

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  539 
 include/linux/raid/raid5.h |   63 +
 2 files changed, 599 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8702d..0251bca 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -52,6 +52,7 @@
 #include "raid6.h"
 
 #include 
+#include 
 
 /*
  * Stripe cache
@@ -324,6 +325,544 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector
return sh;
 }
 
+static int
+raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error);
+static int
+raid5_end_write_request (struct bio *bi, unsigned int bytes_done, int error);
+
+static void ops_run_io(struct stripe_head *sh)
+{
+   raid5_conf_t *conf = sh->raid_conf;
+   int i, disks = sh->disks;
+
+   might_sleep();
+
+   for (i=disks; i-- ;) {
+   int rw;
+   struct bio *bi;
+   mdk_rdev_t *rdev;
+   if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags))
+   rw = WRITE;
+   else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
+   rw = READ;
+   else
+   continue;
+
+   bi = &sh->dev[i].req;
+
+   bi->bi_rw = rw;
+   if (rw == WRITE)
+   bi->bi_end_io = raid5_end_write_request;
+   else
+   bi->bi_end_io = raid5_end_read_request;
+
+   rcu_read_lock();
+   rdev = rcu_dereference(conf->disks[i].rdev);
+   if (rdev && test_bit(Faulty, &rdev->flags))
+   rdev = NULL;
+   if (rdev)
+   atomic_inc(&rdev->nr_pending);
+   rcu_read_unlock();
+
+   if (rdev) {
+   if (test_bit(STRIPE_SYNCING, &sh->state) ||
+   test_bit(STRIPE_EXPAND_SOURCE, &sh->state) ||
+   test_bit(STRIPE_EXPAND_READY, &sh->state))
+   md_sync_acct(rdev->bdev, STRIPE_SECTORS);
+
+   bi->bi_bdev = rdev->bdev;
+   PRINTK("%s: for %llu schedule op %ld on disc %d\n",
+   __FUNCTION__, (unsigned long long)sh->sector,
+   bi->bi_rw, i);
+   atomic_inc(&sh->count);
+   bi->bi_sector = sh->sector + rdev->data_offset;
+   bi->bi_flags = 1 << BIO_UPTODATE;
+   bi->bi_vcnt = 1;
+   bi->bi_max_vecs = 1;
+   bi->bi_idx = 0;
+   bi->bi_io_vec = &sh->dev[i].vec;
+   bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
+   bi->bi_io_vec[0].bv_offset = 0;
+   bi->bi_size = STRIPE_SIZE;
+   bi->bi_next = NULL;
+   if (rw == WRITE &&
+   test_bit(R5_ReWrite, &sh->dev[i].flags))
+   atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
+   generic_make_request(bi);
+   } else {
+   if (rw == WRITE)
+   set_bit(STRIPE_DEGRADED, &sh->state);
+   PRINTK("skip op %ld on disc %d for sector %llu\n",
+   bi->bi_rw, i, (unsigned long long)sh->sector);
+   clear_bit(R5_LOCKED, &sh->dev[i].flags);
+   set_bit(STRIPE_HANDLE, &sh->state);
+   }
+   }
+}
+
+static struct dma_async_tx_descriptor *
+async_copy_data(int frombio, struct bio *bio, struct page *page, sector_t sector,
+   struct dma_async_tx_descriptor *tx)
+{
+   struct bio_vec *bvl;
+   struct page *bio_page;
+   int i;
+   int page_offset;
+
+   

[PATCH 04/16] dmaengine: add the async_tx api

2007-05-01 Thread Dan Williams
The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the api can optimize for asynchronous operation and
the api will fit the chain of operations to the available offload
resources. 
 
Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api.  A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided.  With the
iop-adma driver and async_tx, raid456 is able to offload copy, xor, and
xor-zero-sum operations to hardware engines.
 
On iop342 tiobench showed higher throughput for sequential writes (20 -
30% improvement) and sequential reads to a degraded array (40 - 55%
improvement).  For the other cases performance was roughly equal, +/- a
few percentage points.  On a x86-smp platform the performance of the
async_tx implementation (in synchronous mode) was also +/- a few
percentage points of the original implementation.  According to 'top'
CPU utilization was positively affected in the offload case, but exact
measurements have yet to be taken.
 
The tiobench command line used for testing was:
tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Xor operations are handled by async_tx, to this end xor.c is moved into
drivers/dma and is changed to take an explicit destination address and a
series of sources to match the hardware engine implementation.

When CONFIG_DMA_ENGINE is not set the asynchronous path is compiled
away.

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled
* BUG_ON xor source counts <= 1
* implicitly handle hardware concerns like channel switching and
  interrupts, Neil Brown
* remove the per operation type list, and distribute operation capabilities
  evenly amongst the available channels
* simplify async_tx_find_channel to optimize the fast path

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/Makefile |1 
 drivers/dma/Kconfig  |   15 +
 drivers/dma/Makefile |1 
 drivers/dma/async_tx.c   |  889 ++
 drivers/dma/xor.c|  153 
 drivers/md/Kconfig   |3 
 drivers/md/Makefile  |6 
 drivers/md/raid5.c   |   52 +--
 drivers/md/xor.c |  154 
 include/linux/async_tx.h |  173 +
 include/linux/raid/xor.h |5 
 11 files changed, 1263 insertions(+), 189 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 3a718f5..2e8de9e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_I2C) += i2c/
 obj-$(CONFIG_W1)   += w1/
 obj-$(CONFIG_HWMON)+= hwmon/
 obj-$(CONFIG_PHONE)+= telephony/
+obj-$(CONFIG_ASYNC_TX_DMA) += dma/
 obj-$(CONFIG_MD)   += md/
 obj-$(CONFIG_BT)   += bluetooth/
 obj-$(CONFIG_ISDN) += isdn/
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 30d021d..292ddad 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -7,8 +7,8 @@ menu "DMA Engine support"
 config DMA_ENGINE
bool "Support for DMA engines"
---help---
- DMA engines offload copy operations from the CPU to dedicated
- hardware, allowing the copies to happen asynchronously.
+  DMA engines offload bulk memory operations from the CPU to dedicated
+  hardware, allowing the operations to happen asynchronously.
 
 comment "DMA Clients"
 
@@ -22,6 +22,16 @@ config NET_DMA
  Since this is the main user of the DMA engine, it should be enabled;
  say Y here.
 
+config ASYNC_TX_DMA
+   tristate "Asynchronous Bulk Memory Transfers/Transforms API"
+   ---help---
+ This enables the async_tx management layer for dma engines.
+ Subsystems coded to this API will use offload engines for bulk
+ memory operations where present.  Software implementations are
+ called when a dma engine is not present or fails to allocate
+ memory to carry out the transaction.
+ Current subsystems ported to async_tx: MD_RAID4,5
+
 comment "DMA Devices"
 
 config INTEL_IOATDMA
@@ -30,5 +40,4 @@ config INTEL_IOATDMA
default m
---help---
  Enable support for the Intel(R) I/OAT DMA engine.
-
 endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index bdcfdbd..6a99341 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_DMA_ENGINE) += d

[PATCH 03/16] ARM: Add drivers/dma to arch/arm/Kconfig

2007-05-01 Thread Dan Williams
Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/Kconfig |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e7baca2..74077e3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -997,6 +997,8 @@ source "drivers/mmc/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
 
 source "fs/Kconfig"


Re: [patch 14/22] pollfs: pollable futex

2007-05-01 Thread Davi Arnaut
Eric Dumazet wrote:
> Davi Arnaut a écrit :
>> Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains
>> a given value. There can be only one futex wait per file descriptor. However,
>> it can be rearmed (possibly at a different address) anytime.
>>
>> The pollable futex approach is far superior (send and receive events from
>> userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same
>> time.
>>
>> Building block for pollable semaphores and user-defined events.
>>
>> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
>>
>> ---
>>  fs/pollfs/Makefile |1 
>>  fs/pollfs/futex.c  |  154 
>> +
>>  init/Kconfig   |7 ++
>>  3 files changed, 162 insertions(+)
>>
>> Index: linux-2.6/fs/pollfs/Makefile
>> ===
>> --- linux-2.6.orig/fs/pollfs/Makefile
>> +++ linux-2.6/fs/pollfs/Makefile
>> @@ -3,3 +3,4 @@ pollfs-y := file.o
>>  
>>  pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
>>  pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
>> +pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
>> Index: linux-2.6/fs/pollfs/futex.c
>> ===
>> --- /dev/null
>> +++ linux-2.6/fs/pollfs/futex.c
>> @@ -0,0 +1,154 @@
>> +/*
>> + * pollable futex
>> + *
>> + * Copyright (C) 2007 Davi E. M. Arnaut
>> + *
>> + * Licensed under the GNU GPL. See the file COPYING for details.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +struct futex_event {
>> +union {
>> +void __user *addr;
>> +u64 padding;
>> +};
>> +int val;
>> +};
> 
> Hum... Here we might have a problem with 64 bit futexes, or private futexes
> 
> So I believe this interface is not well defined and not expandable: in case
> of future additions to futexes, an old application compiled with an old
> pollable futex_event type might fail.
> 

Hmm, how about:

struct futex_event {
union {
void __user *addr;
u64 padding;
};
union {
int val;
s64 val64;
};
/* whatever room is necessary for future improvements */
};

I haven't been keeping up with 64 bit or private futexes. What else
could probably go wrong?
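
One property of the layout above can be checked from user space: with both
members padded to 64 bits, the record size and field offsets do not depend on
pointer width.  The sketch below is an assumption-laden mirror, not the
proposed kernel header: the struct name futex_event_sketch is made up, void *
replaces the __user pointer, and stdint types replace u64/s64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the proposed layout (C11 anonymous unions). */
struct futex_event_sketch {
	union {
		void *addr;		/* 4 or 8 bytes depending on the ABI */
		uint64_t padding;	/* forces the slot to 8 bytes */
	};
	union {
		int32_t val;		/* 32-bit futex value */
		int64_t val64;		/* room for a 64-bit futex value */
	};
};

/* The record size must be the same for 32-bit and 64-bit callers. */
static_assert(sizeof(struct futex_event_sketch) == 16,
	      "futex_event_sketch must be 16 bytes on every ABI");

/* Offset of the value slot, identical for val and val64. */
static size_t futex_event_val_offset(void)
{
	return offsetof(struct futex_event_sketch, val);
}
```

The check says nothing about private futexes.  It also does not cover byte
order: on a big-endian machine val overlays the high half of val64, so a
reader needs some way to know which member was written.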

--
Davi Arnaut



[PATCH 02/16] dmaengine: move channel management to the client

2007-05-01 Thread Dan Williams
This effectively makes channels a shared resource rather than tying them
to a specific client.  dmaengine now assumes that clients will internally
track how many channels they need, and dmaengine will learn if the client
cares about a channel at dma_event_callback time.  This also enables a
client to ignore a channel if it does not meet extra client specific
constraints beyond simple base capabilities.
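
The base capability test amounts to a subset check on bitmaps.  The sketch
below models it in user space with a plain unsigned int instead of
dma_cap_mask_t; the CAP_* values and the offer_channels helper are
hypothetical and only illustrate a client recording the channels it accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the patch uses a dma_cap_mask_t bitmap. */
enum {
	CAP_MEMCPY   = 1u << 0,
	CAP_XOR      = 1u << 1,
	CAP_ZERO_SUM = 1u << 2,
};

/* A channel qualifies when it offers every capability the client wants. */
static int chan_satisfies_mask(unsigned int chan_caps, unsigned int want)
{
	return (chan_caps & want) == want;
}

/*
 * Offer each channel to a client.  The client, not the core, records the
 * indices it takes.  Returns the number of channels accepted.
 */
static size_t offer_channels(const unsigned int *chan_caps, size_t nchans,
			     unsigned int want, size_t *taken, size_t max_taken)
{
	size_t i, n = 0;

	assert(taken != NULL || max_taken == 0);
	for (i = 0; i < nchans && n < max_taken; i++)
		if (chan_satisfies_mask(chan_caps[i], want))
			taken[n++] = i;
	return n;
}
```

A client with extra constraints would add its own test next to
chan_satisfies_mask and simply decline channels that fail it.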

This patch also fixes up the NET_DMA client to use the new mechanism.

Cc: Chris Leech <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  206 ++---
 drivers/dma/ioatdma.c |1 
 drivers/dma/ioatdma.h |3 -
 include/linux/dmaengine.h |   46 +-
 net/core/dev.c|  106 ---
 5 files changed, 198 insertions(+), 164 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 8a49103..1a26ce3 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -37,8 +37,8 @@
  * Each device has a channels list, which runs unlocked but is never modified
  * once the device is registered, it's just setup by the driver.
  *
- * Each client has a channels list, it's only modified under the client->lock
- * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ * Each client is responsible for keeping track of the channels it uses.  See
+ * the definition of dma_event_callback in dmaengine.h.
  *
  * Each device has a kref, which is initialized to 1 when the device is
  * registered. A kref_put is done for each class_device registered.  When the
@@ -51,10 +51,12 @@
  * references to finish.
  *
  * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
- * with a kref and a per_cpu local_t.  A single reference is set when on an
- * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
- * extra reference per outstanding transaction.  The relase function does a
- * kref_put on the device. -ChrisL
+ * with a kref and a per_cpu local_t.  A dma_chan_get is called when a client
+ * signals that it wants to use a channel, and dma_chan_put is called when
+ * a channel is removed or a client using it is unregistered.  A client can
+ * take extra references per outstanding transaction, as is the case with
+ * the NET DMA client.  The release function does a kref_put on the device.
+ * -ChrisL, DanW
  */
 
 #include 
@@ -102,8 +104,18 @@ static ssize_t show_bytes_transferred(struct class_device *cd, char *buf)
 static ssize_t show_in_use(struct class_device *cd, char *buf)
 {
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+   int in_use = 0;
+
+   if (unlikely(chan->slow_ref) && atomic_read(&chan->refcount.refcount) > 1)
+   in_use = 1;
+   else {
+   if (local_read(&(per_cpu_ptr(chan->local,
+   get_cpu())->refcount)) > 0)
+   in_use = 1;
+   put_cpu();
+   }
 
-   return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+   return sprintf(buf, "%d\n", in_use);
 }
 
 static struct class_device_attribute dma_class_attrs[] = {
@@ -129,42 +141,50 @@ static struct class dma_devclass = {
 
 /* --- client and device registration --- */
 
+#define dma_async_chan_satisfies_mask(chan, mask) __dma_async_chan_satisfies_mask((chan), &(mask))
+static int __dma_async_chan_satisfies_mask(struct dma_chan *chan, dma_cap_mask_t *want)
+{
+   dma_cap_mask_t has;
+
+   bitmap_and(has.bits, want->bits, chan->device->cap_mask.bits, DMA_TX_TYPE_END);
+   return bitmap_equal(want->bits, has.bits, DMA_TX_TYPE_END);
+}
+
 /**
- * dma_client_chan_alloc - try to allocate a channel to a client
+ * dma_client_chan_alloc - try to allocate channels to a client
  * @client: &dma_client
  *
  * Called with dma_list_mutex held.
  */
-static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+static void dma_client_chan_alloc(struct dma_client *client)
 {
struct dma_device *device;
struct dma_chan *chan;
-   unsigned long flags;
int desc;   /* allocated descriptor count */
+   int ack; /* client has taken a reference to this channel */
 
-   /* Find a channel, any DMA engine will do */
-   list_for_each_entry(device, &dma_device_list, global_node) {
+   /* Find a channel */
+   list_for_each_entry(device, &dma_device_list, global_node)
list_for_each_entry(chan, &device->channels, device_node) {
-   if (chan->client)
+   if (!dma_async_chan_satisfies_mask(chan, client->cap_mask))
continue;
 
desc = chan->device->device_alloc_chan_resources(chan);
if (desc >= 0) {
-   kref_get(&device->refcount);
-   kref_init(&chan->refcount);
-

Re: [patch 0/3] Clocksource / clockevent updates

2007-05-01 Thread Andrew Morton
On Wed, 02 May 2007 08:09:29 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote:
> > On Mon, 30 Apr 2007 10:43:31 -
> > Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > 
> > > Andrew,
> > > 
> > > please pick up the following updates to clocksource / clockevents:
> > > 
> > > - Fixups to the resume logic
> > > - Keep TSC stable, when lapic_timer_c2_ok is set
> > > 
> > 
> > Should we be targetting these at 2.6.20.x?
> 
> 2.6.21.x ?
> 
> Hmm. They should get some testing first, but otherwise yes.
> 

OK, I added the cc.  The second patch won't apply to 2.6.21 when we get
around to it, but it'll be pretty simple to repair.


[PATCH 01/16] dmaengine: add base support for the async_tx api

2007-05-01 Thread Dan Williams
In preparation for the async_tx (dmaengine client) API this patch:
1/ introduces struct dma_async_tx_descriptor as a common field for all
   dmaengine software descriptors.  The primary role of this structure
   is to enable callbacks at transaction completion time, and support
   transaction chains that span multiple channels
2/ converts the device_memcpy_* methods into separate prep, set
   src/dest, and submit stages
3/ adds support for capabilities beyond memcpy (xor, memset, xor zero
   sum, completion interrupts).  place holders for future capabilities
   are also included
4/ converts ioatdma to the new semantics

Changelog:
* drop dma mapping methods, suggested by Chris Leech
* fix ioat_dma_dependency_added, also caught by Andrew Morton
* fix dma_sync_wait, change from Andrew Morton
* uninline large functions, change from Andrew Morton
* add tx->callback = NULL to dmaengine calls to interoperate with async_tx
  calls
* hookup ioat_tx_submit
* convert channel capabilities to a 'cpumask_t like' bitmap

Cc: Chris Leech <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  182 +
 drivers/dma/ioatdma.c |  248 -
 drivers/dma/ioatdma.h |8 +
 include/linux/dmaengine.h |  245 
 4 files changed, 454 insertions(+), 229 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 322ee29..8a49103 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -59,6 +59,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -66,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_MUTEX(dma_list_mutex);
 static LIST_HEAD(dma_device_list);
@@ -165,6 +167,24 @@ static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
return NULL;
 }
 
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
+{
+   enum dma_status status;
+   unsigned long dma_sync_wait_timeout = jiffies + msecs_to_jiffies(5000);
+
+   dma_async_issue_pending(chan);
+   do {
+   status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
+   if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
+   printk(KERN_ERR "dma_sync_wait_timeout!\n");
+   return DMA_ERROR;
+   }
+   } while (status == DMA_IN_PROGRESS);
+
+   return status;
+}
+EXPORT_SYMBOL(dma_sync_wait);
+
 /**
  * dma_chan_cleanup - release a DMA channel's resources
  * @kref: kernel reference structure that contains the DMA channel device
@@ -322,6 +342,28 @@ int dma_async_device_register(struct dma_device *device)
if (!device)
return -ENODEV;
 
+   /* validate device routines */
+   BUG_ON(dma_has_cap(DMA_MEMCPY, device->cap_mask) &&
+   !device->device_prep_dma_memcpy);
+   BUG_ON(dma_has_cap(DMA_XOR, device->cap_mask) &&
+   !device->device_prep_dma_xor);
+   BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) &&
+   !device->device_prep_dma_zero_sum);
+   BUG_ON(dma_has_cap(DMA_MEMSET, device->cap_mask) &&
+   !device->device_prep_dma_memset);
+   BUG_ON(dma_has_cap(DMA_INTERRUPT, device->cap_mask) &&
+   !device->device_prep_dma_interrupt);
+
+   BUG_ON(!device->device_alloc_chan_resources);
+   BUG_ON(!device->device_free_chan_resources);
+   BUG_ON(!device->device_tx_submit);
+   BUG_ON(!device->device_set_dest);
+   BUG_ON(!device->device_set_src);
+   BUG_ON(!device->device_dependency_added);
+   BUG_ON(!device->device_is_tx_complete);
+   BUG_ON(!device->device_issue_pending);
+   BUG_ON(!device->dev);
+
init_completion(&device->done);
kref_init(&device->refcount);
device->dev_id = id++;
@@ -397,6 +439,146 @@ void dma_async_device_unregister(struct dma_device *device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * @len: length
+ *
+ * Both @dest and @src must be mappable to a bus address according to the
+ * DMA mapping API rules for streaming mappings.
+ * Both @dest and @src must stay memory resident (kernel memory or locked
+ * user space pages).
+ */
+dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+void *dest, void *src, size_t len)
+{
+   struct dma_device *dev = chan->device;
+   struct dma_async_tx_descriptor *tx;
+   dma_addr_t addr;
+   dma_cookie_t cookie;
+   int cpu;
+
+   tx = dev->device_prep_dma_memcpy(chan, len, 0);
+   if (!tx)
+   return -ENOMEM;
+
+   tx->ack = 1;
+   tx->callback = NULL;
+   addr = dma_map_single(dev->dev, src, len, 

[PATCH 00/16] raid acceleration and asynchronous offload api for 2.6.22

2007-05-01 Thread Dan Williams
I am pleased to release this latest spin of the raid acceleration
patches for merge consideration.  This release aims to address all
pending review items including MD bug fixes and async_tx api changes
from Neil, and concerns on channel management from Chris and others.

Data integrity tests using home grown scripts and 'iozone -V' are
passing.  I am open to suggestions for additional testing criteria.  I
have also verified that git bisect is not broken by this set.

The short log below highlights the most recent changes.  The patches
will be sent as a reply to this message, and they are also available via
git:

git pull git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus

Additional comments and feedback welcome.

Thanks,
Dan

--
01/16: dmaengine: add base support for the async_tx api
* convert channel capabilities to a 'cpumask_t like' bitmap
02/16: dmaengine: move channel management to the client
* this patch is new to this series
03/16: ARM: Add drivers/dma to arch/arm/Kconfig
04/16: dmaengine: add the async_tx api
* remove the per operation type list, and distribute operation
  capabilities evenly amongst the available channels
* simplify async_tx_find_channel to optimize the fast path
05/16: md: add raid5_run_ops and support routines
* explicitly handle the 2-disk raid5 case (xor becomes memcpy)
* fix race between async engines and bi_end_io call for reads,
  Neil Brown
* remove unnecessary spin_lock from ops_complete_biofill
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
* remove explicit interrupt handling, Neil Brown
06/16: md: use raid5_run_ops for stripe cache operations
07/16: md: move write operations to raid5_run_ops
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
08/16: md: move raid5 compute block operations to raid5_run_ops
* remove the req_compute BUG_ON
09/16: md: move raid5 parity checks to raid5_run_ops
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
10/16: md: satisfy raid5 read requests via raid5_run_ops
* cleanup to_read and to_fill accounting
* do not fail reads that have reached the cache
11/16: md: use async_tx and raid5_run_ops for raid5 expansion operations
12/16: md: move raid5 io requests to raid5_run_ops
13/16: md: remove raid5 compute_block and compute_parity5
14/16: dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
* fix locking bug in iop_adma_alloc_chan_resources, Benjamin
  Herrenschmidt
* convert capabilities over to dma_cap_mask_t
15/16: iop13xx: Surface the iop13xx adma units to the iop-adma driver
16/16: iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver

(previous release: http://marc.info/?l=linux-raid&m=117463257423193&w=2)


Re: [ckrm-tech] [PATCH 1/9] Containers (V9): Basic container framework

2007-05-01 Thread Paul Jackson
Balbir wrote:
> Would it be possible to extract those test cases and integrate them
> with a testing framework like LTP? Do you have any regression test
> suite for cpusets that can be made available publicly so that
> any changes to cpusets can be validated?

There are essentially two sorts of cpuset regression tests of interest.

I have one such test, and the batch scheduler developers have various
tests of their batch schedulers.

1) Testing batch schedulers against cpusets:

I doubt that the batch scheduler developers would be able to
extract a cpuset test from their tests, or be able to share it if
they did.  Their tests tend to be large tests of batch schedulers,
and only incidentally test cpusets -- if we break cpusets,
in sometimes even subtle ways that they happen to depend on,
we break them.

Sometimes there is no way to guess exactly what sorts of changes
will break their code; we'll just have to schedule at least one
run through one or more of them that rely heavily on cpusets
before a change as big as rebasing cpusets on containers is
reasonably safe.  This test cycle won't be all that easy, so I'd
wait until we are pretty close to what we think should be taken
into the mainline kernel.

I suppose I will have to be the one co-ordinating this test,
as I am the only one I know with a presence in both camps.

Once this test is done, from then forward, if we break them,
we'll just have to deal with it as we do now, when the breakage
shows up well down stream from the main kernel tree, at the point
that a major batch scheduler release runs into a major distribution
release containing the breakage.  There is no practical way that I
can see, as an ongoing basis, to continue testing for such breakage
with every minor change to cpuset related code in the kernel.  Any
breakage found this way is dealt with by changes in user level code.

Once again, I have bcc'd one or more developers of batch schedulers,
so they can see what nonsense I am spouting about them now ;).

2) Testing cpusets with a specific test.

There I can do better.  Attached is the cpuset regression test I
use.  It requires at least 4 cpus and 2 memory nodes to do anything
useful.  It is copyrighted by SGI and released under the GPL.

This regression test is the primary cpuset test upon which I
relied during the development of cpusets, and continue to rely.
Except for one subtle race condition in the test itself, it has
not changed in the last two to three years.

This test requires no user level code not found in an ordinary
distro.  It does require the taskset and numactl commands,
for the purposes of testing certain interactions with them.
It assumes that there are no other cpusets currently set up in
the system that happen to conflict with the ones it creates.

See further comments within the test script itself.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


cpuset_test
Description: Binary data


Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread David Rientjes
On Wed, 2 May 2007, Rusty Russell wrote:

> Adding this macro doesn't give us anything that simply saying
> "__attribute__((unused))" doesn't give.  But it does add a layer of
> kernel-specific indirection.
> 

That's obviously true since we're defining __attribute_unused__ to be 
__attribute__((unused)).

We were trying to clean up the misconception that the current 
__attribute_used__ was created to suppress warnings when, in fact, that 
was not its purpose.  It was created to emit the code for a function that 
appeared to be unreferenced and only suppressed warnings as a side-effect 
in gcc <3.4.

> If we're going to get kernel-specific, I'd prefer to see:
> 
> __needed: suppress warning and don't discard,

That would be the current definition of __attribute_used__ (i.e. we're 
saying that we use the function in inline assembly even though it appears 
we don't use it at all).

> __unneeded: suppress warning and might discard.
> 

That would be the patched definition of __attribute_unused__.

So let's go back to the problem this was initially supposed to fix from 
arch/i386/pci/init.c:

static __init int pci_access_init(void)
{
	int type = 0;

#ifdef CONFIG_PCI_DIRECT
	type = pci_direct_probe();
#endif
#ifdef CONFIG_PCI_MMCONFIG
	pci_mmcfg_init(type);
#endif
	...

and type is unreferenced for the remainder of the function.  Obviously we 
could add #if defined(CONFIG_PCI_DIRECT) || defined(CONFIG_PCI_MMCONFIG) 
before the declaration of 'type', but that becomes sloppy pretty quickly.

The patched version makes this:

int type __attribute_unused__ = 0;

which definitely tells you that you're using a compiler attribute that 
will be attached to that automatic.  In your case:

int type __unneeded = 0;

doesn't say anything in this case.  It doesn't resemble any attribute that 
a programmer might be familiar with and begs the question of why we've 
declared it if it's truly "unneeded"?
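
A minimal userspace sketch of the case above, assuming GCC or Clang. Only the
macro definition comes from the patch; the DEMO_* symbols and demo_* helpers
are hypothetical stand-ins for the CONFIG_PCI_* options and the PCI probe
functions.

```c
/* Definition proposed in the patch under discussion. */
#define __attribute_unused__ __attribute__((unused))

#ifdef DEMO_PCI_DIRECT
/* Hypothetical stand-in for pci_direct_probe(). */
static int demo_direct_probe(void)
{
	return 1;
}
#endif

#ifdef DEMO_PCI_MMCONFIG
/* Hypothetical stand-in for pci_mmcfg_init(). */
static int demo_last_type = -1;

static void demo_mmcfg_init(int type)
{
	demo_last_type = type;
}
#endif

/*
 * With neither DEMO_* symbol defined, 'type' is initialized but never
 * read, which normally draws an unused-variable warning.  The attribute
 * suppresses that warning without wrapping the declaration in #if.
 */
int demo_access_init(void)
{
	int type __attribute_unused__ = 0;

#ifdef DEMO_PCI_DIRECT
	type = demo_direct_probe();
#endif
#ifdef DEMO_PCI_MMCONFIG
	demo_mmcfg_init(type);
#endif
	return 0;
}
```

Building this with -Wall, with and without -DDEMO_PCI_DIRECT and
-DDEMO_PCI_MMCONFIG, should stay warning-free in every combination.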

By the way, there are tons of these instances where __attribute__((unused)) 
needs to be added in driver code to suppress unreferenced warnings.

David


Re: [patch 0/3] Clocksource / clockevent updates

2007-05-01 Thread Thomas Gleixner
On Tue, 2007-05-01 at 17:33 -0700, Andrew Morton wrote:
> On Mon, 30 Apr 2007 10:43:31 -
> Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> 
> > Andrew,
> > 
> > please pick up the following updates to clocksource / clockevents:
> > 
> > - Fixups to the resume logic
> > - Keep TSC stable, when lapic_timer_c2_ok is set
> > 
> 
> Should we be targeting these at 2.6.20.x?

2.6.21.x ?

Hmm. They should get some testing first, but otherwise yes.

tglx




Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Andrew Morton
On Tue, 1 May 2007 22:53:52 -0700 (PDT) David Rientjes <[EMAIL PROTECTED]> 
wrote:

> On Wed, 2 May 2007, Alexey Dobriyan wrote:
> 
> > On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
> > > +#define __attribute_unused__ __attribute__((unused))
> > 
> > Suggest __unused which is shorter and looks compiler-neutral.
> > 
> 
> So you would also suggest renaming __attribute_used__ and all 48 of its 
> uses to __used?

Or __needed or __unneeded.  None of them mean much to me and I'd be forever
going back to the definition to work out what was intended.

We're still in search of a name, IMO.  But once we have it, yeah, we should
update all present users.  We can do that over time: retain the old and new
definitions for a while.



Re: [patch 00/22] pollfs: filesystem abstraction for pollable objects

2007-05-01 Thread Andrew Morton
On Wed, 02 May 2007 02:22:35 -0300 Davi Arnaut <[EMAIL PROTECTED]> wrote:

> This patch set introduces a new file system for the delivery of pollable
> events through file descriptors. To the detriment of debugability, pollable
> objects are a nice adjunct to nonblocking/epoll/event-based servers.
> 
> The pollfs filesystem abstraction provides better mechanisms needed for
> creating and maintaining pollable objects. Also the pollable futex approach
> is far superior (send and receive events from userspace or kernel) to eventfd
> and fixes (supersedes) FUTEX_FD at the same time.
> 
> The (non) blocking and object size (user <-> kernel) semantics are handled
> internally, decoupling the core filesystem from the "subsystems" (mere push
> and pop operations).
> 
> Currently implemented waitable "objects" are: signals, futexes, ai/o blocks
> and timers.

Well that throws a spanner in the signalfd works.  The code _looks_ nice
and simple and clean from a quick scan.

David, could you provide some feedback please?  The patches are stunningly
free of comments, but you used to do that to me pretty often so my sympathy
is limited ;)




Re: [patch 14/22] pollfs: pollable futex

2007-05-01 Thread Eric Dumazet

Davi Arnaut wrote:

Asynchronously wait for FUTEX_WAKE operation on a futex if it still contains
a given value. There can be only one futex wait per file descriptor. However,
it can be rearmed (possibly at a different address) anytime.

The pollable futex approach is far superior (send and receive events from
userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same
time.

Building block for pollable semaphores and user-defined events.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/Makefile |    1 
 fs/pollfs/futex.c  |  154 +
 init/Kconfig       |    7 ++
 3 files changed, 162 insertions(+)

Index: linux-2.6/fs/pollfs/Makefile
===
--- linux-2.6.orig/fs/pollfs/Makefile
+++ linux-2.6/fs/pollfs/Makefile
@@ -3,3 +3,4 @@ pollfs-y := file.o
 
 pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
 pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
+pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
Index: linux-2.6/fs/pollfs/futex.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/futex.c
@@ -0,0 +1,154 @@
+/*
+ * pollable futex
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct futex_event {
+   union {
+           void __user *addr;
+           u64 padding;
+   };
+   int val;
+};


Hum... Here we might have a problem with 64 bit futexes, or private futexes

So I believe this interface is not well defined and not expandable: in case of 
future additions to futexes, an old application compiled with an old pollable 
futex_event type might fail.
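
One way to read the concern, as a hedged userspace sketch: the quoted struct
fixes the address slot at 8 bytes and the value at sizeof(int), with no
spare field for a wider value or for flags.  The demo_* names are
hypothetical; 'void __user *' is replaced by a plain pointer and u64 by
uint64_t so the layout can be inspected outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct futex_event quoted above. */
struct demo_futex_event {
	union {
		void *addr;        /* 4 or 8 bytes depending on the ABI */
		uint64_t padding;  /* forces the slot to 8 bytes */
	};
	int val;                   /* futex value to compare against */
};

/* Offset of 'val': the union occupies the first 8 bytes on any ABI. */
size_t demo_val_offset(void)
{
	return offsetof(struct demo_futex_event, val);
}

/* Width of the value field: too narrow for a 64-bit futex value. */
size_t demo_val_size(void)
{
	return sizeof(((struct demo_futex_event *)0)->val);
}
```

Because the record has no version or flags member, any later extension
(a 64-bit value, a private-futex flag) would have to change its size or
meaning, which is what would break binaries built against the old layout.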






Re: Natsemi DP83815 driver spaming

2007-05-01 Thread Rafał Bilski
>> > >  * 2) check for sudden death of the NIC:
>> > >  *    It seems that a reference set for this chip went out with incorrect info,
>> > >  *    and there exist boards that aren't quite right.  An unexpected voltage
>> > >  *    drop can cause the PHY to get itself in a weird state (basically reset).
>> > >  *    NOTE: this only seems to affect revC chips.
>>
>> > Code commented out and NIC is working OK. Strange.
>> > eth0: DSPCFG accepted after 0 usec.
>> > eth0: link up.
>> > eth0: Setting full-duplex based on negotiated link capability.
>> > dspcfg = 0x  np->dspcfg = 0x5060
>>
>> Oh, that's entertaining.  I have to confess that I've never seen an that
>> triggered the workaround before - adding the maintainer, Tim Hockin, who
>> may be able to shed some light on the expected behaviour here?
> 
> It's been quite a while since I dealt with this issue, so I am going
> on faulty memory.  A particular reference design for this chip had bad
> resistor values, or something similar.  That caused the chip to get
> very very confused and need a reset.
Can you send me documentation? I can't find anything in the datasheet. 
I will replace the bad resistors with correct ones.
> So the driver is finding your chip to be hosed over and over again.
> dspcfg = 0x00 is bad.  I'd be very surprised if you don't get
> other weirdness - bad performance or noise or who knows what.
No. It is much better. Far fewer packets need to be retransmitted.
I was blaming w3cache.tkdami.net earlier.
> You could take out the error message and just let the driver do its
> thing, or you can try to run with that logic removed.  But I'd measure
> both and see what they do.  Specifically - look  for packet errors.
With code commented out I have 1 error / 3 transmitted packets from 
DP83815C. I have 1 error / 10 transmitted packets to DP83815C. Maybe 
it works at all because I have short cable, only 10m long.
I don't remember any errors with plain 2.6.21.1.
> Tim
Rafał




Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread David Rientjes
On Wed, 2 May 2007, Alexey Dobriyan wrote:

> On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
> > +#define __attribute_unused__   __attribute__((unused))
> 
> Suggest __unused which is shorter and looks compiler-neutral.
> 

So you would also suggest renaming __attribute_used__ and all 48 of its 
uses to __used?


[patch 02/22] pollfs: file system operations

2007-05-01 Thread Davi Arnaut
The key feature of the pollfs file operations is to internally handle
pollable (waitable) resources as files without exporting complex and
bug-prone underlying (VFS) implementation details.

All resource handlers are required to implement the read, write, poll,
release operations and must not block.
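
A hedged userspace model of the non-blocking batching that
pfs_read_nonblock() in this patch performs: the handler is called once per
record until it reports an error, and partial progress wins over the error.
The demo_* names and the queue handler are hypothetical; the caller is
assumed to pass nr >= 1, as pfs_read() does.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical handler type: 0 on success, negative errno otherwise. */
typedef int (*demo_read_fn)(void *data, void *obj);

/* Copy out up to 'nr' records of 'rsize' bytes without blocking. */
ssize_t demo_read_nonblock(demo_read_fn read_one, void *data,
			   char *obj, size_t nr, size_t rsize)
{
	ssize_t count = 0;
	int res = 0;

	do {
		res = read_one(data, obj);
		if (res)
			break;
		count++;
		obj += rsize;
	} while (--nr);

	if (count)
		return count * (ssize_t)rsize;  /* bytes delivered so far */
	return res ? res : -EAGAIN;
}

/* Hypothetical resource: a counter of pending events. */
struct demo_queue {
	int pending;
};

/* Non-blocking handler: returns -EAGAIN instead of waiting. */
int demo_pop(void *data, void *obj)
{
	struct demo_queue *q = data;
	int value;

	if (!q->pending)
		return -EAGAIN;
	value = --q->pending;
	memcpy(obj, &value, sizeof(value));
	return 0;
}
```

With two events pending and room for four records, the first call returns
the size of two records and the next call returns -EAGAIN.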

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/Makefile        |    1 
 fs/pollfs/Makefile |    2 
 fs/pollfs/file.c   |  238 +
 init/Kconfig       |    6 +
 4 files changed, 247 insertions(+)

Index: linux-2.6/fs/pollfs/file.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/file.c
@@ -0,0 +1,238 @@
+/*
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define POLLFS_MAGIC 0x9a6afcd
+
+MODULE_LICENSE("GPL");
+
+/* pollfs vfsmount entry */
+static struct vfsmount *pfs_mnt;
+
+/* pollfs file operations */
+static const struct file_operations pfs_fops;
+
+static inline ssize_t
+pfs_read_nonblock(const struct pfs_operations *fops, void *data,
+ void __user *obj, size_t nr)
+{
+   ssize_t count = 0, res = 0;
+
+   do {
+   res = fops->read(data, obj);
+   if (res)
+   break;
+   count++;
+   obj += fops->rsize;
+   } while (--nr);
+
+   if (count)
+   return count * fops->rsize;
+   else if (res)
+   return res;
+   else
+   return -EAGAIN;
+}
+
+static inline ssize_t
+pfs_read_block(const struct pfs_operations *fops, void *data,
+  wait_queue_head_t *wait, void __user *obj, size_t nr)
+{
+   ssize_t count;
+
+   do {
+   count = pfs_read_nonblock(fops, data, obj, nr);
+   if (count != -EAGAIN)
+   break;
+   count = wait_event_interruptible((*wait), fops->poll(data));
+   } while (!count);
+
+   return count;
+}
+
+static ssize_t pfs_read(struct file *filp, char __user *buf,
+   size_t count, loff_t * pos)
+{
+   size_t nevents = count;
+   struct pfs_file *pfs = filp->private_data;
+   const struct pfs_operations *fops = pfs->fops;
+
+   if (fops->rsize)
+   nevents /= fops->rsize;
+   else
+   nevents = 1;
+
+   if (!nevents)
+   return -EINVAL;
+
+   if (filp->f_flags & O_NONBLOCK)
+   return pfs_read_nonblock(fops, pfs->data, buf, nevents);
+   else
+   return pfs_read_block(fops, pfs->data, pfs->wait, buf, nevents);
+}
+
+static ssize_t pfs_write(struct file *filp, const char __user *buf,
+size_t count, loff_t * ppos)
+{
+   ssize_t res = 0;
+   size_t nevents = count;
+   struct pfs_file *pfs = filp->private_data;
+   const struct pfs_operations *fops = pfs->fops;
+
+   if (fops->wsize)
+   nevents /= fops->wsize;
+   else
+   nevents = 1;
+
+   if (!nevents)
+   return -EINVAL;
+
+   count = 0;
+
+   do {
+   res = fops->write(pfs->data, buf);
+   if (res)
+   break;
+   count++;
+   buf += fops->wsize;
+   } while (--nevents);
+
+   if (count)
+   return count * fops->wsize;
+   else if (res)
+   return res;
+   else
+   return 0;
+}
+
+static unsigned int pfs_poll(struct file *filp, struct poll_table_struct *wait)
+{
+   int ret = 0;
+   struct pfs_file *pfs = filp->private_data;
+
+   poll_wait(filp, pfs->wait, wait);
+
+   if (pfs->fops->poll)
+   ret = pfs->fops->poll(pfs->data);
+   else
+   ret = POLLIN;
+
+   return ret;
+}
+
+static int pfs_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+   struct pfs_file *pfs = filp->private_data;
+
+   return (pfs->fops->mmap) ? pfs->fops->mmap(pfs->data, vma) : -ENODEV;
+}
+
+static int pfs_release(struct inode *inode, struct file *filp)
+{
+   struct pfs_file *pfs = filp->private_data;
+
+   return pfs->fops->release(pfs->data);
+}
+
+static const struct file_operations pfs_fops = {
+   .poll = pfs_poll,
+   .mmap = pfs_mmap,
+   .read = pfs_read,
+   .write = pfs_write,
+   .release = pfs_release
+};
+
+long pfs_open(struct pfs_file *pfs)
+{
+   int fd;
+   struct file *filp;
+   const struct pfs_operations *fops = pfs->fops;
+
+   if (IS_ERR(pfs_mnt))
+   return -ENOSYS;
+
+   if (!fops->poll || (!fops->read || !fops->write))
+   return -EINVAL;
+
+   fd = get_unused_fd();
+   if (fd < 0)
+   return -ENFILE;
+
+   filp = get_empty_filp();
+ 

[PATCH] x86_64: O_EXCL on /dev/mcelog

2007-05-01 Thread Tim Hockin
From: Tim Hockin <[EMAIL PROTECTED]>

Background:
 /dev/mcelog is a clear-on-read interface.  It is currently possible for
 multiple users to open and read() the device.  Users are protected from
 each other during any one read, but not across reads.

Description:
 This patch adds support for O_EXCL to /dev/mcelog.  If a user opens the
 device with O_EXCL, no other user may open the device (EBUSY).  Likewise,
 any user that tries to open the device with O_EXCL while another user has
 the device will fail (EBUSY).

Result:
 Applications can get exclusive access to /dev/mcelog.  Applications that
 do not care will be unchanged.

Alternatives:
 A simpler choice would be to only allow one open() at all, regardless of
 O_EXCL.

Testing:
 I wrote an application that opens /dev/mcelog with O_EXCL and observed
 that any other app that tried to open /dev/mcelog would fail until the
 exclusive app had closed the device.

Caveats:
 None.

Patch:
 This patch is against 2.6.21-rc7.

Signed-off-by: Tim Hockin <[EMAIL PROTECTED]>

---

This is the first version of this patch.  The simpler alternative
of only one open() sounds better to me, but becomes a net change in
behavior.
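
The O_EXCL rule from the description can be modelled in userspace as below.
This is only a sketch of the state machine in mce_open()/mce_release(): the
demo_* names are hypothetical and the spinlock is left out, so it is not
safe for concurrent callers.

```c
#include <errno.h>
#include <fcntl.h>

/* Mirrors the two counters the patch adds. */
struct demo_mce_state {
	int open_count;  /* number of current openers */
	int open_exclu;  /* set while an O_EXCL opener holds the device */
};

/* Returns 0 on success or -EBUSY if exclusivity would be violated. */
int demo_mce_open(struct demo_mce_state *s, int f_flags)
{
	if (s->open_exclu || (s->open_count && (f_flags & O_EXCL)))
		return -EBUSY;

	if (f_flags & O_EXCL)
		s->open_exclu = 1;
	s->open_count++;
	return 0;
}

/* An exclusive opener is always the only opener, so clearing is safe. */
void demo_mce_release(struct demo_mce_state *s)
{
	s->open_count--;
	s->open_exclu = 0;
}
```

A plain open followed by an O_EXCL open fails with -EBUSY, and so does any
open attempted while an O_EXCL opener holds the device.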


diff -pruN linux-2.6.20+th/arch/x86_64/kernel/mce.c linux-2.6.20+th1.5/arch/x86_64/kernel/mce.c
--- linux-2.6.20+th/arch/x86_64/kernel/mce.c    2007-04-27 14:19:08.0 -0700
+++ linux-2.6.20+th1.5/arch/x86_64/kernel/mce.c 2007-05-01 21:53:10.0 -0700
@@ -465,6 +465,40 @@ void __cpuinit mcheck_init(struct cpuinf
  * Character device to read and clear the MCE log.
  */
 
+static DEFINE_SPINLOCK(mce_state_lock);
+static int open_count; /* #times opened */
+static int open_exclu; /* already open exclusive? */
+
+static int mce_open(struct inode *inode, struct file *file)
+{
+   spin_lock(&mce_state_lock);
+
+   if (open_exclu || (open_count && (file->f_flags & O_EXCL))) {
+   spin_unlock(&mce_state_lock);
+   return -EBUSY;
+   }
+
+   if (file->f_flags & O_EXCL)
+   open_exclu = 1;
+   open_count++;
+
+   spin_unlock(&mce_state_lock);
+
+   return 0;
+}
+
+static int mce_release(struct inode *inode, struct file *file)
+{
+   spin_lock(&mce_state_lock);
+
+   open_count--;
+   open_exclu = 0;
+
+   spin_unlock(&mce_state_lock);
+
+   return 0;
+}
+
 static void collect_tscs(void *data) 
 { 
unsigned long *cpu_tsc = (unsigned long *)data;
@@ -553,6 +587,8 @@ static int mce_ioctl(struct inode *i, st
 }
 
 static const struct file_operations mce_chrdev_ops = {
+   .open = mce_open,
+   .release = mce_release,
.read = mce_read,
.ioctl = mce_ioctl,
 };


[patch 06/22] pollfs: export the plsignal system call

2007-05-01 Thread Davi Arnaut
Export the new plsignal syscall prototype. While there, make it "conditional".
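
For readers unfamiliar with "conditional" syscalls: the idea is a weak
fallback that reports -ENOSYS when the real implementation is configured
out, so the syscall table still links.  The sketch below shows that idea
with a weak function definition (GCC/Clang); it is not the kernel's
cond_syscall() macro, and the demo_* names are hypothetical.

```c
#include <errno.h>

/*
 * Weak default.  If another object file provides a strong definition of
 * demo_sys_plsignal(), the linker picks that one instead.
 */
__attribute__((weak)) long demo_sys_plsignal(const void *set)
{
	(void)set;
	return -ENOSYS;
}

/* Stand-in for the syscall table entry: always safe to call. */
long demo_call_plsignal(const void *set)
{
	return demo_sys_plsignal(set);
}
```

Built on its own, with no strong definition linked in, the call returns
-ENOSYS.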

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/syscalls.h |    2 ++
 kernel/sys_ni.c          |    1 +
 2 files changed, 3 insertions(+)

Index: linux-2.6/include/linux/syscalls.h
===
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -605,4 +605,6 @@ asmlinkage long sys_getcpu(unsigned __us
 
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
+asmlinkage long sys_plsignal(const sigset_t __user * set);
+
 #endif
Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -112,6 +112,7 @@ cond_syscall(sys_vm86old);
 cond_syscall(sys_vm86);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);
+cond_syscall(sys_plsignal);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);



[patch 04/22] pollfs: pollable signal

2007-05-01 Thread Davi Arnaut
Retrieve multiple per-process signals through a file descriptor. The mask
of signals can be changed at any time. Also, the compat code can be kept
very simple.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/Makefile |    2 
 fs/pollfs/signal.c |  144 +
 init/Kconfig       |    7 ++
 3 files changed, 153 insertions(+)

Index: linux-2.6/fs/pollfs/signal.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/signal.c
@@ -0,0 +1,144 @@
+/*
+ * sigtimedwait4, retrieve multiple signals with one call.
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pfs_signal {
+   sigset_t set;
+   spinlock_t lock;
+   struct task_struct *task;
+   struct pfs_file file;
+};
+
+static void inline sigset_adjust(sigset_t *set)
+{
+   /* SIGKILL and SIGSTOP cannot be caught, blocked, or ignored */
+   sigdelsetmask(set, sigmask(SIGKILL) | sigmask(SIGSTOP));
+
+   /* Signals we don't want to dequeue */
+   signotset(set);
+}
+
+static ssize_t read(struct pfs_signal *evs, siginfo_t __user *infoup)
+{
+   int signo;
+   siginfo_t info;
+
+   signo = dequeue_signal_lock(evs->task, &evs->set, &info);
+   if (!signo)
+   return -EAGAIN;
+
+   if (copy_siginfo_to_user(infoup, &info))
+   return -EFAULT;
+
+   return 0;
+}
+
+static ssize_t write(struct pfs_signal *evs, const sigset_t __user *uset)
+{
+   sigset_t set;
+
+   if (copy_from_user(&set, uset, sizeof(sigset_t)))
+   return -EFAULT;
+
+   sigset_adjust(&set);
+
+   spin_lock_irq(&evs->lock);
+   sigemptyset(&evs->set);
+   sigorsets(&evs->set, &evs->set, &set);
+   spin_unlock_irq(&evs->lock);
+
+   return 0;
+}
+
+static int poll(struct pfs_signal *evs)
+{
+   int ret = 0;
+   sigset_t pending;
+   unsigned long flags;
+
+   rcu_read_lock();
+
+   if (!lock_task_sighand(evs->task, &flags))
+   goto out_unlock;
+
+   sigorsets(&pending, &evs->task->pending.signal,
+ &evs->task->signal->shared_pending.signal);
+
+   unlock_task_sighand(evs->task, &flags);
+
+   spin_lock_irqsave(&evs->lock, flags);
+   signandsets(&pending, &pending, &evs->set);
+   spin_unlock_irqrestore(&evs->lock, flags);
+
+   if (!sigisemptyset(&pending))
+   ret = POLLIN;
+
+out_unlock:
+   rcu_read_unlock();
+
+   return ret;
+}
+
+static int release(struct pfs_signal *evs)
+{
+   put_task_struct(evs->task);
+   kfree(evs);
+
+   return 0;
+}
+
+static const struct pfs_operations signal_ops = {
+   .read   = PFS_READ(read, struct pfs_signal, siginfo_t),
+   .write  = PFS_WRITE(write, struct pfs_signal, sigset_t),
+   .poll   = PFS_POLL(poll, struct pfs_signal),
+   .release= PFS_RELEASE(release, struct pfs_signal),
+   .rsize  = sizeof(siginfo_t),
+   .wsize  = sizeof(sigset_t),
+};
+
+asmlinkage long sys_plsignal(const sigset_t __user *uset)
+{
+   long error;
+   struct pfs_signal *evs;
+
+   evs = kmalloc(sizeof(*evs), GFP_KERNEL);
+   if (!evs)
+   return -ENOMEM;
+
+   if (copy_from_user(&evs->set, uset, sizeof(sigset_t))) {
+   kfree(evs);
+   return -EFAULT;
+   }
+
+   spin_lock_init(&evs->lock);
+
+   evs->task = current;
+   get_task_struct(current);
+
+   sigset_adjust(&evs->set);
+
+   evs->file.data = evs;
+   evs->file.fops = &signal_ops;
+   evs->file.wait = &evs->task->sigwait;
+
+   error = pfs_open(&evs->file);
+   if (error < 0)
+   release(evs);
+
+   return error;
+}
Index: linux-2.6/fs/pollfs/Makefile
===
--- linux-2.6.orig/fs/pollfs/Makefile
+++ linux-2.6/fs/pollfs/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_POLLFS) += pollfs.o
 pollfs-y := file.o
+
+pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
Index: linux-2.6/init/Kconfig
===
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -469,6 +469,13 @@ config POLLFS
help
 Pollfs support
 
+config POLLFS_SIGNAL
+   bool "Enable pollfs signal" if EMBEDDED
+   default y
+   depends on POLLFS
+   help
+Pollable signal support
+
 config SHMEM
bool "Use full shmem filesystem" if EMBEDDED
default y



[patch 05/22] pollfs: pollable signal compat code

2007-05-01 Thread Davi Arnaut
Compat handlers for the pollable signal operations. Later the compat operations
can operate on a per call basis.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/signal.c |   85 +
 1 file changed, 85 insertions(+)

Index: linux-2.6/fs/pollfs/signal.c
===
--- linux-2.6.orig/fs/pollfs/signal.c
+++ linux-2.6/fs/pollfs/signal.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pfs_signal {
sigset_t set;
@@ -48,6 +49,24 @@ static ssize_t read(struct pfs_signal *e
return 0;
 }
 
+#ifdef CONFIG_COMPAT
+static ssize_t compat_read(struct pfs_signal *evs,
+  struct compat_siginfo __user *infoup)
+{
+   int signo;
+   siginfo_t info;
+
+   signo = dequeue_signal_lock(evs->task, &evs->set, &info);
+   if (!signo)
+   return -EAGAIN;
+
+   if (copy_siginfo_to_user32(infoup, &info))
+   return -EFAULT;
+
+   return 0;
+}
+#endif
+
 static ssize_t write(struct pfs_signal *evs, const sigset_t __user *uset)
 {
sigset_t set;
@@ -65,6 +84,28 @@ static ssize_t write(struct pfs_signal *
return 0;
 }
 
+#ifdef CONFIG_COMPAT
+static ssize_t compat_write(struct pfs_signal *evs,
+   const compat_sigset_t __user *uset)
+{
+   sigset_t set;
+   compat_sigset_t cset;
+
+   if (copy_from_user(&cset, uset, sizeof(compat_sigset_t)))
+   return -EFAULT;
+
+   sigset_from_compat(&set, &cset);
+   sigset_adjust(&set);
+
+   spin_lock_irq(&evs->lock);
+   sigemptyset(&evs->set);
+   sigorsets(&evs->set, &evs->set, &set);
+   spin_unlock_irq(&evs->lock);
+
+   return 0;
+}
+#endif
+
 static int poll(struct pfs_signal *evs)
 {
int ret = 0;
@@ -142,3 +183,47 @@ asmlinkage long sys_plsignal(const sigse
 
return error;
 }
+
+#ifdef CONFIG_COMPAT
+static const struct pfs_operations compat_signal_ops = {
+   /* .read= PFS_READ(compat_read, struct pfs_signal, struct compat_siginfo), */
+   .write  = PFS_WRITE(compat_write, struct pfs_signal, compat_sigset_t),
+   .poll   = PFS_POLL(poll, struct pfs_signal),
+   .release= PFS_RELEASE(release, struct pfs_signal),
+   /* .rsize   = sizeof(compat_siginfo_t), */
+   .wsize  = sizeof(sigset_t)
+};
+
+asmlinkage long compat_plsignal(const compat_sigset_t __user *uset)
+{
+   long error;
+   compat_sigset_t cset;
+   struct pfs_signal *evs;
+
+   if (copy_from_user(&cset, uset, sizeof(compat_sigset_t)))
+   return -EFAULT;
+
+   evs = kmalloc(sizeof(*evs), GFP_KERNEL);
+   if (!evs)
+   return -ENOMEM;
+
+   spin_lock_init(&evs->lock);
+
+   evs->task = current;
+   get_task_struct(current);
+
+   sigset_from_compat(&evs->set, &cset);
+   sigset_adjust(&evs->set);
+
+   evs->file.data = evs;
+   evs->file.fops = &compat_signal_ops;
+   evs->file.wait = &evs->task->sigwait;
+
+   error = pfs_open(&evs->file);
+
+   if (error < 0)
+   release(evs);
+
+   return error;
+}
+#endif



Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Rusty Russell
On Tue, 2007-05-01 at 21:28 -0700, David Rientjes wrote:
> For all supported versions of gcc (major version 3 and above), functions
> and variables may be declared with __attribute__((unused)) to suppress
> warnings if they are declared but unused.

Adding this macro doesn't give us anything that simply saying
"__attribute__((unused))" doesn't give.  But it does add a layer of
kernel-specific indirection.

If we're going to get kernel-specific, I'd prefer to see:

__needed: suppress warning and don't discard,
__unneeded: suppress warning and might discard.

For me this fits better with how I think.
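
A small sketch of how the two proposed names could map onto existing GCC
attributes.  DEMO_NEEDED and DEMO_UNNEEDED stand in for __needed and
__unneeded, which are only a proposal in this thread and not existing
kernel macros; the demo_* functions are hypothetical.

```c
/* Suppress the warning and keep the symbol even if nothing refers to it. */
#define DEMO_NEEDED   __attribute__((used))
/* Suppress the warning; the compiler is free to discard the symbol. */
#define DEMO_UNNEEDED __attribute__((unused))

/* Never called from C, e.g. referenced only from inline assembly. */
static int DEMO_NEEDED demo_kept(void)
{
	return 1;
}

/* Never called, and nothing is lost if it is dropped. */
static int DEMO_UNNEEDED demo_maybe_dropped(void)
{
	return 2;
}

int demo_value(void)
{
	int scratch DEMO_UNNEEDED = 0;  /* no unused-variable warning */

	return 42;
}
```

Compiled with -Wall, neither static function nor the local variable should
trigger an unused warning; only demo_kept() is guaranteed to be emitted.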

Rusty.




[patch 17/22] pollfs: x86_64, wire up the plfutex system call

2007-05-01 Thread Davi Arnaut
Make the plfutex syscall available to user-space on x86_64.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32entry.S |    1 +
 include/asm-x86_64/unistd.h  |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -721,4 +721,5 @@ ia32_sys_call_table:
.quad sys_epoll_pwait
.quad sys_plsignal  /* 320 */
.quad sys_pltimer
+   .quad sys_plfutex
 ia32_syscall_end:  
Index: linux-2.6/include/asm-x86_64/unistd.h
===
--- linux-2.6.orig/include/asm-x86_64/unistd.h
+++ linux-2.6/include/asm-x86_64/unistd.h
@@ -623,8 +623,10 @@ __SYSCALL(__NR_move_pages, sys_move_page
 __SYSCALL(__NR_plsignal, sys_plsignal)
 #define __NR_pltimer   281
 __SYSCALL(__NR_pltimer, sys_pltimer)
+#define __NR_plfutex   282
+__SYSCALL(__NR_plfutex, sys_plfutex)
 
-#define __NR_syscall_max __NR_pltimer
+#define __NR_syscall_max __NR_plfutex
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR



[patch 13/22] pollfs: asynchronous futex wait

2007-05-01 Thread Davi Arnaut
Break apart and export the futex_wait function in order to be able to
associate (wait for) a futex with other resources.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/futex.h |   80 ++
 kernel/futex.c|  130 ++
 2 files changed, 118 insertions(+), 92 deletions(-)

Index: linux-2.6/kernel/futex.c
===
--- linux-2.6.orig/kernel/futex.c
+++ linux-2.6/kernel/futex.c
@@ -55,81 +55,6 @@
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
 /*
- * Futexes are matched on equal values of this key.
- * The key type depends on whether it's a shared or private mapping.
- * Don't rearrange members without looking at hash_futex().
- *
- * offset is aligned to a multiple of sizeof(u32) (== 4) by definition.
- * We set bit 0 to indicate if it's an inode-based key.
- */
-union futex_key {
-   struct {
-   unsigned long pgoff;
-   struct inode *inode;
-   int offset;
-   } shared;
-   struct {
-   unsigned long address;
-   struct mm_struct *mm;
-   int offset;
-   } private;
-   struct {
-   unsigned long word;
-   void *ptr;
-   int offset;
-   } both;
-};
-
-/*
- * Priority Inheritance state:
- */
-struct futex_pi_state {
-   /*
-* list of 'owned' pi_state instances - these have to be
-* cleaned up in do_exit() if the task exits prematurely:
-*/
-   struct list_head list;
-
-   /*
-* The PI object:
-*/
-   struct rt_mutex pi_mutex;
-
-   struct task_struct *owner;
-   atomic_t refcount;
-
-   union futex_key key;
-};
-
-/*
- * We use this hashed waitqueue instead of a normal wait_queue_t, so
- * we can wake only the relevant ones (hashed queues may be shared).
- *
- * A futex_q has a woken state, just like tasks have TASK_RUNNING.
- * It is considered woken when list_empty(&q->list) || q->lock_ptr == 0.
- * The order of wakup is always to make the first condition true, then
- * wake up q->waiters, then make the second condition true.
- */
-struct futex_q {
-   struct list_head list;
-   wait_queue_head_t waiters;
-
-   /* Which hash list lock to use: */
-   spinlock_t *lock_ptr;
-
-   /* Key which the futex is hashed on: */
-   union futex_key key;
-
-   /* For fd, sigio sent using these: */
-   int fd;
-   struct file *filp;
-
-   /* Optional priority inheritance state: */
-   struct futex_pi_state *pi_state;
-   struct task_struct *task;
-};
-
-/*
  * Split the global futex_lock into every hash list lock.
  */
 struct futex_hash_bucket {
@@ -904,8 +829,6 @@ queue_lock(struct futex_q *q, int fd, st
q->fd = fd;
q->filp = filp;
 
-   init_waitqueue_head(&q->waiters);
-
get_key_refs(&q->key);
hb = hash_futex(&q->key);
q->lock_ptr = &hb->lock;
@@ -938,6 +861,7 @@ static void queue_me(struct futex_q *q, 
 {
struct futex_hash_bucket *hb;
 
+   init_waitqueue_head(&q->waiters);
hb = queue_lock(q, fd, filp);
__queue_me(q, hb);
 }
@@ -1002,24 +926,22 @@ static void unqueue_me_pi(struct futex_q
drop_key_refs(&q->key);
 }
 
-static int futex_wait(u32 __user *uaddr, u32 val, unsigned long time)
+int futex_wait_queue(struct futex_q *q, u32 __user *uaddr, u32 val)
 {
struct task_struct *curr = current;
-   DECLARE_WAITQUEUE(wait, curr);
struct futex_hash_bucket *hb;
-   struct futex_q q;
u32 uval;
int ret;
 
-   q.pi_state = NULL;
+   q->pi_state = NULL;
  retry:
down_read(&curr->mm->mmap_sem);
 
-   ret = get_futex_key(uaddr, &q.key);
+   ret = get_futex_key(uaddr, &q->key);
if (unlikely(ret != 0))
goto out_release_sem;
 
-   hb = queue_lock(&q, -1, NULL);
+   hb = queue_lock(q, -1, NULL);
 
/*
 * Access the page AFTER the futex is queued.
@@ -1044,7 +966,7 @@ static int futex_wait(u32 __user *uaddr,
ret = get_futex_value_locked(&uval, uaddr);
 
if (unlikely(ret)) {
-   queue_unlock(&q, hb);
+   queue_unlock(q, hb);
 
/*
 * If we would have faulted, release mmap_sem, fault it in and
@@ -1063,14 +985,37 @@ static int futex_wait(u32 __user *uaddr,
goto out_unlock_release_sem;
 
/* Only actually queue if *uaddr contained val.  */
-   __queue_me(&q, hb);
+   __queue_me(q, hb);
 
/*
 * Now the futex is queued and we have checked the data, we
-* don't want to hold mmap_sem while we sleep.
+* don't want to hold mmap_sem while we (might) sleep.
 */
up_read(&curr->mm->mmap_sem);
 
+   return 0;
+
+ out_unlock_release_sem:
+   queue_unlock(q, hb);
+
+ out_release_sem:
+   up_rea

[patch 09/22] pollfs: pollable hrtimers

2007-05-01 Thread Davi Arnaut
Per file descriptor high-resolution timers. A classic unix file interface for
the POSIX timer_(create|settime|gettime|delete) family of functions.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/Makefile |1 
 fs/pollfs/timer.c  |  198 +
 init/Kconfig   |7 +
 3 files changed, 206 insertions(+)

Index: linux-2.6/fs/pollfs/timer.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/timer.c
@@ -0,0 +1,198 @@
+/*
+ * pollable timers
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pfs_timer {
+   wait_queue_head_t wait;
+   ktime_t interval;
+   spinlock_t lock;
+   unsigned long overruns;
+   struct hrtimer timer;
+   struct pfs_file file;
+};
+
+struct hrtimerspec {
+   int flags;
+   clockid_t clock;
+   struct itimerspec expr;
+};
+
+static ssize_t read(struct pfs_timer *evs, struct itimerspec __user *uspec)
+{
+   int ret = -EAGAIN;
+   ktime_t remaining = {};
+   unsigned long overruns = 0;
+   struct itimerspec spec = {};
+   struct hrtimer *timer = &evs->timer;
+
+   spin_lock_irq(&evs->lock);
+
+   if (!evs->overruns)
+   goto out_unlock;
+
+   if (hrtimer_active(timer))
+   remaining = hrtimer_get_remaining(timer);
+   else if (evs->interval.tv64 > 0)
+   overruns = hrtimer_forward(timer, hrtimer_cb_get_time(timer),
+  evs->interval);
+
+   ret = -EOVERFLOW;
+   if (overruns > (ULONG_MAX - evs->overruns))
+   goto out_unlock;
+   else
+   evs->overruns += overruns;
+
+   if (remaining.tv64 > 0)
+   spec.it_value = ktime_to_timespec(remaining);
+
+   spec.it_interval = ktime_to_timespec(evs->interval);
+
+   ret = 0;
+
+out_unlock:
+   spin_unlock_irq(&evs->lock);
+
+   if (ret)
+   return ret;
+
+   if (copy_to_user(uspec, &spec, sizeof(spec)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static enum hrtimer_restart timer_fn(struct hrtimer *timer)
+{
+   struct pfs_timer *evs = container_of(timer, struct pfs_timer, timer);
+   unsigned long flags;
+
+   spin_lock_irqsave(&evs->lock, flags);
+   /* timer tick, interval has elapsed */
+   if (!evs->overruns++)
+   wake_up_all(&evs->wait);
+   spin_unlock_irqrestore(&evs->lock, flags);
+
+   return HRTIMER_NORESTART;
+}
+
+static inline void rearm_timer(struct pfs_timer *evs, struct hrtimerspec *spec)
+{
+   struct hrtimer *timer = &evs->timer;
+   enum hrtimer_mode mode = HRTIMER_MODE_REL;
+
+   if (spec->flags & TIMER_ABSTIME)
+   mode = HRTIMER_MODE_ABS;
+
+   do {
+   spin_lock_irq(&evs->lock);
+   if (hrtimer_try_to_cancel(timer) >= 0)
+   break;
+   spin_unlock_irq(&evs->lock);
+   cpu_relax();
+   } while (1);
+
+   hrtimer_init(timer, spec->clock, mode);
+
+   timer->function = timer_fn;
+   timer->expires = timespec_to_ktime(spec->expr.it_value);
+   evs->interval = timespec_to_ktime(spec->expr.it_interval);
+
+   if (timer->expires.tv64)
+   hrtimer_start(timer, timer->expires, mode);
+
+   spin_unlock_irq(&evs->lock);
+}
+
+static inline int spec_invalid(const struct hrtimerspec *spec)
+{
+   if (spec->clock != CLOCK_REALTIME && spec->clock != CLOCK_MONOTONIC)
+   return 1;
+
+   if (!timespec_valid(&spec->expr.it_value) ||
+   !timespec_valid(&spec->expr.it_interval))
+   return 1;
+
+   return 0;
+}
+
+static ssize_t write(struct pfs_timer *evs,
+const struct hrtimerspec __user *uspec)
+{
+   struct hrtimerspec spec;
+
+   if (copy_from_user(&spec, uspec, sizeof(spec)))
+   return -EFAULT;
+
+   if (spec_invalid(&spec))
+   return -EINVAL;
+
+   rearm_timer(evs, &spec);
+
+   return 0;
+}
+
+static int poll(struct pfs_timer *evs)
+{
+   int ret;
+
+   ret = evs->overruns ? POLLIN : 0;
+
+   return ret;
+}
+
+static int release(struct pfs_timer *evs)
+{
+   hrtimer_cancel(&evs->timer);
+   kfree(evs);
+
+   return 0;
+}
+
+static const struct pfs_operations timer_ops = {
+   .read = PFS_READ(read, struct pfs_timer, struct itimerspec),
+   .write = PFS_WRITE(write, struct pfs_timer, struct hrtimerspec),
+   .poll = PFS_POLL(poll, struct pfs_timer),
+   .release = PFS_RELEASE(release, struct pfs_timer),
+   .rsize = sizeof(struct itimerspec),
+   .wsize = sizeof(struct hrtimerspec),
+};
+
+asmlinkage long sys_pltimer(void)
+{
+   long error;
+   struct pf
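
The read() path in the timer patch above refuses to add newly computed overruns when the sum would wrap an unsigned long. A minimal userspace sketch of that check, assuming a hypothetical helper name (demo_add_overruns) and leaving out the spinlock the kernel code holds:

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: add `extra` to `*total` unless the sum would
 * wrap around. Mirrors the "overruns > ULONG_MAX - evs->overruns" test.
 */
static int demo_add_overruns(unsigned long *total, unsigned long extra)
{
	if (extra > ULONG_MAX - *total)
		return -EOVERFLOW;	/* leave *total untouched */

	*total += extra;
	return 0;
}
```

Comparing against ULONG_MAX - *total avoids computing the sum first, so the check itself cannot overflow.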

[patch 08/22] pollfs: x86_64, wire up the plsignal system call

2007-05-01 Thread Davi Arnaut
Make the plsignal syscall available to user-space on x86_64.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32entry.S |3 ++-
 include/asm-x86_64/unistd.h  |4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6/include/asm-x86_64/unistd.h
===
--- linux-2.6.orig/include/asm-x86_64/unistd.h
+++ linux-2.6/include/asm-x86_64/unistd.h
@@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_plsignal  280
+__SYSCALL(__NR_plsignal, sys_plsignal)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_plsignal
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -714,9 +714,10 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad sys_plsignal  /* 320 */
 ia32_syscall_end:  

--


[patch 03/22] pollfs: asynchronously wait for a signal

2007-05-01 Thread Davi Arnaut
Add a wait queue to the task_struct in order to be able to
associate (wait for) a signal with other resources.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/init_task.h |1 +
 include/linux/sched.h |1 +
 kernel/fork.c |1 +
 kernel/signal.c   |5 +
 4 files changed, 8 insertions(+)

Index: linux-2.6/include/linux/sched.h
===
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -939,6 +939,7 @@ struct task_struct {
sigset_t blocked, real_blocked;
sigset_t saved_sigmask; /* To be restored with TIF_RESTORE_SIGMASK */
struct sigpending pending;
+   wait_queue_head_t sigwait;
 
unsigned long sas_ss_sp;
size_t sas_ss_size;
Index: linux-2.6/include/linux/init_task.h
===
--- linux-2.6.orig/include/linux/init_task.h
+++ linux-2.6/include/linux/init_task.h
@@ -134,6 +134,7 @@ extern struct group_info init_groups;
.list = LIST_HEAD_INIT(tsk.pending.list),   \
.signal = {{0}}},   \
.blocked= {{0}},\
+   .sigwait= __WAIT_QUEUE_HEAD_INITIALIZER(tsk.sigwait),   \
.alloc_lock = __SPIN_LOCK_UNLOCKED(tsk.alloc_lock), \
.journal_info   = NULL, \
.cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers),  \
Index: linux-2.6/kernel/fork.c
===
--- linux-2.6.orig/kernel/fork.c
+++ linux-2.6/kernel/fork.c
@@ -1034,6 +1034,7 @@ static struct task_struct *copy_process(
 
clear_tsk_thread_flag(p, TIF_SIGPENDING);
init_sigpending(&p->pending);
+   init_waitqueue_head(&p->sigwait);
 
p->utime = cputime_zero;
p->stime = cputime_zero;
Index: linux-2.6/kernel/signal.c
===
--- linux-2.6.orig/kernel/signal.c
+++ linux-2.6/kernel/signal.c
@@ -224,6 +224,8 @@ fastcall void recalc_sigpending_tsk(stru
set_tsk_thread_flag(t, TIF_SIGPENDING);
else
clear_tsk_thread_flag(t, TIF_SIGPENDING);
+
+   wake_up_interruptible_sync(&t->sigwait);
 }
 
 void recalc_sigpending(void)
@@ -759,6 +761,7 @@ static int send_signal(int sig, struct s
  info->si_code >= 0)));
if (q) {
list_add_tail(&q->list, &signals->list);
+   wake_up_interruptible_sync(&t->sigwait);
switch ((unsigned long) info) {
case (unsigned long) SEND_SIG_NOINFO:
q->info.si_signo = sig;
@@ -1404,6 +1407,7 @@ int send_sigqueue(int sig, struct sigque
 
list_add_tail(&q->list, &p->pending.list);
sigaddset(&p->pending.signal, sig);
+   wake_up_interruptible_sync(&p->sigwait);
if (!sigismember(&p->blocked, sig))
signal_wake_up(p, sig == SIGKILL);
 
@@ -1453,6 +1457,7 @@ send_group_sigqueue(int sig, struct sigq
list_add_tail(&q->list, &p->signal->shared_pending.list);
sigaddset(&p->signal->shared_pending.signal, sig);
 
+   wake_up_interruptible_sync(&p->sigwait);
__group_complete_signal(sig, p);
 out:
spin_unlock_irqrestore(&p->sighand->siglock, flags);

--


[patch 21/22] pollfs: x86, wire up the plaio system call

2007-05-01 Thread Davi Arnaut
Make the plaio syscall available to user-space on x86.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/i386/kernel/syscall_table.S |1 +
 include/asm-i386/unistd.h|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/asm-i386/unistd.h
===
--- linux-2.6.orig/include/asm-i386/unistd.h
+++ linux-2.6/include/asm-i386/unistd.h
@@ -328,10 +328,11 @@
 #define __NR_plsignal  320
 #define __NR_pltimer   321
 #define __NR_plfutex   322
+#define __NR_plaio 323
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 323
+#define NR_syscalls 324
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6/arch/i386/kernel/syscall_table.S
@@ -322,3 +322,4 @@ ENTRY(sys_call_table)
.long sys_plsignal  /* 320 */
.long sys_pltimer
.long sys_plfutex
+   .long sys_plaio

--


[patch 19/22] pollfs: pollable aio

2007-05-01 Thread Davi Arnaut
Submit, retrieve, or poll aio requests for completion through a
file descriptor. The user supplies an aio_context_t that is used to
fetch a reference to the kioctx. Once the file descriptor is
closed, the reference is decremented.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/Makefile |1 
 fs/pollfs/aio.c|  103 +
 init/Kconfig   |7 +++
 3 files changed, 111 insertions(+)

Index: linux-2.6/fs/pollfs/Makefile
===
--- linux-2.6.orig/fs/pollfs/Makefile
+++ linux-2.6/fs/pollfs/Makefile
@@ -4,3 +4,4 @@ pollfs-y := file.o
 pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
 pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
 pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
+pollfs-$(CONFIG_POLLFS_AIO) += aio.o
Index: linux-2.6/fs/pollfs/aio.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/aio.c
@@ -0,0 +1,103 @@
+/*
+ * pollable aio
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct pfs_aio {
+   struct kioctx *ioctx;
+   struct pfs_file file;
+};
+
+static ssize_t read(struct pfs_aio *evs, struct io_event __user *uioevt)
+{
+   int ret;
+
+   ret = sys_io_getevents(evs->ioctx->user_id, 0, 1, uioevt, NULL);
+
+   if (!ret)
+   ret = -EAGAIN;
+   else if (ret > 0)
+   ret = 0;
+
+   return ret;
+}
+
+static ssize_t write(struct pfs_aio *evs, const struct iocb __user *uiocb)
+{
+   struct iocb iocb;
+
+   if (copy_from_user(&iocb, uiocb, sizeof(iocb)))
+   return -EFAULT;
+
+   return io_submit_one(evs->ioctx, uiocb, &iocb);
+}
+
+static int poll(struct pfs_aio *evs)
+{
+   int ret;
+
+   ret = aio_ring_empty(evs->ioctx) ? 0 : POLLIN;
+
+   return ret;
+}
+
+static int release(struct pfs_aio *evs)
+{
+   put_ioctx(evs->ioctx);
+
+   kfree(evs);
+
+   return 0;
+}
+
+static const struct pfs_operations aio_ops = {
+   .read = PFS_READ(read, struct pfs_aio, struct io_event),
+   .write = PFS_WRITE(write, struct pfs_aio, struct iocb),
+   .poll = PFS_POLL(poll, struct pfs_aio),
+   .release = PFS_RELEASE(release, struct pfs_aio),
+   .rsize = sizeof(struct io_event),
+   .wsize = sizeof(struct iocb),
+};
+
+asmlinkage long sys_plaio(aio_context_t ctx)
+{
+   long error;
+   struct pfs_aio *evs;
+   struct kioctx *ioctx = lookup_ioctx(ctx);
+
+   if (!ioctx)
+   return -EINVAL;
+
+   evs = kzalloc(sizeof(*evs), GFP_KERNEL);
+   if (!evs) {
+   put_ioctx(ioctx);
+   return -ENOMEM;
+   }
+
+   evs->ioctx = ioctx;
+
+   evs->file.data = evs;
+   evs->file.fops = &aio_ops;
+   evs->file.wait = &ioctx->wait;
+
+   error = pfs_open(&evs->file);
+
+   if (error < 0)
+   release(evs);
+
+   return error;
+}
Index: linux-2.6/init/Kconfig
===
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -490,6 +490,13 @@ config POLLFS_FUTEX
help
 Pollable futex support
 
+config POLLFS_AIO
+   bool "Enable pollfs aio" if EMBEDDED
+   default y
+   depends on POLLFS
+   help
+Pollable aio support
+
 config SHMEM
bool "Use full shmem filesystem" if EMBEDDED
default y

--


[patch 10/22] pollfs: export the pltimer system call

2007-05-01 Thread Davi Arnaut
Export the new pltimer syscall prototype. While there, make it "conditional".

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/syscalls.h |2 ++
 kernel/sys_ni.c  |1 +
 2 files changed, 3 insertions(+)

Index: linux-2.6/include/linux/syscalls.h
===
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -607,4 +607,6 @@ int kernel_execve(const char *filename, 
 
 asmlinkage long sys_plsignal(const sigset_t __user * set);
 
+asmlinkage long sys_pltimer(void);
+
 #endif
Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -113,6 +113,7 @@ cond_syscall(sys_vm86);
 cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);
 cond_syscall(sys_plsignal);
+cond_syscall(sys_pltimer);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);

--
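
Making a syscall "conditional" gives it a weak fallback, so the kernel still links when the feature is configured out and callers get -ENOSYS. A userspace sketch of that idea, assuming GCC on an ELF target. The demo_ names are hypothetical, and the real cond_syscall() macro is defined per architecture:

```c
#include <errno.h>

/* Fallback used for every call that is not built in. */
long demo_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * Weak alias: if another object file provides a strong definition of
 * `name`, the linker picks that one; otherwise calls land in the
 * fallback above.
 */
#define demo_cond_syscall(name) \
	long name(void) __attribute__((weak, alias("demo_ni_syscall")))

demo_cond_syscall(demo_sys_pltimer);
```

Since nothing else defines demo_sys_pltimer here, calling it returns -ENOSYS.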


Re: SOME STUFF ABOUT REISER4 To Mr Hopper

2007-05-01 Thread David Miller
From: [EMAIL PROTECTED]
Date: Tue, 01 May 2007 21:55:59 -0700

> Hi Jeff, it seems that lkml has contacted both of my email accounts and
> crippled them.

Actually we aren't blocking your email address, rather we are blocking
emails with lots of caps in them because that is what small children
use when they first start using a computer.

So if you stop using caps lock so much, your postings might start going
through.


[patch 07/22] pollfs: x86, wire up the plsignal system call

2007-05-01 Thread Davi Arnaut
Make the plsignal syscall available to user-space on x86.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/i386/kernel/syscall_table.S |1 +
 include/asm-i386/unistd.h|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/asm-i386/unistd.h
===
--- linux-2.6.orig/include/asm-i386/unistd.h
+++ linux-2.6/include/asm-i386/unistd.h
@@ -325,10 +325,11 @@
 #define __NR_move_pages317
 #define __NR_getcpu318
 #define __NR_epoll_pwait   319
+#define __NR_plsignal  320
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 320
+#define NR_syscalls 321
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_plsignal  /* 320 */

--
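
The wire-up patches all follow one pattern: append an entry to the syscall table, define its number, and bump the syscall count to the highest number plus one. A toy userspace model of that pattern, with hypothetical demo_ names and a table that only covers the last two numbers:

```c
#include <errno.h>

typedef long (*demo_syscall_fn)(void);

/* Stand-in handlers that just report their own number. */
static long demo_epoll_pwait(void) { return 319; }
static long demo_plsignal(void)    { return 320; }

#define DEMO_NR_BASE        319	/* first number held in this toy table */
#define DEMO_NR_epoll_pwait 319
#define DEMO_NR_plsignal    320
#define DEMO_NR_syscalls    321	/* highest number + 1 */

static const demo_syscall_fn demo_table[DEMO_NR_syscalls - DEMO_NR_BASE] = {
	[DEMO_NR_epoll_pwait - DEMO_NR_BASE] = demo_epoll_pwait,
	[DEMO_NR_plsignal - DEMO_NR_BASE]    = demo_plsignal,
};

/* Out-of-range numbers are rejected before the table is indexed. */
static long demo_dispatch(unsigned int nr)
{
	if (nr < DEMO_NR_BASE || nr >= DEMO_NR_syscalls)
		return -ENOSYS;
	return demo_table[nr - DEMO_NR_BASE]();
}
```

If the count were left at 320 after adding entry 320, the bounds check would reject the new call even though the table holds it, which is why each patch updates both files together.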


[patch 16/22] pollfs: x86, wire up the plfutex system call

2007-05-01 Thread Davi Arnaut
Make the plfutex syscall available to user-space on x86.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/i386/kernel/syscall_table.S |1 +
 include/asm-i386/unistd.h|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/asm-i386/unistd.h
===
--- linux-2.6.orig/include/asm-i386/unistd.h
+++ linux-2.6/include/asm-i386/unistd.h
@@ -327,10 +327,11 @@
 #define __NR_epoll_pwait   319
 #define __NR_plsignal  320
 #define __NR_pltimer   321
+#define __NR_plfutex   322
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 322
+#define NR_syscalls 323
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6/arch/i386/kernel/syscall_table.S
@@ -321,3 +321,4 @@ ENTRY(sys_call_table)
.long sys_epoll_pwait
.long sys_plsignal  /* 320 */
.long sys_pltimer
+   .long sys_plfutex

--


[patch 18/22] pollfs: check if a AIO event ring is empty

2007-05-01 Thread Davi Arnaut
The aio_ring_empty() function returns true if the AIO event ring has no
elements, false otherwise.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/aio.c|   17 +
 include/linux/aio.h |1 +
 2 files changed, 18 insertions(+)

Index: linux-2.6/fs/aio.c
===
--- linux-2.6.orig/fs/aio.c
+++ linux-2.6/fs/aio.c
@@ -1004,6 +1004,23 @@ put_rq:
return ret;
 }
 
+int fastcall aio_ring_empty(struct kioctx *ioctx)
+{
+   struct aio_ring_info *info = &ioctx->ring_info;
+   struct aio_ring *ring;
+   unsigned long flags;
+   int ret = 0;
+
+   spin_lock_irqsave(&ioctx->ctx_lock, flags);
+   ring = kmap_atomic(info->ring_pages[0], KM_IRQ1);
+   if (ring->head == ring->tail)
+   ret = 1;
+   kunmap_atomic(ring, KM_IRQ1);
+   spin_unlock_irqrestore(&ioctx->ctx_lock, flags);
+
+   return ret;
+}
+
 /* aio_read_evt
  * Pull an event off of the ioctx's event ring.  Returns the number of 
  * events fetched (0 or 1 ;-)
Index: linux-2.6/include/linux/aio.h
===
--- linux-2.6.orig/include/linux/aio.h
+++ linux-2.6/include/linux/aio.h
@@ -202,6 +202,7 @@ extern unsigned aio_max_size;
 
 extern ssize_t FASTCALL(wait_on_sync_kiocb(struct kiocb *iocb));
 extern int FASTCALL(aio_put_req(struct kiocb *iocb));
+extern int FASTCALL(aio_ring_empty(struct kioctx *ioctx));
 extern void FASTCALL(kick_iocb(struct kiocb *iocb));
 extern int FASTCALL(aio_complete(struct kiocb *iocb, long res, long res2));
 extern void FASTCALL(__put_ioctx(struct kioctx *ctx));

--
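
The emptiness test in aio_ring_empty() is the usual head == tail comparison on a circular buffer. A small userspace sketch of that convention, assuming a hypothetical fixed-size ring (demo_ring) and omitting the locking and page mapping the kernel version needs:

```c
#define DEMO_RING_SLOTS 4

struct demo_ring {
	unsigned int head;	/* next slot to read */
	unsigned int tail;	/* next slot to write */
	int events[DEMO_RING_SLOTS];
};

/* Empty when the reader has caught up with the writer. */
static int demo_ring_empty(const struct demo_ring *r)
{
	return r->head == r->tail;
}

/* Returns 0 on success, -1 if the ring is full (one slot stays unused). */
static int demo_ring_push(struct demo_ring *r, int ev)
{
	unsigned int next = (r->tail + 1) % DEMO_RING_SLOTS;

	if (next == r->head)
		return -1;
	r->events[r->tail] = ev;
	r->tail = next;
	return 0;
}

/* Returns 0 and stores the oldest event, or -1 if the ring is empty. */
static int demo_ring_pop(struct demo_ring *r, int *ev)
{
	if (demo_ring_empty(r))
		return -1;
	*ev = r->events[r->head];
	r->head = (r->head + 1) % DEMO_RING_SLOTS;
	return 0;
}
```

Keeping one slot unused is one common way to tell a full ring from an empty one when only head and tail are stored.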


[patch 11/22] pollfs: x86, wire up the pltimer system call

2007-05-01 Thread Davi Arnaut
Make the pltimer syscall available to user-space on x86.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/i386/kernel/syscall_table.S |1 +
 include/asm-i386/unistd.h|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/asm-i386/unistd.h
===
--- linux-2.6.orig/include/asm-i386/unistd.h
+++ linux-2.6/include/asm-i386/unistd.h
@@ -326,10 +326,11 @@
 #define __NR_getcpu318
 #define __NR_epoll_pwait   319
 #define __NR_plsignal  320
+#define __NR_pltimer   321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.orig/arch/i386/kernel/syscall_table.S
+++ linux-2.6/arch/i386/kernel/syscall_table.S
@@ -320,3 +320,4 @@ ENTRY(sys_call_table)
.long sys_getcpu
.long sys_epoll_pwait
.long sys_plsignal  /* 320 */
+   .long sys_pltimer

--


[patch 14/22] pollfs: pollable futex

2007-05-01 Thread Davi Arnaut
Asynchronously wait for a FUTEX_WAKE operation on a futex if it still contains
a given value. There can be only one futex wait per file descriptor. However,
it can be rearmed (possibly at a different address) anytime.

The pollable futex approach is far superior (send and receive events from
userspace or kernel) to eventfd and fixes (supersedes) FUTEX_FD at the same
time.

Building block for pollable semaphores and user-defined events.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 fs/pollfs/Makefile |1 
 fs/pollfs/futex.c  |  154 +
 init/Kconfig   |7 ++
 3 files changed, 162 insertions(+)

Index: linux-2.6/fs/pollfs/Makefile
===
--- linux-2.6.orig/fs/pollfs/Makefile
+++ linux-2.6/fs/pollfs/Makefile
@@ -3,3 +3,4 @@ pollfs-y := file.o
 
 pollfs-$(CONFIG_POLLFS_SIGNAL) += signal.o
 pollfs-$(CONFIG_POLLFS_TIMER) += timer.o
+pollfs-$(CONFIG_POLLFS_FUTEX) += futex.o
Index: linux-2.6/fs/pollfs/futex.c
===
--- /dev/null
+++ linux-2.6/fs/pollfs/futex.c
@@ -0,0 +1,154 @@
+/*
+ * pollable futex
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ * Licensed under the GNU GPL. See the file COPYING for details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct futex_event {
+   union {
+   void __user *addr;
+   u64 padding;
+   };
+   int val;
+};
+
+struct pfs_futex {
+   struct futex_q q;
+   struct futex_event fevt;
+   struct mutex mutex;
+   unsigned volatile queued;
+   struct pfs_file file;
+};
+
+static ssize_t read(struct pfs_futex *evs, struct futex_event __user *ufevt)
+{
+   int ret;
+   struct futex_event fevt;
+
+   mutex_lock(&evs->mutex);
+
+   fevt = evs->fevt;
+
+   ret = -EAGAIN;
+
+   if (!evs->queued)
+   ret = -EINVAL;
+   else if (list_empty(&evs->q.list))
+   ret = futex_wait_unqueue(&evs->q);
+
+   switch (ret) {
+   case 1:
+   ret = -EAGAIN;
+   case 0:
+   evs->queued = 0;
+   }
+
+   mutex_unlock(&evs->mutex);
+
+   if (ret < 0)
+   return ret;
+
+   if (copy_to_user(ufevt, &fevt, sizeof(fevt)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static ssize_t write(struct pfs_futex *evs,
+const struct futex_event __user *ufevt)
+{
+   int ret;
+   struct futex_event fevt;
+
+   if (copy_from_user(&fevt, ufevt, sizeof(fevt)))
+   return -EFAULT;
+
+   mutex_lock(&evs->mutex);
+
+   if (evs->queued)
+   futex_wait_unqueue(&evs->q);
+
+   ret = futex_wait_queue(&evs->q, fevt.addr, fevt.val);
+
+   if (ret)
+   evs->queued = 0;
+   else {
+   evs->queued = 1;
+   evs->fevt = fevt;
+   }
+
+   mutex_unlock(&evs->mutex);
+
+   return ret;
+}
+
+static int poll(struct pfs_futex *evs)
+{
+   int ret;
+
+   while (!mutex_trylock(&evs->mutex))
+   cpu_relax();
+
+   ret = evs->queued && list_empty(&evs->q.list) ? POLLIN : 0;
+
+   mutex_unlock(&evs->mutex);
+
+   return ret;
+}
+
+static int release(struct pfs_futex *evs)
+{
+   if (evs->queued)
+   futex_wait_unqueue(&evs->q);
+
+   mutex_destroy(&evs->mutex);
+
+   kfree(evs);
+
+   return 0;
+}
+
+static const struct pfs_operations futex_ops = {
+   .read = PFS_READ(read, struct pfs_futex, struct futex_event),
+   .write = PFS_WRITE(write, struct pfs_futex, struct futex_event),
+   .poll = PFS_POLL(poll, struct pfs_futex),
+   .release = PFS_RELEASE(release, struct pfs_futex),
+   .rsize = sizeof(struct futex_event),
+   .wsize = sizeof(struct futex_event),
+};
+
+asmlinkage long sys_plfutex(void)
+{
+   long error;
+   struct pfs_futex *evs;
+
+   evs = kzalloc(sizeof(*evs), GFP_KERNEL);
+   if (!evs)
+   return -ENOMEM;
+
+   mutex_init(&evs->mutex);
+   init_waitqueue_head(&evs->q.waiters);
+
+   evs->file.data = evs;
+   evs->file.fops = &futex_ops;
+   evs->file.wait = &evs->q.waiters;
+
+   error = pfs_open(&evs->file);
+
+   if (error < 0)
+   release(evs);
+
+   return error;
+}
Index: linux-2.6/init/Kconfig
===
--- linux-2.6.orig/init/Kconfig
+++ linux-2.6/init/Kconfig
@@ -483,6 +483,13 @@ config POLLFS_TIMER
help
 Pollable timer support
 
+config POLLFS_FUTEX
+   bool "Enable pollfs futex" if EMBEDDED
+   default y
+   depends on POLLFS && FUTEX
+   help
+Pollable futex support
+
 config SHMEM
bool "Use full shmem filesystem" if EMBEDDED
default y

--
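
The description above (one wait per descriptor, armed by write(), reported by poll(), consumed by read(), rearmable at any time) can be modelled as a small state machine. This is a simplified userspace sketch, not the patch's code: the demo_ names are hypothetical, `woken` stands in for list_empty(&q.list), and the -EWOULDBLOCK return on a value mismatch is an assumption about futex_wait_queue().

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>

struct demo_futex_fd {
	bool queued;	/* a wait has been armed through write() */
	bool woken;	/* a FUTEX_WAKE reached the armed wait */
};

/* write(): arm or rearm the wait, but only if *uaddr still holds val. */
static int demo_write(struct demo_futex_fd *f, const int *uaddr, int val)
{
	f->woken = false;
	f->queued = (*uaddr == val);
	return f->queued ? 0 : -EWOULDBLOCK;
}

/* Stand-in for another thread issuing FUTEX_WAKE on the address. */
static void demo_wake(struct demo_futex_fd *f)
{
	if (f->queued)
		f->woken = true;
}

/* poll(): readable once an armed wait has been woken. */
static int demo_poll(const struct demo_futex_fd *f)
{
	return (f->queued && f->woken) ? POLLIN : 0;
}

/* read(): consume the wake-up; the descriptor must be rearmed afterwards. */
static int demo_read(struct demo_futex_fd *f)
{
	if (!f->queued)
		return -EINVAL;
	if (!f->woken)
		return -EAGAIN;
	f->queued = false;
	f->woken = false;
	return 0;
}
```

The model leaves out the mutex and the copy of the futex_event back to userspace, which the real read() also performs.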

[patch 15/22] pollfs: export the plfutex system call

2007-05-01 Thread Davi Arnaut
Export the new plfutex syscall prototype. While there, make it "conditional".

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/syscalls.h |2 ++
 kernel/sys_ni.c  |1 +
 2 files changed, 3 insertions(+)

Index: linux-2.6/include/linux/syscalls.h
===
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -609,4 +609,6 @@ asmlinkage long sys_plsignal(const sigse
 
 asmlinkage long sys_pltimer(void);
 
+asmlinkage long sys_plfutex(void);
+
 #endif
Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -114,6 +114,7 @@ cond_syscall(compat_sys_ipc);
 cond_syscall(compat_sys_sysctl);
 cond_syscall(sys_plsignal);
 cond_syscall(sys_pltimer);
+cond_syscall(sys_plfutex);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);

--


[patch 22/22] pollfs: x86_64, wire up the plaio system call

2007-05-01 Thread Davi Arnaut
Make the plaio syscall available to user-space on x86_64.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32entry.S |1 +
 include/asm-x86_64/unistd.h  |4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -722,4 +722,5 @@ ia32_sys_call_table:
.quad sys_plsignal  /* 320 */
.quad sys_pltimer
.quad sys_plfutex
+   .quad sys_plaio
 ia32_syscall_end:  
Index: linux-2.6/include/asm-x86_64/unistd.h
===
--- linux-2.6.orig/include/asm-x86_64/unistd.h
+++ linux-2.6/include/asm-x86_64/unistd.h
@@ -625,8 +625,10 @@ __SYSCALL(__NR_plsignal, sys_plsignal)
 __SYSCALL(__NR_pltimer, sys_pltimer)
 #define __NR_plfutex   282
 __SYSCALL(__NR_plfutex, sys_plfutex)
+#define __NR_plaio 283
+__SYSCALL(__NR_plaio, sys_plaio)
 
-#define __NR_syscall_max __NR_plfutex
+#define __NR_syscall_max __NR_plaio
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR

--


[patch 20/22] pollfs: export the plaio system call

2007-05-01 Thread Davi Arnaut
Export the new plaio syscall prototype. While there, make it "conditional".

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/syscalls.h |2 ++
 kernel/sys_ni.c  |1 +
 2 files changed, 3 insertions(+)

Index: linux-2.6/include/linux/syscalls.h
===
--- linux-2.6.orig/include/linux/syscalls.h
+++ linux-2.6/include/linux/syscalls.h
@@ -611,4 +611,6 @@ asmlinkage long sys_pltimer(void);
 
 asmlinkage long sys_plfutex(void);
 
+asmlinkage long sys_plaio(aio_context_t ctx);
+
 #endif
Index: linux-2.6/kernel/sys_ni.c
===
--- linux-2.6.orig/kernel/sys_ni.c
+++ linux-2.6/kernel/sys_ni.c
@@ -115,6 +115,7 @@ cond_syscall(compat_sys_sysctl);
 cond_syscall(sys_plsignal);
 cond_syscall(sys_pltimer);
 cond_syscall(sys_plfutex);
+cond_syscall(sys_plaio);
 
 /* arch-specific weak syscall entries */
 cond_syscall(sys_pciconfig_read);
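
A note on what "conditional" means here: cond_syscall() makes the symbol
weak, so that a kernel built without the implementation still links and the
call falls back to sys_ni_syscall(), which returns -ENOSYS. The following is
only a user-space sketch of that idea, assuming GCC or Clang on an ELF
target. The *_demo names and the macro body are made up for illustration and
are not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Fallback used when no real implementation is linked in. */
long sys_ni_syscall_demo(void)
{
	return -ENOSYS;
}

/*
 * Approximation of cond_syscall(): declare the symbol as a weak alias
 * of the fallback. A strong definition elsewhere would override it.
 * (Assumption: ELF target, GCC/Clang attribute syntax.)
 */
#define cond_syscall_demo(name) \
	long name(void) __attribute__((weak, alias("sys_ni_syscall_demo")))

cond_syscall_demo(sys_plaio_demo);	/* hypothetical, never defined */

/* Dispatch through a table, the way the arch syscall tables do. */
typedef long (*syscall_fn_demo)(void);
static const syscall_fn_demo demo_table[] = { sys_plaio_demo };

long call_demo(unsigned int nr)
{
	assert(nr < sizeof(demo_table) / sizeof(demo_table[0]));
	return demo_table[nr]();
}
```

With no strong definition of sys_plaio_demo() anywhere, call_demo(0) ends up
in the fallback and returns -ENOSYS.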

--


[patch 12/22] pollfs: x86_64, wire up the pltimer system call

2007-05-01 Thread Davi Arnaut
Make the pltimer syscall available to user-space on x86_64.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 arch/x86_64/ia32/ia32entry.S |1 +
 include/asm-x86_64/unistd.h  |4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.orig/arch/x86_64/ia32/ia32entry.S
+++ linux-2.6/arch/x86_64/ia32/ia32entry.S
@@ -720,4 +720,5 @@ ia32_sys_call_table:
 	.quad sys_getcpu
 	.quad sys_epoll_pwait
 	.quad sys_plsignal	/* 320 */
+	.quad sys_pltimer
 ia32_syscall_end:  
Index: linux-2.6/include/asm-x86_64/unistd.h
===
--- linux-2.6.orig/include/asm-x86_64/unistd.h
+++ linux-2.6/include/asm-x86_64/unistd.h
@@ -621,8 +621,10 @@ __SYSCALL(__NR_vmsplice, sys_vmsplice)
 __SYSCALL(__NR_move_pages, sys_move_pages)
 #define __NR_plsignal  280
 __SYSCALL(__NR_plsignal, sys_plsignal)
+#define __NR_pltimer   281
+__SYSCALL(__NR_pltimer, sys_pltimer)
 
-#define __NR_syscall_max __NR_plsignal
+#define __NR_syscall_max __NR_pltimer
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR

--


[patch 01/22] pollfs: kernel-side API header

2007-05-01 Thread Davi Arnaut
Add pollfs_fs.h header which contains the kernel-side declarations
and auxiliary macros for type safety checks. Those macros can be
simplified later.

Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>

---
 include/linux/pollfs_fs.h |   57 ++
 1 file changed, 57 insertions(+)

Index: linux-2.6/include/linux/pollfs_fs.h
===
--- /dev/null
+++ linux-2.6/include/linux/pollfs_fs.h
@@ -0,0 +1,57 @@
+/*
+ * pollfs, a naive filesystem for pollable (waitable) files (objects)
+ *
+ * Copyright (C) 2007 Davi E. M. Arnaut
+ *
+ */
+
+#ifndef _LINUX_POLL_FS_H
+#define _LINUX_POLL_FS_H
+
+#ifdef __KERNEL__
+
+#include 
+#include 
+#include 
+
+#define PFS_CHECK_CALLBACK_1(f, a) (void*) \
+   (sizeof((f)((typeof(a *))0)))
+
+#define PFS_CHECK_CALLBACK_2(f, a, b) (void*)  \
+   (sizeof((f)((typeof(a *))0, (typeof(b*))0)))
+
+#define PFS_WRITE(func, type, utype)   \
+   (ssize_t (*)(void *, const void __user *))  \
+   (0 ? PFS_CHECK_CALLBACK_2(func, type, utype) : func)
+
+#define PFS_READ(func, type, utype)\
+   (ssize_t (*)(void *, void __user *))\
+   (0 ? PFS_CHECK_CALLBACK_2(func, type, utype) : func)
+
+#define PFS_POLL(func, type)   \
+   (int (*)(void *))(0 ? PFS_CHECK_CALLBACK_1(func, type) : func)
+
+#define PFS_RELEASE(func, type)			\
+   (int (*)(void *))(0 ? PFS_CHECK_CALLBACK_1(func, type) : func)
+
+struct pfs_operations {
+   ssize_t (*read)(void *, void __user *);
+   ssize_t (*write)(void *, const void __user *);
+   int (*mmap)(void *, struct vm_area_struct *);
+   int (*poll)(void *);
+   int (*release)(void *);
+   size_t rsize;
+   size_t wsize;
+};
+
+struct pfs_file {
+   void *data;
+   wait_queue_head_t *wait;
+   const struct pfs_operations *fops;
+};
+
+long pfs_open(struct pfs_file *pfs);
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_POLL_FS_H */
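
The type-check macros above rely on the fact that an expression inside
sizeof is type-checked but never evaluated. A stand-alone user-space sketch
of the same trick follows. All demo_* names are hypothetical, the macro is a
simplified variant rather than the one in the patch, and calling a function
through a pointer of a different parameter type is assumed to work on the
target ABI (as the patch itself assumes).

```c
#include <assert.h>
#include <stddef.h>

/* Type-erased operations table, loosely modelled on struct pfs_operations. */
struct demo_ops {
	int (*poll)(void *);
	int (*release)(void *);
};

/* Erase the argument type; the detour via void (*)(void) avoids cast warnings. */
#define DEMO_ERASE(func) ((int (*)(void *))(void (*)(void))(func))

/*
 * The call inside sizeof is never executed, but the compiler still
 * type-checks it, so a callback that does not accept "type *" is
 * diagnosed at build time. sizeof(int) is non-zero, so the first arm
 * is always the one selected.
 */
#define DEMO_CHECKED(func, type) \
	(sizeof((func)((type *)0)) ? DEMO_ERASE(func) : (int (*)(void *))0)

struct demo_timer {		/* hypothetical pollable object */
	int expired;
};

static int demo_timer_poll(struct demo_timer *t)
{
	return t->expired;
}

static int demo_timer_release(struct demo_timer *t)
{
	t->expired = 0;
	return 0;
}

/* DEMO_CHECKED(demo_timer_poll, int) here would draw a diagnostic. */
struct demo_ops demo_timer_ops(void)
{
	struct demo_ops ops = {
		.poll = DEMO_CHECKED(demo_timer_poll, struct demo_timer),
		.release = DEMO_CHECKED(demo_timer_release, struct demo_timer),
	};
	return ops;
}

/* Core code only ever sees void *, as in the header above. */
int demo_poll(const struct demo_ops *ops, void *data)
{
	assert(ops != NULL && ops->poll != NULL);
	return ops->poll(data);
}
```

The core side stays generic (void * everywhere) while each subsystem keeps
typed callbacks, which seems to be the point of the PFS_* wrappers.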

--


[patch 00/22] pollfs: filesystem abstraction for pollable objects

2007-05-01 Thread Davi Arnaut
This patch set introduces a new file system for the delivery of pollable
events through file descriptors. To the detriment of debuggability, pollable
objects are a nice adjunct to nonblocking/epoll/event-based servers.

The pollfs filesystem abstraction provides better mechanisms needed for
creating and maintaining pollable objects. Also the pollable futex approach
is far superior (send and receive events from userspace or kernel) to eventfd
and fixes (supersedes) FUTEX_FD at the same time.

The (non) blocking and object size (user <-> kernel) semantics are handled
internally, decoupling the core filesystem from the "subsystems" (mere push and
pop operations).

Currently implemented waitable "objects" are: signals, futexes, ai/o blocks and
timers.

More details at each patch.

http://haxent.com/~davi/pollfs/

Comments are welcome.

--
Davi Arnaut


Re: [patch] CFS scheduler, -v8

2007-05-01 Thread William Lee Irwin III
On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote:
>  Authors of this paper proposed a scheduler: Earliest Eligible Virtual
> Deadline First (EEVDF). EEVDF uses exactly the same method as CFS to
> track the execution of each running task. The only difference between
> EEVDF and CFS is that EEVDF tries to be _deadline_ fair while CFS is
> _start-time_ fair. Scheduling based on deadline gives a better response
> time bound and seems to be more fair.
>  In the following part of this email, I will try to explain the 
> similarities and differences between EEVDF and CFS. Hopefully, this 
> might provide you with some useful information w.r.t your current work 
> on CFS.

Any chance you could write a patch to convert CFS to EEVDF? People may
have an easier time understanding code than theoretical explanations.
(I guess I could do it if sufficiently pressed.)


-- wli


Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

2007-05-01 Thread Jesse Barnes
On Tuesday, May 01, 2007, Jesse Barnes wrote:
> > I'm testing it now on my 965...
>
> Bah... nevermind Robert, I see you're doing this already in
> pci_mmcfg_reject_broken.  I'm about to reboot & test now.

Ok, I've tested a bit on my 965 (after re-adding my old patch to support 
it) and the new checks are more complete, but my BIOS still appears to be 
buggy.

The extended config space (as defined by the register) is at 0xf000 
(full value is 0xf003 indicating 128M enabled).  The ACPI MCFG table 
has this space reserved according to Robert's new code, but the machine 
hangs due to the address space aliasing Olivier mentioned awhile back.  I 
don't have a PCIe card to test with (or any devices that require extended 
config space that I know of) so I can't really tell if Windows supports 
PCIe on this platform, but if it does I don't see how it would w/o having 
a full bridge driver and sophisticated address space allocation builtin.

I'm going to try updating my BIOS, but if that doesn't solve this problem, 
I'm not sure what we can do about it.  Should pci_mmcfg_insert_resources 
check for conflicts?  Should we just blacklist certain boards?  I can try 
pinging our BIOS folks about this board to see what was intended, but I'm 
sure this won't be the only board we have problems with, so we'll need to 
address it generically somehow.

Thanks,
Jesse


[PATCH]: linux-2.6.21-uc0 (MMU-less updates)

2007-05-01 Thread Greg Ungerer

Hi All,

An update of the uClinux (MMU-less) code against 2.6.21.
A lot of cleanups, and a few bug fixes.

Ahead is more changes to finalize platform device support
for the new style ColdFire serial driver, and switching to
the generic irq code.

http://www.uclinux.org/pub/uClinux/uClinux-2.6.x/linux-2.6.21-uc0.patch.gz


Change log:

. Arcturus UC5272 and UC5282 board support      David Wu
. use THREAD_SIZE for stack manipulation        Philippe De Muyter
. remove dead code from setup.c                 Greg Ungerer
. remove dead cache code from mm                Greg Ungerer
. remove useless is_in_rom()                    Greg Ungerer
. consolidate fixed bootparam code              Greg Ungerer
. no need to preserve THREAD_SR in resume       Philippe De Muyter
. implement irq_regs in interrupt service       Greg Ungerer
. remove machine specific irq code              Greg Ungerer
. fix timer step count for ColdFire             Philippe De Muyter
. add chip select mappings for cobra5329        Thomas Brinker
. remove old machine specific clock defines     Greg Ungerer
. improve readability of fec driver code        Philippe De Muyter
. do not read ICR before writing in fec driver  Philippe De Muyter
. fix INIT_WORK usage in fec driver             Greg Ungerer
. remove legacy PM code in 68328 serial driver  Greg Ungerer
. fix errno reporting in binfmt_flat loader     Philippe De Muyter
. create hw_irq.h for m68knommu                 Greg Ungerer
. fix CLOCK_TICK_RATE for m68knommu             Philippe De Muyter
. add expand_stack() function to nommu          Greg Ungerer
. move to platform device setup for 520x        Greg Ungerer
. move to platform device setup for 5249        Greg Ungerer
. new style serial driver for ColdFire UART     Greg Ungerer
. add QSPI defines for 528x ColdFire parts      David Wu
. improve SoC device defines for 523x ColdFire  Thomas Brinker


Regards
Greg



Greg Ungerer  --  Chief Software Dude       EMAIL: [EMAIL PROTECTED]
SnapGear -- a division of Secure Computing  PHONE: +61 7 3435 2888
825 Stanley St,                             FAX:   +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB:   http://www.SnapGear.com



Re: [PATCH 18/36] Use menuconfig objects II - MMC

2007-05-01 Thread Pierre Ossman
Jan Engelhardt wrote:
> 
> If it works, no problem. just put your sign-off somewhere
> and let Andrew (or the appropriate subsys maintainer) have it :)
> 

Well, the appropriate subsys maintainer would be me. :)

Has it been decided that this is the way to go? I have no strong feelings either
way.

Rgds
-- 
 -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer            http://www.rdesktop.org


keyboard and mouse regressions as of dc87c398 plus cfs-v7

2007-05-01 Thread James Cloos
I just tried out git as of commit dc87c398 plus Ingo's cfs v6 patch.

I'll try out w/o the patch to confirm later this morning.

The kernel log shows no indication of a problem, but the (ps/2) mice do
not work, and the keyboard repeat is *very* slow, and will not speed up
if changed with xset(1x).

>From dmesg(1):

[   25.448092] PNP: PS/2 Controller [PNP0303:KBC,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[   25.460470] serio: i8042 KBD port at 0x60,0x64 irq 1
[   25.466816] serio: i8042 AUX port at 0x60,0x64 irq 12
[   25.473262] mice: PS/2 mouse device common for all mice
[   25.484605] input: AT Translated Set 2 keyboard as /class/input/input3
[   25.497339] input: PC Speaker as /class/input/input4

but no /class/input entry gets generated for the mice.

/proc/interrupts shows 5 ints for irq 12, and that does not increment no
matter what is done to the trackpoint, syn pad or external ps/2 mouse.

My last known good version was 0f851021c0f91e5073fa89f26b5ac68e23df8e11
plus the rt patch.

To get dc87c398 plus cfs-v7 I cloned, checked out v2.6.21, applied the
cfs-v7 patch and then pulled in master.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6


Re: [patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread Alexey Dobriyan
On Tue, May 01, 2007 at 09:28:18PM -0700, David Rientjes wrote:
> +#define __attribute_unused__ __attribute__((unused))

Suggest __unused which is shorter and looks compiler-neutral.
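
To make the suggestion concrete, a minimal sketch of how the shorter
spelling would read at the point of use. It assumes GCC or Clang; the
demo_* functions are made up, and the #ifndef guard is there only because
some C libraries already define __unused themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Short spelling from the suggestion; guarded since some libcs define it. */
#ifndef __unused
#define __unused __attribute__((unused))
#endif

/* Callback whose second argument is required by the signature but not used. */
int demo_handler(int value, void *cookie __unused)
{
	return value * 2;
}

/* Helper that is only referenced in some configurations. */
static int __unused demo_debug_helper(int x)
{
	return x + 1;
}

int demo_run(int v)
{
	assert(v >= 0);
	return demo_handler(v, NULL);
}
```

Neither the unused parameter nor the unreferenced static helper should draw
an unused warning with this annotation in place.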



Re: [patch] CFS scheduler, -v8

2007-05-01 Thread Willy Tarreau
Hi Ting,

On Tue, May 01, 2007 at 10:57:14PM -0400, Ting Yang wrote:
> 
> Hi, Ingo
> 
>  My name is Ting Yang, a graduate student from UMASS. I am currently 
> studying the linux scheduler and virtual memory manager to solve some 
> page swapping problems. I am very excited with the new scheduler CFS. 
> After I read through your code, I think that you might be interested in 
> reading this paper:
> 
>  "A Proportional Share Resource Allocation Algorithm for Real-Time,
> Time-Shared Systems", by Ion Stoica. You can find the paper here:
> http://citeseer.ist.psu.edu/37752.html
> 
>  Authors of this paper proposed a scheduler: Earliest Eligible Virtual
> Deadline First (EEVDF). EEVDF uses exactly the same method as CFS to
> track the execution of each running task. The only difference between
> EEVDF and CFS is that EEVDF tries to be _deadline_ fair while CFS is
> _start-time_ fair. Scheduling based on deadline gives a better response
> time bound and seems to be more fair.
> 
>  In the following part of this email, I will try to explain the 
> similarities and differences between EEVDF and CFS. Hopefully, this 
> might provide you with some useful information w.r.t your current work 
> on CFS.

(...)
Thanks very much for this very clear explanation. Now I realize that
some of the principles I've had in mind for a long time already exist
and are documented ! That's what I called sorting by job completion
time in the past, which might not have been clear for everyone. Now
you have put words on all those concepts, it's more clear ;-)

> The decoupling of weight w_i and timeslice l_i is important. Generally
> speaking, weight determines throughput and timeslice determines the
> responsiveness of a task.

I 100% agree. That's the problem we have with nice today. Some people
want to use nice to assign more CPU to tasks (as has been the case for
years) and others want to use nice to get better interactivity (meaning
nice as when you're in a queue and let the old woman go before you).

IMHO, the two concepts are opposed. Either you're a CPU hog OR you get
quick responsiveness.

> In a normal situation, high priority tasks
> usually need more cpu capacity within a short period of time (bursty, such
> as keyboard, mouse move, X updates, daemons, etc), and need to be
> processed as quickly as possible (responsiveness and interactiveness).
> Following the analysis above, we know that for higher priority tasks we
> should give _higher weight_ to ensure its CPU throughput, and at the
> same time give _smaller timeslice_ to ensure better responsiveness.
> This is a bit counter-intuitive against the current linux
> implementation: smaller nice value leads to higher weight and larger
> timeslice.

We have an additional problem in Linux, and not the least : it already
exists and is deployed everywhere, so we cannot break existing setups.
More specifically, we don't want to play with nice values of processes
such as X.
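
To illustrate the quoted point about weight versus timeslice, here is a
small sketch. It assumes the virtual deadline formula from the EEVDF paper,
vd = ve + l_i / w_i, which is not spelled out in the part of the mail quoted
here, and it is in no way CFS code. The demo_* names are hypothetical.

```c
#include <assert.h>

/* One task, in the notation of the quoted mail. */
struct demo_vtask {
	double ve;	/* virtual eligible time */
	double w;	/* weight w_i: share of the CPU */
	double l;	/* timeslice l_i: length of one request */
};

/* Virtual deadline, assuming the paper's definition: vd = ve + l / w. */
double demo_virtual_deadline(const struct demo_vtask *t)
{
	assert(t->w > 0.0);
	return t->ve + t->l / t->w;
}

/*
 * Among tasks that are eligible at virtual time vnow (ve <= vnow),
 * return the index of the one with the earliest virtual deadline,
 * or -1 if none is eligible.
 */
int demo_pick_eevdf(const struct demo_vtask *tasks, int n, double vnow)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (tasks[i].ve > vnow)
			continue;	/* not eligible yet */
		if (best < 0 ||
		    demo_virtual_deadline(&tasks[i]) <
		    demo_virtual_deadline(&tasks[best]))
			best = i;
	}
	return best;
}
```

With w = 4, l = 2 the deadline lands at ve + 0.5, while w = 1, l = 10 gives
ve + 10, so the high-weight, short-slice task is served first even though
both became eligible at the same time.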

That's why I think that monitoring the amount of the time-slice (l_i)
consumed by the task is important. I proposed to conserve the unused
part of l_i as a credit (and conversely the credit can be negative if
the time-slice has been over-used). This credit would serve two purposes :

  - reassign the unused part of l_i on next time-slices to get the
most fair share of CPU between tasks

  - use it as an interactivity key to sort the tasks. Basically, if
we note u_i the unused CPU cycles, you can sort based on
(d_i - u_i) instead of just d_i, and the less hungry tasks will
reach the CPU faster than others.
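
A rough sketch of the credit idea, just to pin down the arithmetic. Units,
field names and the demo_* helpers are all invented for illustration; how
the credit would be capped or aged is left open here, as it is in the text.

```c
#include <assert.h>
#include <stddef.h>

struct demo_entity {
	long d;		/* deadline d_i, in arbitrary time units */
	long l;		/* timeslice l_i granted for the last round */
	long used;	/* how much of l_i was actually consumed */
	long u;		/* credit u_i carried over; may be negative */
};

/* End-of-slice accounting: bank what was left, or go into debt. */
void demo_account(struct demo_entity *e)
{
	e->u += e->l - e->used;
	e->used = 0;
}

/* Sort key proposed above: d_i - u_i (smaller runs first). */
long demo_key(const struct demo_entity *e)
{
	return e->d - e->u;
}

/* Linear scan for the smallest key; a real scheduler would keep a tree. */
size_t demo_pick_credit(const struct demo_entity *v, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (demo_key(&v[i]) < demo_key(&v[best]))
			best = i;
	return best;
}
```

Two tasks with the same d_i = 100 and l_i = 10: one that overran to 15 ends
up with u_i = -5 and key 105, one that used only 2 gets u_i = 8 and key 92,
so the less hungry task goes first.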

(...)

>  Based on my understanding, adopting something like EEVDF in CFS should
> not be very difficult given their similarities, although I do not have
> any idea on how this impacts the load balancing for SMP. Is this worth
> a try?

I think that if you have time to spend on this, everyone would like to
see the difference. All the works on the scheduler are more or less
experimental and several people are exchanging ideas right now, so it
should be the right moment. You seem to understand very well both
approaches and it's likely that it will not take you too much time :-)


> Sorry for such a long email :-)

It was worth it, thanks !

Willy



Re: per-thread rusage

2007-05-01 Thread Balbir Singh

Alan Cox wrote:

> > I just so happen to think we should implement a variety of CPU resource
> > limits beyond what we now do, so this, too, interests me.
>
> Agreed - and make them all 64bit while doing the cleanup. One thing
> several Unixen have we don't for 32bit boxes is a proper set of 64bit
> resource handling for memory/file etc.
>
> We could also start using the CPU facilities to enforce some of
> the really interesting real time process ones (like main memory
> bandwidth) that at the moment we have no control over and can lead to
> very unfair behaviour.
>
> Alan


Hi, Alan,

Thanks for bringing this up. There are a couple of patches posted to
lkml for RSS control (unmapped page cache controller under development).

http://lwn.net/Articles/223829/

and the new enhanced version by Pavel at

http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6456480.html

We would appreciate any feedback to help us move the work forward and
make the code ready for acceptance

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Crypto Update for 2.6.22

2007-05-01 Thread Herbert Xu
Hi:

Here is the crypto update for 2.6.22:

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git

or

master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6.git

Summary:

* Added API for asynchronous block ciphers.
* Small clean-ups.


Herbert Xu (9):
  [CRYPTO] api: Proc functions should be marked as unused
  [CRYPTO] api: Add async block cipher interface
  [CRYPTO] tcrypt: Use async blkcipher interface
  [CRYPTO] templates: Pass type/mask when creating instances
  [CRYPTO] api: Add async blkcipher type
  [CRYPTO] cryptomgr: Fix parsing of nested templates
  [CRYPTO] api: Do not remove users unless new algorithm matches
  [CRYPTO] cryptd: Add software async crypto daemon
  [CRYPTO] api: Add ablkcipher_request_set_tfm

Simon Arlott (1):
  [CRYPTO] padlock: Remove pointless padlock module

 crypto/Kconfig  |   13 +
 crypto/Makefile |2 
 crypto/ablkcipher.c |   83 ++
 crypto/algapi.c |  169 +
 crypto/blkcipher.c  |   72 -
 crypto/cbc.c|   11 +
 crypto/cryptd.c |  375 
 crypto/cryptomgr.c  |   66 +---
 crypto/ecb.c|   11 +
 crypto/hash.c   |2 
 crypto/hmac.c   |   11 +
 crypto/lrw.c|   11 +
 crypto/pcbc.c   |   11 +
 crypto/tcrypt.c |  121 ++-
 crypto/xcbc.c   |   12 +
 drivers/crypto/Kconfig  |   16 --
 drivers/crypto/Makefile |1 
 include/crypto/algapi.h |   84 ++
 include/linux/crypto.h  |  236 +-
 19 files changed, 1166 insertions(+), 141 deletions(-)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP

2007-05-01 Thread Michael Ellerman
On Wed, 2007-02-28 at 18:34 +0100, Jan-Bernd Themann wrote:
> This patch provides a functionality that allows parallel 
> RX processing on multiple RX queues by using dummy netdevices.
> 
> 
> Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
> ---

> @@ -1789,6 +1798,22 @@ static void ehea_xmit3(struct sk_buff *s
>   dev_kfree_skb(skb);
>  }
>  
> +static inline int ehea_hash_skb(struct sk_buff *skb, int num_qps)
> +{
> + struct tcphdr *tcp;
> + u32 tmp;
> +
> + if ((skb->protocol == htons(ETH_P_IP)) &&
> + (skb->nh.iph->protocol == IPPROTO_TCP)) {

This breaks the build, looks like skb->nh went away:
b0e380b1d8a8e0aca215df97702f99815f05c094

/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1806: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1807: error: 'struct sk_buff' has no member named 'nh'
/scratch/michael/kisskb-build/src/drivers/net/ehea/ehea_main.c:1809: error: 'struct sk_buff' has no member named 'nh'

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: Question about Reiser4 (how to boot it?)

2007-05-01 Thread lkml777

Hi Jeff, it seems that lkml has contacted both of my email accounts and
crippled them.

I can no longer receive email from lkml on this account.

I can neither receive nor send email to lkml from my other account.

They have also just deleted the 4 emails I sent to lkml from the page
http://lkml.org/lkml/2007/4/30/

This included one to you.

In case you didn't get it,... here it is again.

---

I used GRUB and the kernel and initrd from a separate partition to boot
a Reiser4 installation.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Does exactly what it says on the tin



Re: per-thread rusage

2007-05-01 Thread Ulrich Drepper

On 5/1/07, Theodore Tso <[EMAIL PROTECTED]> wrote:

> The question is should we use setrlimit() to set the per-thread CPU
> limit, given that we would need some separate interface to set signal
> that should be sent.
>
> Is there any reason why we should have the interface specify whether
> the signal should be directed to a specified process or kernel
> thread-id, perhaps using si_pid field in the siginfo_t to specify
> which thread had exceeded its CPU limit.  Or would this be overkill?


The more I think about it the more complex it gets.  There is a
problem with delivering the signal to the receiving process itself: it
is out of time and cannot perform the cleanup operation anymore.  You
could grant it a grace period but how long should that be?  Some of
the cleanup handlers might take a long time.  If you don't enforce the
CPU limit then it doesn't have to be in the kernel and you might as
well use CLOCK_THREAD_CPUTIME_ID and create a timer.  This should
already work today.  If not it must be fixed.

Delivering the timeout signal to another thread isn't really possible
either since the cleanup code might access thread-local data which
wouldn't work since it's not the canceled thread's data which is
accessed.

I don't have a good answer right now whether enforced CPU limits can
be implemented at all.  But it seems for your purposes a timer with
the CPU clock might be sufficient.
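
For reference, a sketch of the non-enforced variant: a POSIX timer on the
calling thread's CPU clock. It assumes a Linux/glibc system where
timer_create() accepts CLOCK_THREAD_CPUTIME_ID (older glibc needs -lrt).
The demo_* names and the choice of SIGUSR1 are arbitrary, and the handler
only sets a flag; real cleanup would be up to the application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t demo_cpu_limit_hit;

static void demo_on_limit(int sig)
{
	(void)sig;
	demo_cpu_limit_hit = 1;
}

/*
 * Arm a one-shot timer that fires after the calling thread has consumed
 * "ms" more milliseconds of CPU time. Returns 0 on success, -1 on error.
 */
int demo_arm_thread_cpu_timer(long ms, timer_t *out)
{
	struct sigaction sa;
	struct sigevent sev;
	struct itimerspec its;

	assert(ms > 0 && out != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_on_limit;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;	/* process-directed signal */
	sev.sigev_signo = SIGUSR1;
	if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, out) != 0)
		return -1;

	memset(&its, 0, sizeof(its));
	its.it_value.tv_sec = ms / 1000;
	its.it_value.tv_nsec = (ms % 1000) * 1000000L;
	return timer_settime(*out, 0, &its, NULL);
}

/*
 * Burn CPU until the handler has run. Returns 1 if it ran, 0 if the
 * thread used more than max_ms of CPU time without it, -1 on error.
 */
int demo_spin_until_limit(long max_ms)
{
	struct timespec ts;

	while (!demo_cpu_limit_hit) {
		if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
			return -1;
		if (ts.tv_sec * 1000L + ts.tv_nsec / 1000000L > max_ms)
			return 0;
	}
	return 1;
}
```

Note that with SIGEV_SIGNAL the signal goes to the process, so in a
multi-threaded program any thread may end up handling it. Linux offers
SIGEV_THREAD_ID to target one thread, but that does not remove the
cleanup problem described above.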



> Do you think this is something that we could get standardized into an
> upcoming Posix/Posix Threads standard?


Regardless of whether a solution can be found, it's too late for the
next revision.  The deadline for new features is long gone by.


Re: SOME STUFF ABOUT REISER4 To Mr Hopper

2007-05-01 Thread lkml777

On Wed, 25 Apr 2007 09:29:19 -0400, "Jeff Garzik" <[EMAIL PROTECTED]>
said:
> Please fix your caps lock key.  Thanks.
> 
>   Jeff

Hi Jeff, it seems that lkml has contacted both of my email accounts and
crippled them.

I can no longer receive email from lkml on this account.

I can neither receive nor send email to lkml from my other account.

They have also just deleted the 4 emails I sent to lkml from the page
http://lkml.org/lkml/2007/4/30/

This included one to you.

In case you didn't get it,... here it is again.

---

Please fix your attitude.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - The way an email service should be



Re: Question about Reiser4

2007-05-01 Thread lkml777

Hi Edward, it seems that lkml has contacted both of my email accounts
and crippled them.

I can no longer receive email from lkml on this account.

I can neither receive nor send email to lkml from my other account.

They have also just deleted the 4 emails I sent to lkml from the page
http://lkml.org/lkml/2007/4/30/

This included one to you.

In case you didn't get it,... here it is again.

(Since you still haven't answered this one).

-

On Wed, 25 Apr 2007 19:03:12 +0400, "Edward Shishkin"
<[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> 
> >
> >As I understand it, the default Reiser4 DOES NOT USE any compression at
> >all, not even tail compression,
> >
> 
> ^tail compression^tail conversion
> Reiser4 does use tail conversion by default.
> 
> > but saves space by eliminating block
> >alignment wastage (tail compression is an option).
> >
> >So lets LOSE the statistics that involve compression. The results now
> >look like this:
> >
> >.-------------------------.
> >| FILESYSTEM | TIME |DISK |
> >| TYPE       |(secs)|USAGE|
> >.-------------------------.
> >|REISER4     | 3462 | 692 |
> >|EXT2        | 4092 | 816 |
> >|JFS         | 4225 | 806 |
> >|EXT4        | 4408 | 816 |
> >|EXT3        | 4421 | 816 |
> >|XFS         | 4625 | 779 |
> >|REISER3     | 6178 | 793 |
> >|FAT32       |12342 | 988 |
> >|NTFS-3g     |10414 | 772 |
> >.-------------------------.
> >These results are still EXTREMELY GOOD for REISER4.
> >  
> >
> 
> Everything is not so simple in the science of testing..
> Would you please change direction of your activity to stressing
> instead of benchmarking? Caught oopses would have great value..
> OK?
> 
> Regards,
> Edward.
> 

Tail conversion is NOT compression,

So what exactly is your point?

By "tail compression" I mean plugin ctail40, but since I was never able
to get it to work, maybe it's not tail compression at all.
-- 
  
  [EMAIL PROTECTED]

-- 
http://www.fastmail.fm - Faster than the air-speed velocity of an
  unladen european swallow



Re: [PATCH (v2)] crypto: Remove pointless padlock module

2007-05-01 Thread Herbert Xu
On Sun, Apr 29, 2007 at 09:01:10AM +0100, Simon Arlott wrote:
> 
> Well that's mostly the point - it shouldn't get compiled in - ever, 
> but it also has other modules depending on it in Kconfig that 
> shouldn't need to be modules.

Patch applied.  Thanks!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: Why ask Sun for ZFS while we have ReiserFS4 !?

2007-05-01 Thread lkml777

Hi Andrew, it seems that lkml has contacted both of my email accounts
and crippled them.

I can no longer receive email from lkml on this account.

I can neither receive nor send email to lkml from my other account.

They have also just deleted the 4 emails I sent to lkml from the page
http://lkml.org/lkml/2007/4/30/

This included one to you.

In case you didn't get it,... here it is again.



Yeah, why do you need ZFS while we have ReiserFS4?

REISER4 - THE BEST FILESYSTEM EVER.

You can read more here:

http://linuxhelp.150m.com/resources/fs-benchmarks.htm
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

.-------------------------.
| FILESYSTEM | TIME |DISK |
| TYPE       |(secs)|USAGE|
.-------------------------.
|REISER4 lzo | 1938 | 278 |
|REISER4 gzip| 2295 | 213 |
.-------------------------.
|REISER4     | 3462 | 692 |
|EXT2        | 4092 | 816 |
|JFS         | 4225 | 806 |
|EXT4        | 4408 | 816 |
|EXT3        | 4421 | 816 |
|XFS         | 4625 | 779 |
|REISER3     | 6178 | 793 |
|FAT32       |12342 | 988 |
|NTFS-3g     |10414 | 772 |
.-------------------------.


Column one measures the time taken to complete the bonnie++ benchmarking
test (run with the parameters bonnie++ -n128:128k:0). The top two
results use Reiser4 with compression. Since bonnie++ writes test files
which are almost all zeros, compression speeds things up dramatically.
That this is not the case in real world examples can be seen below where
compression does not speed things up. However, more importantly, it does
not slow things down either.

Column two, Disk Usage: measures the amount of disk used to store 655MB
of raw data (which was 3 different copies of the Linux kernel sources).

OR LOOK AT THE FULL RESULTS:

.-------------------------------------------------.
|File         |Disk |Copy |Copy |Tar  |Unzip| Del |
|System       |Usage|655MB|655MB|Gzip |UnTar| 2.5 |
|Type         | (MB)| (1) | (2) |655MB|655MB| Gig |
.-------------------------------------------------.
|REISER4 gzip | 213 | 148 |  68 |  83 |  48 |  70 |
|REISER4 lzo  | 278 | 138 |  56 |  80 |  34 |  84 |
|REISER4 tails| 673 | 148 |  63 |  78 |  33 |  65 |
|REISER4      | 692 | 148 |  55 |  67 |  25 |  56 |
|NTFS3g       | 772 |1333 |1426 | 585 | 767 | 194 |
|NTFS         | 779 | 781 | 173 |   X |   X |   X |
|REISER3      | 793 | 184 |  98 |  85 |  63 |  22 |
|XFS          | 799 | 220 | 173 | 119 |  90 | 106 |
|JFS          | 806 | 228 | 202 |  95 |  97 | 127 |
|EXT4 extents | 806 | 162 |  55 |  69 |  36 |  32 |
|EXT4 default | 816 | 174 |  70 |  74 |  42 |  50 |
|EXT3         | 816 | 182 |  74 |  73 |  43 |  51 |
|EXT2         | 816 | 201 |  82 |  73 |  39 |  67 |
|FAT32        | 988 | 253 | 158 | 118 |  81 |  95 |
.-------------------------------------------------.


Each test was performed 5 times and the average value recorded.
Disk Usage: The amount of disk used to store the data (which was 3
different copies of the Linux kernel sources).
The raw data (without filesystem meta-data, block alignment wastage,
etc) was 655MB.
Copy 655MB (1): Copy the data over a partition boundary.
Copy 655MB (2): Copy the data within a partition.
Tar Gzip 655MB: Tar and Gzip the data.
Unzip UnTar 655MB: UnGzip and UnTar the data.
Del 2.5 Gig: Delete everything just written (about 2.5 Gig).

http://lkml.org/lkml/2007/4/9/4
-- 
  
  [EMAIL PROTECTED]



Re: [PATCH] crypto: convert "crypto" subdirectory to UTF-8

2007-05-01 Thread Herbert Xu
On Tue, Apr 17, 2007 at 01:25:49PM -0400, John Anthony Kazos Jr. wrote:
> From: John Anthony Kazos Jr. <[EMAIL PROTECTED]>
> 
> Convert the subdirectory "crypto" to UTF-8. The files changed are 
>  and .
> 
> Signed-off-by: John Anthony Kazos Jr. <[EMAIL PROTECTED]>

Thanks.  Could you fix up include/linux/crypto.h as well?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] Remove unnecessary irq disabling

2007-05-01 Thread Glauber de Oliveira Costa
On Tue, May 01, 2007 at 07:59:21PM -0400, Mark Lord wrote:
> Glauber de Oliveira Costa wrote:
> >RR asks us if it is really necessary to disable interrupts in
> >setup_secondary_APIC_clock(). The answer is no, since setup_APIC_timer()
> >starts by saving irq flags, which also disables them.
> >
> >Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> >
> >--- a/arch/x86_64/kernel/apic.c
> >+++ b/arch/x86_64/kernel/apic.c
> >@@ -875,9 +875,7 @@ void __init setup_boot_APIC_clock (void)
> > 
> > void __cpuinit setup_secondary_APIC_clock(void)
> > {
> >-local_irq_disable(); /* FIXME: Do we need this? --RR */
> > setup_APIC_timer(calibration_result);
> >-local_irq_enable();
> > }
> > 
> > void disable_APIC_timer(void)
> 
> Okay, I'll bite:  before the patch, this code would exit
> with interrupts *enabled*, always.   Now it does not.
> 
Yeah, you have a point. The disable is unnecessary, but maybe
the enable is not. However,

> What does that break, or was it already broken and this fixes it?

I think neither. This function is only called at early bootup
(start_secondary()), and most of its callees have interrupts off anyway.
But maybe we do lose something. Andi, do you have a word on this?
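
For readers following the exchange, here is a minimal userspace sketch of
the point Mark raises. It assumes setup_APIC_timer() brackets its work with
an irq save/restore pair, as the patch description says. Every sim_* name is
hypothetical and only models an interrupt-enable flag; none of this is
kernel code.

```c
#include <stdbool.h>

/* Simulated interrupt-enable flag; a stand-in for the real CPU state. */
static bool sim_irqs_enabled = true;

/* Hypothetical stand-ins for the kernel helpers with similar names. */
static void sim_local_irq_disable(void) { sim_irqs_enabled = false; }
static void sim_local_irq_enable(void)  { sim_irqs_enabled = true; }

/* Remember the current state, then disable. */
static bool sim_local_irq_save(void)
{
	bool flags = sim_irqs_enabled;

	sim_irqs_enabled = false;
	return flags;
}

/* Put back whatever state was saved. */
static void sim_local_irq_restore(bool flags)
{
	sim_irqs_enabled = flags;
}

/* Modelled on setup_APIC_timer(): leaves the state as it found it. */
static void sim_setup_timer(void)
{
	bool flags = sim_local_irq_save();

	/* ... timer programming would happen here ... */
	sim_local_irq_restore(flags);
}

/* Before the patch: always returns with interrupts enabled. */
bool sim_secondary_clock_before(bool irqs_on_entry)
{
	sim_irqs_enabled = irqs_on_entry;
	sim_local_irq_disable();
	sim_setup_timer();
	sim_local_irq_enable();
	return sim_irqs_enabled;
}

/* After the patch: returns with the caller's state unchanged. */
bool sim_secondary_clock_after(bool irqs_on_entry)
{
	sim_irqs_enabled = irqs_on_entry;
	sim_setup_timer();
	return sim_irqs_enabled;
}
```

Entered with interrupts off, sim_secondary_clock_before() comes back with
them enabled while sim_secondary_clock_after() comes back with them still
off, which is the behavioural difference being questioned.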

-- 
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"


Re: per-thread rusage

2007-05-01 Thread Theodore Tso
On Tue, May 01, 2007 at 05:17:28PM -0700, Ulrich Drepper wrote:
> We have, in principle: setrlimit.  We jump through hoops at the moment
> to make RLIMIT_CPU a per-process facility.  This is all nice.  All you
> need to do is to add resources RLIMIT_*_THREAD (e.g.,
> RLIMIT_CPU_THREAD) and additionally do accounting on a per-thread
> basis.

Indeed; in fact it would be easier to do per-thread accounting than
our current per-process accounting, as you note.

> The thread library also cannot simply hijack the SIGXCPU signal;
> the application might want to use it.  So what would additionally be
> needed is a method to specify which signal to send.  The default
> might just as well be SIGXCPU, but this must be changeable.


The question is whether we should use setrlimit() to set the per-thread
CPU limit, given that we would need some separate interface to set the
signal that should be sent.

Is there any reason why we should have the interface specify whether
the signal should be directed to a specified process or kernel
thread-id, perhaps using the si_pid field in the siginfo_t to specify
which thread had exceeded its CPU limit?  Or would this be overkill?
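
As a point of reference, the existing per-process facility can be exercised
roughly as follows. This is only a sketch: limit_process_cpu() and
cpu_limit_was_hit() are made-up names, and RLIMIT_CPU_THREAD is not used
because it is a proposal in this thread, not an existing resource.
setrlimit(), getrlimit(), sigaction(), RLIMIT_CPU and SIGXCPU are the
standard POSIX interfaces.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/resource.h>

/* Set by the handler once the soft CPU limit has been exceeded. */
static volatile sig_atomic_t cpu_limit_hit = 0;

static void on_sigxcpu(int signo)
{
	(void)signo;
	cpu_limit_hit = 1;
}

/* Lets the application poll whether SIGXCPU has been delivered. */
int cpu_limit_was_hit(void)
{
	return cpu_limit_hit;
}

/*
 * Install a SIGXCPU handler and set the soft RLIMIT_CPU of the whole
 * process to `seconds`, clamped to the current hard limit.
 * Returns 0 on success, -1 on failure.
 */
int limit_process_cpu(rlim_t seconds)
{
	struct sigaction sa;
	struct rlimit rl;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigxcpu;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGXCPU, &sa, NULL) != 0)
		return -1;

	if (getrlimit(RLIMIT_CPU, &rl) != 0)
		return -1;

	/* The soft limit may not exceed the hard limit. */
	if (rl.rlim_max != RLIM_INFINITY && seconds > rl.rlim_max)
		seconds = rl.rlim_max;
	rl.rlim_cur = seconds;

	return setrlimit(RLIMIT_CPU, &rl);
}
```

Note that the signal number is fixed here: the kernel sends SIGXCPU and the
limit applies to the process as a whole, which are the two restrictions the
proposal above would have to lift.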

> The thread cancellation must appear like any other cancellation,
> perhaps with a special status value (PTHREAD_CANCELED_XCPU instead of
> PTHREAD_CANCEL).  But that's a userlevel detail.

Yep, I agree that thread cancellation is the right thing to happen at
the Posix Threads level.

Do you think this is something that we could get standardized into an
upcoming Posix/Posix Threads standard?  

- Ted


[patch 07/10] mips: excite: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace variable instances of __attribute__((unused)) with
__attribute_unused__.

Cc: Ralf Baechle <[EMAIL PROTECTED]>
Cc: Thomas Koeller <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/mips/basler/excite/excite_device.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/mips/basler/excite/excite_device.c b/arch/mips/basler/excite/excite_device.c
--- a/arch/mips/basler/excite/excite_device.c
+++ b/arch/mips/basler/excite/excite_device.c
@@ -68,7 +68,7 @@ enum {
 
 
 static struct resource
-   excite_ctr_resource __attribute__((unused)) = {
+   excite_ctr_resource __attribute_unused__ = {
.name   = "GPI counters",
.start  = 0,
.end= 5,
@@ -77,7 +77,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_gpislice_resource __attribute__((unused)) = {
+   excite_gpislice_resource __attribute_unused__ = {
.name   = "GPI slices",
.start  = 0,
.end= 1,
@@ -86,7 +86,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_mdio_channel_resource __attribute__((unused)) = {
+   excite_mdio_channel_resource __attribute_unused__ = {
.name   = "MDIO channels",
.start  = 0,
.end= 1,
@@ -95,7 +95,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_fifomem_resource __attribute__((unused)) = {
+   excite_fifomem_resource __attribute_unused__ = {
.name   = "FIFO memory",
.start  = 0,
.end= 767,
@@ -104,7 +104,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_scram_resource __attribute__((unused)) = {
+   excite_scram_resource __attribute_unused__ = {
.name   = "Scratch RAM",
.start  = EXCITE_PHYS_SCRAM,
.end= EXCITE_PHYS_SCRAM + EXCITE_SIZE_SCRAM - 1,
@@ -113,7 +113,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_fpga_resource __attribute__((unused)) = {
+   excite_fpga_resource __attribute_unused__ = {
.name   = "System FPGA",
.start  = EXCITE_PHYS_FPGA,
.end= EXCITE_PHYS_FPGA + EXCITE_SIZE_FPGA - 1,
@@ -122,7 +122,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_nand_resource __attribute__((unused)) = {
+   excite_nand_resource __attribute_unused__ = {
.name   = "NAND flash control",
.start  = EXCITE_PHYS_NAND,
.end= EXCITE_PHYS_NAND + EXCITE_SIZE_NAND - 1,
@@ -131,7 +131,7 @@ static struct resource
.sibling= NULL,
.child  = NULL
},
-   excite_titan_resource __attribute__((unused)) = {
+   excite_titan_resource __attribute_unused__ = {
.name   = "TITAN registers",
.start  = EXCITE_PHYS_TITAN,
.end= EXCITE_PHYS_TITAN + EXCITE_SIZE_TITAN - 1,


[patch 03/10] sh: dma: use __attribute_unused__

2007-05-01 Thread David Rientjes
There is no such thing as labeling a variable as __attribute__((used)).
Since ts_shift is not referenced in inline assembly, we assume that we're
simply suppressing a warning here if the variable is declared but
unreferenced.

Cc: Paul Mundt <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-sh/cpu-sh3/dma.h|2 +-
 include/asm-sh/cpu-sh4/dma-sh7780.h |2 +-
 include/asm-sh/cpu-sh4/dma.h|2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-sh/cpu-sh3/dma.h b/include/asm-sh/cpu-sh3/dma.h
--- a/include/asm-sh/cpu-sh3/dma.h
+++ b/include/asm-sh/cpu-sh3/dma.h
@@ -26,7 +26,7 @@ enum {
XMIT_SZ_128BIT,
 };
 
-static unsigned int ts_shift[] __attribute__ ((used)) = {
+static unsigned int ts_shift[] __attribute_unused__ = {
[XMIT_SZ_8BIT]  = 0,
[XMIT_SZ_16BIT] = 1,
[XMIT_SZ_32BIT] = 2,
diff --git a/include/asm-sh/cpu-sh4/dma-sh7780.h b/include/asm-sh/cpu-sh4/dma-sh7780.h
--- a/include/asm-sh/cpu-sh4/dma-sh7780.h
+++ b/include/asm-sh/cpu-sh4/dma-sh7780.h
@@ -28,7 +28,7 @@ enum {
 /*
  * The DMA count is defined as the number of bytes to transfer.
  */
-static unsigned int __attribute__ ((used)) ts_shift[] = {
+static unsigned int ts_shift[] __attribute_unused__ = {
[XMIT_SZ_8BIT]  = 0,
[XMIT_SZ_16BIT] = 1,
[XMIT_SZ_32BIT] = 2,
diff --git a/include/asm-sh/cpu-sh4/dma.h b/include/asm-sh/cpu-sh4/dma.h
--- a/include/asm-sh/cpu-sh4/dma.h
+++ b/include/asm-sh/cpu-sh4/dma.h
@@ -53,7 +53,7 @@ enum {
 /*
  * The DMA count is defined as the number of bytes to transfer.
  */
-static unsigned int ts_shift[] __attribute__ ((used)) = {
+static unsigned int ts_shift[] __attribute_unused__ = {
[XMIT_SZ_64BIT] = 3,
[XMIT_SZ_8BIT]  = 0,
[XMIT_SZ_16BIT] = 1,


[patch 05/10] frv: gdb: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace function instances of __attribute__((unused)) with
__attribute_unused__ to suppress warnings.

Cc: David Howells <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/frv/kernel/gdb-stub.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/frv/kernel/gdb-stub.c b/arch/frv/kernel/gdb-stub.c
--- a/arch/frv/kernel/gdb-stub.c
+++ b/arch/frv/kernel/gdb-stub.c
@@ -1195,7 +1195,7 @@ static void gdbstub_check_breakpoint(void)
 /*
  *
  */
-static void __attribute__((unused)) gdbstub_show_regs(void)
+static void __attribute_unused__ gdbstub_show_regs(void)
 {
unsigned long *reg;
int loop;
@@ -1223,7 +1223,7 @@ static void __attribute__((unused)) gdbstub_show_regs(void)
 /*
  * dump debugging regs
  */
-static void __attribute__((unused)) gdbstub_dump_debugregs(void)
+static void __attribute_unused__ gdbstub_dump_debugregs(void)
 {
gdbstub_printk("DCR%08lx  ", __debug_status.dcr);
gdbstub_printk("BRR%08lx\n", __debug_status.brr);
@@ -2079,25 +2079,25 @@ void gdbstub_exit(int status)
  * GDB wants to call malloc() and free() to allocate memory for calling kernel
  * functions directly from its command line
  */
-static void *malloc(size_t size) __attribute__((unused));
+static void *malloc(size_t size) __attribute_unused__;
 static void *malloc(size_t size)
 {
return kmalloc(size, GFP_ATOMIC);
 }
 
-static void free(void *p) __attribute__((unused));
+static void free(void *p) __attribute_unused__;
 static void free(void *p)
 {
kfree(p);
 }
 
-static uint32_t ___get_HSR0(void) __attribute__((unused));
+static uint32_t ___get_HSR0(void) __attribute_unused__;
 static uint32_t ___get_HSR0(void)
 {
return __get_HSR(0);
 }
 
-static uint32_t ___set_HSR0(uint32_t x) __attribute__((unused));
+static uint32_t ___set_HSR0(uint32_t x) __attribute_unused__;
 static uint32_t ___set_HSR0(uint32_t x)
 {
__set_HSR(0, x);


[patch 08/10] mips: tlbex: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace function instances of __attribute__((unused)) with
__attribute_unused__.

Cc: Ralf Baechle <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/mips/mm/tlbex.c |   36 ++++++++++++++++++------------------
 1 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -35,24 +35,24 @@
 #include 
 #include 
 
-static __init int __attribute__((unused)) r45k_bvahwbug(void)
+static __init int __attribute_unused__ r45k_bvahwbug(void)
 {
/* XXX: We should probe for the presence of this bug, but we don't. */
return 0;
 }
 
-static __init int __attribute__((unused)) r4k_250MHZhwbug(void)
+static __init int __attribute_unused__ r4k_250MHZhwbug(void)
 {
/* XXX: We should probe for the presence of this bug, but we don't. */
return 0;
 }
 
-static __init int __attribute__((unused)) bcm1250_m3_war(void)
+static __init int __attribute_unused__ bcm1250_m3_war(void)
 {
return BCM1250_M3_WAR;
 }
 
-static __init int __attribute__((unused)) r1_llsc_war(void)
+static __init int __attribute_unused__ r1_llsc_war(void)
 {
return R1_LLSC_WAR;
 }
@@ -511,18 +511,18 @@ L_LA(_r3000_write_probe_fail)
 #define i_ehb(buf) i_sll(buf, 0, 0, 3)
 
 #ifdef CONFIG_64BIT
-static __init int __attribute__((unused)) in_compat_space_p(long addr)
+static __init int __attribute_unused__ in_compat_space_p(long addr)
 {
/* Is this address in 32bit compat space? */
return (((addr) & 0xL) == 0xL);
 }
 
-static __init int __attribute__((unused)) rel_highest(long val)
+static __init int __attribute_unused__ rel_highest(long val)
 {
 	return ((((val + 0x800080008000L) >> 48) & 0xffff) ^ 0x8000) - 0x8000;
 }
 
-static __init int __attribute__((unused)) rel_higher(long val)
+static __init int __attribute_unused__ rel_higher(long val)
 {
 	return ((((val + 0x80008000L) >> 32) & 0xffff) ^ 0x8000) - 0x8000;
 }
@@ -556,8 +556,8 @@ static __init void i_LA_mostly(u32 **buf, unsigned int rs, long addr)
i_lui(buf, rs, rel_hi(addr));
 }
 
-static __init void __attribute__((unused)) i_LA(u32 **buf, unsigned int rs,
-   long addr)
+static __init void __attribute_unused__ i_LA(u32 **buf, unsigned int rs,
+long addr)
 {
i_LA_mostly(buf, rs, addr);
if (rel_lo(addr))
@@ -636,8 +636,8 @@ static __init void copy_handler(struct reloc *rel, struct label *lab,
move_labels(lab, first, end, off);
 }
 
-static __init int __attribute__((unused)) insn_has_bdelay(struct reloc *rel,
- u32 *addr)
+static __init int __attribute_unused__ insn_has_bdelay(struct reloc *rel,
+  u32 *addr)
 {
for (; rel->lab != label_invalid; rel++) {
if (rel->addr == addr
@@ -650,15 +650,15 @@ static __init int __attribute__((unused)) insn_has_bdelay(struct reloc *rel,
 }
 
 /* convenience functions for labeled branches */
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_bltz(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
r_mips_pc16(r, *p, l);
i_bltz(p, reg, 0);
 }
 
-static void __init __attribute__((unused)) il_b(u32 **p, struct reloc **r,
-enum label_id l)
+static void __init __attribute_unused__ il_b(u32 **p, struct reloc **r,
+enum label_id l)
 {
r_mips_pc16(r, *p, l);
i_b(p, 0);
@@ -671,7 +671,7 @@ static void __init il_beqz(u32 **p, struct reloc **r, unsigned int reg,
i_beqz(p, reg, 0);
 }
 
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_beqzl(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
r_mips_pc16(r, *p, l);
@@ -692,7 +692,7 @@ static void __init il_bgezl(u32 **p, struct reloc **r, unsigned int reg,
i_bgezl(p, reg, 0);
 }
 
-static void __init __attribute__((unused))
+static void __init __attribute_unused__
 il_bgez(u32 **p, struct reloc **r, unsigned int reg, enum label_id l)
 {
r_mips_pc16(r, *p, l);
@@ -810,7 +810,7 @@ static __initdata u32 final_handler[64];
  *
  * As if we MIPS hackers wouldn't know how to nop pipelines happy ...
  */
-static __init void __attribute__((unused)) build_tlb_probe_entry(u32 **p)
+static __init void __attribute_unused__ build_tlb_probe_entry(u32 **p)
 {
switch (current_cpu_data.cputype) {
/* Found by experiment: R4600 v2.0 needs this, too.  */
@@ -1098,7 +1098,7 @@ build_get_pgd_vmalloc64(u32 **p, struct label **l, struct reloc **r,
  * TMP and PTR are scratch.
  * TMP will be clobbered, PTR will hold the pgd entry.
  */
-static __init void __attribute__((unused)

[patch 04/10] scsi: fix ambiguous gdthtable definition

2007-05-01 Thread David Rientjes
Labeling a variable as __attribute_used__ is ambiguous: it means
__attribute__((unused)) for gcc <3.4 and __attribute__((used)) for
gcc >=3.4.  There is no such thing as labeling a variable as
__attribute__((used)).  We assume that we're simply suppressing a warning
here if gdthtable[] is declared but unreferenced.

Cc: Achim Leubner <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 drivers/scsi/gdth.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -876,7 +876,7 @@ static int __init gdth_search_pci(gdth_pci_str *pcistr)
 /* Vortex only makes RAID controllers.
  * We do not really want to specify all 550 ids here, so wildcard match.
  */
-static struct pci_device_id gdthtable[] __attribute_used__ = {
+static struct pci_device_id gdthtable[] __attribute_unused__ = {
 {PCI_VENDOR_ID_VORTEX,PCI_ANY_ID,PCI_ANY_ID, PCI_ANY_ID},
 {PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC,PCI_ANY_ID,PCI_ANY_ID},
 {PCI_VENDOR_ID_INTEL,PCI_DEVICE_ID_INTEL_SRC_XSCALE,PCI_ANY_ID,PCI_ANY_ID},
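
The ambiguity described in the changelog can be pictured with a
version-dependent macro. The sketch below is an assumption for
illustration, not the kernel's actual definition: the EXAMPLE_* names are
hypothetical and the 3.4 threshold is simply taken from the description
above. It expects GCC or a compatible compiler.

```c
/* Hypothetical illustration; not copied from any kernel header. */
#if defined(__GNUC__) && \
	(__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
/* Newer compilers: the macro forces the symbol to be emitted. */
# define EXAMPLE_ATTRIBUTE_USED		__attribute__((used))
# define EXAMPLE_USED_FORCES_EMIT	1
#else
/* Older compilers: the same macro merely silences the warning. */
# define EXAMPLE_ATTRIBUTE_USED		__attribute__((unused))
# define EXAMPLE_USED_FORCES_EMIT	0
#endif

/* Unambiguous spelling: always "do not warn if unreferenced". */
#define EXAMPLE_ATTRIBUTE_UNUSED	__attribute__((unused))

/* A table that may go unreferenced, marked the unambiguous way. */
static const int example_ids[] EXAMPLE_ATTRIBUTE_UNUSED = { 1, 2, 3 };

/* Reports which meaning EXAMPLE_ATTRIBUTE_USED has for this compiler. */
int example_used_forces_emit(void)
{
	return EXAMPLE_USED_FORCES_EMIT;
}
```

With a macro of this shape, the same source line asks for two different
things depending on the compiler version, which is why the patch switches
gdthtable[] to the spelling that only ever means "suppress the warning".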


[patch 09/10] powerpc: ps3: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace function instances of __attribute__ ((unused)) with
__attribute_unused__.

Cc: Geoff Levand <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/powerpc/platforms/ps3/interrupt.c |4 ++--
 arch/powerpc/platforms/ps3/time.c  |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/interrupt.c b/arch/powerpc/platforms/ps3/interrupt.c
--- a/arch/powerpc/platforms/ps3/interrupt.c
+++ b/arch/powerpc/platforms/ps3/interrupt.c
@@ -434,7 +434,7 @@ static void _dump_64_bmp(const char *header, const u64 *p, unsigned cpu,
*p & 0x);
 }
 
-static void __attribute__ ((unused)) _dump_256_bmp(const char *header,
+static void __attribute_unused__ _dump_256_bmp(const char *header,
const u64 *p, unsigned cpu, const char* func, int line)
 {
pr_debug("%s:%d: %s %u {%016lx:%016lx:%016lx:%016lx}\n",
@@ -453,7 +453,7 @@ static void _dump_bmp(struct ps3_private* pd, const char* func, int line)
 }
 
 #define dump_mask(_x) _dump_mask(_x, __func__, __LINE__)
-static void __attribute__ ((unused)) _dump_mask(struct ps3_private* pd,
+static void __attribute_unused__ _dump_mask(struct ps3_private* pd,
const char* func, int line)
 {
unsigned long flags;
diff --git a/arch/powerpc/platforms/ps3/time.c b/arch/powerpc/platforms/ps3/time.c
--- a/arch/powerpc/platforms/ps3/time.c
+++ b/arch/powerpc/platforms/ps3/time.c
@@ -39,7 +39,7 @@ static void _dump_tm(const struct rtc_time *tm, const char* func, int line)
 }
 
 #define dump_time(_a) _dump_time(_a, __func__, __LINE__)
-static void __attribute__ ((unused)) _dump_time(int time, const char* func,
+static void __attribute_unused__ _dump_time(int time, const char* func,
int line)
 {
struct rtc_time tm;


[patch 10/10] i386 mmzone: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace automatic variable instances of __attribute__ ((unused)) with
__attribute_unused__.

Cc: Andy Whitcroft <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/asm-i386/mmzone.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/asm-i386/mmzone.h b/include/asm-i386/mmzone.h
--- a/include/asm-i386/mmzone.h
+++ b/include/asm-i386/mmzone.h
@@ -122,21 +122,21 @@ static inline int pfn_valid(int pfn)
__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0)
 #define alloc_bootmem_node(pgdat, x)   \
 ({ \
-   struct pglist_data  __attribute__ ((unused))\
+   struct pglist_data  __attribute_unused__\
*__alloc_bootmem_node__pgdat = (pgdat); \
__alloc_bootmem_node(NODE_DATA(0), (x), SMP_CACHE_BYTES,\
__pa(MAX_DMA_ADDRESS)); \
 })
 #define alloc_bootmem_pages_node(pgdat, x) \
 ({ \
-   struct pglist_data  __attribute__ ((unused))\
+   struct pglist_data  __attribute_unused__\
*__alloc_bootmem_node__pgdat = (pgdat); \
__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE,  \
__pa(MAX_DMA_ADDRESS))  \
 })
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 ({ \
-   struct pglist_data  __attribute__ ((unused))\
+   struct pglist_data  __attribute_unused__\
*__alloc_bootmem_node__pgdat = (pgdat); \
__alloc_bootmem_node(NODE_DATA(0), (x), PAGE_SIZE, 0);  \
 })


[patch 06/10] i386: voyager: use __attribute_unused__

2007-05-01 Thread David Rientjes
Replace automatic variable instances of __attribute__((unused)) with
__attribute_unused__ in mca_nmi_hook().

Cc: James Bottomley <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/i386/mach-voyager/voyager_basic.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/i386/mach-voyager/voyager_basic.c b/arch/i386/mach-voyager/voyager_basic.c
--- a/arch/i386/mach-voyager/voyager_basic.c
+++ b/arch/i386/mach-voyager/voyager_basic.c
@@ -292,8 +292,8 @@ machine_emergency_restart(void)
 void
 mca_nmi_hook(void)
 {
-   __u8 dumpval __attribute__((unused)) = inb(0xf823);
-   __u8 swnmi __attribute__((unused)) = inb(0xf813);
+   __u8 dumpval __attribute_unused__ = inb(0xf823);
+   __u8 swnmi __attribute_unused__ = inb(0xf813);
 
/* FIXME: assume dump switch pressed */
/* check to see if the dump switch was pressed */


[patch 01/10] compiler: define __attribute_unused__

2007-05-01 Thread David Rientjes
For all supported versions of gcc (major version 3 and above), functions
and variables may be declared with __attribute__((unused)) to suppress
warnings if they are declared but unused.

This shouldn't be confused with functions being declared with
__attribute__((used)).  This specifies that the function code shall still
be emitted even if it appears to be unreferenced, normally used if
embedded in inline assembly.  For gcc 3.4 and later, unreferenced static
variables and functions are not emitted so this attribute is necessary to
force variables and functions to be output.  Earlier versions of gcc can
simply use __attribute__((unused)) to suppress warnings about such
variables: we do not require any special classification to ensure they are
emitted.

We introduce __attribute_unused__ for variables that should not produce a
compile warning if they can, due to preprocessor macros, go unreferenced.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 include/linux/compiler-gcc.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,3 +37,19 @@
 #define  noinline  __attribute__((noinline))
 #define __attribute_pure__ __attribute__((pure))
 #define __attribute_const__	__attribute__((__const__))
+
+/*
+ * __attribute_unused__ shall be used for functions or variables to suppress
+ * warnings when they may be declared but, due to preprocessor macros,
+ * commenting, etc., go unreferenced.
+ *
+ * In contrast, __attribute_used__ shall be used only for functions.  gcc <3.4
+ * emits code for static functions that are unreferenced and outputs a warning.
+ * __attribute_used__ will correctly suppress this warning.  gcc >=3.4 does not
+ * emit code for static functions that are unreferenced (and thus there is no
+ * warning), but __attribute_used__ forces the function code to be output.  Use
+ * __attribute_unused__ to suppress warnings about functions being unused or
+ * __attribute_used__ to ensure code is emitted when it is referenced only in
+ * inline assembly.
+ */
+#define __attribute_unused__   __attribute__((unused))
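
To make the distinction in the comment concrete, a small sketch follows.
The helper names are invented for the example, and the macro is redefined
locally (guarded by #ifndef) so the snippet builds outside the kernel tree
with GCC or a compatible compiler.

```c
/* Local copy of the macro this patch introduces, for out-of-tree builds. */
#ifndef __attribute_unused__
#define __attribute_unused__	__attribute__((unused))
#endif

/* Never called in this file; the attribute only silences the warning. */
static int __attribute_unused__ quiet_helper(int x)
{
	return x + 1;
}

/*
 * Also never called from C; `used` asks the compiler to emit the code
 * anyway, as would be needed if only inline assembly referenced it.
 */
static int __attribute__((used)) kept_helper(int x)
{
	return x + 2;
}

/* An ordinary static function that is actually referenced. */
static int plain_helper(int x)
{
	return x + 3;
}

/* External entry point so plain_helper() has a caller. */
int call_plain(int x)
{
	return plain_helper(x);
}
```

Building this with -Wall should produce no "defined but not used" warning
for either helper; the difference is that the compiler is free to drop
quiet_helper() from the object file but is asked to keep kept_helper().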


[patch 02/10] i386 pci: type may be unused

2007-05-01 Thread David Rientjes
In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is
unreferenced.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 arch/i386/pci/init.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/i386/pci/init.c b/arch/i386/pci/init.c
--- a/arch/i386/pci/init.c
+++ b/arch/i386/pci/init.c
@@ -6,7 +6,7 @@
in the right sequence from here. */
 static __init int pci_access_init(void)
 {
-   int type = 0;
+   int type __attribute_unused__ = 0;
 
 #ifdef CONFIG_PCI_DIRECT
type = pci_direct_probe();
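
The situation this patch addresses can be reproduced outside the kernel
with a sketch like the one below. EXAMPLE_CONFIG_DIRECT,
example_direct_probe() and example_access_init() are hypothetical stand-ins
for the config option and functions in the patch, and the macro is defined
locally so the snippet builds on its own with GCC or a compatible compiler.

```c
/* Local copy of the macro from patch 01/10, for out-of-tree builds. */
#ifndef __attribute_unused__
#define __attribute_unused__	__attribute__((unused))
#endif

/* Hypothetical probe; only referenced when the option is enabled. */
static int __attribute_unused__ example_direct_probe(void)
{
	return 1;
}

/* Returns the number of access methods that were set up. */
int example_access_init(void)
{
	/* Unreferenced when EXAMPLE_CONFIG_DIRECT is not defined. */
	int type __attribute_unused__ = 0;
	int methods = 0;

#ifdef EXAMPLE_CONFIG_DIRECT
	type = example_direct_probe();
	if (type)
		methods++;
#endif
	return methods;
}
```

Without the attribute, a build that leaves EXAMPLE_CONFIG_DIRECT undefined
would warn about `type` being unused under -Wall; with it, both
configurations compile quietly.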

