[PATCH] Fix the warning when running make tags

2007-06-26 Thread Aneesh Kumar K.V

From: Aneesh Kumar K.V <[EMAIL PROTECTED]>


make tags was giving the below warning.

ctags: Warning: arch/x86_64/kernel/head.S:124: null expansion of name
pattern "\1"

Fix this by making sure we match only an ENTRY pattern found at the
beginning of the line.

Signed-off-by: Aneesh Kumar K.V <[EMAIL PROTECTED]>
---
Makefile |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 8a3c271..9c2670e 100644
--- a/Makefile
+++ b/Makefile
@@ -1316,7 +1316,7 @@ define xtags
-I __initdata,__exitdata,__acquires,__releases \
-I EXPORT_SYMBOL,EXPORT_SYMBOL_GPL \
--extra=+f --c-kinds=+px \
-   --regex-asm='/ENTRY\(([^)]*)\).*/\1/'; \
+   --regex-asm='/^ENTRY\(([^)]*)\).*/\1/'; \
$(all-kconfigs) | xargs $1 -a \
--langdef=kconfig \
--language-force=kconfig \
--
1.5.2.2.571.ge1341-dirty
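
The effect of the added `^` can be checked outside of ctags. The sketch below is only an illustration: it uses POSIX extended regular expressions from `<regex.h>`, which I assume are close enough to the syntax ctags accepts here, and the helper name `first_capture` is hypothetical. A line such as `/* see ENTRY() */` is an invented example of text that would produce an empty `\1`; the actual content of head.S line 124 is not shown in the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the first capture group of `pattern` applied to `line` into `out`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 * (Hypothetical helper, not part of ctags or the kernel tree.)
 */
int first_capture(const char *pattern, const char *line,
                  char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int matched;

    assert(outlen > 0);
    out[0] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    matched = (regexec(&re, line, 2, m, 0) == 0);
    if (matched && m[1].rm_so >= 0) {
        /* Copy the captured name, truncating if the buffer is too small. */
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;
        memcpy(out, line + m[1].rm_so, len);
        out[len] = '\0';
    }

    regfree(&re);
    return matched;
}
```

With the unanchored pattern `ENTRY\(([^)]*)\).*`, the invented comment line matches and yields an empty name; with `^ENTRY\(([^)]*)\).*` it no longer matches, while `ENTRY(startup_64)` at the start of a line still yields `startup_64`.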

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, RFD]: Unbreak no-mmu mmap

2007-06-26 Thread Greg Ungerer


Bryan Wu wrote:

On Wed, 2007-06-20 at 12:00 +0900, Paul Mundt wrote:

On Fri, Jun 08, 2007 at 03:53:49PM +0200, Bernd Schmidt wrote:

diff --git a/mm/nommu.c b/mm/nommu.c
index 2b16b00..7480a95 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c

[snip]

+   /*
+* Must always set the VM_SPLIT_PAGES flag for single-page allocations,
+* to avoid trying to get the order of the compound page later on.
+*/
+   if (len == PAGE_SIZE)
+   vma->vm_flags |= VM_SPLIT_PAGES;
+   else if (flags & MAP_SPLIT_PAGES

And now you've just broken every non-blackfin nommu platform, as you've
only defined MAP_SPLIT_PAGES in asm-blackfin/mman.h.


+#ifdef CONFIG_NP2
+   || len < total_len
+#endif

And what is this? It only shows up in the blackfin defconfig. This is not
the place to be putting board-specific hacks.


Yes, it is our own NP2 memory allocator option. I think Bernd will fix
it.


There's no reason you can't add the MAP_SPLIT_PAGES define in
all the necessary places too.



On Tue, Jun 19, 2007 at 07:26:19PM -0400, Robin Getz wrote:
I'm assuming that since no one had any large objections, that this is OK, and 
we should send to Andrew to live in -mm for awhile?



No real objections to the approach, but it would be nice if these sorts
of things were test compiled for at least one platform that isn't yours,
so the obviously broken stuff is fixed before it's posted and someone
else has to find out about it later.


Exactly. Could you please do some simple testing on your SH-NOMMU platform?
We are also waiting for feedback from the other nommu arch maintainers.

David and Greg, could you please help with this? Maybe Robin has some m68k
nommu hardware on hand that can be used for testing; I only have Blackfin. :-)


I have compiled the patch on m68knommu (after adding a MAP_SPLIT_PAGES
define). And it seems to work ok with simple testing.

I don't have a problem with the change, though please do add that
MAP_SPLIT_PAGES define in the appropriate mman.h includes. And like Paul
said there is no place for CONFIG_NP2 in it currently. Please take
that out.

Regards
Greg





Greg Ungerer  --  Chief Software Dude       EMAIL: [EMAIL PROTECTED]
Secure Computing Corporation                PHONE: +61 7 3435 2888
825 Stanley St,                             FAX:   +61 7 3891 3630
Woolloongabba, QLD, 4102, Australia         WEB:   http://www.SnapGear.com


Re: [PATCH] atkbd: cleanup only once

2007-06-26 Thread Greg KH
On Wed, Jun 27, 2007 at 12:59:32AM -0400, Dmitry Torokhov wrote:
> On Wednesday 27 June 2007 00:28, Greg KH wrote:
> > On Wed, Jun 27, 2007 at 12:34:09AM -0400, Dmitry Torokhov wrote:
> > > Hi Dave,
> > > 
> > > On Wednesday 27 June 2007 06:59, Dave Young wrote:
> > > > Hi,
> > > > 
> > > > If you press ctrl+alt+del several times as kernel booting (before user 
> > > > level bootin), the kernel will oops. I found the ps2_command is called 
> > > > more than once, then the ps2dev->serio maybe NULL pointer.
> > > > 
> > > > 2.6.22-rc5 and 2.6.22-rc6 have same result.
> > > > 
> > > > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> > > > ---
> > > > diff -upr linux/drivers/input/keyboard/atkbd.c 
> > > > linux.new/drivers/input/keyboard/atkbd.c
> > > > --- linux/drivers/input/keyboard/atkbd.c2007-06-27 
> > > > 10:38:37.0 +
> > > > +++ linux.new/drivers/input/keyboard/atkbd.c2007-06-27 
> > > > 10:37:39.0 +
> > > > @@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
> > > >  
> > > >  static void atkbd_cleanup(struct serio *serio)
> > > >  {
> > > > +   static int flag;
> > > > +
> > > > +   if(flag)
> > > > +   return;
> > > > +   flag = 1;
> > > 
> > > Unfortunately this will prevent atkbd from resetting keyboard on 2nd
> > > suspend attempt. It will also not work if you have an active MUX and
> > > have a couple of keyboards connected.
> > > 
> > > Greg, now that you removed rwsem from subsystem (and subsystem itself
> > > for that matter) there is nothing as far as I can see that stops
> > > several threads from running device_shutdown() simultaneously. I also
> > > do not see what would isolate device probing and shutting them down
> > > at the same time. Am I missing something?
> > 
> > There was never anything stopping that from happening before.  No driver
> > core code was using that rwsem, so it wasn't protecting anything,
> > despite people trying to use it as if it was :)
> > 
> 
> It did protect device_shutdown() from itself, didn't it?

Hm, yeah, it did, but that was it.  If that was its goal, it sure wasn't
obvious at all.

Do you think the driver core needs to serialize this?

thanks,

greg k-h


Re: [RFC] fsblock

2007-06-26 Thread Nick Piggin
On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:
> On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
> > On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:
> 
> [ ... fsblocks vs extent range mapping ]
> 
> > iomaps can double as range locks simply because iomaps are
> > expressions of ranges within the file.  Seeing as you can only
> > access a given range exclusively to modify it, inserting an empty
> > mapping into the tree as a range lock gives an effective method of
> > allowing safe parallel reads, writes and allocation into the file.
> > 
> > The fsblocks and the vm page cache interface cannot be used to
> > facilitate this because a radix tree is the wrong type of tree to
> > store this information in. A sparse, range based tree (e.g. btree)
> > is the right way to do this and it matches very well with
> > a range based API.
> 
> I'm really not against the extent based page cache idea, but I kind of
> assumed it would be too big a change for this kind of generic setup.  At
> any rate, if we'd like to do it, it may be best to ditch the idea of
> "attach mapping information to a page", and switch to "lookup mapping
> information and range locking for a page".

Well the get_block equivalent API is extent based one now, and I'll
look at what is required in making map_fsblock a more generic call
that could be used for an extent-based scheme.

An extent based thing IMO really isn't appropriate as the main generic
layer here though. If it is really useful and popular, then it could
be turned into generic code and sit along side fsblock or underneath
fsblock...

It definitely isn't trivial to drive the IO directly from something
like that which doesn't correspond to filesystem block size. Splitting
parts of your extent tree when things go dirty or uptodate or partially
under IO, etc.. joining things back up again when they are mergable.
Not that it would be impossible, but it would be a lot more heavyweight
than fsblock.

I think using fsblock to drive the IO and keep the pagecache flags
uptodate and using a btree in the filesystem to manage extents of block
allocations wouldn't be a bad idea though. Do any filesystems actually
do this?


Re: NVidia Driver Support - 1680x1050 mode

2007-06-26 Thread Michael Lothian

OK where to start?

Firstly, this is really the wrong list you're writing to. Chances are
you'll want to ask your question at
http://www.nvnews.net/vbulletin/forumdisplay.php?s==14 if you're
using the Nvidia blob; otherwise, if you're using the 2D-only NV driver,
then you should really aim your question at the Xorg guys and gals.

Secondly, how exactly did you tell it differently?

Chances are your /etc/X11/xorg.conf is wrong. If you were using the
latest X server with the latest binary blob (nvidia driver), chances
are it would detect the highest resolution and set it for you. The
program nvidia-xconfig should fix the file for you.

Oh, and thirdly, do you really think "it just works on Windows" is a good
incentive to get people to help you? Yes, it should work out of the box,
but unfortunately the world of Linux 3D drivers is mostly dominated
by companies that prefer keeping their drivers in a black box.
Hopefully, in the not too distant future, the nouveau project might
remedy this.

Anyway, I hope this e-mail both helps with your problems and adds to
your understanding of how things work. The kernel mailing list is for
kernel issues (which include rivafb and nvidiafb but not nv and nvidia
3d issues) so if you ever plug in a hard drive and it's not working at
full speed or something along those lines that's when you should call.

Cheers

Mike

On 27/06/07, Marc Perkel <[EMAIL PROTECTED]> wrote:

Trying to get my Asus M2NPV-VM motherboard and my
Samsung SyncMaster 215tw Digital to work in 1680x1050
mode but 1280x1024 is the most I can get. Chip Set is
GeForce 6150.

Looking in Xorg.0.log it seems to think that the panel
size is 1280x1024 in spite of my setting telling it
differently.

Sorry if this is off topic but I thought that the
smart people would be here. In Windows I just plug it
in and it works. So I figure Linux should work too. :)



Re: [PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-06-26 Thread Andrew Morton
On Wed, 27 Jun 2007 11:04:44 +0800 "kuan luo" <[EMAIL PROTECTED]> wrote:

> Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA 
> controller.
> NCQ function is disable by default, you can enable it with 'swncq=1'
> 

This patch adds a large amount of new trailing whitespace.

> ---
> diff -Nurp a/sata_nv.c b/sata_nv.c
> --- a/sata_nv.c   2007-06-13 10:15:07.0 -0400
> +++ b/sata_nv.c   2007-06-26 12:52:27.0 -0400

Please prepare patches in `patch -p1' form.

> +typedef struct {
> + u32 defer_bits;
> + u8  front;
> + u8  rear;
> + unsigned int tag[ATA_MAX_QUEUE + 1];
> +}defer_queue_t;

Avoid adding typedefs.

> +static int swncq_enabled = 0;

Don't initialise static storage to zero: it needlessly increases the
vmlinux size.
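
As a small illustration of this point: C guarantees that objects with static storage duration start out as zero, so the explicit `= 0` adds nothing to the behaviour. The sketch below is a user-space stand-in, and the accessor names are hypothetical; whether an explicit zero initialiser actually costs image space depends on the toolchain in use.

```c
#include <assert.h>

/* No initialiser: static storage duration means this starts at zero. */
static int swncq_enabled;

/* Hypothetical accessor, used only to observe the initial value. */
int swncq_is_enabled(void)
{
    return swncq_enabled != 0;
}

/* Hypothetical setter standing in for the 'swncq=1' module parameter. */
void swncq_set_enabled(int on)
{
    assert(on == 0 || on == 1);
    swncq_enabled = on;
}
```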

>  nv_hardreset, ata_std_postreset);
>  }
> 
> +static void nv_swncq_qc_to_dq(struct ata_port *ap, struct ata_queued_cmd *qc)
> +{
> + struct nv_swncq_port_priv *pp = ap->private_data;
> + defer_queue_t   *dq = &pp->defer_queue;
> + 
> + /* queue is full */
> + WARN_ON((dq->rear + 1) % (ATA_MAX_QUEUE + 1) == dq->front);

This is peculiar.  The array is sized ATA_MAX_QUEUE+1 (ie: 33) and the code
uses ATA_MAX_QUEUE+1 everywhere.

It looks to me like the ata code was designed to queue up to 32 elements
and all this code has taken that to 33.  What exactly is going on here?  
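
One possible reading of the `ATA_MAX_QUEUE + 1` sizing, which the patch author does not confirm here, is the common circular-buffer convention of keeping one slot permanently empty so that "full" and "empty" can be told apart using only `front` and `rear`. Under that convention an array of 33 slots holds at most 32 tags. The sketch below is a user-space model with hypothetical names (`MAX_QUEUE`, `dq_push`, `dq_pop`), not the driver code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUE 32               /* stands in for ATA_MAX_QUEUE */
#define SLOTS     (MAX_QUEUE + 1)  /* one slot always stays unused */

struct defer_queue {
    unsigned char front;           /* index of the oldest entry */
    unsigned char rear;            /* index of the next free slot */
    unsigned int tag[SLOTS];
};

int dq_is_empty(const struct defer_queue *dq)
{
    return dq->front == dq->rear;
}

int dq_is_full(const struct defer_queue *dq)
{
    return (dq->rear + 1) % SLOTS == dq->front;
}

/* Returns 1 if the tag was queued, 0 if the queue was already full. */
int dq_push(struct defer_queue *dq, unsigned int tag)
{
    if (dq_is_full(dq))
        return 0;
    dq->tag[dq->rear] = tag;
    dq->rear = (dq->rear + 1) % SLOTS;
    return 1;
}

/* Returns 1 and stores the oldest tag in *tag, or 0 if the queue is empty. */
int dq_pop(struct defer_queue *dq, unsigned int *tag)
{
    assert(tag != NULL);
    if (dq_is_empty(dq))
        return 0;
    *tag = dq->tag[dq->front];
    dq->front = (dq->front + 1) % SLOTS;
    return 1;
}
```

In this model the 33rd push is rejected, so the queue never holds more than 32 entries even though the array has 33 slots.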

> +
> + dq->defer_bits |= (1 << qc->tag);
> + 
> + dq->tag[dq->rear] = qc->tag;
> + dq->rear = (dq->rear + 1) % (ATA_MAX_QUEUE + 1);
> + 
> +}
> +
> +static struct ata_queued_cmd *nv_swncq_qc_from_dq(struct ata_port *ap)
> +{
> + struct nv_swncq_port_priv *pp = ap->private_data;
> + defer_queue_t   *dq = &pp->defer_queue;
> + unsigned int tag;
> + 
> + if (dq->front == dq->rear) /* null queue */
> + return NULL;
> + 
> + tag = dq->tag[dq->front];
> + dq->tag[dq->front] = ATA_TAG_POISON;
> + dq->front = (dq->front + 1) % (ATA_MAX_QUEUE + 1);

etc.

> + WARN_ON(!(dq->defer_bits & (1 << tag)));
> + dq->defer_bits &= ~(1 << tag);
> +
> + return ata_qc_from_tag(ap, tag);
> +}
> +
> + dq->front = dq->rear = 0;
> + dq->defer_bits = 0;
> + pp->qc_active = 0;
> + pp->last_issue_tag = ATA_TAG_POISON;
> + nv_swncq_fis_reinit(ap);
> +}
> +
> +static void nv_swncq_irq_clear(struct ata_port *ap, u32 val)
> +{
> + void __iomem *mmio = ap->host->iomap[NV_MMIO_BAR];
> + u32  flags = (val << (ap->port_no * NV_INT_PORT_SHIFT_MCP55));

I hope we'll never need to support more than two ports...
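
The arithmetic behind this remark can be sketched as follows, assuming the per-port shift is 16 bits (my assumption for `NV_INT_PORT_SHIFT_MCP55`; the value is not shown in the quoted hunk). The helpers are hypothetical and widen to 64 bits only so that bits pushed past bit 31 stay visible.

```c
#include <assert.h>
#include <stdint.h>

#define INT_PORT_SHIFT 16  /* assumed value of NV_INT_PORT_SHIFT_MCP55 */

/* Shift val into the field for port_no, keeping any overflow visible. */
uint64_t port_irq_bits_wide(unsigned int port_no, uint32_t val)
{
    assert(port_no <= 2);  /* keeps the shift below 64 with no lost bits */
    return (uint64_t)val << (port_no * INT_PORT_SHIFT);
}

/* Returns 1 if every set bit lands inside a 32-bit register, else 0. */
int port_fits_in_u32(unsigned int port_no, uint32_t val)
{
    return (port_irq_bits_wide(port_no, val) >> 32) == 0;
}
```

Under that assumption, ports 0 and 1 each get a 16-bit field, and a third port would have nowhere to go in a `u32`.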

> + writel(flags, mmio + NV_INT_STATUS_MCP55);
> +}
> +
> +static void nv_swncq_ncq_stop(struct ata_port *ap)
> +{
> + struct nv_swncq_port_priv *pp = ap->private_data;
> + unsigned int i; 
> + u32 sactive;
> + u32 done_mask;
> +
> + ata_port_printk(ap, KERN_ERR,
> + "EH in SWNCQ mode,QC:qc_active 0x%X sactive 0x%X\n",
> + ap->qc_active, ap->sactive);
> + ata_port_printk(ap, KERN_ERR,
> + "SWNCQ:qc_active 0x%X defer_bits 0x%X last_issue_tag 0x%x\n  "
> +  "dhfis 0x%X dmafis 0x%X sdbfis 0x%X\n",
> + pp->qc_active, pp->defer_queue.defer_bits, pp->last_issue_tag,
> + pp->dhfis_bits, pp->dmafis_bits,
> + pp->sdbfis_bits);
> + 
> + ata_port_printk(ap, KERN_ERR, "ATA_REG 0x%X ERR_REG 0x%X\n",
> + ap->ops->check_status(ap), ioread8(ap->ioaddr.error_addr));
> + 
> + sactive = readl(pp->sactive_block);
> + done_mask = pp->qc_active ^ sactive;
> + 
> + ata_port_printk(ap, KERN_ERR, "tag : dhfis dmafis sdbfis sacitve\n");
> + for (i=0; i < ATA_MAX_QUEUE; i++) {

Missing spaces around the "=".

We have a script in scripts/checkpatch.pl which will inform you about many
of these little things.  Please familiarise yourself with it.

> + u8 err = 0;
> + if (pp->qc_active & (1 << i))
> + err = 0;
> + else if (done_mask & (1 << i))
> + err = 1;
> + else
> + continue;
> + 
> + ata_port_printk(ap, KERN_ERR,
> + "tag 0x%x: %01x %01x %01x %01x %s\n", i,
> + (pp->dhfis_bits >> i) & 0x1,
> + (pp->dmafis_bits >> i) & 0x1 , (pp->sdbfis_bits >> i) & 0x1,
> + (sactive >> i) & 0x1,
> + (err ? "error!tag doesn't exit, but sactive bit is set" : " "));
> + }
> +
> + nv_swncq_pp_reinit(ap);
> + ap->ops->irq_clear(ap);
> + nv_swncq_bmdma_stop(ap);
> + nv_swncq_irq_clear(ap, 0x);
> +}
>
> ...
>
> +
> +static void nv_swncq_fill_sg(struct ata_queued_cmd *qc)
> +{
> + struct ata_port *ap = qc->ap;
> + struct scatterlist *sg;
> + unsigned int idx;
> + 
> + struct nv_swncq_port_priv *pp = ap->private_data;
> + 
> 

Re: [BUG] long freezes on thinkpad t60

2007-06-26 Thread Nick Piggin

Linus Torvalds wrote:


On Tue, 26 Jun 2007, Nick Piggin wrote:


Hmm, not that I have a strong opinion one way or the other, but I
don't know that they would encourage bad code. They are not going to
reduce latency under a locked section, but will improve determinism
in the contended case.



xadd really generally *is* slower than an add. One is often microcoded, 
the other is not.


Oh. I found xadd to be not hugely slower on my P4, but it was a little
bit.


But the real problem is that your "unlock" sequence is now about two 
orders of magnitude slower than it used to be. So it used to be that a 
spinlocked sequence only had a single synchronization point, now it has 
two. *That* is really bad, and I guarantee that it makes your spinlocks 
effectively twice as slow for the non-contended parts.


I don't know why my unlock sequence should be that much slower? Unlocked
mov vs unlocked add? Definitely in dumb micro-benchmark testing it wasn't
twice as slow (IIRC).


But your xadd thing might be worth looking at, just to see how expensive 
it is. As an _alternative_ to spinlocks, it's certainly viable.


(Side note: why make it a word? Word operations are slower on many x86 
implementations, because they add yet another prefix. You only need a 
byte)


No real reason I guess. I'll change it.
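
For readers following the xadd discussion, here is a minimal user-space sketch of a ticket-style lock, assuming that is the kind of lock being proposed (the patch itself is not quoted in this message). It uses C11 atomics and byte-sized counters, in line with the side note above; the type and function names are hypothetical. `atomic_fetch_add` is the operation that typically becomes `lock xadd` on x86.

```c
#include <assert.h>
#include <stdatomic.h>

struct ticket_lock {
    atomic_uchar next;   /* next ticket to hand out */
    atomic_uchar owner;  /* ticket currently allowed to enter */
};

void ticket_lock_init(struct ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket atomically; the counter wraps at 256. */
    unsigned char me = atomic_fetch_add(&l->next, 1);

    /* Spin until our ticket comes up, so waiters enter in FIFO order. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

void ticket_lock_release(struct ticket_lock *l)
{
    /* Only the holder writes owner, so a plain load is enough here. */
    unsigned char cur = atomic_load_explicit(&l->owner, memory_order_relaxed);

    assert(atomic_load(&l->next) != cur);  /* the lock must be held */
    atomic_store_explicit(&l->owner, (unsigned char)(cur + 1),
                          memory_order_release);
}
```

The release path shows the cost being debated: instead of storing a constant, the unlock has to advance the owner counter. The FIFO ordering is where the improved determinism under contention would come from.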

--
SUSE Labs, Novell Inc.


Re: [PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-06-26 Thread Robert Hancock

kuan luo wrote:
Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA 
controller.

NCQ function is disable by default, you can enable it with 'swncq=1'

Signed-off-by: Kuan Luo <[EMAIL PROTECTED]>
Signed-off-by: Peer Chen <[EMAIL PROTECTED]>



Haven't reviewed in detail, but does look cleaner than the previous 
version. Some people reported seeing some unrecognized FIS, etc. errors 
with the previous version, have those been looked into/fixed?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Ulrich Drepper

On 6/26/07, Rik van Riel <[EMAIL PROTECTED]> wrote:

After going through the first malloc()/free() cycle, surely
the memory will no longer be zeroed on the second malloc() ?


If returned to the system, sure.



What makes the first brk malloc so special?


If the memory is zeroed it needs not be initialized by malloc.  No
calloc zeroing, no pointer clearing.

Anyway, it's irrelevant what the benefits are, the fact is current
code depends on brk to zero the memory and you'd break the ABI if
you'd change it.


Re: 2.6 Linux for PowerPC supports kdb?

2007-06-26 Thread Randy Dunlap
On Tue, 26 Jun 2007 23:03:55 -0400 Shan, Guo Wen (Gavin) wrote:

> Does anybody know if 2.6 Linux for PowerPC supports kdb?

PowerPC isn't listed AFAICT:
ftp://oss.sgi.com/www/projects/kdb/download/v4.4/README

I.e., all that I see are i386, x86_64, and ia64.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Ulrich Drepper

On 6/26/07, Davide Libenzi <[EMAIL PROTECTED]> wrote:

I actually have the code for it, but I never posted it since it did not
receive a very warm review (and the only user was the fdmap thingy).


Only user of sys_indirect?  There will be quite a few right away.
Every syscall that returns a file descriptor needs O_CLOEXEC support
(socket, pipe, epoll_create, ...)
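
To illustrate what close-on-exec support at creation time buys, the sketch below uses the `O_CLOEXEC` flag of `open()` as it exists in later kernels and in POSIX.1-2008; at the time of this thread the interface was still under discussion, so this is a present-day illustration rather than the proposed sys_indirect mechanism. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only with close-on-exec set atomically; returns fd or -1. */
int open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

Setting the flag in the same call that creates the descriptor closes the window in which another thread could fork and exec between `open()` and a later `fcntl(fd, F_SETFD, FD_CLOEXEC)`.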



OTOH glibc could implement __morecore using mmap(MAP_NOZERO), and hence
brk2() would not be needed, no?


No.  mmap calls create individual VMAs which gets expensive.  There
are also some hardware drivers which get more expensive the more VMAs
there are.  I want to go away as much as possible from mmap for
malloc.


Re: [PATCH] atkbd: cleanup only once

2007-06-26 Thread dave young

2007/6/27, Dmitry Torokhov <[EMAIL PROTECTED]>:

On Wednesday 27 June 2007 00:28, Greg KH wrote:
> On Wed, Jun 27, 2007 at 12:34:09AM -0400, Dmitry Torokhov wrote:
> > Hi Dave,
> >
> > On Wednesday 27 June 2007 06:59, Dave Young wrote:
> > > Hi,
> > >
> > > > If you press ctrl+alt+del several times as kernel booting (before user level
> > > > bootin), the kernel will oops. I found the ps2_command is called more than once, then the
> > > > ps2dev->serio maybe NULL pointer.
> > > >
> > > > 2.6.22-rc5 and 2.6.22-rc6 have same result.
> > > >
> > > > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> > > > ---
> > > > diff -upr linux/drivers/input/keyboard/atkbd.c linux.new/drivers/input/keyboard/atkbd.c
> > > > --- linux/drivers/input/keyboard/atkbd.c  2007-06-27 10:38:37.0 +
> > > > +++ linux.new/drivers/input/keyboard/atkbd.c  2007-06-27 10:37:39.0 +
> > > @@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
> > >
> > >  static void atkbd_cleanup(struct serio *serio)
> > >  {
> > > + static int flag;
> > > +
> > > + if(flag)
> > > + return;
> > > + flag = 1;
> >
> > Unfortunately this will prevent atkbd from resetting keyboard on 2nd
> > suspend attempt. It will also not work if you have an active MUX and
> > have a couple of keyboards connected.
> >
> > Greg, now that you removed rwsem from subsystem (and subsystem itself
> > for that matter) there is nothing as far as I can see that stops
> > several threads from running device_shutdown() simultaneously. I also
> > do not see what would isolate device probing and shutting them down
> > at the same time. Am I missing something?
>
> There was never anything stopping that from happening before.  No driver
> core code was using that rwsem, so it wasn't protecting anything,
> despite people trying to use it as if it was :)
>

It did protect device_shutdown() from itself, didn't it?

--
Dmitry


How about checking ps2dev->serio in ps2_command before using it?

Regards
dave


Re: implement-file-posix-capabilities.patch

2007-06-26 Thread Andrew Morgan

Serge E. Hallyn wrote:
> 
>> I don't particularly mind, but can you point out any case where
>> it is an advantage to have the one bit for f'E rather than just
>> drop f'E altogether?  Instead of having
> 
>>  f'I=something
>>  f'P=something
>>  f'E=off
> 
>> we can always just remove the security.capability xattr.  Right?

No. Bear in mind that capabilities are all about trusting specific
applications with privilege as opposed to trusting the superuser to not
run dangerous applications.

There are three situations, we'll take them in turn:

  - no capabilities (fP=fI=fE=0): this is for applications that are not
intended to operate with privilege. Because of the way the capability
convolution rules work, such a program can't execute with privilege. Period.

  - with capabilities (fP and/or fI != 0), but fE=0 (off): this is for
applications that are intended to operate with privilege, but they were
designed to know about capabilities and they manipulate (raise and
lower) capabilities as needed to do the things they do.

  - with capabilities, but fE=1 (on): this is a class of applications
loosely called 'legacy'. They can use privilege to operate, but don't
strictly need to know about it. For example, /bin/chown . Such a program
will have fP=0,fI=CAP_CHOWN. Since the administrator sets
fP=0,fI=CAP_CHOWN,fE=1 on the /bin/chown file, any process with
CAP_CHOWN in its inheritable set (pI) can "exec /bin/chown" and have it
do the thing historically reserved for the superuser...

In some future world, the legacy fE bit may become unnecessary because
every application will be rewritten to be careful about exercising
privilege explicitly. In the meantime, the fE bit can be used to drop
the setuid-0 bits from things like ping and traceroute.
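
The three situations can be modelled with a few lines of arithmetic. The sketch below follows the simplified transformation usually quoted for exec, pP' = (fP & bset) | (fI & pI) and pE' = fE ? pP' : 0, and ignores every special case for uid 0. The struct and function names are hypothetical, and the bounding set is passed in as a plain mask.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_CHOWN_BIT (UINT32_C(1) << 0)  /* CAP_CHOWN is capability 0 */

struct file_caps {
    uint32_t permitted;    /* fP */
    uint32_t inheritable;  /* fI */
    int effective;         /* fE: a single on/off bit */
};

struct proc_caps {
    uint32_t permitted;    /* pP */
    uint32_t inheritable;  /* pI */
    uint32_t effective;    /* pE */
};

/* Compute the process capability sets after exec'ing a file (simplified). */
struct proc_caps caps_after_exec(struct proc_caps old, struct file_caps f,
                                 uint32_t bset)
{
    struct proc_caps next;

    assert(f.effective == 0 || f.effective == 1);
    next.inheritable = old.inheritable;
    next.permitted = (f.permitted & bset) | (f.inheritable & old.inheritable);
    /* fE on: the "legacy" case, permitted caps become effective at once. */
    next.effective = f.effective ? next.permitted : 0;
    return next;
}
```

For the /bin/chown example (fP=0, fI=CAP_CHOWN, fE=1), a process with CAP_CHOWN in pI ends up with CAP_CHOWN effective after exec, while a process without it gains nothing. With fE=0 the capability is only permitted and the program has to raise it itself.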

>> If there's a case where that does not suffice, then I have no objection
>> to doing it this way.

Does that explain it?

> 2) Allocate capability bit-31 for CAP_SETFCAP, and use it to gate
> whether the user can set this xattr on a file or not. CAP_SYS_ADMIN is
> way too overloaded and this functionality is special.
> 
>> The functionality is special, but someone with CAP_SYS_ADMIN can always
>> unload the capability module and create the security.capability xattr
>> using the dummy module.

This argument leads down a rat hole. (As appears to have happened with
the non-modularization LSM thread elsewhere...)

The simple fact is that CAP_SYS_ADMIN is equivalent to every other
capability in the system if it can be used to load any flavor of kernel
module. Arguing that you don't need a capability for something because
you can do it with CAP_SYS_ADMIN is very close to admitting that you
prefer the superuser (uid=0) model.

>> If we do add this cap, do we want to make it apply to all security.*
>> xattrs?

I recommend limiting it to just capabilities.

For now, you can leave the other security attributes with the
CAP_SYS_ADMIN ("misc") capability. In the original 'POSIX' drafts, there
is a separate notion of advisory and mandatory access control;
leveraging different capabilities to override.

> 3) The cap_from_disk() interface checking needs some work. Most
> notably, size must be greater than sizeof(u32) or the very first line
> will do something nasty... I'd recommend you use code like this:
> 
> [...] cap_from_disk(...)
> {
>if (size != sizeof(struct vfs_cap_data)) {
[...]
> mistake at least once... The future is uncertain, so don't trust it will
> look the way you want it to. ;-)
> 
>> Ok, so you're saying that when we do switch to 64-bit caps or some other
>> evolution, we switch to completely separate logic based on the
>> VFS_CAP_REVISION?

Yes.

>> That seems sane to me.

You might also want to verify that unallocated bits hold the value
'zero'. It's funny what people do when they realize they can silently
store bits in obscure places like this. That really messes up allocating
bits in the future. Add a check that is something like:

  if (version & ~(VFS_CAP_REVISION_MASK|VFS_CAP_FLAGS_EFFECTIVE)) {
  return -EINVAL;
  }
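
A self-contained version of that check might look like the sketch below. The two mask values are assumptions chosen so that the revision lives in the top byte and the effective flag in bit 0; the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout: revision in the top byte, one flag in bit 0. */
#define VFS_CAP_REVISION_MASK   UINT32_C(0xFF000000)
#define VFS_CAP_FLAGS_EFFECTIVE UINT32_C(0x00000001)

/* Reject any on-disk version word that uses bits nobody has allocated. */
int check_cap_version_word(uint32_t version)
{
    /* The two allocated fields must not overlap. */
    assert((VFS_CAP_REVISION_MASK & VFS_CAP_FLAGS_EFFECTIVE) == 0);

    if (version & ~(VFS_CAP_REVISION_MASK | VFS_CAP_FLAGS_EFFECTIVE))
        return -EINVAL;
    return 0;
}
```

Rejecting stray bits now keeps them available for allocation later, because no existing file can legitimately have them set.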

> 7) This one is subtle, and to my mind not well appreciated. In
> cap_bprm_apply_creds(), the wart of the global 'cap_bset' masking
> permitted bits can lead to problems like the one we saw a few years back
> with sendmail and capabilities. There is an assumption in setting
> permitted (they are called 'forced' in some documents) capabilities on a
> file that the file will execute with at least these. The inheritable
> ones are optional.
> 
>> Hmm, changing the behavior of the cap_bset is something that seems to
>> belong in 8), though I see what you're saying, it does affect the
>> behavior of vfs caps.

I'm not really changing the behavior of cap_bset. I'm specifying the
behavior of fP. ;-)

[Your comments on cap_bset and CAP_SETPCAP are exactly where my
ax^H^H^Hscalpel will fall after all this VFS support is stable. But
*that* is a subject for a different thread... aka item 8.]


Re: [PATCH] atkbd: cleanup only once

2007-06-26 Thread Dmitry Torokhov
On Wednesday 27 June 2007 00:28, Greg KH wrote:
> On Wed, Jun 27, 2007 at 12:34:09AM -0400, Dmitry Torokhov wrote:
> > Hi Dave,
> > 
> > On Wednesday 27 June 2007 06:59, Dave Young wrote:
> > > Hi,
> > > 
> > > If you press ctrl+alt+del several times as kernel booting (before user 
> > > level bootin), the kernel will oops. I found the ps2_command is called 
> > > more than once, then the ps2dev->serio maybe NULL pointer.
> > > 
> > > 2.6.22-rc5 and 2.6.22-rc6 have same result.
> > > 
> > > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> > > ---
> > > diff -upr linux/drivers/input/keyboard/atkbd.c 
> > > linux.new/drivers/input/keyboard/atkbd.c
> > > --- linux/drivers/input/keyboard/atkbd.c  2007-06-27 10:38:37.0 
> > > +
> > > +++ linux.new/drivers/input/keyboard/atkbd.c  2007-06-27 
> > > 10:37:39.0 +
> > > @@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
> > >  
> > >  static void atkbd_cleanup(struct serio *serio)
> > >  {
> > > + static int flag;
> > > +
> > > + if(flag)
> > > + return;
> > > + flag = 1;
> > 
> > Unfortunately this will prevent atkbd from resetting keyboard on 2nd
> > suspend attempt. It will also not work if you have an active MUX and
> > have a couple of keyboards connected.
> > 
> > Greg, now that you removed rwsem from subsystem (and subsystem itself
> > for that matter) there is nothing as far as I can see that stops
> > several threads from running device_shutdown() simultaneously. I also
> > do not see what would isolate device probing and shutting them down
> > at the same time. Am I missing something?
> 
> There was never anything stopping that from happening before.  No driver
> core code was using that rwsem, so it wasn't protecting anything,
> despite people trying to use it as if it was :)
> 

It did protect device_shutdown() from itself, didn't it?

-- 
Dmitry


Re: [PATCH] atkbd: cleanup only once

2007-06-26 Thread Greg KH
On Wed, Jun 27, 2007 at 12:34:09AM -0400, Dmitry Torokhov wrote:
> Hi Dave,
> 
> On Wednesday 27 June 2007 06:59, Dave Young wrote:
> > Hi,
> > 
> > If you press ctrl+alt+del several times as kernel booting (before user 
> > level bootin), the kernel will oops. I found the ps2_command is called more 
> > than once, then the ps2dev->serio maybe NULL pointer.
> > 
> > 2.6.22-rc5 and 2.6.22-rc6 have same result.
> > 
> > Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> > ---
> > diff -upr linux/drivers/input/keyboard/atkbd.c 
> > linux.new/drivers/input/keyboard/atkbd.c
> > --- linux/drivers/input/keyboard/atkbd.c2007-06-27 10:38:37.0 
> > +
> > +++ linux.new/drivers/input/keyboard/atkbd.c2007-06-27 
> > 10:37:39.0 +
> > @@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
> >  
> >  static void atkbd_cleanup(struct serio *serio)
> >  {
> > +   static int flag;
> > +
> > +   if(flag)
> > +   return;
> > +   flag = 1;
> 
> Unfortunately this will prevent atkbd from resetting keyboard on 2nd
> suspend attempt. It will also not work if you have an active MUX and
> have a couple of keyboards connected.
> 
> Greg, now that you removed rwsem from subsystem (and subsystem itself
> for that matter) there is nothing as far as I can see that stops
> several threads from running device_shutdown() simultaneously. I also
> do not see what would isolate device probing and shutting them down
> at the same time. Am I missing something?

There was never anything stopping that from happening before.  No driver
core code was using that rwsem, so it wasn't protecting anything,
despite people trying to use it as if it was :)

That's why I removed it.

So, if you need to have a lock for your subsystem to serialize this,
please do so, I have no objection to it.

thanks,

greg k-h


Re: scheduling while atomic and DEBUG_SPINLOCK_SLEEP

2007-06-26 Thread Arjan van de Ven
On Tue, 2007-06-26 at 22:55 -0400, Jon Ringle wrote:
> Hello,
> Out of these two, the first one that is showing "in_atomic():1" seems
> more likely to me to be a potential cause of the "scheduling while
> atomic" dump.
> 
> Does this logic seem reasonable? Are there other debugging techniques I
> can use to narrow down the cause for the "scheduling while atomic"?


you could start by giving us pointers to the sources of the two
drivers... without that... how can we look and help?



Re: [PATCH] atkbd: cleanup only once

2007-06-26 Thread Dmitry Torokhov
Hi Dave,

On Wednesday 27 June 2007 06:59, Dave Young wrote:
> Hi,
> 
> If you press ctrl+alt+del several times while the kernel is booting (before
> user-level boot), the kernel will oops. I found that ps2_command is called more
> than once, and then ps2dev->serio may be a NULL pointer.
> 
> 2.6.22-rc5 and 2.6.22-rc6 have same result.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]>
> ---
> diff -upr linux/drivers/input/keyboard/atkbd.c 
> linux.new/drivers/input/keyboard/atkbd.c
> --- linux/drivers/input/keyboard/atkbd.c  2007-06-27 10:38:37.0 
> +
> +++ linux.new/drivers/input/keyboard/atkbd.c  2007-06-27 10:37:39.0 
> +
> @@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
>  
>  static void atkbd_cleanup(struct serio *serio)
>  {
> + static int flag;
> +
> + if(flag)
> + return;
> + flag = 1;

Unfortunately this will prevent atkbd from resetting keyboard on 2nd
suspend attempt. It will also not work if you have an active MUX and
have a couple of keyboards connected.
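
To make the objection concrete, here is a minimal userspace sketch (not
atkbd code; struct kbd, cleanup_static(), cleanup_per_device() and rearm()
are hypothetical names). It only shows why a function-level static behaves
differently from per-device state; it does not address concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-keyboard driver structure. */
struct kbd {
    bool cleaned;   /* per-device guard: has this keyboard been reset? */
    int  resets;    /* how many times the reset really ran */
};

/* Variant from the patch: one flag shared by every device, never cleared. */
static void cleanup_static(struct kbd *k)
{
    static int flag;

    assert(k != NULL);
    if (flag)
        return;     /* any later call, for any keyboard, is skipped */
    flag = 1;
    k->resets++;
}

/* Alternative: the guard lives in the device itself. */
static void cleanup_per_device(struct kbd *k)
{
    assert(k != NULL);
    if (k->cleaned)
        return;
    k->cleaned = true;
    k->resets++;
}

/* Called on resume/reconnect so the next shutdown resets the device again. */
static void rearm(struct kbd *k)
{
    assert(k != NULL);
    k->cleaned = false;
}
```

With two keyboards, cleanup_static() resets only whichever one is seen
first, and nothing at all on a second suspend; the per-device variant resets
each keyboard once per cycle.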

Greg, now that you removed rwsem from subsystem (and subsystem itself
for that matter) there is nothing as far as I can see that stops
several threads from running device_shutdown() simultaneously. I also
do not see what would isolate device probing and shutting them down
at the same time. Am I missing something?

-- 
Dmitry


Re: [patch 1/3] MAP_NOZERO - implement a new VM_NOZERO/MAP_NOZERO page retirement policy

2007-06-26 Thread Davide Libenzi
On Wed, 27 Jun 2007, Rik van Riel wrote:

> Davide Libenzi wrote:
> > On Tue, 26 Jun 2007, Rik van Riel wrote:
> > 
> > > SUID programs should not be able to use this feature,
> > > either.
> > 
> > Why? A SUID program runs under the UID of the owner, and there should be no
> > problem with it seeing the owner's data.
> 
> Because an SUID program can change its UID back.
> 
> At least, one that was SUID root.  OTOH, any
> program running as root can change UID, so we
> should probably not allow root to get nonzeroed
> pages.

Well, root can in general access the whole system in any case. At the 
moment, root cannot access other UIDs' pages, only its own. And this 
differs from standard security policies where root can access everything.
Pages used internally by the kernel cannot be reused by anyone.
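
As a rough model of that policy (a userspace sketch, not the patch itself;
struct page here, UID_NONE, retire_page(), retire_kernel_page() and
grant_page() are all hypothetical names), a page keeps its old contents only
when the requester asked for non-zeroed memory and has the same UID as the
last owner:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ   4096
#define UID_NONE  (-1)  /* hypothetical marker: never hand out unzeroed */

struct page {
    int owner_uid;                  /* last owner, recorded at free time */
    unsigned char data[PAGE_SZ];
};

/* Record who last owned the page when it leaves a mapping. */
static void retire_page(struct page *p, int euid)
{
    assert(p != NULL);
    p->owner_uid = euid;
}

/* Kernel-internal pages are never eligible for unzeroed reuse. */
static void retire_kernel_page(struct page *p)
{
    assert(p != NULL);
    p->owner_uid = UID_NONE;
}

/*
 * Hand a page to a process running as euid. Returns true if the old
 * contents were kept (nozero requested, same owner), false if zeroed.
 * Root (uid 0) gets no special treatment here.
 */
static bool grant_page(struct page *p, int euid, bool want_nozero)
{
    assert(p != NULL);
    if (want_nozero && p->owner_uid != UID_NONE && p->owner_uid == euid)
        return true;
    memset(p->data, 0, sizeof(p->data));
    return false;
}
```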



> > I tried to look, and the attempt to reuse _mapcount failed miserably :)
> > The last time we have the owner info (vma->mm) available, is before
> > processing of the other fields ends. OTOH I'm not VM guru either, so I may
> > be wrong. It can share ->virtual (when enabled).
> 
> I think the process that actually calls the page freeing
> functions is always the process that owned the page, so
> going for current->mm should work.

I'll try to see if that works out...



- Davide




Re: [PATCH try #2] security: Convert LSM into a static interface

2007-06-26 Thread Chris Wright
* Crispin Cowan ([EMAIL PROTECTED]) wrote:
> and simple LSMs that can be
> unloaded safely can permit it.

there are none, and making the above possible is prohibitively
expensive.


Re: [patch 1/3] MAP_NOZERO - implement a new VM_NOZERO/MAP_NOZERO page retirement policy

2007-06-26 Thread Rik van Riel

Davide Libenzi wrote:

> On Tue, 26 Jun 2007, Rik van Riel wrote:
>
> > SUID programs should not be able to use this feature,
> > either.
>
> Why? A SUID program runs under the UID of the owner, and there should be no
> problem with it seeing the owner's data.

Because an SUID program can change its UID back.

At least, one that was SUID root.  OTOH, any
program running as root can change UID, so we
should probably not allow root to get nonzeroed
pages.

> But the patch post was more a quest for possible scenarios where the use
> of MAP_NOZERO can result in lower security WRT the same program (under the
> same security restrictions) not using such feature.
>
> If you have something specific in mind, please go ahead and shoot.

Besides the non-enforcing of SELinux security
labels (and maybe namespaces?), I cannot think
of anything.

> > > When pages exit (unmapped from) a  vma, they are marked with the effective
> > > UID of the  mm_struct  that owns it.
> >
> > > --- linux-2.6.mod.orig/include/linux/mm_types.h 2007-06-21
> > > 14:02:06.0 -0700
> > > +++ linux-2.6.mod/include/linux/mm_types.h  2007-06-25 19:11:22.0
> > > -0700
> > > @@ -64,6 +64,7 @@
> > > struct list_head lru;   /* Pageout list, eg. active_list
> > >  * protected by zone->lru_lock !
> > >  */
> > > +   int owner_uid;  /* Last owner of the page */
> > > /*
> > >  * On machines where all RAM is mapped into kernel address space,
> > >  * we can simply calculate the virtual address. On machines with
> >
> > Since this is only set when the page is freed, could
> > the owner_uid and security context be put inside a
> > union with some fields that are not otherwise used
> > for free pages?
>
> I tried to look, and the attempt to reuse _mapcount failed miserably :)
> The last time we have the owner info (vma->mm) available, is before
> processing of the other fields ends. OTOH I'm not VM guru either, so I may
> be wrong. It can share ->virtual (when enabled).


I think the process that actually calls the page freeing
functions is always the process that owned the page, so
going for current->mm should work.

Getting the UID wrong for file pages caught in a truncate
is fine, since the process obviously already had access
to the data in that page.

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Rik van Riel

Ulrich Drepper wrote:

> On 6/26/07, Rik van Riel <[EMAIL PROTECTED]> wrote:
> > Since programs can get back free()d memory after a malloc(),
> > with the old contents of the memory intact, surely your
> > MAP_NOZERO behavior could be the default for programs that
> > can get away with it?
> >
> > Maybe we could use some magic ELF header, similar to the
> > way non-executable stack is handled?
>
> No.  This is an implementation detail of the libc version.  The malloc
> as compiled today is expecting brk-ed memory to be zeroed.  This
> default can of course be changed (it's a simple define) but you cannot
> make this the default behavior for brk.


After going through the first malloc()/free() cycle, surely
the memory will no longer be zeroed on the second malloc() ?

What makes the first brk malloc so special?

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


Re: Is it time for remove (crap) ALSA from kernel tree ?

2007-06-26 Thread Rene Herman

On 06/26/2007 10:39 PM, Andreas Hartmetz wrote:


> Okay, here's a rant.
>
> As an interested kernel outsider and KDE developer(*), it looks to me
> like most kernel people are too focused on the history and feature lists
> of the particular technologies here.
>
> The real matter with ALSA is that you get a strong "ALSA hates me"
> feeling when dealing with it. There is bad documentation, bad API, and a
> config file syntax that is much harder to understand than necessary.


I'll agree to the documentation bit; the funny thing is that it's partly 
caused by documentation actually being, or once having been, _better_ than 
it is for the average subsystem. ALSA for example has the useful "Writing an 
ALSA Driver" document from Takashi Iwai:


http://www.alsa-project.org/~iwai/writing-an-alsa-driver/index.html

Documentation becomes obsolete as code progresses though and yes, especially 
on the userside of things the documentation is slow to follow. And then the 
usual problem of no one ever removing obsolete junk from the web exacerbates 
matters. Google will find you tons of useless, outdated crap but if you need 
the information in the first place, you don't know that it _is_ obsolete.


And yes, this unfortunately includes www.alsa-project.org. For the longest 
time it was advocating writing ~/.asoundrc files for example through generic 
driver boilerplate texts while that was actually at that point mostly 
counterproductive in getting ALSA functional.


As to the config file -- well, sure. The best thing about it is that normally 
you don't need it...


The "bad API" I find interesting since you are a KDE developer. I'm not an 
audio application developer myself so I don't have (m)any well thought out 
opinions on it, but isn't the only thing in KDE4 that talks to ALSA the 
Phonon ALSA backend? If you are talking in that context, I'm quite sure the 
alsa-user and/or alsa-devel lists (@alsa-project.org) would like to hear 
about any specific comments/problems. Getting the Phonon backend right from 
the start is something that seems important.



> Then there is the kernel/library split that seems to have no convincing
> reason at all in its current form. Why not put the whole sound system in
> userland? It has been done before. Sound is just not performance critical
> at all and it's almost never mission critical


Heh. Sound may not be, but audio is. For the longest time, audio users stuck 
with 2.4 kernels and the low-latency patches that were available for it due 
to latency issues. Large parts of ALSA already are in userland in the form 
of libasound and I expect moving over everything would not help much.


[ ... ]


> The track record of ALSA for me goes like this:
>
> - dmix finally started working automatically (at least on my Kubuntu
> system) about one year ago, about five years after everybody could see
> that this was badly needed. I couldn't get it to work before. The howtos
> somehow didn't work and ALSA's documentation isn't all that helpful.


dmix was really only implemented (or at least, made default) for casual 
users. Hope it'll not come across as elitist but people who are serious 
about music or audio don't actually need or want it. It's a consumer thing. 
To have software mixing work you have to resample to a common rate and this 
is an absolute unthinkable horror to a serious user. It's a good thing it's now 
default, but only because a majority of sound users are not serious (simply 
because it's mostly all computer users).



> - Different desktop environments have different sound daemons to paper
> over the weaknesses of ALSA (no dmix by default / unfriendly API), which
> creates new problems. Yes there are other reasons for sound daemons, but
> I doubt anybody would have come up with the idea if it wasn't for ALSA.


Given that they existed before ALSA did this seems to be a somewhat odd doubt.

> - I have an Envy24HT based soundcard in my desktop PC, which also goes to show
> that I'm really interested in sound issues.


Nice chip. I don't have one, and am not too sure about its native supported 
rates but if you are mostly playing 44100 through it (ie, CD source audio) 
I'd consider doing without dmix. A nice sounding chip like that shouldn't be 
subjected to resampling really. Someone recently informed me on the ALSA 
list that Envy24 indeed doesn't do hardware mixing though, so I guess you 
may need it if you really do want to also have the card available for 
desktop sounds.



> I have to run alsamixer after every bootup to unmute the left channel
> because restoring volume only works for the right channel. The left
> channel starts working after changing its volume.


Sounds like a rather debuggable problem. I'm (almost) sure someone will try 
to get you a useful answer if you post to the [EMAIL PROTECTED] 
list :)



> - On my IBM/Lenovo R50e notebook with Intel chipset sound didn't work
> before I "muted" the "headphone jack sense" control in alsamixer. That
> took two hours or so. When both the master volume and the PCM volume

Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Davide Libenzi
On Tue, 26 Jun 2007, Ulrich Drepper wrote:

> On 6/26/07, Davide Libenzi <[EMAIL PROTECTED]> wrote:
> > The following patch implements the sys_brk2() syscall, that nothing is
> > other than a sys_brk() with an extra "flags" parameter.
> 
> Shouldn't we wait for Linus' sys_indirect to arrive and make this
> another syscall which takes advantage of it?

I actually have the code for it, but I never posted it since it did not 
receive a very warm review (and the only user was the fdmap thingy).
OTOH glibc could implement __morecore using mmap(MAP_NOZERO), and hence 
brk2() would not be needed, no?



- Davide




Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Ulrich Drepper

On 6/26/07, Davide Libenzi <[EMAIL PROTECTED]> wrote:

> The following patch implements the sys_brk2() syscall, that nothing is
> other than a sys_brk() with an extra "flags" parameter.


Shouldn't we wait for Linus' sys_indirect to arrive and make this
another syscall which takes advantage of it?


Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Ulrich Drepper

On 6/26/07, Rik van Riel <[EMAIL PROTECTED]> wrote:

> Since programs can get back free()d memory after a malloc(),
> with the old contents of the memory intact, surely your
> MAP_NOZERO behavior could be the default for programs that
> can get away with it?
>
> Maybe we could use some magic ELF header, similar to the
> way non-executable stack is handled?


No.  This is an implementation detail of the libc version.  The malloc
as compiled today is expecting brk-ed memory to be zeroed.  This
default can of course be changed (it's a simple define) but you cannot
make this the default behavior for brk.


NVidia Driver Support - 1680x1050 mode

2007-06-26 Thread Marc Perkel
Trying to get my Asus M2NPV-VM motherboard and my
Samsung SyncMaster 215tw Digital to work in 1680x1050
mode but 1280x1024 is the most I can get. Chip Set is
GeForce 6150.

Looking in Xorg.0.log it seems to think that the panel
size is 1280x1024 in spite of my settings telling it
differently.

Sorry if this is off topic but I thought that the
smart people would be here. In Windows I just plug it
in and it works. So I figure Linux should work too. :)



   



Re: Dual-Licensing Linux Kernel with GPL V2 and GPL V3

2007-06-26 Thread Al Boldi
Alexandre Oliva wrote:
> On Jun 26, 2007, Al Boldi <[EMAIL PROTECTED]> wrote:
> > I read your scenario of the vendor not giving you the source to mean:
> > not directly; i.e.  they could give you a third-party download link.
>
> This has never been enough to comply with GPLv2.

Section 3a of the GPLv2 mentions "a medium customarily used for software 
interchange".  I would think the Internet is a medium customarily used for 
software interchange, is it not?


Thanks!

--
Al



Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Davide Libenzi
On Tue, 26 Jun 2007, Rik van Riel wrote:

> Davide Libenzi wrote:
> > The following patch implements the sys_brk2() syscall, that nothing is
> > other than a sys_brk() with an extra "flags" parameter. This can be used
> > to pass the new MAP_NOZERO bit, to ask the kernel to hand over non-zero
> > pages if possible.
> 
> Since programs can get back free()d memory after a malloc(),
> with the old contents of the memory intact, surely your
> MAP_NOZERO behavior could be the default for programs that
> can get away with it?
> 
> Maybe we could use some magic ELF header, similar to the
> way non-executable stack is handled?

Well, the quick glibc patch simply uses an environment variable, just 
because I wanted to bench the kernel build while using the same glibc+gcc.
Yes, it can be the default behaviour for the allocator. The patch handles 
calloc() correctly, by forcibly zeroing memory in such calls.
But other software must be taught too, to use MAP_NOZERO when they do not 
need zeroed memory. I did that for the gcc garbage collector.
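
The calloc() point can be sketched in userspace like this (an illustration
only, not the glibc patch; heap, heap_reset_dirty(), nz_malloc() and
nz_calloc() are hypothetical names, and a dirty static buffer stands in for
pages obtained with MAP_NOZERO). The allocator hands out whatever the memory
already contains, so calloc() has to clear explicitly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated heap that is NOT guaranteed zero-filled, like recycled pages. */
static unsigned char heap[1 << 16];
static size_t heap_used;

/* Fill the heap with a garbage pattern and start over. */
static void heap_reset_dirty(void)
{
    memset(heap, 0xA5, sizeof(heap));
    heap_used = 0;
}

/* Bump allocator: returns memory with whatever contents it already had. */
static void *nz_malloc(size_t size)
{
    size_t rounded = (size + 15) & ~(size_t)15;  /* round size up to 16 */

    assert(heap_used <= sizeof(heap));
    if (rounded < size || rounded > sizeof(heap) - heap_used)
        return NULL;
    void *p = heap + heap_used;
    heap_used += rounded;
    return p;
}

/* calloc still promises zeroes, so it clears the block itself. */
static void *nz_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                /* nmemb * size would overflow */
    void *p = nz_malloc(nmemb * size);
    if (p)
        memset(p, 0, nmemb * size);
    return p;
}
```

Callers that need zeroes keep getting them through nz_calloc(); only
callers of nz_malloc() see stale data, which malloc() never promised to
clear anyway.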



- Davide




Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

2007-06-26 Thread Alan Stern
On Tue, 26 Jun 2007, Roland McGrath wrote:

> > Here's the next iteration.  The arch-specific parts are now completely 
> > encapsulated.  validate_settings is in a form which should be workable 
> > on all architectures.  And the address, length, and type are passed as 
> > arguments to register_{kernel,user}_hw_breakpoint().
> 
> I like it!

Good.  My earlier stubbornness was caused by a desire to allow static
initializers, but now I see that specifying the values in the
registration call really isn't all that bad.

> > I haven't tried to modify Kconfig at all.  To do it properly would
> > require making ptrace configurable, which is not something I want to
> > tackle at the moment.
> 
> You don't need to worry about that.  Under utrace, CONFIG_PTRACE is
> already separate and can be turned off.  I don't think we need really to
> finish the Kconfig stuff at all before I merge it into the utrace code.

So far this work has all been based on the vanilla kernel.  Should I 
switch over to basing it on -mm?


> Calling send_sigtrap twice during the same exception does happen to be
> harmless, but I don't think it should be presumed to be.  It is just not
> the right way to go about things that you send a signal twice when there
> is one signal you want to generate.

What happens when there are two ptrace exceptions at different points
during the same system call?  Won't we end up sending the signal twice
no matter what?

> Also, send_sigtrap is an i386-only function (not even x86_64 has the
> same).  Only x86_64 will share this actual code, but all others will be
> modelled on it.  I think it makes things simplest across the board if
> the standard form is that when there is a ptrace exception, the notifier
> does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP
> arch code.
> 
> So, hmm.  In the old do_debug code, if a notifier returns NOTIFY_STOP,
> it bails immediately, before the db6 value is saved in current->thread.
> This is the normal theory of notify_die use, where NOTIFY_STOP means to
> completely swallow the event as if it never happened.  In the event
> there were some third party notifier involved, it ought to be able to
> swallow its magic exceptions as before and have no user-visible db6
> change happen at the time of that exception.  So how about this:
> 
>   get_debugreg(condition, 6);
>   set_debugreg(0UL, 6);   /* The CPU does not clear it.  */
> 
>   if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
>   SIGTRAP) == NOTIFY_STOP)
>   return;
> 
> The kprobes notifier uses max priority, so it will run first.  Its
> notifier code uses my version.  For a single-step that belongs to it,
> it will return NOTIFY_STOP and nothing else happens (no one touches
> vdr6).  (I think I'm dredging up old territory by asking what happens
> when kprobes steps over an insn that hits a data breakpoint, but I
> don't recall atm.)

In theory we should get an exception with both DR_STEP and DR_TRAPn 
set, meaning that neither notifier will return NOTIFY_STOP.  But if the 
kprobes handler clears DR_STEP in the DR6 image passed to the 
hw_breakpoint handler, it should work out better.
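
A toy model of that interaction, purely to make the bit logic explicit
(not kernel code; run_chain() and enum verdict are made up, and it assumes
the single-step belongs to kprobes and the breakpoint is a kernel one, not
a ptrace one). The DR_STEP and DR_TRAPn values are the architectural DR6
bit positions:

```c
#include <assert.h>

#define DR_TRAP0      0x1UL
#define DR_TRAP1      0x2UL
#define DR_TRAP2      0x4UL
#define DR_TRAP3      0x8UL
#define DR_TRAP_BITS  (DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)
#define DR_STEP       0x4000UL

/* Stand-ins for NOTIFY_DONE / NOTIFY_STOP. */
enum verdict { PASS_ON, STOP };

/*
 * Run a DR6 image through a kprobes-like handler followed by a
 * hw_breakpoint-like handler. clear_step selects whether the first
 * handler strips DR_STEP from the image it passes along.
 */
static enum verdict run_chain(unsigned long dr6, int clear_step)
{
    /* Sanity: the step bit and the trap bits must not overlap. */
    assert((DR_STEP & DR_TRAP_BITS) == 0);

    /* kprobes-like handler runs first (highest priority). */
    if (dr6 & DR_STEP) {
        if (!(dr6 & DR_TRAP_BITS))
            return STOP;            /* pure single-step: swallowed here */
        if (clear_step)
            dr6 &= ~DR_STEP;        /* hide the step from later handlers */
    }

    /* hw_breakpoint-like handler: stops only if just trap bits remain. */
    if ((dr6 & DR_TRAP_BITS) && !(dr6 & ~DR_TRAP_BITS))
        return STOP;
    return PASS_ON;
}
```

With DR_STEP | DR_TRAP1 set, the chain falls through to the SIGTRAP path
unless the first handler clears DR_STEP, in which case the breakpoint
handler sees only trap bits and can return the stop verdict.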

> vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> directly.  hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> if it's a user-mode exception.  If it's a ptrace exception it also
> sets the mapped DR_TRAPn bits.  If it's not a ptrace exception and
> only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP.  If
> it's a spurious exception from lazy db7 setting, hw_breakpoint just
> returns NOTIFY_STOP early.

That sounds not quite right.  To a user-space debugger, a system call
should appear as an atomic operation.  If multiple ptrace exceptions
occur during a system call, all the relevant DR_TRAPn bits should be
set in vdr6 together and all the other ones reset.  How can we arrange
that?

There's also the question of whether to send the SIGTRAP.  If
extraneous bits are set in DR6 (e.g., because the CPU always sets some
extra bits) then we will never get NOTIFY_STOP.  Nevertheless, the
signal should not always be sent.

> > @@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
> >  
> > err = 0;
> >   out:
> > -   if (err && p->thread.io_bitmap_ptr) {
> > +   if (err) {
> > +   flush_thread_hw_breakpoint(p);
> > kfree(p->thread.io_bitmap_ptr);
> > p->thread.io_bitmap_max = 0;
> > }
> 
> This can call kfree(NULL).  I would leave the original code alone, i.e.:
> 
>   if (err)
>   flush_thread_hw_breakpoint(p);
>   if (err && p->thread.io_bitmap_ptr) {
>   kfree(p->thread.io_bitmap_ptr);
>   p->thread.io_bitmap_max = 0;
>   }

I disagree.  kfree() is documented to return harmlessly when passed a
NULL pointer, and lots of places in the kernel have been changed to
remove useless tests for NULL before calls to 

Re: [patch 1/3] MAP_NOZERO - implement a new VM_NOZERO/MAP_NOZERO page retirement policy

2007-06-26 Thread Davide Libenzi
On Tue, 26 Jun 2007, Rik van Riel wrote:

> SUID programs should not be able to use this feature,
> either.

Why? A SUID program runs under the UID of the owner, and there should be no 
problem with it seeing the owner's data.
But the patch post was more a quest for possible scenarios where the use 
of MAP_NOZERO can result in lower security WRT the same program (under the 
same security restrictions) not using such feature.
If you have something specific in mind, please go ahead and shoot.



> > When pages exit (unmapped from) a  vma, they are marked with the effective
> > UID of the  mm_struct  that owns it.
> 
> 
> > --- linux-2.6.mod.orig/include/linux/mm_types.h 2007-06-21
> > 14:02:06.0 -0700
> > +++ linux-2.6.mod/include/linux/mm_types.h  2007-06-25 19:11:22.0
> > -0700
> > @@ -64,6 +64,7 @@
> > struct list_head lru;   /* Pageout list, eg. active_list
> >  * protected by zone->lru_lock !
> >  */
> > +   int owner_uid;  /* Last owner of the page */
> > /*
> >  * On machines where all RAM is mapped into kernel address space,
> >  * we can simply calculate the virtual address. On machines with
> 
> Since this is only set when the page is freed, could
> the owner_uid and security context be put inside a
> union with some fields that are not otherwise used
> for free pages?

I tried to look, and the attempt to reuse _mapcount failed miserably :)
The last time we have the owner info (vma->mm) available, is before 
processing of the other fields ends. OTOH I'm not VM guru either, so I may 
be wrong. It can share ->virtual (when enabled).




- Davide




2.6 Linux for PowerPC supports kdb?

2007-06-26 Thread Shan, Guo Wen (Gavin)
Does anybody know if 2.6 Linux for PowerPC supports kdb?

Best Regards,
Gavin


Re: [RFC 1/7] cpuset write dirty map

2007-06-26 Thread Christoph Lameter
On Tue, 26 Jun 2007, Andrew Morton wrote:

> Is in my queue somewhere.  Could be that by the time I get to it it will
> need refreshing (again), we'll see.
> 
> One open question is the interaction between these changes and with Peter's
> per-device-dirty-throttling changes.  They also are in my queue somewhere. 
> Having a 100:1 coder:reviewer ratio doesn't exactly make for swift
> progress.

H.. How can we help? I can look at some aspects of Peter's per device 
throttling.



Re: [PATCH] hw_random: add quality categories

2007-06-26 Thread Matt Mackall
On Tue, Jun 26, 2007 at 04:45:24PM +0200, Michael Buesch wrote:
> On Tuesday 26 June 2007 16:32:37 Matt Mackall wrote:
> > > No wait. You are missing the whole point of this
> > > quality category.
> > > The whole point of it is to prevent defaulting to a bad RNG, if
> > > there's a bad and a good one in a machine.
> > > Well, what's bad.
> > > It's easy. HWRNGs like the one in bcm43xx are bad.
> > > It's proprietary and nobody knows what it does (I guess
> > > it gathers the entropy from the network or something
> > > and hashes that in hardware).
> > > So such a device would be QUAL_LOW.
> > 
> > If it's gathering its entropy from the network, it is not a QUAL_LOW
> > RNG because it is not a hardware random number generator at all!
> > 
> > Such a device is QUAL_PSEUDO or QUAL_UNKNOWN. If it's known or
> > suspected to be bogus, it should be so marked. 
> 
> No, it should not be marked pseudo. It _is_ a RNG in hardware.

Again, if it's not using an underlying physical process that's
unpredictable, it does not deserve to be called a real HWRNG. It's no
better than the software PRNG in the kernel at that point.

If you have a reasonable suspicion that this is the case with the BCM
part, then you should so mark it.

> Where it gets its entropy from is unknown. (I'm just guessing
> around).
> PSEUDO is for example for entropy gathered from hardware sensors.

Not sure what this means. Some hardware sensors are quite good sources
of noise. What gets you into trouble is when the sources are either
predictable (ie heavily correlated with fixed-frequency crosstalk),
observable (ie wireless traffic), or controllable (ie wireless traffic).

> > Once you've merged your LOW class with PSEUDO, you're left with a
> > meaningless, unquantifiable distinction between NORMAL and HIGH.
> 
> No, that's not true. I explained the difference to you and it's even
> explained in the kdoc help text. Re-read it, please.
> HIGH is for separate dedicated extension devices that you buy and
> stick into your machine. So it would default to that, as you want
> to use that by default (why would you otherwise stick it in).

I do not believe there exist devices that deserve to be classified as
"HIGH". Any device that makes this claim probably instead deserves to
be classified as "SNAKE OIL". Making a high-quality HWRNG is easy, and
cheap (>$.05), and very hard to improve on except by upping the
bandwidth. Anyone who tells you that their HWRNG is significantly or
even measurably better than the one in, say, VIA Padlock, in any
dimension except for speed, is almost certainly LYING.

Given that, I'd really rather not create an opportunity for such snake
oil salesmen to claim to be "the only Linux-supported RNG to use
QUAL_HIGH" or some such bullshit.

> To say it again: It all is _just_ for defining a sane _default_
> policy. That's all.
> Currently the policy is: "Select whatever comes first", which is
> random. So it could select crap (bcm43xx) over not-so-crap (in-CPU-RNG).

That's perfectly reasonable. And all I'm saying is please have only
two levels: CRAP and NOTCRAP. Anything else just muddies the waters.
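
Something like the following is all the default policy would need (a
sketch with made-up names: enum rng_quality, struct rng and pick_default()
are not the hw_random API, and falling back to the first device when
everything is CRAP is my assumption):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-level scheme. */
enum rng_quality { RNG_CRAP, RNG_NOTCRAP };

struct rng {
    const char *name;
    enum rng_quality quality;
};

/*
 * Pick the default device: the first registered NOTCRAP one if there is
 * any, otherwise fall back to the first device at all, which is what
 * "select whatever comes first" does today.
 */
static const struct rng *pick_default(const struct rng *list, size_t n)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (list[i].quality == RNG_NOTCRAP)
            return &list[i];
    return n ? &list[0] : NULL;
}
```

Registration order then only matters among devices of the same class.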

-- 
Mathematics is the supreme nostalgia of our time.


Re: [patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Rik van Riel

Davide Libenzi wrote:

> The following patch implements the sys_brk2() syscall, that nothing is
> other than a sys_brk() with an extra "flags" parameter. This can be used
> to pass the new MAP_NOZERO bit, to ask the kernel to hand over non-zero
> pages if possible.


Since programs can get back free()d memory after a malloc(),
with the old contents of the memory intact, surely your
MAP_NOZERO behavior could be the default for programs that
can get away with it?

Maybe we could use some magic ELF header, similar to the
way non-executable stack is handled?

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


[PATCH] ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61

2007-06-26 Thread kuan luo

Add software NCQ support to sata_nv.c for the MCP51/MCP55/MCP61 SATA controllers.
NCQ is disabled by default; you can enable it with 'swncq=1'.

Signed-off-by: Kuan Luo <[EMAIL PROTECTED]>
Signed-off-by: Peer Chen <[EMAIL PROTECTED]>

---
diff -Nurp a/sata_nv.c b/sata_nv.c
--- a/sata_nv.c 2007-06-13 10:15:07.0 -0400
+++ b/sata_nv.c 2007-06-26 12:52:27.0 -0400
@@ -169,6 +169,35 @@ enum {
NV_ADMA_PORT_REGISTER_MODE  = (1 << 0),
NV_ADMA_ATAPI_SETUP_COMPLETE= (1 << 1),

+   /* MCP55 reg offset */
+   NV_CTL_MCP55= 0x400,
+   NV_INT_STATUS_MCP55 = 0x440,
+   NV_INT_ENABLE_MCP55 = 0x444,
+   NV_NCQ_REG_MCP55= 0x448,
+   
+   /* MCP55 */
+   NV_INT_ALL_MCP55= 0x,
+   NV_INT_PORT_SHIFT_MCP55 = 16,   /* each port occupies 16 bits */
+   NV_INT_MASK_MCP55   = NV_INT_ALL_MCP55 & 0xfffd,
+   
+   /* SWNCQ ENABLE BITS*/
+   NV_CTL_PRI_SWNCQ= 0x02,
+   NV_CTL_SEC_SWNCQ= 0x04,
+   
+   /* SW NCQ status bits*/
+   NV_SWNCQ_IRQ_DEV= (1 << 0),
+   NV_SWNCQ_IRQ_PM = (1 << 1),
+   NV_SWNCQ_IRQ_ADDED  = (1 << 2),
+   NV_SWNCQ_IRQ_REMOVED= (1 << 3),
+   
+   NV_SWNCQ_IRQ_BACKOUT= (1 << 4),
+   NV_SWNCQ_IRQ_SDBFIS = (1 << 5),
+   NV_SWNCQ_IRQ_DHREGFIS   = (1 << 6),
+   NV_SWNCQ_IRQ_DMASETUP   = (1 << 7),
+   
+   NV_SWNCQ_IRQ_HOTPLUG= NV_SWNCQ_IRQ_ADDED |
+ NV_SWNCQ_IRQ_REMOVED,
+
};

/* ADMA Physical Region Descriptor - one SG segment */
@@ -226,6 +255,35 @@ struct nv_host_priv {
unsigned long   type;
};

+typedef struct {
+   u32 defer_bits;
+   u8  front;
+   u8  rear;
+   unsigned int tag[ATA_MAX_QUEUE + 1];
+}defer_queue_t;
+
+struct nv_swncq_port_priv {
+   struct ata_prd  *prd;           /* our SG list */
+   dma_addr_t      prd_dma;        /* and its DMA mapping */
+   void __iomem    *sactive_block;
+   u32             qc_active;
+   unsigned int    last_issue_tag;
+   spinlock_t      lock;
+   /* fifo loop queue  to store deferral command */
+   defer_queue_t   defer_queue;
+
+   /* for NCQ interrupt analysis */
+   u32 dhfis_bits;
+   u32 dmafis_bits;
+   u32 sdbfis_bits;
+   
+   unsigned int    ncq_saw_d2h:1;
+   unsigned int    ncq_saw_dmas:1;
+   unsigned int    ncq_saw_sdb:1;
+   unsigned int    ncq_saw_backout:1;
+};
+
+
#define NV_ADMA_CHECK_INTR(GCTL, PORT) ((GCTL) & ( 1 << (19 + (12 * (PORT)))))

static int nv_init_one (struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -263,13 +321,28 @@ static void nv_adma_host_stop(struct ata
static void nv_adma_post_internal_cmd(struct ata_queued_cmd *qc);
static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf);

+static void nv_mcp55_thaw(struct ata_port *ap);
+static void nv_mcp55_freeze(struct ata_port *ap);
+static void nv_swncq_error_handler(struct ata_port *ap);
+static int  nv_swncq_port_start(struct ata_port *ap);
+static void nv_swncq_qc_prep(struct ata_queued_cmd *qc);
+static void nv_swncq_fill_sg(struct ata_queued_cmd *qc);
+static unsigned int nv_swncq_qc_issue(struct ata_queued_cmd *qc);
+static void nv_swncq_irq_clear(struct ata_port *ap, u32 val);
+static irqreturn_t nv_swncq_interrupt(int irq, void *dev_instance);
+#ifdef CONFIG_PM
+static int nv_swncq_port_suspend(struct ata_port *ap, pm_message_t mesg);
+static int nv_swncq_port_resume(struct ata_port *ap);
+#endif
+
enum nv_host_type
{
GENERIC,
NFORCE2,
NFORCE3 = NFORCE2,  /* NF2 == NF3 as far as sata_nv is concerned */
CK804,
-   ADMA
+   ADMA,
+   SWNCQ
};

static const struct pci_device_id nv_pci_tbl[] = {
@@ -280,13 +353,13 @@ static const struct pci_device_id nv_pci
{ PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_SATA2), CK804 },
{ PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP04_SATA), CK804 },
{ PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP04_SATA2), CK804 },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP51_SATA), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP51_SATA2), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP55_SATA), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP55_SATA2), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP61_SATA), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP61_SATA2), GENERIC },
-   { PCI_VDEVICE(NVIDIA, PCI_DEVICE_ID_NVIDIA_NFORCE_MCP61_SATA3), GENERIC },
+   { PCI_VDEVICE(NVIDIA, 

Re: [patch 1/3] MAP_NOZERO - implement a new VM_NOZERO/MAP_NOZERO page retirement policy

2007-06-26 Thread Rik van Riel

Davide Libenzi wrote:

This is the core implementation of the new VM_NOZERO page retirement
policy (and the associated MAP_NOZERO).
A new field  owner_uid  is added to the  mm_struct, and it is kept set to
the effective UID of the task that owns the  mm_struct.
A new field  owner_uid  is also added to the page struct.


You will also need to take the task's SELinux security
context into account.

SUID programs should not be able to use this feature,
either.


When pages exit (unmapped from) a  vma, they are marked with the effective
UID of the  mm_struct  that owns it.




--- linux-2.6.mod.orig/include/linux/mm_types.h 2007-06-21 14:02:06.0 -0700
+++ linux-2.6.mod/include/linux/mm_types.h  2007-06-25 19:11:22.0 -0700
@@ -64,6 +64,7 @@
struct list_head lru;   /* Pageout list, eg. active_list
 * protected by zone->lru_lock !
 */
+   int owner_uid;  /* Last owner of the page */
/*
 * On machines where all RAM is mapped into kernel address space,
 * we can simply calculate the virtual address. On machines with


Since this is only set when the page is freed, could
the owner_uid and security context be put inside a
union with some fields that are not otherwise used
for free pages?

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.


[PATCH] atkbd: cleanup only once

2007-06-26 Thread Dave Young
Hi,

If you press ctrl+alt+del several times while the kernel is booting (before
userspace comes up), the kernel will oops. I found that ps2_command is called
more than once, and by then ps2dev->serio may be a NULL pointer.

2.6.22-rc5 and 2.6.22-rc6 show the same result.

Signed-off-by: Dave Young <[EMAIL PROTECTED]>
---
diff -upr linux/drivers/input/keyboard/atkbd.c linux.new/drivers/input/keyboard/atkbd.c
--- linux/drivers/input/keyboard/atkbd.c 2007-06-27 10:38:37.0 +
+++ linux.new/drivers/input/keyboard/atkbd.c 2007-06-27 10:37:39.0 +
@@ -795,6 +795,11 @@ static int atkbd_activate(struct atkbd *
 
 static void atkbd_cleanup(struct serio *serio)
 {
+   static int flag;
+
+   if(flag)
+   return;
+   flag = 1;
struct atkbd *atkbd = serio_get_drvdata(serio);
ps2_command(&atkbd->ps2dev, NULL, ATKBD_CMD_RESET_BAT);
 }

Regards
dave


scheduling while atomic and DEBUG_SPINLOCK_SLEEP

2007-06-26 Thread Jon Ringle
Hello,

I am sometimes getting the following "scheduling while atomic" dump:

[42949427.37] scheduling while atomic: sh/0x0002/144
[42949427.38] [] (dump_stack+0x0/0x14) from []
(schedule+0x628/0x6c8)
[42949427.39] [] (schedule+0x0/0x6c8) from []
(__down_read+0xc4/0x128)
[42949427.40] [] (__down_read+0x0/0x128) from []
(do_page_fault+0x84/0x214)
[42949427.40]  r5 = 0017  r4 = C02F6168
[42949427.41] [] (do_page_fault+0x0/0x214) from
[] (do_DataAbort+0x3c/0xa4)
[42949427.42] [] (do_DataAbort+0x0/0xa4) from []
(__dabt_svc+0x40/0x60)
[42949427.43]  r8 = 0093  r7 = C0186920  r6 = E5903048  r5 =
CF21FD14
[42949427.43]  r4 = 
[42949427.44] [] (do_alignment_ldrstr+0x0/0x130) from
[] (do_alignment+0x238/0x34c)
[42949427.45]  r4 = CF21E000
[42949427.45] [] (do_alignment+0x0/0x34c) from
[] (do_DataAbort+0x3c/0xa4)
[42949427.46] [] (do_DataAbort+0x0/0xa4) from []
(__dabt_svc+0x40/0x60)
[42949427.47]  r8 = CFD51F34  r7 = 8000  r6 = 0001  r5 =
CF21FE58
[42949427.47]  r4 = 
[42949427.48] [] (get_index+0x0/0x5c) from []
(prio_tree_insert+0xac/0x28c)
[42949427.49] [] (prio_tree_insert+0x0/0x28c) from
[] (vma_prio_tree_insert+0x28/0x40)
[42949427.50] [] (vma_prio_tree_insert+0x0/0x40) from
[] (vma_link+0xe0/0x1d4)
[42949427.50]  r5 = CFC4F90C  r4 = CF21E000
[42949427.51] [] (vma_link+0x0/0x1d4) from []
(do_mmap_pgoff+0x390/0x760)
[42949427.52]  r7 = CFC374E0  r6 = 1000  r5 = 4005E000  r4 =
CFC4F90C
[42949427.52] [] (do_mmap_pgoff+0x0/0x760) from
[] (old_mmap+0x108/0x130)
[42949427.53] [] (old_mmap+0x0/0x130) from []
(ret_fast_syscall+0x0/0x2c)

So, I think I need to try to figure out why the preempt_count is 2. I
enabled CONFIG_DEBUG_SPINLOCK_SLEEP thinking that it would give me more
information about this problem. I got two different hits with this
turned on.

The first dump is coming from Intel's ixp400eth driver:

[42949391.91] Debug: sleeping function called from invalid context
at include/asm/semaphore.h:69
[42949391.91] in_atomic():1, irqs_disabled():128
[42949391.91] [] (dump_stack+0x0/0x14) from []
(__might_sleep+0xe8/0x114)
[42949391.91] [] (__might_sleep+0x0/0x114) from
[] (ixOsalMutexLock+0x190/0x1d8 [ixp400eth])
[42949391.91]  r5 = BF0A3D84  r4 = CFC2D2C0
[42949391.91] [] (ixOsalMutexLock+0x0/0x1d8 [ixp400eth])
from [] (ixEthAccPortMulticastAddressLeaveAll+0x38/0x60
[ixp400eth])
[42949391.91]  r8 = FF9D  r7 = BF09ED88  r6 = C43F5260  r5 =
CFD36000
[42949391.91]  r4 = 
[42949391.91] []
(ixEthAccPortMulticastAddressLeaveAll+0x0/0x60 [ixp400eth]) from
[] (dev_set_multicast_list+0x68/0x214 [ixp400eth])
[42949391.91]  r4 = C43F5000
[42949391.91] [] (dev_set_multicast_list+0x0/0x214
[ixp400eth]) from [] (__dev_mc_upload+0x3c/0x40)
[42949391.91]  r7 =   r6 = 1002  r5 =   r4 =
CFD36000
[42949391.91] [] (__dev_mc_upload+0x0/0x40) from
[] (dev_mc_upload+0x30/0x44)
[42949391.91] [] (dev_mc_upload+0x0/0x44) from
[] (dev_open+0x70/0xcc)
[42949391.91]  r4 = C43F5000
[42949391.91] [] (dev_open+0x0/0xcc) from []
(dev_change_flags+0x68/0x138)
[42949391.91]  r5 = 1043  r4 = C43F5000
[42949391.91] [] (dev_change_flags+0x0/0x138) from
[] (devinet_ioctl+0x64c/0x72c)
[42949391.91]  r7 = CFA09760  r6 = CFD36000  r5 = BEFA8D2C  r4 =
CFB99D40
[42949391.91] [] (devinet_ioctl+0x0/0x72c) from
[] (inet_ioctl+0x1b0/0x1d4)
[42949391.91] [] (inet_ioctl+0x0/0x1d4) from []
(sock_ioctl+0x184/0x2f0)
[42949391.91] [] (sock_ioctl+0x0/0x2f0) from []
(do_ioctl+0x84/0xa0)
[42949391.91]  r8 = C002AE44  r7 = BEFA8D2C  r6 = 8914  r5 =
FFE7
[42949391.91]  r4 = C43F1800
[42949391.91] [] (do_ioctl+0x0/0xa0) from []
(vfs_ioctl+0x94/0x314)
[42949391.91]  r7 =   r6 = BEFA8D2C  r5 = 0003  r4 =
C43F1800
[42949391.91] [] (vfs_ioctl+0x0/0x314) from []
(sys_ioctl+0x40/0x64)
[42949391.91]  r8 = C002AE44  r7 = 0036  r6 = 8914  r5 =
FFF7
[42949391.91]  r4 = C43F1800
[42949391.91] [] (sys_ioctl+0x0/0x64) from []
(ret_fast_syscall+0x0/0x2c)
[42949391.91]  r6 =   r5 = BEFA8E1C  r4 = BEFA8D2C

And the other one is from one of our own kernel modules:

[42949490.89] Debug: sleeping function called from invalid context
at mm/slab.c:2729
[42949490.89] in_atomic():0, irqs_disabled():128
[42949490.89] [] (dump_stack+0x0/0x14) from []
(__might_sleep+0xe8/0x114)
[42949490.89] [] (__might_sleep+0x0/0x114) from
[] (kmem_cache_alloc+0x74/0x84)
[42949490.89]  r5 = 00D0  r4 = CFFFE0C0
[42949490.89] [] (kmem_cache_alloc+0x0/0x84) from
[] (request_irq+0x80/0xdc)
[42949490.89]  r6 =   r5 = 0007  r4 = 
[42949490.89] [] (request_irq+0x0/0xdc) from []
(VbusHookInterrupt+0x2c/0x68 [dstdrv])
[42949490.89] [] (VbusHookInterrupt+0x0/0x68 [dstdrv])
from [] (VbusRegisterISR+0xcc/0xfc [dstdrv])
[42949490.89]  

Re: [PATCH RFC #2] hwrng: Add type categories

2007-06-26 Thread Henrique de Moraes Holschuh
On Tue, 26 Jun 2007, Matt Mackall wrote:
> On Tue, Jun 26, 2007 at 08:21:51PM +0200, Michael Buesch wrote:
> > Don't use the word "quality", as people seem to think of
> > the entropy quality when hearing that word.
> 
> Why do I so often feel compelled to respond with "did you read what I
> wrote?" on this list?
> 
> I object to your MEANINGLESS CATEGORIES.
> 
> > This uses the word "type", which is probably better for
> > understanding what the value really means.
> 
> Please explain:
> 
> a) how is bad different from pseudo?
> b) how is onboard different than dedicated?

Actually, I think I understand the reason behind (b).  If someone adds a
dedicated crypto/RNG engine to the system, he likely wants to use that and
not anything else that might also be around.

(a) is just broken, unless one is to take it as "never use it".  And I am
really not sure about (b).  It *is* better than just using whatever crap we
found first (or last), but it is the wrong solution for a problem that we
really should not have in the first place if someone had thought a bit
before adding a misc device for something that has no reason to be unique in
a system.

Instead of papering over the problem with borked solutions, maybe we should
just export ALL HRNGs to userspace.  While at it, please add whatever is
needed so that userspace can talk to the kernel driver to get vital
information about the HRNG device the driver might have (the current
interface is a bad simplistic hack).

Let userspace get the data from whichever HRNG it wants, process it in any
way it wants and pipe it back through /dev/random IOCTLs.  And let it do it
for as many HRNGs as it wants at the same time.
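
For the "pipe it back through /dev/random IOCTLs" part, a minimal userspace sketch might look like the following. It uses the existing RNDADDENTROPY ioctl and struct rand_pool_info from <linux/random.h>; the helper names make_pool_info() and feed_entropy() are made up for illustration, and how many bits to credit is a policy decision left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/random.h>

/*
 * Build the argument block for the RNDADDENTROPY ioctl.
 * entropy_bits is the amount of entropy the caller is willing to credit.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct rand_pool_info *make_pool_info(const void *data, size_t len,
                                             int entropy_bits)
{
    assert(data != NULL && len > 0);
    struct rand_pool_info *info = malloc(sizeof(*info) + len);

    if (info == NULL)
        return NULL;
    info->entropy_count = entropy_bits;
    info->buf_size = (int)len;
    memcpy(info->buf, data, len);
    return info;
}

/*
 * Feed one block of already-processed HRNG data into the kernel pool.
 * random_fd is an open descriptor for /dev/random. Returns the ioctl
 * result (0 on success, -1 on error), or -1 if allocation fails.
 */
static int feed_entropy(int random_fd, const void *data, size_t len,
                        int entropy_bits)
{
    struct rand_pool_info *info = make_pool_info(data, len, entropy_bits);

    if (info == NULL)
        return -1;
    int ret = ioctl(random_fd, RNDADDENTROPY, info);
    free(info);
    return ret;
}
```

A daemon would read from each HRNG device it cares about, run whatever checks or whitening it wants, and call feed_entropy() per block. The ioctl needs sufficient privileges, so the call fails for an ordinary user.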

And if you must have /dev/hw_random point somewhere, let udev scripts or
something else like that take care of it.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-26 Thread Andrew Morton
On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen <[EMAIL PROTECTED]> wrote:

> > 
> > so...  where do we stand with this?  Fundamental, irreconcilable
> > differences over the use of pathname-based security?
> > 
> There certainly seems to be some differences of opinion over the use
> of pathname-based-security.

I was refreshed to have not been cc'ed on a lkml thread for once.  I guess
it couldn't last.

Do you agree with the "irreconcilable" part?  I think I do.

I suspect that we're at the stage of having to decide between

a) set aside the technical issues and grudgingly merge this stuff as a
   service to Suse and to their users (both of which entities are very
   important to us) and leave it all as an object lesson in
   how-not-to-develop-kernel-features.

   Minimisation of the impact on the rest of the kernel is of course
   very important here.

versus

b) leave it out and require that Suse wear the permanent cost and
   quality impact of maintaining it out-of-tree.  It will still be an
   object lesson in how-not-to-develop-kernel-features.

Sigh.  Please don't put us in this position again.  Get stuff upstream
before shipping it to customers, OK?  It ain't rocket science.

> > Are there any other sticking points?
> > 
> > 
> The conditional passing of the vfsmnt mount in the vfs, as done in this
> patch series, has received a NAK.  This problem results from NFS passing
> a NULL nameidata into the vfs.  We have a second patch series that we
> have posted for discussion that addresses this by splitting the nameidata
> struct.
> Message-Id: <[EMAIL PROTECTED]>
> Subject: [RFD 0/4] AppArmor - Don't pass NULL nameidata to
> vfs_create/lookup/permission IOPs
> 
> other issues that have been raised are:
> - AppArmor does not currently mediate IPC and network communications.
>   Mediation of these is a wip
> - the use of d_path to generate the pathname used for mediation when a
>   file is opened.
>   - Generating the pathname using a reverse walk is considered ugly
>   - A buffer is alloced to store the generated path name.
>   - The  buffer size has a configurable upper limit which will cause
> opens to fail if the pathname length exceeds this limit.  This
> is a fail closed behavior.
>   - there have been some concerns expressed about the performance
> of this approach
>   We are evaluating our options on how best to address this issue.

OK, useful summary, thanks.  I'd encourage you to proceed apace.


[patch 3/3] MAP_NOZERO - wire sys_brk2() to the x86 family

2007-06-26 Thread Davide Libenzi
Wires up sys_brk2() to the x86 family.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 arch/i386/kernel/syscall_table.S |1 +
 arch/x86_64/ia32/ia32entry.S |1 +
 include/asm-i386/unistd.h|3 ++-
 include/asm-x86_64/unistd.h  |2 ++
 4 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.mod/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.mod.orig/arch/i386/kernel/syscall_table.S 2007-06-25 19:14:46.0 -0700
+++ linux-2.6.mod/arch/i386/kernel/syscall_table.S  2007-06-26 18:08:30.0 -0700
@@ -323,3 +323,4 @@
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
+   .long sys_brk2
Index: linux-2.6.mod/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.mod.orig/arch/x86_64/ia32/ia32entry.S 2007-06-25 19:14:46.0 -0700
+++ linux-2.6.mod/arch/x86_64/ia32/ia32entry.S  2007-06-26 18:08:30.0 -0700
@@ -719,4 +719,5 @@
.quad compat_sys_signalfd
.quad compat_sys_timerfd
.quad sys_eventfd
+   .quad sys_brk2
 ia32_syscall_end:
Index: linux-2.6.mod/include/asm-i386/unistd.h
===
--- linux-2.6.mod.orig/include/asm-i386/unistd.h 2007-06-25 19:14:46.0 -0700
+++ linux-2.6.mod/include/asm-i386/unistd.h 2007-06-26 18:08:30.0 -0700
@@ -329,10 +329,11 @@
 #define __NR_signalfd  321
 #define __NR_timerfd   322
 #define __NR_eventfd   323
+#define __NR_brk2  324
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 324
+#define NR_syscalls 325
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.mod/include/asm-x86_64/unistd.h
===
--- linux-2.6.mod.orig/include/asm-x86_64/unistd.h  2007-06-25 19:14:46.0 -0700
+++ linux-2.6.mod/include/asm-x86_64/unistd.h   2007-06-26 18:08:30.0 -0700
@@ -630,6 +630,8 @@
 __SYSCALL(__NR_timerfd, sys_timerfd)
 #define __NR_eventfd   284
 __SYSCALL(__NR_eventfd, sys_eventfd)
+#define __NR_brk2  285
+__SYSCALL(__NR_brk2, sys_brk2)
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR



[patch 1/3] MAP_NOZERO - implement a new VM_NOZERO/MAP_NOZERO page retirement policy

2007-06-26 Thread Davide Libenzi
This is the core implementation of the new VM_NOZERO page retirement
policy (and the associated MAP_NOZERO).
A new field  owner_uid  is added to the  mm_struct, and it is kept set to
the effective UID of the task that owns the  mm_struct.
A new field  owner_uid  is also added to the page struct.
When pages exit (are unmapped from) a  vma, they are marked with the effective
UID of the  mm_struct  that owns it.
When pages exit the allocator, their  owner_uid  is cleared, unless the
new flag __GFP_UIDKEEP is passed to it. So every page fetcher other than
the new alloc_zeroed_page_vma(), clears the owner_uid and blocks all the
following uses of the uncleared page itself.
The new alloc_zeroed_page_vma() calls __alloc_pages() with the __GFP_UIDKEEP
flag, and checks if the VM_NOZERO flag is set in the vma, and if the  owner_uid
field of the page matches the one of the  mm_struct  owning the vma.
If any of these tests fail, the page is cleared in the usual way, otherwise
it is passed back without being cleared.
Page-cache pages are (once unmapped) marked with the uid owning the  inode
of the mapping the pages are associated with.
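
The clear-or-keep decision described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the patch's code: the sketch_* types, SKETCH_VM_NOZERO and both functions are hypothetical stand-ins for struct page, mm_struct, the vma, VM_NOZERO and alloc_zeroed_page_vma().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_VM_NOZERO 0x1u   /* placeholder bit, not the patch's value */

struct sketch_page { int owner_uid; unsigned char data[SKETCH_PAGE_SIZE]; };
struct sketch_mm   { int owner_uid; };
struct sketch_vma  { unsigned int flags; const struct sketch_mm *mm; };

/*
 * Zeroing may be skipped only if the vma opted in with the no-zero flag
 * and the page's last owner matches the owner of the vma's mm.
 */
static bool may_skip_clear(const struct sketch_page *page,
                           const struct sketch_vma *vma)
{
    return (vma->flags & SKETCH_VM_NOZERO) &&
           page->owner_uid == vma->mm->owner_uid;
}

/*
 * Model of handing a page to a vma: clear it unless every test passes.
 * Returns true if the page was cleared.
 */
static bool hand_over_page(struct sketch_page *page,
                           const struct sketch_vma *vma)
{
    assert(page != NULL && vma != NULL && vma->mm != NULL);
    bool cleared = !may_skip_clear(page, vma);

    if (cleared)
        memset(page->data, 0, sizeof(page->data));
    return cleared;
}
```

In this model a page last owned by UID 1000 keeps its contents only when it goes to a no-zero vma whose mm is also owned by UID 1000; any other combination takes the usual clearing path.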




Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide



---
 include/asm-alpha/page.h |3 ++-
 include/asm-cris/page.h  |3 ++-
 include/asm-generic/mman.h   |1 +
 include/asm-h8300/page.h |3 ++-
 include/asm-i386/page.h  |3 ++-
 include/asm-ia64/page.h  |2 +-
 include/asm-m32r/page.h  |3 ++-
 include/asm-m68knommu/page.h |3 ++-
 include/asm-s390/page.h  |3 ++-
 include/asm-x86_64/page.h|3 ++-
 include/linux/gfp.h  |5 +
 include/linux/highmem.h  |7 +--
 include/linux/mm.h   |   16 
 include/linux/mm_types.h |1 +
 include/linux/mman.h |3 ++-
 include/linux/rmap.h |1 +
 include/linux/sched.h|3 +++
 kernel/fork.c|1 +
 kernel/sys.c |3 +++
 mm/filemap.c |2 ++
 mm/mmap.c|3 ++-
 mm/page_alloc.c  |   33 +
 mm/rmap.c|   14 ++
 23 files changed, 102 insertions(+), 17 deletions(-)

Index: linux-2.6.mod/include/linux/sched.h
===
--- linux-2.6.mod.orig/include/linux/sched.h 2007-06-21 13:59:38.0 -0700
+++ linux-2.6.mod/include/linux/sched.h 2007-06-21 14:01:28.0 -0700
@@ -386,6 +386,9 @@
/* aio bits */
rwlock_t ioctx_list_lock;
struct kioctx   *ioctx_list;
+
+   /* Effective UID of the owner of this mm_struct */
+   uid_t   owner_uid;
 };
 
 struct sighand_struct {
Index: linux-2.6.mod/mm/rmap.c
===
--- linux-2.6.mod.orig/mm/rmap.c 2007-06-21 14:27:19.0 -0700
+++ linux-2.6.mod/mm/rmap.c 2007-06-25 17:42:59.0 -0700
@@ -627,6 +627,16 @@
 }
 #endif
 
+void page_set_owner(struct page *page, uid_t owner_uid)
+{
+   if (unlikely(PageCompound(page))) {
+   unsigned int nrpages = 1U << compound_order(page);
+   for (; nrpages; nrpages--, page++)
+   page_set_owner_uid(page, owner_uid);
+   } else
+   page_set_owner_uid(page, owner_uid);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page: page to remove mapping from
@@ -649,6 +659,10 @@
print_symbol (KERN_EMERG "  vma->vm_file->f_op->mmap = %s\n", (unsigned long)vma->vm_file->f_op->mmap);
BUG();
}
+   /*
+* Record the last owner of the page.
+*/
+   page_set_owner(page, vma->vm_mm->owner_uid);
 
/*
 * It would be tidy to reset the PageAnon mapping here,
Index: linux-2.6.mod/kernel/fork.c
===
--- linux-2.6.mod.orig/kernel/fork.c 2007-06-21 14:32:44.0 -0700
+++ linux-2.6.mod/kernel/fork.c 2007-06-24 21:23:52.0 -0700
@@ -342,6 +342,7 @@
mm->ioctx_list = NULL;
mm->free_area_cache = TASK_UNMAPPED_BASE;
mm->cached_hole_size = ~0UL;
+   mm->owner_uid = current->euid;
 
if (likely(!mm_alloc_pgd(mm))) {
mm->def_flags = 0;
Index: linux-2.6.mod/include/linux/highmem.h
===
--- linux-2.6.mod.orig/include/linux/highmem.h  2007-06-21 14:38:02.0 -0700
+++ linux-2.6.mod/include/linux/highmem.h   2007-06-22 12:10:36.0 -0700
@@ -76,12 +76,7 @@
 static inline struct page *
 alloc_zeroed_user_highpage(struct vm_area_struct *vma, unsigned long vaddr)
 {
-   struct page *page = alloc_page_vma(GFP_HIGHUSER, vma, vaddr);
-
-   if (page)
-

[patch 2/3] MAP_NOZERO - implement sys_brk2()

2007-06-26 Thread Davide Libenzi
The following patch implements the sys_brk2() syscall, which is nothing
other than a sys_brk() with an extra "flags" parameter. This can be used
to pass the new MAP_NOZERO bit, to ask the kernel to hand over non-zeroed
pages if possible.



Signed-off-by: Davide Libenzi <[EMAIL PROTECTED]>


- Davide


---
 include/linux/mm.h   |3 ++-
 include/linux/syscalls.h |1 +
 mm/mmap.c|   22 ++
 3 files changed, 21 insertions(+), 5 deletions(-)

Index: linux-2.6.mod/include/linux/mm.h
===
--- linux-2.6.mod.orig/include/linux/mm.h   2007-06-25 19:27:42.0 -0700
+++ linux-2.6.mod/include/linux/mm.h2007-06-26 18:08:28.0 -0700
@@ -1099,7 +1099,8 @@
 }
 
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
-
+extern unsigned long do_brk_flags(unsigned long addr, unsigned long len,
+ unsigned long vmflags);
 extern unsigned long do_brk(unsigned long, unsigned long);
 
 /* filemap.c */
Index: linux-2.6.mod/include/linux/syscalls.h
===
--- linux-2.6.mod.orig/include/linux/syscalls.h 2007-06-25 19:14:49.0 -0700
+++ linux-2.6.mod/include/linux/syscalls.h  2007-06-26 18:08:28.0 -0700
@@ -263,6 +263,7 @@
 asmlinkage long sys_fremovexattr(int fd, char __user *name);
 
 asmlinkage unsigned long sys_brk(unsigned long brk);
+asmlinkage unsigned long sys_brk2(unsigned long brk, unsigned long flags);
 asmlinkage long sys_mprotect(unsigned long start, size_t len,
unsigned long prot);
 asmlinkage unsigned long sys_mremap(unsigned long addr,
Index: linux-2.6.mod/mm/mmap.c
===
--- linux-2.6.mod.orig/mm/mmap.c 2007-06-25 19:14:49.0 -0700
+++ linux-2.6.mod/mm/mmap.c 2007-06-26 18:08:28.0 -0700
@@ -35,6 +35,8 @@
 #define arch_mmap_check(addr, len, flags)  (0)
 #endif
 
+#define BRK_ALLOWED_FLAGS  (VM_NOZERO)
+
 static void unmap_region(struct mm_struct *mm,
struct vm_area_struct *vma, struct vm_area_struct *prev,
unsigned long start, unsigned long end);
@@ -234,7 +236,7 @@
return next;
 }
 
-asmlinkage unsigned long sys_brk(unsigned long brk)
+asmlinkage unsigned long sys_brk2(unsigned long brk, unsigned long flags)
 {
unsigned long rlim, retval;
unsigned long newbrk, oldbrk;
@@ -271,8 +273,10 @@
if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE))
goto out;
 
+   flags = BRK_ALLOWED_FLAGS & calc_vm_flag_bits(flags);
+
/* Ok, looks good - let it rip. */
-   if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk)
+   if (do_brk_flags(oldbrk, newbrk-oldbrk, flags) != oldbrk)
goto out;
 set_brk:
mm->brk = brk;
@@ -282,6 +286,11 @@
return retval;
 }
 
+asmlinkage unsigned long sys_brk(unsigned long brk)
+{
+   return sys_brk2(brk, 0);
+}
+
 #ifdef DEBUG_MM_RB
 static int browse_rb(struct rb_root *root)
 {
@@ -1863,7 +1872,8 @@
  *  anonymous maps.  eventually we may be able to do some
  *  brk-specific accounting here.
  */
-unsigned long do_brk(unsigned long addr, unsigned long len)
+unsigned long do_brk_flags(unsigned long addr, unsigned long len,
+  unsigned long vmflags)
 {
struct mm_struct * mm = current->mm;
struct vm_area_struct * vma, * prev;
@@ -1882,7 +1892,7 @@
if (is_hugepage_only_range(mm, addr, len))
return -EINVAL;
 
-   flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags;
+   flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags | vmflags;
 
error = arch_mmap_check(addr, len, flags);
if (error)
@@ -1959,6 +1969,10 @@
return addr;
 }
 
+unsigned long do_brk(unsigned long addr, unsigned long len)
+{
+   return do_brk_flags(addr, len, 0);
+}
 EXPORT_SYMBOL(do_brk);
 
 /* Release all mmaps. */



Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

2007-06-26 Thread Alan Stern
On Tue, 26 Jun 2007, Roland McGrath wrote:

> I needed the attached patch on top of the bptest patch for the current
> code.  Btw, that is a very nice little tester!

I had already made some of those changes (the ones needed to make 
bptest build with the new hw_breakpoint code).  I'll add in the others.

> Below that is a patch to go on top of your current patch, with x86-64
> support.  I've only tried a few trivial tests with bptest (including an
> 8-byte bp), which worked great.  It is a pretty faithful copy of your i386
> changes.  I'm still not sure we have all that right, but you might as well
> incorporate this into your patch.  You should change the x86_64 code in
> parallel with any i386 changes we decide on later, and I can test it and
> send you any typo fixups or whatnot.

Right.  I may update a few comments...

Alan Stern



[patch 0/3] MAP_NOZERO - VM_NOZERO/MAP_NOZERO early summer madness

2007-06-26 Thread Davide Libenzi
I was using oprofile to sample some userspace code I am working on,
and I kept noticing  clear_page  in the top three entries
of the oprofile logs.
Also, a simple kernel build, on my Dual Opteron with 8GB of RAM,
shows  clear_page  as the first kernel entry, second only to the
userspace  cc1  and  as.
Most userspace code uses malloc() (and anonymous mappings) in
such a way that the memory returned via kernel->glibc is written
soon after. The POSIX malloc() definition itself also does
not require the returned memory to be zeroed (as calloc() does).
So I implemented a rather quick hack that introduces a new mmap() flag
MAP_NOZERO (only valid for anonymous mappings) and the  vma  counter-part
VM_NOZERO. Also, a new sys_brk2() has been introduced to accept a new
flags  parameter. A brief description of the patches follows in the next
emails.
I first hacked Val's ebizzy to accept a new '-N' flag to make use of
MAP_NOZERO:

http://infohost.nmt.edu/~val/patches/ebizzy.tar.gz
http://www.xmailserver.org/ebizzy-nzmmap-0.2.diff

On my box,  ebizzy  performance improved by 10% to 15%.
The userspace code I am working on (uses malloc() quite heavily), saw
a performance jump of around 14%.
In both cases,  clear_page  dropped way down in the oprofile logs.
I then coded quick (and rather ugly) hacks for  glibc  and  gcc  to
make them use the new features (MAP_NOZERO and sys_brk2()):

http://www.xmailserver.org/glibc-nzmalloc-tweaks
http://www.xmailserver.org/gcc-nozero-hack

I then tried a 2.6.22-rc5 kernel build using the newly built  glibc
and  gcc  (with and w/out no-zero enabling options/env-vars), and
when using the no-zero mode,  clear_page  went way down in the oprofile
logs and build time dropped by about 2.5% to 3%.
I did not have the time (or the will) to tweak  as  and  ld  as well.
These are some test utilities to verify the no-zero behaviour of MAP_NOZERO
(and sys_brk2()):

http://www.xmailserver.org/nzmmap-test.c
http://www.xmailserver.org/nzmalloc-test.c
http://www.xmailserver.org/smiffy.c

To run  nzmalloc-test  you need a patched glibc (using  glibc-nzmalloc-tweaks).
The  smiffy  one should be run under a user that has no other processes
running and that owns no files on the system, and it verifies that all the
pages it gets from the kernel are zeroed (otherwise "Houston, we have a
problem ...").
It has been running on my system without barfing for more than two days.
How crazy is that?
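
For readers without the test sources at hand, a check in the spirit of the one described for  smiffy  might look like the sketch below. It is a guess at the kind of test involved, not the code of smiffy.c; fresh_pages_are_zeroed() is a hypothetical helper, and it exercises only the default (zeroing) path because it does not pass MAP_NOZERO.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map npages fresh anonymous pages and check that every byte is zero.
 * Returns 1 if all bytes are zero, 0 if a non-zero byte was found,
 * and -1 if the mapping could not be created.
 */
static int fresh_pages_are_zeroed(size_t npages, size_t page_size)
{
    assert(npages > 0 && page_size > 0);
    size_t len = npages * page_size;
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (mem == MAP_FAILED)
        return -1;

    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (mem[i] != 0) {
            zeroed = 0;     /* stale data leaked into a fresh mapping */
            break;
        }
    }
    munmap(mem, len);
    return zeroed;
}
```

Run in a loop under a user that has never owned any pages, a result of 0 would mean an uncleared page crossed a UID boundary.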




- Davide





Re: [PATCH RFC #2] hwrng: Add type categories

2007-06-26 Thread Matt Mackall
On Tue, Jun 26, 2007 at 05:45:17PM -0500, Matt Mackall wrote:
> On Tue, Jun 26, 2007 at 08:21:51PM +0200, Michael Buesch wrote:
> > Don't use the word "quality", as people seem to think of
> > the entropy quality when hearing that word.
> 
> Why do I so often feel compelled to respond with "did you read what I
> wrote?" on this list?

Ahh, I see you did respond to what I wrote earlier. Missed it due to
travelling earlier today. Will respond to the earlier thread.

-- 
Mathematics is the supreme nostalgia of our time.


Re: NDAs - ANY KNOWN RULES?

2007-06-26 Thread Arjan van de Ven

> Thanks for your explanations,
> 
> but I know for sure it doesn't work.

then.. do you have an actual question or are you just trying to troll?

and yes there have been several such trolls lately on this list, and so
far your postings have all the signs of being just another one..


DO NOT FEED THE TROLLS




Re: [AppArmor 00/44] AppArmor security module overview

2007-06-26 Thread John Johansen
On Tue, Jun 26, 2007 at 04:52:02PM -0700, Andrew Morton wrote:
> On Tue, 26 Jun 2007 16:07:56 -0700
> [EMAIL PROTECTED] wrote:
> 
> > This post contains patches to include the AppArmor application security
> > framework, with request for inclusion into -mm for wider testing.
> 
> Patches 24 and 31 didn't come through.
> 
yes, sorry about that. I had a very odd authentication failure
with those two mails and missed it.

They have been resent.

> 
> so...  where do we stand with this?  Fundamental, irreconcilable
> differences over the use of pathname-based security?
> 
There certainly seem to be some differences of opinion over the use
of pathname-based security.

> Are there any other sticking points?
> 
> 
The conditional passing of the vfsmnt mount in the vfs, as done in this
patch series, has received a NAK.  This problem results from NFS passing
a NULL nameidata into the vfs.  We have a second patch series that we
have posted for discussion that addresses this by splitting the nameidata
struct.
Message-Id: <[EMAIL PROTECTED]>
Subject: [RFD 0/4] AppArmor - Don't pass NULL nameidata to
vfs_create/lookup/permission IOPs

other issues that have been raised are:
- AppArmor does not currently mediate IPC and network communications.
  Mediation of these is a work in progress.
- the use of d_path to generate the pathname used for mediation when a
  file is opened.
  - Generating the pathname using a reverse walk is considered ugly.
  - A buffer is allocated to store the generated path name.
  - The buffer size has a configurable upper limit which will cause
    opens to fail if the pathname length exceeds this limit.  This
    is a fail closed behavior.
  - There have been some concerns expressed about the performance
    of this approach.
  We are evaluating our options on how best to address this issue.
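
To make the d_path points above concrete, here is a small user-space sketch of a reverse walk into a bounded buffer that fails closed when the path does not fit. It is not AppArmor or kernel code; struct node and build_path are hypothetical stand-ins for a dentry chain and for d_path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry: a name and a parent link. */
struct node {
	const char *name;
	const struct node *parent;	/* NULL at the root */
};

/*
 * Build the absolute path of `n` by walking towards the root and
 * filling `buf` from the end backwards.  Returns a pointer into `buf`
 * where the path starts, or NULL with errno set to ENAMETOOLONG if the
 * path does not fit (the caller then refuses the open: fail closed).
 */
static char *build_path(const struct node *n, char *buf, size_t buflen)
{
	assert(n != NULL && buf != NULL);

	if (buflen < 2) {
		errno = ENAMETOOLONG;
		return NULL;
	}

	char *pos = buf + buflen - 1;
	*pos = '\0';

	if (n->parent == NULL) {	/* the root itself */
		*--pos = '/';
		return pos;
	}

	for (; n->parent != NULL; n = n->parent) {
		size_t len = strlen(n->name);

		/* need room for the component plus its leading '/' */
		if ((size_t)(pos - buf) < len + 1) {
			errno = ENAMETOOLONG;
			return NULL;
		}
		pos -= len;
		memcpy(pos, n->name, len);
		*--pos = '/';
	}
	return pos;
}
```

The result starts somewhere in the middle of the buffer rather than at buf[0], which is a direct consequence of building the name back to front.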




Re: [PATCH] hw_random: add quality categories

2007-06-26 Thread Henrique de Moraes Holschuh
On Tue, 26 Jun 2007, Michael Buesch wrote:
> On Tuesday 26 June 2007 16:06:25 Henrique de Moraes Holschuh wrote:
> > Which, AFAIK, we can quantify as the minimum expected entropy in the output.
> 
> The category is _not_ a measure of the entropy in the output.
> It is _just_ to get the chance to get a sane _default_ policy
> for which RNG is enabled by default, in the kernel.
> It's just about a default policy. _Nothing_ else.

Then why don't you call it "preference", or something to that effect?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


[md-accel PATCH 18/19] iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver

2007-06-26 Thread Dan Williams
Adds the platform device definitions and the architecture specific support
routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* add support for > 1k zero sum buffer sizes
* added dma/aau platform devices to iq80321 and iq80332 setup
* fixed the calculation in iop_desc_is_aligned
* support xor buffer sizes larger than 16MB
* fix places where software descriptors are assumed to be contiguous, only
  hardware descriptors are contiguous for up to a PAGE_SIZE buffer size
* convert to async_tx
* add interrupt support
* add platform devices for 80219 boards
* do not call platform register macros in driver code
* remove switch() statements for compatible register offsets/layouts
* change over to bitmap based capabilities
* remove unnecessary ARM assembly statement
* checkpatch.pl fixes
* gpl v2 only correction

Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop32x/glantank.c|2 
 arch/arm/mach-iop32x/iq31244.c |5 
 arch/arm/mach-iop32x/iq80321.c |3 
 arch/arm/mach-iop32x/n2100.c   |2 
 arch/arm/mach-iop33x/iq80331.c |3 
 arch/arm/mach-iop33x/iq80332.c |3 
 arch/arm/plat-iop/Makefile |2 
 arch/arm/plat-iop/adma.c   |  209 
 include/asm-arm/arch-iop32x/adma.h |5 
 include/asm-arm/arch-iop33x/adma.h |5 
 include/asm-arm/hardware/iop3xx-adma.h |  891 
 include/asm-arm/hardware/iop3xx.h  |   68 --
 12 files changed, 1138 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mach-iop32x/glantank.c b/arch/arm/mach-iop32x/glantank.c
index 5776fd8..2b086ab 100644
--- a/arch/arm/mach-iop32x/glantank.c
+++ b/arch/arm/mach-iop32x/glantank.c
@@ -180,6 +180,8 @@ static void __init glantank_init_machine(void)
platform_device_register(_i2c1_device);
platform_device_register(_flash_device);
platform_device_register(_serial_device);
+   platform_device_register(_dma_0_channel);
+   platform_device_register(_dma_1_channel);
 
pm_power_off = glantank_power_off;
 }
diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c
index d4eefbe..98cfa1c 100644
--- a/arch/arm/mach-iop32x/iq31244.c
+++ b/arch/arm/mach-iop32x/iq31244.c
@@ -298,9 +298,14 @@ static void __init iq31244_init_machine(void)
platform_device_register(_i2c1_device);
platform_device_register(_flash_device);
platform_device_register(_serial_device);
+   platform_device_register(_dma_0_channel);
+   platform_device_register(_dma_1_channel);
 
if (is_ep80219())
pm_power_off = ep80219_power_off;
+
+   if (!is_80219())
+   platform_device_register(_aau_channel);
 }
 
 static int __init force_ep80219_setup(char *str)
diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
index 8d9f491..18ad29f 100644
--- a/arch/arm/mach-iop32x/iq80321.c
+++ b/arch/arm/mach-iop32x/iq80321.c
@@ -181,6 +181,9 @@ static void __init iq80321_init_machine(void)
platform_device_register(_i2c1_device);
platform_device_register(_flash_device);
platform_device_register(_serial_device);
+   platform_device_register(_dma_0_channel);
+   platform_device_register(_dma_1_channel);
+   platform_device_register(_aau_channel);
 }
 
 MACHINE_START(IQ80321, "Intel IQ80321")
diff --git a/arch/arm/mach-iop32x/n2100.c b/arch/arm/mach-iop32x/n2100.c
index d55005d..390a97d 100644
--- a/arch/arm/mach-iop32x/n2100.c
+++ b/arch/arm/mach-iop32x/n2100.c
@@ -245,6 +245,8 @@ static void __init n2100_init_machine(void)
platform_device_register(_i2c0_device);
platform_device_register(_flash_device);
platform_device_register(_serial_device);
+   platform_device_register(_dma_0_channel);
+   platform_device_register(_dma_1_channel);
 
pm_power_off = n2100_power_off;
 
diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
index 2b06318..433188e 100644
--- a/arch/arm/mach-iop33x/iq80331.c
+++ b/arch/arm/mach-iop33x/iq80331.c
@@ -136,6 +136,9 @@ static void __init iq80331_init_machine(void)
platform_device_register(_uart0_device);
platform_device_register(_uart1_device);
platform_device_register(_flash_device);
+   platform_device_register(_dma_0_channel);
+   platform_device_register(_dma_1_channel);
+   platform_device_register(_aau_channel);
 }
 
 MACHINE_START(IQ80331, "Intel IQ80331")
diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
index 7889ce3..416c095 100644
--- a/arch/arm/mach-iop33x/iq80332.c
+++ b/arch/arm/mach-iop33x/iq80332.c
@@ -136,6 +136,9 @@ static void __init iq80332_init_machine(void)
platform_device_register(_uart0_device);
platform_device_register(_uart1_device);
platform_device_register(_flash_device);
+   

[md-accel PATCH 19/19] ARM: Add drivers/dma to arch/arm/Kconfig

2007-06-26 Thread Dan Williams
Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/Kconfig |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 50d9f3e..0cb2d4f 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1034,6 +1034,8 @@ source "drivers/mmc/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
 
 source "fs/Kconfig"


[md-accel PATCH 17/19] iop13xx: surface the iop13xx adma units to the iop-adma driver

2007-06-26 Thread Dan Williams
Adds the platform device definitions and the architecture specific
support routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* added 'descriptor pool size' to the platform data
* add base support for buffer sizes larger than 16MB (hw max)
* build error fix from Kirill A. Shutemov
* rebase for async_tx changes
* add interrupt support
* do not call platform register macros in driver code
* remove unnecessary ARM assembly statement
* checkpatch.pl fixes
* gpl v2 only correction

Cc: Russell King <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 arch/arm/mach-iop13xx/setup.c  |  217 +
 include/asm-arm/arch-iop13xx/adma.h|  544 
 include/asm-arm/arch-iop13xx/iop13xx.h |   38 +-
 3 files changed, 774 insertions(+), 25 deletions(-)

diff --git a/arch/arm/mach-iop13xx/setup.c b/arch/arm/mach-iop13xx/setup.c
index bc48715..bfe0c87 100644
--- a/arch/arm/mach-iop13xx/setup.c
+++ b/arch/arm/mach-iop13xx/setup.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IOP13XX_UART_XTAL 4000
 #define IOP13XX_SETUP_DEBUG 0
@@ -236,19 +237,143 @@ static unsigned long iq8134x_probe_flash_size(void)
 }
 #endif
 
+/* ADMA Channels */
+static struct resource iop13xx_adma_0_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(0),
+   .end = IOP13XX_ADMA_UPPER_PA(0),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA0_EOT,
+   .end = IRQ_IOP13XX_ADMA0_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA0_EOC,
+   .end = IRQ_IOP13XX_ADMA0_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA0_ERR,
+   .end = IRQ_IOP13XX_ADMA0_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static struct resource iop13xx_adma_1_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(1),
+   .end = IOP13XX_ADMA_UPPER_PA(1),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA1_EOT,
+   .end = IRQ_IOP13XX_ADMA1_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA1_EOC,
+   .end = IRQ_IOP13XX_ADMA1_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA1_ERR,
+   .end = IRQ_IOP13XX_ADMA1_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static struct resource iop13xx_adma_2_resources[] = {
+   [0] = {
+   .start = IOP13XX_ADMA_PHYS_BASE(2),
+   .end = IOP13XX_ADMA_UPPER_PA(2),
+   .flags = IORESOURCE_MEM,
+   },
+   [1] = {
+   .start = IRQ_IOP13XX_ADMA2_EOT,
+   .end = IRQ_IOP13XX_ADMA2_EOT,
+   .flags = IORESOURCE_IRQ
+   },
+   [2] = {
+   .start = IRQ_IOP13XX_ADMA2_EOC,
+   .end = IRQ_IOP13XX_ADMA2_EOC,
+   .flags = IORESOURCE_IRQ
+   },
+   [3] = {
+   .start = IRQ_IOP13XX_ADMA2_ERR,
+   .end = IRQ_IOP13XX_ADMA2_ERR,
+   .flags = IORESOURCE_IRQ
+   }
+};
+
+static u64 iop13xx_adma_dmamask = DMA_64BIT_MASK;
+static struct iop_adma_platform_data iop13xx_adma_0_data = {
+   .hw_id = 0,
+   .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_1_data = {
+   .hw_id = 1,
+   .pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_2_data = {
+   .hw_id = 2,
+   .pool_size = PAGE_SIZE,
+};
+
+/* The ids are fixed up later in iop13xx_platform_init */
+static struct platform_device iop13xx_adma_0_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_0_resources,
+   .dev = {
+   .dma_mask = _adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) _adma_0_data,
+   },
+};
+
+static struct platform_device iop13xx_adma_1_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_1_resources,
+   .dev = {
+   .dma_mask = _adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) _adma_1_data,
+   },
+};
+
+static struct platform_device iop13xx_adma_2_channel = {
+   .name = "iop-adma",
+   .id = 0,
+   .num_resources = 4,
+   .resource = iop13xx_adma_2_resources,
+   .dev = {
+   .dma_mask = _adma_dmamask,
+   .coherent_dma_mask = DMA_64BIT_MASK,
+   .platform_data = (void *) _adma_2_data,
+   },
+};
+
 void __init 

[md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines

2007-06-26 Thread Dan Williams
The Intel(R) IOP series of i/o processors integrate an Xscale core with
raid acceleration engines.  The capabilities per platform are:

iop219:
 (2) copy engines
iop321:
 (2) copy engines
 (1) xor and block fill engine
iop33x:
 (2) copy and crc32c engines
 (1) xor, xor zero sum, pq, pq zero sum, and block fill engine
iop13xx:
 (2) copy, crc32c, xor, xor zero sum, and block fill engines
 (1) copy, crc32c, xor, xor zero sum, pq, pq zero sum, and block fill engine

The driver supports the features of the async_tx api:
* asynchronous notification of operation completion
* implicit (interrupt triggered) handling of inter-channel transaction
  dependencies

The driver adapts to the platform it is running on by two methods.
1/ #include  which defines the hardware specific
   iop_chan_* and iop_desc_* routines as a series of static inline
   functions
2/ The private platform data attached to the platform_device defines the
   capabilities of the channels
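
As an illustration of method 2/, the per-channel capabilities from the iop33x entry in the table above could be expressed as a bitmask in the platform data. This is only a sketch: the enum, struct and function names below are hypothetical, and the driver itself uses dma_cap_mask_t (see the changelog) rather than a plain unsigned int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits, one per operation type listed above. */
enum chan_cap {
	CAP_COPY   = 1u << 0,
	CAP_CRC32C = 1u << 1,
	CAP_XOR    = 1u << 2,
	CAP_XOR_ZS = 1u << 3,	/* xor zero sum */
	CAP_PQ     = 1u << 4,
	CAP_PQ_ZS  = 1u << 5,	/* pq zero sum */
	CAP_FILL   = 1u << 6,	/* block fill */
};

/* Hypothetical per-channel platform data. */
struct chan_platform_data {
	int hw_id;
	unsigned int caps;
};

/* iop33x as described above: two copy/crc32c engines, one xor/pq engine. */
static const struct chan_platform_data iop33x_chans[] = {
	{ 0, CAP_COPY | CAP_CRC32C },
	{ 1, CAP_COPY | CAP_CRC32C },
	{ 2, CAP_XOR | CAP_XOR_ZS | CAP_PQ | CAP_PQ_ZS | CAP_FILL },
};

/* True if the channel offers every capability in `wanted`. */
static bool chan_supports(const struct chan_platform_data *c,
			  unsigned int wanted)
{
	return (c->caps & wanted) == wanted;
}

/* Returns the hw_id of the first matching channel, or -1 if none. */
static int find_chan(const struct chan_platform_data *chans, size_t n,
		     unsigned int wanted)
{
	assert(chans != NULL);

	for (size_t i = 0; i < n; i++)
		if (chan_supports(&chans[i], wanted))
			return chans[i].hw_id;
	return -1;
}
```

With this layout a request for CAP_COPY | CAP_XOR on iop33x finds no single channel, which is why dependent operations may have to cross channels.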

20070626: Callbacks are run in a tasklet.  Given the recent discussion on
LKML about killing tasklets in favor of workqueues I did a quick conversion
of the driver.  Raid5 resync performance dropped from 50MB/s to 30MB/s, so
the tasklet implementation remains until a generic softirq interface is
available.

Changelog:
* fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few
slots to be requested eventually leading to data corruption
* enabled the slot allocation routine to attempt to free slots before
returning -ENOMEM
* switched the cleanup routine to solely use the software chain and the
status register to determine if a descriptor is complete.  This is
necessary to support other IOP engines that do not have status writeback
capability
* make the driver iop generic
* modified the allocation routines to understand allocating a group of
slots for a single operation
* added a null xor initialization operation for the xor only channel on
iop3xx
* support xor operations on buffers larger than the hardware maximum
* split the do_* routines into separate prep, src/dest set, submit stages
* added async_tx support (dependent operations initiation at cleanup time)
* simplified group handling
* added interrupt support (callbacks via tasklets)
* brought the pending depth in line with ioat (i.e. 4 descriptors)
* drop dma mapping methods, suggested by Chris Leech
* don't use inline in C files, Adrian Bunk
* remove static tasklet declarations
* make iop_adma_alloc_slots easier to read and remove chances for a
corrupted descriptor chain
* fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt
* convert capabilities over to dma_cap_mask_t
* fixup sparse warnings
* add descriptor flush before iop_chan_enable
* checkpatch.pl fixes
* gpl v2 only correction
* move set_src, set_dest, submit to async_tx methods

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/Kconfig |8 
 drivers/dma/Makefile|1 
 drivers/dma/iop-adma.c  | 1465 +++
 include/asm-arm/hardware/iop_adma.h |  120 +++
 4 files changed, 1594 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 492aa08..f27f5c7 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -31,4 +31,12 @@ config INTEL_IOATDMA
default m
---help---
  Enable support for the Intel(R) I/OAT DMA engine.
+
+config INTEL_IOP_ADMA
+tristate "Intel IOP ADMA support"
+depends on DMA_ENGINE && (ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX)
+default m
+---help---
+  Enable support for the Intel(R) IOP Series RAID engines.
+
 endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index bdcfdbd..b3839b6 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
 obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
+obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o
diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
new file mode 100644
index 000..3db12d6
--- /dev/null
+++ b/drivers/dma/iop-adma.c
@@ -0,0 +1,1465 @@
+/*
+ * offload engine driver for the Intel Xscale series of i/o processors
+ * Copyright © 2006, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+/*

[md-accel PATCH 14/19] md: handle_stripe5 - request io processing in raid5_run_ops

2007-06-26 Thread Dan Williams
I/O submission requests were already handled outside of the stripe lock in
handle_stripe.  Now that handle_stripe is only tasked with finding work,
this logic belongs in raid5_run_ops.
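
The hunks below all add the same idiom: set the STRIPE_OP_IO bit in the pending mask and bump the operation count only if the bit was not already set. A user-space sketch of that idiom, using C11 atomics in place of the kernel's test_and_set_bit, might look like this. The struct and bit numbers are hypothetical, and in the kernel the count is protected by the stripe lock, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for STRIPE_OP_* values. */
enum { OP_IO = 0, OP_CHECK = 1 };

/* Hypothetical stand-in for the ops bookkeeping in struct stripe_head. */
struct ops_state {
	atomic_ulong pending;	/* one bit per requested operation */
	int count;		/* number of distinct operations requested */
};

/* Set `bit` in `word`; return true if it was already set. */
static bool test_and_set(atomic_ulong *word, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Request an operation; count it only the first time it is requested. */
static void request_op(struct ops_state *ops, int bit)
{
	assert(bit >= 0 && bit < 32);

	if (!test_and_set(&ops->pending, bit))
		ops->count++;
}
```

Requesting the same operation from several code paths is therefore harmless: only the first request changes the count.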

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   71 ++--
 1 files changed, 13 insertions(+), 58 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e0ae26d..a09bc5f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2319,6 +2319,9 @@ static void handle_issuing_new_write_requests5(raid5_conf_t *conf,
"%d for r-m-w\n", i);
set_bit(R5_LOCKED, >flags);
set_bit(R5_Wantread, >flags);
+   if (!test_and_set_bit(
+   STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
s->locked++;
} else {
set_bit(STRIPE_DELAYED, >state);
@@ -2342,6 +2345,9 @@ static void handle_issuing_new_write_requests5(raid5_conf_t *conf,
"%d for Reconstruct\n", i);
set_bit(R5_LOCKED, >flags);
set_bit(R5_Wantread, >flags);
+   if (!test_and_set_bit(
+   STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
s->locked++;
} else {
set_bit(STRIPE_DELAYED, >state);
@@ -2538,6 +2544,9 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
 
set_bit(R5_LOCKED, >flags);
set_bit(R5_Wantwrite, >flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
+
clear_bit(STRIPE_DEGRADED, >state);
s->locked++;
set_bit(STRIPE_INSYNC, >state);
@@ -2923,12 +2932,16 @@ static void handle_stripe5(struct stripe_head *sh)
dev = >dev[s.failed_num];
if (!test_bit(R5_ReWrite, >flags)) {
set_bit(R5_Wantwrite, >flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
set_bit(R5_ReWrite, >flags);
set_bit(R5_LOCKED, >flags);
s.locked++;
} else {
/* let's read it back */
set_bit(R5_Wantread, >flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
set_bit(R5_LOCKED, >flags);
s.locked++;
}
@@ -2989,64 +3002,6 @@ static void handle_stripe5(struct stripe_head *sh)
  test_bit(BIO_UPTODATE, >bi_flags)
? 0 : -EIO);
}
-   for (i=disks; i-- ;) {
-   int rw;
-   struct bio *bi;
-   mdk_rdev_t *rdev;
-   if (test_and_clear_bit(R5_Wantwrite, >dev[i].flags))
-   rw = WRITE;
-   else if (test_and_clear_bit(R5_Wantread, >dev[i].flags))
-   rw = READ;
-   else
-   continue;
- 
-   bi = >dev[i].req;
- 
-   bi->bi_rw = rw;
-   if (rw == WRITE)
-   bi->bi_end_io = raid5_end_write_request;
-   else
-   bi->bi_end_io = raid5_end_read_request;
- 
-   rcu_read_lock();
-   rdev = rcu_dereference(conf->disks[i].rdev);
-   if (rdev && test_bit(Faulty, >flags))
-   rdev = NULL;
-   if (rdev)
-   atomic_inc(>nr_pending);
-   rcu_read_unlock();
- 
-   if (rdev) {
-   if (s.syncing || s.expanding || s.expanded)
-   md_sync_acct(rdev->bdev, STRIPE_SECTORS);
-
-   bi->bi_bdev = rdev->bdev;
-   pr_debug("for %llu schedule op %ld on disc %d\n",
-   (unsigned long long)sh->sector, bi->bi_rw, i);
-   atomic_inc(>count);
-   bi->bi_sector = sh->sector + rdev->data_offset;
-   bi->bi_flags = 1 << BIO_UPTODATE;
-   bi->bi_vcnt = 1;
-   bi->bi_max_vecs = 1;
-   bi->bi_idx = 0;
- 

[md-accel PATCH 15/19] md: remove raid5 compute_block and compute_parity5

2007-06-26 Thread Dan Williams
replaced by raid5_run_ops

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  124 
 1 files changed, 0 insertions(+), 124 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a09bc5f..0579d1f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1509,130 +1509,6 @@ static void copy_data(int frombio, struct bio *bio,
   }  \
} while(0)
 
-
-static void compute_block(struct stripe_head *sh, int dd_idx)
-{
-   int i, count, disks = sh->disks;
-   void *ptr[MAX_XOR_BLOCKS], *dest, *p;
-
-   pr_debug("compute_block, stripe %llu, idx %d\n",
-   (unsigned long long)sh->sector, dd_idx);
-
-   dest = page_address(sh->dev[dd_idx].page);
-   memset(dest, 0, STRIPE_SIZE);
-   count = 0;
-   for (i = disks ; i--; ) {
-   if (i == dd_idx)
-   continue;
-   p = page_address(sh->dev[i].page);
-   if (test_bit(R5_UPTODATE, >dev[i].flags))
-   ptr[count++] = p;
-   else
-   printk(KERN_ERR "compute_block() %d, stripe %llu, %d"
-   " not present\n", dd_idx,
-   (unsigned long long)sh->sector, i);
-
-   check_xor();
-   }
-   if (count)
-   xor_blocks(count, STRIPE_SIZE, dest, ptr);
-   set_bit(R5_UPTODATE, >dev[dd_idx].flags);
-}
-
-static void compute_parity5(struct stripe_head *sh, int method)
-{
-   raid5_conf_t *conf = sh->raid_conf;
-   int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
-   void *ptr[MAX_XOR_BLOCKS], *dest;
-   struct bio *chosen;
-
-   pr_debug("compute_parity5, stripe %llu, method %d\n",
-   (unsigned long long)sh->sector, method);
-
-   count = 0;
-   dest = page_address(sh->dev[pd_idx].page);
-   switch(method) {
-   case READ_MODIFY_WRITE:
-   BUG_ON(!test_bit(R5_UPTODATE, >dev[pd_idx].flags));
-   for (i=disks ; i-- ;) {
-   if (i==pd_idx)
-   continue;
-   if (sh->dev[i].towrite &&
-   test_bit(R5_UPTODATE, >dev[i].flags)) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   chosen = sh->dev[i].towrite;
-   sh->dev[i].towrite = NULL;
-
-   if (test_and_clear_bit(R5_Overlap, 
>dev[i].flags))
-   wake_up(>wait_for_overlap);
-
-   BUG_ON(sh->dev[i].written);
-   sh->dev[i].written = chosen;
-   check_xor();
-   }
-   }
-   break;
-   case RECONSTRUCT_WRITE:
-   memset(dest, 0, STRIPE_SIZE);
-   for (i= disks; i-- ;)
-   if (i!=pd_idx && sh->dev[i].towrite) {
-   chosen = sh->dev[i].towrite;
-   sh->dev[i].towrite = NULL;
-
-   if (test_and_clear_bit(R5_Overlap, 
>dev[i].flags))
-   wake_up(>wait_for_overlap);
-
-   BUG_ON(sh->dev[i].written);
-   sh->dev[i].written = chosen;
-   }
-   break;
-   case CHECK_PARITY:
-   break;
-   }
-   if (count) {
-   xor_blocks(count, STRIPE_SIZE, dest, ptr);
-   count = 0;
-   }
-   
-   for (i = disks; i--;)
-   if (sh->dev[i].written) {
-   sector_t sector = sh->dev[i].sector;
-   struct bio *wbi = sh->dev[i].written;
-   while (wbi && wbi->bi_sector < sector + STRIPE_SECTORS) 
{
-   copy_data(1, wbi, sh->dev[i].page, sector);
-   wbi = r5_next_bio(wbi, sector);
-   }
-
-   set_bit(R5_LOCKED, >dev[i].flags);
-   set_bit(R5_UPTODATE, >dev[i].flags);
-   }
-
-   switch(method) {
-   case RECONSTRUCT_WRITE:
-   case CHECK_PARITY:
-   for (i=disks; i--;)
-   if (i != pd_idx) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   check_xor();
-   }
-   break;
-   case READ_MODIFY_WRITE:
-   for (i = disks; i--;)
-   if (sh->dev[i].written) {
-   ptr[count++] = page_address(sh->dev[i].page);
-   check_xor();
-   }
-  

[md-accel PATCH 11/19] md: handle_stripe5 - add request/completion logic for async check ops

2007-06-26 Thread Dan Williams
Check operations are scheduled when the array is being resynced or an
explicit 'check/repair' command was sent to the array.  Previously check
operations would destroy the parity block in the cache such that even if
parity turned out to be correct the parity block would be marked
!R5_UPTODATE at the completion of the check.  When the operation can be
carried out by a dma engine the assumption is that it can check parity as a
read-only operation.  If raid5_run_ops notices that the check was handled
by hardware it will preserve the R5_UPTODATE status of the parity disk.

When a check operation determines that the parity needs to be repaired we
reuse the existing compute block infrastructure to carry out the operation.
Repair operations imply an immediate write back of the data, so to
differentiate a repair from a normal compute operation the
STRIPE_OP_MOD_REPAIR_PD flag is added.
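
The idea of checking parity as a read-only operation can be sketched in plain C: XOR the data blocks and the parity block together and test whether the result is all zeroes, without writing to any of the buffers. This is a software illustration only, not the driver or raid5 code; xor_zero_sum is a hypothetical name, chosen so that a return value of 0 matches the "zero_sum_result == 0 means parity is correct" convention in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * XOR all `nblocks` blocks (data plus parity) together, byte by byte,
 * without modifying any of them.  Returns 0 if every resulting byte is
 * zero, i.e. the parity is consistent, and 1 otherwise.
 */
static int xor_zero_sum(const unsigned char *const *blocks, size_t nblocks,
			size_t len)
{
	assert(blocks != NULL && nblocks > 0);

	unsigned char any = 0;

	for (size_t off = 0; off < len; off++) {
		unsigned char x = 0;

		for (size_t b = 0; b < nblocks; b++)
			x ^= blocks[b][off];
		any |= x;	/* remember any non-zero result */
	}
	return any != 0;
}
```

Because nothing is written, the parity block in the cache is still valid after a successful check, which is the property the changelog above relies on.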

Changelog:
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   84 
 1 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 38b8167..89d3890 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2464,26 +2464,67 @@ static void handle_parity_checks5(raid5_conf_t *conf, struct stripe_head *sh,
struct stripe_head_state *s, int disks)
 {
set_bit(STRIPE_HANDLE, >state);
-   if (s->failed == 0) {
-   BUG_ON(s->uptodate != disks);
-   compute_parity5(sh, CHECK_PARITY);
-   s->uptodate--;
-   if (page_is_zero(sh->dev[sh->pd_idx].page)) {
-   /* parity is correct (on disc, not in buffer any more)
-*/
-   set_bit(STRIPE_INSYNC, >state);
-   } else {
-   conf->mddev->resync_mismatches += STRIPE_SECTORS;
-   if (test_bit(MD_RECOVERY_CHECK, >mddev->recovery))
-   /* don't try to repair!! */
+   /* Take one of the following actions:
+* 1/ start a check parity operation if (uptodate == disks)
+* 2/ finish a check parity operation and act on the result
+* 3/ skip to the writeback section if we previously
+*initiated a recovery operation
+*/
+   if (s->failed == 0 &&
+   !test_bit(STRIPE_OP_MOD_REPAIR_PD, >ops.pending)) {
+   if (!test_and_set_bit(STRIPE_OP_CHECK, >ops.pending)) {
+   BUG_ON(s->uptodate != disks);
+   clear_bit(R5_UPTODATE, >dev[sh->pd_idx].flags);
+   sh->ops.count++;
+   s->uptodate--;
+   } else if (
+  test_and_clear_bit(STRIPE_OP_CHECK, >ops.complete)) {
+   clear_bit(STRIPE_OP_CHECK, >ops.ack);
+   clear_bit(STRIPE_OP_CHECK, >ops.pending);
+
+   if (sh->ops.zero_sum_result == 0)
+   /* parity is correct (on disc,
+* not in buffer any more)
+*/
set_bit(STRIPE_INSYNC, >state);
else {
-   compute_block(sh, sh->pd_idx);
-   s->uptodate++;
+   conf->mddev->resync_mismatches +=
+   STRIPE_SECTORS;
+   if (test_bit(
+MD_RECOVERY_CHECK, >mddev->recovery))
+   /* don't try to repair!! */
+   set_bit(STRIPE_INSYNC, >state);
+   else {
+   set_bit(STRIPE_OP_COMPUTE_BLK,
+   >ops.pending);
+   set_bit(STRIPE_OP_MOD_REPAIR_PD,
+   >ops.pending);
+   set_bit(R5_Wantcompute,
+   >dev[sh->pd_idx].flags);
+   sh->ops.target = sh->pd_idx;
+   sh->ops.count++;
+   s->uptodate++;
+   }
}
}
}
-   if (!test_bit(STRIPE_INSYNC, >state)) {
+
+   /* check if we can clear a parity disk reconstruct */
+   if (test_bit(STRIPE_OP_COMPUTE_BLK, >ops.complete) &&
+   test_bit(STRIPE_OP_MOD_REPAIR_PD, >ops.pending)) {
+
+   clear_bit(STRIPE_OP_MOD_REPAIR_PD, >ops.pending);
+   clear_bit(STRIPE_OP_COMPUTE_BLK, >ops.complete);
+   

[md-accel PATCH 12/19] md: handle_stripe5 - add request/completion logic for async read ops

2007-06-26 Thread Dan Williams
When a read bio is attached to the stripe and the corresponding block is
marked R5_UPTODATE, then a read (biofill) operation is scheduled to copy
the data from the stripe cache to the bio buffer.  handle_stripe flags the
blocks to be operated on with the R5_Wantfill flag.  If new read requests
arrive while raid5_run_ops is running they will not be handled until
handle_stripe is scheduled to run again.

Changelog:
* cleanup to_read and to_fill accounting
* do not fail reads that have reached the cache

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   53 +---
 include/linux/raid/raid5.h |2 +-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 89d3890..3d0dca9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2042,9 +2042,12 @@ handle_requests_to_failed_array(raid5_conf_t *conf, struct stripe_head *sh,
bi = bi2;
}
 
-   /* fail any reads if this device is non-operational */
-   if (!test_bit(R5_Insync, >dev[i].flags) ||
-   test_bit(R5_ReadError, >dev[i].flags)) {
+   /* fail any reads if this device is non-operational and
+* the data has not reached the cache yet.
+*/
+   if (!test_bit(R5_Wantfill, >dev[i].flags) &&
+   (!test_bit(R5_Insync, >dev[i].flags) ||
+ test_bit(R5_ReadError, >dev[i].flags))) {
bi = sh->dev[i].toread;
sh->dev[i].toread = NULL;
if (test_and_clear_bit(R5_Overlap, >dev[i].flags))
@@ -2733,37 +2736,27 @@ static void handle_stripe5(struct stripe_head *sh)
struct r5dev *dev = >dev[i];
clear_bit(R5_Insync, >flags);
 
-   pr_debug("check %d: state 0x%lx read %p write %p written %p\n",
-   i, dev->flags, dev->toread, dev->towrite, dev->written);
-   /* maybe we can reply to a read */
-   if (test_bit(R5_UPTODATE, >flags) && dev->toread) {
-   struct bio *rbi, *rbi2;
-   pr_debug("Return read for disc %d\n", i);
-   spin_lock_irq(>device_lock);
-   rbi = dev->toread;
-   dev->toread = NULL;
-   if (test_and_clear_bit(R5_Overlap, >flags))
-   wake_up(>wait_for_overlap);
-   spin_unlock_irq(>device_lock);
-   while (rbi && rbi->bi_sector < dev->sector + 
STRIPE_SECTORS) {
-   copy_data(0, rbi, dev->page, dev->sector);
-   rbi2 = r5_next_bio(rbi, dev->sector);
-   spin_lock_irq(>device_lock);
-   if (--rbi->bi_phys_segments == 0) {
-   rbi->bi_next = return_bi;
-   return_bi = rbi;
-   }
-   spin_unlock_irq(>device_lock);
-   rbi = rbi2;
-   }
-   }
+   pr_debug("check %d: state 0x%lx toread %p read %p write %p "
+   "written %p\n", i, dev->flags, dev->toread, dev->read,
+   dev->towrite, dev->written);
+
+   /* maybe we can request a biofill operation
+*
+* new wantfill requests are only permitted while
+* STRIPE_OP_BIOFILL is clear
+*/
+   if (test_bit(R5_UPTODATE, >flags) && dev->toread &&
+   !test_bit(STRIPE_OP_BIOFILL, >ops.pending))
+   set_bit(R5_Wantfill, >flags);
 
/* now count some things */
if (test_bit(R5_LOCKED, >flags)) s.locked++;
if (test_bit(R5_UPTODATE, >flags)) s.uptodate++;
if (test_bit(R5_Wantcompute, >flags)) s.compute++;
 
-   if (dev->toread)
+   if (test_bit(R5_Wantfill, >flags))
+   s.to_fill++;
+   else if (dev->toread)
s.to_read++;
if (dev->towrite) {
s.to_write++;
@@ -2786,6 +2779,10 @@ static void handle_stripe5(struct stripe_head *sh)
set_bit(R5_Insync, >flags);
}
rcu_read_unlock();
+
+   if (s.to_fill && !test_and_set_bit(STRIPE_OP_BIOFILL, >ops.pending))
+   sh->ops.count++;
+
pr_debug("locked=%d uptodate=%d to_read=%d"
" to_write=%d failed=%d failed_num=%d\n",
s.locked, s.uptodate, s.to_read, s.to_write,
diff --git a/include/linux/raid/raid5.h b/include/linux/raid/raid5.h
index 2d45eba..e9dfb2d 100644
--- 

[md-accel PATCH 13/19] md: handle_stripe5 - add request/completion logic for async expand ops

2007-06-26 Thread Dan Williams
When a stripe is being expanded, bulk copying takes place to move the data
from the old stripe to the new.  Since raid5_run_ops only operates on one
stripe at a time, these bulk copies are handled in-line under the stripe
lock.  In the dma offload case we poll for the completion of the operation.

After the data has been copied into the new stripe, the parity needs to be
recalculated across the new disks.  We reuse the existing postxor
functionality to carry out this calculation.  By setting STRIPE_OP_POSTXOR
without setting STRIPE_OP_BIODRAIN, the completion path in handle_stripe
can differentiate expand operations from normal write operations.
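
As a rough illustration of the distinction drawn above, the following
userspace sketch classifies a completed postxor by checking whether a
biodrain was requested alongside it.  The bit numbers, the enum and the
classify_postxor helper are hypothetical; only the flag names come from the
patch.

```c
/* Hypothetical bit positions; the real values are defined in raid5.h. */
#define STRIPE_OP_BIODRAIN 3
#define STRIPE_OP_POSTXOR  4

enum postxor_origin {
	POSTXOR_NOT_DONE,	/* postxor has not completed yet */
	POSTXOR_FROM_WRITE,	/* completed as part of a normal write */
	POSTXOR_FROM_EXPAND	/* completed as part of a stripe expansion */
};

/* Illustrative helper: a write sets BIODRAIN with POSTXOR, an expand does not. */
static enum postxor_origin
classify_postxor(unsigned long pending, unsigned long complete)
{
	if (!(complete & (1UL << STRIPE_OP_POSTXOR)))
		return POSTXOR_NOT_DONE;
	if (pending & (1UL << STRIPE_OP_BIODRAIN))
		return POSTXOR_FROM_WRITE;
	return POSTXOR_FROM_EXPAND;
}
```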

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   50 ++
 1 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3d0dca9..e0ae26d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2646,6 +2646,7 @@ static void handle_stripe_expansion(raid5_conf_t *conf, 
struct stripe_head *sh,
/* We have read all the blocks in this stripe and now we need to
 * copy some of them into a target stripe for expand.
 */
+   struct dma_async_tx_descriptor *tx = NULL;
clear_bit(STRIPE_EXPAND_SOURCE, >state);
for (i = 0; i < sh->disks; i++)
if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
@@ -2671,9 +2672,12 @@ static void handle_stripe_expansion(raid5_conf_t *conf, 
struct stripe_head *sh,
release_stripe(sh2);
continue;
}
-   memcpy(page_address(sh2->dev[dd_idx].page),
-  page_address(sh->dev[i].page),
-  STRIPE_SIZE);
+
+   /* place all the copies on one channel */
+   tx = async_memcpy(sh2->dev[dd_idx].page,
+   sh->dev[i].page, 0, 0, STRIPE_SIZE,
+   ASYNC_TX_DEP_ACK, tx, NULL, NULL);
+
set_bit(R5_Expanded, >dev[dd_idx].flags);
set_bit(R5_UPTODATE, >dev[dd_idx].flags);
for (j = 0; j < conf->raid_disks; j++)
@@ -2686,6 +2690,12 @@ static void handle_stripe_expansion(raid5_conf_t *conf, 
struct stripe_head *sh,
set_bit(STRIPE_HANDLE, >state);
}
release_stripe(sh2);
+
+   /* done submitting copies, wait for them to complete */
+   if (i + 1 >= sh->disks) {
+   async_tx_ack(tx);
+   dma_wait_for_async_tx(tx);
+   }
}
 }
 
@@ -2924,18 +2934,34 @@ static void handle_stripe5(struct stripe_head *sh)
}
}
 
-   if (s.expanded && test_bit(STRIPE_EXPANDING, >state)) {
-   /* Need to write out all blocks after computing parity */
-   sh->disks = conf->raid_disks;
-   sh->pd_idx = stripe_to_pdidx(sh->sector, conf, 
conf->raid_disks);
-   compute_parity5(sh, RECONSTRUCT_WRITE);
+   /* Finish postxor operations initiated by the expansion
+* process
+*/
+   if (test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete) &&
+   !test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending)) {
+
+   clear_bit(STRIPE_EXPANDING, &sh->state);
+
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+   clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+
for (i = conf->raid_disks; i--; ) {
-   set_bit(R5_LOCKED, >dev[i].flags);
-   s.locked++;
set_bit(R5_Wantwrite, &sh->dev[i].flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+   sh->ops.count++;
}
-   clear_bit(STRIPE_EXPANDING, >state);
-   } else if (s.expanded) {
+   }
+
+   if (s.expanded && test_bit(STRIPE_EXPANDING, &sh->state) &&
+   !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
+   /* Need to write out all blocks after computing parity */
+   sh->disks = conf->raid_disks;
+   sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
+   conf->raid_disks);
+   s.locked += handle_write_operations5(sh, 0, 1);
+   } else if (s.expanded &&
+   !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
clear_bit(STRIPE_EXPAND_READY, &sh->state);
atomic_dec(&conf->reshape_stripes);
wake_up(&conf->wait_for_overlap);

[md-accel PATCH 09/19] md: handle_stripe5 - add request/completion logic for async write ops

2007-06-26 Thread Dan Williams
After handle_stripe5 decides whether it wants to perform a
read-modify-write or a reconstruct write, it calls
handle_write_operations5.  A read-modify-write operation will perform an
xor subtraction of the blocks marked with the R5_Wantprexor flag, copy the
new data into the stripe (biodrain) and perform a postxor operation across
all up-to-date blocks to generate the new parity.  A reconstruct write is run
when all blocks are already up-to-date in the cache so all that is needed
is a biodrain and postxor.
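
To make the two write paths concrete, here is a small userspace sketch
showing that a read-modify-write (prexor, drain, postxor) and a reconstruct
write yield the same parity.  It is a simplification under stated
assumptions: BLOCK_SIZE, xor_into, parity_rcw and parity_rmw are hypothetical
names, only one data block is updated, and everything runs synchronously
rather than through raid5_run_ops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8	/* illustrative; far smaller than a real stripe page */

/* dst ^= src, byte by byte */
static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Reconstruct write: parity is the xor of every data block. */
static void parity_rcw(uint8_t *parity, uint8_t data[][BLOCK_SIZE],
		       size_t ndata)
{
	memset(parity, 0, BLOCK_SIZE);
	for (size_t i = 0; i < ndata; i++)
		xor_into(parity, data[i], BLOCK_SIZE);
}

/* Read-modify-write of a single data block. */
static void parity_rmw(uint8_t *parity, uint8_t *block,
		       const uint8_t *new_data)
{
	xor_into(parity, block, BLOCK_SIZE);	/* prexor: subtract old data */
	memcpy(block, new_data, BLOCK_SIZE);	/* biodrain: copy in new data */
	xor_into(parity, block, BLOCK_SIZE);	/* postxor: add new data */
}
```

Updating one block with parity_rmw and then recomputing from scratch with
parity_rcw should leave identical parity buffers.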

On the completion path STRIPE_OP_PREXOR will be set if the operation was a
read-modify-write.  The STRIPE_OP_BIODRAIN flag is used in the completion
path to differentiate write-initiated postxor operations from
expansion-initiated postxor operations.  Completion of a write triggers i/o
to the drives.

Changelog:
* make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil Brown
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  161 +---
 1 files changed, 138 insertions(+), 23 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7c688f6..b2e88fe 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1815,7 +1815,79 @@ static void compute_block_2(struct stripe_head *sh, int 
dd_idx1, int dd_idx2)
}
 }
 
+static int
+handle_write_operations5(struct stripe_head *sh, int rcw, int expand)
+{
+   int i, pd_idx = sh->pd_idx, disks = sh->disks;
+   int locked = 0;
+
+   if (rcw) {
+   /* if we are not expanding this is a proper write request, and
+* there will be bios with new data to be drained into the
+* stripe cache
+*/
+   if (!expand) {
+   set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+   sh->ops.count++;
+   }
+
+   set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+   sh->ops.count++;
+
+   for (i = disks; i--; ) {
+   struct r5dev *dev = &sh->dev[i];
+
+   if (dev->towrite) {
+   set_bit(R5_LOCKED, &dev->flags);
+   if (!expand)
+   clear_bit(R5_UPTODATE, &dev->flags);
+   locked++;
+   }
+   }
+   } else {
+   BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) ||
+   test_bit(R5_Wantcompute, &sh->dev[pd_idx].flags)));
+
+   set_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+   set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+   set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+
+   sh->ops.count += 3;
+
+   for (i = disks; i--; ) {
+   struct r5dev *dev = &sh->dev[i];
+   if (i == pd_idx)
+   continue;
+
+   /* For a read-modify write there may be blocks that are
+* locked for reading while others are ready to be
+* written so we distinguish these blocks by the
+* R5_Wantprexor bit
+*/
+   if (dev->towrite &&
+   (test_bit(R5_UPTODATE, &dev->flags) ||
+   test_bit(R5_Wantcompute, &dev->flags))) {
+   set_bit(R5_Wantprexor, &dev->flags);
+   set_bit(R5_LOCKED, &dev->flags);
+   clear_bit(R5_UPTODATE, &dev->flags);
+   locked++;
+   }
+   }
+   }
+
+   /* keep the parity disk locked while asynchronous operations
+* are in flight
+*/
+   set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
+   clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+   locked++;
 
+   pr_debug("%s: stripe %llu locked: %d pending: %lx\n",
+   __FUNCTION__, (unsigned long long)sh->sector,
+   locked, sh->ops.pending);
+
+   return locked;
+}
 
 /*
  * Each stripe/dev can have one or more bion attached.
@@ -2210,27 +2282,8 @@ static void 
handle_issuing_new_write_requests5(raid5_conf_t *conf,
 * we can start a write request
 */
if (s->locked == 0 && (rcw == 0 || rmw == 0) &&
-   !test_bit(STRIPE_BIT_DELAY, >state)) {
-   pr_debug("Computing parity...\n");
-   compute_parity5(sh, rcw == 0 ?
-   RECONSTRUCT_WRITE : READ_MODIFY_WRITE);
-   /* now every locked buffer is ready to be written */
-   for (i = disks; i--; )
-   if (test_bit(R5_LOCKED, >dev[i].flags)) {
-   pr_debug("Writing block %d\n", i);
-   s->locked++;
-  

[md-accel PATCH 10/19] md: handle_stripe5 - add request/completion logic for async compute ops

2007-06-26 Thread Dan Williams
handle_stripe will compute a block when a backing disk has failed, or when
it determines it can save a disk read by computing the block from all the
other up-to-date blocks.
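
For readers unfamiliar with why a block can be computed from the others, a
minimal userspace sketch follows.  It assumes single-parity (raid5-style)
xor, a tiny hypothetical BLOCK_SIZE, and a made-up helper name
compute_block_sketch; it is not the kernel's compute_block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4	/* illustrative only */

/*
 * Rebuild blocks[target] as the xor of every other block in the stripe
 * (data blocks plus the parity block).
 */
static void compute_block_sketch(uint8_t blocks[][BLOCK_SIZE],
				 size_t nblocks, size_t target)
{
	memset(blocks[target], 0, BLOCK_SIZE);
	for (size_t i = 0; i < nblocks; i++) {
		if (i == target)
			continue;
		for (size_t j = 0; j < BLOCK_SIZE; j++)
			blocks[target][j] ^= blocks[i][j];
	}
}
```

Calling it once on the parity slot generates parity; calling it on a data
slot afterwards regenerates that slot's contents from the survivors.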

Previously a block would be computed under the lock and subsequent logic in
handle_stripe could use the newly up-to-date block.  With the raid5_run_ops
implementation the compute operation is carried out at a later time outside
the lock.  To preserve the old functionality we take advantage of the
dependency chain feature of async_tx to flag the block as R5_Wantcompute
and then let other parts of handle_stripe operate on the block as if it
were up-to-date.  raid5_run_ops guarantees that the block will be ready
before it is used in another operation.

However, this only works in cases where the compute and the dependent
operation are scheduled at the same time.  If a previous call to
handle_stripe sets the R5_Wantcompute flag there is no facility to pass the
async_tx dependency chain across successive calls to raid5_run_ops.  The
req_compute variable protects against this case.

Changelog:
* remove the req_compute BUG_ON

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  149 ++--
 include/linux/raid/raid5.h |2 -
 2 files changed, 115 insertions(+), 36 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b2e88fe..38b8167 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2070,36 +2070,101 @@ handle_requests_to_failed_array(raid5_conf_t *conf, 
struct stripe_head *sh,
 
 }
 
+/* __handle_issuing_new_read_requests5 - returns 0 if there are no more disks
+ * to process
+ */
+static int __handle_issuing_new_read_requests5(struct stripe_head *sh,
+   struct stripe_head_state *s, int disk_idx, int disks)
+{
+   struct r5dev *dev = >dev[disk_idx];
+   struct r5dev *failed_dev = >dev[s->failed_num];
+
+   /* don't schedule compute operations or reads on the parity block while
+* a check is in flight
+*/
+   if ((disk_idx == sh->pd_idx) &&
+test_bit(STRIPE_OP_CHECK, >ops.pending))
+   return ~0;
+
+   /* is the data in this block needed, and can we get it? */
+   if (!test_bit(R5_LOCKED, >flags) &&
+   !test_bit(R5_UPTODATE, >flags) && (dev->toread ||
+   (dev->towrite && !test_bit(R5_OVERWRITE, >flags)) ||
+s->syncing || s->expanding || (s->failed &&
+(failed_dev->toread || (failed_dev->towrite &&
+!test_bit(R5_OVERWRITE, _dev->flags)
+) {
+   /* 1/ We would like to get this block, possibly by computing it,
+* but we might not be able to.
+*
+* 2/ Since parity check operations potentially make the parity
+* block !uptodate it will need to be refreshed before any
+* compute operations on data disks are scheduled.
+*
+* 3/ We hold off parity block re-reads until check operations
+* have quiesced.
+*/
+   if ((s->uptodate == disks - 1) &&
+   !test_bit(STRIPE_OP_CHECK, >ops.pending)) {
+   set_bit(STRIPE_OP_COMPUTE_BLK, >ops.pending);
+   set_bit(R5_Wantcompute, >flags);
+   sh->ops.target = disk_idx;
+   s->req_compute = 1;
+   sh->ops.count++;
+   /* Careful: from this point on 'uptodate' is in the eye
+* of raid5_run_ops which services 'compute' operations
+* before writes. R5_Wantcompute flags a block that will
+* be R5_UPTODATE by the time it is needed for a
+* subsequent operation.
+*/
+   s->uptodate++;
+   return 0; /* uptodate + compute == disks */
+   } else if ((s->uptodate < disks - 1) &&
+   test_bit(R5_Insync, >flags)) {
+   /* Note: we hold off compute operations while checks are
+* in flight, but we still prefer 'compute' over 'read'
+* hence we only read if (uptodate < * disks-1)
+*/
+   set_bit(R5_LOCKED, >flags);
+   set_bit(R5_Wantread, >flags);
+   if (!test_and_set_bit(STRIPE_OP_IO, >ops.pending))
+   sh->ops.count++;
+   s->locked++;
+   pr_debug("Reading block %d (sync=%d)\n", disk_idx,
+   s->syncing);
+   }
+   }
+
+   return ~0;
+}
+
 static void handle_issuing_new_read_requests5(struct stripe_head *sh,
struct stripe_head_state *s, int disks)
 {
int 

[md-accel PATCH 07/19] md: raid5_run_ops - run stripe operations outside sh->lock

2007-06-26 Thread Dan Williams
When the raid acceleration work was proposed, Neil laid out the following
attack plan:

1/ move the xor and copy operations outside spin_lock(>lock)
2/ find/implement an asynchronous offload api

The raid5_run_ops routine uses the asynchronous offload api (async_tx) and
the stripe_operations member of a stripe_head to carry out xor+copy
operations asynchronously, outside the lock.

To perform operations outside the lock a new set of state flags is needed
to track new requests, in-flight requests, and completed requests.  In this
new model handle_stripe is tasked with scanning the stripe_head for work,
updating the stripe_operations structure, and finally dropping the lock and
calling raid5_run_ops for processing.  The following flags outline the
requests that handle_stripe can make of raid5_run_ops:

STRIPE_OP_BIOFILL
 - copy data into request buffers to satisfy a read request
STRIPE_OP_COMPUTE_BLK
 - generate a missing block in the cache from the other blocks
STRIPE_OP_PREXOR
 - subtract existing data as part of the read-modify-write process
STRIPE_OP_BIODRAIN
 - copy data out of request buffers to satisfy a write request
STRIPE_OP_POSTXOR
 - recalculate parity for new data that has entered the cache
STRIPE_OP_CHECK
 - verify that the parity is correct
STRIPE_OP_IO
 - submit i/o to the member disks (note this was already performed outside
   the stripe lock, but it made sense to add it as an operation type)

The flow is:
1/ handle_stripe sets STRIPE_OP_* in sh->ops.pending
2/ raid5_run_ops reads sh->ops.pending, sets sh->ops.ack, and submits the
   operation to the async_tx api
3/ async_tx triggers the completion callback routine to set
   sh->ops.complete and release the stripe
4/ handle_stripe runs again to finish the operation and optionally submit
   new operations that were previously blocked
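
The four steps above can be modelled in userspace roughly as follows.  This
is a sketch under stated assumptions: the enum values, struct stripe_ops and
the request_op/ack_op/complete_op/finish_op helpers are hypothetical, and
plain bitwise operations stand in for the kernel's atomic set_bit/test_bit.

```c
#include <stdbool.h>

/* Hypothetical operation numbers; names follow the list above. */
enum stripe_op {
	OP_BIOFILL, OP_COMPUTE_BLK, OP_PREXOR, OP_BIODRAIN,
	OP_POSTXOR, OP_CHECK, OP_IO
};

struct stripe_ops {
	unsigned long pending;	/* requested by handle_stripe */
	unsigned long ack;	/* picked up for submission */
	unsigned long complete;	/* finished by the completion callback */
};

/* 1/ handle_stripe requests an operation */
static void request_op(struct stripe_ops *ops, enum stripe_op op)
{
	ops->pending |= 1UL << op;
}

/* 2/ returns true only the first time, so the op is submitted once */
static bool ack_op(struct stripe_ops *ops, enum stripe_op op)
{
	unsigned long bit = 1UL << op;

	if (!(ops->pending & bit) || ((ops->ack | ops->complete) & bit))
		return false;
	ops->ack |= bit;
	return true;
}

/* 3/ the completion callback marks the operation done */
static void complete_op(struct stripe_ops *ops, enum stripe_op op)
{
	ops->complete |= 1UL << op;
}

/* 4/ handle_stripe retires a completed operation, clearing all three bits */
static bool finish_op(struct stripe_ops *ops, enum stripe_op op)
{
	unsigned long bit = 1UL << op;

	if (!(ops->complete & bit))
		return false;
	ops->pending &= ~bit;
	ops->ack &= ~bit;
	ops->complete &= ~bit;
	return true;
}
```

Keeping three separate bitmaps lets a second pass over the stripe see that
an operation is already in flight without resubmitting it.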

Note this patch just defines raid5_run_ops, subsequent commits (one per
major operation type) modify handle_stripe to take advantage of this
routine.

Changelog:
* removed ops_complete_biodrain in favor of ops_complete_postxor and
  ops_complete_write.
* removed the raid5_run_ops workqueue
* call bi_end_io for reads in ops_complete_biofill, saves a call to
  handle_stripe
* explicitly handle the 2-disk raid5 case (xor becomes memcpy), Neil Brown
* fix race between async engines and bi_end_io call for reads, Neil Brown
* remove unnecessary spin_lock from ops_complete_biofill
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
* remove explicit interrupt handling for channel switching, this feature
  was absorbed (i.e. it is now implicit) by the async_tx api

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  546 
 include/linux/raid/raid5.h |   81 ++-
 2 files changed, 624 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d21fa7a..34fcda0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -52,6 +52,7 @@
 #include "raid6.h"
 
 #include 
+#include 
 
 /*
  * Stripe cache
@@ -324,6 +325,551 @@ static struct stripe_head *get_active_stripe(raid5_conf_t 
*conf, sector_t sector
return sh;
 }
 
+static int
+raid5_end_read_request(struct bio *bi, unsigned int bytes_done, int error);
+static int
+raid5_end_write_request (struct bio *bi, unsigned int bytes_done, int error);
+
+static void ops_run_io(struct stripe_head *sh)
+{
+   raid5_conf_t *conf = sh->raid_conf;
+   int i, disks = sh->disks;
+
+   might_sleep();
+
+   for (i = disks; i--; ) {
+   int rw;
+   struct bio *bi;
+   mdk_rdev_t *rdev;
+   if (test_and_clear_bit(R5_Wantwrite, >dev[i].flags))
+   rw = WRITE;
+   else if (test_and_clear_bit(R5_Wantread, >dev[i].flags))
+   rw = READ;
+   else
+   continue;
+
+   bi = >dev[i].req;
+
+   bi->bi_rw = rw;
+   if (rw == WRITE)
+   bi->bi_end_io = raid5_end_write_request;
+   else
+   bi->bi_end_io = raid5_end_read_request;
+
+   rcu_read_lock();
+   rdev = rcu_dereference(conf->disks[i].rdev);
+   if (rdev && test_bit(Faulty, >flags))
+   rdev = NULL;
+   if (rdev)
+   atomic_inc(>nr_pending);
+   rcu_read_unlock();
+
+   if (rdev) {
+   if (test_bit(STRIPE_SYNCING, >state) ||
+   test_bit(STRIPE_EXPAND_SOURCE, >state) ||
+   test_bit(STRIPE_EXPAND_READY, >state))
+   md_sync_acct(rdev->bdev, STRIPE_SECTORS);
+
+   bi->bi_bdev = rdev->bdev;
+   pr_debug("%s: for %llu schedule op %ld on disc %d\n",
+   __FUNCTION__, (unsigned 

[md-accel PATCH 08/19] md: common infrastructure for running operations with raid5_run_ops

2007-06-26 Thread Dan Williams
All the handle_stripe operations that are to be transitioned to use
raid5_run_ops need a method to coherently gather work under the stripe-lock
and hand that work off to raid5_run_ops.  The 'get_stripe_work' routine
runs under the lock to read all the bits in sh->ops.pending that do not
have the corresponding bit set in sh->ops.ack.  This modified 'pending'
bitmap is then passed to raid5_run_ops for processing.

The transition from 'ack' to 'completion' does not need similar protection
as the existing release_stripe infrastructure will guarantee that
handle_stripe will run again after a completion bit is set, and
handle_stripe can tolerate a sh->ops.complete bit being set while the lock
is held.

A call to async_tx_issue_pending_all() is added to raid5d to kick the
offload engines once all pending stripe operations work has been submitted.
This enables batching of the submission and completion of operations.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
Acked-By: NeilBrown <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |   67 +---
 1 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 34fcda0..7c688f6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -124,6 +124,7 @@ static void __release_stripe(raid5_conf_t *conf, struct 
stripe_head *sh)
}
md_wakeup_thread(conf->mddev->thread);
} else {
+   BUG_ON(sh->ops.pending);
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, 
>state)) {
atomic_dec(>preread_active_stripes);
if (atomic_read(>preread_active_stripes) 
< IO_THRESHOLD)
@@ -225,7 +226,8 @@ static void init_stripe(struct stripe_head *sh, sector_t 
sector, int pd_idx, int
 
BUG_ON(atomic_read(>count) != 0);
BUG_ON(test_bit(STRIPE_HANDLE, >state));
-   
+   BUG_ON(sh->ops.pending || sh->ops.ack || sh->ops.complete);
+
CHECK_DEVLOCK();
pr_debug("init_stripe called, stripe %llu\n",
(unsigned long long)sh->sector);
@@ -241,11 +243,11 @@ static void init_stripe(struct stripe_head *sh, sector_t 
sector, int pd_idx, int
for (i = sh->disks; i--; ) {
struct r5dev *dev = >dev[i];
 
-   if (dev->toread || dev->towrite || dev->written ||
+   if (dev->toread || dev->read || dev->towrite || dev->written ||
test_bit(R5_LOCKED, >flags)) {
-   printk("sector=%llx i=%d %p %p %p %d\n",
+   printk(KERN_ERR "sector=%llx i=%d %p %p %p %p %d\n",
   (unsigned long long)sh->sector, i, dev->toread,
-  dev->towrite, dev->written,
+  dev->read, dev->towrite, dev->written,
   test_bit(R5_LOCKED, >flags));
BUG();
}
@@ -325,6 +327,44 @@ static struct stripe_head *get_active_stripe(raid5_conf_t 
*conf, sector_t sector
return sh;
 }
 
+/* test_and_ack_op() ensures that we only dequeue an operation once */
+#define test_and_ack_op(op, pend) \
+do {   \
+   if (test_bit(op, &sh->ops.pending) &&   \
+   !test_bit(op, &sh->ops.complete)) { \
+   if (test_and_set_bit(op, &sh->ops.ack)) \
+   clear_bit(op, &pend);   \
+   else\
+   ack++;  \
+   } else  \
+   clear_bit(op, &pend);   \
+} while (0)
+
+/* find new work to run, do not resubmit work that is already
+ * in flight
+ */
+static unsigned long get_stripe_work(struct stripe_head *sh)
+{
+   unsigned long pending;
+   int ack = 0;
+
+   pending = sh->ops.pending;
+
+   test_and_ack_op(STRIPE_OP_BIOFILL, pending);
+   test_and_ack_op(STRIPE_OP_COMPUTE_BLK, pending);
+   test_and_ack_op(STRIPE_OP_PREXOR, pending);
+   test_and_ack_op(STRIPE_OP_BIODRAIN, pending);
+   test_and_ack_op(STRIPE_OP_POSTXOR, pending);
+   test_and_ack_op(STRIPE_OP_CHECK, pending);
+   if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
+   ack++;
+
+   sh->ops.count -= ack;
+   BUG_ON(sh->ops.count < 0);
+
+   return pending;
+}
+
 static int
 raid5_end_read_request(struct bio *bi, unsigned int bytes_done, int error);
 static int
@@ -2487,7 +2527,6 @@ static void handle_stripe_expansion(raid5_conf_t *conf, 
struct stripe_head *sh,
  *schedule a write of some buffers
  *return confirmation of parity correctness
  *
- * Parity calculations are done inside the stripe lock
  * buffers are taken off read_list or write_list, and bh_cache buffers
  * get BH_Lock set before the stripe lock is released.

[md-accel PATCH 05/19] raid5: refactor handle_stripe5 and handle_stripe6 (v2)

2007-06-26 Thread Dan Williams
handle_stripe5 and handle_stripe6 have very deep logic paths handling the
various states of a stripe_head.  By introducing the 'stripe_head_state'
and 'r6_state' objects, large portions of the logic can be moved to
sub-routines.

'struct stripe_head_state' consumes all of the automatic variables that
previously stood alone in handle_stripe5,6.  'struct r6_state' contains the
handle_stripe6 specific variables like p_failed and q_failed.

One of the nice side effects of the 'stripe_head_state' change is that it
allows for further reductions in code duplication between raid5 and raid6.
The following new routines are shared between raid5 and raid6:

handle_completed_write_requests
handle_requests_to_failed_array
handle_stripe_expansion

Changes in v2:
* fixed 'conf->raid_disk-1' for the raid6 'handle_stripe_expansion' path

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c | 1488 +---
 include/linux/raid/raid5.h |   16 
 2 files changed, 737 insertions(+), 767 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4f51dfa..94e0920 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1326,6 +1326,608 @@ static int stripe_to_pdidx(sector_t stripe, 
raid5_conf_t *conf, int disks)
return pd_idx;
 }
 
+static void
+handle_requests_to_failed_array(raid5_conf_t *conf, struct stripe_head *sh,
+   struct stripe_head_state *s, int disks,
+   struct bio **return_bi)
+{
+   int i;
+   for (i = disks; i--; ) {
+   struct bio *bi;
+   int bitmap_end = 0;
+
+   if (test_bit(R5_ReadError, >dev[i].flags)) {
+   mdk_rdev_t *rdev;
+   rcu_read_lock();
+   rdev = rcu_dereference(conf->disks[i].rdev);
+   if (rdev && test_bit(In_sync, >flags))
+   /* multiple read failures in one stripe */
+   md_error(conf->mddev, rdev);
+   rcu_read_unlock();
+   }
+   spin_lock_irq(>device_lock);
+   /* fail all writes first */
+   bi = sh->dev[i].towrite;
+   sh->dev[i].towrite = NULL;
+   if (bi) {
+   s->to_write--;
+   bitmap_end = 1;
+   }
+
+   if (test_and_clear_bit(R5_Overlap, >dev[i].flags))
+   wake_up(>wait_for_overlap);
+
+   while (bi && bi->bi_sector <
+   sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, >bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   md_write_end(conf->mddev);
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = nextbi;
+   }
+   /* and fail all 'written' */
+   bi = sh->dev[i].written;
+   sh->dev[i].written = NULL;
+   if (bi) bitmap_end = 1;
+   while (bi && bi->bi_sector <
+  sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, >bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   md_write_end(conf->mddev);
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = bi2;
+   }
+
+   /* fail any reads if this device is non-operational */
+   if (!test_bit(R5_Insync, >dev[i].flags) ||
+   test_bit(R5_ReadError, >dev[i].flags)) {
+   bi = sh->dev[i].toread;
+   sh->dev[i].toread = NULL;
+   if (test_and_clear_bit(R5_Overlap, >dev[i].flags))
+   wake_up(>wait_for_overlap);
+   if (bi) s->to_read--;
+   while (bi && bi->bi_sector <
+  sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *nextbi =
+   r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, >bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = nextbi;
+   }
+   }
+   spin_unlock_irq(>device_lock);
+

[md-accel PATCH 04/19] async_tx: add the async_tx api

2007-06-26 Thread Dan Williams
The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client that smooths over
the details of different hardware offload engine implementations.  Code
that is written to the api can optimize for asynchronous operation and the
api will fit the chain of operations to the available offload resources. 
 
I imagine that any piece of ADMA hardware would register with the
'async_*' subsystem, and a call to async_X would be routed as
appropriate, or be run in-line. - Neil Brown

async_tx exploits the capabilities of struct dma_async_tx_descriptor to
provide an api of the following general format:

struct dma_async_tx_descriptor *
async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
        dma_async_tx_callback cb_fn, void *cb_param)
{
        struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
        struct dma_device *device = chan ? chan->device : NULL;
        int int_en = cb_fn ? 1 : 0;
        struct dma_async_tx_descriptor *tx = device ?
                device->device_prep_dma_<operation>(chan, len, int_en) : NULL;

        if (tx) { /* run <operation> asynchronously */
                ...
                tx->tx_set_dest(addr, tx, index);
                ...
                tx->tx_set_src(addr, tx, index);
                ...
                async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
        } else { /* run <operation> synchronously */
                ...
                <operation>
                ...
                async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
        }

        return tx;
}
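
The template above boils down to "offload if a channel is available,
otherwise do the work in-line and still honour the callback".  A userspace
sketch of that pattern for a copy operation is given below.  struct channel,
its submit_copy hook, copy_async_or_sync and count_completion are
hypothetical stand-ins, not part of the dmaengine or async_tx interfaces.

```c
#include <stddef.h>
#include <string.h>

typedef void (*tx_callback)(void *param);

/* Hypothetical stand-in for an offload channel. */
struct channel {
	void (*submit_copy)(void *dst, const void *src, size_t len,
			    tx_callback cb, void *cb_param);
};

/* Returns 1 if the copy was handed to a channel, 0 if it ran in-line. */
static int copy_async_or_sync(struct channel *chan, void *dst,
			      const void *src, size_t len,
			      tx_callback cb, void *cb_param)
{
	if (chan && chan->submit_copy) {
		/* asynchronous path: the channel invokes cb on completion */
		chan->submit_copy(dst, src, len, cb, cb_param);
		return 1;
	}

	/* synchronous fallback: do the copy now, then run the callback */
	memcpy(dst, src, len);
	if (cb)
		cb(cb_param);
	return 0;
}

/* Example callback: counts completions through an int pointer. */
static void count_completion(void *param)
{
	++*(int *)param;
}
```

Passing a NULL channel exercises the fallback, which is the behaviour the
message describes for builds without an offload engine.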

async_tx_find_channel() returns a capable channel from its pool.  The
channel pool is organized as a per-cpu array of channel pointers.  The
async_tx_rebalance() routine is tasked with managing these arrays.  In the
uniprocessor case async_tx_rebalance() tries to spread responsibility
evenly over channels of similar capabilities.  For example if there are two
copy+xor channels, one will handle copy operations and the other will
handle xor.  In the SMP case async_tx_rebalance() attempts to spread the
operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
channel0 while cpu1 gets copy channel 1 and xor channel 1.  When a
dependency is specified async_tx_find_channel defaults to keeping the
operation on the same channel.  A xor->copy->xor chain will stay on one
channel if it supports both operation types, otherwise the transaction will
transition between a copy and a xor resource.

Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api.  A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided in a later
commit.  With the iop-adma driver and async_tx, raid456 is able to offload
copy, xor, and xor-zero-sum operations to hardware engines.
 
On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement).  For the other cases performance was roughly equal, +/- a few
percentage points.  On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation.  According to 'top' on iop342 CPU
utilization drops from ~50% to ~15% during a 'resync' while the speed
according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.
 
The tiobench command line used for testing was: tiobench --size 2048
--block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

Details:
* if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
  async_tx_find_channel a static inline routine that always returns NULL
* when a callback is specified for a given transaction an interrupt will
  fire at operation completion time and the callback will occur in a
  tasklet.  If the channel does not support interrupts then a live
  polling wait will be performed
* the api is written as a dmaengine client that requests all available
  channels
* In support of dependencies the api implicitly schedules channel-switch
  interrupts.  The interrupt triggers the cleanup tasklet which causes
  pending operations to be scheduled on the next channel
* Xor engines treat an xor destination address differently than a software
  xor routine.  To the software routine the destination address is an implied
  source, whereas engines treat it as a write-only destination.  This patch
  modifies the xor_blocks routine to take an explicit destination address
  to mirror the hardware.
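
The last point above, about how the destination is treated, can be shown
with two small userspace routines.  Both function names and signatures are
hypothetical and do not reflect the actual xor_blocks prototype; they only
contrast the two conventions.

```c
#include <stddef.h>
#include <stdint.h>

/* Software-style xor: dest is an implied source, dest ^= src0 ^ src1 ... */
static void xor_implied_dest(uint8_t *dest, const uint8_t *srcs[],
			     size_t nsrc, size_t len)
{
	for (size_t s = 0; s < nsrc; s++)
		for (size_t i = 0; i < len; i++)
			dest[i] ^= srcs[s][i];
}

/* Engine-style xor: dest is write-only, dest = src0 ^ src1 ... */
static void xor_explicit_dest(uint8_t *dest, const uint8_t *srcs[],
			      size_t nsrc, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint8_t acc = 0;

		for (size_t s = 0; s < nsrc; s++)
			acc ^= srcs[s][i];
		dest[i] = acc;
	}
}
```

With the explicit form, a caller that wants the old destination contents
folded in has to list the destination among the sources.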

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select 

[md-accel PATCH 06/19] raid5: replace custom debug PRINTKs with standard pr_debug

2007-06-26 Thread Dan Williams
Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in
favor of the global DEBUG definition.  To get local debug messages just add
'#define DEBUG' to the top of the file.
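
For illustration, the compile-time switch described above works along these
lines.  This is a userspace approximation with hypothetical names
(pr_debug_sketch, remove_hash_sketch) and fprintf in place of printk; the
kernel's own pr_debug definition may differ in detail.

```c
#include <stdio.h>

#ifdef DEBUG
/* With DEBUG defined at the top of the file, messages are emitted. */
#define pr_debug_sketch(...) \
	do { fprintf(stderr, __VA_ARGS__); } while (0)
#else
/* Without DEBUG the call is type-checked but never executed. */
#define pr_debug_sketch(...) \
	do { if (0) fprintf(stderr, __VA_ARGS__); } while (0)
#endif

/* Example call site modelled on the remove_hash() message in the patch. */
static unsigned long long remove_hash_sketch(unsigned long long sector)
{
	pr_debug_sketch("remove_hash(), stripe %llu\n", sector);
	return sector;
}
```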

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  116 ++--
 1 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 94e0920..d21fa7a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -80,7 +80,6 @@
 /*
  * The following can be used to debug the driver
  */
-#define RAID5_DEBUG0
 #define RAID5_PARANOIA 1
 #if RAID5_PARANOIA && defined(CONFIG_SMP)
 # define CHECK_DEVLOCK() assert_spin_locked(>device_lock)
@@ -88,8 +87,7 @@
 # define CHECK_DEVLOCK()
 #endif
 
-#define PRINTK(x...) ((void)(RAID5_DEBUG && printk(x)))
-#if RAID5_DEBUG
+#ifdef DEBUG
 #define inline
 #define __inline__
 #endif
@@ -152,7 +150,8 @@ static void release_stripe(struct stripe_head *sh)
 
 static inline void remove_hash(struct stripe_head *sh)
 {
-   PRINTK("remove_hash(), stripe %llu\n", (unsigned long long)sh->sector);
+   pr_debug("remove_hash(), stripe %llu\n",
+   (unsigned long long)sh->sector);
 
hlist_del_init(>hash);
 }
@@ -161,7 +160,8 @@ static inline void insert_hash(raid5_conf_t *conf, struct 
stripe_head *sh)
 {
struct hlist_head *hp = stripe_hash(conf, sh->sector);
 
-   PRINTK("insert_hash(), stripe %llu\n", (unsigned long long)sh->sector);
+   pr_debug("insert_hash(), stripe %llu\n",
+   (unsigned long long)sh->sector);
 
CHECK_DEVLOCK();
hlist_add_head(>hash, hp);
@@ -226,7 +226,7 @@ static void init_stripe(struct stripe_head *sh, sector_t 
sector, int pd_idx, int
BUG_ON(test_bit(STRIPE_HANDLE, >state));

CHECK_DEVLOCK();
-   PRINTK("init_stripe called, stripe %llu\n", 
+   pr_debug("init_stripe called, stripe %llu\n",
(unsigned long long)sh->sector);
 
remove_hash(sh);
@@ -260,11 +260,11 @@ static struct stripe_head *__find_stripe(raid5_conf_t 
*conf, sector_t sector, in
struct hlist_node *hn;
 
CHECK_DEVLOCK();
-   PRINTK("__find_stripe, sector %llu\n", (unsigned long long)sector);
+   pr_debug("__find_stripe, sector %llu\n", (unsigned long long)sector);
hlist_for_each_entry(sh, hn, stripe_hash(conf, sector), hash)
if (sh->sector == sector && sh->disks == disks)
return sh;
-   PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector);
+   pr_debug("__stripe %llu not in cache\n", (unsigned long long)sector);
return NULL;
 }
 
@@ -276,7 +276,7 @@ static struct stripe_head *get_active_stripe(raid5_conf_t 
*conf, sector_t sector
 {
struct stripe_head *sh;
 
-   PRINTK("get_stripe, sector %llu\n", (unsigned long long)sector);
+   pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector);
 
spin_lock_irq(>device_lock);
 
@@ -537,8 +537,8 @@ static int raid5_end_read_request(struct bio * bi, unsigned 
int bytes_done,
if (bi == &sh->dev[i].req)
break;
 
-   PRINTK("end_read_request %llu/%d, count: %d, uptodate %d.\n", 
-   (unsigned long long)sh->sector, i, atomic_read(&sh->count), 
+   pr_debug("end_read_request %llu/%d, count: %d, uptodate %d.\n",
+   (unsigned long long)sh->sector, i, atomic_read(&sh->count),
uptodate);
if (i == disks) {
BUG();
@@ -613,7 +613,7 @@ static int raid5_end_write_request (struct bio *bi, 
unsigned int bytes_done,
if (bi == &sh->dev[i].req)
break;
 
-   PRINTK("end_write_request %llu/%d, count %d, uptodate: %d.\n", 
+   pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n",
(unsigned long long)sh->sector, i, atomic_read(&sh->count),
uptodate);
if (i == disks) {
@@ -658,7 +658,7 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 {
char b[BDEVNAME_SIZE];
raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-   PRINTK("raid5: error called\n");
+   pr_debug("raid5: error called\n");
 
if (!test_bit(Faulty, &rdev->flags)) {
set_bit(MD_CHANGE_DEVS, &mddev->flags);
@@ -929,7 +929,7 @@ static void compute_block(struct stripe_head *sh, int 
dd_idx)
int i, count, disks = sh->disks;
void *ptr[MAX_XOR_BLOCKS], *dest, *p;
 
-   PRINTK("compute_block, stripe %llu, idx %d\n", 
+   pr_debug("compute_block, stripe %llu, idx %d\n",
(unsigned long long)sh->sector, dd_idx);
 
dest = page_address(sh->dev[dd_idx].page);
@@ -960,7 +960,7 @@ static void compute_parity5(struct stripe_head *sh, int 
method)
void *ptr[MAX_XOR_BLOCKS], *dest;
struct bio *chosen;
 
-   PRINTK("compute_parity5, stripe %llu, method %d\n",
+   pr_debug("compute_parity5, 

[md-accel PATCH 03/19] xor: make 'xor_blocks' a library routine for use with async_tx

2007-06-26 Thread Dan Williams
The async_tx api tries to use a dma engine for an operation, but will fall
back to an optimized software routine otherwise.  Xor support is
implemented using the raid5 xor routines.  For organizational purposes this
routine is moved to a common area.
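
As a rough illustration of the fallback described above (not the code in this patch), the sketch below tries a hypothetical offload hook first and otherwise XORs the sources into the destination in software.  The names xor_blocks_sketch, xor_offload_fn and xor_with_fallback are invented for this example and do not exist in the kernel.

```c
#include <stddef.h>

/*
 * Hypothetical userspace sketch: XOR src_count source buffers into dest.
 * This is a plain byte loop, not the kernel's optimized xor_blocks().
 */
static void xor_blocks_sketch(unsigned int src_count, size_t bytes,
                              unsigned char *dest, unsigned char **srcs)
{
    unsigned int s;
    size_t i;

    for (s = 0; s < src_count; s++)
        for (i = 0; i < bytes; i++)
            dest[i] ^= srcs[s][i];
}

/* Hypothetical offload hook: returns 0 if an engine took the operation. */
typedef int (*xor_offload_fn)(unsigned int src_count, size_t bytes,
                              unsigned char *dest, unsigned char **srcs);

/*
 * Try the offload hook; fall back to the software routine otherwise.
 * Returns 1 if the operation was offloaded, 0 if done in software.
 */
static int xor_with_fallback(xor_offload_fn offload, unsigned int src_count,
                             size_t bytes, unsigned char *dest,
                             unsigned char **srcs)
{
    if (offload && offload(src_count, bytes, dest, srcs) == 0)
        return 1;

    xor_blocks_sketch(src_count, bytes, dest, srcs);
    return 0;
}
```

Passing NULL as the hook models a system with no dma engine, in which case the software path always runs.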

The following fixes are also made:
* rename xor_block => xor_blocks, suggested by Adrian Bunk
* ensure that xor.o initializes before md.o in the built-in case
* checkpatch.pl fixes
* mark calibrate_xor_blocks __init, Adrian Bunk

Cc: Adrian Bunk <[EMAIL PROTECTED]>
Cc: NeilBrown <[EMAIL PROTECTED]>
Cc: Herbert Xu <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 crypto/Kconfig   |6 ++
 crypto/Makefile  |6 ++
 crypto/xor.c |  156 ++
 drivers/md/Kconfig   |1 
 drivers/md/Makefile  |4 +
 drivers/md/md.c  |2 -
 drivers/md/raid5.c   |   10 +--
 drivers/md/xor.c |  154 -
 include/linux/raid/xor.h |2 -
 9 files changed, 178 insertions(+), 163 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 4ca0ab3..b749a1a 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1,4 +1,10 @@
 #
+# Generic algorithms support
+#
+config XOR_BLOCKS
+   tristate
+
+#
 # Cryptographic API Configuration
 #
 
diff --git a/crypto/Makefile b/crypto/Makefile
index cce46a1..68e934b 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -50,3 +50,9 @@ obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c.o
 
 obj-$(CONFIG_CRYPTO_TEST) += tcrypt.o
+
+#
+# generic algorithms and the async_tx api
+#
+obj-$(CONFIG_XOR_BLOCKS) += xor.o
+
diff --git a/crypto/xor.c b/crypto/xor.c
new file mode 100644
index 000..8281ac5
--- /dev/null
+++ b/crypto/xor.c
@@ -0,0 +1,156 @@
+/*
+ * xor.c : Multiple Devices driver for Linux
+ *
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000,
+ * Ingo Molnar, Matti Aarnio, Jakub Jelinek, Richard Henderson.
+ *
+ * Dispatch optimized RAID-5 checksumming functions.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * You should have received a copy of the GNU General Public License
+ * (for example /usr/src/linux/COPYING); if not, write to the Free
+ * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#define BH_TRACE 0
+#include 
+#include 
+#include 
+#include 
+
+/* The xor routines to use.  */
+static struct xor_block_template *active_template;
+
+void
+xor_blocks(unsigned int count, unsigned int bytes, void **ptr)
+{
+   unsigned long *p0, *p1, *p2, *p3, *p4;
+
+   p0 = (unsigned long *) ptr[0];
+   p1 = (unsigned long *) ptr[1];
+   if (count == 2) {
+   active_template->do_2(bytes, p0, p1);
+   return;
+   }
+
+   p2 = (unsigned long *) ptr[2];
+   if (count == 3) {
+   active_template->do_3(bytes, p0, p1, p2);
+   return;
+   }
+
+   p3 = (unsigned long *) ptr[3];
+   if (count == 4) {
+   active_template->do_4(bytes, p0, p1, p2, p3);
+   return;
+   }
+
+   p4 = (unsigned long *) ptr[4];
+   active_template->do_5(bytes, p0, p1, p2, p3, p4);
+}
+EXPORT_SYMBOL(xor_blocks);
+
+/* Set of all registered templates.  */
+static struct xor_block_template *template_list;
+
+#define BENCH_SIZE (PAGE_SIZE)
+
+static void
+do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
+{
+   int speed;
+   unsigned long now;
+   int i, count, max;
+
+   tmpl->next = template_list;
+   template_list = tmpl;
+
+   /*
+* Count the number of XORs done during a whole jiffy, and use
+* this to calculate the speed of checksumming.  We use a 2-page
+* allocation to have guaranteed color L1-cache layout.
+*/
+   max = 0;
+   for (i = 0; i < 5; i++) {
+   now = jiffies;
+   count = 0;
+   while (jiffies == now) {
+   mb(); /* prevent loop optimization */
+   tmpl->do_2(BENCH_SIZE, b1, b2);
+   mb();
+   count++;
+   mb();
+   }
+   if (count > max)
+   max = count;
+   }
+
+   speed = max * (HZ * BENCH_SIZE / 1024);
+   tmpl->speed = speed;
+
+   printk(KERN_INFO "   %-10s: %5d.%03d MB/sec\n", tmpl->name,
+  speed / 1000, speed % 1000);
+}
+
+static int __init
+calibrate_xor_blocks(void)
+{
+   void *b1, *b2;
+   struct xor_block_template *f, *fastest;
+
+   b1 = (void *) __get_free_pages(GFP_KERNEL, 2);
+   if (!b1) {
+   printk(KERN_WARNING "xor: Yikes!  No memory available.\n");
+   return 

[md-accel PATCH 02/19] dmaengine: make clients responsible for managing channels

2007-06-26 Thread Dan Williams
The current implementation assumes that a channel will only be used by one
client at a time.  In order to enable channel sharing the dmaengine core is
changed to a model where clients subscribe to channel-available-events.
Instead of tracking how many channels a client wants and how many it has
received the core just broadcasts the available channels and lets the
clients optionally take a reference.  The core learns about the clients'
needs at dma_event_callback time.

In support of multiple operation types, clients can specify a capability
mask to only be notified of channels that satisfy a certain set of
capabilities.
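
The capability filtering described above can be pictured with a small userspace sketch.  It assumes capabilities fit in a 32-bit mask, whereas the patch uses a bitmap type; the SK_CAP_* constants, struct sk_client and sk_broadcast_channel are hypothetical names made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, standing in for the dma transaction types. */
enum {
    SK_CAP_MEMCPY = 1u << 0,
    SK_CAP_XOR    = 1u << 1,
    SK_CAP_MEMSET = 1u << 2
};

/* Hypothetical client: a wanted-capability mask and a notification count. */
struct sk_client {
    uint32_t want;
    int notified;
};

/* A channel satisfies a mask when every wanted bit is present. */
static int sk_chan_satisfies_mask(uint32_t chan_caps, uint32_t want)
{
    return (chan_caps & want) == want;
}

/*
 * Broadcast one available channel to all clients; only clients whose
 * mask is satisfied are notified.  Returns the number notified.
 */
static int sk_broadcast_channel(uint32_t chan_caps,
                                struct sk_client *clients, size_t n)
{
    size_t i;
    int count = 0;

    for (i = 0; i < n; i++) {
        if (sk_chan_satisfies_mask(chan_caps, clients[i].want)) {
            clients[i].notified++;
            count++;
        }
    }
    return count;
}
```

The subset test mirrors the and-then-compare shape of __dma_chan_satisfies_mask in the diff below, just on a plain integer.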

Changelog:
* removed DMA_TX_ARRAY_INIT, no longer needed
* dma_client_chan_free -> dma_chan_release: switch to global reference
  counting only at device unregistration time, before it was also happening
  at client unregistration time
* clients now return dma_state_client to dmaengine (ack, dup, nak)
* checkpatch.pl fixes

Cc: Chris Leech <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  217 +++--
 drivers/dma/ioatdma.c |1 
 drivers/dma/ioatdma.h |3 -
 include/linux/dmaengine.h |   58 +++-
 net/core/dev.c|  112 ---
 5 files changed, 224 insertions(+), 167 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 379809f..5c5378e 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -37,11 +37,11 @@
  * Each device has a channels list, which runs unlocked but is never modified
  * once the device is registered, it's just setup by the driver.
  *
- * Each client has a channels list, it's only modified under the client->lock
- * and in an RCU callback, so it's safe to read under rcu_read_lock().
+ * Each client is responsible for keeping track of the channels it uses.  See
+ * the definition of dma_event_callback in dmaengine.h.
  *
  * Each device has a kref, which is initialized to 1 when the device is
- * registered. A kref_put is done for each class_device registered.  When the
+ * registered. A kref_get is done for each class_device registered.  When the
  * class_device is released, the coresponding kref_put is done in the release
  * method. Every time one of the device's channels is allocated to a client,
  * a kref_get occurs.  When the channel is freed, the coresponding kref_put
@@ -51,10 +51,12 @@
  * references to finish.
  *
  * Each channel has an open-coded implementation of Rusty Russell's "bigref,"
- * with a kref and a per_cpu local_t.  A single reference is set when on an
- * ADDED event, and removed with a REMOVE event.  Net DMA client takes an
- * extra reference per outstanding transaction.  The relase function does a
- * kref_put on the device. -ChrisL
+ * with a kref and a per_cpu local_t.  A dma_chan_get is called when a client
+ * signals that it wants to use a channel, and dma_chan_put is called when
+ * a channel is removed or a client using it is unregistered.  A client can
+ * take extra references per outstanding transaction, as is the case with
+ * the NET DMA client.  The release function does a kref_put on the device.
+ * -ChrisL, DanW
  */
 
 #include 
@@ -102,8 +104,19 @@ static ssize_t show_bytes_transferred(struct class_device 
*cd, char *buf)
 static ssize_t show_in_use(struct class_device *cd, char *buf)
 {
struct dma_chan *chan = container_of(cd, struct dma_chan, class_dev);
+   int in_use = 0;
+
+   if (unlikely(chan->slow_ref) &&
+   atomic_read(&chan->refcount.refcount) > 1)
+   in_use = 1;
+   else {
+   if (local_read(&(per_cpu_ptr(chan->local,
+   get_cpu())->refcount)) > 0)
+   in_use = 1;
+   put_cpu();
+   }
 
-   return sprintf(buf, "%d\n", (chan->client ? 1 : 0));
+   return sprintf(buf, "%d\n", in_use);
 }
 
 static struct class_device_attribute dma_class_attrs[] = {
@@ -129,42 +142,53 @@ static struct class dma_devclass = {
 
 /* --- client and device registration --- */
 
+#define dma_chan_satisfies_mask(chan, mask) \
+   __dma_chan_satisfies_mask((chan), &(mask))
+static int
+__dma_chan_satisfies_mask(struct dma_chan *chan, dma_cap_mask_t *want)
+{
+   dma_cap_mask_t has;
+
+   bitmap_and(has.bits, want->bits, chan->device->cap_mask.bits,
+   DMA_TX_TYPE_END);
+   return bitmap_equal(want->bits, has.bits, DMA_TX_TYPE_END);
+}
+
 /**
- * dma_client_chan_alloc - try to allocate a channel to a client
+ * dma_client_chan_alloc - try to allocate channels to a client
  * @client: &dma_client
  *
  * Called with dma_list_mutex held.
  */
-static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
+static void dma_client_chan_alloc(struct dma_client *client)
 {
struct dma_device *device;
struct dma_chan *chan;
-   unsigned long flags;
int desc;   /* allocated descriptor 

[md-accel PATCH 01/19] dmaengine: refactor dmaengine around dma_async_tx_descriptor

2007-06-26 Thread Dan Williams
The current dmaengine interface defines multiple routines per operation,
i.e. dma_async_memcpy_buf_to_buf, dma_async_memcpy_buf_to_page etc.  Adding
more operation types (xor, crc, etc) to this model would result in an
unmanageable number of method permutations.

Are we really going to add a set of hooks for each DMA engine
whizbang feature?
- Jeff Garzik

The descriptor creation process is refactored using the new common
dma_async_tx_descriptor structure.  Instead of per driver
do_<operation>_<dest>_to_<src> methods, drivers integrate
dma_async_tx_descriptor into their private software descriptor and then
define a 'prep' routine per operation.  The prep routine allocates a
descriptor and ensures that the tx_set_src, tx_set_dest, tx_submit routines
are valid.  Descriptor creation and submission becomes:

struct dma_device *dev;
struct dma_chan *chan;
struct dma_async_tx_descriptor *tx;

tx = dev->device_prep_dma_<operation>(chan, len, int_flag)
tx->tx_set_src(dma_addr_t, tx, index /* for multi-source ops */)
tx->tx_set_dest(dma_addr_t, tx, index)
tx->tx_submit(tx)

In addition to the refactoring, dma_async_tx_descriptor also lays the
groundwork for defining cross-channel-operation dependencies, and a
callback facility for asynchronous notification of operation completion.
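
To make the prep/set_src/set_dest/submit sequence above concrete, here is a self-contained userspace mock.  It is only a sketch: every sk_* name is hypothetical, addresses are plain integers, and "submit" just marks the descriptor instead of touching hardware.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sk_dma_addr_t;   /* stand-in for dma_addr_t */

/* Hypothetical descriptor carrying its own per-descriptor methods. */
struct sk_tx_desc {
    sk_dma_addr_t src;
    sk_dma_addr_t dest;
    size_t len;
    int submitted;
    void (*tx_set_src)(sk_dma_addr_t addr, struct sk_tx_desc *tx, int index);
    void (*tx_set_dest)(sk_dma_addr_t addr, struct sk_tx_desc *tx, int index);
    int (*tx_submit)(struct sk_tx_desc *tx);
};

static void sk_set_src(sk_dma_addr_t addr, struct sk_tx_desc *tx, int index)
{
    (void)index;              /* single-source operation in this mock */
    tx->src = addr;
}

static void sk_set_dest(sk_dma_addr_t addr, struct sk_tx_desc *tx, int index)
{
    (void)index;
    tx->dest = addr;
}

static int sk_submit(struct sk_tx_desc *tx)
{
    tx->submitted = 1;
    return 1;                 /* pretend cookie */
}

/* "prep" step: initialize a descriptor and make sure its methods are valid. */
static struct sk_tx_desc *sk_prep_memcpy(struct sk_tx_desc *slot, size_t len)
{
    slot->src = 0;
    slot->dest = 0;
    slot->len = len;
    slot->submitted = 0;
    slot->tx_set_src = sk_set_src;
    slot->tx_set_dest = sk_set_dest;
    slot->tx_submit = sk_submit;
    return slot;
}

/* Client side: prep, set source and destination, then submit. */
static int sk_do_copy(struct sk_tx_desc *slot, sk_dma_addr_t dest,
                      sk_dma_addr_t src, size_t len)
{
    struct sk_tx_desc *tx = sk_prep_memcpy(slot, len);

    tx->tx_set_src(src, tx, 0);
    tx->tx_set_dest(dest, tx, 0);
    return tx->tx_submit(tx);
}
```

The point of the shape is that adding a new operation type means adding one more prep routine, while the set and submit calls stay the same for the client.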

Changelog:
* drop dma mapping methods, suggested by Chris Leech
* fix ioat_dma_dependency_added, also caught by Andrew Morton
* fix dma_sync_wait, change from Andrew Morton
* uninline large functions, change from Andrew Morton
* add tx->callback = NULL to dmaengine calls to interoperate with async_tx
  calls
* hookup ioat_tx_submit
* convert channel capabilities to a 'cpumask_t like' bitmap
* removed DMA_TX_ARRAY_INIT, no longer needed
* checkpatch.pl fixes
* make set_src, set_dest, and tx_submit descriptor specific methods

Cc: Jeff Garzik <[EMAIL PROTECTED]>
Cc: Chris Leech <[EMAIL PROTECTED]>
Cc: Shannon Nelson <[EMAIL PROTECTED]>
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c   |  182 ++
 drivers/dma/ioatdma.c |  277 -
 drivers/dma/ioatdma.h |8 +
 include/linux/dmaengine.h |  230 +++--
 4 files changed, 455 insertions(+), 242 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 322ee29..379809f 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -59,6 +59,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -66,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static DEFINE_MUTEX(dma_list_mutex);
 static LIST_HEAD(dma_device_list);
@@ -165,6 +167,24 @@ static struct dma_chan *dma_client_chan_alloc(struct 
dma_client *client)
return NULL;
 }
 
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
+{
+   enum dma_status status;
+   unsigned long dma_sync_wait_timeout = jiffies + msecs_to_jiffies(5000);
+
+   dma_async_issue_pending(chan);
+   do {
+   status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
+   if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
+   printk(KERN_ERR "dma_sync_wait_timeout!\n");
+   return DMA_ERROR;
+   }
+   } while (status == DMA_IN_PROGRESS);
+
+   return status;
+}
+EXPORT_SYMBOL(dma_sync_wait);
+
 /**
  * dma_chan_cleanup - release a DMA channel's resources
  * @kref: kernel reference structure that contains the DMA channel device
@@ -322,6 +342,25 @@ int dma_async_device_register(struct dma_device *device)
if (!device)
return -ENODEV;
 
+   /* validate device routines */
+   BUG_ON(dma_has_cap(DMA_MEMCPY, device->cap_mask) &&
+   !device->device_prep_dma_memcpy);
+   BUG_ON(dma_has_cap(DMA_XOR, device->cap_mask) &&
+   !device->device_prep_dma_xor);
+   BUG_ON(dma_has_cap(DMA_ZERO_SUM, device->cap_mask) &&
+   !device->device_prep_dma_zero_sum);
+   BUG_ON(dma_has_cap(DMA_MEMSET, device->cap_mask) &&
+   !device->device_prep_dma_memset);
+   BUG_ON(dma_has_cap(DMA_INTERRUPT, device->cap_mask) &&
+   !device->device_prep_dma_interrupt);
+
+   BUG_ON(!device->device_alloc_chan_resources);
+   BUG_ON(!device->device_free_chan_resources);
+   BUG_ON(!device->device_dependency_added);
+   BUG_ON(!device->device_is_tx_complete);
+   BUG_ON(!device->device_issue_pending);
+   BUG_ON(!device->dev);
+
init_completion(&device->done);
kref_init(&device->refcount);
device->dev_id = id++;
@@ -397,6 +436,149 @@ void dma_async_device_unregister(struct dma_device 
*device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * 

[md-accel PATCH 00/19] md raid acceleration and the async_tx api

2007-06-26 Thread Dan Williams
Greetings,

Per Andrew's suggestion this is the md raid5 acceleration patch set
updated with more thorough changelogs to lower the barrier to entry for
reviewers.  To get started with the code I would suggest the following
order:
[md-accel PATCH 01/19] dmaengine: refactor dmaengine around 
dma_async_tx_descriptor
[md-accel PATCH 04/19] async_tx: add the async_tx api
[md-accel PATCH 07/19] md: raid5_run_ops - run stripe operations outside 
sh->lock
[md-accel PATCH 16/19] dmaengine: driver for the iop32x, iop33x, and iop13xx 
raid engines

The patch set can be broken down into three main categories:
1/ API (async_tx: patches 1 - 4)
2/ implementation (md changes: patches 5 - 15)
3/ driver (iop-adma: patches 16 - 19)

I have worked with Neil to get approval of the category 2 changes.
However for the category 1 and 3 changes there was no obvious
merge-path/maintainer to work through.  I have thus far extrapolated
Neil's comments about 2 out to 1 and 3, Jeff gave some direction on a
early revision about the scalability of the API, and the patch set has
picked up various fixes and suggestions from being in -mm for a few
releases.  Please help me ensure that this code is ready for Linus to
pull for 2.6.23.

git://lost.foo-projects.org/~dwillia2/git/iop md-accel-linus

Dan Williams (19):
  dmaengine: refactor dmaengine around dma_async_tx_descriptor
  dmaengine: make clients responsible for managing channels
  xor: make 'xor_blocks' a library routine for use with async_tx
  async_tx: add the async_tx api
  raid5: refactor handle_stripe5 and handle_stripe6 (v2)
  raid5: replace custom debug PRINTKs with standard pr_debug
  md: raid5_run_ops - run stripe operations outside sh->lock
  md: common infrastructure for running operations with raid5_run_ops
  md: handle_stripe5 - add request/completion logic for async write ops
  md: handle_stripe5 - add request/completion logic for async compute ops
  md: handle_stripe5 - add request/completion logic for async check ops
  md: handle_stripe5 - add request/completion logic for async read ops
  md: handle_stripe5 - add request/completion logic for async expand ops
  md: handle_stripe5 - request io processing in raid5_run_ops
  md: remove raid5 compute_block and compute_parity5
  dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
  iop13xx: surface the iop13xx adma units to the iop-adma driver
  iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
  ARM: Add drivers/dma to arch/arm/Kconfig

Administrivia:
This patch set contains three new patches compared to the previous
release.  They are:
[md-accel PATCH 03/19] xor: make 'xor_blocks' a library routine for use with 
async_tx
[md-accel PATCH 05/19] raid5: refactor handle_stripe5 and handle_stripe6 (v2) 
[md-accel PATCH 06/19] raid5: replace custom debug PRINTKs with standard 
pr_debug

net/core/dev.c is touched by the following:
[md-accel PATCH 02/19] dmaengine: make clients responsible for managing channels


Re: [-mm patch] remove nobh_{prepare,commit}_write()

2007-06-26 Thread Nick Piggin
On Tue, Jun 26, 2007 at 02:33:35PM -0700, Randy Dunlap wrote:
> On Tue, 26 Jun 2007 14:23:20 -0700 Andrew Morton wrote:
> 
> > On Tue, 26 Jun 2007 15:48:58 -0500
> > Dave Kleikamp <[EMAIL PROTECTED]> wrote:
> > 
> > > On Tue, 2007-06-26 at 13:32 -0700, Andrew Morton wrote:
> > > > On Fri, 15 Jun 2007 00:15:55 +0200
> > > > Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > nobh_{prepare,commit}_write() are no longer used.
> > > > 
> > > > wth?  What happened to ext2 and ext3 nobh mode?  They seem to
> > > > have magically and unchangeloggedly disappeared?
> > > 
> > > They were removed with Nick's new aops patches.
> >^secretly
> > > 
> > 
> > That much I worked out for myself.  It's kinda staggering that a fairly
> > major feature in two fairly major filesystems got removed without even a
> > mention in the changelog.  I don't recall having seen it discussed in email
> > but I obviously missed that bit.
> > 
> > Look, I'm one micron from just dropping the whole lot.  These changes
> > simply have not received the amount of energy, effort, care, attention and
> > testing which a change of this magnitude requires.
> 
> so be sure to discuss that (not the patches themselves so much,
> but the process(es)) at the kernel summit etc

I did of course mention that nobh wasn't converted when sending the
patches.

I asked for comments about how much it is used in the real world. Badari
was the only one who replied about that but we didn't reach a conclusion.

I don't know about energy, but I have seen lots of other patches cause
a lot more problems...


Re: Linux Kernel include files

2007-06-26 Thread Kyle Moffett

On Jun 22, 2007, at 11:00:38, Adrian Bunk wrote:
It would certainly help if Joerg would tell what exactly breaks,  
but I spot one likely problem in include/asm-i386/types.h:


#if defined(__GNUC__) && !defined(__STRICT_ANSI__)
typedef __signed__ long long __s64;
typedef unsigned long long __u64;
#endif

It might make sense to remove the #if and simply require that a C  
compiler under Linux must know about the C99 "long long"?


Gah, this particular topic and a few other similar header- 
compatibility ones show up once a month on LKML; I should probably  
just make a patch to fix all the types.h files and be done with it.   
The proper solution is this:


# if __STDC_VERSION__ >= 199901L
typedef   signed long long __s64;
typedef unsigned long long __u64;
# elif defined(__GNUC__)
__extension__ typedef   signed long long __s64;
__extension__ typedef unsigned long long __u64;
# else
#  error "Your compiler doesn't support long long (IOW: It sucks).  Please get a new one"
# endif

That way if you have any kind of vaguely-long-long-compatible  
compiler then it will work, and otherwise you'll get a nice useful  
error message.  It also makes sure that GCC doesn't spew warnings/ 
errors when in c89-pedantic mode.  The "__extension__" keyword is  
designed for use in implementation header files which want to use GCC- 
isms unconditionally.
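
For anyone who wants to try the idea outside the kernel tree, a minimal standalone version might look like the following.  The sk_s64/sk_u64 names and the helper function are invented for this sketch, and the size check assumes the compiler accepts a negative array size as a compile-time error, which is the usual pre-C11 trick.

```c
#include <limits.h>

/* Same selection logic as above, using hypothetical sk_* type names. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
typedef   signed long long sk_s64;
typedef unsigned long long sk_u64;
#elif defined(__GNUC__)
__extension__ typedef   signed long long sk_s64;
__extension__ typedef unsigned long long sk_u64;
#else
# error "compiler lacks long long"
#endif

/* Compile-time check: the array size is -1 if the type is under 64 bits. */
typedef char sk_s64_is_wide_enough[(sizeof(sk_s64) * CHAR_BIT >= 64) ? 1 : -1];

/* Largest value representable in the unsigned type (all bits set). */
static sk_u64 sk_max_u64(void)
{
    return ~(sk_u64)0;
}
```

Building this with gcc in strict C89 pedantic mode should take the second branch, where the typedefs are accepted without a long long diagnostic because of __extension__.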


Cheers,
Kyle Moffett



Re: [PATCH] RFC: have tcp_recvmsg() check kthread_should_stop() and treat it as if it were signalled

2007-06-26 Thread Satyam Sharma

On 6/27/07, Satyam Sharma <[EMAIL PROTECTED]> wrote:

[...]
On 6/26/07, Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> On 06/26, Satyam Sharma wrote:
[...]
> > So could we have signals in _addition_ to kthread_stop_info and change
> > kthread_should_stop() to check for both:
> >
> > kthread_stop_info.k == current && signal_pending(current)
>
> No, this can't work in general. Some kthreads do flush_signals/dequeue_signal,
> so TIF_SIGPENDING can be lost anyway.

Yup, I had thought of precisely this issue yesterday as well. The mental note
I made to myself was that the force_sig(SIGKILL) and wake_up_process() in
kthread_stop() must be atomic so that the following race is not possible:


Hmm, the issue seems to have more to do with the ordering of
flush_signals() w.r.t. checking kthread_should_stop() in the kthread's
code. I thought about how to tackle this, but there's no easy way to make
the stuff atomic like I thought earlier. The problem, like you mentioned,
is if the target kthread proactively flushes its signals by hand *before*
checking kthread_should_stop().

The only way out seems to be to simply outlaw flush_signals() in kthreads
(or anything to do with signals), but that would be impossible to enforce ...
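
A toy model of the ordering problem, in plain userspace C, may make it clearer.  Nothing here is kernel code: struct sk_kthread and the sk_* helpers are hypothetical, and two integer flags stand in for kthread_stop_info and TIF_SIGPENDING.

```c
/* Hypothetical model of a kthread's stop-related state. */
struct sk_kthread {
    int stop_requested;   /* stands in for kthread_stop_info.k == current */
    int sigpending;       /* stands in for signal_pending(current) */
};

/* Model of kthread_stop() that also sends a signal. */
static void sk_kthread_stop(struct sk_kthread *t)
{
    t->stop_requested = 1;
    t->sigpending = 1;
}

/* Model of the thread flushing its own signals by hand. */
static void sk_flush_signals(struct sk_kthread *t)
{
    t->sigpending = 0;
}

/* Proposed check: both the stop request and a pending signal. */
static int sk_should_stop_both(const struct sk_kthread *t)
{
    return t->stop_requested && t->sigpending;
}

/* Existing-style check: the stop request alone. */
static int sk_should_stop_flag_only(const struct sk_kthread *t)
{
    return t->stop_requested;
}
```

If the thread calls sk_flush_signals() after sk_kthread_stop() but before testing sk_should_stop_both(), the combined check reports "keep running" even though a stop was requested, while the flag-only check still sees it.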

Satyam


Re: [PATCH try #2] security: Convert LSM into a static interface

2007-06-26 Thread Kyle Moffett

On Jun 26, 2007, at 20:57:53, Crispin Cowan wrote:

Kyle Moffett wrote:
Let's go over the differences between "my fs" and "my LSM", and  
the similarities between "my VM" and "my LSM":  Filesystems don't  
get hooked from virtually every userspace-initiated operation,  
whereas both VMs and LSMs do.  VMs and LSMs attach anonymous state  
data to a large percentage of the allocated objects in the system,  
whereas filesystems allocate their own independent datastructure  
and use that.  Would you want to "rmmod ext3" and then "modprobe  
ext2" while you have an ext2-as-ext3 filesystem *mounted*???  If  
you want a good analogy, that's a better one than the "my fs can't  
be a module" crap.


This whole discussion boils down to 2 points:
  1) As currently implemented, no LSM may be safely rmmod-ed
  2) Someone has submitted a patch which fixes that problem (you
can't rmmod them at all, so no crashes)

If you really want to do modular LSMs, then you need to submit a  
patch which fixes all the race conditions in LSM removal *without*  
adding much extra overhead.  I'm sure if your solution works then  
everyone will be much more open to modular LSMs.


Hmmm. You seem to be mostly concerned with safely rmmod'ing  
modules. In contrast, my main concern with the proposed patch is  
that it removes the ability to *insert* a module.


You must have missed this in my emails:
  2) When you "modprobe my_custom_security_module", how exactly do  
you expect that all the processes, files, shared memory segments,  
file descriptors, sockets, SYSV mutexes, packets, etc will get  
appropriate security pointers?  This isn't even solvable the same  
way the "rmmod" problem is, since most of that isn't even  
accessible without iterating over the ENTIRE dcache, icache, every  
process, every process' file-descriptors, every socket, every unix  
socket, every anonymous socket, every SYSV shm object, every  
currently-in-process packet.


I'd argue that security-module-insertion is actually MORE complicated  
than removal.  Here's one example:  TOMOYO cares about the process  
execution tree, but you can't penalize the no-LSM case by a percent  
or two to add that kind of data.  When TOMOYO is loaded, it wants to  
do access control based on process execution trees for which data  
DOES NOT EXIST!!!  Not only that, but the processes which originally  
ran the one you care about (and which you'd need to recreate that  
data) may have exited anywhere from seconds to years before.  It is  
fundamentally IMPOSSIBLE to recreate that data, even if you could  
solve the problems of how to do it while the system is running  
without racing with existing process operations.  Imagine a process  
which hasn't had security data tagged to it yet which opens thousands  
of FIFOs per second, waits for your tagging code to assign security  
data to them in the filesystem, and then removes them; if you did it  
right you could prevent the code from EVER completely tagging every  
object (even assuming you could recreate enough information).


Such a need to add extra security data to multiple classes of objects  
is *fundamental* to any security module (isn't that the whole  
point?)  As such, you can't just "modprobe" one and expect it to  
work.  That's like mounting an ext2 filesystem, and then later trying  
to "modprobe ext3" and dynamically switch to the ext3 code and enable  
journalling all at once ON THE MOUNTED FILESYSTEM!!!


Sure, theoretically it *could* be done, but the code complexity is  
hardly worth it (plus nobody has yet even tried posting patches to  
make it happen).


Consider the use case of joe admin who is running enterprise- 
supported RHEL or SLES, and wants to try some newfangled LSM  
FooSecureMod thingie.  So he grabs a machine, config's selinux=0 or  
apparmor=0 and loads his own module on boot, and plays with it. He  
even likes FooSecure, better than SELinux or AppArmor, and wants to  
roll it out across his data center.


Flatly impossible.  You simply cannot "load" a security module and  
hope to provide any useful information about the system's present  
state.  If you want comprehensive security it has to be there before  
a single byte of userspace code is executed.  SELinux sort-of handles  
unlabelled objects by treating them with a small set of initial  
"types", but that's only enough to get the system up enough to  
actually relabel objects with type-transitions (after init loads the  
selinux policy it reexecs itself, before doing anything else).


So to solve the problem James & Kyle are concerned with, and  
preserve user choice, how about we *only* remove the ability to  
rmmod, and leave in place the ability to modprobe? Or even easier,  
LSMs that don't want to be unloaded can just block rmmod, and  
simple LSMs that can be unloaded safely can permit it.


An LSM simple enough to unload would be too simple for anybody to  
want to load in the first place (even capabilities can have this  

Re: i386 boot fail, EIP in __change_page_attr:166

2007-06-26 Thread dave young

2007/6/26, Chuck Ebbert <[EMAIL PROTECTED]>:

On 06/25/2007 09:11 PM, dave young wrote:
> Hi,
>
> 2007/6/25, Chuck Ebbert <[EMAIL PROTECTED]>:
>> On 06/24/2007 11:43 PM, dave young wrote:
>> > Hi,
>> > I reconfig my kernel, boot and oops, EIP in __change_page_attr:166, I
>> > tried 2.6.22-rc4-mm2 and 2.6.22-rc5 , same result.
>> >
>> > Anyone has some clues?
>> >
>> > here is my config file:
>>
>> Where are the oops messages?
> Attached please find the screenshots. sorry for my phone camera resolution.
> screen1.png : vga=ask select mode 6
> screen2.png : normal 80x25 console

That's 2.6.22-rc4-mm2 which I don't have.

netsc520 is doing iounmap() of an area it did
ioremap_nocache() on earlier, because it has now failed to
find a device. Why it went BUG() I have no idea.



Hi, maybe some config option causes this issue. Here is my current
working config file:
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc5
# Tue Jun 19 16:10:01 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_MODEL=4
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_NR_CPUS=8
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set

Re: i386 boot fail, EIP in __change_page_attr:166

2007-06-26 Thread dave young

Hi,

2007/6/26, Jeremy Fitzhardinge <[EMAIL PROTECTED]>:

> dave young wrote:
> > Hi,
> > I reconfig my kernel, boot and oops, EIP in __change_page_attr:166, I
> > tried 2.6.22-rc4-mm2 and 2.6.22-rc5 , same result.
>
> oops output?  dmesg output?  Hardware config?  How much memory?  How big
> is your kernel?
>
> J


The kernel oops message was captured only by camera; please find my
screenshot images.
memory size 1G
kernel size: 3.9M
lspci -vv output:
00:00.0 Host bridge: Intel Corporation 945G/GZ/P/PL Express Memory
Controller Hub (rev 02)
Subsystem: Dell Unknown device 01d2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
SERR- TAbort-
Reset- FastB2B-
Capabilities: [88] #0d []
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 
Enable-
Address:   Data: 
Capabilities: [a0] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s <64ns, L1 <1us
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x16, ASPM L0s, Port 2
Link: Latency L0s <256ns, L1 <4us
Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
Link: Speed 2.5Gb/s, Width x16
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
Slot: Number 7680, PowerLimit 75.00
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Off, PwrInd On, Power-
Root: Correctable- Non-Fatal- Fatal- PME-

00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High
Definition Audio Controller (rev 01)
Subsystem: Dell Unknown device 01d2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- TAbort-
Reset- FastB2B-
Capabilities: [40] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s unlimited, L1 unlimited
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 1
Link: Latency L0s <256ns, L1 <4us
Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
Link: Speed 2.5Gb/s, Width x0
Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
Slot: Number 2, PowerLimit 10.00
Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
Slot: AttnInd Unknown, PwrInd Unknown, Power-
Root: Correctable- Non-Fatal- Fatal- PME-
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0 
Enable-
Address:   Data: 
Capabilities: [90] #0d []
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI #1 (rev 01) (prog-if 00 [UHCI])
Subsystem: Dell Unknown device 01d2
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
SERR- TAbort-
Reset- FastB2B-
Capabilities: [50] #0d []

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC
Interface Bridge (rev 01)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
SERR- TAbort-


[AppArmor 31/44] Add d_namespace_path() to compute namespace relative pathnames

2007-06-26 Thread jjohansen
In AppArmor, we are interested in pathnames relative to the namespace root.
This is the same as d_path() except for the root where the search ends. Add
a function for computing the namespace-relative path.

Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: John Johansen <[EMAIL PROTECTED]>

---
 fs/dcache.c|6 +++---
 fs/namespace.c |   27 +++
 include/linux/dcache.h |2 ++
 include/linux/mount.h  |2 ++
 4 files changed, 34 insertions(+), 3 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1779,9 +1779,9 @@ shouldnt_be_hashed:
  *
  * Returns the buffer or an error code.
  */
-static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
- struct dentry *root, struct vfsmount *rootmnt,
- char *buffer, int buflen, int fail_deleted)
+char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+  struct dentry *root, struct vfsmount *rootmnt,
+  char *buffer, int buflen, int fail_deleted)
 {
int namelen, is_slash, vfsmount_locked = 0;
 
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1868,3 +1868,30 @@ void __put_mnt_ns(struct mnt_namespace *
release_mounts(&umount_list);
kfree(ns);
 }
+
+char *d_namespace_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+  char *buf, int buflen)
+{
+   struct vfsmount *rootmnt, *nsrootmnt = NULL;
+   struct dentry *root = NULL;
+   char *res;
+
+   read_lock(&current->fs->lock);
+   rootmnt = mntget(current->fs->rootmnt);
+   read_unlock(&current->fs->lock);
+   spin_lock(&vfsmount_lock);
+   if (rootmnt->mnt_ns)
+           nsrootmnt = mntget(rootmnt->mnt_ns->root);
+   spin_unlock(&vfsmount_lock);
+   mntput(rootmnt);
+   if (nsrootmnt)
+           root = dget(nsrootmnt->mnt_root);
+   res = __d_path(dentry, vfsmnt, root, nsrootmnt, buf, buflen, 1);
+   dput(root);
+   mntput(nsrootmnt);
+   /* Prevent empty path for lazily unmounted filesystems. */
+   if (!IS_ERR(res) && *res == '\0')
+           *--res = '.';
+   return res;
+}
+EXPORT_SYMBOL(d_namespace_path);
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -299,6 +299,8 @@ extern int d_validate(struct dentry *, s
  */
 extern char *dynamic_dname(struct dentry *, char *, int, const char *, ...);
 
+extern char *__d_path(struct dentry *, struct vfsmount *, struct dentry *,
+ struct vfsmount *, char *, int, int);
 extern char * d_path(struct dentry *, struct vfsmount *, char *, int);
   
 /* Allocation counts.. */
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -103,5 +103,7 @@ extern void shrink_submounts(struct vfsm
 extern spinlock_t vfsmount_lock;
 extern dev_t name_to_dev_t(char *name);
 
+extern char *d_namespace_path(struct dentry *, struct vfsmount *, char *, int);
+
 #endif
 #endif /* _LINUX_MOUNT_H */
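
The `*--res = '.'` step in d_namespace_path() relies on d_path-style helpers filling the buffer from its end and returning a pointer into it, so stepping back one byte prepends a character. The user-space sketch below illustrates only that idea. build_path_backwards() and its arguments are hypothetical names chosen for illustration, not kernel API, and no locking or reference counting is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper, not kernel code: build a path from its
 * components by filling buf from the end, the way d_path()-style code does.
 * names[0] is the component closest to the root.
 * Returns a pointer into buf, or NULL if buf is too small.
 */
static char *build_path_backwards(const char *const *names, size_t count,
				  char *buf, size_t buflen)
{
	char *res;
	size_t i;

	assert(names != NULL || count == 0);
	if (buflen < 2)
		return NULL;	/* need room for "." plus the terminator */
	res = buf + buflen - 1;
	*res = '\0';

	/* Walk from the leaf towards the root, prepending "/name". */
	for (i = count; i-- > 0; ) {
		size_t len = strlen(names[i]);

		if ((size_t)(res - buf) < len + 1)
			return NULL;
		res -= len;
		memcpy(res, names[i], len);
		*--res = '/';
	}

	/* Same idea as the patch: never hand back an empty string. */
	if (*res == '\0')
		*--res = '.';
	return res;
}
```

With the components "usr" and "lib" this yields "/usr/lib"; with no components at all it yields "." instead of an empty string.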

-- 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[AppArmor 24/44] Pass struct vfsmount to the inode_getxattr LSM hook

2007-06-26 Thread jjohansen
This is needed for computing pathnames in the AppArmor LSM.

Signed-off-by: Tony Jones <[EMAIL PROTECTED]>
Signed-off-by: Andreas Gruenbacher <[EMAIL PROTECTED]>
Signed-off-by: John Johansen <[EMAIL PROTECTED]>

---
 fs/xattr.c   |2 +-
 include/linux/security.h |   13 -
 security/dummy.c |3 ++-
 security/selinux/hooks.c |3 ++-
 4 files changed, 13 insertions(+), 8 deletions(-)

--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -116,7 +116,7 @@ vfs_getxattr(struct dentry *dentry, stru
if (error)
return error;
 
-   error = security_inode_getxattr(dentry, name);
+   error = security_inode_getxattr(dentry, mnt, name);
if (error)
return error;
 
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -391,7 +391,7 @@ struct request_sock;
  * @value identified by @name for @dentry and @mnt.
  * @inode_getxattr:
  * Check permission before obtaining the extended attributes
- * identified by @name for @dentry.
+ * identified by @name for @dentry and @mnt.
  * Return 0 if permission is granted.
  * @inode_listxattr:
  * Check permission before obtaining the list of extended attribute 
@@ -1248,7 +1248,8 @@ struct security_operations {
 struct vfsmount *mnt,
 char *name, void *value,
 size_t size, int flags);
-   int (*inode_getxattr) (struct dentry *dentry, char *name);
+   int (*inode_getxattr) (struct dentry *dentry, struct vfsmount *mnt,
+  char *name);
int (*inode_listxattr) (struct dentry *dentry);
int (*inode_removexattr) (struct dentry *dentry, char *name);
const char *(*inode_xattr_getsuffix) (void);
@@ -1782,11 +1783,12 @@ static inline void security_inode_post_s
security_ops->inode_post_setxattr (dentry, mnt, name, value, size, flags);
 }
 
-static inline int security_inode_getxattr (struct dentry *dentry, char *name)
+static inline int security_inode_getxattr (struct dentry *dentry,
+   struct vfsmount *mnt, char *name)
 {
if (unlikely (IS_PRIVATE (dentry->d_inode)))
return 0;
-   return security_ops->inode_getxattr (dentry, name);
+   return security_ops->inode_getxattr (dentry, mnt, name);
 }
 
 static inline int security_inode_listxattr (struct dentry *dentry)
@@ -2487,7 +2489,8 @@ static inline void security_inode_post_s
 int flags)
 { }
 
-static inline int security_inode_getxattr (struct dentry *dentry, char *name)
+static inline int security_inode_getxattr (struct dentry *dentry,
+   struct vfsmount *mnt, char *name)
 {
return 0;
 }
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -368,7 +368,8 @@ static void dummy_inode_post_setxattr (s
 {
 }
 
-static int dummy_inode_getxattr (struct dentry *dentry, char *name)
+static int dummy_inode_getxattr (struct dentry *dentry,
+ struct vfsmount *mnt, char *name)
 {
return 0;
 }
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2393,7 +2393,8 @@ static void selinux_inode_post_setxattr(
return;
 }
 
-static int selinux_inode_getxattr (struct dentry *dentry, char *name)
+static int selinux_inode_getxattr (struct dentry *dentry, struct vfsmount *mnt,
+  char *name)
 {
return dentry_has_perm(current, NULL, dentry, FILE__GETATTR);
 }
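
The shape of this change, widening one hook in an ops table and updating every implementation in step, can be sketched in user space as follows. This is a minimal illustration, not the kernel's definitions: security_operations_sketch, dummy_getxattr(), pathbased_getxattr() and call_getxattr() are hypothetical names, the two struct types are opaque stand-ins, and the rule "refuse a missing mount" is an assumption made up for the example rather than AppArmor's actual policy.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are passed around. */
struct dentry;
struct vfsmount;

/* Hypothetical, trimmed-down ops table with the widened hook. */
struct security_operations_sketch {
	int (*inode_getxattr)(struct dentry *dentry, struct vfsmount *mnt,
			      char *name);
};

/* A module that ignores the new argument, like the dummy hook. */
static int dummy_getxattr(struct dentry *dentry, struct vfsmount *mnt,
			  char *name)
{
	(void)dentry;
	(void)mnt;
	(void)name;
	return 0;
}

/* A hypothetical path-based module that cannot work without a mount. */
static int pathbased_getxattr(struct dentry *dentry, struct vfsmount *mnt,
			      char *name)
{
	(void)dentry;
	(void)name;
	return mnt ? 0 : -1;
}

/* Wrapper in the style of security_inode_getxattr(): forward all arguments. */
static int call_getxattr(const struct security_operations_sketch *ops,
			 struct dentry *dentry, struct vfsmount *mnt,
			 char *name)
{
	assert(ops != NULL && ops->inode_getxattr != NULL);
	return ops->inode_getxattr(dentry, mnt, name);
}
```

Because the function-pointer type changes, any implementation left with the old two-argument signature draws an incompatible-pointer-type diagnostic from the compiler when it is stored in the table, which is why the dummy and SELinux hooks are touched in the same patch.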

-- 



Re: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module

2007-06-26 Thread Andrew Morton
On Tue, 26 Jun 2007 17:46:13 -0700 "Wessel, Jason" <[EMAIL PROTECTED]> wrote:

> > >   }
> > >   }
> > 
> > Everything went quiet?
> > 
> > If this patch has been tested and fixes the bug, can you 
> > please send a version which is ready for merging?  (ie: add a 
> > suitable description of what it does).
> > 
> > 
> 
> I mailed Jarek separately.
> 
> I had tested the patch with netconsole and kgdb and it does in fact fix
> the problem that was reported.

OK, thanks.  Please don't mail people separately!

I queued this up with a null changelog for now.


Re: [PATCH try #2] security: Convert LSM into a static interface

2007-06-26 Thread Crispin Cowan
Kyle Moffett wrote:
> Let's go over the differences between "my fs" and "my LSM", and the
> similarities between "my VM" and "my LSM":  Filesystems don't get
> hooked from virtually every userspace-initiated operation, whereas
> both VMs and LSMs do.  VMs and LSMs attach anonymous state data to a
> large percentage of the allocated objects in the system, whereas
> filesystems allocate their own independent datastructure and use
> that.  Would you want to "rmmod ext3" and then "modprobe ext2" while
> you have an ext2-as-ext3 filesystem *mounted*???  If you want a good
> analogy, that's a better one than the "my fs can't be a module" crap.
>
> This whole discussion boils down to 2 points:
>   1) As currently implemented, no LSM may be safely rmmod-ed
>   2) Someone has submitted a patch which fixes that problem (you can't
> rmmod them at all, so no crashes)
>
> If you really want to do modular LSMs, then you need to submit a patch
> which fixes all the race conditions in LSM removal *without* adding
> much extra overhead.  I'm sure if your solution works then everyone
> will be much more open to modular LSMs.  I said this before:
Hmmm. You seem to be mostly concerned with safely rmmod'ing modules. In
contrast, my main concern with the proposed patch is that it removes the
ability to *insert* a module.

Consider the use case of joe admin who is running enterprise-supported
RHEL or SLES, and wants to try some newfangled LSM FooSecureMod thingie.
So he grabs a machine, config's selinux=0 or apparmor=0 and loads his
own module on boot, and plays with it. He even likes FooSecure, better
than SELinux or AppArmor, and wants to roll it out across his data center.

Without James's patch, he can do that, and at worst has a tainted
kernel. RH or Novell or his favorite distro vendor can fix that with a
wave of the hand and bless FooSecure as a module. With James's patch, he
has to patch his kernels, and then enterprise support is hopeless, to
say nothing of the barrier to entry that "patch and rebuild kernel" is
more than many admins are willing to do.

So to solve the problem James & Kyle are concerned with, and preserve
user choice, how about we *only* remove the ability to rmmod, and leave
in place the ability to modprobe? Or even easier, LSMs that don't want
to be unloaded can just block rmmod, and simple LSMs that can be
unloaded safely can permit it.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
AppArmor Chat: irc.oftc.net/#apparmor



Re: [PATCH 16/16] fix handling of integer constant expressions

2007-06-26 Thread Derek M Jones

Al Viro wrote:


>>>Hopefully correct handling of integer constant expressions.  Please, review.
>>Am I invoking sparse wrongly?  ./sparse -W -Wall doesn't diagnose
>>the following TU, for example.
>>
>>extern int a;
>>extern int as1[(a = 2)];
>
>sparse simply doesn't check that.  We don't have anything resembling
>support of VLA.


If it did support VLAs it would point out that this is
a constraint violation.  VLAs must have block or function
prototype scope.

--
Derek M. Jones  tel: +44 (0) 1252 520 667
Knowledge Software Ltd  mailto:[EMAIL PROTECTED]
Applications Standards Conformance Testinghttp://www.knosof.co.uk


Re: Problems with fb console [was Re: 2.6.12-rc4-mm2]

2007-06-26 Thread Andrew Morton
On Wed, 27 Jun 2007 02:35:27 +0200 "J.A. Magallón" <[EMAIL PROTECTED]> wrote:

> On Mon, 16 May 2005 02:13:02 -0700, Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm2/
> > 
> > 
> 
> Hi...
> 
> I have a (stupid, I suppose) problem with framebuffer console.
> I have builtin VESAFB in this kernel, so:
> 
> werewolf:/boot# grep _FB config-2.6.21-jam09 | grep =y
> CONFIG_FB=y
> CONFIG_FB_CFB_FILLRECT=y
> CONFIG_FB_CFB_COPYAREA=y
> CONFIG_FB_CFB_IMAGEBLIT=y
> CONFIG_FB_DEFERRED_IO=y
> CONFIG_FB_MODE_HELPERS=y
> CONFIG_FB_VESA=y
> werewolf:/boot# grep CONSO config-2.6.21-jam09
> # CONFIG_NETCONSOLE is not set
> CONFIG_VT_CONSOLE=y
> CONFIG_HW_CONSOLE=y
> # CONFIG_VT_HW_CONSOLE_BINDING is not set
> CONFIG_VGA_CONSOLE=y
> CONFIG_DUMMY_CONSOLE=y
> CONFIG_FRAMEBUFFER_CONSOLE=y
> # CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
> # CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
> 
> I put this line in grub's menu.lst:
> 
> kernel /boot/vmlinuz video=vesafb:mtrr,ywrap vga=0x31A ro root=/dev/sdc1
> 
> (tried both with hex and decimal).
> 
> but grub keeps telling me it can't set that video mode, and I have no
> /dev/fb0 device to try with fbset. I have a '29 fb' line in /proc/devices.
> 
> Any ideas about why the device is missing ? udev is 113...
> I have followed all the info I could get (linux/Documentation/fb/, Google ;) )
> and all say that what I'm doing should work. What am I doing wrong ?
> 

Methinks that'll be git-newsetup changes?


Re: NDAs - ANY KNOWN RULES?

2007-06-26 Thread hermann pitton
Am Dienstag, den 26.06.2007, 20:24 -0400 schrieb Daniel Barkalow:
> On Wed, 27 Jun 2007, hermann pitton wrote:
> 
> > Hi,
> > 
> > such stuff has been causing a lot of trouble for a long time.
> > 
> > Are there any rules, or can everybody go on as some sort of freelancer
> > exclusively on such? I don't like it!
> 
> http://www.linux-foundation.org/en/NDA_program
> 
> In short, the Linux Foundation can negotiate a reasonable NDA for you to 
> sign, and they may be able to show you relevant documents as a freelancer 
> under a reasonable and standardized contract.
> 
>   -Daniel
> *This .sig left intentionally blank*

Thanks for your explanations,

but I know for sure it doesn't work.

Cheers,
Hermann




RE: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module

2007-06-26 Thread Wessel, Jason


> -Original Message-
> From: Andrew Morton [mailto:[EMAIL PROTECTED] 
> > 
> > Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>
> > 
> > ---
> > 
> > diff -Nurp 2.6.21-/net/core/netpoll.c 2.6.21/net/core/netpoll.c
> > --- 2.6.21-/net/core/netpoll.c  2007-04-26 15:08:32.0 +0200
> > +++ 2.6.21/net/core/netpoll.c   2007-06-12 21:05:23.0 +0200
> > @@ -73,7 +73,8 @@ static void queue_process(struct work_st
> > netif_tx_unlock(dev);
> > local_irq_restore(flags);
> >  
> > -   schedule_delayed_work(&npinfo->tx_work, HZ/10);
> > +   if (atomic_read(&npinfo->refcnt))
> > +           schedule_delayed_work(&npinfo->tx_work, HZ/10);
> > return;
> > }
> > netif_tx_unlock(dev);
> > @@ -780,9 +781,15 @@ void netpoll_cleanup(struct netpoll *np)
> > if (atomic_dec_and_test(&npinfo->refcnt)) {
> > skb_queue_purge(&npinfo->arp_tx);
> > skb_queue_purge(&npinfo->txq);
> > -   cancel_rearming_delayed_work(&npinfo->tx_work);
> > +   cancel_delayed_work(&npinfo->tx_work);
> > flush_scheduled_work();
> >  
> > +   /* clean after last, unfinished work */
> > +   if (!skb_queue_empty(&npinfo->txq)) {
> > +   struct sk_buff *skb;
> > +   skb = __skb_dequeue(&npinfo->txq);
> > +   kfree_skb(skb);
> > +   }
> > kfree(npinfo);
> > }
> > }
> 
> Everything went quiet?
> 
> If this patch has been tested and fixes the bug, can you 
> please send a version which is ready for merging?  (ie: add a 
> suitable description of what it does).
> 
> 

I mailed Jarek separately.

I had tested the patch with netconsole and kgdb and it does in fact fix
the problem that was reported.

Jason. 


Re: [PATCH 16/16] fix handling of integer constant expressions

2007-06-26 Thread Al Viro
On Wed, Jun 27, 2007 at 01:29:59AM +0100, Derek M Jones wrote:
> Al Viro wrote:
> 
> >>>Hopefully correct handling of integer constant expressions.  Please, 
> >>>review.
> >>Am I invoking sparse wrongly?  ./sparse -W -Wall doesn't diagnose
> >>the following TU, for example.
> >>
> >>extern int a;
> >>extern int as1[(a = 2)];
> >
> >sparse simply doesn't check that.  We don't have anything resembling
> >support of VLA.
> 
> If it did support VLAs it would point out that this is
> a constraint violation.  VLAs must have block or function
> prototype scope.

I know.  It's just that "let's do something about array size checks"
triggers "yeah, but the poor sod who does that will have to sort
VLAs *and* gcc extensions around VLAs out", which is not a nice thought
and so far nobody had touched that area at all.


Re: [PATCH 16/16] fix handling of integer constant expressions

2007-06-26 Thread Al Viro
On Tue, Jun 26, 2007 at 05:25:06PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 27 Jun 2007, Al Viro wrote:
> >
> > > extern int a;
> > > extern int as1[(a = 2)];
> > 
> > sparse simply doesn't check that.  We don't have anything resembling
> > support of VLA.
> 
> Well, the above has two bugs that sparse could notice _independently_ of 
> variable-sized arrays:
>  - assignment outside of a function
>  - variable size array that isn't an automatic variable

Right; what I'm saying is that we don't do any checks on array sizes at
all, mostly since nobody is brave enough to deal with VLAs (which we'll
have to do if we start doing that).
 
> (strictly speaking, that's not even a variable size - it's a constant 2, 
> just with a non-constant expression - maybe you misread the "=" as an 
> "==")

With == it would be a different bug ;-)

BTW, VLA can be not just auto variable - it can be used in derivation of
such (i.e. you can say int (*p)[n], just not for anything not in block
or prototype scope).  And $DEITY help us[1] when ({...}) comes into the
game, since it allows leaking types out of the scope they'd been
declared in...

[1] or gcc - it gets an ICE galore in that class of testcases
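
As an illustration of the `int (*p)[n]` case mentioned above, here is a small C99 sketch in which the declared object is an ordinary pointer whose type is derived from a variable length array type. sum_rows() and its parameters are hypothetical names for this example, and a compiler with VLA support is assumed (it is optional from C11 on).

```c
#include <assert.h>
#include <stddef.h>

/* Sum an n-by-m matrix stored row after row in data. */
static long sum_rows(size_t n, size_t m, const int *data)
{
	/*
	 * Block scope: p itself is a plain pointer, but its type,
	 * "pointer to array of m ints", is variably modified.
	 */
	const int (*p)[m];
	long total = 0;
	size_t i, j;

	assert(m > 0);	/* a zero-length VLA type would be undefined */
	p = (const int (*)[m])data;

	for (i = 0; i < n; i++)
		for (j = 0; j < m; j++)
			total += p[i][j];
	return total;
}
```

The same declaration of p at file scope would be a constraint violation, which is the check discussed in this thread.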


Problems with fb console [was Re: 2.6.12-rc4-mm2]

2007-06-26 Thread J.A. Magallón
On Mon, 16 May 2005 02:13:02 -0700, Andrew Morton <[EMAIL PROTECTED]> wrote:

> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm2/
> 
> 

Hi...

I have a (stupid, I suppose) problem with framebuffer console.
I have builtin VESAFB in this kernel, so:

werewolf:/boot# grep _FB config-2.6.21-jam09 | grep =y
CONFIG_FB=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_VESA=y
werewolf:/boot# grep CONSO config-2.6.21-jam09
# CONFIG_NETCONSOLE is not set
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_VGA_CONSOLE=y
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set

I put this line in grub's menu.lst:

kernel /boot/vmlinuz video=vesafb:mtrr,ywrap vga=0x31A ro root=/dev/sdc1

(tried both with hex and decimal).
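
For reference, the vesafb documentation describes the kernel's vga= value as the VESA mode number plus 0x200, so the hex and decimal spellings tried here should be 0x31A and 794. The snippet below just spells out that arithmetic; vga_param_from_vesa() is a hypothetical helper name, and it is assumed that 0x11A is the VESA mode intended.

```c
#include <assert.h>

/* Offset the kernel adds to a VESA BIOS mode number for vga=. */
#define LINUX_VESA_OFFSET 0x200u

/* Hypothetical helper: VESA mode number -> value for the vga= parameter. */
static unsigned int vga_param_from_vesa(unsigned int vesa_mode)
{
	assert(vesa_mode >= 0x100u);	/* VESA mode numbers start at 0x100 */
	return vesa_mode + LINUX_VESA_OFFSET;
}
```

vga_param_from_vesa(0x11A) gives 0x31A, which is 794 in decimal.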

but grub keeps telling me it can't set that video mode, and I have no
/dev/fb0 device to try with fbset. I have a '29 fb' line in /proc/devices.

Any ideas about why the device is missing ? udev is 113...
I have followed all the info I could get (linux/Documentation/fb/, Google ;) )
and all say that what I'm doing should work. What am I doing wrong ?

TIA

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.21-jam09 (gcc 4.1.2 20070302 (4.1.2-1mdv2007.1)) SMP PREEMPT
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0


Re: [PATCH 0/17] Add Texas Instruments OMAP LCD driver-v2

2007-06-26 Thread Andrew Morton
On Tue, 26 Jun 2007 18:00:22 +0530
"Trilok Soni" <[EMAIL PROTECTED]> wrote:

> This patch series contains Texas Instruments OMAP LCD framebuffer
> drivers. This driver is divided into
> 
> * main omapfb driver, which handles most common functions across
> processor series, like platform driver registration, ioctl handling,
> much like fb skeleton.
> 
> This driver then gets through the callback based on the
> internal/external lcd controller and panel registered to it based on
> processor and board. Internal/External LCD controller as per lcd panel
> data registration is being done in separate files and so does patches.
> 
> Overall, these patches contain framebuffer drivers for TI OMAP1
> (OMAP1510/1610/1710) and OMAP2 (OMAP2420/2430) and external
> controllers used in Nokia Internet Tablets (N770/N800).
> 
> These drivers have been very well tested on the OMAP GIT [1] tree for a
> long time. Most of the code for this driver was written by Imre Deak
> <[EMAIL PROTECTED]>.

It seems churlish to complain about the 10-15 minutes spent reassembling
the mime mess when so much effort has gone into this work.  But for the
long-term, please do have a talk with your email setup so that you no
longer need to send patches as attachments, OK?

> Also CCed to LKML for wider review, and this v2 went through checkpatch.pl and
> I have modified the patches to accept most of checkpatch comments.

Maybe you had an old version of checkpatch:

trailing statements should be on next line
#520: FILE: drivers/video/omap/omapfb_main.c:402:
+   else switch (var->bits_per_pixel) {

line over 80 characters
#1136: FILE: drivers/video/omap/omapfb_main.c:1018:
+static enum omapfb_update_mode omapfb_get_update_mode(struct omapfb_device *fbdev)

do not use assignment in if condition
#1251: FILE: drivers/video/omap/omapfb_main.c:1133:
+   if ((r = omapfb_query_plane(fbi, _info)) < 0)

do not use assignment in if condition
#1265: FILE: drivers/video/omap/omapfb_main.c:1147:
+   if ((r = omapfb_query_mem(fbi, _info)) < 0)

do not use assignment in if condition
#1279: FILE: drivers/video/omap/omapfb_main.c:1161:
+   if ((r = omapfb_get_color_key(fbdev, _key)) < 0)

do not use assignment in if condition
#1535: FILE: drivers/video/omap/omapfb_main.c:1417:
+   if ((r = device_create_file(fbdev->dev, _attr_caps_num)))

do not use assignment in if condition
#1538: FILE: drivers/video/omap/omapfb_main.c:1420:
+   if ((r = device_create_file(fbdev->dev, _attr_caps_text)))

do not use assignment in if condition
#1541: FILE: drivers/video/omap/omapfb_main.c:1423:
+   if ((r = sysfs_create_group(>dev->kobj, _attr_grp)))

do not use assignment in if condition
#1544: FILE: drivers/video/omap/omapfb_main.c:1426:
+   if ((r = sysfs_create_group(>dev->kobj, _attr_grp)))

do not use assignment in if condition
#1646: FILE: drivers/video/omap/omapfb_main.c:1528:
+   if ((r = fbinfo_init(fbdev, fbi)) < 0) {

else should follow close brace
#2001: FILE: drivers/video/omap/omapfb_main.c:1883:
+   }
+   else if (!strncmp(this_opt, "vxres:", 6))

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Plus there are quite a large number of extern declarations in C files,
which is poor practice, which checkpatch failed to detect (maintainer has
been notified).
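
For the most frequent warning above, "do not use assignment in if condition", the usual fix is mechanical: assign on its own line, then test. The sketch below shows both forms side by side. query_plane(), setup_flagged() and setup_preferred() are hypothetical stand-ins, not functions from the omapfb driver.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a call that returns 0 or a negative errno. */
static int query_plane(int plane)
{
	return plane < 0 ? -EINVAL : 0;
}

/* Flagged style: the assignment is buried in the condition. */
static int setup_flagged(int plane)
{
	int r;

	if ((r = query_plane(plane)) < 0)
		return r;
	return 0;
}

/* Preferred style: assign first, then test. */
static int setup_preferred(int plane)
{
	int r;

	r = query_plane(plane);
	if (r < 0)
		return r;
	return 0;
}

/* Both forms behave the same; only the readability differs. */
static void check_same(int plane)
{
	assert(setup_flagged(plane) == setup_preferred(plane));
}
```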




Re: [PATCH 16/16] fix handling of integer constant expressions

2007-06-26 Thread Linus Torvalds


On Wed, 27 Jun 2007, Al Viro wrote:
>
> > extern int a;
> > extern int as1[(a = 2)];
> 
> sparse simply doesn't check that.  We don't have anything resembling
> support of VLA.

Well, the above has two bugs that sparse could notice _independently_ of 
variable-sized arrays:
 - assignment outside of a function
 - variable size array that isn't an automatic variable

(strictly speaking, that's not even a variable size - it's a constant 2, 
just with a non-constant expression - maybe you misread the "=" as an 
"==")

Linus
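
To make the distinction concrete, here is a short sketch of file-scope array declarations whose sizes are integer constant expressions, with the declaration from the thread kept in a comment because a conforming compiler should reject it. The names ok1, ok2, ok3, AS_LEN and check_sizes() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Integer constant expressions are fine as file-scope array sizes. */
enum { AS_LEN = 2 };
int ok1[2];
int ok2[AS_LEN];
int ok3[sizeof(int) >= 2 ? 2 : 1];

/*
 * Deliberately not compiled:
 *
 *   extern int a;
 *   extern int as1[(a = 2)];
 *
 * The expression (a = 2) has the value 2, but an assignment may not
 * appear in a constant expression. The size is therefore not an integer
 * constant expression, the array type counts as variably modified, and
 * such a type is not allowed at file scope.
 */

/* The sizes above are fixed at compile time. */
static void check_sizes(void)
{
	assert(sizeof(ok1) == 2 * sizeof(int));
	assert(sizeof(ok2) == sizeof(ok1));
}
```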


Re: NDAs - ANY KNOWN RULES?

2007-06-26 Thread Daniel Barkalow
On Wed, 27 Jun 2007, hermann pitton wrote:

> Hi,
> 
> such stuff has been causing a lot of trouble for a long time.
> 
> Are there any rules, or can everybody go on as some sort of freelancer
> exclusively on such? I don't like it!

http://www.linux-foundation.org/en/NDA_program

In short, the Linux Foundation can negotiate a reasonable NDA for you to 
sign, and they may be able to show you relevant documents as a freelancer 
under a reasonable and standardized contract.

-Daniel
*This .sig left intentionally blank*


Re: [patch, v2.6.22-rc6] sys_time() speedup

2007-06-26 Thread Andrea Arcangeli
On Tue, Jun 26, 2007 at 10:14:40AM -0700, Andrew Morton wrote:
> On my machine, time(2) doesn't do any syscall at all - it uses the vsyscall
> page.  I'd be surprised if a database uses sys_time() either.

Large boxes unfortunately can't always use vsyscalls...  that's a real
pity. I also had to disable vsyscall64 to generate some numbers.

I think there should be a perfectly accurate but non-monotonic mode for
gettimeofday so we can enable rdtscp (via sysctl and/or prctl). Aware
apps can enable the prctl; aware or brave admins can turn on the
sysctl. Vojtech and others should have proper patches to merge for
this.
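
For anyone who wants to look at the two calls side by side, here is a minimal user-space sketch. It assumes a POSIX system; whether either call actually enters the kernel depends on the libc, architecture and kernel configuration, which this code does not detect. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: seconds reported by gettimeofday() minus seconds
 * reported by time(), sampled back to back. */
long time_skew_seconds(void)
{
    struct timeval tv;
    time_t t = time(NULL);
    int rc = gettimeofday(&tv, NULL);

    assert(rc == 0);
    return (long)(tv.tv_sec - t);
}
```

Running a caller of this helper under a syscall tracer is one way to see whether the calls show up as real syscalls on a given box.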


Re: [PATCH] atl1: disable 64bit DMA

2007-06-26 Thread Jay Cliburn
On Mon, 25 Jun 2007 23:18:55 +0200
Luca Tettamanti <[EMAIL PROTECTED]> wrote:

> Il Mon, Jun 25, 2007 at 07:42:44AM -0500, Jay Cliburn ha scritto: 
> > Jay L. T. Cornwall wrote:
> > >Jay Cliburn wrote:
> > >
> > >>For reasons not yet clear to me, it appears the L1 driver has a
> > >>bug or the device itself has trouble with DMA in high memory.
> > >>This patch, drafted by Luca Tettamanti, is being explored as a
> > >>workaround.  I'd be interested to know if it fixes your problem.
> > >
> > >Yes, it certainly seems to. Now running with this patch and 4GB
> > >active, I've transferred about 15GB with no problem so far. It
> > >usually oopses after a GB or two.
> > >
> > >I guess it's not an ideal solution, architecturally speaking, but
> > >it's a good deal better than an unstable driver. If there's any
> > >other patches you'd like me to test or traces to capture, I'm
> > >happy to help out. Otherwise I'll run with this one for now since
> > >it does the job!
> > 
> > Okay Jay, thanks.
> > 
> > Luca, would you please submit your patch to Jeff Garzik and netdev?
> 
> Hi Jeff,
> a couple of users reported hard lockups when using L1 NICs on machines
> with 4GB or more of RAM. We're still waiting official confirmation
> from the vendor, but it seems that L1 has problems doing DMA to/from
> high memory (physical address above the 4GB limit). Passing 32bit DMA
> mask cures the problem.
> 
> Signed-Off-By: Luca Tettamanti <[EMAIL PROTECTED]>
> 
> ---
> I think that the patch should be included in 2.6.22.
> 
>  drivers/net/atl1/atl1_main.c |   15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
> index 6862c11..a730f15 100644
> --- a/drivers/net/atl1/atl1_main.c
> +++ b/drivers/net/atl1/atl1_main.c
> @@ -2097,21 +2097,16 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
>   struct net_device *netdev;
>   struct atl1_adapter *adapter;
>   static int cards_found = 0;
> - bool pci_using_64 = true;
>   int err;
>  
>   err = pci_enable_device(pdev);
>   if (err)
>   return err;
>  
> - err = pci_set_dma_mask(pdev, DMA_64BIT_MASK);
> + err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
>   if (err) {
> - err = pci_set_dma_mask(pdev, DMA_32BIT_MASK);
> - if (err) {
> - dev_err(&pdev->dev, "no usable DMA configuration\n");
> - goto err_dma;
> - }
> - pci_using_64 = false;
> + dev_err(&pdev->dev, "no usable DMA configuration\n");
> + goto err_dma;
>   }
>   /* Mark all PCI regions associated with PCI device
>* pdev as being reserved by owner atl1_driver_name
> @@ -2176,7 +2171,6 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
>   netdev->ethtool_ops = _ethtool_ops;
>   adapter->bd_number = cards_found;
> - adapter->pci_using_64 = pci_using_64;
>  
>   /* setup the private structure */
>   err = atl1_sw_init(adapter);
> @@ -2193,9 +2187,6 @@ static int __devinit atl1_probe(struct pci_dev
> *pdev, */
>   /* netdev->features |= NETIF_F_TSO; */
>  
> - if (pci_using_64)
> - netdev->features |= NETIF_F_HIGHDMA;
> -
>   netdev->features |= NETIF_F_LLTX;
>  
>   /*

Acked-by: Jay Cliburn <[EMAIL PROTECTED]>
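
What a 32-bit DMA mask means for an address can be sketched in user space as below. This is an illustration under assumptions, not driver code: the SKETCH_* constants and helper names are hypothetical stand-ins for the kernel's mask constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel's mask constants. */
#define SKETCH_DMA_32BIT_MASK 0x00000000ffffffffULL
#define SKETCH_DMA_64BIT_MASK 0xffffffffffffffffULL

/* Returns 1 if every set bit of addr is covered by mask, else 0. */
int addr_fits_dma_mask(uint64_t addr, uint64_t mask)
{
    return (addr & ~mask) == 0;
}

/* Small self-check around the 4GB boundary discussed in the thread. */
void dma_mask_selfcheck(void)
{
    assert(addr_fits_dma_mask(0xfffff000ULL, SKETCH_DMA_32BIT_MASK));
    assert(!addr_fits_dma_mask(0x100000000ULL, SKETCH_DMA_32BIT_MASK));
    assert(addr_fits_dma_mask(0x100000000ULL, SKETCH_DMA_64BIT_MASK));
}
```

With the 32-bit mask, any buffer at or above 0x100000000 fails the check, which is the case the patch keeps the device away from.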


Re: [PATCH 16/16] fix handling of integer constant expressions

2007-06-26 Thread Al Viro
On Wed, Jun 27, 2007 at 08:32:26AM +0900, Neil Booth wrote:
> Al Viro wrote:-
> 
> > Hopefully correct handling of integer constant expressions.  Please, review.
> 
> Am I invoking sparse wrongly?  ./sparse -W -Wall doesn't diagnose
> the following TU, for example.
> 
> extern int a;
> extern int as1[(a = 2)];

sparse simply doesn't check that.  We don't have anything resembling
support of VLA.  Note that check for integer constant expression
has nothing to do with that;

int x[(int)(0.6 + 0.6)];

is valid (if stupid).  And yes, footnote in 6.6 contradicts 6.7.5.2(1);
too bad...

We certainly need to do checks on array sizes; however, that part
("if it has static storage duration, it should not be a VLA") is minor.
And then there are gccisms:
size_t foo(int n)
{
        struct {
                int a[n];
                char b;
        } x;
        return offsetof(typeof(x), b);
}

Yes, it's eaten up just fine.  And yes, such structures are silently
accepted even with -pedantic -std=c99, which is a bug.  Sigh...

We'll need to tackle VLAs at some point, but it certainly won't be fun ;-/
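
For contrast with the struct case above, the plain C99 form of a VLA is an automatic array whose sizeof is evaluated at run time. A minimal sketch, assuming a compiler with VLA support; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of an automatic VLA of n ints.
 * Unlike a fixed-size array, sizeof is evaluated at run time here. */
size_t vla_bytes(int n)
{
    assert(n > 0);      /* a VLA must have a positive length */
    int a[n];

    return sizeof a;
}
```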


Re: [patch, v2.6.22-rc6] sys_time() speedup

2007-06-26 Thread Andrea Arcangeli
On Tue, Jun 26, 2007 at 10:13:31AM -0700, Ray Lee wrote:
> faster? Weird. It shouldn't. They must be doing something wrong,
> therefore the patch is stupid."

Just in case it's not obvious, the above are Ray Lee's words, not mine.

---
#!/usr/bin/env stap 

# edited top.stp from systemtap

global syscalls

function print_top () {
        printf ("SYSCALL\t\t\t\tCOUNT\n")
        foreach ([name] in syscalls- limit 20)
                printf("%-20s\t\t%5d\n",name, syscalls[name])
        printf("--\n")
}

probe syscall.time {
        syscalls[probefunc()]++
}
probe syscall.gettimeofday {
        syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
        print_top ()
}
---

The above was run during various huge SQL operations, with a real-life
postgresql app running SQL in a loop for a minute or so (sorry, no mysql
setup, but the world isn't mysql; I'd rather see oracle, if
anything):

SYSCALL COUNT
sys_gettimeofday 4998
sys_time  120
--
SYSCALL COUNT
sys_gettimeofday 9989
sys_time  185
--
SYSCALL COUNT
sys_gettimeofday15219
sys_time  335
--
SYSCALL COUNT
sys_gettimeofday21215
sys_time  428
--
SYSCALL COUNT
sys_gettimeofday26194
sys_time  629
--
SYSCALL COUNT
sys_gettimeofday30752
sys_time  734
--
SYSCALL COUNT
sys_gettimeofday37379
sys_time  976
--
SYSCALL COUNT
sys_gettimeofday42381
sys_time 1125
--
SYSCALL COUNT
sys_gettimeofday47722
sys_time 1391
--
SYSCALL COUNT
sys_gettimeofday53138
sys_time 1520
--
SYSCALL COUNT
sys_gettimeofday57499
sys_time 1651
--
SYSCALL COUNT
sys_gettimeofday62314
sys_time 1712
--
SYSCALL COUNT
sys_gettimeofday66874
sys_time 1827
--
SYSCALL COUNT
sys_gettimeofday71757
sys_time 2007
--
SYSCALL COUNT
sys_gettimeofday76335
sys_time 2240
SYSCALL COUNT
sys_gettimeofday80469
sys_time 2354
--
SYSCALL COUNT
sys_gettimeofday85420
sys_time 2519
--
SYSCALL COUNT
sys_gettimeofday90662
sys_time 2648
--
SYSCALL COUNT
sys_gettimeofday95513
sys_time 2909
--
SYSCALL COUNT
sys_gettimeofday100767
sys_time 3111
--
SYSCALL COUNT
sys_gettimeofday106553
sys_time 3427
--
SYSCALL COUNT
sys_gettimeofday112300
sys_time 3673
--
SYSCALL COUNT
sys_gettimeofday115706
sys_time 3793
SYSCALL COUNT
sys_gettimeofday119842
sys_time 3893
--
SYSCALL COUNT
sys_gettimeofday123054
sys_time 4113
--
SYSCALL COUNT
sys_gettimeofday126286
sys_time 4250
--
SYSCALL COUNT
sys_gettimeofday129077
sys_time 

[drm patch for 2.6.22-rc6] Add some pci ids for XGI chips

2007-06-26 Thread Dave Airlie


The attached patch just adds some XGI pci ids to the SIS driver. It should 
be harmless at this stage.


Dave.

From 02031bf9190f6698e46e157196086ff416d0bf9b Mon Sep 17 00:00:00 2001
From: Ian Romanick <[EMAIL PROTECTED]>
Date: Wed, 27 Jun 2007 06:38:00 +1000
Subject: [PATCH] Add support SiS based XGI chips to SiS DRM.

This adds support for some of the XGI Volari family that are based on the
SiS.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>
---
 drivers/char/drm/drm_pciids.h |2 ++
 drivers/char/drm/sis_drv.h|8 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/char/drm/drm_pciids.h b/drivers/char/drm/drm_pciids.h
index aa63350..30b200b 100644
--- a/drivers/char/drm/drm_pciids.h
+++ b/drivers/char/drm/drm_pciids.h
@@ -219,6 +219,8 @@
{0x1039, 0x6300, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
{0x1039, 0x6330, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SIS_CHIP_315}, \
{0x1039, 0x7300, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, \
+   {0x18CA, 0x0040, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SIS_CHIP_315}, \
+   {0x18CA, 0x0042, PCI_ANY_ID, PCI_ANY_ID, 0, 0, SIS_CHIP_315}, \
{0, 0, 0}
 
 #define tdfx_PCI_IDS \
diff --git a/drivers/char/drm/sis_drv.h b/drivers/char/drm/sis_drv.h
index 2b8d6f6..70d4ede 100644
--- a/drivers/char/drm/sis_drv.h
+++ b/drivers/char/drm/sis_drv.h
@@ -33,11 +33,11 @@
 
 #define DRIVER_AUTHOR  "SIS, Tungsten Graphics"
 #define DRIVER_NAME"sis"
-#define DRIVER_DESC"SIS 300/630/540"
-#define DRIVER_DATE"20060704"
+#define DRIVER_DESC"SIS 300/630/540 and XGI V3XE/V5/V8"
+#define DRIVER_DATE"20070626"
 #define DRIVER_MAJOR   1
-#define DRIVER_MINOR   2
-#define DRIVER_PATCHLEVEL  1
+#define DRIVER_MINOR   3
+#define DRIVER_PATCHLEVEL  0
 
 enum sis_family {
SIS_OTHER = 0,
-- 
1.4.4.2
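
How a driver id table of this shape is consulted can be sketched as follows. This is a simplified illustration under assumptions: the struct keeps only two of the fields seen in the table above, and all sketch_* names are hypothetical. Only the two vendor/device pairs come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a PCI id table entry. */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* The two XGI ids added by the patch, terminated by a zero entry. */
static const struct sketch_pci_id sketch_xgi_ids[] = {
    { 0x18CA, 0x0040 },
    { 0x18CA, 0x0042 },
    { 0, 0 }
};

/* Returns 1 if (vendor, device) appears in the table, else 0. */
int sketch_id_matches(uint16_t vendor, uint16_t device)
{
    const struct sketch_pci_id *id;

    for (id = sketch_xgi_ids; id->vendor != 0; id++) {
        if (id->vendor == vendor && id->device == device)
            return 1;
    }
    return 0;
}

/* Small self-check: a listed id matches, a neighbouring one does not. */
void sketch_id_selfcheck(void)
{
    assert(sketch_id_matches(0x18CA, 0x0040));
    assert(!sketch_id_matches(0x18CA, 0x0041));
}
```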



Re: [RFD 1/4] Pass no useless nameidata to the create, lookup, and permission IOPs

2007-06-26 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:
> The create, lookup, and permission inode operations are all passed a
> full nameidata.  This is unfortunate because in nfsd and the mqueue
> filesystem, we must instantiate a struct nameidata but cannot provide
> all of the same information that a regular lookup would provide.  The
> unused fields take up space on the stack, but more importantly, it is
> not obvious which fields have meaningful values and which don't, and so
> things might easily break.
> 
> This patch introduces struct nameidata2 with only the fields that make
> sense independent of an actual lookup, and uses that struct in those
> places where a full nameidata is not needed.

I agree w/ Trond that a better name is needed other than 'nameidata2',
esp. for something that's a sub-structure (perhaps start it with a '__'?)

These changes would probably help stackable file systems (e.g., eCryptfs and
esp. Unionfs) a lot, b/c stackable f/s often call the lower f/s to lookup
files and such; and in most cases, we just need to pass the intent down, not
the full VFS-level state info.

> +/**
> + * Fields shared between nameidata and nameidata2 -- nameidata2 could
> + * simply be embedded in nameidata, but then the vfs code would become
> + * cluttered with dereferences.
> + */
> +#define __NAMEIDATA2 \
> + struct dentry   *dentry;\
> + struct vfsmount *mnt;   \
> + unsigned intflags;  \
> + \
> + union { \
> + struct open_intent open;\
> + } intent;

Perhaps it is also time to put the dentry + mnt into a single struct path?
It's a small change, but it emphasizes that the two items here, dentry+mnt,
really define a single path to be passed around:

#define __NAMEIDATA \
struct path path;   \
unsigned int flags; \
...

Of course, you'll have to change instances of nd->dentry to nd->path.dentry
and so on.

Erez.
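
The struct path suggestion can be tried out in a self-contained sketch. The types below are hypothetical user-space stand-ins, far smaller than the real kernel structures, and the intent union is left out; sketch_nameidata and sketch_nd_name are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real kernel types are far larger. */
struct dentry   { const char *name; };
struct vfsmount { const char *devname; };

/* dentry + mnt bundled so the pair travels as one object. */
struct path {
    struct dentry   *dentry;
    struct vfsmount *mnt;
};

/* Reduced lookup state: the path plus flags (intent omitted here). */
struct sketch_nameidata {
    struct path  path;
    unsigned int flags;
};

/* Callers write nd->path.dentry where they used to write nd->dentry. */
const char *sketch_nd_name(const struct sketch_nameidata *nd)
{
    assert(nd != NULL && nd->path.dentry != NULL);
    return nd->path.dentry->name;
}
```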


Re: [PATCH try #2] security: Convert LSM into a static interface

2007-06-26 Thread Kyle Moffett

On Jun 26, 2007, at 09:47:12, Serge E. Hallyn wrote:
> Quoting Kyle Moffett ([EMAIL PROTECTED]):
>> On Jun 25, 2007, at 16:37:58, Andreas Gruenbacher wrote:
>>> It's useful for some LSMs to be modular, and LSMs which are y/n
>>> options won't have any security architecture issues with
>>> unloading at all. The mere fact that SELinux cannot be built as a
>>> module is a rather weak argument for disabling LSM modules as a
>>> whole, so please don't.
>>
>> Here are a few questions for you:
>>
>>   1)  What do you expect to happen to all the megs of security
>> data when you "rmmod selinux"?
>
> Read the sentence right above yours again.
>
> No one is saying we should be able to rmmod selinux.


Ok, so say we extend LSM to do what AppArmor or TOMOYO need, what do  
you expect to happen when you "rmmod tomoyo", "rmmod apparmor", or  
whatever?  Each of those is also going to stick lots of context on  
various objects during the course of running, the same way that the  
VM subsystem sticks lots of context on filesystem pages while  
running.  Besides, even the standard "capabilities" module wants to  
attach a list of capabilities to every process and defines  
inheritance rules for them.  Ergo you have the problems described below:


>> Do you maintain a massive linked list of security data (with all
>> the locking and performance problems) so that you can iterate over
>> it calling kfree()? What synchronization primitive do we have
>> right now which could safely stop all CPUs outside of security
>> calls while we NULL out and free security data and disable
>> security operations?  Don't say "software suspend" and "process
>> freezer", since those have whole order-of-magnitude-complexity
>> problems of their own (and don't always work right either).
>>
>>   2)  When you "modprobe my_custom_security_module", how exactly
>> do you expect that all the processes, files, shared memory
>> segments, file descriptors, sockets, SYSV mutexes, packets, etc
>> will get appropriate security pointers?
>
> Those don't all need labels for capabilities, for instance.  This
> question is as wrong as the last one.


Ok, so let's just restrict ourselves to the simple dumb-as-dirt  
capabilities module.  Every process is "labeled" with capabilities  
while running under that LSM, right?  What happens when you "rmmod  
capabilities"?  Do you iterate over all the processes to remove their  
security data even while they may be using it?  Or do you just let it  
leak?  Some daemons test if capabilities are supported, and if so  
they modify their capability set instead of forking a high-priv and a  
low-priv process and doing IPC.  When you remove the capabilities  
module, suddenly all those programs will lose that critical "low- 
privilege" data and become full root.  What happens later when you  
"modprobe capabilities"?  Do you suddenly have to stop the system  
while you iterate over EVERY process to set capabilities based on  
whether it's root or not?  It's also impossible to determine from a  
given state in time what processes should have capabilities, as the  
model includes inheritance, which includes processes that don't even  
exist anymore.


>> 3)  This sounds suspiciously like "The mere fact that the
>> Linux-2.6-VM cannot be built as a module is a rather weak argument
>> for disabling VFS modules as a whole".
>
> No, your argument sounds like "my fs can't be a module so neither
> should any."


Let's go over the differences between "my fs" and "my LSM", and the  
similarities between "my VM" and "my LSM":  Filesystems don't get  
hooked from virtually every userspace-initiated operation, whereas  
both VMs and LSMs do.  VMs and LSMs attach anonymous state data to a  
large percentage of the allocated objects in the system, whereas  
filesystems allocate their own independent datastructure and use  
that.  Would you want to "rmmod ext3" and then "modprobe ext2" while  
you have an ext2-as-ext3 filesystem *mounted*???  If you want a good  
analogy, that's a better one than the "my fs can't be a module" crap.


This whole discussion boils down to 2 points:
  1) As currently implemented, no LSM may be safely rmmod-ed
  2) Someone has submitted a patch which fixes that problem (you  
can't rmmod them at all, so no crashes)


If you really want to do modular LSMs, then you need to submit a
patch which fixes all the race conditions in LSM removal *without*
adding much extra overhead.  I'm sure if your solution works then
everyone will be much more open to modular LSMs.  I said this before:


>> So... Do you have a proposal for solving those rather fundamental
>> design gotchas?  If so, I'm sure everybody here would love to see
>> your patch


Cheers,
Kyle Moffett



Re: [PATCH 7/7][TAKE5] ext4: support new modes

2007-06-26 Thread David Chinner
On Wed, Jun 27, 2007 at 12:59:08AM +0530, Amit K. Arora wrote:
> On Tue, Jun 26, 2007 at 12:14:00PM -0400, Andreas Dilger wrote:
> > On Jun 26, 2007  17:37 +0530, Amit K. Arora wrote:
> > > > I also thought another proposed flag was to determine whether mtime
> > > > (and maybe ctime) is changed when doing prealloc/dealloc space?
> > > > Default should probably be to change mtime/ctime, and have
> > > > FA_FL_NO_MTIME.  Someone else should decide if we want to allow
> > > > changing the file w/o changing ctime, if that is required even
> > > > though the file is not visibly changing.  Maybe the ctime update
> > > > should be implicit if the size or mtime are changing?
> > > 
> > > Is it really required ? I mean, why should we allow users not to update
> > > ctime/mtime even if the file metadata/data gets updated ? It sounds
> > > a bit "unnatural" to me.
> > > Is there any application scenario in your mind, when you suggest of
> > > giving this flexibility to userspace ?
> > 
> > One reason is that XFS does NOT update the mtime/ctime when doing the
> > XFS_IOC_* allocation ioctls.

Not totally correct.

XFS_IOC_ALLOCSP/FREESP change timestamps if they change
the file size (via the truncate call made to change the file size).
If they don't change the file size, then they are a no-op and should
not change the file size.

XFS_IOC_RESVSP/UNRESVSP don't change timestamps just like they don't
change file size. That is by design AFAICT so these calls can be
used by HSM-type applications that don't want to change timestamps
when punching out data blocks or preallocating new ones.

> Hmm.. I personally will call it a bug in XFS code then. :)

No, I'd call it useful. :)

> > > I think, modifying ctime/mtime should be dependent on the other flags.
> > > E.g., if we do not zero out data blocks on allocation/deallocation,
> > > update only ctime. Otherwise, update ctime and mtime both.
> > 
> > I'm only being the advocate for requirements David Chinner has put
> > forward due to existing behaviour in XFS.  This is one of the reasons
> > why I think the "flags" mechanism we now have - we can encode the
> > various different behaviours in any way we want and leave it to the
> > caller.
> 
> I understand. Maybe we can confirm once more with David Chinner if this
> is really required. Will it really be a compatibility issue if new XFS
> preallocations (ie. via fallocate) update mtime/ctime?

It should be left up to the filesystem to decide. Only the
filesystem knows whether something changed and the timestamp should
or should not be updated.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


NDAs - ANY KNOWN RULES?

2007-06-26 Thread hermann pitton

Hi,

such stuff has been causing a lot of trouble for a long time.

Are there any rules, or can everybody go on as some sort of freelancer
exclusively on such? I don't like it!

Cheers,
Hermann




Re: [AppArmor 00/44] AppArmor security module overview

2007-06-26 Thread Andrew Morton
On Tue, 26 Jun 2007 16:07:56 -0700
[EMAIL PROTECTED] wrote:

> This post contains patches to include the AppArmor application security
> framework, with request for inclusion into -mm for wider testing.

Patches 24 and 31 didn't come through.

Rolled-up diffstat (excluding 24&31):

 fs/attr.c|7 
 fs/dcache.c  |  181 ++-
 fs/ecryptfs/inode.c  |   41 
 fs/exec.c|3 
 fs/fat/file.c|2 
 fs/hpfs/namei.c  |2 
 fs/namei.c   |  115 +-
 fs/nfsd/nfs4recover.c|7 
 fs/nfsd/nfs4xdr.c|2 
 fs/nfsd/vfs.c|   89 +
 fs/ntfs/file.c   |2 
 fs/open.c|   50 
 fs/reiserfs/file.c   |2 
 fs/reiserfs/xattr.c  |8 
 fs/splice.c  |4 
 fs/stat.c|2 
 fs/sysfs/file.c  |2 
 fs/utimes.c  |   11 
 fs/xattr.c   |   75 -
 fs/xfs/linux-2.6/xfs_lrw.c   |2 
 include/linux/audit.h|   12 
 include/linux/fs.h   |   27 
 include/linux/nfsd/nfsd.h|3 
 include/linux/security.h |  182 ++-
 include/linux/sysctl.h   |2 
 include/linux/xattr.h|   11 
 ipc/mqueue.c |2 
 kernel/audit.c   |6 
 kernel/sysctl.c  |   27 
 mm/filemap.c |   12 
 mm/filemap_xip.c |2 
 mm/shmem.c   |2 
 mm/tiny-shmem.c  |2 
 net/unix/af_unix.c   |2 
 security/Kconfig |1 
 security/Makefile|1 
 security/apparmor/Kconfig|   10 
 security/apparmor/Makefile   |   13 
 security/apparmor/apparmor.h |  265 +
 security/apparmor/apparmorfs.c   |  252 +
 security/apparmor/inline.h   |  211 
 security/apparmor/list.c |   94 +
 security/apparmor/locking.txt|   68 +
 security/apparmor/lsm.c  |  817 
 security/apparmor/main.c | 1255 +
 security/apparmor/match.c|  248 
 security/apparmor/match.h|   83 +
 security/apparmor/module_interface.c |  589 +++
 security/apparmor/procattr.c |  155 +++
 security/commoncap.c |7 
 security/dummy.c |   43 
 security/selinux/hooks.c |   94 -
 52 files changed, 4701 insertions(+), 404 deletions(-)

which seems OK.


so...  where do we stand with this?  Fundamental, irreconcilable
differences over the use of pathname-based security?

Are there any other sticking points?




Re: [RFD 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs

2007-06-26 Thread Trond Myklebust
On Tue, 2007-06-26 at 16:15 -0700, [EMAIL PROTECTED] wrote:
> To remove conditionally passing of vfsmounts to the LSM, a nameidata
> struct can be instantiated in the nfsd and mqueue filesystems.  This
> however results in useless information being passed down, as not
> all fields in the nameidata struct will be meaningful.  The nameidata
> struct is split creating struct nameidata2 that contains only the
> fields that will carry meaningful information.

I don't object to the concept per se, but could you please give it a
more descriptive name? "struct vfs_intent" would be a lot more
accurate than "nameidata2".

Trond


