date:20070801

Re: [RFC][PATCH] Removal of duplicated include arch/x86_64/kernel/pci-calgary.c

2007-08-01 Thread Muli Ben-Yehuda

On Wed, Aug 01, 2007 at 07:53:15PM +0200, Michal Piotrowski wrote:
> Hi,
> 
> There is no need to include linux/init.h twice

Thanks, looks good. Will be pushed for 2.6.24.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH] type safe allocator

2007-08-01 Thread Christoph Lameter

On Wed, 1 Aug 2007, Miklos Szeredi wrote:

> I wonder why we don't have type safe object allocators a-la new() in
> C++ or g_new() in glib?
> 
>   fooptr = k_new(struct foo, GFP_KERNEL);
> 
> is nicer and more descriptive than
> 
>   fooptr = kmalloc(sizeof(*fooptr), GFP_KERNEL);
> 
> and more safe than
> 
>   fooptr = kmalloc(sizeof(struct foo), GFP_KERNEL);
> 
> And we have zillions of both variants.

Hmmm yes I think that would be good. However, please clean up the naming.
The variants for zeroing get to be too much.

> + * k_new - allocate given type object
> + * @type: the type of the object to allocate
> + * @flags: the type of memory to allocate.
> + */
> +#define k_new(type, flags) ((type *) kmalloc(sizeof(type), flags))

kalloc?

> +
> + * k_new0 - allocate given type object, zero out allocated space
> + * @type: the type of the object to allocate
> + * @flags: the type of memory to allocate.
> + */
> +#define k_new0(type, flags) ((type *) kzalloc(sizeof(type), flags))

A new notation for zeroing! This is equivalent to

kalloc(type, flags | __GFP_ZERO)

maybe define new GFP_xxx instead?

> +/**
> + * k_new_array - allocate array of given type object
> + * @type: the type of the object to allocate
> + * @len: the length of the array
> + * @flags: the type of memory to allocate.
> + */
> +#define k_new_array(type, len, flags) \
> + ((type *) kmalloc(sizeof(type) * (len), flags))

We already have array initializations using kcalloc.

> +#define k_new0_array(type, len, flags) \
> + ((type *) kzalloc(sizeof(type) * (len), flags))

Same as before.


I do not see any _node variants?

How about the following minimal set


kmalloc(size, flags)            kalloc(struct, flags)
kmalloc_node(size, flags, node) kalloc_node(struct, flags, node)


The array variants translate into kmalloc anyway and are used
in an inconsistent manner. Sometimes this way, sometimes the other. Leave
them?

kcalloc(n, size, flags) == kmalloc(n * size, flags | __GFP_ZERO)

Then kzalloc is equivalent to adding the __GFP_ZERO flag. Thus

kzalloc(size, flags) == kmalloc(size, flags | __GFP_ZERO)

If you define a new flag like GFP_ZERO_ATOMIC and GFP_ZERO_KERNEL you 
could do

kalloc(struct, GFP_ZERO_KERNEL)

instead of adding new variants?
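
A minimal userspace sketch of the idea above: one type-safe macro, with zeroing selected by a flag rather than by a second macro. Everything prefixed demo_/DEMO_ is a hypothetical stand-in (malloc() plays the role of kmalloc(), and the flag values are made up); it is not the proposed kernel interface.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for GFP flags; only the zeroing bit matters here. */
#define DEMO_GFP_KERNEL      0x1u
#define DEMO_GFP_ZERO        0x2u
#define DEMO_GFP_ZERO_KERNEL (DEMO_GFP_KERNEL | DEMO_GFP_ZERO)

/* Userspace stand-in for kmalloc(): malloc() plus optional zeroing. */
static void *demo_kmalloc(size_t size, unsigned int flags)
{
    void *p = malloc(size);

    if (p && (flags & DEMO_GFP_ZERO))
        memset(p, 0, size);
    return p;
}

/*
 * Type-safe wrapper: the cast turns the result into a "type *", so
 * assigning it to a pointer of another type draws a compiler diagnostic.
 */
#define demo_kalloc(type, flags) \
    ((type *)demo_kmalloc(sizeof(type), (flags)))

struct foo {
    int a;
    long b;
};

/* Allocate a zeroed struct foo through the single flag-driven entry point. */
static struct foo *foo_create(void)
{
    return demo_kalloc(struct foo, DEMO_GFP_ZERO_KERNEL);
}
```

With this shape the zeroed and non-zeroed cases differ only in the flag argument, so no k_new0-style variant is needed.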


Re: [patch 3/4] Enable link power management for ata drivers

2007-08-01 Thread Tejun Heo

Kristen Carlson Accardi wrote:
> On Wed, 01 Aug 2007 17:27:39 +0900
> Tejun Heo <[EMAIL PROTECTED]> wrote:
> 
>> Kristen Carlson Accardi wrote:
> 
>> Is it safe to use ALPM on a device which only claims to support DIPM?
> 
> Yes - I double-checked this with the AHCI people - and of course you
> have Edvin's testing to prove it does fine.

Alright.

> As far as moving the enable/disable_pm calls to EH - can you take
> a look at the other patch I sent which implements the shost_attrs
> to see if I still need to do this?  I really don't know much about
> the EH stuff - can you explain why we need to use it to set the
> link pm?

Unfortunately, yes, you still need to.  Only two threads of execution
(one is not a real thread tho) are allowed to access a libata port -
command execution and EH, and the two are mutually exclusive.  Invoking
something from the outside is done by doing the following.

1. recording what to do in ehi->[dev_]action, ap->pflags or dev->flags

2. schedule EH by calling either ata_port_schedule_eh(),
ata_port_abort() or ata_port_freeze().  The first one waits for the
currently in-flight commands to finish before entering EH.  The second
one aborts all in-flight commands and enters EH.  The third one aborts
all commands and freezes the port and enters EH.

3. wait for EH to finish by calling ata_port_wait_eh().

This achieves correct synchronization and other EH functionalities can
be easily used.  e.g. Resuming requires resetting the bus and
revalidating the attached devices, so the resume handler can just
request such actions together.  For link PS, I think it would probably
be a good idea to revalidate after mode change to make sure the device
works in the new mode.  If revalidation fails, it can reset and back off.
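
The three steps above can be modelled in a few lines of plain C. This is a single-threaded toy, not libata code: the struct, the DEMO_ action bits and the demo_ functions are all hypothetical, and the "wait for EH" step is reduced to calling the EH function directly.

```c
#include <stdbool.h>

/* Hypothetical action bits; libata records real requests in ehi->action. */
#define DEMO_EH_LINK_PM_ON 0x1u
#define DEMO_EH_REVALIDATE 0x2u

struct demo_port {
    unsigned int eh_action;     /* step 1: what EH should do */
    bool eh_scheduled;          /* step 2: EH has been asked to run */
    bool link_pm_enabled;       /* state that only EH may change */
    unsigned int revalidations; /* how often EH revalidated the device */
};

/* Steps 1 and 2: record the request and schedule EH; do not touch the port. */
static void demo_request_link_pm(struct demo_port *ap)
{
    ap->eh_action |= DEMO_EH_LINK_PM_ON | DEMO_EH_REVALIDATE;
    ap->eh_scheduled = true;
}

/* The EH side: the only place that acts on the recorded request. */
static void demo_run_eh(struct demo_port *ap)
{
    if (!ap->eh_scheduled)
        return;
    if (ap->eh_action & DEMO_EH_LINK_PM_ON)
        ap->link_pm_enabled = true;
    if (ap->eh_action & DEMO_EH_REVALIDATE)
        ap->revalidations++;
    ap->eh_action = 0;
    ap->eh_scheduled = false;
}
```

The point of the split is that the requester never changes link_pm_enabled itself; it only leaves a note for EH, which is what keeps the two contexts mutually exclusive.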

EH is done in three large steps - autopsy, report and recover.  To
implement an action, the 'recover' stage needs to be extended.  It
basically comes down to hooking the enable/disable functions into the
right places in ata_eh_recover().  Unconditionally disabling link PS
prior to reset and enabling it back again before revalidation would be a
pretty good choice, but haven't thought about it too hard so take it
with a grain of salt.

I'm not sure whether it would be necessary now but it would be nice to
have a proper recovery logic later.  e.g. If more than a certain number of
ATA bus errors occur in a certain amount of time, disable link PS.  This kind of
logic is used during autopsy to determine whether stepping down
link/transfer speed is needed.  Please take a look at
ata_eh_speed_down().  It might be enough to piggy back on
ata_eh_speed_down() tho such that the first step of speeding down is
turning off link PS.
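
As a rough illustration of the "too many errors in a given time" rule, here is a small ring of timestamps in plain C. It is only a sketch of the policy described above, not what ata_eh_speed_down() does; the slot count, the window length and all demo_ names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ERR_SLOTS  4    /* hypothetical: errors that trigger the policy */
#define DEMO_ERR_WINDOW 600u /* hypothetical: window length in seconds */

struct demo_err_ring {
    uint32_t stamp[DEMO_ERR_SLOTS]; /* times of the most recent errors */
    unsigned int next;              /* slot the next error is written to */
    unsigned int count;             /* errors recorded, capped at the slot count */
};

/*
 * Record an error at time "now" (seconds). Returns true once the last
 * DEMO_ERR_SLOTS errors all fall within DEMO_ERR_WINDOW seconds, i.e. the
 * point at which a caller might decide to turn link PS off.
 */
static bool demo_record_error(struct demo_err_ring *r, uint32_t now)
{
    r->stamp[r->next] = now;
    r->next = (r->next + 1) % DEMO_ERR_SLOTS;
    if (r->count < DEMO_ERR_SLOTS)
        r->count++;
    if (r->count < DEMO_ERR_SLOTS)
        return false;
    /* With a full ring, r->next indexes the oldest recorded error. */
    return now - r->stamp[r->next] <= DEMO_ERR_WINDOW;
}
```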

Hope the brief introduction to libata-EH-hacking helps.

-- 
tejun

[patch] radix-tree: use indirect bit

2007-08-01 Thread Nick Piggin


Rather than sign direct radix-tree pointers with a special bit, sign
the indirect one that hangs off the root. This means that, given a
lookup_slot operation, the invalid result will be differentiated from
the valid (previously, valid results could have the bit either set or
clear).

This does not affect slot lookups which occur under lock -- they
can never return an invalid result. It is needed in future for the
lockless pagecache.
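
For readers unfamiliar with low-bit pointer tagging, the scheme can be tried out in userspace. The sketch below mirrors the helpers in the diff that follows, but with demo_ names and uintptr_t casts; it assumes the tagged objects are at least 2-byte aligned so the low bit is free.

```c
#include <stdint.h>

#define DEMO_INDIRECT_PTR ((uintptr_t)1)
#define DEMO_RETRY        ((void *)-1UL)

/* Mark a pointer to an internal node by setting its low bit. */
static void *demo_ptr_to_indirect(void *ptr)
{
    return (void *)((uintptr_t)ptr | DEMO_INDIRECT_PTR);
}

/* Strip the mark again to get a usable node pointer. */
static void *demo_indirect_to_ptr(void *ptr)
{
    return (void *)((uintptr_t)ptr & ~DEMO_INDIRECT_PTR);
}

static int demo_is_indirect_ptr(void *ptr)
{
    return (int)((uintptr_t)ptr & DEMO_INDIRECT_PTR);
}

/*
 * A slot lookup that finds a marked pointer has raced with a tree
 * extension; report that with a distinct value instead of a bogus item.
 */
static void *demo_deref_slot(void **pslot)
{
    void *ret = *pslot;

    if (demo_is_indirect_ptr(ret))
        ret = DEMO_RETRY;
    return ret;
}
```

Because data items never carry the bit, a caller that sees DEMO_RETRY knows it must repeat the lookup.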

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/radix-tree.h
===
--- linux-2.6.orig/include/linux/radix-tree.h
+++ linux-2.6/include/linux/radix-tree.h
@@ -26,28 +26,31 @@
 #include 
 
 /*
- * A direct pointer (root->rnode pointing directly to a data item,
- * rather than another radix_tree_node) is signalled by the low bit
- * set in the root->rnode pointer.
- *
- * In this case root->height is also NULL, but the direct pointer tests are
- * needed for RCU lookups when root->height is unreliable.
+ * An indirect pointer (root->rnode pointing to a radix_tree_node, rather
+ * than a data item) is signalled by the low bit set in the root->rnode
+ * pointer.
+ *
+ * In this case root->height is > 0, but the indirect pointer tests are
+ * needed for RCU lookups (because root->height is unreliable). The only
+ * time callers need worry about this is when doing a lookup_slot under
+ * RCU.
  */
-#define RADIX_TREE_DIRECT_PTR  1
+#define RADIX_TREE_INDIRECT_PTR 1
+#define RADIX_TREE_RETRY ((void *)-1UL)
 
-static inline void *radix_tree_ptr_to_direct(void *ptr)
+static inline void *radix_tree_ptr_to_indirect(void *ptr)
 {
-   return (void *)((unsigned long)ptr | RADIX_TREE_DIRECT_PTR);
+   return (void *)((unsigned long)ptr | RADIX_TREE_INDIRECT_PTR);
 }
 
-static inline void *radix_tree_direct_to_ptr(void *ptr)
+static inline void *radix_tree_indirect_to_ptr(void *ptr)
 {
-   return (void *)((unsigned long)ptr & ~RADIX_TREE_DIRECT_PTR);
+   return (void *)((unsigned long)ptr & ~RADIX_TREE_INDIRECT_PTR);
 }
 
-static inline int radix_tree_is_direct_ptr(void *ptr)
+static inline int radix_tree_is_indirect_ptr(void *ptr)
 {
-   return (int)((unsigned long)ptr & RADIX_TREE_DIRECT_PTR);
+   return (int)((unsigned long)ptr & RADIX_TREE_INDIRECT_PTR);
 }
 
 /*** radix-tree API starts here ***/
@@ -130,7 +133,10 @@ do {   
\
  */
 static inline void *radix_tree_deref_slot(void **pslot)
 {
-   return radix_tree_direct_to_ptr(*pslot);
+   void *ret = *pslot;
+   if (unlikely(radix_tree_is_indirect_ptr(ret)))
+   ret = RADIX_TREE_RETRY;
+   return ret;
 }
 /**
  * radix_tree_replace_slot - replace item in a slot
@@ -142,10 +148,8 @@ static inline void *radix_tree_deref_slo
  */
 static inline void radix_tree_replace_slot(void **pslot, void *item)
 {
-   BUG_ON(radix_tree_is_direct_ptr(item));
-   rcu_assign_pointer(*pslot,
-   (void *)((unsigned long)item |
-   ((unsigned long)*pslot & RADIX_TREE_DIRECT_PTR)));
+   BUG_ON(radix_tree_is_indirect_ptr(item));
+   rcu_assign_pointer(*pslot, item);
 }
 
 int radix_tree_insert(struct radix_tree_root *, unsigned long, void *);
Index: linux-2.6/lib/radix-tree.c
===
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -104,7 +104,7 @@ radix_tree_node_alloc(struct radix_tree_
rtp->nr--;
}
}
-   BUG_ON(radix_tree_is_direct_ptr(ret));
+   BUG_ON(radix_tree_is_indirect_ptr(ret));
return ret;
 }
 
@@ -240,7 +240,7 @@ static int radix_tree_extend(struct radi
return -ENOMEM;
 
/* Increase the height.  */
-   node->slots[0] = radix_tree_direct_to_ptr(root->rnode);
+   node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
 
/* Propagate the aggregated tag info into the new root */
for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
@@ -251,6 +251,7 @@ static int radix_tree_extend(struct radi
newheight = root->height+1;
node->height = newheight;
node->count = 1;
+   node = radix_tree_ptr_to_indirect(node);
rcu_assign_pointer(root->rnode, node);
root->height = newheight;
} while (height > root->height);
@@ -274,7 +275,7 @@ int radix_tree_insert(struct radix_tree_
int offset;
int error;
 
-   BUG_ON(radix_tree_is_direct_ptr(item));
+   BUG_ON(radix_tree_is_indirect_ptr(item));
 
/* Make sure the tree is high enough.  */
if (index > radix_tree_maxindex(root->height)) {
@@ -283,7 +284,8 @@ int radix_tree_insert(struct radix_tree_
return error;
}
 
-   slot = root->rnode;
+   slot =

Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Ph. Marek

On Mittwoch, 1. August 2007, Josef Sipek wrote:
> Alright not the greatest of examples, there is something to be said about
> symmetry, so...let me try again :)
...
> Oops! There's a whiteout in /b that hides the directory in /c -- rename(2)
> shouldn't make directory subtrees disappear.
>
> There are two ways to solve this:
>
> 1) "cp -r" the entire subtree ...
>
> 2) Don't store whiteouts within branches ...
Sorry for making uninformed guesses, but if there are already special nodes
(whiteouts), why not extend them to a more general format - specifying a
(source, destination) pair at the topmost level?
- A delete is a (source, NULL) pair
- A rename is a (source, destination) pair, which causes lookups on source to
  use the string destination in the lower branches.


Would that work?


Regards,

Phil


Re: 2.6.23-rc1: no setup signature found... SOLVED!

2007-08-01 Thread Borislav Petkov

On Wed, Aug 01, 2007 at 10:36:07AM -0400, H. Peter Anvin wrote:
> Borislav Petkov wrote:
>>  Breakpoint 4, 0x00040200 in ?? ()
>>  1: x/i ($cs << 4) + $eip  0x40300:  lea(%si),%dx
>>  (gdb) c
>>  Continuing.
>> If i do delete here, it loads the second stage of grub and continues to
>> load the kernel. Is there another way to land at the jmp instruction
>> instead of poking blindly, maybe disassemble some parts of the initial
>> code. \me reading grub-docs...
>
> If you do "delete" without a breakpoint number, you're deleting all 
> breakpoints.  I just experimented with grub, and it looks like it should 
> break at 0x90200, so just set that breakpoint and none of the others.
>
>   -hpa

Hi,

now this is one of those cases where one tries to shoot a small fly with a
nuclear missile. The first assumption that something was wrong with the kernel
setup code was wrong and here's how i know:

The problem with my version of grub not hitting the breakpoint 0x90200 made me
think that something might be messed up in the grub part of the boot sequence.
Thus, i did the qemu simulation again and noticed on the initial boot screen of
grub it saying "Grub version 0.91." However, you remember from a different post
that the version of grub i have is the latest to be found in debian unstable,
0.97-29, so i thought that something has to be wrong with it and especially
with all those grub stage binaries, in my case in /boot/grub, which
grub-install sets up.
Checking their timestamps revealed that the files are from 2004 so i thought,
well, these are OLD! :) After refreshing the grub installation and replacing
the stages-binaries with the fresh ones, the kernel booted just fine :), here:

[EMAIL PROTECTED]:07:02:07:~:9994)->  uname -a
Linux gollum 2.6.22-4fd06960f120e02e9abc802a09f9511c400042a5-12 #12 PREEMPT Thu Jul 26 18:08:34 CEST 2007 i686 GNU/Linux

so i guess the problem was with the ancient parts of a grub installation i had
lying around which weren't replaced by the apt-get update process and somehow
messed up newer grub versions. Anyway, in the end one still learns a lot
while at it.

Thanks for your help.

-- 
Regards/Gruß,
Boris.

Re: [PATCH] Fix two potential mem leaks in MPT Fusion (mpt_attach())

2007-08-01 Thread Andrew Morton

On Wed, 1 Aug 2007 21:03:50 -0600 Matthew Wilcox <[EMAIL PROTECTED]> wrote:

> On Wed, Aug 01, 2007 at 05:26:53PM -0700, Andrew Morton wrote:
> > Why on earth is that using GFP_ATOMIC?  This function later goes on to
> > create procfs files and such things.
> 
> Seems fairly common in driver initialisation code.  I removed three
> instances of this in the advansys driver.

hrm.  People reach for GFP_ATOMIC so often that it becomes a habit, I guess.

It makes one wonder how much that lovely fault-injection framework is being
used.

> > y'know, we could have a debug option which will spit warnings if someone
> > does a !__GFP_WAIT allocation while !in_atomic() (only works if
> > CONFIG_PREEMPT).  
> > 
> > But please, make it depend on !CONFIG_AKPM.  I shudder to think about all
> > the stuff it would pick up.
> 
> Seems like you'd get a lot of false positives.

There would be a few.  mempool does a non-__GFP_WAIT allocation
deliberately, for example (I still think that's fishy btw).

But I don't expect there would be a large number of falsies.  We could add
a __GFP_I_REALLY_MEANT_ATOMIC flag to shut those up.
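
The check being discussed boils down to a small predicate. The sketch below is a userspace model only: the DEMO_ flag values are made up, and the in_atomic_ctx argument stands in for whatever the kernel would use to tell whether the caller can sleep.

```c
#include <stdbool.h>

/* Hypothetical flag bits, not the kernel's values. */
#define DEMO_GFP_WAIT                  0x1u
#define DEMO_GFP_I_REALLY_MEANT_ATOMIC 0x2u

/*
 * Warn when an allocation refuses to sleep although the caller could
 * have slept, unless the caller explicitly opted out of the warning.
 */
static bool demo_should_warn(unsigned int gfp, bool in_atomic_ctx)
{
    if (gfp & DEMO_GFP_WAIT)
        return false; /* allocation may sleep: nothing suspicious */
    if (in_atomic_ctx)
        return false; /* atomic allocation in atomic context is fine */
    return !(gfp & DEMO_GFP_I_REALLY_MEANT_ATOMIC);
}
```

A deliberate user such as the mempool case mentioned above would pass the opt-out flag and stay silent.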

>  How about a call:
> 
> slab_warn_about_atomic_allocs();
> 
> right before calling the initcalls, and then
> 
> slab_stop_warning_about_atomic_allocs();
> 
> after calling them?  That should give people a lot to chew on for a few
> months.  Obviously, you would need to not warn about allocations from
> interrupt context, as you say above.

Could.  But GFP_ATOMIC at initcall-time really isn't a problem (except that
it can probably also happen at modprobe-time).

What is the major concern is needlessly atomic allocations at regular
runtime.


Re: [PATCH 025 of 35] Treat rq->hard_nr_sectors as setting an overriding limit in the size of the request

2007-08-01 Thread Tejun Heo

Neil Brown wrote:
> On Thursday August 2, [EMAIL PROTECTED] wrote:
>> This is pretty confusing.  In all other places, bi_size -> #sector
>> conversion is done by rounding down but only in blk_rq_bio_prep() it's
>> being rounded up.
>>
>> Is my following reasoning correct?
>>
>> It was okay till now because unaligned requests don't get merged and
>> also haven't done partial completions (end_that_request_first with
>> partial count)?  So till now, hard_nr_sectors and nr_sectors didn't
>> really matter for unaligned requests but now it matters because it's
>> considered while iterating over bvecs in rq.
> 
> Yes, that reasoning matches mine.
> 
>> If so, I think the correct thing to do would be changing bio_sectors()
>> to round up first or let block layer measure transfer in bytes not in
>> sectors.  I don't think everyone would agree with the latter tho.  I
>> (tentatively) think it would be better to represent length in bytes
>> tho.  A lot of requests which aren't aligned to 512 bytes pass through
>> the block layer and the mismatch can result in subtle bugs.
> 
> I suspect that having a byte count in 'struct request' would make
> sense too.  However I would rather avoid making that change myself - I
> think it would require reading and understanding a lot more code
> 
> I cannot see anything that would go wrong with rounding up bio_sectors
> unconditionally, so I think I will take that approach for this patch
> series.

Yes, converting to nbytes will probably take a lot of work and probably
deserves a separate series if it's ever gonna be done.
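
To make the rounding difference concrete, here is a small sketch of the two byte-to-sector conversions, assuming 512-byte sectors. The demo_ names are hypothetical; the actual bio_sectors() definition is not reproduced here.

```c
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9 /* 512-byte sectors */

/* Round down: a trailing partial sector is not counted. */
static uint32_t demo_sectors_round_down(uint32_t bytes)
{
    return bytes >> DEMO_SECTOR_SHIFT;
}

/* Round up: a trailing partial sector counts as a whole one. */
static uint32_t demo_sectors_round_up(uint32_t bytes)
{
    /* Widen first so that adding 511 cannot overflow. */
    return (uint32_t)(((uint64_t)bytes + 511) >> DEMO_SECTOR_SHIFT);
}
```

For a 1000-byte request the two helpers give 1 and 2 sectors respectively, which is exactly the mismatch that matters once sector counts are used while iterating over the bvecs of an unaligned request.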

Thanks.

-- 
tejun

Re: CFS review

2007-08-01 Thread Willy Tarreau

On Wed, Aug 01, 2007 at 07:17:51PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 1 Aug 2007, Roman Zippel wrote:
> > 
> > I'm not so sure about that. sched_clock() has to be fast, so many archs 
> > may want to continue to use jiffies. As soon as one does that one can also 
> > save a lot of computational overhead by using 32bit instead of 64bit.
> > The question is then how easy that is possible.
> 
> I have to say, it would be interesting to try to use 32-bit arithmetic.
> 
> I also think it's likely a mistake to do a nanosecond resolution. That's 
> part of what forces us to 64 bits, and it's just not even an *interesting* 
> resolution.

I would add that I have been bothered by the 64-bit arithmetics when
trying to see what could be improved in the code. In fact, it's very
hard to optimize anything when you have arithmetics on integers larger
than the CPU's, and gcc is known not to emit very good code in this
situation (I remember it could not play with registers renaming, etc...).

However, I understand why Ingo chose to use 64 bits. It has the advantage
that the numbers never wrap within 584 years. I'm well aware that it's
very difficult to keep tasks ordered according to a key which can wrap.

But if we consider that we don't need to be more precise than the return
value from gettimeofday() that all applications use, we see that a bunch
of microseconds is enough. 32 bits at the microsecond level wraps around
every hour. We may accept to recompute all keys every hour. It's not that
dramatic. The problem is how to detect that we will need to.
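
The wrap periods quoted in this mail can be checked with a few lines of C. The helper below is only a worked calculation (demo_wrap_seconds is a made-up name): 2^32 microseconds comes to about 71.6 minutes, so "every hour" is a round figure, and 2^64 nanoseconds comes to roughly 584 years.

```c
/*
 * Seconds until a counter of "bits" bits wraps when it advances
 * ticks_per_sec times per second.
 */
static double demo_wrap_seconds(unsigned int bits, double ticks_per_sec)
{
    double range = 1.0;
    unsigned int i;

    for (i = 0; i < bits; i++)
        range *= 2.0; /* exact in double for the sizes used here */
    return range / ticks_per_sec;
}
```

For example, demo_wrap_seconds(32, 1e6) / 60 gives the wrap period of a 32-bit microsecond counter in minutes, and the same call with 31 or 30 bits gives the 35 and 17 minute figures used further down.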

I remember a trick used by Tim Schmielau in his jiffies64 patch for 2.4.
He kept a copy of the highest bit of the lower word in the lowest bit of
the higher word, and considered that the lower one could not wrap before
we could check it. I liked this approach, which could be translated here
in something like the following :

Have all keys use 32-bit resolution, and monitor the 32nd bit. All tasks
must have the same value in this bit, otherwise we consider that their
keys have wrapped. The "current" value of this bit is copied somewhere.
When we walk the tree and find a task with a key which does not have its
32nd bit equal to the current value, it means that this key has wrapped,
so we have to use this information in our arithmetics.

When all keys have their 32nd bit different from the "current" value,
then we switch this value to reflect the new 32nd bit, and everything is
in sync again. The only requirement is that no key wraps around before
the "current" value is switched. This implies that no couple of tasks
could have their keys distant by more than 31 bits (35 minutes), which
seems reasonable. If we can recompute all tasks' keys when all of them
have wrapped, then we do not have to store the "current" bit value
anymore, and consider that it is always zero instead (I don't know if
the code permits this).

It is possible that using the 32nd bit to detect the wrapping may impose
us to perform some computations on 33 bits. If this is the case, then it
would be fine if we reduced the range to 31 bits, with all tasks distant
from at most 30 bits (17 minutes).
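
One possible reading of the scheme above, as a userspace sketch on unsigned 32-bit keys. All demo_ names are hypothetical, signed keys are not handled, and it relies on the stated requirement that no two live keys are more than 31 bits apart.

```c
#include <stdint.h>

/*
 * Rebase a key so that keys whose top bit equals the "current" value
 * sort before keys that have already moved past it (i.e. wrapped).
 */
static uint32_t demo_rebase(uint32_t key, unsigned int cur_bit)
{
    return key - ((uint32_t)cur_bit << 31); /* wraps modulo 2^32 */
}

/* Returns <0, 0 or >0 as key a is earlier than, equal to or later than b. */
static int demo_key_cmp(uint32_t a, uint32_t b, unsigned int cur_bit)
{
    uint32_t ra = demo_rebase(a, cur_bit);
    uint32_t rb = demo_rebase(b, cur_bit);

    return (ra > rb) - (ra < rb);
}

/* Flip "current" once no key carries the old top-bit value any more. */
static unsigned int demo_update_cur_bit(const uint32_t *keys, unsigned int n,
                                        unsigned int cur_bit)
{
    unsigned int i;

    if (n == 0)
        return cur_bit;
    for (i = 0; i < n; i++)
        if ((keys[i] >> 31) == cur_bit)
            return cur_bit; /* at least one key has not wrapped yet */
    return cur_bit ^ 1u;
}
```

With the current bit at 1, a key of 0xfffffff0 compares as earlier than a key of 0x00000010, because the second one is treated as having wrapped.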

Also, I remember that the key is signed. I've never experimented with the
tricks above on signed values, but we might be able to define something
like this for the higher bits :

  00 = positive, no wrap
  01 = positive, wrapped
  10 = negative, wrapped
  11 = negative, no wrap

I have no code to show, I just wanted to expose this idea. I know that if
Ingo likes it, he will beat everyone at implementing it ;-)

>   Linus

Willy


Re: [PATCH 67] net/ipv4/route.c: mostly kmalloc + memset conversion to k[cz]alloc

2007-08-01 Thread David Miller

From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:55:02 +0200

> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

Applied.

Re: [PATCH 66] net/ipv4/raw.c: kmalloc + memset conversion to kzalloc

2007-08-01 Thread David Miller

From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:54:00 +0200

> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

Applied.

Re: [PATCH 58] net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c: kmalloc + memset conversion to kzalloc

2007-08-01 Thread David Miller

From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:32:04 +0200

> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

Applied.

Re: [PATCH 57] net/netfilter/nf_conntrack_expect.c: kmalloc + memset conversion to kzalloc

2007-08-01 Thread David Miller

From: Mariusz Kozlowski <[EMAIL PROTECTED]>
Date: Tue, 31 Jul 2007 23:31:02 +0200

> Signed-off-by: Mariusz Kozlowski <[EMAIL PROTECTED]>

Applied.

Re: [RFC][PATCH] Removal of duplicated include net/wanrouter/wanmain.c

2007-08-01 Thread David Miller

From: Michal Piotrowski <[EMAIL PROTECTED]>
Date: Wed, 01 Aug 2007 19:58:53 +0200

> Hi,
> 
> There is no need to include linux/init.h twice
...
> Signed-off-by: Michal Piotrowski <[EMAIL PROTECTED]>

Patch applied, thanks.

Re: [PATCH] powerpc: Pegasos keyboard detection

2007-08-01 Thread Alan Curry

Matt Sealey writes the following:
>
>Yeah please do a fixup for the boot wrapper.
>
>Or, if you have trouble, go into the firmware and type "nvedit", add
>these lines;
>
>" /isa/8042" find-device
>" 8042" encode-string device-type
>
>(then ctrl-c to exit and nvstore to run it on next reboot. Try it without
>the patch first, on the firmware console, just to be sure I got it right,
>because I can't test it here)

It works from the ok prompt but in the nvramrc it doesn't find the device.
(pci/isa nodes not created yet?)

But the larger point:

>
>You don't need to patch Linux at all. In fact for silly things like this
>I would recommend against it :)

If the workaround doesn't go into the kernel, everybody with affected
hardware has to individually find out about the bug (probably by experiencing
an annoying keyboardless boot) and fix it himself. Is that worth the
reduction in kernel clutter?

-- 
Alan Curry
[EMAIL PROTECTED]

Re: 2.6.23-rc1-mm2

2007-08-01 Thread Torsten Kaiser

On 8/2/07, Mel Gorman <[EMAIL PROTECTED]> wrote:
> On (01/08/07 22:52), Torsten Kaiser didst pronounce:
> > Next try with 2.6.23-rc1-mm2 and SPARSEMEM:
> > Probably the same exception, but this time with Call Trace:
> > [0.00] Bootmem setup node 0 -8000
> > [0.00] Bootmem setup node 1 8000-00012000
> > [0.00] Zone PFN ranges:
> > [0.00]   DMA             0 ->     4096
> > [0.00]   DMA32        4096 ->  1048576
> > [0.00]   Normal    1048576 ->  1179648
> > [0.00] Movable zone start PFN for each node
> > [0.00] early_node_map[4] active PFN ranges
> > [0.00] 0:        0 ->      159
> > [0.00] 0:  256 ->   524288
> > [0.00] 1:   524288 ->   917488
> > [0.00] 1:  1048576 ->  1179648
> > PANIC: early exception rip 807cddb5 error 2 cr2 e2000310
> > [0.00]
> > [0.00] Call Trace:
> > [0.00]  [] memmap_init_zone+0xb5/0x130
> > [0.00]  [] init_currently_empty_zone+0x84/0x110
> > [0.00]  [] free_area_init_node+0x393/0x3e0
> > [0.00]  [] free_area_init_nodes+0x2da/0x320
> > [0.00]  [] paging_init+0x87/0x90
> > [0.00]  [] setup_arch+0x355/0x470
> > [0.00]  [] start_kernel+0x57/0x330
> > [0.00]  [] _sinittext+0x12d/0x140
> > [0.00]
> > [0.00] RIP memmap_init_zone+0xb5/0x130
> >
> > (gdb) list *0x807cddb5
> > 0x807cddb5 is in memmap_init_zone (include/linux/list.h:32).
> > 27  #define LIST_HEAD(name) \
> > 28  struct list_head name = LIST_HEAD_INIT(name)
> > 29
> > 30  static inline void INIT_LIST_HEAD(struct list_head *list)
> > 31  {
> > 32  list->next = list;
> > 33  list->prev = list;
> > 34  }
> > 35
> > 36  /*
> >
> > I will test more tomorrow...
>
> Well, that doesn't make a whole pile of sense unless the memory map
> is not present. Looking at your boot log, we see this gem
>
> > [0.00] 1:   524288 ->   917488
> > [0.00] 1:  1048576 ->  1179648

Complete bootlog, if you need more info about the memmaps...
[0.00] Linux version 2.6.23-rc1-mm2 ([EMAIL PROTECTED]) (gcc
version 4.2.1 (Gentoo 4.2.1 p1.4)) #1 SMP Wed Aug 1 21:56:36 CEST 2007
[0.00] Command line: earlyprintk=serial,ttyS0,38400
console=ttyS0,38400 console=tty1 crypt_root=/dev/md1
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e4000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - dfff (usable)
[0.00]  BIOS-e820: dfff - dfffe000 (ACPI data)
[0.00]  BIOS-e820: dfffe000 - e000 (ACPI NVS)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: fee0 - fef0 (reserved)
[0.00]  BIOS-e820: ff70 - 0001 (reserved)
[0.00]  BIOS-e820: 0001 - 00012000 (usable)
[0.00] console [earlyser0] enabled
[0.00] end_pfn_map = 1179648
kernel direct mapping tables up to 12000 @ 8000-e000
[0.00] DMI present.
[0.00] ACPI: RSDP 000FB5E0, 0014 (r0 ACPIAM)
[0.00] ACPI: RSDT DFFF, 003C (r1 A M I  OEMRSDT   6000626
MSFT   97)
[0.00] ACPI: FACP DFFF0200, 0084 (r2 A M I  OEMFACP   6000626
MSFT   97)
[0.00] ACPI: DSDT DFFF0450, 48E1 (r1  S0027 S00270000
INTL 20051117)
[0.00] ACPI: FACS DFFFE000, 0040
[0.00] ACPI: APIC DFFF0390, 0080 (r1 A M I  OEMAPIC   6000626
MSFT   97)
[0.00] ACPI: MCFG DFFF0410, 003C (r1 A M I  OEMMCFG   6000626
MSFT   97)
[0.00] ACPI: OEMB DFFFE040, 0060 (r1 A M I  AMI_OEM   6000626
MSFT   97)
[0.00] ACPI: SRAT DFFF4D40, 0110 (r1 AMDHAMMER  1
AMD 1)
[0.00] ACPI: SSDT DFFF4E50, 04F0 (r1 A M I  ACPI2PPC1
AMI 1)
[0.00] SRAT: PXM 0 -> APIC 0 -> Node 0
[0.00] SRAT: PXM 0 -> APIC 1 -> Node 0
[0.00] SRAT: PXM 1 -> APIC 2 -> Node 1
[0.00] SRAT: PXM 1 -> APIC 3 -> Node 1
[0.00] SRAT: Node 0 PXM 0 0-a
[0.00] SRAT: Node 0 PXM 0 0-8000
[0.00] SRAT: Node 1 PXM 1 8000-e000
[0.00] SRAT: Node 1 PXM 1 8000-12000
[0.00] Bootmem setup node 0 -8000
[0.00] Bootmem setup node 1 8000-00012000
[0.00] Zone PFN ranges:
[0.00]   DMA             0 ->     4096
[0.00]   DMA32        4096 ->  1048576
[0.00]   Normal    1048576 ->  1179648
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[4] active PFN ranges
[0.00] 0:

Re: Is PIE randomization breaking klibc binaries?

2007-08-01 Thread Ulrich Kunitz

Jiri Kosina wrote:

> On Tue, 31 Jul 2007, H. Peter Anvin wrote:
> 
> > > So it seems to me that either it is something x86_64 specific or 
> > > initramfs-specific. Will try to reproduce it.
> > My guess would be the former, rather than the latter.  I haven't had a 
> > chance to reproduce it myself yet (I'm on the road), but I will try to 
> > get the time tomorrow.
> 
> I still wasn't successful reproducing it with 2.6.23-rc1 + 
> pie-randomization.patch with klibc-1.5 on x86_64 system -- all programs 
> from 'shared' and 'static' seem to be running well. 
> 
> Ulrich, could you please send me your .config? Thanks,

Here is the .config file. I used it directly with 60bfba7e8 and
could reproduce the bug. (All other builds after the patch could
also reproduce the bug.) I use klibc-utils 1.4.30-3ubuntu2 which
depends on libklibc with the same version number. The processor is
an Intel Core 2 Duo T5500 @ 1.66 GHz, if that helps.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22
# Wed Jul 25 07:48:18 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION="-pie"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CPUSETS is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_MCORE2=y
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_X86_HT=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_VIRT_TO_BUS=y
CONFIG_NR_CPUS=2
CONFIG_PHYSICAL_ALIGN=0x20
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
# CONFIG_HPET_EMULATE_RTC is not set
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
# CONFIG_X86_MCE_AMD is not set
CONFIG_KEXEC=y
# CONFIG_CRASH_DUMP is not set
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set

Re: [PATCH -mm] Introduce strtol_check_range()

2007-08-01 Thread Alexey Dobriyan

On Thu, Aug 02, 2007 at 05:16:59AM +0530, Satyam Sharma wrote:
> On Wed, 1 Aug 2007, Alexey Dobriyan wrote:
> 
> > On Tue, Jul 31, 2007 at 10:04:10PM +0530, Satyam Sharma wrote:
> > > Callers (especially "store" functions for sysfs or configfs attributes)
> > > that want to convert an input string to a number may often also want to
> > > check for simple input sanity or allowable range. strtol10_check_range()
> > > of netconsole does this, so extract it out into lib/vsprintf.c, make it
> > > generic w.r.t. base, and export it to the rest of the kernel and modules.
> > 
> > > --- a/drivers/net/netconsole.c
> > > +++ b/drivers/net/netconsole.c
> > > @@ -335,9 +307,11 @@ static ssize_t store_enabled(struct netconsole_target *nt,
> > >   int err;
> > >   long enabled;
> > >  
> > > - enabled = strtol10_check_range(buf, 0, 1);
> > > - if (enabled < 0)
> > > + enabled = strtol_check_range(buf, 0, 1, 10);
> > > + if (enabled < 0) {
> > > + printk(KERN_ERR "netconsole: invalid input\n");
> > >   return enabled;
> > > + }
> > 
> > Please, copy strtonum() from BSD instead. Nobody needs another
> > home-grown converter.
>
> BSD's strtonum(3) is a detestful, horrible shame.
>
> The strtol_check_range() I implemented here does _all_ that strtonum()
> does, plus is generic w.r.t. base,

What you did with the base argument is create an opportunity to fsckup,
namely, forgetting that base is last and putting it second.

> and minus the tasteless "errstr"
> argument.
>
> Tell me, how does that "errstr" ever make sense? We _anyway_ return
> errors (-EINVAL or -ERANGE) if any of those cases show up. And
> _because_ we use negative numbers to return errors, we can't use this
> function to convert negative inputs anyway ... an appropriate error
> message can always be outputted by the caller itself. [ hence the
> two WARN_ON's I added here ]
>
> But yeah, considering this implementation is so similar to strtonum(3)
> (minus the shortcomings, that is :-) we can probably rename it to
> something like kstrtonum() ...

> and we should probably be returning
> different errors for the two invalid conditions, yes.
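
The behaviour argued over above (parse a number, reject malformed input,
reject out-of-range input, report the two failures with different error
codes) can be sketched in userspace C. This is only an illustration, not
the code from the patch: the name parse_in_range() and the tolerance for a
trailing newline are assumptions, and strtol() stands in for the kernel's
own converters.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch: parse `s` in the given base and accept
 * the result only if it lies in [min, max].  Returns the value on success,
 * -EINVAL for malformed input and -ERANGE for out-of-range input.  Because
 * negative return values signal errors, min must not be negative.
 */
static long parse_in_range(const char *s, long min, long max, int base)
{
	char *end;
	long val;

	if (min < 0 || min > max)
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* tolerate one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val < min || val > max)
		return -ERANGE;

	return val;
}
```

With this shape the caller prints its own message when the return value is
negative, which is the alternative to an "errstr" argument discussed above.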


Touchpad loses sync with ACPI on.

2007-08-01 Thread Matthew Marshall

I am having a problem with the touchpad and pointer stick on my HP compaq 
nc6000 laptop.  It only happens when using ACPI.

Both pointing devices work for a while, but eventually start to 'stick'.  The 
cursor won't move for about a second, and then it jerks all over the screen, 
clicking.

This problem gradually increases in frequency, until eventually neither device
works at all.  Even cat'ing the respective /dev/input/event* device doesn't 
return anything, and I have to shut down or suspend the computer for 
about 15 minutes before it will work again.  (simply rebooting doesn't fix 
it.)  (Actually, it sometimes works if I press down REALLY hard... perhaps 
that's due to some sensitivity threshold being automatically adjusted?)

Everything works perfectly if I boot with acpi=off.  

(I also left Windows open all day once, and it didn't happen there either.)

I'm using kubuntu, and have tested it with kernel 2.6.20, 2.6.22.1, and 
2.6.23-rc1.

When the problem happens, I get lines like this from dmesg:

psmouse.c: TouchPad at isa0060/serio4/input0 lost sync at byte 1

or sometimes like this:

psmouse.c: TouchPad at isa0060/serio4/input0 lost synchronization, throwing 4 bytes away.


From Google I have found a few other similar reports.  For one person this 
seems to happen under load.  For me, it sometimes seems to happen right when 
I start to compile or something, but it will also happen when I boot up the 
computer and don't even touch it.  (literally: I'll come back in a half an 
hour and the TP doesn't work.)

Someone else seemed to have the problem from reading ACPI states.  I have not 
been able to find any correlation in this respect.  I booted up without the 
battery panel applet and without starting acpid, and it still started 
hiccuping after 5 minutes.

There seems to be no correlation between how much I use it and how long it 
lasts.  Sometimes I can be using it constantly and it lasts an hour.  
Sometimes I barely touch it and it lasts only 10 minutes.

I'm attaching the output of dmesg, lshw, lsmod and lspci.

So, what more can I do?  Is there any more information I can provide?  Are 
there any patches I can try?  I'm willing to help out any way I can.

Thanks for reading,
Matthew Marshall

P.S. I'm not subscribed, so please CC any replies.

P.P.S. Sometimes it seems to be able to read my mind... Just when I think "wow 
it's lasting a long time!" it will freeze up within two seconds.  I 
understand that this might be hard for others to reproduce.


lshw.out.gz
Description: GNU Zip compressed data


lsmod.out.gz
Description: GNU Zip compressed data


lspci.out.gz
Description: GNU Zip compressed data


dmesg.out.gz
Description: GNU Zip compressed data

Re: [PATCH] retrieve VBE EDID/DDC info independent of used video mode

2007-08-01 Thread H. Peter Anvin


Antonino A. Daplas wrote:
> On Wed, 2007-08-01 at 09:54 +0800, Antonino A. Daplas wrote:
>> On Tue, 2007-07-31 at 21:17 -0400, Daniel Drake wrote:
>>> Zwane Mwaikambo wrote:
>>>> Sorry if this has been hashed out before, but could you point me towards
>>>> the gentoo bugzilla entry? I'm trying to understand how your setup broke.
>>>> Which version VBE does your system have?
>>>
>>> Here's the bug:
>>> http://bugs.gentoo.org/show_bug.cgi?id=181067
>>
>> Looking at the dmesg output of the working and failing kernel, it does
>> seem that there's no EDID block available in the failing kernel.
>
> BTW, I looked at the above bug report, it seems his last dmesg does not
> have fbcon enabled.  Make sure that CONFIG_FRAMEBUFFER_CONSOLE=y before
> doing more tests (the problem of lack of the EDID block in the failing
> kernel still applies).

Okay, I'm royally puzzled why that would be.  I've gone over the code 
quite a few times, and I do not see any way (other than VESA < 2.0) that 
could cause that.


I look forward to getting the debug output; depending on what it is we 
might have to get some debugging output from the setup code.


We can printf in the new setup code, although obviously that requires 
leaving the screen in text mode.  However, EDID information should still 
be available.


-hpa


Re: [RFC PATCH] type safe allocator

2007-08-01 Thread Linus Torvalds

On Wed, 1 Aug 2007, Miklos Szeredi wrote:
>
> I wonder why we don't have type safe object allocators a-la new() in
> C++ or g_new() in glib?
> 
>   fooptr = k_new(struct foo, GFP_KERNEL);

I would object to this if only because of the horrible name.

C++ is not a good language to take ideas from, and "new()" was not its 
best feature to begin with. "k_new()" is just disgusting.

I'd call it something like "alloc_struct()" instead, which tells you 
exactly what it's all about. Especially since we try to avoid typedefs in 
the kernel, and as a result, it's basically almost always a struct thing.

That said, I'm not at all sure it's worth it. Especially not with all the 
various variations on a theme (zeroed, arrays, etc etc).

Quite frankly, I suspect you would be better off just instrumenting 
"sparse" instead, and matching up the size of the allocation with the type 
it gets assigned to.

Linus
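
For readers who want to see what the suggested naming would look like, here
is a minimal userspace sketch. It is not kernel code: malloc() and calloc()
stand in for kmalloc() and kzalloc(), there is no GFP flags argument, and
alloc_struct_zeroed(), struct foo and foo_create() are names invented for
the illustration.

```c
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of the proposed macro.  The cast makes
 * the result a `type *`, so assigning it to a pointer of a different type
 * draws a compiler diagnostic instead of silently allocating the wrong
 * size.
 */
#define alloc_struct(type)        ((type *)malloc(sizeof(type)))
#define alloc_struct_zeroed(type) ((type *)calloc(1, sizeof(type)))

struct foo {
	int id;
	char name[16];
};

/* Returns a zeroed struct foo with the id filled in, or NULL on failure. */
static struct foo *foo_create(int id)
{
	struct foo *f = alloc_struct_zeroed(struct foo);

	if (f)
		f->id = id;
	return f;
}
```

The zeroed variant already shows the objection raised above: every extra
behaviour (zeroing, arrays) needs one more macro name.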

Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Neil Brown

On Wednesday August 1, [EMAIL PROTECTED] wrote:
> 
> In any case, why does something so complicated need to be a macro, why
> not a function instead?

There needs to be a macro so you can put a statement after it to be
executed "for each ..."
But you are right that it doesn't all need to be in the one macro.

The idea of something like

#define bio_for_each_segment_offset(bv, bio, _i, offset, _size) \
for (bio_iterator_init(bio, &_i, &bv, offset, _size);   \
 _i.remaining > 0 ;  \
 bio_next(bio, &_i, &bv))

with bio_iterator_init and bio_next being (inline?) functions is a
very good one.  I'll see what works.

Thanks,
NeilBrown
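
The split described above (a thin for-each macro whose work is done by
inline init and next functions) can be tried out in userspace. The sketch
below is only an illustration under stated assumptions: struct seg,
struct seg_list, struct seg_iter and every function name are made up here
and are not the kernel's bio or bio_vec definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for bio_vec and bio; not the kernel types. */
struct seg { size_t len; };
struct seg_list { struct seg *segs; size_t count; };

struct seg_iter {
	size_t idx;		/* index of the current segment */
	size_t remaining;	/* bytes still to be visited */
};

/* Load *cur with the wanted part of the current segment. */
static inline void seg_iter_fill(const struct seg_list *l, struct seg_iter *it,
				 struct seg *cur, size_t skip)
{
	/* Running off the end of the list terminates the loop cleanly. */
	if (it->idx >= l->count) {
		it->remaining = 0;
		cur->len = 0;
		return;
	}
	cur->len = l->segs[it->idx].len - skip;
	if (cur->len > it->remaining)
		cur->len = it->remaining;
}

static inline void seg_iter_init(const struct seg_list *l, struct seg_iter *it,
				 struct seg *cur, size_t offset, size_t size)
{
	it->idx = 0;
	it->remaining = size;
	/* Skip whole segments that lie entirely before the offset. */
	while (it->idx < l->count && offset >= l->segs[it->idx].len) {
		offset -= l->segs[it->idx].len;
		it->idx++;
	}
	seg_iter_fill(l, it, cur, offset);
}

static inline void seg_iter_next(const struct seg_list *l, struct seg_iter *it,
				 struct seg *cur)
{
	it->remaining -= cur->len;
	it->idx++;
	seg_iter_fill(l, it, cur, 0);
}

/* The macro only supplies the loop shape; the logic lives in functions. */
#define for_each_seg_offset(cur, l, it, offset, size)		\
	for (seg_iter_init(l, &(it), &(cur), offset, size);	\
	     (it).remaining > 0;				\
	     seg_iter_next(l, &(it), &(cur)))

/* Sum the bytes visited in the window [offset, offset + size). */
static size_t window_bytes(const struct seg_list *l, size_t offset, size_t size)
{
	struct seg_iter it;
	struct seg cur;
	size_t total = 0;

	for_each_seg_offset(cur, l, it, offset, size)
		total += cur.len;
	return total;
}
```

Keeping the macro this small means the statement that follows it is the
only thing that has to be a macro-level construct; everything else can be
type-checked as an ordinary function.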

Re: [rfc] balance-on-fork NUMA placement

2007-08-01 Thread Nick Piggin

On Tue, Jul 31, 2007 at 04:40:18PM -0700, Christoph Lameter wrote:
> On Tue, 31 Jul 2007, Andi Kleen wrote:
> 
> > On Tuesday 31 July 2007 07:41, Nick Piggin wrote:
> > 
> > > I haven't given this idea testing yet, but I just wanted to get some
> > > opinions on it first. NUMA placement still isn't ideal (eg. tasks with
> > > a memory policy will not do any placement, and process migrations of
> > > course will leave the memory behind...), but it does give a bit more
> > > chance for the memory controllers and interconnects to get evenly
> > > loaded.
> > 
> > I didn't think slab honored mempolicies by default? 
> > At least you seem to need to set special process flags.
> 
> It does in the sense that slabs are allocated following policies. If you 
> want to place individual objects then you need to use kmalloc_node().

Is there no way to place objects via policy? At least kernel stack and page
tables on x86-64 should be covered by page allocator policy, so the patch
will still be useful.


Re: 2.6.22: oops in sbp2_remove_device

2007-08-01 Thread Stefan Richter

Olaf Hering wrote:
> On Wed, Aug 01, Stefan Richter wrote:
>> Revert commit 0555659d63c285ceb7ead3115532e1b71b0f27a7 from 2.6.22-rc1.

> This change fixes the oops, and I can access the drive again.

Did it oops always or only sometimes?
-- 
Stefan Richter
-=-=-=== =--- ---=-
http://arcgraph.de/sr/

Re: [PATCH 013 of 35] Don't update bi_hw_*_size if we aren't going to merge.

2007-08-01 Thread Neil Brown

On Thursday August 2, [EMAIL PROTECTED] wrote:
> On Tue, Jul 31, 2007 at 12:16:55PM +1000, NeilBrown wrote:
> > 
> > ll_merge_requests_fn can update bi_hw_*_size in one case where we end
> > up not merging.  This is wrong.
> > 
> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> 
> As this is a bug fix, I think it would be better to bump this to the top
> of the series such that it can be pushed into mainline.

Good point.  I'll do that, thanks.

NeilBrown

Re: [PATCH] Fix WARN_ON() on bitfield ops for all other archs

2007-08-01 Thread Satyam Sharma



On Thu, 2 Aug 2007, Heiko Carstens wrote:

> From: Heiko Carstens <[EMAIL PROTECTED]>
> 
> Fixes WARN_ON() on bitfield ops for all architectures that have
> been left out in 8d4fbcfbe0a4bfc73e7f0297c59ae514e1f1436f.

Well, considering ...

On Tue, 31 Jul 2007, Alexey Dobriyan wrote:
> But I question the rationale of that commit:
> [...]
> I think that second case is more clear and immediately understandable.

and

On Tue, 31 Jul 2007, Linus Torvalds wrote:
> For all I know, the proper solution is
> to just revert the whole mess, and *not* make WARN_ON() return a value
> at all, since that seems to be the fundamental mistake here.

... I think it makes sense to stop returning the value from WARN_ON()
in the first place. There are only 5 places in the tree that use its
return value anyway, and one of them ( net/xfrm/xfrm_policy.c:681 )
is a good example of why it's less readable that way.


Satyam
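
To make the two styles under discussion concrete, here is a small userspace
sketch. It is an illustration only and not the kernel's WARN_ON(): the
macro names WARN_ON_RET and WARN_ON_STMT, the helper warn_report() and
struct pkt are all invented for the example. Normalising the condition
with !! is one way a value-returning macro can accept bitfield members
without needing to know their type.

```c
#include <stdio.h>

/* Hypothetical stand-in: report the condition and hand back 0 or 1. */
static int warn_report(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}

/* Value-returning form: usable inside an if (). */
#define WARN_ON_RET(c)  warn_report(!!(c), #c, __FILE__, __LINE__)

/* Statement-only form: the caller tests the condition explicitly. */
#define WARN_ON_STMT(c) ((void)warn_report(!!(c), #c, __FILE__, __LINE__))

struct pkt {
	unsigned int bad : 1;	/* bitfield member */
	unsigned int len : 7;
};

/* Style 1: warning and branch fused into one expression. */
static int check_fused(const struct pkt *p)
{
	if (WARN_ON_RET(p->bad))
		return -1;
	return 0;
}

/* Style 2: branch first, warn inside; the control flow is explicit. */
static int check_split(const struct pkt *p)
{
	if (p->bad) {
		WARN_ON_STMT(1);
		return -1;
	}
	return 0;
}
```

Both functions behave the same; the disagreement in the thread is about
which of the two reads better at the call site.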

Re: [PATCH 025 of 35] Treat rq->hard_nr_sectors as setting an overriding limit in the size of the request

2007-08-01 Thread Neil Brown

On Thursday August 2, [EMAIL PROTECTED] wrote:
> 
> This is pretty confusing.  In all other places, bi_size -> #sector
> conversion is done by rounding down but only in blk_rq_bio_prep() it's
> being rounded up.
> 
> Is my following reasoning correct?
> 
> It was okay till now because unaligned requests don't get merged and
> also haven't done partial completions (end_that_request_first with
> partial count)?  So till now, hard_nr_sectors and nr_sectors didn't
> really matter for unaligned requests but now it matters because it's
> considered while iterating over bvecs in rq.

Yes, that reasoning matches mine.

> 
> If so, I think the correct thing to do would be changing bio_sectors()
> to round up first or let block layer measure transfer in bytes not in
> sectors.  I don't think everyone would agree with the latter tho.  I
> (tentatively) think it would be better to represent length in bytes
> tho.  A lot of requests which aren't aligned to 512 bytes pass through
> the block layer and the mismatch can result in subtle bugs.

I suspect that having a byte count in 'struct request' would make
sense too.  However I would rather avoid making that change myself - I
think it would require reading and understanding a lot more code.

I cannot see anything that would go wrong with rounding up bio_sectors
unconditionally, so I think I will take that approach for this patch
series.

Thanks.

NeilBrown
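
The difference between the two conversions discussed above is easy to see
with a few numbers. The helpers below are a hypothetical userspace sketch
assuming 512-byte sectors; the names are invented here and this is not the
kernel's bio_sectors().

```c
#include <stdint.h>

#define SECTOR_SHIFT 9			/* 512-byte sectors */
#define SECTOR_SIZE  (1u << SECTOR_SHIFT)

/* Round down: a trailing partial sector is not counted. */
static inline uint32_t bytes_to_sectors_down(uint32_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}

/* Round up: a trailing partial sector counts as a whole one. */
static inline uint32_t bytes_to_sectors_up(uint32_t bytes)
{
	return (bytes >> SECTOR_SHIFT) + ((bytes & (SECTOR_SIZE - 1)) != 0);
}
```

For a 520-byte transfer the first helper gives 1 sector and the second
gives 2, while for any multiple of 512 they agree. That mismatch only
shows up for unaligned requests, which is the case the quoted reasoning
is about.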

Re: [PATCH 026 of 35] Split any large bios that arrive at __make_request.

2007-08-01 Thread Neil Brown

On Thursday August 2, [EMAIL PROTECTED] wrote:
> Neil Brown wrote:
> > 
> > If you confirm that 027 isn't applying, I'll track down what happened.
> 
> You're right.  I don't have patch 27.  Looking Ummm... It's not in
> my LKML folder either.  Can you resend it?
> 
> Thanks.
> 
> -- 
> tejun

It definitely got out:
  http://lkml.org/lkml/2007/7/30/504

but here it is.

Thanks,
NeilBrown


Subject: Remove bi_XXX_segments and related code.

__make_request now handles bios with too many segments, and it tracks
segment counts in 'struct request' so we no longer need to track
the counts in each bio, or to check the counts when adding a page
to a bio.
So bi_phys_segments, bi_hw_segments, blk_recount_segments(),
BIO_SEG_VALID, bio_phys_segments and bio_hw_segments can all go.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./Documentation/block/biodoc.txt |2 -
 ./block/ll_rw_blk.c  |   18 --
 ./drivers/md/dm.c|1 
 ./drivers/md/raid1.c |5 
 ./drivers/md/raid10.c|5 
 ./drivers/md/raid5.c |5 
 ./drivers/scsi/scsi_lib.c|1 
 ./fs/bio.c   |   47 ---
 ./include/linux/bio.h|   14 ---
 ./include/linux/blkdev.h |1 
 10 files changed, 1 insertion(+), 98 deletions(-)

diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
--- .prev/block/ll_rw_blk.c 2007-07-31 11:21:20.0 +1000
+++ ./block/ll_rw_blk.c 2007-07-31 11:21:22.0 +1000
@@ -1193,19 +1193,6 @@ void blk_dump_rq_flags(struct request *r
 
 EXPORT_SYMBOL(blk_dump_rq_flags);
 
-void blk_recount_segments(struct request_queue *q, struct bio *bio)
-{
-   struct request rq;
-   rq.q = q;
-   rq.bio = rq.biotail = bio;
-   rq.first_offset = 0;
-   blk_recalc_rq_segments(&rq);
-   bio->bi_phys_segments = rq.nr_phys_segments;
-   bio->bi_hw_segments = rq.nr_hw_segments;
-   bio->bi_flags |= (1 << BIO_SEG_VALID);
-}
-EXPORT_SYMBOL(blk_recount_segments);
-
 static void blk_recalc_rq_segments(struct request *rq)
 {
int nr_phys_segs;
@@ -1326,11 +1313,6 @@ static int blk_phys_contig_segment(struc
 static int blk_hw_contig_segment(struct request_queue *q, struct request *req,
 struct request *nxt)
 {
-   if (unlikely(!bio_flagged(req->biotail, BIO_SEG_VALID)))
-   blk_recount_segments(q, req->biotail);
-   if (unlikely(!bio_flagged(nxt->bio, BIO_SEG_VALID)))
-   blk_recount_segments(q, nxt->bio);
-
if (!rq_virt_mergeable(req, nxt) ||
BIOVEC_VIRT_OVERSIZE(req->hw_back_size +
 nxt->hw_front_size))

diff .prev/Documentation/block/biodoc.txt ./Documentation/block/biodoc.txt
--- .prev/Documentation/block/biodoc.txt 2007-07-31 11:21:06.0 +1000
+++ ./Documentation/block/biodoc.txt 2007-07-31 11:21:22.0 +1000
@@ -456,8 +456,6 @@ struct bio {
unsigned intbi_idx; /* current index into bio_vec array */
 
unsigned intbi_size; /* total size in bytes */
-   unsigned short  bi_phys_segments; /* segments after physaddr coalesce*/
-   unsigned short  bi_hw_segments; /* segments after DMA remapping */
unsigned intbi_max;  /* max bio_vecs we can hold
 used as index into pool */
struct bio_vec   *bi_io_vec;  /* the actual vec list */

diff .prev/drivers/md/dm.c ./drivers/md/dm.c
--- .prev/drivers/md/dm.c   2007-07-31 11:21:03.0 +1000
+++ ./drivers/md/dm.c   2007-07-31 11:21:22.0 +1000
@@ -660,7 +660,6 @@ static struct bio *clone_bio(struct bio 
clone->bi_io_vec += idx;
clone->bi_vcnt = bv_count;
clone->bi_size = to_bytes(len);
-   clone->bi_flags &= ~(1 << BIO_SEG_VALID);
 
return clone;
 }

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c   2007-07-31 11:21:07.0 +1000
+++ ./drivers/md/raid10.c   2007-07-31 11:21:22.0 +1000
@@ -1277,8 +1277,6 @@ static void sync_request_write(mddev_t *
 */
tbio->bi_vcnt = vcnt;
tbio->bi_size = r10_bio->sectors << 9;
-   tbio->bi_phys_segments = 0;
-   tbio->bi_hw_segments = 0;
tbio->bi_flags &= ~(BIO_POOL_MASK - 1);
tbio->bi_flags |= 1 << BIO_UPTODATE;
tbio->bi_next = NULL;
@@ -1883,8 +1881,6 @@ static sector_t sync_request(mddev_t *md
if (bio->bi_end_io)
bio->bi_flags |= 1 << BIO_UPTODATE;
bio->bi_vcnt = 0;
-   bio->bi_phys_segments = 0;
-   bio->bi_hw_segments = 0;
bio->bi_size = 0;
}
 
@@ -1909,7 +1905,6 @@ static sector_t sync_request(mddev_t *md
/* remove last page from

[PATCH] debugfs helper for decimal challenged

2007-08-01 Thread Robin Getz

From: Robin Getz <[EMAIL PROTECTED]>

Allows debugfs helper functions to have a hex output, rather than just decimal
 
Signed-off-by:  Robin Getz <[EMAIL PROTECTED]>
---
 fs/debugfs/file.c |   36 
 1 file changed, 36 insertions(+)

Index: fs/debugfs/file.c
===
--- fs/debugfs/file.c   (revision 3529)
+++ fs/debugfs/file.c   (working copy)
@@ -179,6 +179,42 @@
 }
 EXPORT_SYMBOL_GPL(debugfs_create_u32);
 
+DEFINE_SIMPLE_ATTRIBUTE(fops_x8, debugfs_u8_get, debugfs_u8_set, "0x%02llx\n");
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_x16, debugfs_u16_get, debugfs_u16_set, "0x%04llx\n");
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_x32, debugfs_u32_get, debugfs_u32_set, "0x%08llx\n");
+
+/**
+ * debugfs_create_x8 - create a debugfs file that is used to read and write an unsigned 8-bit value
+ * debugfs_create_x16 - create a debugfs file that is used to read and write an unsigned 16-bit value
+ * debugfs_create_x32 - create a debugfs file that is used to read and write an unsigned 32-bit value
+ *
+ * These functions are exactly the same as the above functions, (but use a hex
+ * output for the decimal challenged) for details look at the above unsigned
+ * decimal functions.
+ */
+struct dentry *debugfs_create_x8(const char *name, mode_t mode,
+struct dentry *parent, u8 *value)
+{
+   return debugfs_create_file(name, mode, parent, value, &fops_x8);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_x8);
+
+struct dentry *debugfs_create_x16(const char *name, mode_t mode,
+struct dentry *parent, u16 *value)
+{
+   return debugfs_create_file(name, mode, parent, value, &fops_x16);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_x16);
+
+struct dentry *debugfs_create_x32(const char *name, mode_t mode,
+struct dentry *parent, u32 *value)
+{
+   return debugfs_create_file(name, mode, parent, value, &fops_x32);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_x32);
+
 static ssize_t read_file_bool(struct file *file, char __user *user_buf,
  size_t count, loff_t *ppos)
 {

Re: [PATCH] retrieve VBE EDID/DDC info independent of used video mode

2007-08-01 Thread Antonino A. Daplas

On Wed, 2007-08-01 at 09:54 +0800, Antonino A. Daplas wrote:
> On Tue, 2007-07-31 at 21:17 -0400, Daniel Drake wrote:
> > Zwane Mwaikambo wrote:
> > > Sorry if this has been hashed out before, but could you point me towards 
> > > the gentoo bugzilla entry? I'm trying to understand how your setup broke. 
> > > Which version VBE does your system have?
> > 
> > Here's the bug:
> > http://bugs.gentoo.org/show_bug.cgi?id=181067
> > 
> 
> Looking at the dmesg output of the working and failing kernel, it does
> seem that there's no EDID block available in the failing kernel.
> 

BTW, I looked at the above bug report, it seems his last dmesg does not
have fbcon enabled.  Make sure that CONFIG_FRAMEBUFFER_CONSOLE=y before
doing more tests (the problem of lack of the EDID block in the failing
kernel still applies).

Tony


Re: [PATCH] Fix WARN_ON() on bitfield ops for all other archs

2007-08-01 Thread Paul Mundt

On Thu, Aug 02, 2007 at 12:18:38AM +0200, Heiko Carstens wrote:
> From: Heiko Carstens <[EMAIL PROTECTED]>
> 
> Fixes WARN_ON() on bitfield ops for all architectures that have
> been left out in 8d4fbcfbe0a4bfc73e7f0297c59ae514e1f1436f.
> 
> Cc: Alexey Dobriyan <[EMAIL PROTECTED]>
> Cc: Herbert Xu <[EMAIL PROTECTED]>
> Cc: Paul Mundt <[EMAIL PROTECTED]>
> Cc: Haavard Skinnemoen <[EMAIL PROTECTED]>
> Cc: Matthew Wilcox <[EMAIL PROTECTED]>
> Cc: Kyle McMartin <[EMAIL PROTECTED]>
> Cc: Martin Schwidefsky <[EMAIL PROTECTED]>
> Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>

Acked-by: Paul Mundt <[EMAIL PROTECTED]>

Re: [PATCH] Fix two potential mem leaks in MPT Fusion (mpt_attach())

2007-08-01 Thread Matthew Wilcox

On Wed, Aug 01, 2007 at 05:26:53PM -0700, Andrew Morton wrote:
> Why on earth is that using GFP_ATOMIC?  This function later goes on to
> create procfs files and such things.

Seems fairly common in driver initialisation code.  I removed three
instances of this in the advansys driver.

> y'know, we could have a debug option which will spit warnings if someone
> does a !__GFP_WAIT allocation while !in_atomic() (only works if
> CONFIG_PREEMPT).  
> 
> But please, make it depend on !CONFIG_AKPM.  I shudder to think about all
> the stuff it would pick up.

Seems like you'd get a lot of false positives.  How about a call:

slab_warn_about_atomic_allocs();

right before calling the initcalls, and then

slab_stop_warning_about_atomic_allocs();

after calling them?  That should give people a lot to chew on for a few
months.  Obviously, you would need to not warn about allocations from
interrupt context, as you say above.

-- 
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

Re: [PATCH 026 of 35] Split any large bios that arrive at __make_request.

2007-08-01 Thread Tejun Heo

Neil Brown wrote:
> On Thursday August 2, [EMAIL PROTECTED] wrote:
>> Hmmm... Patches don't apply beyond this one.  I'm applying against
>> clean 2.6.23-rc1-mm1 grabbed using ketchup.
>>
> 
> So do you mean 027 doesn't apply, or that 028 doesn't apply next?
> 
> It is possible that you missed 027.  It originally has 3 consecutive
> Xs in the subject line, so vger.kernel.org bounced it.
> I re-sent it, but it would have had a different References header and
> so it might not appear in the same thread.
> 
> If you confirm that 027 isn't applying, I'll track down what happened.

You're right.  I don't have patch 27.  Looking Ummm... It's not in
my LKML folder either.  Can you resend it?

Thanks.

-- 
tejun

Re: [RFC 11/26] tmpfs white-out support

2007-08-01 Thread Matt Mackall

On Wed, Aug 01, 2007 at 04:13:46PM +0100, Hugh Dickins wrote:
> On Mon, 30 Jul 2007, Jan Blunck wrote:
> 
> > Introduce white-out support to tmpfs.
> > 
> > Signed-off-by: Jan Blunck <[EMAIL PROTECTED]>
> > ---
> >  include/linux/shmem_fs.h |1 
> >  mm/shmem.c   |   54 +++
> >  2 files changed, 55 insertions(+)
> 
> I see there's debate about whether this (and its fellows) give the
> right semantic to whiteouts; and I've not begun to think about that.
> 
> But as a patch to tmpfs for what you're trying to do, it looks just
> about fine.  I say "just about" because the reference counting looks
> right, but I wouldn't dare say that it _is_ right without testing.
> 
> And I'd probably want to add a minor adjustment, so that a mount with
> nr_inodes=1000 could still support exactly 1000 inodes, despite your
> allocating one for the whiteout (usually never used) at mount time.
> But that can follow along later, no problem.

Also, you might want to make sure whiteouts work with ramfs, which
replaces tmpfs when tmpfs is disabled.

-- 
Mathematics is the supreme nostalgia of our time.

Re: lmbench ctxsw regression with CFS

2007-08-01 Thread Nick Piggin

On Wed, Aug 01, 2007 at 07:31:26PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 2 Aug 2007, Nick Piggin wrote:
> > 
> > lmbench 3 lat_ctx context switching time with 2 processes bound to a
> > single core increases by between 25%-35% on my Core2 system (didn't do
> > enough runs to get more significance, but it is around 30%). The problem
> > bisected to the main CFS commit.
> 
> One thing to check out is whether the lmbench numbers are "correct". 
> Especially on SMP systems, the lmbench numbers are actually *best* when 
> the two processes run on the same CPU, even though that's not really at 
> all the best scheduling - it's just that it artificially improves lmbench 
> numbers because of the close cache affinity for the pipe data structures.

Yes, I bound them to a single core.


> So when running the lmbench scheduling benchmarks on SMP, it actually 
> makes sense to run them *pinned* to one CPU, because then you see the true 
> scheduler performance. Otherwise you easily get noise due to balancing 
> issues, and a clearly better scheduler can in fact generate worse 
> numbers for lmbench.
> 
> Did you do that? It's at least worth testing. I'm not saying it's the case 
> here, but it's one reason why lmbench3 has the option to either keep 
> processes on the same CPU or force them to spread out (and both cases are 
> very interesting for scheduler testing, and tell different things: the 
> "pin them to the same CPU" shows the latency on one runqueue, while the 
> "pin them to different CPU's" shows the latency of a remote wakeup).
> 
> IOW, while we used the lmbench scheduling benchmark pretty extensively in 
> early scheduler tuning, if you select the defaults ("let the system just 
> schedule processes on any CPU") the end result really isn't necessarily a 
> very meaningful value: getting the best lmbench numbers actually requires 
> you to do things that tend to be actively *bad* in real life.
> 
> Of course, a perfect scheduler would notice when two tasks are *so* 
> closely related and only do synchronous wakeups, that it would keep them on
> the same core, and get the best possible scores for lmbench, while not 
> doing that for other real-life situations. So with a *really* smart 
> scheduler, lmbench numbers would always be optimal, but I'm not sure 
> aiming for that kind of perfection is even worth it!

Agreed with all your comments on multiprocessor balancing, but that
was eliminated in these tests. Remote wakeup latency is another thing
I want to test, but it isn't so interesting until the serial regression
is fixed.

Re: SD still better than CFS for 3d ?(was Re: 2.6.23-rc1)

2007-08-01 Thread Lee Revell

On 7/31/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> Almost all of the Reiser3
> code runs under the BKL, and the only other major kernel infrastructure
> that has BKL dependencies is the TTY code.

Also NFS:

$ grep -rIi lock_kernel kernel-source/linux-2.6.17/fs/nfs/ | wc -l
94

Lee

Re: lmbench ctxsw regression with CFS

2007-08-01 Thread Linus Torvalds

On Thu, 2 Aug 2007, Nick Piggin wrote:
> 
> lmbench 3 lat_ctx context switching time with 2 processes bound to a
> single core increases by between 25%-35% on my Core2 system (didn't do
> enough runs to get more significance, but it is around 30%). The problem
> bisected to the main CFS commit.

One thing to check out is whether the lmbench numbers are "correct". 
Especially on SMP systems, the lmbench numbers are actually *best* when 
the two processes run on the same CPU, even though that's not really at 
all the best scheduling - it's just that it artificially improves lmbench 
numbers because of the close cache affinity for the pipe data structures.

So when running the lmbench scheduling benchmarks on SMP, it actually 
makes sense to run them *pinned* to one CPU, because then you see the true 
scheduler performance. Otherwise you easily get noise due to balancing 
issues, and a clearly better scheduler can in fact generate worse 
numbers for lmbench.

Did you do that? It's at least worth testing. I'm not saying it's the case 
here, but it's one reason why lmbench3 has the option to either keep 
processes on the same CPU or force them to spread out (and both cases are 
very interesting for scheduler testing, and tell different things: the 
"pin them to the same CPU" shows the latency on one runqueue, while the 
"pin them to different CPU's" shows the latency of a remote wakeup).
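The "pin them to the same CPU" case can be approximated with a rough sketch like the one below. It is not lmbench's lat_ctx, only a pipe ping-pong between a parent and a forked child. The function name pingpong_latency and its parameters are made up for illustration, and the figure it returns is dominated by Python and pipe overhead, so it is only useful for comparing runs against each other.

```python
import os
import time

def pingpong_latency(rounds=10000, cpu=None):
    """Average time per hand-off (seconds) between two processes over pipes."""
    saved = os.sched_getaffinity(0)
    if cpu is not None:
        os.sched_setaffinity(0, {cpu})   # the child inherits this mask on fork
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: bounce every byte straight back to the parent.
        for _ in range(rounds):
            os.read(to_child_r, 1)
            os.write(to_parent_w, b"x")
        os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    for fd in (to_child_r, to_child_w, to_parent_r, to_parent_w):
        os.close(fd)
    os.sched_setaffinity(0, saved)       # restore the caller's original mask
    return elapsed / (2 * rounds)        # each round trip is two hand-offs
```

Calling it with cpu=min(os.sched_getaffinity(0)) keeps both processes on one runqueue; calling it with cpu=None leaves placement to the load balancer, which is the noisy default case described above.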

IOW, while we used the lmbench scheduling benchmark pretty extensively in 
early scheduler tuning, if you select the defaults ("let the system just 
schedule processes on any CPU") the end result really isn't necessarily a 
very meaningful value: getting the best lmbench numbers actually requires 
you to do things that tend to be actively *bad* in real life.

Of course, a perfect scheduler would notice when two tasks are *so* 
closely related and only do synchronous wakeups, that it would keep them on
the same core, and get the best possible scores for lmbench, while not 
doing that for other real-life situations. So with a *really* smart 
scheduler, lmbench numbers would always be optimal, but I'm not sure 
aiming for that kind of perfection is even worth it!

Linus

Re: Examine user space locks

2007-08-01 Thread Satyam Sharma

Hi Rokas,

[ Your mailer does not maintain the Cc: list. That's not good
  when posting to LKML. Adding back Jan Engelhardt to Cc: ]

On Wed, 1 Aug 2007, Rokas Masiulis wrote:

> On Wed, Aug 01, 2007 at 09:51:06PM +0200, Jan Engelhardt wrote:
> > [...]
> > echo t >/proc/sysrq-trigger
> 
>  # cat /proc/kmsg > log &
>  [1] 28058
>  # echo t >/proc/sysrq-trigger
>  # sleep 3
>  # fg
>  cat /proc/kmsg > log
>  ^C

First, ensure:
# echo 1 > /proc/sys/kernel/sysrq

and that you've built with various debug stuff like CONFIG_LOCKDEP,
CONFIG_FRAME_POINTER, etc enabled.

If you're only interested in those processes that are hung in "D"
i.e. TASK_UNINTERRUPTIBLE wait, you could try:

# dmesg -c
# echo w > /proc/sysrq-trigger
# dmesg > uninterruptible-sleeping-tasks.txt

to limit output to only interesting stuff. Also, not all tasks waiting
in TASK_UNINTERRUPTIBLE may be blocked on locks / mutexes (most would
simply be waiting for IO to complete), so:

# dmesg -c
# echo d > /proc/sysrq-trigger
# dmesg > held-locks.txt

would tell us specifically what locks are held currently.

> It seems that I can't get the full log. Not all processes are listed in it.
> 
> But here are some of the "bad" processes I have.
> (full log: http://89.190.108.145/~rokas/tasks.txt)

Eek, not very readable, that.

> It seems that some are stopped in cdrom_transfer_packet_command, some in
> __mutex_lock_slowpath.
> I'm confused..
> 
>  hald-addon-st D 04E2 0  5826  1 17981 12866 (NOTLB)
> f55ebc34 c02c9717 c02c94b8 04e2 c0529780  00200202 
> c0529a64 
>  c0529780 c02c9656 0080 c2a17b20  b25fc600 
> 003d0c2e 
> c042e340 f54d1a70 f54d1b78 f55ea000 f55ebca8 f55ebc54 f55ebc8c 
> c03bdb8a 
>  Call Trace:
>cdrom_transfer_packet_command+0x80/0xf9   
> cdrom_timer_expiry+0x0/0x5d
>cdrom_start_packet_command+0x141/0x182   
> wait_for_completion+0x85/0xca
> ...
>do_sys_open+0x4a/0xca   sys_open+0x1c/0x20
>sysenter_past_esp+0x54/0x75 

Could be waiting on i/o ... (?)

>  eject D 0101 0 12662  1 12793 17981 (NOTLB)
> e5c17e9c c0156dfd  0101 0001  f308eac4 
>  
> 0003 0802 f56d5e40 0003 c2a17b20  0616c400 
> 003d2850 
> c042e340 eb65e030 eb65e138 f5ad970c f5ad9710 eb65e030 f5ad9718 
> c03be586 
>  Call Trace:
>free_pages_and_swap_cache+0x51/0x75   
> __mutex_lock_slowpath+0x46/0x7f
>.text.lock.mutex+0x5/0x14   do_open+0x55/0x382
> ...
>do_sys_open+0x4a/0xca   sys_open+0x1c/0x20
>sysenter_past_esp+0x54/0x75 

Not the BKL. Sounds like bdev->bd_mutex ...

>  hald-probe-st D 0101 0 17981  1 12662  5826 (NOTLB)
> dfa3be9c f5761144  0101 0001  e5ab03f4 
>  
> 0004 8800  dd243550 c2a1fb20  22986700 
> 003d64c8 
> dd243550 e4379550 e4379658 f5ad970c f5ad9710 e4379550 f5ad9718 
> c03be586 
>  Call Trace:
>__mutex_lock_slowpath+0x46/0x7f   
> .text.lock.mutex+0x5/0x14
>do_open+0x55/0x382   blkdev_open+0x0/0x5a
> ...
>do_sigaction+0xf1/0x192   do_sys_open+0x4a/0xca
>sys_open+0x1c/0x20   sysenter_past_esp+0x54/0x75

Hmm, a lot of stuff seems hung on the bdev->bd_mutex of your cdrom device.
Your cdrom drive becomes unusable after this, right?
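For a long dump, sorting tasks by where they are blocked is easier with a small script. The sketch below is only an illustration: the helper names are made up, and the task-header layout it expects (name, state letter, a hex field, a number, then the pid) is an assumption read off the three task headers quoted above, so it may need adjusting for other kernels.

```python
import re

# Task header, e.g. " eject D 0101 0 12662  1 12793 17981 (NOTLB)"
TASK_RE = re.compile(r"^\s*(\S+)\s+([RSDTZX])\s+[0-9A-Fa-f]+\s+\d+\s+(\d+)\b")
# Call-trace entry, e.g. "__mutex_lock_slowpath+0x46/0x7f"
SYM_RE = re.compile(r"([A-Za-z_.][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def blocked_tasks(dump):
    """Map (name, pid) -> list of trace symbols for every task in state D."""
    tasks = {}
    current = None
    for line in dump.splitlines():
        m = TASK_RE.match(line)
        if m:
            name, state, pid = m.group(1), m.group(2), int(m.group(3))
            current = (name, pid) if state == "D" else None
            if current is not None:
                tasks[current] = []
            continue
        if current is not None:
            tasks[current].extend(SYM_RE.findall(line))
    return tasks

def waiting_on_mutex(tasks):
    """Names of D-state tasks whose trace passes through the mutex slow path."""
    return sorted(name for (name, _pid), syms in tasks.items()
                  if "__mutex_lock_slowpath" in syms)
```

Feeding it the saved log, for example blocked_tasks(open("log").read()), separates the tasks stuck behind a mutex from those that are simply waiting for the drive.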

Can you reproduce this, or let us know what exactly you did to lead to
this situation? You could put up your .config somewhere as well, also
probably give some description of your setup / cdrom hardware.

Satyam

Re: CFS review

2007-08-01 Thread Linus Torvalds

On Wed, 1 Aug 2007, Roman Zippel wrote:
> 
> I'm not so sure about that. sched_clock() has to be fast, so many archs 
> may want to continue to use jiffies. As soon as one does that one can also 
> save a lot of computational overhead by using 32bit instead of 64bit.
> The question is then how easy that is possible.

I have to say, it would be interesting to try to use 32-bit arithmetic.

I also think it's likely a mistake to do a nanosecond resolution. That's 
part of what forces us to 64 bits, and it's just not even an *interesting* 
resolution.

It would be better, I suspect, to make the scheduler clock totally 
distinct from the other clock sources (many architectures have per-cpu 
cycle counters), and *not* try to even necessarily force it to be a 
"time-based" one.

So I think it would be entirely appropriate to

 - do something that *approximates* microseconds.

   Using microseconds instead of nanoseconds would likely allow us to do 
   32-bit arithmetic in more areas, without any real overflow.

   And quite frankly, even on fast CPU's, the scheduler is almost 
   certainly not going to be able to take any advantage of the nanosecond 
   resolution. Just about anything takes a microsecond - including IO. I 
   don't think nanoseconds are worth the ten extra bits they need, if we 
   could do microseconds in 32 bits.

   And the "approximates" thing would be about the fact that we don't 
   actually care about "absolute" microseconds as much as something that 
   is in the "roughly a microsecond" area. So if we say "it doesn't have 
   to be microseconds, but it should be within a factor of two of a us", 
   we could avoid all the expensive divisions (even if they turn into 
   multiplications with reciprocals), and just let people *shift* the CPU 
   counter instead.

   In fact, we could just say that we don't even care about CPU counters 
   that shift frequency - so what? It gets a bit further off the "ideal 
   microsecond", but the scheduler just cares about _relative_ times 
   between tasks (and that the total latency is within some reasonable 
   value), it doesn't really care about absolute time.
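The arithmetic behind the points above can be checked with a short sketch. The helper names and the example frequencies are assumptions for illustration, not anything from the scheduler code. The "ten extra bits" correspond to the factor of 1000 between nanoseconds and microseconds (log2(1000) is about 9.97), and picking the nearest power of two for the shift keeps one unit within a factor of two of a real microsecond.

```python
import math

def pick_shift(counter_hz):
    """Right-shift that turns raw counter ticks into roughly-microsecond units."""
    ticks_per_us = counter_hz / 1_000_000
    return max(0, round(math.log2(ticks_per_us)))

def unit_in_us(counter_hz, shift):
    """Real duration, in microseconds, of one shifted unit."""
    return (1 << shift) * 1_000_000 / counter_hz

def wrap_seconds(bits, units_per_second):
    """Seconds until an unsigned counter of the given width wraps."""
    return (1 << bits) / units_per_second
```

For a hypothetical 2.4 GHz counter, pick_shift() returns 11, so one unit is 2048 cycles, or roughly 0.85 us. A 32-bit value wraps after about 4.3 seconds at nanosecond resolution but only after about 71 minutes at microsecond resolution, as wrap_seconds(32, 1e9) and wrap_seconds(32, 1e6) show.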

Hmm?

It would still be true that something that is purely based on timer ticks 
will always be liable to have rounding errors that will inevitably mean 
that you don't get good fairness - tuning threads to run less than a timer 
tick at a time would effectively "hide" them from the scheduler 
accounting. However, in the end, I think that's pretty much unavoidable. 
We should make sure that things mostly *work* for that situation, but I 
think it's ok to say that the *quality* of the fairness will obviously 
suffer (and possibly a lot in the extreme cases).

Linus

lmbench ctxsw regression with CFS

2007-08-01 Thread Nick Piggin

Hi,

I didn't follow all of the scheduler debates and flamewars, so apologies
if this was already covered. Anyway.

lmbench 3 lat_ctx context switching time with 2 processes bound to a
single core increases by between 25%-35% on my Core2 system (didn't do
enough runs to get more significance, but it is around 30%). The problem
bisected to the main CFS commit.

I was really hoping that a smaller runqueue data structure could actually
increase performance with the common case of small numbers of tasks :(

I assume this was a known issue before CFS was merged. Do you know what is
causing the slowdown? Any plans to fix it?

Thanks,
Nick

Re: [2/2] 2.6.23-rc1: known regressions

2007-08-01 Thread Horms

> IA64
> 
> Subject : Regression in serial console on ia64 after 2.6.22
> References  : http://marc.info/?l=linux-ia64=118483645914066=2
> Last known good : ?
> Submitter   : Horms <[EMAIL PROTECTED]>
> Caused-By   : Yinghai Lu <[EMAIL PROTECTED]>
>   commit 18a8bd949d6adb311ea816125ff65050df1f3f6e
> Handled-By  : ?
> Status  : unknown

This was fixed by ensuring that machvec is set up before the
serial console

a07ee86205808d36973440e68c7277f9ed63b87f in Linus' tree

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/


2.6.23-rc1-mm1: compile error mm/sparse.c

2007-08-01 Thread sukadev


I get this compile error on 2.6.23-rc1-mm1 on i386. Config file
attached (this basic config file worked on 2.6.22-rc6-mm1)


lx26-23-rc1-mm1/mm/sparse.c: In function 'sparse_init':
lx26-23-rc1-mm1/mm/sparse.c:482: error: implicit declaration of function 'sparse_early_usemap_alloc'
lx26-23-rc1-mm1/mm/sparse.c:482: warning: assignment makes pointer from integer without a cast
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-rc6-mm1-pidns1
# Sat Jul 14 11:16:33 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_NONIRQ_WAKEUP=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_NR_QUICK=2
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SWAP_PREFETCH=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
CONFIG_USER_NS=y
CONFIG_PID_NS=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
# CONFIG_CONTAINER_DEBUG is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_CONTAINER_CPUACCT is not set
# CONFIG_CONTAINER_NS is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_SMAPS=y
CONFIG_PROC_CLEAR_REFS=y
CONFIG_PROC_PAGEMAP=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_MODULES is not set
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
CONFIG_X86_CPUID=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
# CONFIG_MTRR is not set
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_PHYSICAL_ALIGN=0x20
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
# CONFIG_X86_MCE is not set
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_START=0x10
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
# CONFIG_PM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# CPU idle PM support
#
# CONFIG_CPU_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
#

Re: NETPOLL=y , NETDEVICES=n compile error ( Re: 2.6.23-rc1-mm1 )

2007-08-01 Thread Matt Mackall

On Wed, Aug 01, 2007 at 11:59:21AM +0200, Jarek Poplawski wrote:
> On Tue, Jul 31, 2007 at 05:05:00PM +0200, Gabriel C wrote:
> > Jarek Poplawski wrote:
> > > On Tue, Jul 31, 2007 at 12:14:36PM +0200, Gabriel C wrote:
> > >> Jarek Poplawski wrote:
> > >>> On 28-07-2007 20:42, Gabriel C wrote:
> >  Andrew Morton wrote:
> > > On Sat, 28 Jul 2007 17:44:45 +0200 Gabriel C <[EMAIL PROTECTED]> 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I got this compile error with a randconfig ( 
> > >> http://194.231.229.228/MM/randconfig-auto-82.broken.netpoll.c ).
> > >>
> > >> ...
> > >>
> > >> net/core/netpoll.c: In function 'netpoll_poll':
> > >> net/core/netpoll.c:155: error: 'struct net_device' has no member 
> > >> named 'poll_controller'
> > >> net/core/netpoll.c:159: error: 'struct net_device' has no member 
> > >> named 'poll_controller'
> > >> net/core/netpoll.c: In function 'netpoll_setup':
> > >> net/core/netpoll.c:670: error: 'struct net_device' has no member 
> > >> named 'poll_controller'
> > >> make[2]: *** [net/core/netpoll.o] Error 1
> > >> make[1]: *** [net/core] Error 2
> > >> make: *** [net] Error 2
> > >> make: *** Waiting for unfinished jobs
> > >>
> > >> ...
> > >>
> > >>
> > >> I think is because KGDBOE selects just NETPOLL.
> > >>
> > > Looks like it.
> > >
> > > Select went and selected NETPOLL and NETPOLL_TRAP but things like
> > > CONFIG_NETDEVICES and CONFIG_NET_POLL_CONTROLLER remain unset.  
> > > `select'
> > > remains evil.
> > >>> ...
> >  I think there may be a logical issue ( again if I got it right ).
> >  We need some ethernet card to work with kgdboe right ? but we don't 
> >  have any if !NETDEVICES && !NET_ETHERNET.
> > 
> >  So maybe some ' depends on ... && NETDEVICES!=n && NET_ETHERNET!=n ' 
> >  is needed too ? 
> > >>> IMHO, the only logical issue here is netpoll.c mustn't use 
> > >>> CONFIG_NET_POLL_CONTROLLER code without #ifdef if it doesn't
> > >>> add this dependency itself.
> > >>>
> > >> Well it does if NETDEVICES && if NET_ETHERNET which both are N when 
> > >> !NETDEVICES is why KGDBOE uses select and not depends on.
> > > 
> > > "does if XXX" means may "use if XXX".
> > 
> > From what I know means only use "if xxx" on !xxx everything inside the "if 
> > xxx" is n and "depends on  
> > does not work.
> > 
> > ...
> > 
> > menuconfig FOO
> > bool "FOO"
> > depends on BAR
> > default y 
> > -- help --
> > something
> > if FOO
> > 
> > config BAZ
> > depends on WHATEVR && !NOT_THIS
> > 
> > menuconfig SOMETHING_ELSE
> > 
> > if SOMETHING_ELSE
> > 
> > config BLUBB
> > depends on PCI && WHATNOT
> > 
> > endif # SOMETHING_ELSE
> > 
> > config NETPOLL
> > def_bool NETCONSOLE
> > 
> > config NETPOLL_TRAP
> > bool "Netpoll traffic trapping"
> > default n
> > depends on NETPOLL
> >   
> > config NET_POLL_CONTROLLER
> > def_bool NETPOLL
> > 
> > endif # FOO
> > 
> > Now if you set FOO=n all is gone and your driver have to select whatever it 
> > needs from there.
> 
> Probably not exactly so...
> 
> From drivers/net/Kconfig:
> 
> > # All the following symbols are dependent on NETDEVICES - do not repeat
> > # that for each of the symbols.
> > if NETDEVICES
> 
> So, according to this netpoll could presume NETDEVICES and
> NET_POLL_CONTROLLER are always on.
> 
> But, as you've found, it's possible to select NETPOLL and !NETDEVICES,
> so this comment is at least not precise.
> 
> On the other side something like this:
> 
> ...
> endif # NETDEVICES
> 
> config NETPOLL
> depends on NETDEVICES
> def_bool NETCONSOLE
> 
> config NETPOLL_TRAP
> bool "Netpoll traffic trapping"
> default n
> depends on NETPOLL
> 
> config NET_POLL_CONTROLLER
> def_bool NETPOLL
> depends on NETPOLL
> 
> 
> seems to select NET_POLL_CONTROLLER after selecting NETPOLL, but
> still doesn't check for NETDEVICES dependency.

That's odd. Adding Sam to the cc:.

> > >> Now KGDBOE just selects NETPOLL and NETPOLL_TRAP.
> > >> Adding 'select CONFIG_NET_POLL_CONTROLLER' let kgdboe compiles but the 
> > >> question is does it work without any ethernet card ?
> > > 
> > > Why kgdboe should care what netpoll needs? So, I hope, you are adding
> > > this select under config NETPOLL. On the other hand, if NETPOLL should
> > > depend on NET_POLL_CONTROLLER there is probably no reason to have them
> > > both.
> > 
> > NET_POLL_CONTROLLER has def_bool NETPOLL if NETDEVICES .
> > 
> > Net peoples ping ?:)

How about cc:ing the netpoll maintainer?
 
> OK, I wasn't right here: there is no visible reason for both in the
> kernel code, but I can imagine there could be some external users of
> NET_POLL_CONTROLLER without NETPOLL.

I don't know of any. As far as I can tell at this point,
NET_POLL_CONTROLLER == NETPOLL.
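The one configuration where the two symbols come apart is the randconfig that started this thread. The toy model below reduces the Kconfig behaviour described above to booleans. It is a sketch under stated assumptions, not the real Kconfig evaluator: the function name is made up, and it assumes that KGDBOE carries "select NETPOLL" while NETPOLL and NET_POLL_CONTROLLER sit inside the "if NETDEVICES" block.

```python
def resolve(netdevices, netconsole, kgdboe):
    """Toy evaluation of the symbols discussed in this thread (booleans only)."""
    # Everything declared inside "if NETDEVICES" implicitly depends on it.
    netconsole = netconsole and netdevices
    # config NETPOLL: def_bool NETCONSOLE, so its default follows NETCONSOLE...
    netpoll = netconsole
    # ...but "select NETPOLL" from KGDBOE forces it on without checking
    # NETPOLL's own dependencies.
    if kgdboe:
        netpoll = True
    # config NET_POLL_CONTROLLER: def_bool NETPOLL, still gated by NETDEVICES.
    net_poll_controller = netpoll and netdevices
    return {"NETPOLL": netpoll, "NET_POLL_CONTROLLER": net_poll_controller}
```

With resolve(False, False, True), that is KGDBOE=y and NETDEVICES=n, NETPOLL comes out enabled while NET_POLL_CONTROLLER does not, which matches the reported build failure on poll_controller. In every other combination the two values agree.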

Re: kupdate weirdness

2007-08-01 Thread David Chinner

On Wed, Aug 01, 2007 at 10:45:16PM +0200, Miklos Szeredi wrote:
> The following strange behavior can be observed:
> 
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
> 
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
> 
> The reason seems to be that __sync_single_inode() will move the
> partially written inode from s_io onto s_dirty, and sync_sb_inode()
> will not splice it back onto s_io until the rest of the inodes on s_io
> has been processed.

It's been doing this for a long time.

http://marc.info/?l=linux-kernel=113919849421679=2
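To put the reported numbers in perspective, here is a back-of-the-envelope sketch. It assumes 4 KiB pages and treats the 30 second kupdate interval as the time per pass, which is an upper bound since the quoted report says the idle gaps are somewhat shorter; the function name is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def drain_time_seconds(file_bytes, pages_per_pass=1024, interval_s=30):
    """Upper bound on the time to write back a file at one chunk per interval."""
    chunk = pages_per_pass * PAGE_SIZE       # 1024 pages = 4 MiB per pass
    passes = -(-file_bytes // chunk)         # ceiling division
    return passes * interval_s
```

At that rate a 1 GiB file needs 256 passes, so up to 7680 seconds (128 minutes) before it is fully on disk.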

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: 2.6.22-rc1-mm1 huge pages VM freeze (maybe?)

2007-08-01 Thread Zan Lynx

On Wed, 2007-08-01 at 08:52 -0700, Nish Aravamudan wrote:
> On 7/31/07, Zan Lynx <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-07-31 at 15:02 -0700, Randy Dunlap wrote:
> > > On Tue, 31 Jul 2007 15:44:21 -0600 Zan Lynx wrote:
> > >
> > > > I was playing with huge pages and libhugetlbfs.  Small programs like
> > > > "ls" work fine.  I tried running Evolution through libhugetlbfs and the
> > > > system slowly stops running.  One interesting thing is the "ps" command,
> > > > it gets stuck like this:
> > >
> > > Do you mean 2.6.22-rc1-mm1 or 2.6.23-rc1-mm1?
> >
> > D'oh!  I mean 2.6.23-rc1-mm1, the 22 was a typo.  Cut & paste to be
> > sure:
> > Linux zephyr 2.6.23-rc1-mm1 #1 SMP PREEMPT Wed Jul 25 17:33:04 MDT 2007
> > x86_64 AMD Athlon(tm) 64 Processor 3400+ AuthenticAMD GNU/Linux
> 
> Just to confirm, still happens with -mm2?

No, it does not seem to.  Evolution runs OK.  ps, top, pmap all work
fine.

However, a couple of other things happened.  Could be unrelated or only
loosely related.

Evolution launches spamd (spamassassin) to filter junk mail.  spamd died
and I have this in dmesg to show for it:

VM: killing process spamd

spamd would have inherited the libhugetlbfs.so environment variables.
There are no other clues as to why it died though.

Also, immediately after launching evolution with libhugetlbfs, I got
that USB bug where the mouse starts creating keyboard input.  I got some
of these in dmesg:
keyboard.c: can't emulate rawmode for keycode 240

That could be pure coincidence, although I had been using the system
almost all day before that, and it hadn't happened.  
-- 
Zan Lynx <[EMAIL PROTECTED]>

What archs need flush_tlb_page() in handle_pte_fault() ?

2007-08-01 Thread Benjamin Herrenschmidt

Heya !

In my page table accessor spring cleaning, one of my targets is
flush_tlb_page(). At this stage, it's only called by generic code in one
place (in addition to the asm-generic bits that use it to implement
missing accessors, but I'm taking care of those separately) :

In handle_pte_fault(), when the PTE is present -and-
ptep_set_access_flags() returns false -and- it's a write fault, we do a
flush_tlb_page().

ptep_set_access_flags() returning false typically means we don't
actually need to call update_mmu_cache() and haven't updated the PTE.

Now, I would like to understand what archs actually need that. If we
have lazy _PAGE_DIRTY handling, then ptep_set_access_flags() would have
done the flush already. I can imagine people may want to avoid the SMP
IPI in that case and only lazily flush on that CPU but that doesn't seem
to be what i386 does today.

In any case, I believe that this flush could be moved to inside
ptep_set_access_flags() for archs that need it, thus totally removing
the else { ... } clause in handle_pte_fault(). Archs that want to be
smart can do a local flush inside ptep_set_access_flags() if !changed &&
dirty, it all gets under arch control, and that last flush_tlb_page()
can be removed from generic code.
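The control flow being proposed can be modelled in a few lines. This is a toy model in Python rather than kernel code: the function names and the needs_flush switch are invented for illustration, and each function simply returns the list of calls that would be made once the PTE is known to be present.

```python
def generic_fault_current(changed, write_access):
    """Calls made by today's generic code after ptep_set_access_flags()."""
    if changed:
        return ["update_mmu_cache"]
    if write_access:
        return ["flush_tlb_page"]
    return []

def arch_set_access_flags(changed, dirty, needs_flush):
    """Proposed arch hook: does the flush itself when the arch needs one."""
    calls = []
    if needs_flush and not changed and dirty:
        calls.append("flush_tlb_page")
    return changed, calls

def generic_fault_proposed(changed, write_access, needs_flush):
    """Generic code with the else clause removed."""
    changed, calls = arch_set_access_flags(changed, write_access, needs_flush)
    if changed:
        calls.append("update_mmu_cache")
    return calls
```

For an arch that needs the flush, both versions produce the same calls in every case. An arch that does not need it simply never issues the flush.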

Now, before I actually remove it, I need to understand what archs
actually -need- that flush, so I can move it to their respective
ptep_set_access_flags() implementations.

I don't see i386 needing it unless I missed something.

For now, I'll assume nobody needs it. So please tell me if your arch
does and I'll make sure my patch has it fixed up properly.

Thanks !
Ben.



Re: [rfc] balance-on-fork NUMA placement

2007-08-01 Thread Nick Piggin

On Wed, Aug 01, 2007 at 03:52:11PM -0700, Martin Bligh wrote:
> 
> >And so forth.  Initial forks will balance.  If the children refuse to
> >die, forks will continue to balance.  If the parent starts seeing short
> >lived children, fork()s will eventually start to stay local.  
> 
> Fork without exec is much more rare than with. Optimising for
> the uncommon case is the Wrong Thing to Do (tm). What we decided

It's only the wrong thing to do if it hurts the common case too
much. Considering we _already_ balance on exec, then adding another
balance on fork is not going to introduce some order of magnitude
problem -- at worst it would be 2x but it really isn't too slow
anyway (at least nobody complained when we added it).

One place where we found it helps is clone for threads.

If we didn't do such a bad job at keeping tasks together with their
local memory, then we might indeed reduce some of the balance-on-crap
and increase the aggressiveness of periodic balancing.

Considering we _already_ balance on fork/clone, I don't know what
your argument against this patch is. Doing the balance earlier
and allocating more stuff on the local node is surely not a bad
idea.

> the last time(s) this came up was to allow userspace to pass
> a hint in if they wanted to fork and not exec.
> 
> >I believe that this solved the pathological behavior we were seeing with
> >shell scripts taking way longer on the larger, supposedly more powerful,
> >platforms.
> >
> >Of course, that OS could migrate the equivalent of task structs and
> >kernel stack [the old Unix user struct that was traditionally swappable,
> >so fairly easy to migrate].  On Linux, all bets are off, once the
> >scheduler starts migrating tasks away from the node that contains their
> >task struct, ...  [Remember Eric Focht's "NUMA Affine Scheduler" patch
> >with its "home node"?]
> 
> Task migration doesn't work well at all without userspace hints.
> SGI tried for ages (with IRIX) and failed. There's long discussions
> of all of these things back in the days when we merged the original
> NUMA scheduler in late 2.5 ...

Task migration? Automatic memory migration you mean? I think it deserves
another look regardless of what SGI could or could not do, and Lee and I
are slowly getting things in place. We'll see what happens...


Re: CS5530 Alsa driver fails

2007-08-01 Thread Ash Willis

> the VGA video. If your box has VSA2 then VSA2 firmware has some kind of
> hooks to allow a native sound driver to take over and to reroute the
> interrupts without SB emulation. I don't have the docs for VSA2 but the
> horribly big natsemi provided audio driver does show how to do it.
> 

I wouldn't mind porting VSA2 support to ALSA. I just don't have test
hardware. Do you happen to know any examples of hardware that run VSA2
firmware or is it just a case of a firmware update?

> Alan

Thanks,

Ash


Re: [PATCH -mm] linux-audit list is subscribers-only

2007-08-01 Thread Randy Dunlap

On Thu, 02 Aug 2007 01:59:54 +0200 Gabriel C wrote:

> Signed-off-by: Gabriel Craciunescu <[EMAIL PROTECTED]>
> 
> ---
> 
> --- linux-2.6.23-rc1-mm/MAINTAINERS.orig  2007-08-02 01:51:40.0 +0200
> +++ linux-2.6.23-rc1-mm/MAINTAINERS   2007-08-02 01:52:17.0 +0200
> @@ -672,7 +672,7 @@ S:Maintained
>  AUDIT SUBSYSTEM
>  P:   David Woodhouse
>  M:   [EMAIL PROTECTED]
> -L:   [EMAIL PROTECTED]
> +L:   [EMAIL PROTECTED] (subscribers-only)
>  W:   http://people.redhat.com/sgrubb/audit/
>  T:   git kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6.git
>  S:   Maintained
> 
> -

Ack, I sent this patch last week...

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: Profiling the Kernel

2007-08-01 Thread Lee Revell

On 8/1/07, Mohamed Bamakhrama <[EMAIL PROTECTED]> wrote:
> Hi *,
> I have a question regarding profiling the Linux kernel code during
> runtime (by "profile", I mean the usage of each function/module within
> the kernel itself). I googled and found many "system-wide" profiler
> such as sysprof, Oprofile, etc... I am working on an embedded system
> project and currently we are using an on-chip debugger which
> interfaces with the system through EJTAG port. All what it can provide
> is just a "uniform sampling" of the kernel code usage and according to
> the manufacturer, it is not a safe way to determine the "hot spots"
> within the kernel. Does anyone know about any hardware/software tool
> that can provide a "good" profile of the kernel code usage?

Oprofile can do what you want, check the docs and google.

Lee

Re: [PATCH 1/7] sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()

2007-08-01 Thread Eric W. Biederman

Greg KH <[EMAIL PROTECTED]> writes:

> On Tue, Jul 31, 2007 at 07:15:08PM +0900, Tejun Heo wrote:
>> sd children list walking in sysfs_lookup() and sd renaming in
>> sysfs_rename_dir() were left out during i_mutex -> sysfs_mutex
>> conversion.  Fix them.
>> 
>> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
>
> Ok, to apply this, it messes with Eric's further patches for the shadow
> directory stuff.  But since it looks like you and Eric seem to have
> worked something else in that area, I'll drop Eric's patches for now, as
> that's the safest thing.
>
> Eric, is that ok?

Sounds good.  With a little luck I should have something working in a couple
of hours and then I can see about getting a tree with both my patches
and Tejun's.  So I will probably rebase on top of Tejun's latest patches.

Eric

Re: [PATCH 1/7] sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()

2007-08-01 Thread Greg KH

On Wed, Aug 01, 2007 at 05:29:00PM -0700, Greg KH wrote:
> On Tue, Jul 31, 2007 at 07:15:08PM +0900, Tejun Heo wrote:
> > sd children list walking in sysfs_lookup() and sd renaming in
> > sysfs_rename_dir() were left out during i_mutex -> sysfs_mutex
> > conversion.  Fix them.
> > 
> > Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> 
> Ok, to apply this, it messes with Eric's further patches for the shadow
> directory stuff.  But since it looks like you and Eric seem to have
> worked something else in that area, I'll drop Eric's patches for now, as
> that's the safest thing.
> 
> Eric, is that ok?

Oh never mind, it looks like you already took care of this with the later
patches...

thanks,

greg k-h

Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek

On Thu, Aug 02, 2007 at 02:06:27AM +0200, Mikael Pettersson wrote:

> > > @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
> > >  static inline int
> > >  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
> > >  {
> > > +#ifdef CONFIG_SMP
> > >   return -ENOSYS;
> > > +#else
> > 
> > Since the callers of futex_atomic_cmpxchg_inatomic() don't really
> > seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
> > infinite loop when it gets -ENOSYS), it seems better never to
> > return -ENOSYS from this function at all.
> > 
> > What if you just stick an #error in here in the SMP case?
> 
> The problem with #error is that it would cause compile-time
> regressions. I assume that e.g. alpha supports building SMP
> kernels, but #error would prevent that.
> 
> Thus I opted to fix the UP case while leaving the SMP case
> unchanged. Actually I think the SMP case should be a BUG()
> rather than -ENOSYS,

Probably.  Or handle -ENOSYS in the callers -- but that's more
work, and would cease to be necessary once everyone implements
futex_atomic_cmpxchg_inatomic().


> but that's a different issue from the UP case which I really do
> want to see fixed.

ACK.

Re: [PATCH 026 of 35] Split any large bios that arrive at __make_request.

2007-08-01 Thread Neil Brown

On Thursday August 2, [EMAIL PROTECTED] wrote:
> Hmmm... Patches don't apply beyond this one.  I'm applying against
> clean 2.6.23-rc1-mm1 grabbed using ketchup.
> 

So do you mean 027 doesn't apply, or that 028 doesn't apply next?

It is possible that you missed 027.  It originally had 3 consecutive
Xs in the subject line, so vger.kernel.org bounced it.
I re-sent it, but it would have had a different References header and
thus might not appear in the same thread.

If you confirm that 027 isn't applying, I'll track down what happened.

Thanks,
NeilBrown

Re: [PATCH] 2.6.23-rc1-mm1 - fix missing numa_zonelist_order sysctl

2007-08-01 Thread KAMEZAWA Hiroyuki

On Wed, 01 Aug 2007 15:02:51 -0400
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:
> [But, maybe reordering the zonelists is not such a good idea
> when ZONE_MOVABLE is populated?]
> 

It's case-by-case, I think. In the zone-order case with ZONE_MOVABLE, the
user's page cache will not use ZONE_NORMAL until ZONE_MOVABLE on all nodes
is exhausted. This is the expected behavior, I think.

I think the real problem is the scheme for "how to set the ZONE_MOVABLE size
to an appropriate value for the system". This needs more study and
documentation (but maybe it depends on system configuration to some extent).

Thanks,
-Kame


Re: cpufreq.c:(.text+0xaf178): undefined reference to `cpufreq_gov_performance'

2007-08-01 Thread Gabriel C

Andrew Morton wrote:
> On Wed, 1 Aug 2007 16:31:46 -0700
> "Miles Lane" <[EMAIL PROTECTED]> wrote:
> 
>>   LD  .tmp_vmlinux1
>> drivers/built-in.o: In function `__cpufreq_governor':
>> cpufreq.c:(.text+0xaf178): undefined reference to `cpufreq_gov_performance'
>> cpufreq.c:(.text+0xaf18a): undefined reference to `cpufreq_gov_performance'
>> make: *** [.tmp_vmlinux1] Error 1
> 
> One for Thomas, I expect.

It is this patch:

cpufreq-allow-ondemand-and-conservative-cpufreq-governors-to-be-used-as-default.patch

Reverting it here fixes the error.

Gabriel

Re: [PATCH 1/7] sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()

2007-08-01 Thread Greg KH

On Tue, Jul 31, 2007 at 07:15:08PM +0900, Tejun Heo wrote:
> sd children list walking in sysfs_lookup() and sd renaming in
> sysfs_rename_dir() were left out during i_mutex -> sysfs_mutex
> conversion.  Fix them.
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>

Ok, to apply this, it messes with Eric's further patches for the shadow
directory stuff.  But since it looks like you and Eric seem to have
worked something else in that area, I'll drop Eric's patches for now, as
that's the safest thing.

Eric, is that ok?

thanks,

greg k-h

Re: [PATCH] Fix two potential mem leaks in MPT Fusion (mpt_attach())

2007-08-01 Thread Andrew Morton

On Thu, 2 Aug 2007 01:55:33 +0200
Jesper Juhl <[EMAIL PROTECTED]> wrote:

> Greetings & Salutations,
> 
> The Coverity checker spotted two potential memory leaks in 
> drivers/message/fusion/mptbase.c::mpt_attach().
> 
> There are two returns that may leak the storage allocated for 
> 'ioc' (sizeof(MPT_ADAPTER) bytes).
> A simple fix would be to simply add two kfree() calls before 
> the return statements, but a better fix (that this patch 
> implements) is to reorder the code so that if we hit the first 
> return condition we don't have to do the allocation at all and 
> then just add a kfree() call for the second case.
> 
> Please consider applying.  Patch has been compile tested only.
> 
> 

umm,

> ---
> 
>  drivers/message/fusion/mptbase.c |   13 +++--
>  1 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
> index e866dac..f9bb705 100644
> --- a/drivers/message/fusion/mptbase.c
> +++ b/drivers/message/fusion/mptbase.c
> @@ -1393,18 +1393,18 @@ mpt_attach(struct pci_dev *pdev, const struct pci_device_id *id)
>  	struct proc_dir_entry *dent, *ent;
>  #endif
>  
> +	if (mpt_debug_level)
> +		printk(KERN_INFO MYNAM ": mpt_debug_level=%xh\n", mpt_debug_level);
> +
> +	if (pci_enable_device(pdev))
> +		return r;
> +
>  	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_ATOMIC);

Why on earth is that using GFP_ATOMIC?  This function later goes on to
create procfs files and such things.




y'know, we could have a debug option which will spit warnings if someone
does a !__GFP_WAIT allocation while !in_atomic() (only works if
CONFIG_PREEMPT).  

But please, make it depend on !CONFIG_AKPM.  I shudder to think about all
the stuff it would pick up.



Re: [PATCH -mm] Introduce strtol_check_range()

2007-08-01 Thread Satyam Sharma

On Thu, 2 Aug 2007, Satyam Sharma wrote:

> On Wed, 1 Aug 2007, Alexey Dobriyan wrote:
> 
> > On Tue, Jul 31, 2007 at 10:04:10PM +0530, Satyam Sharma wrote:
> > > Callers (especially "store" functions for sysfs or configfs attributes)
> > > that want to convert an input string to a number may often also want to
> > > check for simple input sanity or allowable range. strtol10_check_range()
> > > of netconsole does this, so extract it out into lib/vsprintf.c, make it
> > > generic w.r.t. base, and export it to the rest of the kernel and modules.
> > [...]
> > 
> > Please, copy strtonum() from BSD instead. Nobody needs another
> > home-grown converter.
> 
> BSD's strtonum(3) is a detestful, horrible shame.
> 
> The strtol_check_range() I implemented here does _all_ that strtonum()
> does, plus is generic w.r.t. base, and minus the tasteless "errstr"
> argument.
> 
> Tell me, how does that "errstr" ever make sense? We _anyway_ return
> errors (-EINVAL or -ERANGE) if any of those cases show up. And
> _because_ we use negative numbers to return errors, we can't use this
> function to convert negative inputs anyway ... an appropriate error
> message can always be outputted by the caller itself. [ hence the
> two WARN_ON's I added here ]

Glargh, lemme take that back :-p

Wait, it takes in a const char ** argument, and then the whole error
return checking is based on the (const char *) *arg, and not the
return value (-EINVAL, -ERANGE) itself. And it appears, we set the
const char **errstr from inside the function only if the caller didn't
explicitly give us a NULL itself ... hmm, it all does make sense, plus
allows to convert negative inputs as well -- now that the error return
checking in the callsite won't be based on the return value anyway
(but strtonum(3) is still a bit daft, in that it "pretends" to return
error values in ret even though the caller can't use that to test if
an error actually occurred).
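
The calling convention described above can be sketched in userspace C. This
is only an illustration under stated assumptions, not the BSD source and not
the proposed kernel port: strtonum_sketch() is a hypothetical name, it
handles base 10 only, and it is built on strtoll(). The point it shows is
that the caller tests *errstr rather than the return value, which is what
leaves the whole negative range free for valid results.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of a strtonum(3)-style converter (base 10 only).
 * On success it returns the value and sets *errstr to NULL.  On failure
 * it returns 0, sets errno, and points *errstr at a short message.
 * errstr itself may be NULL if the caller does not want the message.
 */
static long long strtonum_sketch(const char *s, long long minval,
				 long long maxval, const char **errstr)
{
	const char *msg = NULL;
	long long val = 0;
	int saved_errno = errno;
	int err = 0;
	char *end;

	if (minval > maxval) {
		err = EINVAL;
		msg = "invalid";
	} else {
		errno = 0;
		val = strtoll(s, &end, 10);
		if (end == s || *end != '\0') {
			/* no digits at all, or trailing junk */
			err = EINVAL;
			msg = "invalid";
		} else if ((val == LLONG_MIN && errno == ERANGE) || val < minval) {
			err = ERANGE;
			msg = "too small";
		} else if ((val == LLONG_MAX && errno == ERANGE) || val > maxval) {
			err = ERANGE;
			msg = "too large";
		}
	}

	if (errstr)
		*errstr = msg;
	/* leave errno untouched on success */
	errno = err ? err : saved_errno;
	return err ? 0 : val;
}
```

A caller would pass the address of a const char * and treat a non-NULL
result as failure, so "-42" with a range of [-100, 100] converts cleanly
while "101" or "12x" is rejected.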

[ Thankfully there won't be any BSD lovers around here, so probably I'll
  be able to get out of this clean :-) So okay, I'll port it over ... ]

Thanks,
Satyam

Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Mikael Pettersson

On Thu, 2 Aug 2007 01:49:02 +0200, Lennert Buytenhek wrote:
> On Thu, Aug 02, 2007 at 01:00:21AM +0200, Mikael Pettersson wrote:
> 
> > @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
> >  static inline int
> >  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
> >  {
> > +#ifdef CONFIG_SMP
> > return -ENOSYS;
> > +#else
> 
> Since the callers of futex_atomic_cmpxchg_inatomic() don't really
> seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
> infinite loop when it gets -ENOSYS), it seems better never to
> return -ENOSYS from this function at all.
> 
> What if you just stick an #error in here in the SMP case?

The problem with #error is that it would cause compile-time
regressions. I assume that e.g. alpha supports building SMP
kernels, but #error would prevent that.

Thus I opted to fix the UP case while leaving the SMP case
unchanged. Actually I think the SMP case should be a BUG()
rather than -ENOSYS, but that's a different issue from the
UP case which I really do want to see fixed.

/Mikael

[PATCH -mm] linux-audit list is subscribers-only

2007-08-01 Thread Gabriel C

Signed-off-by: Gabriel Craciunescu <[EMAIL PROTECTED]>

---

--- linux-2.6.23-rc1-mm/MAINTAINERS.orig  2007-08-02 01:51:40.0 +0200
+++ linux-2.6.23-rc1-mm/MAINTAINERS 2007-08-02 01:52:17.0 +0200
@@ -672,7 +672,7 @@ S:  Maintained
 AUDIT SUBSYSTEM
 P: David Woodhouse
 M: [EMAIL PROTECTED]
-L: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED] (subscribers-only)
 W: http://people.redhat.com/sgrubb/audit/
 T: git kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6.git
 S: Maintained


Re: drbd 8.0.2/3 doesn't load under kernel 2.6.21

2007-08-01 Thread Adrian Bunk

On Wed, Aug 01, 2007 at 07:02:14PM -0400, Maurice Volaski wrote:
> First, did you confirm this behavior? Can you please explain that? How 
> could they possibly interact with one another?

It's obvious when looking at the source code that both modules you are
trying to use are buggy, and the sum of the bugs in both modules is the 
drbd breakage you observe.

30% of the guilt goes to the drbd developers for doing the following:

#ifdef NETLINK_ROUTE6
/* pre 2.6.16 */
err = cn_init();
if(err) return err;
#endif

The author wanted to check for pre-2.6.14 when the connector code was 
added to the kernel, not for pre-2.6.16 as the comment implies or 
pre-2.6.13 as the code does.

Or he wanted to check whether it's a recent kernel and the connector 
code is compiled into the kernel.
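
For illustration, a version-based test of the first kind might look like the
sketch below. It is a userspace stand-in, not drbd code: the macro mirrors
the encoding of the kernel's KERNEL_VERSION(), and kernel_has_connector() is
a hypothetical helper. The 2.6.14 cut-off is taken from the statement above
about when the connector code was added.

```c
/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: nonzero if a kernel with this version code
 * already ships the connector, so an out-of-tree cn_init() call is
 * not needed.  Keying on the version avoids depending on whether some
 * unrelated NETLINK_* constant happens to be defined.
 */
static int kernel_has_connector(unsigned int version_code)
{
	return version_code >= SKETCH_KERNEL_VERSION(2, 6, 14);
}
```

In real module code the same comparison would be done at preprocessing time
against LINUX_VERSION_CODE.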

70% of the guilt goes to the web100 developers for shipping the following 
to their users:

--- linux-2.6-web100/include/linux/netlink.h  19 Jul 2007 17:49:17 -  1.1.1.16
+++ linux-2.6-web100/include/linux/netlink.h  19 Jul 2007 18:11:01 -  1.17
@@ -14,6 +14,7 @@
 #define NETLINK_SELINUX    7   /* SELinux event notifications */
 #define NETLINK_ISCSI  8   /* Open-iSCSI */
 #define NETLINK_AUDIT  9   /* auditing */
+#define NETLINK_ROUTE6 11  /* af_inet6 route comm channel */
 #define NETLINK_FIB_LOOKUP 10  
 #define NETLINK_CONNECTOR  11
 #define NETLINK_NETFILTER  12  /* netfilter subsystem */

That's not only buggy but also not used by web100.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


[PATCH] Fix two potential mem leaks in MPT Fusion (mpt_attach())

2007-08-01 Thread Jesper Juhl

Greetings & Salutations,

The Coverity checker spotted two potential memory leaks in 
drivers/message/fusion/mptbase.c::mpt_attach().

There are two returns that may leak the storage allocated for 
'ioc' (sizeof(MPT_ADAPTER) bytes).
A simple fix would be to simply add two kfree() calls before 
the return statements, but a better fix (that this patch 
implements) is to reorder the code so that if we hit the first 
return condition we don't have to do the allocation at all and 
then just add a kfree() call for the second case.
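
The reordering idea can be shown with a small userspace sketch. Everything
in it is a stand-in chosen for illustration: struct adapter replaces
MPT_ADAPTER, the two flags simulate pci_enable_device() and
pci_set_dma_mask() failing, and calloc()/free() replace kzalloc()/kfree().

```c
#include <stdlib.h>

/* Hypothetical stand-in for MPT_ADAPTER. */
struct adapter {
	int debug_level;
};

/*
 * enable_fails and dma_fails simulate the two error returns.
 * Returns NULL on any failure, or a new adapter the caller must free().
 */
static struct adapter *attach_sketch(int enable_fails, int dma_fails)
{
	struct adapter *ioc;

	/* First failure point comes before the allocation: nothing to leak. */
	if (enable_fails)
		return NULL;

	ioc = calloc(1, sizeof(*ioc));
	if (ioc == NULL)
		return NULL;

	/* Second failure point comes after the allocation: release it. */
	if (dma_fails) {
		free(ioc);
		return NULL;
	}

	return ioc;
}
```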

Please consider applying.  Patch has been compile tested only.


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 drivers/message/fusion/mptbase.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index e866dac..f9bb705 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -1393,18 +1393,18 @@ mpt_attach(struct pci_dev *pdev, const struct pci_device_id *id)
 	struct proc_dir_entry *dent, *ent;
 #endif
 
+	if (mpt_debug_level)
+		printk(KERN_INFO MYNAM ": mpt_debug_level=%xh\n", mpt_debug_level);
+
+	if (pci_enable_device(pdev))
+		return r;
+
 	ioc = kzalloc(sizeof(MPT_ADAPTER), GFP_ATOMIC);
 	if (ioc == NULL) {
 		printk(KERN_ERR MYNAM ": ERROR - Insufficient memory to add adapter!\n");
 		return -ENOMEM;
 	}
-
 	ioc->debug_level = mpt_debug_level;
-	if (mpt_debug_level)
-		printk(KERN_INFO MYNAM ": mpt_debug_level=%xh\n", mpt_debug_level);
-
-	if (pci_enable_device(pdev))
-		return r;
 
 	dinitprintk(ioc, printk(KERN_WARNING MYNAM ": mpt_adapter_install\n"));
 
@@ -1413,6 +1413,7 @@ mpt_attach(struct pci_dev *pdev, const struct pci_device_id *id)
 			": 64 BIT PCI BUS DMA ADDRESSING SUPPORTED\n"));
 	} else if (pci_set_dma_mask(pdev, DMA_32BIT_MASK)) {
 		printk(KERN_WARNING MYNAM ": 32 BIT PCI BUS DMA ADDRESSING NOT SUPPORTED\n");
+		kfree(ioc);
 		return r;
 	}
 




Re: cpufreq.c:(.text+0xaf178): undefined reference to `cpufreq_gov_performance'

2007-08-01 Thread Andrew Morton

On Wed, 1 Aug 2007 16:31:46 -0700
"Miles Lane" <[EMAIL PROTECTED]> wrote:

>   LD  .tmp_vmlinux1
> drivers/built-in.o: In function `__cpufreq_governor':
> cpufreq.c:(.text+0xaf178): undefined reference to `cpufreq_gov_performance'
> cpufreq.c:(.text+0xaf18a): undefined reference to `cpufreq_gov_performance'
> make: *** [.tmp_vmlinux1] Error 1

One for Thomas, I expect.

> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.23-rc1-mm2
> # Wed Aug  1 15:46:12 2007



Re: [PATCH] RT: Add priority-queuing and priority-inheritance to workqueue infrastructure

2007-08-01 Thread Gregory Haskins

On Thu, 2007-08-02 at 02:22 +0400, Oleg Nesterov wrote:

> 
> No.
> 
> > However, IIUC the point of flush_workqueue() is a barrier only relative
> > to your own submissions, correct?.  E.g. to make sure *your* requests
> > are finished, not necessarily the entire queue.
> 
> No,

You sure are a confident one ;)

> 
> > If flush_workqueue is supposed to behave differently than I describe,
> > then I agree its broken even in my original patch.
> 
> The comment near flush_workqueue() says:
> 
>   * We sleep until all works which were queued on entry have been handled,
>   * but we are not livelocked by new incoming ones.

Dude, of *course* it says that.  It would be completely illogical for it to
say otherwise with the linear priority queue that mainline has.  Since
we are changing things here you have to read between the lines and ask
yourself "what is the intention of this barrier logic?".  Generally
speaking, the point of a barrier is to flush relevant work from your own
context, sometimes at the granularity of flushing everyone else's work
inadvertently if the flush mechanism isn't fine grained enough.  But
that is a side-effect, not a requirement.

So now my turn:

No. :P

But in all seriousness, let me ask you this:  Why do you need a barrier
that flushes *all* work instead of just *your* work?  Do you really
care?  If you do, could we adapt the API to support the notion of
"flush()" and "flush_all()"?  Or could we stay with one API call and make it
flush all work again, so that you are happy?

To be honest, I think you have made me realize there is actually a
legitimate problem w.r.t. what I mentioned earlier (unguarded local
PI-boost messing things up), and it's my bad.  I originally wrote this
patch for a different RT subsystem which used an entirely different
barrier mechanism and therefore didn't have this problem(*).  I think it
just didn't translate in the port to workqueues directly, and now we
need to address it.

Even though I disagree with you on the semantics of flush_workqueue, the
fact that we don't protect against a local PI-boost means the current
mechanism isn't safe (and thank you for banging that home).  However,
you seem to have objections to the overall change in general aside from
this bug, and there we can agree to disagree.

(*)Basically, the signaling mechanisms in the original design were
tightly coupled to the work-units and therefore there was no
relationship between disparate items in the queue such as there is in
the workqueue subsystem.  

> 
> > > > 2) The priority of the workqueue task would be temporarily elevated to
> > > > RT99 so that the currently dispatched task will complete at the same
> > > > priority as the waiter.
> > > 
> > > _Which_ waiter?
> > 
> > The RT99 task that submitted the request.
> 
> Now, again, why do you think this task should wait?

I don't think it *should* wait.  It *will* wait and we don't want that.
And without PI-boost, it could wait indefinitely.  I think the detail
you are missing is that the RT kernel introduces some new workqueue APIs
that allow for "RPC" like behavior.  E.g. they are like
"smp_call_function()", but instead of using an IPI, it uses workqueues
to dispatch work to other CPUs.  I could go on and on about why this is,
but hopefully you just immediately understand that this is a *good*
thing to do, especially in RT.

So now, we are enhancing that RPC mechanism to be RT aware with this
proposed changeset so it A) priority-queues, and B) prevents inversion.
I hope that helps to clear it up.

Originally I had proposed this RPC mechanism to be a separate subsystem
from workqueues.  But it involved new kthreads, rt-queuing, and PI.  It
was sensibly suggested in review that the kthread work was redundant
with workqueues, but that the rt/pi stuff had general merit.  So it was
suggested that we port the rt/pi concepts to workqueues and base the
work on that.  So here we are ;)

> 
> > >  I can't understand at all why work_struct should "inherit"
> > > the priority of the task which queued it. 
> > 
> > Think about it:  If you are an RT99 task and you have work to do,
> > shouldn't *all* of the work you need be done at RT99 if possible. 
> 
> No, I don't think so. Quite opposite,

Ouch, there's that confidence again...

>  I think sometimes it makes
> sense to "offload" some low-priority work from RT99 to workqueue
> thread exactly because it has no rt policy.

The operative word in your statement being "sometimes"?  I could flip
your argument right around on you and say "sometimes we want to use
workqueues to remotely represent our high priority butts somewhere
else" ;)  Perhaps that is the key to compromise.  Perhaps the API needs
to be adjusted to deal with the fact that sometimes you want to inherit
priority, sometimes you don't.

> 
> And what about queue_work() from irq? Why should that work take a
> "random" priority?

Yes, you raise a legitimate point here.  In RT, the irq is a kthread
with an RT priority so it would

Re: [PATCH -mm 3/3] Freezer: Measure freezing time

2007-08-01 Thread Andrew Morton

On Wed, 1 Aug 2007 23:36:39 +0200
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> + do_gettimeofday();
> + elapsed_csecs64 = timeval_to_ns() - timeval_to_ns();
> + do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
> + elapsed_csecs = elapsed_csecs64;

I'd have thought that we had enough timeval library code by now to
not need to open-code things like this.

No, it seems that we don't.  So people keep on open-coding the same
thing, or inventing private code which shouldn't be.
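
For reference, the kind of shared helper being wished for here might look
like the following userspace sketch. The names timeval_to_ns_sketch() and
timeval_elapsed_csecs() are hypothetical and are not existing kernel
functions; the arithmetic follows the quoted hunk (difference in
nanoseconds, divided by NSEC_PER_SEC / 100).

```c
#include <stdint.h>
#include <sys/time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Convert a timeval to nanoseconds. */
static int64_t timeval_to_ns_sketch(const struct timeval *tv)
{
	return (int64_t)tv->tv_sec * SKETCH_NSEC_PER_SEC +
	       (int64_t)tv->tv_usec * 1000LL;
}

/* Elapsed time between two timevals in centiseconds, truncated. */
static int64_t timeval_elapsed_csecs(const struct timeval *start,
				     const struct timeval *end)
{
	int64_t delta_ns = timeval_to_ns_sketch(end) -
			   timeval_to_ns_sketch(start);

	return delta_ns / (SKETCH_NSEC_PER_SEC / 100);
}
```

In-kernel code of that era would use do_div() for the 64-bit division on
32-bit architectures instead of the plain "/" used in this sketch.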

What the hell is all that stuff doing in there?

Re: [PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Lennert Buytenhek

On Thu, Aug 02, 2007 at 01:00:21AM +0200, Mikael Pettersson wrote:

> @@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
>  static inline int
>  futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
>  {
> +#ifdef CONFIG_SMP
>   return -ENOSYS;
> +#else

Since the callers of futex_atomic_cmpxchg_inatomic() don't really
seem prepared to deal with -ENOSYS (e.g. the handle_futex_death()
infinite loop when it gets -ENOSYS), it seems better never to
return -ENOSYS from this function at all.

What if you just stick an #error in here in the SMP case?

Re: [PATCH -mm 2/3] Freezer: Use wait queue instead of busy looping

2007-08-01 Thread Andrew Morton

On Wed, 1 Aug 2007 23:32:48 +0200
"Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> +/*
> + * Used to notify try_to_freeze_tasks() that the refrigerator has been entered
> + * by a task.
> + */
> +static int refrigerator_called;

this is rather icky.

Re: SD still better than CFS for 3d ?(was Re: 2.6.23-rc1)

2007-08-01 Thread Kasper Sandberg

On Tue, 2007-07-31 at 10:57 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-07-31 at 01:46 +0200, Kasper Sandberg wrote:
> > 
> > > could perhaps be filesystem related, i have my maildir(extremely 
> > > large) on reiserfs, and /home on xfs. what my mail client will do is 
> > > download mail, spamasassin it(loading database from home), then it 
> > > will put to imap server placing it on reiserfs, and then a "local" 
> > > copy in my home.
> > 
> > Ooh, do you perchance have PREEMPT_BKL=y?
> > 

Sorry for the late response.

Nope, I run totally without preemption. I did however test with it; it
seemed not to matter in terms of smoothness, but reduced the throughput
slightly.

> > If so, try on another filesystem than reiserfs (or disable 
> > PREEMPT_BKL, but that is obviously the lesser of the two choices).
> > 
> > Ingo traced a 1+ second latency at my end to BKL priority inversion 
> > between tty and reiserfs.
> 
> ah, indeed, that makes quite a bit of sense. Almost all of the Reiser3 
> code runs under the BKL, and the only other major kernel infrastructure 
> that has BKL dependencies is the TTY code. Kasper, as a debugging 
> matter, could you try to move that spamassassin workload off into a 
> non-Reiser3 filesystem and/or disable PREEMPT_BKL? If that makes a 
> noticeable difference (for the better ;) then we can continue figuring 
> out what's happening exactly.

The process is like this:
mail client fetches mail
mail client invokes spamassassin
 if spam -> spam
 else filtering
If it matches certain filters, it gets put into my imap server, which is
on reiserfs.


>   Ingo


Re: [patch -mm][Intel-IOMMU] Optimize sg map/unmap calls

2007-08-01 Thread Andrew Morton

On Wed, 1 Aug 2007 13:06:23 -0700
"Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote:

> +/* Computes the padding size required, to make the
> + * the start address naturally aligned on its size
> + */
> +static int
> +iova_get_pad_size(int size, unsigned int limit_pfn)
> +{
> + unsigned int pad_size = 0;
> + unsigned int order = ilog2(size);
> +
> + if (order)
> + pad_size = (limit_pfn + 1) % (1 << order);
> +
> + return pad_size;
> +}

This isn't obviously doing the right thing for non-power-of-2 inputs.
ilog2() rounds down...

Please check that this, and all the other ilog2()s which have been added
are doing the right thing if they can be presented with non-power-of-2
inputs?
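
To make the rounding concern concrete, here is a small userspace sketch.
floor_log2() is a hypothetical stand-in for the kernel's ilog2(), and
ceil_log2() and pad_size() are illustrative helpers, not part of the patch.
Whether rounding up is the right answer for iova_get_pad_size() is for the
patch author to decide; the sketch only shows how the two orders differ.

```c
#include <stdint.h>

/* Stand-in for ilog2(): floor(log2(n)) for n > 0. */
static unsigned int floor_log2(uint32_t n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Smallest order such that (1 << order) >= n, for n > 0. */
static unsigned int ceil_log2(uint32_t n)
{
	unsigned int order = floor_log2(n);

	if (n & (n - 1))	/* not a power of two: round up */
		order++;
	return order;
}

/* Padding as computed in the quoted hunk, with the order passed in. */
static unsigned int pad_size(unsigned int order, uint32_t limit_pfn)
{
	return order ? (limit_pfn + 1) % (1u << order) : 0;
}
```

For a size of 5 the rounded-down order is 2 (alignment to 4 pages) while the
rounded-up order is 3 (alignment to 8 pages), so the two give different
padding for the same limit_pfn.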

Re: 2.6.23-rc1-mm2

2007-08-01 Thread Mel Gorman

On (01/08/07 22:52), Torsten Kaiser didst pronounce:
> On 8/1/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Wed, 01 Aug 2007 16:30:08 -0400
> > [EMAIL PROTECTED] wrote:
> >
> > > As an aside, it looks like bits of dynticks-for-x86_64 are in there.
> > > In particular, x86_64-enable-high-resolution-timers-and-dynticks.patch is in
> > > there, adding a menu that depends on GENERIC_CLOCKEVENTS, but then nothing
> > > in the x86_64 tree actually *sets* it.  There's a few other dynticks-related
> > > prep patches in there as well.  Does this mean it's back to "coming soon to
> > > a CPU near you" status? :)
> >
> > I've lost the plot on that stuff: I'm just leaving things as-is for now,
> > wait for Thomas to return from vacation so we can have another run at it.
> 
> For what its worth: 2.6.22-rc6-mm1 with NO_HZ works for me on an AMD
> SMP system without trouble.
> 
> Next try with 2.6.23-rc1-mm2 and SPARSEMEM:
> Probably the same exception, but this time with Call Trace:
> [0.00] Bootmem setup node 0 -8000
> [0.00] Bootmem setup node 1 8000-00012000
> [0.00] Zone PFN ranges:
> [0.00]   DMA 0 -> 4096
> [0.00]   DMA324096 ->  1048576
> [0.00]   Normal1048576 ->  1179648
> [0.00] Movable zone start PFN for each node
> [0.00] early_node_map[4] active PFN ranges
> [0.00] 0:0 ->  159
> [0.00] 0:  256 ->   524288
> [0.00] 1:   524288 ->   917488
> [0.00] 1:  1048576 ->  1179648
> PANIC: early exception rip 807cddb5 error 2 cr2 e2000310
> [0.00]
> [0.00] Call Trace:
> [0.00]  [] memmap_init_zone+0xb5/0x130
> [0.00]  [] init_currently_empty_zone+0x84/0x110
> [0.00]  [] free_area_init_node+0x393/0x3e0
> [0.00]  [] free_area_init_nodes+0x2da/0x320
> [0.00]  [] paging_init+0x87/0x90
> [0.00]  [] setup_arch+0x355/0x470
> [0.00]  [] start_kernel+0x57/0x330
> [0.00]  [] _sinittext+0x12d/0x140
> [0.00]
> [0.00] RIP memmap_init_zone+0xb5/0x130
> 
> (gdb) list *0x807cddb5
> 0x807cddb5 is in memmap_init_zone (include/linux/list.h:32).
> 27  #define LIST_HEAD(name) \
> 28  struct list_head name = LIST_HEAD_INIT(name)
> 29
> 30  static inline void INIT_LIST_HEAD(struct list_head *list)
> 31  {
> 32  list->next = list;
> 33  list->prev = list;
> 34  }
> 35
> 36  /*
> 
> I will test more tomorrow...

Well, that doesn't make a whole pile of sense unless the memory map
is not present. Looking at your boot log, we see this gem

> [0.00] 1:   524288 ->   917488
> [0.00] 1:  1048576 ->  1179648

Node 1 spans a region with a nice little hole in the middle of DMA32. In our
test machines, we wouldn't see a hole like this, at least that I can recall
so it would appear to work on some machines. On SPARSEMEM, sparse_init()
is responsible for allocating memmap for each section. In 2.6.22-rc6-mm1,
it allocated the memory if the section was *valid*. In 2.6.23-rc1-mm1,
it allocates the memory if the section is *present* due to the patch
sparsemem-record-when-a-section-has-a-valid-mem_map.patch[1]. Much later in
the init process, memmap is initialised based on spanned memory, not present
memory so initialisation will init memmap that resides in holes if a zone
spans that area in a node which is the case on this machine.  I think this
is why it kablamos - it inits memmap that wasn't allocated because it's
not present and the surprise is that it doesn't blow up sooner. Please try
the patch below Torsten, thanks.

[1] yeah, I acked this patch and I had read through it. My bad if the
patch below does fix the problem

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc1-mm2-clean/mm/sparse.c linux-2.6.23-rc1-mm2-present_revert/mm/sparse.c
--- linux-2.6.23-rc1-mm2-clean/mm/sparse.c  2007-08-01 10:09:39.0 +0100
+++ linux-2.6.23-rc1-mm2-present_revert/mm/sparse.c 2007-08-02 00:27:00.0 +0100
@@ -483,7 +483,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!present_section_nr(pnum))
+		if (!valid_section_nr(pnum))
 			continue;
 
 		map = sparse_early_mem_map_alloc(pnum);
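
The size of the hole in the boot log can be estimated with a toy userspace model. This is only a sketch of the reasoning, not kernel code: it assumes 32768 pages per section (128 MB sections of 4 KB pages), and the names pfn_range, section_present() and sections_in_holes() are made up for the example. It counts the sections that a zone spans but that contain no present page, which are the sections whose memmap would be initialised without having been allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: 128 MB sections of 4 KB pages. */
#define TOY_PAGES_PER_SECTION 32768UL

/* A range of present page frames, [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* True if any page of section 'sec' falls inside one of the present ranges. */
static bool section_present(unsigned long sec, const struct pfn_range *r, int n)
{
	unsigned long s = sec * TOY_PAGES_PER_SECTION;
	unsigned long e = s + TOY_PAGES_PER_SECTION;

	for (int i = 0; i < n; i++)
		if (r[i].start < e && s < r[i].end)
			return true;
	return false;
}

/* Count sections inside the span [span_start, span_end) with no present page. */
static unsigned long sections_in_holes(unsigned long span_start,
				       unsigned long span_end,
				       const struct pfn_range *r, int n)
{
	unsigned long holes = 0;

	for (unsigned long sec = span_start / TOY_PAGES_PER_SECTION;
	     sec * TOY_PAGES_PER_SECTION < span_end; sec++)
		if (!section_present(sec, r, n))
			holes++;
	return holes;
}
```

Fed the two node 1 ranges from the log (524288 -> 917488 and 1048576 -> 1179648) with a span of 524288 -> 1179648, the model reports four sections that are spanned but not present.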

Re: [PATCH -mm] Introduce strtol_check_range()

2007-08-01 Thread Satyam Sharma

Hi Alexey,

On Wed, 1 Aug 2007, Alexey Dobriyan wrote:

> On Tue, Jul 31, 2007 at 10:04:10PM +0530, Satyam Sharma wrote:
> > Callers (especially "store" functions for sysfs or configfs attributes)
> > that want to convert an input string to a number may often also want to
> > check for simple input sanity or allowable range. strtol10_check_range()
> > of netconsole does this, so extract it out into lib/vsprintf.c, make it
> > generic w.r.t. base, and export it to the rest of the kernel and modules.
> 
> > --- a/drivers/net/netconsole.c
> > +++ b/drivers/net/netconsole.c
> > @@ -335,9 +307,11 @@ static ssize_t store_enabled(struct netconsole_target *nt,
> > int err;
> > long enabled;
> >  
> > -   enabled = strtol10_check_range(buf, 0, 1);
> > -   if (enabled < 0)
> > +   enabled = strtol_check_range(buf, 0, 1, 10);
> > +   if (enabled < 0) {
> > +   printk(KERN_ERR "netconsole: invalid input\n");
> > return enabled;
> > +   }
> 
> Please, copy strtonum() from BSD instead. Nobody needs another
> home-grown converter.

BSD's strtonum(3) is a detestful, horrible shame.

The strtol_check_range() I implemented here does _all_ that strtonum()
does, plus is generic w.r.t. base, and minus the tasteless "errstr"
argument.

Tell me, how does that "errstr" ever make sense? We _anyway_ return
errors (-EINVAL or -ERANGE) if any of those cases show up. And
_because_ we use negative numbers to return errors, we can't use this
function to convert negative inputs anyway ... an appropriate error
message can always be outputted by the caller itself. [ hence the
two WARN_ON's I added here ]

But yeah, considering this implementation is so similar to strtonum(3)
(minus the shortcomings, that is :-) we can probably rename it to
something like kstrtonum() ... and we should probably be returning
different errors for the two invalid conditions, yes.

Thanks,
Satyam
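
For readers without the patch at hand, the behaviour argued for above can be sketched in userspace C. This is a hypothetical analogue built on the standard strtol(), not the kernel function from the patch; toy_strtol_check_range() is an invented name, and the choice to tolerate one trailing newline (as sysfs input usually carries) is an assumption. It returns the parsed value on success and distinct negative errors for malformed and out-of-range input, which is why the allowed range must be non-negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of a range-checked string conversion.
 * Returns the value in [min, max], -EINVAL for malformed input, or
 * -ERANGE for a value outside the range. Errors are negative, so the
 * range itself must be non-negative.
 */
static long toy_strtol_check_range(const char *s, long min, long max, int base)
{
	char *end;
	long val;

	assert(min >= 0 && max >= min);

	errno = 0;
	val = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;		/* no digits at all */
	if (*end == '\n')
		end++;			/* assumed: allow one trailing newline */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */
	if (errno == ERANGE || val < min || val > max)
		return -ERANGE;
	return val;
}
```

A caller such as the store_enabled() hunk quoted above would then only need to test for a negative return and pass it back.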

[PATCH]hpet and rtc max_user_freq predefine in Kconfig

2007-08-01 Thread [EMAIL PROTECTED]

I think users should be able to set max_user_freq values for rtc and
hpet during kernel configuration. The default value is set to 1024
with this patch.
The default value of 64 is really too small for modern multimedia apps.
This patch also fixes the link to the Intel HPET spec.


Signed-off-by: Milinevsky Dmitry <[EMAIL PROTECTED]>

diff -Nupar linux-2.6.22-cfs-19/drivers/char/hpet.c linux-2.6.22-cfs-19.niam/drivers/char/hpet.c
--- linux-2.6.22-cfs-19/drivers/char/hpet.c 2007-07-10 20:57:20.0 +0300
+++ linux-2.6.22-cfs-19.niam/drivers/char/hpet.c 2007-08-02 01:34:54.0 +0300
@@ -44,9 +44,9 @@
 /*
  * The High Precision Event Timer driver.
  * This driver is closely modelled after the rtc.c driver.
- * http://www.intel.com/hardwaredesign/hpetspec.htm
+ * http://www.intel.com/technology/architecture/hpetspec.htm
  */
-#define	HPET_USER_FREQ	(64)
+#define	HPET_USER_FREQ	(CONFIG_HPET_MAX_USER_FREQ)
 #define	HPET_DRIFT	(500)

 #define HPET_RANGE_SIZE1024/* from HPET spec */
diff -Nupar linux-2.6.22-cfs-19/drivers/char/Kconfig linux-2.6.22-cfs-19.niam/drivers/char/Kconfig
--- linux-2.6.22-cfs-19/drivers/char/Kconfig 2007-07-10 20:57:15.0 +0300
+++ linux-2.6.22-cfs-19.niam/drivers/char/Kconfig 2007-08-02 01:33:33.0 +0300
@@ -791,6 +791,13 @@ config RTC
  To compile this driver as a module, choose M here: the
  module will be called rtc.

+config RTC_MAX_USER_FREQ
+   int "User interrupt frequency"
+   depends on RTC
+   default "1024"
+   help
+ Default value of user interrupt frequency(/proc/sys/dev/rtc/max-user-freq).
+
 config SGI_DS1286
tristate "SGI DS1286 RTC support"
depends on SGI_IP22
@@ -1022,6 +1029,13 @@ config HPET
  open selects one of the timers supported by the HPET.  The timers are
  non-periodic and/or periodic.

+config HPET_MAX_USER_FREQ
+   int "User interrupt frequency"
+   depends on HPET
+   default "1024"
+   help
+ Default value of user interrupt frequency(/proc/sys/dev/hpet/max-user-freq).
+
 config HPET_RTC_IRQ
bool "HPET Control RTC IRQ" if !HPET_EMULATE_RTC
default n
diff -Nupar linux-2.6.22-cfs-19/drivers/char/rtc.c linux-2.6.22-cfs-19.niam/drivers/char/rtc.c
--- linux-2.6.22-cfs-19/drivers/char/rtc.c 2007-07-10 20:57:15.0 +0300
+++ linux-2.6.22-cfs-19.niam/drivers/char/rtc.c 2007-08-02 01:33:48.0 +0300
@@ -191,7 +191,7 @@ static int rtc_proc_open(struct inode *i
 static unsigned long rtc_status = 0;   /* bitmapped status byte.   */
 static unsigned long rtc_freq = 0; /* Current periodic IRQ rate*/
 static unsigned long rtc_irq_data = 0; /* our output to the world  */
-static unsigned long rtc_max_user_freq = 64; /* > this, need CAP_SYS_RESOURCE */
+static unsigned long rtc_max_user_freq = CONFIG_RTC_MAX_USER_FREQ; /* > this, need CAP_SYS_RESOURCE */

 #ifdef RTC_IRQ
 /*

[PATCH] scripts/ver_linux : correct printing of binutils version

2007-08-01 Thread Jesper Juhl

Hi,

Currently scripts/ver_linux prints "Binutils" or other random 
information for the version number in the "binutils" output line 
on some distributions. This patch corrects that.

When I initially submitted a patch to correct that, I was not aware 
that the output from "ld -v" could differ as much as it turned out 
it can, so my original fix turned out to not cover all bases.
This patch works correctly with all the different "ld -v" output 
that people posted in replies to my first patch, so it should be a 
clear win over what we have currently.

Please apply.


Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]>
---

 scripts/ver_linux |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/scripts/ver_linux b/scripts/ver_linux
index 8f8df93..27a5a21 100755
--- a/scripts/ver_linux
+++ b/scripts/ver_linux
@@ -21,9 +21,7 @@ gcc --version 2>&1| grep gcc | awk \
 make --version 2>&1 | awk -F, '{print $1}' | awk \
   '/GNU Make/{print "Gnu make  ",$NF}'
 
-ld -v | awk -F\) '{print $1}' | awk \
-'/BFD/{print "binutils  ",$NF} \
-/^GNU/{print "binutils  ",$4}'
+echo "binutils   $(ld -v | egrep -o '[0-9]+\.[0-9\.]+')"
 
 echo -n "util-linux "
 fdformat --version | awk '{print $NF}' | sed -e s/^util-linux-// -e s/\)$//
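
The matching done by the new egrep line can be illustrated with a small C sketch using the POSIX regex API. This is an illustration only: toy_extract_version() is an invented helper, it uses the pattern [0-9]+\.[0-9.]+ (the same idea as the egrep expression), and it returns just the first match, whereas "egrep -o" prints every match. The sample strings in the note below are made-up examples of differing "ld -v" styles, not output captured from a real system.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Copy the first substring of 'line' that looks like a dotted version
 * number into 'out'. Returns 0 on success, -1 if nothing matches or the
 * match does not fit in 'out'.
 */
static int toy_extract_version(const char *line, char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m;
	int ret = -1;

	if (regcomp(&re, "[0-9]+\\.[0-9.]+", REG_EXTENDED) != 0)
		return -1;

	if (regexec(&re, line, 1, &m, 0) == 0) {
		size_t n = (size_t)(m.rm_eo - m.rm_so);

		if (n < outlen) {
			memcpy(out, line + m.rm_so, n);
			out[n] = '\0';
			ret = 0;
		}
	}
	regfree(&re);
	return ret;
}
```

Because the pattern keys on the digits rather than on the position of a word, a line like "GNU ld version 2.17.50.0.6" and a line like "GNU ld (GNU Binutils) 2.18" both yield just the version string.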

Re: 2.6.22: oops in sbp2_remove_device

2007-08-01 Thread Olaf Hering

On Wed, Aug 01, Stefan Richter wrote:

> Revert commit 0555659d63c285ceb7ead3115532e1b71b0f27a7 from 2.6.22-rc1.
> The dma_set_mask call somehow failed on a PowerMac G5, PPC64:
> http://lkml.org/lkml/2007/8/1/344

This change fixes the oops, and I can access the drive again.

Re: smaller kernel with no real time futexes

2007-08-01 Thread Jakub Jelinek

On Wed, Aug 01, 2007 at 09:24:34PM +0200, Andi Kleen wrote:
> Adrian,
> 
> You said earlier you're looking at smaller allnoconfig kernels.
> One thing I noticed recently that realtime pi futexes are always
> enabled and that pulls in a lot of other code (like the plists) 
> 
> Userland needs to handle them not being available anyways for older
> kernels.
> 
> Might be worth looking into turning that into a CONFIG.

That's a very bad idea.  glibc configured for 2.6.18 and higher kernels
assumes PI futexes are present.

Jakub

Re: drbd 8.0.2/3 doesn't load under kernel 2.6.21

2007-08-01 Thread Maurice Volaski

First, did you confirm this behavior? Can you please explain that? 
How could they possibly interact with one another?




On Wed, Aug 01, 2007 at 04:21:29PM -0400, Maurice Volaski wrote:

 I'm making an assumption that depmod is somehow to blame and have logged
 this as a kernel bug, http://bugzilla.kernel.org/show_bug.cgi?id=8829


depmod is working fine.

It's the interaction between your two patches that breaks it for you.


 It turns out I was adding the web100 patch (http://www.web100.org) to
 the 2.6.21 kernel and that's what causes the symbol resolving problem
 below. Adding the corresponding version of the web100 patch to the
 2.6.20 kernel makes this problem appear there as well. On fresh
 versions of the kernel, this problem does not occur. At the moment,
 it's not possible to have a current kernel that contains both drbd
 and web100.


 On a 64-bit Gentoo system with Gentoo's 2.6.21 kernel, drbd 8.0.2/3
 complains when I try to load the module:

 [  134.141363] drbd: Unknown symbol cn_fini
 [  134.141399] drbd: Unknown symbol cn_init

 It works fine when I compile it and load in the previous kernel
 version, 2.6.20 and the symbols are present in the map file

 ./System.map-genkernel-x86_64-2.6.21-gentoo-r2:802935aa t
 cn_fini
 ./System.map-genkernel-x86_64-2.6.21-gentoo-r2:8029362a t
 cn_init

 I am c'cing the kernel mailing list because this appears to be a
 problem with how any module accesses symbols in the kernel, not just

  >drbd. Source was compiled with Gentoo gcc-4.1.2, glibc-2.5-r3


cu
Adrian


--

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University

[PATCH][RFC] unbreak generic futex_atomic_cmpxchg_inatomic() on UP

2007-08-01 Thread Mikael Pettersson

[Resend. I messed up the To: line the first time]

As has been reported recently by Lennert Buytenhek, robust futexes
are broken on ARM:

>If you're also running into glibc's tst-robust1 test suite test
>locking up your ARM machine, you're probably running into the fact
>that asm-arm/futex.h includes asm-generic/futex.h, and
>asm-generic/futex.h defines futex_atomic_cmpxchg_inatomic() to
>return -ENOSYS.  This causes handle_futex_death() to loop forever.

I can confirm this statement: building glibc-2.4 with NPTL on
ARM hangs the kernel when the test suite reaches tst-robust1.

The problem is that kernel/futex.c expects futex_atomic_cmpxchg_inatomic()
to return -EFAULT or the new value. It doesn't expect -ENOSYS at all, and
generally -ENOSYS causes the futex code to loop, hanging the kernel.

The higher-end archs (x86, sparc64, ppc64, etc) provide fully-functional
asm/futex.h implementations, but a number of archs (alpha, arm, arm26,
avr32, blackfin, cris, h8300, m32r, m68k, m68knommu, sh64, sparc, um,
v850, and xtensa) use asm-generic/futex.h, which makes robust futexes
horribly broken on them. There have also been reports recently that PI
futexes are broken due to the generic futex_atomic_cmpxchg_inatomic()
just being an -ENOSYS stub.

The patch below implements the generic futex_atomic_cmpxchg_inatomic() in
terms of __copy_{from,to}_user_inatomic() and preempt_{disable,enable}().
It obviously doesn't support SMP, but UP-only support should go a long
way for users of the affected archs.

I'm using this patch now and it has allowed me to build and use glibc-2.4
with NPTL on ARM (glibc-2.4-11.src.rpm from FC5 + ARM fixes).
(Finally I can ditch LinuxThreads :->)

Comments?

/Mikael

--- linux-2.6.22/include/asm-generic/futex.h.~1~	2007-02-04 19:44:54.0 +0100
+++ linux-2.6.22/include/asm-generic/futex.h	2007-08-01 19:03:43.0 +0200
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #include 
+#include 
 #include 
 #include 
 
@@ -52,7 +53,34 @@ futex_atomic_op_inuser (int encoded_op, 
 static inline int
 futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 {
+#ifdef CONFIG_SMP
return -ENOSYS;
+#else
+   int curval, ret;
+
+   if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
+   return -EFAULT;
+
+   preempt_disable();
+
+   ret = -EFAULT;
+   if (__copy_from_user_inatomic(&curval, uaddr, sizeof(int)))
+   goto out;
+
+   ret = curval;
+   if (curval != oldval)
+   goto out;
+
+   ret = -EFAULT;
+   if (__copy_to_user_inatomic(uaddr, &newval, sizeof(int)))
+   goto out;
+
+   ret = newval;
+
+ out:
+   preempt_enable();
+   return ret;
+#endif
 }
 
 #endif

Re: [RFC] Deprecate a.out ELF interpreters

2007-08-01 Thread Andi Kleen

On Thursday 02 August 2007 00:14:48 Theodore Tso wrote:
> Do you mean deprecate a.out interpreters?

No, just a.out interpreters for ELF binaries.

> I could imagine that there might be some people running some very old
> statically linked programs from a decade or so ago, but I agree they
> are pretty small in number.

Nothing would change for them.

The only thing that would change is that if someone has an a.out system
with ELF executables then they would need to update their ELF ld.so to ELF.
Dynamically linked a.out executables should also still run because they
use a different ld.so.

> Is the fs/binfmt_aout.c causing problems? 
> It's only 562 lines of code...

I'm not concerned about binfmt_aout; just about the special case a.out ld.so 
code in binfmt_elf.

-Andi

Re: [rfc] balance-on-fork NUMA placement

2007-08-01 Thread Martin Bligh




This topic seems to come up periodically ever since we first introduced
the NUMA scheduler, and every time we decide it's a bad idea. What's
changed? What workloads does this improve (aside from some artificial
benchmark like stream)?

To repeat the conclusions of last time ... the primary problem is that
99% of the time, we exec after we fork, and it makes that fork/exec
cycle slower, not faster, so exec is generally a much better time to do
this. There's no good predictor of whether we'll exec after fork, unless
one has magically appeared since late 2.5.x ?



As Nick points out, one reason to balance on fork() rather than exec()
is that with balance on exec you already have the new task's kernel
structs allocated on the "wrong" node.  However, as you point out, this
slows down the fork/exec cycle.  This is especially noticeable on larger
node-count systems in, e.g., shell scripts that spawn a lot of short
lived child processes.  "Back in the day", we got bitten by this on the
Alpha EV7 [a.k.a. Marvel] platform with just ~64 nodes--small compared
to, say, the current Altix platform.  


On the other hand, if you're launching a few larger, long-lived
applications with any significant %-age of system time, you might want
to consider spreading them out across nodes and having their warmer
kernel data structures close to them.  A dilemma.

Altho' I was no longer working on this platform when this issue came up,
I believe that the kernel developers came up with something along these
lines:

+ define a "credit" member of the "task" struct, initialized to, say,
zero.

+ when "credit" is zero, or below some threshold, balance on fork--i.e.,
spread out the load--otherwise fork "locally" and decrement credit
[maybe not < 0].

+ when reaping dead children, if the poor thing's cpu utilization is
below some threshold, give the parent some credit.  [blood money?]

And so forth.  Initial forks will balance.  If the children refuse to
die, forks will continue to balance.  If the parent starts seeing short
lived children, fork()s will eventually start to stay local.  
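
The credit scheme described above can be sketched as a few lines of userspace C. Everything here is an assumption made for illustration: the struct, the function names, the zero threshold and the use of child CPU time in milliseconds as the "utilization" measure are invented, and none of it is taken from any real scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "credit" member of the task struct. */
struct toy_task {
	int credit;
};

#define TOY_CREDIT_THRESHOLD	0	/* balance while credit <= this */
#define TOY_SHORT_LIVED_CPU_MS	10	/* assumed cutoff for a short-lived child */

/* Decide fork placement: true = balance across nodes, false = fork locally. */
static bool toy_fork_should_balance(struct toy_task *parent)
{
	if (parent->credit <= TOY_CREDIT_THRESHOLD)
		return true;
	parent->credit--;	/* a local fork spends one credit, never below 0 */
	return false;
}

/* On reaping a child: a short-lived child earns the parent one credit. */
static void toy_reap_child(struct toy_task *parent, unsigned int child_cpu_ms)
{
	if (child_cpu_ms < TOY_SHORT_LIVED_CPU_MS)
		parent->credit++;
}
```

With these rules a parent starts out balancing every fork, keeps balancing as long as its children are long-lived, and drifts towards local forks only after it has reaped short-lived children.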


Fork without exec is much rarer than fork with exec. Optimising for
the uncommon case is the Wrong Thing to Do (tm). What we decided
the last time(s) this came up was to allow userspace to pass
a hint in if they wanted to fork and not exec.


I believe that this solved the pathological behavior we were seeing with
shell scripts taking way longer on the larger, supposedly more powerful,
platforms.

Of course, that OS could migrate the equivalent of task structs and
kernel stack [the old Unix user struct that was traditionally swappable,
so fairly easy to migrate].  On Linux, all bets are off, once the
scheduler starts migrating tasks away from the node that contains their
task struct, ...  [Remember Eric Focht's "NUMA Affine Scheduler" patch
with its "home node"?]


Task migration doesn't work well at all without userspace hints.
SGI tried for ages (with IRIX) and failed. There are long discussions
of all of these things back in the days when we merged the original
NUMA scheduler in late 2.5 ...

Re: LinuxPPS & spinlocks

2007-08-01 Thread Satyam Sharma

Hi,

On Wed, 1 Aug 2007, Christopher Hoover wrote:

> Satyam Sharma  infradead.org> writes:
> > On Mon, 30 Jul 2007, Rodolfo Giometti wrote:
> > 
> > > On Mon, Jul 30, 2007 at 10:33:35AM +0530, Satyam Sharma wrote:
> > > Currently the RFC says to you that you should open the serial port:
> > > 
> > >   fd = open("/dev/ttyS0", ...);
> > 
> > No, it does *NOT*. All it says is:
> > 
> > The time_pps_create() is used to convert an already-open UNIX file
> > descriptor, for an appropriate special file, into a PPS handle.
> > 
> > See? What I said is precisely the implementation the RFC envisages
> > (and the only sane way to implement it too).
> 
> If we were totally rigorous about representing each device as a device node, 
> your solution would be fine.  But we don't.

Of course.

> The clocksource model (/sys/devices/system/clocksource) is a better way to 
> go.  One sysfs file is used to enumerate the possible sources and another is 
> used to read or set the current source.   No new system calls; no new ioctls.

Oh, not introducing any syscalls _at all_ would be fine, too. But the
RFC does /require/ an implementation to have them. I was only mentioning
the kind of implementation the RFC had in mind. But there are other ways
to achieve the same end goal, and yes, probably it's better to avoid
introducing syscalls in the first place and think of other mechanisms.

[ It's not that we're talking of IPsec or IPv6 or something here --
  so RFC-compliance isn't overly important. But the final result
  needs to be good, secure and well-designed, still. ]

Satyam

Re: [patch] sched: yield debugging

2007-08-01 Thread Tim Chen

On Tue, 2007-07-31 at 22:33 +0200, Ingo Molnar wrote:

> ok, good! Could you try the updated debug patch below? I've done two 
> changes: made '1' the default, and added the 
> /proc/sys/kernel/sched_yield_granularity_ns tunable. (available if 
> CONFIG_SCHED_DEBUG=y)
> 
> Could you try to change the yield-granularity tunable and see which 
> value gives the best performance? A value of '10' should in theory 
> give the current (80% degraded) volanomark performance, the default 
> value should give the above '20% down' result. The question is, is '20% 
> down' the best we can get out of it? Does larger/smaller 
> yield-granularity help perhaps? You can change it to any value between 
> 100 usecs and 1 second.
> 

Turning up the granularity helped.  Here's the data I got
for Volanomark performance relative to 2.6.22
Granularity
10  (max)   9% down
 8  8% down
  8000  13% down
   800  20% down
10  56% down

Tim

Processes spinning forever, apparently in lock_timer_base()?

2007-08-01 Thread Chuck Ebbert

Looks like the same problem with spinlock unfairness we've seen
elsewhere: it seems to be looping here? Or is everyone stuck
just waiting for writeout?

lock_timer_base():
	for (;;) {
		tvec_base_t *prelock_base = timer->base;
		base = tbase_get_base(prelock_base);
		if (likely(base != NULL)) {
			spin_lock_irqsave(&base->lock, *flags);
			if (likely(prelock_base == timer->base))
				return base;
			/* The timer has migrated to another CPU */
			spin_unlock_irqrestore(&base->lock, *flags);
		}
		cpu_relax();
	}

The problem goes away completely if filesystems are mounted
*without* noatime. Has happened in 2.6.20 through 2.6.22...

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249563

Part of sysrq-t listing:

mysqldD 17c0  2196 23162   1562
   e383fcb8 0082 61650954 17c0 e383fc9c  c0407208 e383f000 
   a12b0434 4d1d c6ed2c00 c6ed2d9c c200fa80  c0724640 f6c60540 
   c4ff3c70 0508 0286 c042ffcb e383fcc8 00014926  0286 
Call Trace:
 [] do_IRQ+0xbd/0xd1
 [] lock_timer_base+0x19/0x35
 [] __mod_timer+0x9a/0xa4
 [] schedule_timeout+0x70/0x8f
 [] process_timeout+0x0/0x5
 [] schedule_timeout+0x6b/0x8f
 [] io_schedule_timeout+0x39/0x5d
 [] congestion_wait+0x50/0x64
 [] autoremove_wake_function+0x0/0x35
 [] balance_dirty_pages_ratelimited_nr+0x148/0x193
 [] generic_file_buffered_write+0x4c7/0x5d3


named D 17c0  2024  1454  1
   f722acb0 0082 6165ed96 17c0 c1523e80 c16f0c00 c16f20e0 f722a000 
   a12be87d 4d1d f768ac00 f768ad9c c200fa80   f75bda80 
   c0407208 0508 0286 c042ffcb f722acc0 00020207  0286 
Call Trace:
 [] do_IRQ+0xbd/0xd1
 [] lock_timer_base+0x19/0x35
 [] __mod_timer+0x9a/0xa4
 [] schedule_timeout+0x70/0x8f
 [] process_timeout+0x0/0x5
 [] schedule_timeout+0x6b/0x8f
 [] io_schedule_timeout+0x39/0x5d
 [] congestion_wait+0x50/0x64
 [] autoremove_wake_function+0x0/0x35
 [] balance_dirty_pages_ratelimited_nr+0x148/0x193
 [] generic_file_buffered_write+0x4c7/0x5d3


mysqldD 17c0  2196 23456   1562
   e9293cb8 0082 616692ed 17c0 e9293c9c  e9293cc8 e9293000 
   a12c8dd0 4d1d c3d5ac00 c3d5ad9c c200fa80  c0724640 f6c60540 
   e9293d10 c07e1f00 0286 c042ffcb e9293cc8 0002b57f  0286 
Call Trace:
 [] lock_timer_base+0x19/0x35
 [] __mod_timer+0x9a/0xa4
 [] schedule_timeout+0x70/0x8f
 [] process_timeout+0x0/0x5
 [] schedule_timeout+0x6b/0x8f
 [] io_schedule_timeout+0x39/0x5d
 [] congestion_wait+0x50/0x64
 [] autoremove_wake_function+0x0/0x35
 [] balance_dirty_pages_ratelimited_nr+0x148/0x193
 [] generic_file_buffered_write+0x4c7/0x5d3

Re: LinuxPPS & spinlocks

2007-08-01 Thread Christopher Hoover

Satyam Sharma  infradead.org> writes:
> On Mon, 30 Jul 2007, Rodolfo Giometti wrote:
> 
> > On Mon, Jul 30, 2007 at 10:33:35AM +0530, Satyam Sharma wrote:
> > Currently the RFC says to you that you should open the serial port:
> > 
> > fd = open("/dev/ttyS0", ...);
> 
> No, it does *NOT*. All it says is:
> 
> The time_pps_create() is used to convert an already-open UNIX file
> descriptor, for an appropriate special file, into a PPS handle.
> 
> See? What I said is precisely the implementation the RFC envisages
> (and the only sane way to implement it too).

If we were totally rigorous about representing each device as a device node, 
your solution would be fine.  But we don't.

The clocksource model (/sys/devices/system/clocksource) is a better way to 
go.  One sysfs file is used to enumerate the possible sources and another is 
used to read or set the current source.   No new system calls; no new ioctls.

-ch

ch (at) murgatroid (dot) com
ch (at) hpl (dot) hp (dot) com

Re: drbd 8.0.2/3 doesn't load under kernel 2.6.21

2007-08-01 Thread Adrian Bunk

On Wed, Aug 01, 2007 at 04:21:29PM -0400, Maurice Volaski wrote:
> I'm making an assumption that depmod is somehow to blame and have logged 
> this as a kernel bug, http://bugzilla.kernel.org/show_bug.cgi?id=8829

depmod is working fine.

It's the interaction between your two patches that breaks it for you.

>>> It turns out I was adding the web100 patch (http://www.web100.org) to
>>> the 2.6.21 kernel and that's what causes the symbol resolving problem
>>> below. Adding the corresponding version of the web100 patch to the
>>> 2.6.20 kernel makes this problem appear there as well. On fresh
>>> versions of the kernel, this problem does not occur. At the moment,
>>> it's not possible to have a current kernel that contains both drbd
>>> and web100.
>>>
 On a 64-bit Gentoo system with Gentoo's 2.6.21 kernel, drbd 8.0.2/3
 complains when I try to load the module:

 [  134.141363] drbd: Unknown symbol cn_fini
 [  134.141399] drbd: Unknown symbol cn_init

 It works fine when I compile it and load in the previous kernel
 version, 2.6.20 and the symbols are present in the map file

 ./System.map-genkernel-x86_64-2.6.21-gentoo-r2:802935aa t 
 cn_fini
 ./System.map-genkernel-x86_64-2.6.21-gentoo-r2:8029362a t 
 cn_init

 I am c'cing the kernel mailing list because this appears to be a
 problem with how any module accesses symbols in the kernel, not just
>>>  >drbd. Source was compiled with Gentoo gcc-4.1.2, glibc-2.5-r3

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: More documentation: system call how-to

2007-08-01 Thread Heiko Carstens

On Wed, Aug 01, 2007 at 02:06:57PM -0400, Ulrich Drepper wrote:

> I've added a few rules I could think of right now.  What should be
> added as well is a rule for 64-bit parameters on 32-bit platforms.  I
> leave this to the s390 people who have the biggest restrictions when
> it comes to this.

David Woodhouse wrote that already. Don't know if there is a patch
pending: http://marc.info/?l=linux-arch=118277150812137=2

Re: debugfs helper for decimal challenged

2007-08-01 Thread Greg KH

On Wed, Aug 01, 2007 at 05:52:58PM -0400, Robin Getz wrote:
> Greg:
> 
> For those of us who forget that when bit 21 and bit 31 are set in a hardware 
> register exposed with debugfs, I should see 2149580800 when I cat it (vs
> 0x80200000), any objections to providing a hex output interface to the 
> debugfs?
> 
> Since the input side already takes decimal & hex, I don't think this is a big
> change:
> 
> DEFINE_SIMPLE_ATTRIBUTE(fops_x16, debugfs_u16_get, debugfs_u16_set, "0x%04llx\n");
> 
> struct dentry *debugfs_create_x16(const char *name, mode_t mode,
> 				  struct dentry *parent, u16 *value)
> {
> 	return debugfs_create_file(name, mode, parent, value, &fops_x16);
> }
> 
> DEFINE_SIMPLE_ATTRIBUTE(fops_x32, debugfs_u32_get, debugfs_u32_set, "0x%08llx\n");
> 
> struct dentry *debugfs_create_x32(const char *name, mode_t mode,
> 				  struct dentry *parent, u32 *value)
> {
> 	return debugfs_create_file(name, mode, parent, value, &fops_x32);
> }
> 
> If this is OK - I will send a real patch.

That sounds good to me, feel free to send a real patch.

thanks,

greg k-h
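
The difference between the two output formats is easy to see in a userspace sketch. The helpers below are hypothetical and only mirror the format strings: "0x%08llx\n" is the one proposed above, while "%llu\n" for the existing decimal file is an assumption about the current debugfs code rather than something quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 32-bit value with the proposed hex format string. */
static int toy_format_x32(char *buf, size_t len, uint32_t value)
{
	return snprintf(buf, len, "0x%08llx\n", (unsigned long long)value);
}

/* Format the same value in decimal (assumed to match the existing u32 file). */
static int toy_format_u32(char *buf, size_t len, uint32_t value)
{
	return snprintf(buf, len, "%llu\n", (unsigned long long)value);
}
```

For a register with bits 21 and 31 set, the decimal form reads 2149580800 and the hex form reads 0x80200000, which is the case Robin describes.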

Re: [PATCH] RT: Add priority-queuing and priority-inheritance to workqueue infrastructure

2007-08-01 Thread Oleg Nesterov

On 08/01, Gregory Haskins wrote:
>
> On Thu, 2007-08-02 at 01:34 +0400, Oleg Nesterov wrote:
> > On 08/01, Gregory Haskins wrote:
> > >
> > > On Thu, 2007-08-02 at 00:50 +0400, Oleg Nesterov wrote:
> > > > On 08/01, Daniel Walker wrote:
> > > > >
> > > > > > It's translating priorities through the work queues, which doesn't seem
> > > > > to happen with the current implementation. A high priority, say
> > > > > SCHED_FIFO priority 99, task may have to wait for a nice -5 work queue
> > > > > to finish..
> > > > 
> > > > Why should that task wait?
> > > 
> > > I assume "that task" = the RT99 task?  If so, that is precisely the
> > > question.  It shouldn't wait.  ;)  With mainline, it is simply queued
> > > with every other request.  There could be an RT40, and a SCHED_NORMAL in
> > > front of it in the queue that will get processed first.  In addition,
> > > the system could suffer from a priority inversion if some unrelated but
> > > lower priority task (say RT98) was blocking the workqueue thread from
> > > making forward progress on the nice -5 job. 
> > > 
> > > To clarify: when a design utilizes a singlethread per workqueue (such as
> > > in both mainline and this patch), the RT99 will always have to wait
> > > behind any already dispatched jobs.
> > 
> > It is not that "RT99 will always have to wait". But yes, the work_struct
> > queued by RT99 has to wait.
> 
> Agreed.  We are talking only within the scope of workqueues here.
> 
> 
> > > 1) The RT99 task would move ahead in the queue of anything else that was
> > > also scheduled on the workqueue that is < RT99.
> > 
> > this itself is wrong, breaks flush_workqueue() semantics
> 
> Perhaps in Daniel's patch.

No.

> However, IIUC the point of flush_workqueue() is a barrier only relative
> to your own submissions, correct?.  E.g. to make sure *your* requests
> are finished, not necessarily the entire queue.

No,

> If flush_workqueue is supposed to behave differently than I describe,
> then I agree it's broken even in my original patch.

The comment near flush_workqueue() says:

* We sleep until all works which were queued on entry have been handled,
* but we are not livelocked by new incoming ones.
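
A toy model of why reordering by priority conflicts with that comment. This is a sketch, not the kernel's code: it assumes flush works by queueing a marker behind the existing work and sleeping until the marker is reached. All demo_* names are hypothetical.

```c
#include <stddef.h>

/* Priorities of three works already queued when flush is called
 * (larger = more urgent). Values are illustrative only. */
const int demo_prios[3] = { 40, 0, 98 };

/*
 * n works were queued before the flush, with priorities prio[0..n-1].
 * The flush marker is queued last with priority marker_prio.
 * Returns how many of the earlier works run before the marker.
 */
size_t demo_handled_before_marker(const int *prio, size_t n,
                                  int marker_prio, int priority_ordered)
{
    size_t handled = 0;

    for (size_t i = 0; i < n; i++) {
        /* FIFO: everything queued earlier runs first.
         * Priority order: only works at least as urgent as the
         * marker do (equal priorities keep FIFO order). */
        if (!priority_ordered || prio[i] >= marker_prio)
            handled++;
    }
    return handled;
}
```

With FIFO ordering the marker waits for all three works. With priority ordering, a marker queued at priority 99 waits for none of them, so the caller would return while works "queued on entry" are still pending.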

> > > 2) The priority of the workqueue task would be temporarily elevated to
> > > RT99 so that the currently dispatched task will complete at the same
> > > priority as the waiter.
> > 
> > _Which_ waiter?
> 
> The RT99 task that submitted the request.

Now, again, why do you think this task should wait?

> >  I can't understand at all why work_struct should "inherit"
> > the priority of the task which queued it. 
> 
> Think about it:  If you are an RT99 task and you have work to do,
> shouldn't *all* of the work you need be done at RT99 if possible. 

No, I don't think so. Quite opposite, I think sometimes it makes
sense to "offload" some low-priority work from RT99 to workqueue
thread exactly because it has no rt policy.

And what about queue_work() from irq? Why should that work take a
"random" priority?

> Why
> should something like a measly RT98 task block you from completing your
> work. ;) The fact that you need to do some work via a workqueue (perhaps
> because you need specific cpu routing) is inconsequential, IMHO.

In that case I think it is better to create a special workqueue
and raise its priority.

Oleg.


[PATCH] Fix WARN_ON() on bitfield ops for all other archs

2007-08-01 Thread Heiko Carstens

From: Heiko Carstens <[EMAIL PROTECTED]>

Fixes WARN_ON() on bitfield ops for all architectures that have
been left out in 8d4fbcfbe0a4bfc73e7f0297c59ae514e1f1436f.
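
A userspace sketch of the problem being fixed, assuming GCC (statement expressions are a GNU C extension). GCC rejects typeof applied to a bit-field, so the old "typeof(x) __ret_warn_on = (x);" form does not compile when x is a bit-field member. The demo_* names and DEMO_WARN_ON are hypothetical stand-ins, not kernel code.

```c
struct demo_flags {
    unsigned int ready : 1;
    unsigned int error : 1;
};

int demo_warn_count;

/*
 * Stand-in for WARN_ON(). Normalizing the condition to int with !!
 * avoids needing the type of x, so bit-field arguments are fine.
 */
#define DEMO_WARN_ON(x) ({              \
    int __ret_warn_on = !!(x);          \
    if (__ret_warn_on)                  \
        demo_warn_count++;              \
    __ret_warn_on;                      \
})

/* Returns 1 and counts a warning if the error bit is set. */
int demo_check(unsigned int ready, unsigned int error)
{
    struct demo_flags f = { .ready = ready & 1, .error = error & 1 };

    return DEMO_WARN_ON(f.error);
}
```

The macro still evaluates its argument exactly once and still yields the truth value, which is what callers of the form "if (WARN_ON(...))" rely on.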

Cc: Alexey Dobriyan <[EMAIL PROTECTED]>
Cc: Herbert Xu <[EMAIL PROTECTED]>
Cc: Paul Mundt <[EMAIL PROTECTED]>
Cc: Haavard Skinnemoen <[EMAIL PROTECTED]>
Cc: Matthew Wilcox <[EMAIL PROTECTED]>
Cc: Kyle McMartin <[EMAIL PROTECTED]>
Cc: Martin Schwidefsky <[EMAIL PROTECTED]>
Signed-off-by: Heiko Carstens <[EMAIL PROTECTED]>
---
 include/asm-avr32/bug.h  |2 +-
 include/asm-parisc/bug.h |2 +-
 include/asm-s390/bug.h   |2 +-
 include/asm-sh/bug.h |2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6/include/asm-avr32/bug.h
===
--- linux-2.6.orig/include/asm-avr32/bug.h
+++ linux-2.6/include/asm-avr32/bug.h
@@ -57,7 +57,7 @@
 
 #define WARN_ON(condition) \
({  \
-   typeof(condition) __ret_warn_on = (condition);  \
+   int __ret_warn_on = !!(condition);  \
if (unlikely(__ret_warn_on))\
_BUG_OR_WARN(BUGFLAG_WARNING);  \
unlikely(__ret_warn_on);\
Index: linux-2.6/include/asm-parisc/bug.h
===
--- linux-2.6.orig/include/asm-parisc/bug.h
+++ linux-2.6/include/asm-parisc/bug.h
@@ -74,7 +74,7 @@
 
 
 #define WARN_ON(x) ({  \
-   typeof(x) __ret_warn_on = (x);  \
+   int __ret_warn_on = !!(x);  \
if (__builtin_constant_p(__ret_warn_on)) {  \
if (__ret_warn_on)  \
__WARN();   \
Index: linux-2.6/include/asm-s390/bug.h
===
--- linux-2.6.orig/include/asm-s390/bug.h
+++ linux-2.6/include/asm-s390/bug.h
@@ -50,7 +50,7 @@
 #define BUG()  __EMIT_BUG(0)
 
 #define WARN_ON(x) ({  \
-   typeof(x) __ret_warn_on = (x);  \
+   int __ret_warn_on = !!(x);  \
if (__builtin_constant_p(__ret_warn_on)) {  \
if (__ret_warn_on)  \
__EMIT_BUG(BUGFLAG_WARNING);\
Index: linux-2.6/include/asm-sh/bug.h
===
--- linux-2.6.orig/include/asm-sh/bug.h
+++ linux-2.6/include/asm-sh/bug.h
@@ -61,7 +61,7 @@ do {  \
 } while (0)
 
 #define WARN_ON(x) ({  \
-   typeof(x) __ret_warn_on = (x);  \
+   int __ret_warn_on = !!(x);  \
if (__builtin_constant_p(__ret_warn_on)) {  \
if (__ret_warn_on)  \
__WARN();   \

Re: VT_PROCESS, VT_LOCKSWITCH capabilities

2007-08-01 Thread Andrew Morton

On Wed, 01 Aug 2007 04:44:32 +0200
Frank Benkstein <[EMAIL PROTECTED]> wrote:

> Frank Benkstein wrote:
> > I wonder why there are different permissions needed for VT_PROCESS
> > (access to the current virtual console) and VT_LOCKSWITCH
> > (CAP_SYS_TTY_CONFIG).
> 
> To be more direct:
> 
> require CAP_SYS_TTY_CONFIG for VT_SETMODE as its essentially the same as
> VT_LOCKSWITCH and said capability is already required there
> 
> diff --git a/drivers/char/vt_ioctl.c b/drivers/char/vt_ioctl.c
> index c6f6f42..7034a68 100644
> --- a/drivers/char/vt_ioctl.c
> +++ b/drivers/char/vt_ioctl.c
> @@ -662,7 +662,7 @@ int vt_ioctl(struct tty_struct *tty, struct file * file,
> {
> struct vt_mode tmp;
> 
> -   if (!perm)
> +   if (!perm || !capable(CAP_SYS_TTY_CONFIG))
> return -EPERM;
> if (copy_from_user(&tmp, up, sizeof(struct vt_mode)))
> return -EFAULT;
> 

There's a good risk of breaking stuff with this change.  A quick peek
through http://www.google.com/codesearch shows that.

We need good reasons for making that change, and for handling the
subsequent fallout, getting shouted at by aggrieved users, etc.

It's tricky.

Re: [RFC] Deprecate a.out ELF interpreters

2007-08-01 Thread Theodore Tso

Do you mean deprecate a.out interpreters?

I could imagine that there might be some people running some very old
statically linked programs from a decade or so ago, but I agree they
are pretty small in number.  Is the fs/binfmt_aout.c causing problems?
It's only 562 lines of code...

- Ted

[PATCH] qla2xxx: allocate enough space for the full PCI descriptor.

2007-08-01 Thread Andrew Vasquez

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---

On Thu, 26 Jul 2007, Andrew Vasquez wrote:

> On Thu, 26 Jul 2007, Andrew Patterson wrote:
> 
> > On Thu, 2007-07-26 at 15:36 +0200, Ulrich Windl wrote:
> > > Hi,
> > > 
> > > <6>QLogic Fibre Channel HBA Driver
> > > <6>GSI 49 (level, low) -> CPU 3 (0x0300) vector 51
> > > > <6>ACPI: PCI Interrupt :0f:01.0[A] -> GSI 49 (level, low) -> IRQ 51
> > > > <6>qla2xxx :0f:01.0: Found an ISP2422, irq 51, iobase 0xc000b004
> > > > [...]
> > > > <6>qla2xxx :0f:01.0: LOOP UP detected (4 Gbps).
> > > > <6>qla2xxx :0f:01.0: Topology - (F_Port), Host Loop address 0x0
> > > > <6>scsi0 : qla2xxx
> > > > <6>qla2xxx :0f:01.0:
> > > > <4> QLogic Fibre Channel HBA Driver: 8.01.07-k3
> > > > <4>  QLogic HP AB378-60001 -
> > > > <4>  ISP2422: PCI-X Mode 2 (133 MH4.00.26 [IP]  @ :0f:01.0 hdma+, host#=0, fw=4.00.26 [IP]
> 
> The 33/66/100/133 values refer to the bus-clock speed at which the
> card is operating.  As is seen here (although a bit truncated --
> separate issue, I'll try to see if I can reproduce this on one of my
> HPQ rigs),

Ok, so what's happening here is the buffer passed in (pci_info)
does not have enough bytes allocated (off by 3).
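
A small sketch of the arithmetic. The full string "PCI-X Mode 2 (133 MHz)" is an assumption inferred from the log above, where the output is cut off after "PCI-X Mode 2 (133 MH" (exactly 20 characters). The demo_* helpers are hypothetical, not driver code.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store s, including the terminating NUL. */
size_t demo_bytes_needed(const char *s)
{
    return strlen(s) + 1;
}

/* How many bytes a buffer of cap bytes falls short by (0 if s fits). */
size_t demo_shortfall(size_t cap, const char *s)
{
    size_t need = demo_bytes_needed(s);

    return need > cap ? need - cap : 0;
}
```

Under that assumption the string needs 23 bytes, so a 20-byte buffer is short by 3 and the 30-byte buffer in the patch below has room to spare.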

James, please apply...

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 93c0c7e..acca898 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -1564,7 +1564,7 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
struct Scsi_Host *host;
scsi_qla_host_t *ha;
unsigned long   flags = 0;
-   char pci_info[20];
+   char pci_info[30];
char fw_str[30];
struct scsi_host_template *sht;
 

Re: [RFC 12/26] ext2 white-out support

2007-08-01 Thread Erez Zadok

In message <[EMAIL PROTECTED]>, Dave Kleikamp writes:
> On Wed, 2007-08-01 at 15:33 -0400, Josef Sipek wrote:
> > On Wed, Aug 01, 2007 at 02:10:31PM -0500, Dave Kleikamp wrote:
> > > On Wed, 2007-08-01 at 14:44 -0400, Josef Sipek wrote:
> > > > Now what? How do you rename? Do you rename in the same branch (assuming it
> > > > is rw)?
> > > 
> > > Er, no.  According to Documentation/filesystems/union-mounts.txt, "only
> > > the topmost layer of the mount stack can be altered".
> > 
> > This brings up a very interesting (but painful) question...which makes more
> > sense? Allowing the modifications in only the top-most branch, or any branch
> > (given the user allows it at mount-time)?
> 
> Your examples point out the complexity of trying to allow modifications
> at lower levels.  It seems to me to be simpler (even if recursive copies
> are needed) to leave it as proposed.
[...]

There are three other reasons why Unionfs and our users like to have
multiple writable branches:

1. If only the topmost layer is writable, then every little change tends to
   cause a copyup, which tends to clutter the top layer more quickly.  Some
   of our users didn't like that idea, while others explicitly wanted it --
   so we let them decide, per layer/branch, whether it
   should be writable or readonly.

2. Some users unify different packages together.  Imagine you union under
   /union, several installed packages: /X11R6/{bin,man,lib,conf},
   /apache/{bin,man,lib,etc}, and /mysql/{bin,man,lib,etc}, and so on.  If a
   user modifies /union/apache/etc/apache.conf, they sometimes want
   apache.conf to remain in the writable branch it came from, not copied up.
   That way all apache related files are logically left where they came
   from, which makes administration easier.  Again, some users like to have
   multiple writable branches, and some don't -- so in Unionfs we give them
   the choice.  And yes, it does make our implementation more complex.

3. Some people use Unionfs in the scenario described in point #2 above, as a
   poor man's space- and load- distribution system.  Some of our users like
   the idea of controlling how much storage space they give each branch, and
   how much it might grow, and even how much CPU or I/O load might be placed
   on each of the lower filesystems which serve a given branch.  That way
   they worry less about the top-layer's space filling up more quickly than
   expected.  Now Unionfs was never designed to be a load-balancing f/s (we
   have RAIF for that, see ),
   but users seem to always find creative ways to [ab]use one's software in
   ways one never thought of. :-)
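
A toy sketch of the choice described in the three points above, not Unionfs code. It assumes that a modification goes to the branch the file was found in when that branch is writable, and is otherwise copied up to the nearest writable branch above it. The branch paths and all demo_* names are hypothetical.

```c
#include <stddef.h>

struct demo_branch {
    const char *path;
    int writable;
};

/* Index 0 is the topmost (highest-priority) branch. */
const struct demo_branch demo_top_only[3] = {
    { "/top", 1 }, { "/apache", 0 }, { "/mysql", 0 },
};
const struct demo_branch demo_multi_rw[3] = {
    { "/top", 1 }, { "/apache", 1 }, { "/mysql", 0 },
};

/*
 * Pick the branch that receives a modification to a file found in
 * branch `found`. Returns `found` for an in-place write, a smaller
 * index when a copyup is needed, or -1 when no branch at or above
 * `found` is writable.
 */
int demo_write_target(const struct demo_branch *b, size_t n, size_t found)
{
    if (found >= n)
        return -1;
    for (size_t i = found + 1; i-- > 0; )
        if (b[i].writable)
            return (int)i;
    return -1;
}
```

With only the top layer writable, a change to a file from the second branch lands in branch 0 (a copyup). With the second branch also writable, the same change stays in branch 1, which is the behaviour point 2 asks for.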

BTW, does Union Mounts copyup on meta-data changes (e.g., chmod, chgrp,
etc.)?

Erez.

RE: [ck] Re: Linus 2.6.23-rc1 -- It does not matter who's code gets merged!

2007-08-01 Thread Arjan van de Ven

On Wed, 2007-08-01 at 11:40 -0700, Hua Zhong wrote:
> > > And, from a standpoint of ONGOING, long-term innovation: what matters
> > > is that brilliant, new ideas get rewarded one way or another.
> > 
> > and in this case, the reward is that the idea got used and credit was
> > given
> 
> You mean, when Ingo announced CFS he mentioned Con's name?

and put his name in the code too

> When you said "it does not matter whose code got merged", I have to
> disagree. Sure, for the Linux community as a whole, for Linux itself,
> it may not matter, but for the individuals involved, it does. And I
> think benefits of individuals are as important as benefits of the
> community (or the nation).

I agree it's a nice ego boost to see your code merged.
But... do you care more about your ego boost or about your problem
getting solved? I really want to change this if you say "ego for code
merging"... "ego boost for getting linux improved and being involved in
solving an important problem" is a lot better type of ego boost..

No developer can or should expect most, or even half, of his code to
be merged. Even Linus doesn't get half the code he writes into linux :)

Con did get a whole bunch of stuff merged over the years, and for the
rest he mostly got the problem solved. That's pretty successful

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org


Re: [PATCH] RT: Add priority-queuing and priority-inheritance to workqueue infrastructure

2007-08-01 Thread Gregory Haskins

On Thu, 2007-08-02 at 01:34 +0400, Oleg Nesterov wrote:
> On 08/01, Gregory Haskins wrote:
> >
> > On Thu, 2007-08-02 at 00:50 +0400, Oleg Nesterov wrote:
> > > On 08/01, Daniel Walker wrote:
> > > >
> > > > It's translating priorities through the work queues, which doesn't seem
> > > > to happen with the current implementation. A high priority, say
> > > > SCHED_FIFO priority 99, task may have to wait for a nice -5 work queue
> > > > to finish..
> > > 
> > > Why should that task wait?
> > 
> > I assume "that task" = the RT99 task?  If so, that is precisely the
> > question.  It shouldn't wait.  ;)  With mainline, it is simply queued
> > with every other request.  There could be an RT40, and a SCHED_NORMAL in
> > front of it in the queue that will get processed first.  In addition,
> > the system could suffer from a priority inversion if some unrelated but
> > lower priority task (say RT98) was blocking the workqueue thread from
> > making forward progress on the nice -5 job. 
> > 
> > To clarify: when a design utilizes a singlethread per workqueue (such as
> > in both mainline and this patch), the RT99 will always have to wait
> > behind any already dispatched jobs.
> 
> It is not that "RT99 will always have to wait". But yes, the work_struct
> queued by RT99 has to wait.

Agreed.  We are talking only within the scope of workqueues here.

> > 1) The RT99 task would move ahead in the queue of anything else that was
> > also scheduled on the workqueue that is < RT99.
> 
> this itself is wrong, breaks flush_workqueue() semantics

Perhaps in Daniel's patch.  I am not familiar enough with plist to be
able to tell you if it has a problem there or not.  However, I think
this works in the original patch I submitted, which had a different
queuing mechanism.  Daniel's patch was derived from that so he may have
inadvertently picked up something from mine which was no longer true in
the new design.  I'll defer to Daniel for clarification there.

However, IIUC the point of flush_workqueue() is a barrier only relative
to your own submissions, correct?.  E.g. to make sure *your* requests
are finished, not necessarily the entire queue.

If that's true, the flush would be injected at the same priority(*) as
your other pending tasks at the tail of that priority level.  Then you
would still block until all of your tasks complete.

If flush_workqueue is supposed to behave differently than I describe,
then I agree it's broken even in my original patch.

(*) We would probably have to protect against somebody else PI-boosting
*us* ;)

> 
> > 2) The priority of the workqueue task would be temporarily elevated to
> > RT99 so that the currently dispatched task will complete at the same
> > priority as the waiter.
> 
> _Which_ waiter?

The RT99 task that submitted the request.

>  I can't understand at all why work_struct should "inherit"
> the priority of the task which queued it. 

Think about it:  If you are an RT99 task and you have work to do,
shouldn't *all* of the work you need be done at RT99 if possible.  Why
should something like a measly RT98 task block you from completing your
work. ;) The fact that you need to do some work via a workqueue (perhaps
because you need specific cpu routing) is inconsequential, IMHO.

Regards,
-Greg


Re: CS5530 Alsa driver fails

2007-08-01 Thread Alan Cox

> I will give up. I didn't check the code earlier. This driver is using SMM.
> Probably the firmware isn't what it should be, or I have overwritten it when I
> was flashing Linux. I see in the datasheet that it isn't possible to write the
> driver any other way. Sadly, the sound card generates SMI only.

The 5530 in native mode only generates SMI. I've always felt however that
if you make the buffers big enough you ought to be able to drive it off
the 1KHz timer tick by polling. Interesting project.
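
A back-of-the-envelope sketch of "big enough", under stated assumptions: the sample rate, channel count, sample width and safety margin below are illustrative and not taken from the CS5530 datasheet. demo_min_buffer_bytes is a hypothetical helper.

```c
/*
 * Bytes of audio consumed during `ticks` poll periods, rounding the
 * per-tick amount up so the buffer is never undersized.
 * Returns 0 if poll_hz is 0.
 */
unsigned long demo_min_buffer_bytes(unsigned long rate_hz,
                                    unsigned int channels,
                                    unsigned int bytes_per_sample,
                                    unsigned long poll_hz,
                                    unsigned int ticks)
{
    unsigned long bytes_per_sec;
    unsigned long per_tick;

    if (poll_hz == 0)
        return 0;
    bytes_per_sec = rate_hz * channels * bytes_per_sample;
    per_tick = (bytes_per_sec + poll_hz - 1) / poll_hz;
    return per_tick * ticks;
}
```

For 48 kHz 16-bit stereo polled at 1 kHz, each tick consumes 192 bytes, so tolerating four late or missed ticks needs at least 768 bytes of buffered audio.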

You btw won't have removed the SMM firmware or the box wouldn't boot. The
5530 uses SMM to emulate some of the most basic PC components including
the VGA video. If your box has VSA2 then VSA2 firmware has some kind of
hooks to allow a native sound driver to take over and to reroute the
interrupts without SB emulation. I don't have the docs for VSA2 but the
horribly big natsemi provided audio driver does show how to do it.

Alan

[patch 0/1] extending low-level markers

2007-08-01 Thread nwatkins

Mathieu

I have been working with your Kernel Markers infrastructure now for some
time and have run into an extendability issue.

Essentially I am failing to find a way to extend the current
__trace_mark macro with site-specific context. That is, I would like the
ability to create different 'types' of instrumentation points by building
upon the __trace_mark macro. A consumer of this marker could examine
the type of marker, and attach an appropriate callback function /
private data.

I have included a patch which adds a flavor field to the __trace_mark
macro. This simplified example demonstrates the functionality I'm
looking for:

#define __trace_mark(flavor, name, format, args...)


#define marker_flavor_XXX(name, format, args...) \
	__trace_mark(XXX, name, format, ## args)

Here a marker of type XXX is built upon the __trace_mark macro. When a
consumer of type XXX finds markers with the XXX flavor appropriate
registration can take place.

Unless I don't fully understand all the use cases of the markers, I
don't see any other way to do this except to encode the 'type'
information in the name of the marker, and require the consumer to parse
the string to determine the type. Restricting the names of the markers
in this way seems like a bad solution.
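
A userspace sketch of the idea, not the markers API: every name below (demo_marker, DEMO_MARKER, the flavor values, the marker names) is hypothetical. It shows a flavor carried as a field so that a consumer can select markers without parsing their names.

```c
#include <stddef.h>

enum demo_flavor { DEMO_FLAVOR_DEFAULT = 0, DEMO_FLAVOR_SCHED = 1 };

struct demo_marker {
    const char *name;
    const char *format;
    int flavor;         /* site-defined marker flavor */
};

/* Base macro takes the flavor; flavored wrappers build on top of it. */
#define DEMO_MARKER(flavor_, name_, format_) { #name_, format_, flavor_ }
#define DEMO_MARKER_PLAIN(name_, format_) \
    DEMO_MARKER(DEMO_FLAVOR_DEFAULT, name_, format_)
#define DEMO_MARKER_SCHED(name_, format_) \
    DEMO_MARKER(DEMO_FLAVOR_SCHED, name_, format_)

const struct demo_marker demo_markers[] = {
    DEMO_MARKER_PLAIN(fs_open, "%s"),
    DEMO_MARKER_SCHED(sched_switch, "%d %d"),
    DEMO_MARKER_SCHED(sched_wakeup, "%d"),
};

/* A consumer of one flavor counts (or would register on) its markers. */
size_t demo_count_flavor(const struct demo_marker *m, size_t n, int flavor)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (m[i].flavor == flavor)
            count++;
    return count;
}
```

The consumer compares an integer instead of matching a name prefix, so marker names stay unrestricted.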

Any help and feedback is greatly appreciated.

- Noah

-- 

[patch 1/1] Adds flavor field to __mark_marker

2007-08-01 Thread nwatkins

This patch demonstrates an extension field to the
low-level marker functionality, and updates macros to accept
this additional data.


Index: linux-2.6.22-rc6-mm1/include/linux/marker.h
===
--- linux-2.6.22-rc6-mm1.orig/include/linux/marker.h	2007-08-01 15:39:36.0 -0500
+++ linux-2.6.22-rc6-mm1/include/linux/marker.h	2007-08-01 15:39:37.0 -0500
@@ -31,10 +31,14 @@
const char *args;   /* List of arguments litteraly transformed
 * into a string: "arg1, arg2, arg3".
 */
+   const int flavor;   /* site-defined marker flavor  */
immediate_char_t state; /* Immediate value state. */
marker_probe_func *call;/* Probe handler function pointer */
void *pdata;/* Private probe data */
-} __attribute__((aligned(8)));
+} __attribute__((aligned(32)));
+
+/* Default marker flavor */
+#define MARKER_DEFAULT 0
 
 #ifdef CONFIG_MARKERS
 
@@ -46,7 +50,7 @@
  * not add unwanted padding between the beginning of the section and the
  * structure. Force alignment to the same alignment as the section start.
  */
-#define __trace_mark(generic, name, format, args...)   \
+#define ___trace_mark(generic, flavor, name, format, args...)  \
do {\
static const char __mstrtab_name_##name[]   \
__attribute__((section("__markers_strings")))   \
@@ -60,7 +64,7 @@
static struct __mark_marker __mark_##name   \
__attribute__((section("__markers"))) = \
{ __mstrtab_name_##name, __mstrtab_format_##name,   \
-   __mstrtab_args_##name, { 0 },   \
+   __mstrtab_args_##name, flavor, { 0 },   \
__mark_empty_function, NULL };  \
asm volatile ( "" : : "i" (&__mark_##name));\
__mark_check_format(format, ## args);   \
@@ -81,6 +85,9 @@
}   \
} while (0)
 
+#define __trace_mark(generic, name, format, args...) \
+   ___trace_mark(generic, MARKER_DEFAULT, name, format, ## args)
+
 extern void module_marker_update(struct module *mod);
 #else /* !CONFIG_MARKERS */
 #define __trace_mark(generic, name, format, args...) \
Index: linux-2.6.22-rc6-mm1/include/asm-generic/vmlinux.lds.h
===
--- linux-2.6.22-rc6-mm1.orig/include/asm-generic/vmlinux.lds.h	2007-08-01 15:39:36.0 -0500
+++ linux-2.6.22-rc6-mm1/include/asm-generic/vmlinux.lds.h	2007-08-01 15:40:06.0 -0500
@@ -13,7 +13,7 @@
 #define DATA_DATA  \
*(.data)\
*(.data.init.refok) \
-   . = ALIGN(8);   \
+   . = ALIGN(32);  \
VMLINUX_SYMBOL(__start___markers) = .;  \
*(__markers)\
VMLINUX_SYMBOL(__stop___markers) = .;

-- 

[git patches] IDE fixes

2007-08-01 Thread Bartlomiej Zolnierkiewicz


Please pull from:

master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6.git/

to receive the following updates:

 drivers/ide/arm/icside.c   |3 +-
 drivers/ide/ide-tape.c |2 +-
 drivers/ide/pci/alim15x3.c |2 +-
 drivers/ide/pci/cmd64x.c   |4 +-
 drivers/ide/pci/cs5520.c   |2 +-
 drivers/ide/pci/cs5535.c   |   42 +-
 drivers/ide/pci/it8213.c   |   33 ---
 drivers/ide/pci/jmicron.c  |   21 +++
 drivers/ide/pci/piix.c |   17 ++--
 drivers/ide/pci/scc_pata.c |   61 ++-
 drivers/ide/pci/sis5513.c  |1 +
 drivers/ide/pci/slc90e66.c |   15 +--
 drivers/scsi/ide-scsi.c|   10 +++
 13 files changed, 85 insertions(+), 128 deletions(-)


Bartlomiej Zolnierkiewicz (7):
  alim15x3: Correct HP detect
  cs5520: fix PIO auto-tuning in ->ide_dma_check method
  cs5535: PIO fixes
  it8213: PIO fixes (take 2)
  jmicron: PIO fixes
  piix/slc90e66: fix PIO1 handling in ->speedproc method (take 2)
  scc_pata: PIO fixes

David Lamparter (1):
  sis5513: Add FSC Amilo A1630 PCI subvendor/dev to laptops

Jordan Crouse (1):
  ide: Fix an overrun found in the CS5535 IDE driver

Mariusz Kozlowski (2):
  drivers/ide/arm/icside.c: kmalloc + memset conversion to kzalloc
  drivers/scsi/ide-scsi.c: kmalloc + memset conversion to kzalloc

Meelis Roos (1):
  ide: fix runtogether printk's in cmd64x IDE driver

Stephen Rothwell (1):
  ide: eliminate warnings in ide-tape.c


diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c
index c89b5f4..8a9b98f 100644
--- a/drivers/ide/arm/icside.c
+++ b/drivers/ide/arm/icside.c
@@ -693,13 +693,12 @@ icside_probe(struct expansion_card *ec, const struct ecard_id *id)
if (ret)
goto out;
 
-   state = kmalloc(sizeof(struct icside_state), GFP_KERNEL);
+   state = kzalloc(sizeof(struct icside_state), GFP_KERNEL);
if (!state) {
ret = -ENOMEM;
goto release;
}
 
-   memset(state, 0, sizeof(state));
state->type = ICS_TYPE_NOTYPE;
state->dev  = &ec->dev;
 
diff --git a/drivers/ide/ide-tape.c b/drivers/ide/ide-tape.c
index e82bfa5..1fa5794 100644
--- a/drivers/ide/ide-tape.c
+++ b/drivers/ide/ide-tape.c
@@ -640,7 +640,7 @@ typedef enum {
 } idetape_chrdev_direction_t;
 
 struct idetape_bh {
-   unsigned short b_size;
+   u32 b_size;
atomic_t b_count;
struct idetape_bh *b_reqnext;
char *b_data;
diff --git a/drivers/ide/pci/alim15x3.c b/drivers/ide/pci/alim15x3.c
index 5511c86..025689d 100644
--- a/drivers/ide/pci/alim15x3.c
+++ b/drivers/ide/pci/alim15x3.c
@@ -593,7 +593,7 @@ static struct dmi_system_id cable_dmi_table[] = {
.ident = "HP Pavilion N5430",
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "Hewlett-Packard"),
-   DMI_MATCH(DMI_BOARD_NAME, "OmniBook N32N-736"),
+   DMI_MATCH(DMI_BOARD_VERSION, "OmniBook N32N-736"),
},
},
{ }
diff --git a/drivers/ide/pci/cmd64x.c b/drivers/ide/pci/cmd64x.c
index 19633c5..0e3b5de 100644
--- a/drivers/ide/pci/cmd64x.c
+++ b/drivers/ide/pci/cmd64x.c
@@ -475,11 +475,11 @@ static unsigned int __devinit init_chipset_cmd64x(struct pci_dev *dev, const cha
switch (rev) {
case 0x07:
case 0x05:
-   printk("%s: UltraDMA capable", name);
+   printk("%s: UltraDMA capable\n", name);
break;
case 0x03:
default:
-   printk("%s: MultiWord DMA force limited", name);
+   printk("%s: MultiWord DMA force limited\n", name);
break;
case 0x01:
printk("%s: MultiWord DMA limited, "
diff --git a/drivers/ide/pci/cs5520.c b/drivers/ide/pci/cs5520.c
index bccedf9..b89e816 100644
--- a/drivers/ide/pci/cs5520.c
+++ b/drivers/ide/pci/cs5520.c
@@ -133,7 +133,7 @@ static void cs5520_tune_drive(ide_drive_t *drive, u8 pio)
 static int cs5520_config_drive_xfer_rate(ide_drive_t *drive)
 {
/* Tune the drive for PIO modes up to PIO 4 */  
-   cs5520_tune_drive(drive, 4);
+   cs5520_tune_drive(drive, 255);
 
/* Then tell the core to use DMA operations */
return 0;
diff --git a/drivers/ide/pci/cs5535.c b/drivers/ide/pci/cs5535.c
index ce44e38..082ca7d 100644
--- a/drivers/ide/pci/cs5535.c
+++ b/drivers/ide/pci/cs5535.c
@@ -2,6 +2,7 @@
  * linux/drivers/ide/pci/cs5535.c
  *
  * Copyright (C) 2004-2005 Advanced Micro Devices, Inc.
+ * Copyright (C)  2007 Bartlomiej Zolnierkiewicz
  *
  * History:
  * 09/20/2005 - Jaya Kumar <[EMAIL PROTECTED]>
@@ -83,14 +84,17 @@ static void cs5535_set_speed(ide_drive_t *drive, u8 speed)
 
/* Set the PIO timings */
if

debugfs helper for decimal challenged

2007-08-01 Thread Robin Getz

Greg:

For those of us who forget that when bits 21 and 31 are set in a hardware
register exposed with debugfs, I should see 2149580800 when I cat it (vs
0x80200000), any objections to providing a hex output interface to
debugfs?

Since the input side already takes decimal & hex, I don't think this is a big
change:

DEFINE_SIMPLE_ATTRIBUTE(fops_x16, debugfs_u16_get, debugfs_u16_set, 
"0x%04llx\n");

struct dentry *debugfs_create_x16(const char *name, mode_t mode,
  struct dentry *parent, u16 *value)
{
return debugfs_create_file(name, mode, parent, value, &fops_x16);
}

DEFINE_SIMPLE_ATTRIBUTE(fops_x32, debugfs_u32_get, debugfs_u32_set, 
"0x%08llx\n");

struct dentry *debugfs_create_x32(const char *name, mode_t mode,
  struct dentry *parent, u32 *value)
{
return debugfs_create_file(name, mode, parent, value, &fops_x32);
}

If this is OK - I will send a real patch.
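
A userspace sketch of the motivating example, using the same "0x%08llx\n" format string as the proposed x32 attribute. The demo_* helpers are hypothetical and do not touch debugfs.

```c
#include <stdio.h>
#include <stdint.h>

/* Value of a 32-bit register with only bits a and b set (a, b < 32). */
uint32_t demo_two_bits(unsigned int a, unsigned int b)
{
    return (UINT32_C(1) << a) | (UINT32_C(1) << b);
}

/* Format v as the proposed hex attribute would show it.
 * Returns a pointer to a static buffer (not thread-safe). */
const char *demo_x32_str(uint32_t v)
{
    static char buf[16];

    snprintf(buf, sizeof(buf), "0x%08llx\n", (unsigned long long)v);
    return buf;
}
```

Bits 21 and 31 give 2149580800 in decimal, while the hex form 0x80200000 makes both set bits visible at a glance.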

-Robin