date:20070420

Re: [patch] CFS scheduler, v3

2007-04-20 Thread Peter Williams


William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>>> This essentially doesn't look correct because while you want to enforce
>>> the CPU bandwidth allocation, this doesn't have much to do with that
>>> apart from the CPU bandwidth appearing as a term. It's more properly
>>> a rate of service as opposed to a time at which anything should happen
>>> or a number useful for predicting such. When service should begin more
>>> properly depends on the other tasks in the system and a number of other
>>> decisions that are part of the scheduling policy.
>
> On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
>> This model takes all of those into consideration.  The idea is not just
>> to predict but to use the calculated time to decide when to boot the
>> current process off the CPU (if it doesn't leave voluntarily) and put
>> this one on.  This more or less removes the need to give each task a
>> predetermined chunk of CPU when they go on to the CPU.  This should, in
>> general, reduce the number of context switches as tasks get to run until
>> they've finished what they're doing or another task becomes higher
>> priority rather than being booted off after an arbitrary time interval.
>>  (If this ever gets tried it will be interesting to see if this
>> prediction comes true.)
>> BTW Even if Ingo doesn't choose to try this model, I'll probably make a
>> patch (way in the future after Ingo's changes are settled) to try it out
>> myself.
>
> I think I smoked out what you were doing.
>
> William Lee Irwin III wrote:
>>> If you want to choose a "quasi-inter-arrival time" to achieve the
>>> specified CPU bandwidth allocation, this would be it, but to use that
>>> to actually enforce the CPU bandwidth allocation, you would need to
>>> take into account the genuine inter-arrival time to choose an actual
>>> time for service to begin. In other words, this should be a quota for
>>> the task to have waited. If it's not waited long enough, then it should
>>> be delayed by the difference to achieve the inter-arrival time you're
>>> trying to enforce. If it's waited longer, it should be executed
>>> sooner modulo other constraints, and perhaps even credited for future
>>> scheduling cycles.
>
> On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
>> The idea isn't to enforce the bandwidth entitlement to the extent of
>> throttling tasks if they exceed their entitlement and there's no other
>> tasks ready to use the CPU.  This is mainly because the bandwidth
>> entitlement isn't fixed -- it's changing constantly as the number and
>> type of runnable tasks changes.
>
> Well, a little hysteresis will end up throttling in such a manner
> anyway as a side-effect,

Think of this as a calming influence :-)

> or you'll get anomalies. Say two tasks with
> equal entitlements compete, where one sleeps for 1/3 of the time and
> the other is fully CPU-bound. If only the times when they're in direct
> competition are split 50/50, then the CPU-bound task gets 2/3 and the
> sleeper 1/3, which is not the intended effect. I don't believe this
> model will be very vulnerable to it, though.

Nor me.

> William Lee Irwin III wrote:
>>> In order to partially avoid underallocating CPU bandwidth to p, one
>>> should track the time last spent sleeping and do the following:
>
> On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
>> Yes I made a mistake in omitting to take into account sleep interval.
>> See another e-mail to Ingo correcting this problem.
>
> I took it to be less trivial of an error than it was. No big deal.

No, you were right it was definitely a NON trivial error.

> William Lee Irwin III wrote:
>>> In order to do better, longer-term history is required,
>
> On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
>> The half life of the Kalman filter (roughly equivalent to a running
>> average) used to calculate the averages determines how much history is
>> taken into account.  It could be made configurable (at least, until
>> enough real life experience was available to decide on the best value to
>> use).
>
> A Kalman filter would do better than a running average. I'm all for it.

As a long time user of Kalman filters I tend to think of them as the
same thing.  I use the term running average when talking about the idea
behind a scheduler because I think that more people will understand what
the general idea is.  When it comes to implementation I always replace
the idea of "running average" with a roughly equivalent Kalman filter.

> William Lee Irwin III wrote:
>>> To attempt to maintain an infinite history of
>>> bandwidth underutilization to be credited too far in the future would
>>> enable potentially long-term overutilization when such credits are
>>> cashed en masse for a sustained period of time. At some point you have
>>> to say "use it or lose it;" over a shorter period of time some smoothing
>>> is still admissible and even desirable.
>
> On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
>> Yes, that's why I suggest a running average over the last few scheduling
>> cycles for the task.  But thinking about it some more I'm now not so

Re: 2.6.20.7 locking up hard on boot

2007-04-20 Thread Greg KH

On Fri, Apr 20, 2007 at 11:30:59PM -0500, Marcos Pinto wrote:
>  Yes, I just tried 2.6.20.3 with ACPI enabled and it booted perfectly.
>  I'm hoping this means you know what's wrong? :-)

Can you do a 'git bisect' on the versions between 2.6.20.3 and 2.6.20.7
to try to find the problem patch?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PCI bridge range sizing bug

2007-04-20 Thread Rik van Riel


Jesse Barnes wrote:

On Friday, April 20, 2007 11:28 am Linus Torvalds wrote:

On Fri, 20 Apr 2007, Jesse Barnes wrote:

Sounds good, hopefully reassigning the bridge resources won't cause
too much trouble.  Do you have time to hack this up?  If not, I
could give it a try, as long as ajax is willing to test...

Actually, I would suggest we not do it automatically (because the
need for it is just so low, and the downsides are potentially huge -
there are just too many resources that are "hidden" from us through
ACPI tricks and having hardware that doesn't actually expose their
PCI resources fully through the normal PCI resource setup).


Yeah, that's probably prudent.  OTOH we should probably let the user 
know in no uncertain terms that some of the stuff behind one of their 
bridges will be inaccessible.


Something like that would have made it a lot more obvious
why my Matrox PCIe x1 video card will not work in my Dell
9150, while a PCI video card does work.

The PCI video card directly sits on the bus, and gets its
resources assigned by the BIOS.

The PCIe video card turned out to be a PCIe to AGP bridge,
and the BIOS did not assign the needed PCI resources, making
the system crash when I started X.

X seemed to have some trouble reading the ROM, too...

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

Re: [patch] CFS scheduler, v3

2007-04-20 Thread William Lee Irwin III

William Lee Irwin III wrote:
>> This essentially doesn't look correct because while you want to enforce
>> the CPU bandwidth allocation, this doesn't have much to do with that
>> apart from the CPU bandwidth appearing as a term. It's more properly
>> a rate of service as opposed to a time at which anything should happen
>> or a number useful for predicting such. When service should begin more
>> properly depends on the other tasks in the system and a number of other
>> decisions that are part of the scheduling policy.

On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
> This model takes all of those into consideration.  The idea is not just 
> to predict but to use the calculated time to decide when to boot the 
> current process off the CPU (if it doesn't leave voluntarily) and put 
> this one on.  This more or less removes the need to give each task a 
> predetermined chunk of CPU when they go on to the CPU.  This should, in 
> general, reduce the number of context switches as tasks get to run until 
> they've finished what they're doing or another task becomes higher 
> priority rather than being booted off after an arbitrary time interval. 
>  (If this ever gets tried it will be interesting to see if this 
> prediction comes true.)
> BTW Even if Ingo doesn't choose to try this model, I'll probably make a 
> patch (way in the future after Ingo's changes are settled) to try it out 
> myself.

I think I smoked out what you were doing.


William Lee Irwin III wrote:
>> If you want to choose a "quasi-inter-arrival time" to achieve the
>> specified CPU bandwidth allocation, this would be it, but to use that
>> to actually enforce the CPU bandwidth allocation, you would need to
>> take into account the genuine inter-arrival time to choose an actual
>> time for service to begin. In other words, this should be a quota for
>> the task to have waited. If it's not waited long enough, then it should
>> be delayed by the difference to achieve the inter-arrival time you're
>> trying to enforce. If it's waited longer, it should be executed
>> sooner modulo other constraints, and perhaps even credited for future
>> scheduling cycles.

On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
> The idea isn't to enforce the bandwidth entitlement to the extent of 
> throttling tasks if they exceed their entitlement and there's no other 
> tasks ready to use the CPU.  This is mainly because the bandwidth 
> entitlement isn't fixed -- it's changing constantly as the number and 
> type of runnable tasks changes.

Well, a little hysteresis will end up throttling in such a manner
anyway as a side-effect, or you'll get anomalies. Say two tasks with
equal entitlements compete, where one sleeps for 1/3 of the time and
the other is fully CPU-bound. If only the times when they're in direct
competition are split 50/50, then the CPU-bound task gets 2/3 and the
sleeper 1/3, which is not the intended effect. I don't believe this
model will be very vulnerable to it, though.
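
The arithmetic behind that example can be checked with a small sketch. It is
only an illustration, not code from any scheduler: `struct shares` and
`contended_split()` are hypothetical names, and the model assumes that the
CPU-bound task runs alone whenever the sleeper is asleep and that contended
time is split exactly 50/50.

```c
/* Long-run CPU shares of a CPU-bound task and a periodic sleeper. */
struct shares {
	double cpu_bound;	/* fraction of CPU given to the CPU-bound task */
	double sleeper;		/* fraction of CPU given to the sleeping task */
};

/*
 * sleep_frac is the fraction of wall time the sleeper spends asleep.
 * Only the remaining (contended) time is split evenly; the CPU-bound
 * task additionally gets all of the time the sleeper is not runnable.
 */
struct shares contended_split(double sleep_frac)
{
	double contended = 1.0 - sleep_frac;
	struct shares s;

	s.sleeper = 0.5 * contended;
	s.cpu_bound = sleep_frac + 0.5 * contended;
	return s;
}
```

With sleep_frac = 1/3 the contended time is 2/3, so the sleeper ends up with
1/3 of the CPU and the CPU-bound task with 1/3 + 1/3 = 2/3, matching the
figures above.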


William Lee Irwin III wrote:
>> In order to partially avoid underallocating CPU bandwidth to p, one
>> should track the time last spent sleeping and do the following:

On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
> Yes I made a mistake in omitting to take into account sleep interval. 
> See another e-mail to Ingo correcting this problem.

I took it to be less trivial of an error than it was. No big deal.


William Lee Irwin III wrote:
>> In order to do better, longer-term history is required,

On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
> The half life of the Kalman filter (roughly equivalent to a running 
> average) used to calculate the averages determines how much history is 
> taken into account.  It could be made configurable (at least, until 
> enough real life experience was available to decide on the best value to 
> use).

A Kalman filter would do better than a running average. I'm all for it.
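
To make the "half life" idea concrete, here is a minimal sketch of an
exponentially decaying average, which has the same update form as a scalar
filter with a fixed gain (estimate += gain * (sample - estimate)). It is an
assumption-laden illustration, not the filter from any posted patch: the names
`struct decay_avg`, `decay_avg_update()` and `decay_avg_half_life()` are
hypothetical, and the gain is restricted to 1/2^shift to stay in integer
arithmetic.

```c
/* Exponentially decaying average in integer arithmetic (illustrative). */
struct decay_avg {
	long value;	/* current smoothed estimate */
	unsigned shift;	/* each new sample gets weight 1 / 2^shift */
	int primed;	/* zero until the first sample has been seen */
};

void decay_avg_init(struct decay_avg *a, unsigned shift)
{
	a->value = 0;
	a->shift = shift;
	a->primed = 0;
}

/* Fold one sample into the average and return the new estimate. */
long decay_avg_update(struct decay_avg *a, long sample)
{
	if (!a->primed) {
		/* Seed with the first sample instead of decaying up from 0. */
		a->value = sample;
		a->primed = 1;
	} else {
		a->value += (sample - a->value) / (1L << a->shift);
	}
	return a->value;
}

/*
 * Number of updates after which an old sample's weight has dropped to
 * one half or less: the smallest n with (1 - 1/2^shift)^n <= 0.5.
 */
unsigned decay_avg_half_life(unsigned shift)
{
	double keep = 1.0 - 1.0 / (double)(1UL << shift);
	double weight = 1.0;
	unsigned n = 0;

	while (weight > 0.5) {
		weight *= keep;
		n++;
	}
	return n;
}
```

A larger shift keeps more history: with shift = 3 each update retains 7/8 of
the old estimate, so a sample's influence halves after about 6 updates. Making
the shift tunable is one way the amount of history could be made configurable.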


William Lee Irwin III wrote:
>> To attempt to maintain an infinite history of
>> bandwidth underutilization to be credited too far in the future would
>> enable potentially long-term overutilization when such credits are
>> cashed en masse for a sustained period of time. At some point you have
>> to say "use it or lose it;" over a shorter period of time some smoothing
>> is still admissible and even desirable.
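
One simple way to express "use it or lose it" is to cap the credit a task may
bank. The sketch below is hypothetical (the names `struct credit`,
`credit_accrue()` and `credit_spend()` are invented for illustration and the
units are arbitrary); it only shows that unused bandwidth beyond the cap is
discarded rather than carried forward indefinitely.

```c
/* Bounded credit for unused CPU bandwidth (illustrative only). */
struct credit {
	long balance;	/* credit currently banked */
	long cap;	/* upper bound: anything beyond this is lost */
};

/* Bank unused bandwidth, discarding whatever exceeds the cap. */
long credit_accrue(struct credit *c, long unused)
{
	c->balance += unused;
	if (c->balance > c->cap)
		c->balance = c->cap;
	return c->balance;
}

/* Spend up to 'wanted' units of credit; returns the amount granted. */
long credit_spend(struct credit *c, long wanted)
{
	long granted = wanted < c->balance ? wanted : c->balance;

	c->balance -= granted;
	return granted;
}
```

Because the balance can never exceed the cap, a task that has been idle for a
long time cannot cash in more than one cap's worth of extra service at once.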

On Sat, Apr 21, 2007 at 10:23:07AM +1000, Peter Williams wrote:
> Yes, that's why I suggest a running average over the last few scheduling 
> cycles for the task.  But thinking about it some more I'm now not so 
> sure.  The lack of apparent "smoothness" when I've done this sort of 
> thing with raw rather than smooth data (in modifications to the current 
> dynamic priority based scheduler model) is usually noticed by running 
> top and seeing wildly fluctuating dynamic priorities.  I'm not sure that 
> the actual responsiveness of the system reflects this.  So I'm now 
> willing to reserve my

sysfs: Need ability to remove all symlinks pointing to an object

2007-04-20 Thread Christoph Lameter

How do I remove all references to an object in sysfs? The following patch
attempts to add that functionality to sysfs, but I am not that familiar with
it. Help is welcome.



SLUB: Remove alias before installing symlink

We cannot really track the aliases that are created when aliasing
a slab. kmem_cache_destroy only decrements a refcounter. This
means that the aliases are never removed. However, when the last
ref count to a slab is dropped then we should remove all symlinks.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6/mm/slub.c
===
--- linux-2.6.21-rc6.orig/mm/slub.c 2007-04-20 16:44:14.0 -0700
+++ linux-2.6.21-rc6/mm/slub.c  2007-04-20 17:12:18.0 -0700
@@ -3315,6 +3315,7 @@ static int sysfs_slab_add(struct kmem_ca
/* Defer until later */
return 0;
 
+   sysfs_remove_link(&slab_subsys.kset.kobj, s->name);
kobj_set_kset_s(s, slab_subsys);
kobject_set_name(&s->kobj, s->name);
kobject_init(&s->kobj);
@@ -3329,8 +3330,18 @@ static int sysfs_slab_add(struct kmem_ca
return 0;
 }
 
+static void sysfs_remove_aliases(struct kmem_cache *s)
+{
+   /*
+* Remove all symlinks pointing to the kobject of
+* in the kmem_cache structure
+*/
+   sysfs_remove_links(&slab_subsys.kset, &s->kobj);
+}
+
 static void sysfs_slab_remove(struct kmem_cache *s)
 {
+   sysfs_remove_aliases(s);
kobject_uevent(&s->kobj, KOBJ_REMOVE);
kobject_del(&s->kobj);
 }
@@ -3351,9 +3362,11 @@ static int sysfs_slab_alias(struct kmem_
 {
struct saved_alias *al;
 
-   if (slab_state == SYSFS)
+   if (slab_state == SYSFS) {
+   sysfs_remove_link(&slab_subsys.kset.kobj, name);
return sysfs_create_link(&slab_subsys.kset.kobj,
&s->kobj, name);
+   }
 
al = kmalloc(sizeof(struct saved_alias), GFP_KERNEL);
if (!al)
Index: linux-2.6.21-rc6/fs/sysfs/symlink.c
===
--- linux-2.6.21-rc6.orig/fs/sysfs/symlink.c2007-04-20 17:05:00.0 
-0700
+++ linux-2.6.21-rc6/fs/sysfs/symlink.c 2007-04-20 17:18:50.0 -0700
@@ -117,6 +117,38 @@ void sysfs_remove_link(struct kobject * 
sysfs_hash_and_remove(kobj->dentry,name);
 }
 
+/*
+ * Remove all symlinks pointing to the indicated object
+ */
+void sysfs_remove_links(struct kset *kset, struct kobject *needle)
+{
+   struct list_head *entry;
+
+restart:
+   spin_lock(&kset->list_lock);
+   list_for_each(entry, &kset->list) {
+   struct kobject * k = container_of(entry, struct kobject, entry);
+   struct sysfs_symlink *sl =
+   container_of(k, struct sysfs_symlink, target_kobj);
+
+   if (sl->target_kobj == needle) {
+   /* sysfs_remove_link needs the lock. sigh */
+   spin_unlock(&kset->list_lock);
+
+   sysfs_remove_link(k, sl->link_name);
+   /*
+* Somehow sysfs_remove_link does
+* not clean up after itself
+*/
+   kfree(sl->link_name);
+   kfree(sl);
+   kobject_put(needle);
+   goto restart;
+   }
+}
+spin_unlock(&kset->list_lock);
+}
+
 static int sysfs_get_target_path(struct kobject * kobj, struct kobject * 
target,
 char *path)
 {
@@ -188,5 +220,6 @@ const struct inode_operations sysfs_syml
 };
 
 
-EXPORT_SYMBOL_GPL(sysfs_create_link);
 EXPORT_SYMBOL_GPL(sysfs_remove_link);
+EXPORT_SYMBOL_GPL(sysfs_remove_links);
+EXPORT_SYMBOL_GPL(sysfs_create_link);
Index: linux-2.6.21-rc6/include/linux/kobject.h
===
--- linux-2.6.21-rc6.orig/include/linux/kobject.h   2007-04-20 
17:09:03.0 -0700
+++ linux-2.6.21-rc6/include/linux/kobject.h2007-04-20 17:09:49.0 
-0700
@@ -166,6 +166,9 @@ static inline struct kobj_type * get_kty
 
 extern struct kobject * kset_find_obj(struct kset *, const char *);
 
+extern void sysfs_remove_links(struct kset *kset, struct kobject *needle);
+
+
 
 /**
  * Use this when initializing an embedded kset with no other 

Re: 2.6.20.7 locking up hard on boot

2007-04-20 Thread Marcos Pinto


Yes, I just tried 2.6.20.3 with ACPI enabled and it booted perfectly.
I'm hoping this means you know what's wrong? :-)

Thanks again,
Marcos

On 4/20/07, Adrian Bunk <[EMAIL PROTECTED]> wrote:

On Fri, Apr 20, 2007 at 07:47:13PM -0500, Marcos Pinto wrote:
> I'm not subscribed, so please personally CC me any answers/comments.
> Thank you.
>
> While booting, (AMD64 Turion x2) 2.6.20.7 kernel locks up hard.  The
> last kernel that I tried, 2.6.18.8, worked perfectly without any
> trickery.  2.6.20.7 only boots up with "acpi=off" being added to the
> kernel line.  Note that 2.6.18.8 works perfectly with acpi on, which
> is really the
> only way I can run this box because with "acpi=off" it overheats and
> freezes.
> Please let me know if there's anything else that I could do to help with
> this.
>
>
> Here's what's on the screen when it happens:
>
> Brought up 2 CPUs
> testing NMI watchdog ... OK.
> Disabling vsyscall due to  use of PM timer
> time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
> time.c: Detected 1808.264 MHz processor.
> migration_cost=281
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at e000
> PCI: No mmconfig possible on device 00:18
> PCI: No mmconfig possible on device 07:05
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (:00)
> ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
> :00:0d.0: cannot adjust BAR0 (not I/O)
> :00:0d.0: cannot adjust BAR1 (not I/O)
>...

Does 2.6.20.3 boot with ACPI enabled?

cu
Adrian

--

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed




Re: [PATCH] Make new setting of panic_on_oom

2007-04-20 Thread Yasunori Goto


> > >   read_lock(&tasklist_lock);
> > >  
> > > + if (sysctl_panic_on_oom == 2)
> > > + panic("out of memory. Compulsory panic_on_oom is selected.\n");
> > > +
> > 
> > Wouldn't it be safer to put the panic before the read_lock()?
> 
> I agree. Otherwise the patch seem to be okay.

Ok. This is take 2.
Thanks for your comment.

-

The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because memory may remain free on other nodes.
But some people want failover by panic ASAP even when cpusets/mempolicy
are in use. This patch adds a new setting for that request.

This is not tested yet, but it should work.

Please apply.
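
The intended meaning of the three values, as described in the documentation
hunk of this patch, can be summarised in a small sketch. This is not the
kernel code: `enum oom_action` and `oom_decide()` are hypothetical names, and
`constrained` is an assumed flag meaning the failing allocation was restricted
to a subset of nodes by cpusets or mempolicy.

```c
/* What to do on out-of-memory (illustrative names, not kernel API). */
enum oom_action {
	OOM_KILL_TASK,	/* let the oom killer pick a victim */
	OOM_PANIC	/* panic the whole system */
};

/*
 * panic_on_oom: 0, 1 or 2 as described for the sysctl.
 * constrained:  nonzero if the allocation was limited to some nodes
 *               by cpusets/mempolicy, so other nodes may still have
 *               free memory.
 */
enum oom_action oom_decide(int panic_on_oom, int constrained)
{
	if (panic_on_oom == 2)
		return OOM_PANIC;	/* compulsory panic */
	if (panic_on_oom == 1 && !constrained)
		return OOM_PANIC;	/* system-wide exhaustion */
	return OOM_KILL_TASK;
}
```

The only case where values 1 and 2 differ is a constrained allocation: with 1
a task is killed, with 2 the system panics.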

Signed-off-by: Yasunori Goto <[EMAIL PROTECTED]>

---
 Documentation/sysctl/vm.txt |   23 +--
 mm/oom_kill.c   |3 +++
 2 files changed, 20 insertions(+), 6 deletions(-)

Index: panic_on_oom2/Documentation/sysctl/vm.txt
===
--- panic_on_oom2.orig/Documentation/sysctl/vm.txt  2007-04-21 
12:39:09.0 +0900
+++ panic_on_oom2/Documentation/sysctl/vm.txt   2007-04-21 12:39:58.0 
+0900
@@ -197,11 +197,22 @@
 
 panic_on_oom
 
-This enables or disables panic on out-of-memory feature.  If this is set to 1,
-the kernel panics when out-of-memory happens.  If this is set to 0, the kernel
-will kill some rogue process, called oom_killer.  Usually, oom_killer can kill
-rogue processes and system will survive.  If you want to panic the system
-rather than killing rogue processes, set this to 1.
+This enables or disables panic on out-of-memory feature.
 
-The default value is 0.
+If this is set to 0, the kernel will kill some rogue process,
+called oom_killer.  Usually, oom_killer can kill rogue processes and
+system will survive.
+
+If this is set to 1, the kernel panics when out-of-memory happens.
+However, if a process limits using nodes by mempolicy/cpusets,
+and those nodes become memory exhaustion status, one process
+may be killed by oom-killer. No panic occurs in this case.
+Because other nodes' memory may be free. This means system total status
+may be not fatal yet.
 
+If this is set to 2, the kernel panics compulsorily even on the
+above-mentioned.
+
+The default value is 0.
+1 and 2 are for failover of clustering. Please select either
+according to your policy of failover.
Index: panic_on_oom2/mm/oom_kill.c
===
--- panic_on_oom2.orig/mm/oom_kill.c2007-04-21 12:39:09.0 +0900
+++ panic_on_oom2/mm/oom_kill.c 2007-04-21 12:40:31.0 +0900
@@ -409,6 +409,9 @@
show_mem();
}
 
+   if (sysctl_panic_on_oom == 2)
+   panic("out of memory. Compulsory panic_on_oom is selected.\n");
+
cpuset_lock();
read_lock(&tasklist_lock);
 


-- 
Yasunori Goto 



Re: [RFC] Extend Linux to support proportional-share scheduling

2007-04-20 Thread William Lee Irwin III

On Fri, Apr 20, 2007 at 11:30:04AM -0700, Tong Li wrote:
> This patch extends the existing Linux scheduler with support for
> proportional-share scheduling (as a new KConfig option).
> http://www.cs.duke.edu/~tongli/linux/linux-2.6.19.2-trio.patch
> It uses a scheduling algorithm, called Distributed Weighted Round-Robin 
> (DWRR), which retains the existing scheduler design as much as possible, 
> and extends it to achieve proportional fairness with O(1) time complexity 
> and a constant error bound, compared to the ideal fair scheduling 
> algorithm. The code is by no means final and has been only tested on a 
> four-processor dual-core x86-64 system. Rather than focusing on coding 
> issues, the intent of this RFC is to invite discussions on the proposed 
> DWRR algorithm and proportional-share scheduling in general.

Very nice. I think we need this kind of functionality in mainline.


-- wli

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Rik van Riel


Eric Dumazet wrote:

Rik van Riel a écrit :

Andrew Morton wrote:

On Fri, 20 Apr 2007 17:38:06 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:


I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".

- Nick's patch also will help this problem.  It could be that your patch
  no longer offers a 2x speedup when combined with Nick's patch.

  It could well be that the combination of the two is even better, but it
  would be nice to firm that up a bit.

I'll test that.


Thanks.


Well, good news.

It turns out that Nick's patch does not improve peak
performance much, but it does prevent the decline when
running with 16 threads on my quad core CPU!

We _definitely_ want both patches, there's a huge benefit
in having them both.

Here are the transactions/seconds for each combination:

threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem
1         610       609         596                545


545 tps versus 610 tps for one thread ? It seems quite bad, no ?

Could you please find an explanation for this ?


I have no idea why this happens.  Especially the last one,
going from a write lock to a read lock on the mmap_sem
should not make ANY difference whatsoever since we're
running single threaded!


2         1032      1136        1196               1200
4         1070      1128        2014               2024
8         1000      1088        1665               2087
16        779       1073        1310               1999


Performance with 2 database threads is way better though,
and performance with 4 or more threads more than doubles...

If you have an explanation on why single threaded performance
went down a little on my quad core system, please let me know.

Does performance suffer at all on a real UP system?

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

[PATCH] KMEM_CACHE() simplify slab cache creation

2007-04-20 Thread Christoph Lameter

This patch provides a new macro

KMEM_CACHE(<struct>, <flags>)

to simplify slab creation. KMEM_CACHE creates a slab with the name of the
struct, with the size of the struct and with the alignment of the struct.
Additional slab flags may be specified if necessary.

Example



struct test_slab {
	int a,b,c;
	struct list_head list;
} __cacheline_aligned_in_smp;

test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);



will create a new slab named "test_slab" of the size
sizeof(struct test_slab) and aligned to the alignment of struct
test_slab. If it fails then we panic.
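
How the macro derives all three properties from the struct name alone can be
tried outside the kernel. The sketch below is a userspace stand-in, not the
kernel interface: `struct cache_desc`, `mock_cache_create()` and
`MOCK_KMEM_CACHE()` are hypothetical names that only record the arguments, and
`__alignof__` is the GCC/Clang extension that the macro in this patch also
relies on.

```c
#include <stddef.h>

/* Records what a cache-creation call would have been given. */
struct cache_desc {
	const char *name;	/* stringified struct name */
	size_t size;		/* sizeof(struct ...) */
	size_t align;		/* __alignof__(struct ...) */
	unsigned long flags;	/* caller-supplied flags */
};

/* Hypothetical stand-in for kmem_cache_create(): no allocation is done. */
struct cache_desc mock_cache_create(const char *name, size_t size,
				    size_t align, unsigned long flags)
{
	struct cache_desc d = { name, size, align, flags };

	return d;
}

/*
 * Same shape as the macro in the patch: '#' turns the struct tag into
 * the cache name, while sizeof and __alignof__ supply size and alignment.
 */
#define MOCK_KMEM_CACHE(struct_name, flags)				\
	mock_cache_create(#struct_name, sizeof(struct struct_name),	\
			  __alignof__(struct struct_name), (flags))

/* Example object type; the members are arbitrary. */
struct test_slab {
	int a, b, c;
	void *next;
};
```

With these definitions, MOCK_KMEM_CACHE(test_slab, 0) yields a descriptor
named "test_slab" whose size and alignment follow the struct definition, so
changing the struct automatically changes the cache parameters.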

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6/include/linux/slab.h
===
--- linux-2.6.21-rc6.orig/include/linux/slab.h  2007-04-20 20:15:14.0 
-0700
+++ linux-2.6.21-rc6/include/linux/slab.h   2007-04-20 20:24:03.0 
-0700
@@ -55,6 +55,18 @@ unsigned int kmem_cache_size(struct kmem
 const char *kmem_cache_name(struct kmem_cache *);
 int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
 
+/*
+ * Please use this macro to create slab caches. Simply specify the
+ * name of the structure and maybe some flags that are listed above.
+ *
+ * The alignment of the struct determines object alignment. If you
+ * f.e. add cacheline_aligned_in_smp to the struct declaration
+ * then the objects will be properly aligned in SMP configurations.
+ */
+#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
+   sizeof(struct __struct), __alignof__(struct __struct),\
+   (__flags), NULL, NULL)
+
 #ifdef CONFIG_NUMA
 extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
 #else
Index: linux-2.6.21-rc6/kernel/delayacct.c
===
--- linux-2.6.21-rc6.orig/kernel/delayacct.c2007-04-20 20:15:14.0 
-0700
+++ linux-2.6.21-rc6/kernel/delayacct.c 2007-04-20 20:17:47.0 -0700
@@ -31,11 +31,7 @@ __setup("nodelayacct", delayacct_setup_d
 
 void delayacct_init(void)
 {
-   delayacct_cache = kmem_cache_create("delayacct_cache",
-   sizeof(struct task_delay_info),
-   0,
-   SLAB_PANIC,
-   NULL, NULL);
+   delayacct_cache = KMEM_CACHE(task_delay_info, SLAB_PANIC);
delayacct_tsk_init(&init_task);
 }
 
Index: linux-2.6.21-rc6/kernel/pid.c
===
--- linux-2.6.21-rc6.orig/kernel/pid.c  2007-04-20 20:15:14.0 -0700
+++ linux-2.6.21-rc6/kernel/pid.c   2007-04-20 20:17:47.0 -0700
@@ -412,7 +412,5 @@ void __init pidmap_init(void)
set_bit(0, init_pid_ns.pidmap[0].page);
atomic_dec(&init_pid_ns.pidmap[0].nr_free);
 
-   pid_cachep = kmem_cache_create("pid", sizeof(struct pid),
-   __alignof__(struct pid),
-   SLAB_PANIC, NULL, NULL);
+   pid_cachep = KMEM_CACHE(pid, SLAB_PANIC);
 }
Index: linux-2.6.21-rc6/kernel/signal.c
===
--- linux-2.6.21-rc6.orig/kernel/signal.c   2007-04-20 20:15:14.0 
-0700
+++ linux-2.6.21-rc6/kernel/signal.c2007-04-20 20:17:47.0 -0700
@@ -2636,9 +2636,5 @@ __attribute__((weak)) const char *arch_v
 
 void __init signals_init(void)
 {
-   sigqueue_cachep =
-   kmem_cache_create("sigqueue",
- sizeof(struct sigqueue),
- __alignof__(struct sigqueue),
- SLAB_PANIC, NULL, NULL);
+   sigqueue_cachep = KMEM_CACHE(sigqueue, SLAB_PANIC);
 }
Index: linux-2.6.21-rc6/kernel/taskstats.c
===
--- linux-2.6.21-rc6.orig/kernel/taskstats.c2007-04-20 20:15:14.0 
-0700
+++ linux-2.6.21-rc6/kernel/taskstats.c 2007-04-20 20:17:47.0 -0700
@@ -524,9 +524,7 @@ void __init taskstats_init_early(void)
 {
unsigned int i;
 
-   taskstats_cache = kmem_cache_create("taskstats_cache",
-   sizeof(struct taskstats),
-   0, SLAB_PANIC, NULL, NULL);
+   taskstats_cache = KMEM_CACHE(taskstats, SLAB_PANIC);
for_each_possible_cpu(i) {
INIT_LIST_HEAD(&(per_cpu(listener_array, i).list));
init_rwsem(&(per_cpu(listener_array, i).sem));
Index: linux-2.6.21-rc6/block/cfq-iosched.c
===
--- linux-2.6.21-rc6.orig/block/cfq-iosched.c   2007-04-20 20:15:14.0 
-0700
+++ linux-2.6.21-rc6/block/cfq-iosched.c2007-04-20 20:17:47.0 
-0700
@@ -,13 +,11 @@ static void cfq_slab_kill(void)
 
 static int __init cfq_slab_setup(void)

Re: [RFC 0/8] Cpuset aware writeback

2007-04-20 Thread Christoph Lameter

On Fri, 20 Apr 2007, Ethan Solomita wrote:

> cpuset_write_dirty_map.htm
> 
>In __set_page_dirty_nobuffers() you always call cpuset_update_dirty_nodes()
> but in __set_page_dirty_buffers() you call it only if page->mapping is still
> set after locking. Is there a reason for the difference? Also a question not
> about your patch: why do those functions call __mark_inode_dirty() even if the
> dirty page has been truncated and mapping == NULL?

If page->mapping has been cleared then the page was removed from the 
mapping. __mark_inode_dirty just dirties the inode. If a truncation occurs 
then the inode was modified.

> cpuset_write_throttle.htm
> 
>I noticed that several lines have leading spaces. I didn't check if other
> patches have the problem too.

Maybe download the patches? How did those strange .htm endings get 
appended to the patches?

>In get_dirty_limits(), when cpusets are configured you don't subtract highmem
> the same way that is done without cpusets. Is this intentional?

That is something in flux upstream. Linus changed it recently. Do it one 
way or the other.

>It seems that dirty_exceeded is still a global punishment across cpusets.
> Should it be addressed?

Sure. It would be best if you could place that somehow in a cpuset.


Re: Fwd: Fw: [2.6.20.4] BUG: dentry xattrs still in use in shrink_dcache_for_umount() with reiserfs

2007-04-20 Thread Jeff Mahoney


Andrew Morton wrote:
> On Wed, 18 Apr 2007 11:00:00 -0400
> Jeff Mahoney <[EMAIL PROTECTED]> wrote:
> 
>>> Do you think that could be a reason of the extra reference count on   
>>> xattr_root dentry?
>> No, I don't think it is. Looking at the code now, it seems obvious, but
>> I didn't notice it before and nobody else has reported a problem.
>>
>> getxattr() doesn't require any VFS locking. When we get down into the
>> reiserfs code, it takes a read lock. If we get two concurrent threads
>> looking up an xattr before the root has been saved, there's a window
>> where REISERFS_SB(s)->xattr_root is NULL but we've already looked it up
>> and taken a reference on it.
>>
>> I have a patch set to clean up the extended attribute code that fixes
>> this problem along the way by killing off the xattr locks and using the
>> backing files/dirs i_mutex instead. I'll post them to the reiserfs
>> mailing list.
> 
> Do we have anything suitable for 2.6.21 which will address this crash?
> 
> Also, it's not clear to me how many users we can expect to be impacted by it.
> I assume that if the same bug is in 2.6.20 then the answer is "not many".
> How come Andrea is able to keep hitting it?

I have the patchset that I mentioned, but I'm not proposing it for 2.6.21.
It's much too invasive to be introduced in an -rc7, but it does include
locking changes that I believe avoid this bug.

Vladimir was right in his analysis that sometimes get_xa_root() takes the
reference once and other times twice, but not for the right reasons.  I save
a reference to the xattr dir to avoid a lookup later, but when there are
multiple getxattrs or listxattrs as the first xattr operation on the file
system, we can end up taking a second reference when we shouldn't. This is
because those operations are protected by read locks and the ->xattr_root
pointer isn't protected by anything else. A quick fix would be to just extend
the protection of the priv root's i_mutex around the assignment, and test
first. The right fix involves a complete rework of the locking, and I have
code to do that, it's just too late to include it in 2.6.21.

I'd love to know what Andrea (and now Andi Kleen) are doing to hit this now.
There haven't been any changes in this code in a while, and the
shrink_dcache_for_umount() has been around since October. I'm unable to
reproduce locally so far, so if Andrea or Andi could see if this fixes it for
them, I'd appreciate it.
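
The "test first, then assign, all under the same lock" idea can be sketched in userspace C. This is only an illustration under stated assumptions: struct object, struct cache, obj_get() and cache_root_once() are hypothetical names, a pthread mutex stands in for the priv root's i_mutex, and a plain integer stands in for the dentry reference count. It is not the reiserfs code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted dentry. */
struct object {
    int refcount;
};

/* Hypothetical stand-in for the superblock-private data. */
struct cache {
    pthread_mutex_t lock;   /* plays the role of the priv root's i_mutex */
    struct object *root;    /* plays the role of ->xattr_root */
};

/* Take one extra reference, in the spirit of dget(). */
static struct object *obj_get(struct object *o)
{
    o->refcount++;
    return o;
}

/*
 * Cache `found` only if nobody has cached a root yet.  The test and the
 * assignment happen under the same lock, so two callers racing through
 * the first lookup cannot both take the extra reference.
 */
static void cache_root_once(struct cache *c, struct object *found)
{
    assert(c != NULL && found != NULL);

    pthread_mutex_lock(&c->lock);
    if (c->root == NULL)
        c->root = obj_get(found);
    pthread_mutex_unlock(&c->lock);
}
```

Calling cache_root_once() twice with the same object leaves exactly one cached reference, which is the property the quick fix is after.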

-Jeff

--- a/fs/reiserfs/xattr.c 2007-04-20 21:19:05.0 -0400
+++ b/fs/reiserfs/xattr.c   2007-04-20 21:41:16.0 -0400
@@ -72,14 +72,16 @@
err =
privroot->d_inode->i_op->mkdir(privroot->d_inode, xaroot,
   0700);
-   mutex_unlock(&privroot->d_inode->i_mutex);

if (err) {
+   mutex_unlock(&privroot->d_inode->i_mutex);
dput(xaroot);
dput(privroot);
return ERR_PTR(err);
}
-   REISERFS_SB(sb)->xattr_root = dget(xaroot);
+   if (REISERFS_SB(sb)->xattr_root == NULL)
+       REISERFS_SB(sb)->xattr_root = dget(xaroot);
+   mutex_unlock(&privroot->d_inode->i_mutex);
}

   out:

-- 
Jeff Mahoney
SUSE Labs

slab allocators: Remove SLAB_DEBUG_INITIAL flag

2007-04-20 Thread Christoph Lameter

I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free. That also places the check near the code that
manipulates the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there were code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG, otherwise it would just be dead code. But there is no such code
in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand, and there are easier ways to accomplish
the same effect (i.e. add debug code before kfree).
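
As a rough illustration of "add debug code before the free", here is a userspace sketch. It assumes a hypothetical struct item whose constructor state is "no users, no payload"; item_alloc(), item_in_ctor_state() and item_free_checked() are made-up names, and calloc()/free() stand in for the slab allocator calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cached object. */
struct item {
    int users;
    void *payload;
};

/* Constructor state for this sketch: no users and no payload attached. */
static bool item_in_ctor_state(const struct item *it)
{
    return it->users == 0 && it->payload == NULL;
}

/* Allocate an item that starts out in constructor state (calloc zeroes it). */
static struct item *item_alloc(void)
{
    return calloc(1, sizeof(struct item));
}

/*
 * Check the state right where the object is released instead of relying
 * on a verify callback inside the allocator.  Returns false and keeps the
 * object alive if it is not back in constructor state.
 */
static bool item_free_checked(struct item *it)
{
    assert(it != NULL);

    if (!item_in_ctor_state(it))
        return false;
    free(it);
    return true;
}
```

In kernel code the check would more likely be a WARN_ON() or BUG_ON() immediately before the free call; the boolean return here only makes the behaviour easy to observe.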

There is a related flag SLAB_CTOR_VERIFY that is frequently checked
to be clear in fs inode caches. Remove the pointless checks
(they would even be pointless without removal of SLAB_DEBUG_INITIAL)
from the fs constructors.

This is the last slab flag that SLUB did not support. Remove the check
for unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc6/include/linux/slab.h
===
--- linux-2.6.21-rc6.orig/include/linux/slab.h  2007-04-20 18:07:16.0 
-0700
+++ linux-2.6.21-rc6/include/linux/slab.h   2007-04-20 18:08:22.0 
-0700
@@ -21,7 +21,6 @@ typedef struct kmem_cache kmem_cache_t _
  * The ones marked DEBUG are only valid if CONFIG_SLAB_DEBUG is set.
  */
 #define SLAB_DEBUG_FREE0x0100UL/* DEBUG: Perform 
(expensive) checks on free */
-#define SLAB_DEBUG_INITIAL 0x0200UL/* DEBUG: Call constructor (as 
verifier) */
 #define SLAB_RED_ZONE  0x0400UL/* DEBUG: Red zone objs in a 
cache */
 #define SLAB_POISON0x0800UL/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN 0x2000UL/* Align objs on cache lines */
@@ -36,7 +35,6 @@ typedef struct kmem_cache kmem_cache_t _
 /* Flags passed to a constructor functions */
 #define SLAB_CTOR_CONSTRUCTOR  0x001UL /* If not set, then 
deconstructor */
 #define SLAB_CTOR_ATOMIC   0x002UL /* Tell constructor it can't 
sleep */
-#define SLAB_CTOR_VERIFY   0x004UL /* Tell constructor it's a 
verify call */
 
 /*
  * struct kmem_cache related prototypes
Index: linux-2.6.21-rc6/mm/slab.c
===
--- linux-2.6.21-rc6.orig/mm/slab.c 2007-04-20 18:07:16.0 -0700
+++ linux-2.6.21-rc6/mm/slab.c  2007-04-20 18:08:22.0 -0700
@@ -116,8 +116,7 @@
 #include   
 
 /*
- * DEBUG   - 1 for kmem_cache_create() to honour; SLAB_DEBUG_INITIAL,
- *   SLAB_RED_ZONE & SLAB_POISON.
+ * DEBUG   - 1 for kmem_cache_create() to honour; SLAB_RED_ZONE & 
SLAB_POISON.
  *   0 for faster, smaller code (especially in the critical paths).
  *
  * STATS   - 1 to collect stats for /proc/slabinfo.
@@ -172,7 +171,7 @@
 
 /* Legal flag mask for kmem_cache_create(). */
 #if DEBUG
-# define CREATE_MASK   (SLAB_DEBUG_INITIAL | SLAB_RED_ZONE | \
+# define CREATE_MASK   (SLAB_RED_ZONE | \
 SLAB_POISON | SLAB_HWCACHE_ALIGN | \
 SLAB_CACHE_DMA | \
 SLAB_STORE_USER | \
@@ -2182,12 +2181,6 @@ kmem_cache_create (const char *name, siz
 
 #if DEBUG
WARN_ON(strchr(name, ' ')); /* It confuses parsers */
-   if ((flags & SLAB_DEBUG_INITIAL) && !ctor) {
-   /* No constructor, but inital state check requested */
-   printk(KERN_ERR "%s: No con, but init state check "
-  "requested - %s\n", __FUNCTION__, name);
-   flags &= ~SLAB_DEBUG_INITIAL;
-   }
 #if FORCED_DEBUG
/*
 * Enable redzoning and last user accounting, except for caches with
@@ -2892,15 +2885,6 @@ static void *cache_free_debugcheck(struc
BUG_ON(objnr >= cachep->num);
BUG_ON(objp != index_to_obj(cachep, slabp, objnr));
 
-   if (cachep->flags & SLAB_DEBUG_INITIAL) {
-   /*
-* Need to call the slab's constructor so the caller can
-* perform a verify of its state (debugging).  Called without
-* the cache-lock held.
-*/
-   cachep->ctor(objp + obj_offset(cachep),
-cachep, SLAB_CTOR_CONSTRUCTOR | SLAB_CTOR_VERIFY);
-   }
if (cachep->flags & SLAB_POISON && cachep->dtor) {
/* we want to cache poison the object,
 * call the destruction callback
Index: linux-2.6.21-rc6/drivers/mtd/ubi/eba.c
===

Re: [RFC 0/8] Cpuset aware writeback

2007-04-20 Thread Ethan Solomita


Christoph Lameter wrote:
H Sorry. I got distracted and I have sent them to Kame-san who was 
interested in working on them. 


I have placed the most recent version at
http://ftp.kernel.org/pub/linux/kernel/people/christoph/cpuset_dirty
  


   Hi Christoph -- a few comments on the patches:

cpuset_write_dirty_map.htm

   In __set_page_dirty_nobuffers() you always call 
cpuset_update_dirty_nodes() but in __set_page_dirty_buffers() you call 
it only if page->mapping is still set after locking. Is there a reason 
for the difference? Also a question not about your patch: why do those 
functions call __mark_inode_dirty() even if the dirty page has been 
truncated and mapping == NULL?


cpuset_write_throttle.htm

   I noticed that several lines have leading spaces. I didn't check if 
other patches have the problem too.


   In get_dirty_limits(), when cpusets are configured you don't subtract 
highmem the same way that is done without cpusets. Is this intentional?


   It seems that dirty_exceeded is still a global punishment across 
cpusets. Should it be addressed?



   -- Ethan


Re: 2.6.20.7 locking up hard on boot

2007-04-20 Thread Adrian Bunk

On Fri, Apr 20, 2007 at 07:47:13PM -0500, Marcos Pinto wrote:
> I'm not subscribed, so please personally CC me any answers/comments.
> Thank you.
>
> While booting, (AMD64 Turion x2) 2.6.20.7 kernel locks up hard.  The
> last kernel that I tried, 2.6.18.8, worked perfectly without any
> trickery.  2.6.20.7 only boots up with "acpi=off" being added to the
> kernel line.  Note that 2.6.18.8 works perfectly with acpi on, which
> is really the
> only way I can run this box because with "acpi=off" it overheats and 
> freezes.
> Please let me know if there's anything else that I could do to help with 
> this.
>
>
> Here's what's on the screen when it happens:
>
> Brought up 2 CPUs
> testing NMI watchdog ... OK.
> Disabling vsyscall due to  use of PM timer
> time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
> time.c: Detected 1808.264 MHz processor.
> migration_cost=281
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using MMCONFIG at e000
> PCI: No mmconfig possible on device 00:18
> PCI: No mmconfig possible on device 07:05
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (:00)
> ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
> :00:0d.0: cannot adjust BAR0 (not I/O)
> :00:0d.0: cannot adjust BAR1 (not I/O)
>...

Does 2.6.20.3 boot with ACPI enabled?

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


[PATCH] utilities: add helper functions for safe 64-bit integer operations as 32-bit halves

2007-04-20 Thread John Anthony Kazos Jr.

From: John Anthony Kazos Jr. <[EMAIL PROTECTED]>

Add helper functions "upper_32_bits" and "lower_32_bits" to
include/linux/kernel.h to allow 64-bit integers to be separated into
their 32-bit upper and lower halves without promoting integers, without
stretching sign bits, and without generating compiler warnings when used
with any integer not greater than 64 bits wide. High-order bits are
assumed to be zero for integers with fewer than 64 of them.

Signed-off-by: John Anthony Kazos Jr. <[EMAIL PROTECTED]>

---

Using these functions with signed quantities is an error, especially if 
you read a 32-bit quantity from disk that happens to have the high bit set 
into an int on a 32-bit machine, then use it with a function taking a u64 
which screws your data. When switching to using these functions, it's a 
good opportunity to check for these signedness errors. (Haven't we learned 
anything over the past decades of computing about assuming that one little 
bit doesn't matter?)

Not sure exactly who the maintainer is for this, so I added 
[EMAIL PROTECTED] It's certainly not limited to one subsystem anymore, 
and converting the whole kernel to this could be a good step for 
readability and correctness across architectures of any word size.
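
The signedness hazard described above can be shown with a small userspace sketch. The names hi32(), lo32() and join32() are hypothetical analogues written as inline functions over the stdint.h types; they are not the macros from the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit value. */
static inline uint32_t hi32(uint64_t n)
{
    return (uint32_t)(n >> 32);
}

/* Lower 32 bits of a 64-bit value. */
static inline uint32_t lo32(uint64_t n)
{
    return (uint32_t)n;
}

/* Reassemble the halves; the assert checks that the split round-trips. */
static inline uint64_t join32(uint32_t hi, uint32_t lo)
{
    uint64_t n = ((uint64_t)hi << 32) | lo;

    assert(hi32(n) == hi && lo32(n) == lo);
    return n;
}
```

If a signed 32-bit value with the high bit set is widened directly to 64 bits, the sign bit is stretched and the upper half becomes all ones; casting through uint32_t first keeps the upper half zero. That difference is the kind of error the conversion is meant to flush out.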

--- linux-2.6.21-rc7-git4.orig/include/linux/kernel.h   2007-04-20 
20:22:13.0 -0400
+++ linux-2.6.21-rc7-git4.mod/include/linux/kernel.h2007-04-20 
20:37:41.0 -0400
@@ -40,6 +40,23 @@ extern const char linux_proc_banner[];
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
 
+/**
+ * lower_32_bits, upper_32_bits - separate the halves of a 64-bit integer
+ * @n: the integer to separate
+ *
+ * Separate a 64-bit integer into its upper and lower 32-bit halves.
+ * Designed to avoid integer promotions and compiler warnings when used
+ * with smaller integers, in which case the missing bits are assumed to
+ * be zero. Designed to treat integers as unsigned whether or not they
+ * really are. (If you are using these with signed integers, your code
+ * is almost certainly wrong. The cast is good for people too lazy to
+ * type "unsigned" in their code, since breaking things is bad.)
+ *
+ * These assume the integer used is NOT greater than 64 bits wide.
+ */
+#define upper_32_bits(n) (sizeof(n) == 8 ? (u64)(n) >> 32 : 0)
+#define lower_32_bits(n) (sizeof(n) == 8 ? (u32)(n) : (n))
+
 #defineKERN_EMERG  "<0>"   /* system is unusable   
*/
 #defineKERN_ALERT  "<1>"   /* action must be taken immediately 
*/
 #defineKERN_CRIT   "<2>"   /* critical conditions  
*/

2.6.20.7 locking up hard on boot

2007-04-20 Thread Marcos Pinto


I'm not subscribed, so please personally CC me any answers/comments.
Thank you.

While booting, (AMD64 Turion x2) 2.6.20.7 kernel locks up hard.  The
last kernel that I tried, 2.6.18.8, worked perfectly without any
trickery.  2.6.20.7 only boots up with "acpi=off" being added to the
kernel line.  Note that 2.6.18.8 works perfectly with acpi on, which
is really the
only way I can run this box because with "acpi=off" it overheats and freezes.
Please let me know if there's anything else that I could do to help with this.


Here's what's on the screen when it happens:

Brought up 2 CPUs
testing NMI watchdog ... OK.
Disabling vsyscall due to  use of PM timer
time.c: Using 3.579545 MHz WALL PM GTOD PM timer.
time.c: Detected 1808.264 MHz processor.
migration_cost=281
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG at e000
PCI: No mmconfig possible on device 00:18
PCI: No mmconfig possible on device 07:05
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
:00:0d.0: cannot adjust BAR0 (not I/O)
:00:0d.0: cannot adjust BAR1 (not I/O)

lspci -nn
00:00.0 RAM memory [0500]: nVidia Corporation C51 Host Bridge
[10de:02f7] (rev a2)
00:00.1 RAM memory [0500]: nVidia Corporation C51 Memory Controller 0
[10de:02fa] (rev a2)
00:00.2 RAM memory [0500]: nVidia Corporation C51 Memory Controller 1
[10de:02fe] (rev a2)
00:00.3 RAM memory [0500]: nVidia Corporation C51 Memory Controller 5
[10de:02f8] (rev a2)
00:00.4 RAM memory [0500]: nVidia Corporation C51 Memory Controller 4
[10de:02f9] (rev a2)
00:00.5 RAM memory [0500]: nVidia Corporation C51 Host Bridge
[10de:02ff] (rev a2)
00:00.6 RAM memory [0500]: nVidia Corporation C51 Memory Controller 3
[10de:027f] (rev a2)
00:00.7 RAM memory [0500]: nVidia Corporation C51 Memory Controller 2
[10de:027e] (rev a2)
00:02.0 PCI bridge [0604]: nVidia Corporation C51 PCI Express Bridge
[10de:02fc] (rev a1)
00:03.0 PCI bridge [0604]: nVidia Corporation C51 PCI Express Bridge
[10de:02fd] (rev a1)
00:04.0 PCI bridge [0604]: nVidia Corporation C51 PCI Express Bridge
[10de:02fb] (rev a1)
00:09.0 RAM memory [0500]: nVidia Corporation MCP51 Host Bridge
[10de:0270] (rev a2)
00:0a.0 ISA bridge [0601]: nVidia Corporation MCP51 LPC Bridge
[10de:0260] (rev a3)
00:0a.1 SMBus [0c05]: nVidia Corporation MCP51 SMBus [10de:0264] (rev a3)
00:0a.3 Co-processor [0b40]: nVidia Corporation MCP51 PMU [10de:0271] (rev a3)
00:0b.0 USB Controller [0c03]: nVidia Corporation MCP51 USB Controller
[10de:026d] (rev a3)
00:0b.1 USB Controller [0c03]: nVidia Corporation MCP51 USB Controller
[10de:026e] (rev a3)
00:0d.0 IDE interface [0101]: nVidia Corporation MCP51 IDE [10de:0265] (rev f1)
00:0e.0 IDE interface [0101]: nVidia Corporation MCP51 Serial ATA
Controller [10de:0266] (rev f1)
00:10.0 PCI bridge [0604]: nVidia Corporation MCP51 PCI Bridge
[10de:026f] (rev a2)
00:10.1 Audio device [0403]: nVidia Corporation MCP51 High Definition
Audio [10de:026c] (rev a2)
00:14.0 Bridge [0680]: nVidia Corporation MCP51 Ethernet Controller
[10de:0269] (rev a3)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8
[Athlon64/Opteron] Miscellaneous Control [1022:1103]
03:00.0 Network controller [0280]: Broadcom Corporation Dell Wireless
1390 WLAN Mini-PCI Card [14e4:4311] (rev 01)
05:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce
Go 7200 [10de:01d6] (rev a1)
07:05.0 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 IEEE 1394
Controller [1180:0832]
07:05.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822
SD/SDIO/MMC/MS/MSPro Host Adapter [1180:0822] (rev 19)
07:05.2 System peripheral [0880]: Ricoh Co Ltd Unknown device
[1180:0843] (rev 01)
07:05.3 System peripheral [0880]: Ricoh Co Ltd R5C592 Memory Stick Bus
Host Adapter [1180:0592] (rev 0a)
07:05.4 System peripheral [0880]: Ricoh Co Ltd xD-Picture Card
Controller [1180:0852] (rev 05)

lspci -vnn
00:00.0 RAM memory [0500]: nVidia Corporation C51 Host Bridge
[10de:02f7] (rev a2)
   Subsystem: Hewlett-Packard Company Unknown device [103c:30b7]
   Flags: bus master, 66MHz, fast devsel, latency 0
   Capabilities: [44] HyperTransport: Slave or Primary Interface
   Capabilities: [e0] HyperTransport: MSI Mapping

00:00.1 RAM memory [0500]: nVidia Corporation C51 Memory Controller 0
[10de:02fa] (rev a2)
   Subsystem: Hewlett-Packard Company Unknown device [103c:30b7]
   Flags: 66MHz, fast devsel

00:00.2 RAM memory [0500]: nVidia Corporation C51 Memory Controller 1
[10de:02fe] (rev a2)
   Subsystem: Hewlett-Packard Company Unknown device [103c:30b7]

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Eric Dumazet


Rik van Riel wrote:

Andrew Morton wrote:

On Fri, 20 Apr 2007 17:38:06 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:


I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".

- Nick's patch also will help this problem.  It could be that your patch
  no longer offers a 2x speedup when combined with Nick's patch.

  It could well be that the combination of the two is even better, but it
  would be nice to firm that up a bit.

I'll test that.


Thanks.


Well, good news.

It turns out that Nick's patch does not improve peak
performance much, but it does prevent the decline when
running with 16 threads on my quad core CPU!

We _definitely_ want both patches, there's a huge benefit
in having them both.

Here are the transactions/seconds for each combination:

threads  vanilla  new glibc  madv_free kernel  madv_free + mmap_sem
1            610        609               596                   545


545 tps versus 610 tps for one thread ? It seems quite bad, no ?

Could you please find an explanation for this ?


2           1032       1136              1196                  1200
4           1070       1128              2014                  2024
8           1000       1088              1665                  2087
16           779       1073              1310                  1999




Thank you

Linux 2.6.16.49-rc1

2007-04-20 Thread Adrian Bunk

Location:
ftp://ftp.kernel.org/pub/linux/kernel/people/bunk/linux-2.6.16.y/testing/

git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git

RSS feed of the git tree:
http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.16.y.git;a=rss


Changes since 2.6.16.48:

Adrian Bunk (1):
  Linux 2.6.16.49-rc1

Ard van Breemen (1):
  start_kernel: test if irq's got enabled early, barf, and disable them again

Aristeu Sergio Rozanski Filho (1):
  tty_io: fix race in master pty close/slave pty close path

Aubrey Li (1):
  [NET]: Fix UDP checksum issue in net poll mode.

David S. Miller (3):
  [SCSI] QLOGICPTI: Do not unmap DMA unless we actually mapped something.
  [SPARC64]: Fix SBUS IOMMU allocation code.
  [SPARC64]: Fix arg passing to compat_sys_ipc().

Jean Delvare (1):
  hwmon/w83627ehf: Fix the fan5 clock divider write

Linas Vepstas (1):
  elevator: move clearing of unplug flag earlier

Olaf Kirch (1):
  [IrDA]: Correctly handling socket error

Tom Callaway (1):
  [SPARC64]: Fix inline directive in pci_iommu.c


 Makefile|2 
 arch/sparc64/kernel/pci_iommu.c |2 
 arch/sparc64/kernel/sbus.c  |  560 +---
 arch/sparc64/kernel/sys32.S |1 
 arch/sparc64/kernel/systbls.S   |2 
 block/elevator.c|   11 
 drivers/char/tty_io.c   |   14 
 drivers/hwmon/w83627ehf.c   |6 
 drivers/scsi/qlogicpti.c|2 
 init/main.c |5 
 net/core/netpoll.c  |7 
 net/irda/af_irda.c  |3 
 12 files changed, 272 insertions(+), 343 deletions(-)


Re: [PATCH] fix ext2 allocator overflows above 31 bit blocks

2007-04-20 Thread Eric Sandeen


Mingming Cao wrote:

On Fri, 2007-04-20 at 18:14 -0500, Eric Sandeen wrote:
It's a bug, today.   


They are fixed in the mm tree, as part of the patches which backport the
ext3 block reservation code to ext2. Filesystem block numbers are all of
type ext2_fsblk_t (i.e. unsigned long); see ext2_new_blocks(). It may need
a round of thorough review to see if anything is left, but I think what is
in the mm tree looks good.


Oh... oops.  I didn't think to check mm, didn't expect to find those 
changes on ext2.  Ok, I will double-check that against what I did.



And those patches in the mm tree also backport the ext3 best-effort
multiple-block allocation code (allocate as many blocks as possible
within the block reservation window), FYI.


Ok, thanks Mingming!

-Eric

Re: [PATCH] fixed spinlock use in hysdn_log_close()

2007-04-20 Thread Andrew Morton

On Sat, 14 Apr 2007 07:09:07 +0200
Matthias Kaehlcke <[EMAIL PROTECTED]> wrote:

> fixed incorrect spinlock use in hysdn_log_close(). the function
> declared a spinlock on the stack and used it to 'protect' a shared
> driver structure. the patch removes the declaration of hysdn_lock and 
> uses card->hysdn_lock instead.
> 

Interesting.

> 
> ---
> diff --git a/drivers/isdn/hysdn/hysdn_proclog.c b/drivers/isdn/hysdn/hysdn_proclog.c
> index f7e83a8..32f0b75 100644
> --- a/drivers/isdn/hysdn/hysdn_proclog.c
> +++ b/drivers/isdn/hysdn/hysdn_proclog.c
> @@ -299,7 +299,6 @@ hysdn_log_close(struct inode *ino, struct file *filep)
>   hysdn_card *card;
>   int retval = 0;
>   unsigned long flags;
> - spinlock_t hysdn_lock = SPIN_LOCK_UNLOCKED;
>  
>   lock_kernel();
>   if ((filep->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_WRITE) {
> @@ -309,7 +308,7 @@ hysdn_log_close(struct inode *ino, struct file *filep)
>   /* read access -> log/debug read, mark one further file as closed */
>  
>   pd = NULL;
> - spin_lock_irqsave(&hysdn_lock, flags);
> + spin_lock_irqsave(&card->hysdn_lock, flags);

I guess it won't hurt - are you actually able to test this code?

afaict most of the data in there is locked with lock_kernel(), if it's
locked at all.

If you had some runtime problem and this patch fixed it then fine.  If
however you're not able to test this code then perhaps the safest option is
to simply remove that locking altogether, which is pretty much a
runtime-equivalent change.
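
Why a lock declared on the stack protects nothing can be seen in a small userspace sketch: every call gets its own private lock, so no two callers ever contend on the same one. The version below keeps the lock inside the shared structure instead. struct card, card_init() and card_close_reader() are hypothetical names, and a pthread mutex stands in for the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared driver state; the lock lives with the data it guards. */
struct card {
    pthread_mutex_t lock;
    int open_readers;
};

static void card_init(struct card *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->open_readers = 0;
}

/*
 * Mark one reader as closed and return how many remain.  Because every
 * caller locks the same mutex embedded in the card, concurrent callers
 * are actually serialised, which a function-local lock cannot do.
 */
static int card_close_reader(struct card *c)
{
    int remaining;

    assert(c != NULL);

    pthread_mutex_lock(&c->lock);
    if (c->open_readers > 0)
        c->open_readers--;
    remaining = c->open_readers;
    pthread_mutex_unlock(&c->lock);

    return remaining;
}
```

Whether the hysdn data needs this lock at all, given lock_kernel(), is the open question raised above; the sketch only shows where a lock has to live if it is meant to exclude anyone.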


Re: Acecad USB Tablet: usbmouse takeover and odd motion

2007-04-20 Thread Giuseppe Bilotta

On 4/21/07, Jiri Kosina <[EMAIL PROTECTED]> wrote:

On Fri, 20 Apr 2007, Giuseppe Bilotta wrote:

> Oh, I see. I'll blacklist those modules, maybe also issue a ticket on
> the Debian BTS.

If Debian enables usbmouse and usbkbd by default in their standard
kernels, would you be so kind and raise a proper ticket on them not to do
so? Thanks.

This also prompts me to speed up one of the items on my TODO list: rename
usbmouse and usbkbd to something that wouldn't be so confusing and
wouldn't make people think that they should enable these drivers if they
want support for USB keyboards/mice. Will queue this for 2.6.22.

Actually, I just found out that usbmouse and usbkbd are in the
blacklist file (/etc/modprobe.d/blacklist), so the fact that they are
being called reveals some kind of fscked up setup on my side. I'll try
to fix that, sorry for the noise.

--
Giuseppe "Oblomov" Bilotta

Re: AGPGart / AMD K7

2007-04-20 Thread Dave Jones

On Fri, Apr 20, 2007 at 04:26:51PM -0700, Greg Kroah-Hartman wrote:
 > > We should always have a bus in bus_add_driver()
 > > Instead of returning success when we don't, BUG().
 > 
 > Nah, I don't like adding BUG() calls to the kernel if it can be helped,
 > how about the version I copied you on a few hours ago, which is also
 > below?

Either works for me..

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATCH] fix ext2 allocator overflows above 31 bit blocks

2007-04-20 Thread Mingming Cao

On Fri, 2007-04-20 at 18:14 -0500, Eric Sandeen wrote:
> Andreas Dilger wrote:
> > On Apr 20, 2007  12:10 -0500, Eric Sandeen wrote:
> >> If ext3 can do 16T, ext2 probably should be able to as well.
> >> There are still "int" block containers in the block allocation path
> >> that need to be fixed up.
> > 
> > Yeah, but who wants to do 16TB e2fsck on every boot?  I think there
> > needs to be some limits imposed for the sake of usability.
> 
> I figure this is in the fine tradition of "enough rope to hang oneself"
> 
> If you have 16T of filesystem you probably know enough to not hang 
> yourself this way.
> 
> *shrug*
> 
> It's a bug, today.   

They are fixed in the mm tree, as part of the patches which backport the
ext3 block reservation code to ext2. Filesystem block numbers are all of
type ext2_fsblk_t (i.e. unsigned long); see ext2_new_blocks(). It may need
a round of thorough review to see if anything is left, but I think what is
in the mm tree looks good.

And those patches in the mm tree also backport the ext3 best-effort
multiple-block allocation code (allocate as many blocks as possible
within the block reservation window), FYI.

Mingming

> If we need another change to limit ext2 to 500G or 
> something, fine by me.  :)
> 
> -Eric
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] CFS scheduler, v3

2007-04-20 Thread Peter Williams


William Lee Irwin III wrote:

On Fri, Apr 20, 2007 at 10:10:45AM +1000, Peter Williams wrote:
I have a suggestion I'd like to make that addresses both nice and 
fairness at the same time.  As I understand it, the basic principle behind 
this scheduler is to work out a time by which a task should make it onto 
the CPU and then place it into an ordered list (based on this value) of 
tasks waiting for the CPU.  I think that this is a great idea and my 
suggestion is with regard to a method for working out this time that 
takes into account both fairness and nice.


Hmm. Let me take a look...


On Fri, Apr 20, 2007 at 10:10:45AM +1000, Peter Williams wrote:
First suppose we have the following metrics available in addition to 
what's already provided.

rq->avg_weighted_load /* a running average of the weighted load on the CPU */
p->avg_cpu_per_cycle /* the average time in nsecs that p spends on the 
CPU each scheduling cycle */


I'm suspicious of mean service times not paired with mean inter-arrival
times.


On Fri, Apr 20, 2007 at 10:10:45AM +1000, Peter Williams wrote:
where a scheduling cycle for a task starts when it is placed on the 
queue after waking or being preempted and ends when it is taken off the 
CPU either voluntarily or after being preempted.  So 
p->avg_cpu_per_cycle is just the average amount of time p spends on the 
CPU each time it gets on to the CPU.  (Sorry for the long explanation 
here but I just wanted to make sure there was no chance that "scheduling 
cycle" would be construed as some mechanism being imposed on the scheduler.)


I went and looked up priority queueing queueing theory garbage and
re-derived various things I needed. The basics check out. Probably no
one cares that I checked.


On Fri, Apr 20, 2007 at 10:10:45AM +1000, Peter Williams wrote:

We can then define:
effective_weighted_load = max(rq->raw_weighted_load, rq->avg_weighted_load)
If p is just waking (i.e. it's not on the queue and its load_weight is 
not included in rq->raw_weighted_load) and we need to queue it, we say 
that the maximum time (in all fairness) that p should have to wait to 
get onto the CPU is:
expected_wait = p->avg_cpu_per_cycle * effective_weighted_load / 
p->load_weight


You're right.  The time that the task spent sleeping before being woken 
should be subtracted from this value.  If the answer is less than or 
equal to zero pre-emption should occur.




This doesn't look right, probably because the scaling factor of
p->avg_cpu_per_cycle is the reciprocal of its additive contribution to
the ->avg_weighted_load as opposed to a direct estimate of its initial
delay or waiting time before completing its current requested service.

p->load_weight/effective_weighted_load more properly represents an
entitlement to CPU bandwidth.


Yes.  But expected_wait isn't entitlement; it's its inverse.


p->avg_cpu_per_cycle/(p->load_weight/effective_weighted_load)
would be more like the expected time spent on the runqueue


When I went to school that would be just another way of expressing the 
equation that I expressed.



(whether
waiting to run or actually running) for a time interval spent in a
runnable state and the expected time runnable and waiting to run in such
an interval would be
p->avg_cpu_per_cycle*(effective_weighted_load/p->load_weight-1),

Neither represents the initial delay between entering the runqueue and
first acquiring the CPU, but that's a bit hard to figure out without
deciding the scheduling policy up-front anyway.

This essentially doesn't look correct because while you want to enforce
the CPU bandwidth allocation, this doesn't have much to do with that
apart from the CPU bandwidth appearing as a term. It's more properly
a rate of service as opposed to a time at which anything should happen
or a number useful for predicting such. When service should begin more
properly depends on the other tasks in the system and a number of other
decisions that are part of the scheduling policy.


This model takes all of those into consideration.  The idea is not just 
to predict but to use the calculated time to decide when to boot the 
current process off the CPU (if it doesn't leave voluntarily) and put 
this one on.  This more or less removes the need to give each task a 
predetermined chunk of CPU when it goes on to the CPU.  This should, in 
general, reduce the number of context switches, as tasks get to run until 
they've finished what they're doing or another task becomes higher 
priority, rather than being booted off after an arbitrary time interval. 
(If this ever gets tried it will be interesting to see if this 
prediction comes true.)


BTW Even if Ingo doesn't choose to try this model, I'll probably make a 
patch (way in the future after Ingo's changes are settled) to try it out 
myself.




If you want to choose a "quasi-inter-arrival time" to achieve the
specified CPU bandwidth allocation, this would be it, but to use that
to actually enforce the CPU bandwidth allocation, you would

[RFC PATCH - Try #2] Re: BUG in sysfs_remove_group

2007-04-20 Thread James Morris

Updated version of the patch, which splits __lookup_hash() into normal and 
kernel variants, to prevent a check of the type of lookup.  Also splits 
lookup_one_len().  Tests ok on my system.  Please review.


Subject: [PATCH] security: prevent permission checking of file removal via 
sysfs_remove_group()

Prevent permission checking from being performed when the kernel wants to
unconditionally remove a sysfs group, by introducing a kernel-only
variant of lookup_one_len(), lookup_one_len_kern().

Additionally, as sysfs_remove_group() does not check the return value of
the lookup before using it, a BUG_ON has been added to pinpoint the cause
of any problems potentially caused by this (and as a form of annotation).

Signed-off-by: James Morris <[EMAIL PROTECTED]>
---
 fs/namei.c|   72 +++-
 fs/sysfs/group.c  |6 +++-
 include/linux/namei.h |1 +
 3 files changed, 57 insertions(+), 22 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index ee60cc4..cabe2b8 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1243,22 +1243,13 @@ int __user_path_lookup_open(const char __user *name, 
unsigned int lookup_flags,
return err;
 }
 
-/*
- * Restricted form of lookup. Doesn't follow links, single-component only,
- * needs parent already locked. Doesn't follow mounts.
- * SMP-safe.
- */
-static struct dentry * __lookup_hash(struct qstr *name, struct dentry * base, 
struct nameidata *nd)
+static inline struct dentry *__lookup_hash_kern(struct qstr *name, struct 
dentry *base, struct nameidata *nd)
 {
-   struct dentry * dentry;
+   struct dentry *dentry;
struct inode *inode;
int err;
 
inode = base->d_inode;
-   err = permission(inode, MAY_EXEC, nd);
-   dentry = ERR_PTR(err);
-   if (err)
-   goto out;
 
/*
 * See if the low-level filesystem might want
@@ -1287,35 +1278,76 @@ out:
return dentry;
 }
 
+/*
+ * Restricted form of lookup. Doesn't follow links, single-component only,
+ * needs parent already locked. Doesn't follow mounts.
+ * SMP-safe.
+ */
+static inline struct dentry * __lookup_hash(struct qstr *name, struct dentry 
*base, struct nameidata *nd)
+{
+   struct dentry *dentry;
+   struct inode *inode;
+   int err;
+
+   inode = base->d_inode;
+
+   err = permission(inode, MAY_EXEC, nd);
+   dentry = ERR_PTR(err);
+   if (err)
+   goto out;
+
+   dentry = __lookup_hash_kern(name, base, nd);
+out:
+   return dentry;
+}
+
 static struct dentry *lookup_hash(struct nameidata *nd)
 {
return __lookup_hash(&nd->last, nd->dentry, nd);
 }
 
 /* SMP-safe */
-struct dentry * lookup_one_len(const char * name, struct dentry * base, int 
len)
+static inline int __lookup_one_len(const char *name, struct qstr *this, struct 
dentry *base, int len)
 {
unsigned long hash;
-   struct qstr this;
unsigned int c;
 
-   this.name = name;
-   this.len = len;
+   this->name = name;
+   this->len = len;
if (!len)
-   goto access;
+   return -EACCES;
 
hash = init_name_hash();
while (len--) {
c = *(const unsigned char *)name++;
if (c == '/' || c == '\0')
-   goto access;
+   return -EACCES;
hash = partial_name_hash(c, hash);
}
-   this.hash = end_name_hash(hash);
+   this->hash = end_name_hash(hash);
+   return 0;
+}
 
+struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
+{
+   int err;
+   struct qstr this;
+   
+   err = __lookup_one_len(name, &this, base, len);
+   if (err)
+   return ERR_PTR(err);
return __lookup_hash(&this, base, NULL);
-access:
-   return ERR_PTR(-EACCES);
+}
+
+struct dentry *lookup_one_len_kern(const char *name, struct dentry *base, int 
len)
+{
+   int err;
+   struct qstr this;
+   
+   err = __lookup_one_len(name, &this, base, len);
+   if (err)
+   return ERR_PTR(err);
+   return __lookup_hash_kern(&this, base, NULL);
 }
 
 /*
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index b20951c..52eed2a 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -70,9 +70,11 @@ void sysfs_remove_group(struct kobject * kobj,
 {
struct dentry * dir;
 
-   if (grp->name)
-   dir = lookup_one_len(grp->name, kobj->dentry,
+   if (grp->name) {
+   dir = lookup_one_len_kern(grp->name, kobj->dentry,
strlen(grp->name));
+   BUG_ON(IS_ERR(dir));
+   }
else
dir = dget(kobj->dentry);
 
diff --git a/include/linux/namei.h b/include/linux/namei.h
index d39a5a6..b7dd249 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -82,6 +82,7 @@ extern struct file *nameidata_to_filp(struct nameidata *nd, 
int flags);
 extern void
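
The split described in the patch above (a shared name check, a normal variant that also checks permissions, and a kernel-only variant that does not) can be sketched in userspace as follows. All names ending in _sketch, the callback type and the two helper callbacks are hypothetical; the real functions operate on dentries and call permission(), which this example only imitates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Name check shared by both variants: a single path component must be
 * non-empty and must not contain '/' or an embedded NUL in its first
 * len bytes. (The patch tests for len == 0; rejecting negative
 * lengths as well is an addition made for this sketch.)
 */
static int check_one_component(const char *name, int len)
{
	assert(name != NULL);
	if (len <= 0)
		return -EACCES;
	while (len--) {
		unsigned char c = (unsigned char)*name++;

		if (c == '/' || c == '\0')
			return -EACCES;
	}
	return 0;
}

/* Hypothetical stand-in for permission(inode, MAY_EXEC, nd). */
typedef int (*perm_check_fn)(void);

static int allow_all(void) { return 0; }
static int deny_all(void) { return -EACCES; }

/* Kernel-only variant: name check, no permission check. */
static int lookup_kern_sketch(const char *name, int len)
{
	return check_one_component(name, len);
}

/* Normal variant: name check followed by the permission check. */
static int lookup_sketch(const char *name, int len, perm_check_fn may_exec)
{
	int err = check_one_component(name, len);

	if (err)
		return err;
	return may_exec();
}
```

The point of the kernel-only variant is visible in the difference between the two functions: a caller that must not be refused for lack of access, such as group removal, still gets the name validation but never reaches the permission callback.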

Re: AGPGart / AMD K7

2007-04-20 Thread Greg KH

On Fri, Apr 20, 2007 at 07:42:33PM -0400, Preston A. Elder wrote:
> Final followup,
> 
> If I compile EDAC out of the kernel completely, everything works now.
> 
> This should be resolved though.
> 1) dd.c should produce some kind of warning when it wants to assign a
> driver to a device, but it can't because a driver is already assigned to
> a device
> 
> ie. change:
>   if (!dev->driver)
>           driver_probe_device(drv, dev);
> to:
>   if (!dev->driver)
>           driver_probe_device(drv, dev);
>   else
>           printk(KERN_WARNING "__driver_attach (%s): already registered with driver %s\n",
>                  dev->bus_id, dev->driver->name);
> 
> 2) Possibly a device should be able to have more than one driver
> associated with it - so the AGP driver and EDAC could both use the
> device in question here (though this would probably be a sizable change).

I'm working on this change for PCI devices right now, but it's slow
going due to some other external things (OLS paper that I am woefully
behind on, etc...)

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [2.6 patch] the scheduled -EINVAL for invalid timevals in setitimer

2007-04-20 Thread Andrew Morton

On Sun, 15 Apr 2007 05:29:22 +0200
Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Sat, 2007-04-14 at 17:03 +0200, Adrian Bunk wrote:
> > As scheduled, do_setitimer() now returns -EINVAL for invalid timeval.
> > 
> > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> Acked-by: Thomas Gleixner <[EMAIL PROTECTED]>

Worried-about-by: me

I guess if it starts biting people we can revert it from 2.6.22.x.
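
For reference, a minimal userspace sketch of the kind of check being discussed. The thread does not spell out what counts as an invalid timeval; the rule assumed here is the conventional one (non-negative seconds, microseconds in the range 0 to 999999), and both function names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/time.h>

/* Assumed validity rule: tv_sec >= 0 and 0 <= tv_usec < 1000000. */
static int timeval_is_valid(const struct timeval *tv)
{
	assert(tv != NULL);
	if (tv->tv_sec < 0)
		return 0;
	if (tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return 0;
	return 1;
}

/* Reject the request up front instead of silently fixing it up. */
static int setitimer_check_sketch(const struct itimerval *value)
{
	if (!timeval_is_valid(&value->it_value) ||
	    !timeval_is_valid(&value->it_interval))
		return -EINVAL;
	return 0;
}
```

A caller passing tv_usec == 1000000 would now see an error, which is the sort of application that could start "biting people".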

Re: Fwd: Fw: [2.6.20.4] BUG: dentry xattrs still in use in shrink_dcache_for_umount() with reiserfs

2007-04-20 Thread Andrew Morton

On Wed, 18 Apr 2007 11:00:00 -0400
Jeff Mahoney <[EMAIL PROTECTED]> wrote:

> > Do you think that could be a reason of the extra reference count on   
> > xattr_root dentry?
> 
> No, I don't think it is. Looking at the code now, it seems obvious, but
> I didn't notice it before and nobody else has reported a problem.
> 
> getxattr() doesn't require any VFS locking. When we get down into the
> reiserfs code, it takes a read lock. If we get two concurrent threads
> looking up an xattr before the root has been saved, there's a window
> where REISERFS_SB(s)->xattr_root is NULL but we've already looked it up
> and taken a reference on it.
> 
> I have a patch set to clean up the extended attribute code that fixes
> this problem along the way by killing off the xattr locks and using the
> backing files/dirs i_mutex instead. I'll post them to the reiserfs
> mailing list.

Do we have anything suitable for 2.6.21 which will address this crash?

Also, it's not clear to me how many users we can expect to be impacted by it.
I assume that if the same bug is in 2.6.20 then the answer is "not many".
How come Andrea is able to keep hitting it?

Thanks.

Re: [PATCH RFD] alternative kobject release wait mechanism

2007-04-20 Thread Greg KH

On Fri, Apr 20, 2007 at 11:40:39AM -0400, Alan Stern wrote:
> Here's a patch to do what I mentioned earlier.  Not tested -- it may 
> expose some existing bugs.  It may even break something, but I'm not aware 
> of anything that depends on it explicitly.
> 
> Greg, do you know of anything in particular that depends on kobjects not 
> being released before their children are released?

Yes, the whole driver model :)

When adding a new device, we always grab a reference to the parent
device so it can not go away before we do.

Look at the last kobject_put(parent); in kobject_cleanup() which ensures
this.
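
The ordering guarantee described here can be modelled with a toy reference count in userspace. This is only a sketch: obj_sketch and its helpers are hypothetical and far simpler than struct kobject, but they show why a parent cannot be released while a child still pins it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with a reference count and an optional parent. */
struct obj_sketch {
	int refcount;
	int released;
	struct obj_sketch *parent;
};

static void obj_init(struct obj_sketch *o)
{
	o->refcount = 1;
	o->released = 0;
	o->parent = NULL;
}

static void obj_get(struct obj_sketch *o)
{
	if (o)
		o->refcount++;
}

/* Adding a child takes a reference on the parent. */
static void obj_add(struct obj_sketch *child, struct obj_sketch *parent)
{
	obj_get(parent);
	child->parent = parent;
}

/*
 * Dropping the last reference releases the object and only then drops
 * the reference it held on its parent, as the final put in the cleanup
 * path does.
 */
static void obj_put(struct obj_sketch *o)
{
	while (o) {
		struct obj_sketch *parent;

		assert(o->refcount > 0);
		if (--o->refcount > 0)
			return;
		parent = o->parent;
		o->released = 1;
		o = parent;
	}
}
```

If the parent reference were dropped at unlink time instead, the parent could reach a count of zero while the child was still alive, which is the situation the current ordering avoids.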

> Index: usb-2.6/lib/kobject.c
> ===
> --- usb-2.6.orig/lib/kobject.c
> +++ usb-2.6/lib/kobject.c
> @@ -192,12 +192,15 @@ void kobject_init(struct kobject * kobj)
>  
>  static void unlink(struct kobject * kobj)
>  {
> + struct kobject *parent = kobj->parent;
> +
>   if (kobj->kset) {
>   spin_lock(&kobj->kset->list_lock);
>   list_del_init(&kobj->entry);
>   spin_unlock(&kobj->kset->list_lock);
>   }
>   kobject_put(kobj);
> + kobject_put(parent);
>  }
>  
>  /**
> @@ -241,7 +244,6 @@ int kobject_shadow_add(struct kobject * 
>   if (error) {
>   /* unlink does the kobject_put() for us */
>   unlink(kobj);
> - kobject_put(parent);
>  
>   /* be noisy on error issues */
>   if (error == -EEXIST)
> @@ -489,7 +491,6 @@ void kobject_cleanup(struct kobject * ko
>  {
>   struct kobj_type * t = get_ktype(kobj);
>   struct kset * s = kobj->kset;
> - struct kobject * parent = kobj->parent;
>  
>   pr_debug("kobject %s: cleaning up\n",kobject_name(kobj));
>   if (kobj->k_name != kobj->name)
> @@ -505,7 +506,6 @@ void kobject_cleanup(struct kobject * ko
>  
>   if (s)
>   kset_put(s);
> - kobject_put(parent);
>  }

Ick, no, I think this used to be the way things worked, but bad things
would end up happening, so we fixed it up to be the way things are
today.  Read the comments for the changelog for this file for details.

Specifically, look at commit 10921a8f1305b8ec97794941db78b825db5839bc
in the history.git repo which is almost exactly what you are proposing
to be reverted...

thanks,

greg k-h

Re: [2/2] 2.6.21-rc7: known regressions

2007-04-20 Thread Jeremy Fitzhardinge

Dave Jones wrote:
> On Fri, Apr 20, 2007 at 10:16:54AM -0700, Jeremy Fitzhardinge wrote:
>  > Dave Jones wrote:
>  > >  > Andi, I think.  I've got his firstfloor.org patches applied to this 
> kernel.
>  > >
>  > > Ah, I saw you patched in CFS too, and thought it may be related.
>  > >   
>  > 
>  > Well, I have CONFIG_FB_BACKLIGHT enabled, and it still works.
>  > 
>  > Maybe there's something in Andi's queue which is making it work?
>
> Shrug, I'm out of ideas.  I'm hoping that it'll magically start working
> when people start flushing their git trees for .22
> Maybe that'll yield a clue for something that can be backported to .21.x
> because right now, I'm completely puzzled.

Well, it seemed reliable, but I just got a resume failure.  Different
from any previous symptom I've seen:

Intel machine check architecture supported
Intel machine check reporting enabled enabled on CPU#1
Back to C!


J

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Rik van Riel


Andrew Morton wrote:

On Fri, 20 Apr 2007 17:38:06 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:


Andrew Morton wrote:


I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".

- Nick's patch also will help this problem.  It could be that your patch
  no longer offers a 2x speedup when combined with Nick's patch.

  It could well be that the combination of the two is even better, but it
  would be nice to firm that up a bit.  

I'll test that.


Thanks.


Well, good news.

It turns out that Nick's patch does not improve peak
performance much, but it does prevent the decline when
running with 16 threads on my quad core CPU!

We _definitely_ want both patches; there's a huge benefit
in having them both.

Here are the transactions/seconds for each combination:

threads   vanilla   new glibc   madv_free kernel   madv_free + mmap_sem

 1          610        609            596                  545
 2         1032       1136           1196                 1200
 4         1070       1128           2014                 2024
 8         1000       1088           1665                 2087
16          779       1073           1310                 1999


--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

[PATCH] cxacru: Add Documentation file

2007-04-20 Thread Simon Arlott

The sysfs attributes for exposing cxacru statistics/status information, 
with their possible values, are now explained in 
Documentation/networking/cxacru.txt, including information on the writable 
adsl_state attribute's commands and a sample of the kernel log format.


---
Documentation/networking/00-INDEX   |2 +
Documentation/networking/cxacru.txt |   84 +++
2 files changed, 86 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/cxacru.txt

diff --git a/Documentation/networking/00-INDEX 
b/Documentation/networking/00-INDEX
index e06b6e3..153d84d 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -32,6 +32,8 @@ cops.txt
- info on the COPS LocalTalk Linux driver
cs89x0.txt
- the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver
+cxacru.txt
+   - Conexant AccessRunner USB ADSL Modem
de4x5.txt
- the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver
decnet.txt
diff --git a/Documentation/networking/cxacru.txt 
b/Documentation/networking/cxacru.txt
new file mode 100644
index 000..2623eaa
--- /dev/null
+++ b/Documentation/networking/cxacru.txt
@@ -0,0 +1,84 @@
+Firmware is required for this device: http://accessrunner.sourceforge.net/
+
+While it is capable of managing/maintaining the ADSL connection without the
+module loaded, the device will sometimes stop responding after unloading the
+driver and it is necessary to unplug/remove power to the device to fix this.
+
+Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/
+these are directories named cxacruN where N is the device number. A symlink
+named device points to the USB interface device's directory which contains
+several sysfs attribute files for retrieving device statistics:
+
+* adsl_controller_version
+
+* adsl_headend
+* adsl_headend_environment
+   Information about the remote headend.
+
+* downstream_attenuation (dB)
+* downstream_bits_per_frame
+* downstream_rate (kbps)
+* downstream_snr_margin (dB)
+   Downstream stats.
+
+* upstream_attenuation (dB)
+* upstream_bits_per_frame
+* upstream_rate (kbps)
+* upstream_snr_margin (dB)
+* transmitter_power (dBm/Hz)
+   Upstream stats.
+
+* downstream_crc_errors
+* downstream_fec_errors
+* downstream_hec_errors
+* upstream_crc_errors
+* upstream_fec_errors
+* upstream_hec_errors
+   Error counts.
+
+* line_startable
+   Indicates that ADSL support on the device
+   is/can be enabled, see adsl_start.
+
+* line_status
+   "initialising"
+   "down"
+   "attempting to activate"
+   "training"
+   "channel analysis"
+   "exchange"
+   "waiting"
+   "up"
+
+   Changes between "down" and "attempting to activate"
+   if there is no signal.
+
+* link_status
+   "not connected"
+   "connected"
+   "lost"
+
+* mac_address
+
+* modulation
+   "ANSI T1.413"
+   "ITU-T G.992.1 (G.DMT)"
+   "ITU-T G.992.2 (G.LITE)"
+
+* startup_attempts
+   Count of total attempts to initialise ADSL.
+
+To enable/disable ADSL, the following can be written to the adsl_state file:
+   "start"
+   "stop"
+   "restart" (stops, waits 1.5s, then starts)
+   "poll" (used to resume status polling if it was disabled due to failure)
+
+Changes in adsl/line state are reported via kernel log messages:
+   [4942145.150704] ATM dev 0: ADSL state: running
+   [4942243.663766] ATM dev 0: ADSL line: down
+   [4942249.665075] ATM dev 0: ADSL line: attempting to activate
+   [4942253.654954] ATM dev 0: ADSL line: training
+   [4942255.666387] ATM dev 0: ADSL line: channel analysis
+   [4942259.656262] ATM dev 0: ADSL line: exchange
+   [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up)
--
1.5.0.1

--
Simon Arlott
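
A small userspace sketch of how the attributes documented above might be used. The path layout follows the description in cxacru.txt; that adsl_state sits in the same device directory as the statistics files is an assumption, and all three function names are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/atm/cxacruN/device/<attr>"; 0 on success, -1 if it does not fit. */
static int cxacru_attr_path(char *buf, size_t size, unsigned int n,
			    const char *attr)
{
	int len;

	assert(buf != NULL && attr != NULL);
	len = snprintf(buf, size, "/sys/class/atm/cxacru%u/device/%s", n, attr);
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/* Accept only the four commands listed for adsl_state. */
static int cxacru_valid_command(const char *cmd)
{
	static const char *const cmds[] = { "start", "stop", "restart", "poll" };
	size_t i;

	for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
		if (strcmp(cmd, cmds[i]) == 0)
			return 1;
	return 0;
}

/* Write a command to adsl_state of device n; 0 on success, -1 on any error. */
static int cxacru_set_state(unsigned int n, const char *cmd)
{
	char path[128];
	FILE *f;
	int ok;

	if (!cxacru_valid_command(cmd) ||
	    cxacru_attr_path(path, sizeof(path), n, "adsl_state") != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(cmd, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling cxacru_set_state(0, "restart") would then request the stop, wait, start sequence described in the documentation, provided the device exists and the caller may write the file.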

Re: Acecad USB Tablet: usbmouse takeover and odd motion

2007-04-20 Thread Jiri Kosina

On Fri, 20 Apr 2007, Giuseppe Bilotta wrote:

> Oh, I see. I'll blacklist those modules, maybe also issue a ticket on 
> the Debian BTS.

If Debian enables usbmouse and usbkbd by default in their standard 
kernels, would you be so kind as to file a proper ticket asking them not 
to do so? Thanks.

This also prompts me to speed up one of the items on my TODO list - renaming 
usbmouse and usbkbd to something that wouldn't be so confusing and 
wouldn't make people think that they should enable these drivers if they 
want support for USB keyboards/mice. Will queue this for 2.6.22.

-- 
Jiri Kosina
SUSE Labs

Re: AGPGart / AMD K7

2007-04-20 Thread Preston A. Elder

Final followup,

If I compile EDAC out of the kernel completely, everything works now.

This should be resolved though.
1) dd.c should produce some kind of warning when it wants to assign a
driver to a device, but it can't because a driver is already assigned to
a device

ie. change:
  if (!dev->driver)
          driver_probe_device(drv, dev);
to:
  if (!dev->driver)
          driver_probe_device(drv, dev);
  else
          printk(KERN_WARNING "__driver_attach (%s): already registered with driver %s\n",
                 dev->bus_id, dev->driver->name);

2) Possibly a device should be able to have more than one driver
associated with it - so the AGP driver and EDAC could both use the
device in question here (though this would probably be a sizable change).
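
The behaviour behind suggestion 1 can be reproduced with a toy model in userspace. The structs and the function below are hypothetical simplifications and not the driver core's real types; they only show that the second driver's probe is never reached once the device is bound, and where a warning would fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct driver_sketch {
	const char *name;
	int probes;			/* how many times probe was reached */
};

struct device_sketch {
	const char *bus_id;
	struct driver_sketch *driver;	/* NULL while unbound */
};

/* Returns 1 if the driver was probed and bound, 0 if the device was skipped. */
static int driver_attach_sketch(struct driver_sketch *drv,
				struct device_sketch *dev)
{
	assert(drv != NULL && dev != NULL);
	if (!dev->driver) {
		drv->probes++;
		dev->driver = drv;
		return 1;
	}
	/* The proposed addition: say why the probe is being skipped. */
	fprintf(stderr, "attach (%s): already registered with driver %s\n",
		dev->bus_id, dev->driver->name);
	return 0;
}
```

Registering an EDAC-like driver first and an AGP-like driver second against the same device leaves the second driver with a probe count of zero, matching what the trace showed.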

Either way, at least I found the culprit :)

PreZ :)

Preston A. Elder wrote:
> Dave Jones wrote:
>   
>> On Fri, Apr 20, 2007 at 04:22:06PM -0400, Preston A. Elder wrote:
>>  > Dave, Greg,
>>  > 
>>  > Here is the trace with 2.6.20.6
>>  > 
>>  > I added back in my trace code, as you see.  As you can also see,
>>  > agp_amdk7_probe is still not called.
>>
>> Try looking down in __driver_attach()
>> The fact that we're not calling the ->probe function is quite bizarre.
>>
>> It could be this in __driver_attach
>>
>> if (!dev->driver)
>> driver_probe_device(drv, dev);
>>
>> Though that'd be odd.
>>
>> Putting a #define DEBUG 1 in drivers/base/dd.c may also yield some clues.
>>
>>  Dave
>>
>>   
>> 
> OK, I found it!
>
> Here is more trace:
> Linux agpgart interface v0.101 (c) Dave Jones
> agp_amdk7_init: In function
> agp_amdk7_init: Before pci_register_driver
> __pci_register_driver: In Function (driver = agpgart-amdk7, multithread = 0)
> __pci_register_driver: Before Spinlock
> __pci_register_driver: Before Init List Head
> __pci_register_driver: Before driver_register
> bus_add_driver: In Function (c0492920)
> bus_add_driver: Before kobject_set_name
> bus_add_driver: error = 0
> bus_add_driver: Before kobject_register
> bus_add_driver: error = 0
> bus_add_driver: Before driver_attach
> __driver_attach (:00:00.0,1): Before Down (parent) (c21c8600)
> __driver_attach (:00:00.0, 1): Before Down
> __driver_attach (:00:00.0, 1): Before Probe Device (c049fe54)
> __driver_attach (:00:00.0, 1): alreay registered with driver amd76x_edac
> __driver_attach (:00:00.0, 1): Before Up
> __driver_attach (:00:00.0, 1): Before Up (parent) (c21c8600)
> __driver_attach (:00:00.0, 1): Returning 0
> bus_add_driver: error = 0
> bus_add_driver: Before klist_add_tail
> bus_add_driver: Before module_add_driver
> bus_add_driver: Before driver_add_attrs
> bus_add_driver: error = 0
> bus_add_driver: Before add_bind_files
> bus_add_driver: error = 0
> bus_add_driver: Returning 0
> __pci_register_driver: error = 0
> __pci_register_driver: Before pci_create_newid_file
> __pci_register_driver: error = 0
> __pci_register_driver: Returning 0
>
> I snipped some since __driver_attach is called many times.
>
> But the long and short is that 00:00:00 is already associated with the
> 'amd76x_edac' driver, and as such will not call the agp probe call. 
> What is this edac, btw?
>
> PreZ :)
>
>
>
>   


Re: Permanent Kgdb integration into the kernel - lets get with it.

2007-04-20 Thread Andrew Morton

On Fri, 20 Apr 2007 15:51:35 -0700
Piet Delaney <[EMAIL PROTECTED]> wrote:

> > Is there any movement on this?
> 
> Hi Randy:
> 
> Jason Wessel <[EMAIL PROTECTED]> is currently leading yet
> another attempt at getting kgdb permanently into the kernel. Jason
> has a linux2_6_21 patch on SourceForge:
>   
> http://kgdb.cvs.sourceforge.net/kgdb/kgdb-2/8250.patch?view=log=linux2_6_21_uprev
> 
> and has been working with Sergei Shtylyov <[EMAIL PROTECTED]>
> recently on getting KGDBOE Netpoll patches that got lost around the 
> time of Tom's attempt. Just on Monday there were a dozen postings to 
> the SourceForge mailing list:
> 
>   
> 
> [EMAIL PROTECTED]
> 
> on this effort.

Unfortunately there doesn't seem to be anything there which I can autopull
into the -mm tree.  I'm presently set up for quilt trees in open
directories and for git trees.  I could add plain-old-gzipped-diffs or
whatever.

> I'd like to see a git repository on kernel.org that is used to update
> the mainstream kernel. Unfortunately getting accounts on kernel.org is
> next to impossible. If Jason doesn't have one yet it would be nice to
> offer him one for the kgdb developers to use. 

This seems to be to do with some silly spamfilter issue or something.  I
sent an email off-list.

> I agree with Andi that the kgdb code seems to be getting more
> complicated than needed, though I don't find the hooks offensive.
> Here I keep my kgdb hooks completely under #ifdef KGDB, so there
> is absolutely no difference to the kernel when KGDB isn't configured.

We can address that sort of thing via review: you send the diffs out,
we read them and comment on them.  We do this 100 times a day - it is
simply a non-issue, as long as there's actually someone who has the
time/effort/inclination to push this feature to completion, which is what
kgdb has sadly lacked for the past decade or so.

> I also like having debug printks, similar to the SOCK_DEBUG() macros,
> to make it easy to watch kgdb internals in action. Ya can't run kgdb
> on itself. 
> 
> I find these blemishes a minor concern compared to the damage/loss
> of not integrating kgdb into the kernel permanently. When developers
> can't rely on using kgdb for easy development they tend to write code
> without consideration for what it's like using their code with the
> debugger. Linux is making major headway into $100 embedded systems;
> the recent use in the Linksys WRT54GL (DD-WRT) and Engenius EOC-3220
> for example. Making kgdb easily accessible will greatly increase the
> viability of using Linux for embedded systems, IMHO. 

Lots of people want kgdb.  One person is famously less keen on it, but
we'll be able to talk him around, as long as the patches aren't daft.

> Perhaps with a bit of support from the kernel.org folks we could get
> the kgdb patch, with all of its blemishes, into Andrew's 2.6.21-rc7-mm1
> patch. Accounts on kernel.org for kgdb developers would be a modest
> effort. I find the CVS patch framework rather clumsy and would rather
> follow the KISS principle and just use git repositories like the rest
> of the kernel developers appear to be using.

Yes, a git tree on k.org is appropriate - let's make that happen.

But beware that it will need to be updated pretty regularly, and it'll need
to be against Linux-tree-of-the-day, please.  The x86 code continues to
change at a tremendous rate (I count 204 x86 patches queued for 2.6.22),
and kgdb supports lots of other architectures too.

So whoever is signed up to maintain this will have quite a bit of messy
maintenance to do during the getting-it-ready phase.  This will of course
be minimised by being VERY careful to minimise the impact of the patchset
on the existing code.

And if/when it is merged, there will be quite a bit of tricky maintenance
work to do, if my experience of the kgdb stub is anything to go by.

There inevitably will be long-term low-level impact upon the arch
maintainers too.  But that's OK, because the way to merge this feature is
to put the arch-neutral core into the tree first, and to then send each
per-arch patch to the relevant arch maintainer for merging.  That way, they
get to decide whether they wish to take on the burden of participating in
its long-term maintenance.


Re: AGPGart / AMD K7

2007-04-20 Thread Greg KH

On Fri, Apr 20, 2007 at 04:33:42PM -0400, Dave Jones wrote:
> On Fri, Apr 20, 2007 at 04:22:06PM -0400, Preston A. Elder wrote:
>  > Dave, Greg,
>  > 
>  > Here is the trace with 2.6.20.6
>  > 
>  > I added back in my trace code, as you see.  As you can also see,
>  > agp_amdk7_probe is still not called.
> 
> Try looking down in __driver_attach()
> The fact that we're not calling the ->probe function is quite bizarre.
> 
> It could be this in __driver_attach
> 
> if (!dev->driver)
> driver_probe_device(drv, dev);
> 
> Though that'd be odd.
> 
> Putting a #define DEBUG 1 in drivers/base/dd.c may also yield some clues.

Setting CONFIG_DEBUG_DRIVER automatically enables this also and might
provide some more hints.

thanks,

greg k-h

Re: AGPGart / AMD K7

2007-04-20 Thread Greg KH

On Fri, Apr 20, 2007 at 03:00:28PM -0400, Dave Jones wrote:
> On Fri, Apr 20, 2007 at 11:29:52AM -0700, Greg Kroah-Hartman wrote:
>  > On Fri, Apr 20, 2007 at 02:20:29PM -0400, Dave Jones wrote:
>  > > 
>  > > btw Greg, wtf does driver_register return a 0 as 'success' if it
>  > > completes the function, and 0 as 'failure' if !bus ?
>  > > That seems doomed to failure.
>  > 
>  > I don't know why the code does that, we should always have a bus
>  > assigned to a driver.  I'll change that and watch to see what breaks :)
> 
> Maybe this?
> 
> We should always have a bus in bus_add_driver()
> Instead of returning success when we don't, BUG().

Nah, I don't like adding BUG() calls to the kernel if it can be helped,
how about the version I copied you on a few hours ago, which is also
below?

thanks,

greg k-h

--
From: Greg Kroah-Hartman <[EMAIL PROTECTED]>
Subject: driver core: bus_add_driver should return an error if no bus

As pointed out by Dave Jones.

Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/base/bus.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -601,7 +601,7 @@ int bus_add_driver(struct device_driver 
int error = 0;
 
if (!bus)
-   return 0;
+   return -EINVAL;
 
pr_debug("bus %s: add driver %s\n", bus->name, drv->name);
error = kobject_set_name(&drv->kobj, "%s", drv->name);
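
Why the return value matters to callers can be shown with a reduced userspace model. The types and the function name are hypothetical; only the early-return pattern mirrors the one-line change in the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct bus_sketch {
	const char *name;
	int ndrivers;
};

struct drv_sketch {
	const char *name;
	struct bus_sketch *bus;	/* NULL if the driver was never given a bus */
};

/* Returns 0 once the driver is on its bus, -EINVAL if it has none. */
static int bus_add_driver_sketch(struct drv_sketch *drv)
{
	assert(drv != NULL);
	if (!drv->bus)
		return -EINVAL;	/* with "return 0" this looked like success */
	drv->bus->ndrivers++;
	return 0;
}
```

With the old behaviour a caller checking only for a non-zero result could not tell a driver that was registered from one that was silently ignored.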

other potentially deletable, dead stuff

2007-04-20 Thread Robert P. J. Day


  and while len is working on detaching APM and ACPI from legacy power
management, here's the short list of other stuff that is listed as on
its way to being dead, based on the contents of Kconfig files.  any of
this stuff candidates for removal, if not scheduling for removal?

  (note:  i made no effort to cull this list of entries that i know
folks are already aware of or might be working on, or stuff that we've
already established is *not* really obsolete.  it's just a list.)


config NET_CLS_POLICE
bool "Traffic Policing (obsolete)"

config IP_NF_CONNTRACK_SUPPORT
bool "Layer 3 Dependent Connection tracking (OBSOLETE)"

config IP6_NF_QUEUE
tristate "IP6 Userspace queueing via NETLINK (OBSOLETE)"

config IP_NF_QUEUE
tristate "IP Userspace queueing via NETLINK (OBSOLETE)"

config ARPD
bool "IP: ARP daemon support (EXPERIMENTAL)"
...
This code is experimental and also obsolete...

config BRIDGE_EBT_ULOG
tristate "ebt: ulog support (OBSOLETE)"

config PCMCIA_IOCTL
bool "PCMCIA control ioctl (obsolete)"

config SHAPER
tristate "Traffic Shaper (OBSOLETE)"

config SUN_BPP
tristate "Bidirectional parallel port support (OBSOLETE)"

config I2O_CONFIG_OLD_IOCTL
bool "Enable ioctls (OBSOLETE)"

config MOXA_SMARTIO
tristate "Moxa SmartIO support (OBSOLETE)"

config RAW_DRIVER
tristate "RAW driver (/dev/raw/rawN) (OBSOLETE)"

config ISDN_I4L
tristate "Old ISDN4Linux (obsolete)"

config MODE_TT
bool "Tracing thread support (DEPRECATED)"
...
This option controls whether tracing thread support is compiled
into UML. This option is largely obsolete ...

rday
-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page


Re: [PATCH] fix ext2 allocator overflows above 31 bit blocks

2007-04-20 Thread Eric Sandeen


Andreas Dilger wrote:

On Apr 20, 2007  12:10 -0500, Eric Sandeen wrote:

If ext3 can do 16T, ext2 probably should be able to as well.
There are still "int" block containers in the block allocation path
that need to be fixed up.


Yeah, but who wants to do 16TB e2fsck on every boot?  I think there
needs to be some limits imposed for the sake of usability.


I figure this is in the fine tradition of "enough rope to hang oneself"

If you have 16T of filesystem you probably know enough to not hang 
yourself this way.


*shrug*

It's a bug, today.   If we need another change to limit ext2 to 500G or 
something, fine by me.  :)


-Eric
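
The arithmetic behind the "31 bit" limit in the subject line, as a short sketch. The 4096-byte block size is an assumption chosen because it yields the 16T figure quoted in this thread; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest filesystem addressable when block numbers have block_bits
 * usable bits: 31 for a signed int container, 32 for an unsigned one.
 */
static uint64_t max_fs_bytes(unsigned int block_bits, uint32_t block_size)
{
	assert(block_bits > 0 && block_bits <= 32);
	return ((uint64_t)1 << block_bits) * block_size;
}
```

With 4096-byte blocks, 31 bits gives 8 TiB and 32 bits gives 16 TiB, so any remaining signed "int" in the allocation path halves the usable size.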

Re: [PATCH] NET: Add packet sock option to return orig_dev to userspace when bonded

2007-04-20 Thread David Miller

From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]>
Date: Thu, 8 Mar 2007 18:17:39 -0800

> Peter P. Waskiewicz Jr. <[EMAIL PROTECTED]>
>   NET: Add packet sock option to return orig_dev to userspace when
> bonded

I'm going to apply this patch (by hand, your email client corrupted
the patch massively, adding newlines and whatnot all over the patch).

But I'm going to rename the option to be just PACKET_ORIGDEV
because although bonding is the only user of this orig_dev
decapsulation method, that might not always be true and it'd
be a shame to give a special cased name when it is not deserved
here.

But please do fix your email client for future patch submissions.

Thanks.

Re: Permanent Kgdb integration into the kernel - lets get with it.

2007-04-20 Thread Piet Delaney

On Tue, 2007-04-17 at 11:30 -0700, Randy Dunlap wrote:
> On Thu, 08 Mar 2007 14:24:10 -0800 Piet Delaney wrote:
> 
> > On Thu, 2007-03-08 at 11:49 -0700, Tom Rini wrote:
> > > On Thu, Mar 08, 2007 at 07:37:56PM +0100, Andi Kleen wrote:
> > > > On Thursday 08 March 2007 18:44, Dave Jiang wrote:
> > > > 
> > > > > In spite of kgdb, shouldn't it have that \n anyways in case some other
> > > > > code gets added in the future after the macro? Or are you saying that
> > > > > there should never be any code ever after that macro?
> > > > 
> > > > Sure if there is mainline code added after that macro we add the \n.
> > > > But only if it makes sense to add code there, which it didn't in kgdb.
> > > 
> > > Was that because with recent enough tools and config options there was
> > > enough annotations so GDB could finally figure out where things had
> > > stopped?  Thanks.
> > 
> > The reason Linus said he didn't allow George's kgdb mm patch to
> > be integrated into the kernel a year or two ago was that Amit and
> > George had significantly different implementations. So Amit, Tom, 
> > George, and the rest of the kgdb development gang worked together 
> > and came up with a unified version that we now support on SourceForge. 
> > 
> > Tom rolled up a mm patch back in December for Andrew and then the
> > integration process stopped. I suggest we work together on getting
> > the kgdb patch back into the mm series and permanently into the kernel
> > like the kexec code and then we can avoid this kernel development
> > obfuscation.
> 
> Hi,
> Is there any movement on this?

Hi Randy:

Jason Wessel <[EMAIL PROTECTED]> is currently leading yet
another attempt at getting kgdb permanently into the kernel. Jason
has a linux2_6_21 patch on SourceForge:

http://kgdb.cvs.sourceforge.net/kgdb/kgdb-2/8250.patch?view=log=linux2_6_21_uprev

and has been working with Sergei Shtylyov <[EMAIL PROTECTED]>
recently on restoring the KGDBOE Netpoll patches that got lost around
the time of Tom's attempt. Just on Monday there were a dozen postings
to the SourceForge mailing list:

[EMAIL PROTECTED]

on this effort.

I'd like to see a git repository on kernel.org that is used to update
the mainstream kernel. Unfortunately getting accounts on kernel.org is
next to impossible. If Jason doesn't have one yet it would be nice to
offer him one for the kgdb developers to use. 

I agree with Andi that the kgdb code seems to be getting more
complicated than needed, though I don't find the hooks offensive.
Here I keep my kgdb hooks completely under #ifdef KGDB, so there
is absolutely no difference to the kernel when KGDB isn't configured.
I also like having debug printks, similar to the SOCK_DEBUG() macros,
to make it easy to watch kgdb internals in action. Ya can't run kgdb
on itself. 

I find these blemishes a minor concern compared to the damage/loss
from not integrating kgdb into the kernel permanently. When developers
can't rely on using kgdb for easy development they tend to write code
without consideration for what it's like using their code with the
debugger. Linux is making major headway into $100 embedded systems;
the recent use in the Linksys WRT54GL (DD-WRT) and Engenius EOC-3220
for example. Making kgdb easily accessible would greatly increase the
viability of using Linux for embedded systems, IMHO.

Perhaps with a bit of support from the kernel.org folks we could get
the kgdb patch, with all of its blemishes, into Andrew's 2.6.21-rc7-mm1
patch. Accounts on kernel.org for kgdb developers would be a modest
effort. I find the CVS patch framework rather clumsy and would rather
follow the KISS principle and just use git repositories like the rest
of the kernel developers appear to be using.

-piet

> 
> Thanks,
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
-- 
Piet Delaney                            Phone: (408) 200-5256
Blue Lane Technologies                  Fax:   (408) 200-5299
10450 Bubb Rd.
Cupertino, Ca. 95014                    Email: [EMAIL PROTECTED]


Re: [PATCH] fix ext2 allocator overflows above 31 bit blocks

2007-04-20 Thread Andreas Dilger

On Apr 20, 2007  12:10 -0500, Eric Sandeen wrote:
> If ext3 can do 16T, ext2 probably should be able to as well.
> There are still "int" block containers in the block allocation path
> that need to be fixed up.

Yeah, but who wants to do 16TB e2fsck on every boot?  I think there
needs to be some limits imposed for the sake of usability.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


[RFC PATCH 2/3] x86: new API for modifying CPU feature flags

2007-04-20 Thread Chuck Ebbert

x86: new API for modifying CPU feature flags

Use an API for setting/clearing CPU features, so the
process can be debugged.

Adds:
set_cpu_feature()
clear_cpu_feature()
clear_all_cpu_features()

Todo:
mask_boot_cpu_features()
set_cpu_feature_word()
more?

(Hardcoded printk for now, should be dprintk.)

Signed-off-by: Chuck Ebbert <[EMAIL PROTECTED]>
---
 include/asm-i386/cpufeature.h |   17 +
 1 file changed, 17 insertions(+)

--- 2.6.21-rc7-d390.orig/include/asm-i386/cpufeature.h
+++ 2.6.21-rc7-d390/include/asm-i386/cpufeature.h
@@ -108,6 +108,23 @@
 #define cpu_has(c, bit)        test_bit(bit, (c)->x86_capability)
 #define boot_cpu_has(bit)      test_bit(bit, boot_cpu_data.x86_capability)
 
+#define alter_cpu_feature(fn, feat, c) \
+   do { typeof(c) __c = (c); \
+   printk("CPU: %s: %s feature %s for CPU %p", \
+   __func__, #fn, #feat, __c); \
+   fn ## _bit(X86_FEATURE_ ## feat, __c->x86_capability); \
+   } while (0)
+
+#define set_cpu_feature(f, c)  alter_cpu_feature(set, f, c)
+#define clear_cpu_feature(f, c)        alter_cpu_feature(clear, f, c)
+
+#define clear_all_cpu_features(c) \
+   do { typeof(c) __c = (c); \
+   printk("CPU: %s: clearing all capabilities for CPU %p", \
+   __func__, __c); \
+   memset(&__c->x86_capability, 0, sizeof __c->x86_capability); \
+   } while (0)
+
 #define cpu_has_fpu    boot_cpu_has(X86_FEATURE_FPU)
 #define cpu_has_vme    boot_cpu_has(X86_FEATURE_VME)
 #define cpu_has_de     boot_cpu_has(X86_FEATURE_DE)
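
The logging-wrapper idea in this patch can be tried outside the kernel with a rough userspace sketch. Everything below is an assumption made for illustration and not part of the patch: struct cpu_caps, CAP_WORDS, the *_cap_bit() helpers and the FEATURE_* names are hypothetical stand-ins for cpuinfo_x86, NCAPINTS, set_bit()/clear_bit() and X86_FEATURE_*, and fprintf() stands in for printk().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CAP_WORDS 8	/* hypothetical; the kernel sizes this with NCAPINTS */

/* Bit numbers follow the word*32+bit convention used in cpufeature.h. */
#define FEATURE_APIC	(0*32 + 9)
#define FEATURE_PGE	(0*32 + 13)

struct cpu_caps {
	uint32_t capability[CAP_WORDS];
};

static void set_cap_bit(unsigned int bit, uint32_t *words)
{
	words[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static void clear_cap_bit(unsigned int bit, uint32_t *words)
{
	words[bit / 32] &= ~(UINT32_C(1) << (bit % 32));
}

static int test_cap_bit(unsigned int bit, const uint32_t *words)
{
	return (words[bit / 32] >> (bit % 32)) & 1;
}

/* Log every change, then forward to set_cap_bit() or clear_cap_bit(). */
#define alter_feature(fn, feat, c) \
	do { \
		struct cpu_caps *c_ = (c); \
		fprintf(stderr, "CPU: %s: %s feature %s for CPU %p\n", \
			__func__, #fn, #feat, (void *)c_); \
		fn ## _cap_bit(FEATURE_ ## feat, c_->capability); \
	} while (0)

#define set_feature(f, c)	alter_feature(set, f, c)
#define clear_feature(f, c)	alter_feature(clear, f, c)

/* Returns 1 if the logged set/clear calls leave the expected bits behind. */
static int feature_roundtrip(void)
{
	struct cpu_caps caps;

	memset(&caps, 0, sizeof caps);
	set_feature(APIC, &caps);
	set_feature(PGE, &caps);
	clear_feature(APIC, &caps);
	return !test_cap_bit(FEATURE_APIC, caps.capability) &&
	       test_cap_bit(FEATURE_PGE, caps.capability);
}
```

Because the wrapper is a macro, `__func__` names the caller, so each log line shows which function changed the flag.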

[RFC PATCH 3/3] x86: use the x86 CPU feature API

2007-04-20 Thread Chuck Ebbert

x86: use the x86 CPU feature API

Just a small demo for now.

Signed-off-by: Chuck Ebbert <[EMAIL PROTECTED]>
---
 arch/i386/kernel/cpu/amd.c|4 ++--
 arch/i386/kernel/cpu/common.c |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--- 2.6.21-rc7-d390.orig/arch/i386/kernel/cpu/amd.c
+++ 2.6.21-rc7-d390/arch/i386/kernel/cpu/amd.c
@@ -109,8 +109,8 @@ static void __cpuinit init_amd(struct cp
{
/* Based on AMD doc 20734R - June 2000 */
if ( c->x86_model == 0 ) {
-   clear_bit(X86_FEATURE_APIC, c->x86_capability);
-   set_bit(X86_FEATURE_PGE, c->x86_capability);
+   clear_cpu_feature(APIC, c);
+   set_cpu_feature(PGE, c);
}
break;
}
--- 2.6.21-rc7-d390.orig/arch/i386/kernel/cpu/common.c
+++ 2.6.21-rc7-d390/arch/i386/kernel/cpu/common.c
@@ -381,7 +381,7 @@ void __cpuinit identify_cpu(struct cpuin
c->x86_model_id[0] = '\0';  /* Unset */
c->x86_max_cores = 1;
c->x86_clflush_size = 32;
-   memset(&c->x86_capability, 0, sizeof c->x86_capability);
+   clear_all_cpu_features(c);
 
if (!have_cpuid_p()) {
/* First of all, decide if this is a 486 or higher */

[RFC PATCH 1/3] x86: use defined names for all CPU feature flags

2007-04-20 Thread Chuck Ebbert


x86: use defined names for all CPU feature flags

Don't use hard coded values for CPU flags.

Signed-off-by: Chuck Ebbert <[EMAIL PROTECTED]>
---
 arch/i386/kernel/cpu/amd.c  |2 +-
 arch/i386/kernel/cpu/centaur.c  |2 +-
 arch/i386/kernel/cpu/cyrix.c|6 +++---
 arch/x86_64/kernel/setup.c  |2 +-
 include/asm-i386/cpufeature.h   |4 +++-
 include/asm-x86_64/cpufeature.h |1 +
 6 files changed, 10 insertions(+), 7 deletions(-)

--- 2.6.21-rc7-d390.orig/arch/x86_64/kernel/setup.c
+++ 2.6.21-rc7-d390/arch/x86_64/kernel/setup.c
@@ -576,7 +576,7 @@ static void __cpuinit init_amd(struct cp
 
/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-   clear_bit(0*32+31, &c->x86_capability);
+   clear_bit(X86_FEATURE_PBE, &c->x86_capability);

/* On C+ stepping K8 rep microcode works well for copy/memset */
level = cpuid_eax(1);
--- 2.6.21-rc7-d390.orig/include/asm-i386/cpufeature.h
+++ 2.6.21-rc7-d390/include/asm-i386/cpufeature.h
@@ -42,6 +42,7 @@
 #define X86_FEATURE_HT (0*32+28) /* Hyper-Threading */
 #define X86_FEATURE_ACC    (0*32+29) /* Automatic clock control */
 #define X86_FEATURE_IA64   (0*32+30) /* IA-64 processor */
+#define X86_FEATURE_PBE    (0*32+31) /* PBE */
 
 /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
 /* Don't duplicate feature flags which are redundant with Intel! */
@@ -49,6 +50,7 @@
 #define X86_FEATURE_MP (1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX (1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_CXMMXORIG  (1*32+24) /* Cyrix MMX, initial location */
 #define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT   (1*32+30) /* AMD 3DNow! extensions */
 #define X86_FEATURE_3DNOW  (1*32+31) /* 3DNow! */
@@ -60,7 +62,7 @@
 
 /* Other features, Linux-defined mapping, word 3 */
 /* This range is used for feature bits which conflict or are synthesized */
-#define X86_FEATURE_CXMMX  (3*32+ 0) /* Cyrix MMX extensions */
+#define X86_FEATURE_CXMMX  (3*32+ 0) /* Cyrix MMX extensions, final location */
 #define X86_FEATURE_K6_MTRR    (3*32+ 1) /* AMD K6 nonstandard MTRRs */
 #define X86_FEATURE_CYRIX_ARR  (3*32+ 2) /* Cyrix ARRs (= MTRRs) */
 #define X86_FEATURE_CENTAUR_MCR    (3*32+ 3) /* Centaur MCRs (= MTRRs) */
--- 2.6.21-rc7-d390.orig/arch/i386/kernel/cpu/cyrix.c
+++ 2.6.21-rc7-d390/arch/i386/kernel/cpu/cyrix.c
@@ -192,11 +192,11 @@ static void __cpuinit init_cyrix(struct 
 
/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-   clear_bit(0*32+31, c->x86_capability);
+   clear_bit(X86_FEATURE_PBE, c->x86_capability);
 
/* Cyrix used bit 24 in extended (AMD) CPUID for Cyrix MMX extensions */
-   if ( test_bit(1*32+24, c->x86_capability) ) {
-   clear_bit(1*32+24, c->x86_capability);
+   if ( test_bit(X86_FEATURE_CXMMXORIG, c->x86_capability) ) {
+   clear_bit(X86_FEATURE_CXMMXORIG, c->x86_capability);
set_bit(X86_FEATURE_CXMMX, c->x86_capability);
}
 
--- 2.6.21-rc7-d390.orig/include/asm-x86_64/cpufeature.h
+++ 2.6.21-rc7-d390/include/asm-x86_64/cpufeature.h
@@ -40,6 +40,7 @@
 #define X86_FEATURE_HT (0*32+28) /* Hyper-Threading */
 #define X86_FEATURE_ACC    (0*32+29) /* Automatic clock control */
 #define X86_FEATURE_IA64   (0*32+30) /* IA-64 processor */
+#define X86_FEATURE_PBE    (0*32+31) /* PBE */
 
 /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
 /* Don't duplicate feature flags which are redundant with Intel! */
--- 2.6.21-rc7-d390.orig/arch/i386/kernel/cpu/amd.c
+++ 2.6.21-rc7-d390/arch/i386/kernel/cpu/amd.c
@@ -83,7 +83,7 @@ static void __cpuinit init_amd(struct cp
 
/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-   clear_bit(0*32+31, c->x86_capability);
+   clear_bit(X86_FEATURE_PBE, c->x86_capability);

r = get_model_name(c);
 
--- 2.6.21-rc7-d390.orig/arch/i386/kernel/cpu/centaur.c
+++ 2.6.21-rc7-d390/arch/i386/kernel/cpu/centaur.c
@@ -334,7 +334,7 @@ static void __cpuinit init_centaur(struc
 
/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
-   clear_bit(0*32+31, c->x86_capability);
+   clear_bit(X86_FEATURE_PBE, c->x86_capability);
 
switch (c->x86) {
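
The Cyrix hunk above moves a feature from its vendor-specific bit to a Linux-defined one. A minimal userspace sketch of that relocation follows; CAP_WORDS, the cap_*() helpers, relocate_feature() and the FEAT_* names are hypothetical and only mirror the word*32+bit values shown in the patch.

```c
#include <stdint.h>

#define CAP_WORDS 8			/* hypothetical array size */
#define FEAT_CXMMXORIG	(1*32 + 24)	/* where Cyrix reports it */
#define FEAT_CXMMX	(3*32 + 0)	/* Linux-defined location */

static int cap_test(const uint32_t *w, unsigned int bit)
{
	return (w[bit / 32] >> (bit % 32)) & 1;
}

static void cap_set(uint32_t *w, unsigned int bit)
{
	w[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static void cap_clear(uint32_t *w, unsigned int bit)
{
	w[bit / 32] &= ~(UINT32_C(1) << (bit % 32));
}

/* Move a feature bit to a new position if it is set; returns 1 if moved. */
static int relocate_feature(uint32_t *caps, unsigned int from, unsigned int to)
{
	if (!cap_test(caps, from))
		return 0;
	cap_clear(caps, from);
	cap_set(caps, to);
	return 1;
}

/* Returns 1 if bit 24 of word 1 ends up as bit 0 of word 3. */
static int cyrix_relocation_demo(void)
{
	uint32_t caps[CAP_WORDS] = { 0 };

	caps[1] = UINT32_C(1) << 24;
	if (!relocate_feature(caps, FEAT_CXMMXORIG, FEAT_CXMMX))
		return 0;
	return caps[1] == 0 && caps[3] == 1;
}
```

Using named constants for both positions keeps the source and destination of the move visible at the call site, which is the point of the patch.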

[RFC PATCH 0/3] Clean up x86 CPU feature setup

2007-04-20 Thread Chuck Ebbert

x86 CPU feature flag setup has become impossible to debug.
Every user just does set_bit()/clear_bit() or writes the
entire set to change the flags, so there's no way to trace
how they're being set.

This patchset creates an API and debug messages for tracking
how the flags get set. It's not nearly done, but I want to
know whether or not to continue.


Re: [PATCH v3] hpet: Detect hidden HPET on NVidia motherboards

2007-04-20 Thread Andrew Morton

On Wed, 18 Apr 2007 00:57:48 +0300
Mikko Tiihonen <[EMAIL PROTECTED]> wrote:

> Enables HPET for NVidia motherboards with broken BIOS. The patch reads
> the HPET address from the pci config space. The patch should also work
> if ACPI is disabled.
> 
> The new quirk activates use of HPET only if
> - CONFIG_HPET_NFORCE_DETECT is enabled
> - nohpet boot option is not set
> - main chipset is from NVidia
> - ACPI tables do not list HPET
> - matching PCI ID for device with HPET is found
> - BIOS has set up the HPET to some address
> - there is no other resource allocated at the HPET address
> 
> This is true at least for some Asus, Gigabyte and DFI motherboards.
> 
> Patch is against 2.6.21-rc6-git7 but should apply cleanly to most
> kernels.

I looked at applying this but

a) there have been rather a lot of underlying changes in Andi's devel tree and

b) we still haven't heard from Andy?


[PATCH 3/6] [RFC]mlx4_core public includes

2007-04-20 Thread Roland Dreier

Include files for hardware/firmware information and interface of
mlx4_core module for protocol-specific drivers (such as mlx4_ib).

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 cmd.h  |  178 +
 cq.h   |  123 +++
 device.h   |  323 +
 doorbell.h |   97 ++
 driver.h   |   59 +++
 qp.h   |  288 ++
 srq.h  |   42 +++
 7 files changed, 1110 insertions(+)

diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
new file mode 100644
index 000..4fb552d
--- /dev/null
+++ b/include/linux/mlx4/cmd.h
@@ -0,0 +1,178 @@
+/*
+ * Copyright (c) 2006 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef MLX4_CMD_H
+#define MLX4_CMD_H
+
+#include 
+
+enum {
+   /* initialization and general commands */
+   MLX4_CMD_SYS_EN  = 0x1,
+   MLX4_CMD_SYS_DIS = 0x2,
+   MLX4_CMD_MAP_FA  = 0xfff,
+   MLX4_CMD_UNMAP_FA= 0xffe,
+   MLX4_CMD_RUN_FW  = 0xff6,
+   MLX4_CMD_MOD_STAT_CFG= 0x34,
+   MLX4_CMD_QUERY_DEV_CAP   = 0x3,
+   MLX4_CMD_QUERY_FW= 0x4,
+   MLX4_CMD_ENABLE_LAM  = 0xff8,
+   MLX4_CMD_DISABLE_LAM = 0xff7,
+   MLX4_CMD_QUERY_DDR   = 0x5,
+   MLX4_CMD_QUERY_ADAPTER   = 0x6,
+   MLX4_CMD_INIT_HCA= 0x7,
+   MLX4_CMD_CLOSE_HCA   = 0x8,
+   MLX4_CMD_INIT_PORT   = 0x9,
+   MLX4_CMD_CLOSE_PORT  = 0xa,
+   MLX4_CMD_QUERY_HCA   = 0xb,
+   MLX4_CMD_SET_PORT= 0xc,
+   MLX4_CMD_ACCESS_DDR  = 0x2e,
+   MLX4_CMD_MAP_ICM = 0xffa,
+   MLX4_CMD_UNMAP_ICM   = 0xff9,
+   MLX4_CMD_MAP_ICM_AUX = 0xffc,
+   MLX4_CMD_UNMAP_ICM_AUX   = 0xffb,
+   MLX4_CMD_SET_ICM_SIZE= 0xffd,
+
+   /* TPT commands */
+   MLX4_CMD_SW2HW_MPT   = 0xd,
+   MLX4_CMD_QUERY_MPT   = 0xe,
+   MLX4_CMD_HW2SW_MPT   = 0xf,
+   MLX4_CMD_READ_MTT= 0x10,
+   MLX4_CMD_WRITE_MTT   = 0x11,
+   MLX4_CMD_SYNC_TPT= 0x2f,
+
+   /* EQ commands */
+   MLX4_CMD_MAP_EQ  = 0x12,
+   MLX4_CMD_SW2HW_EQ= 0x13,
+   MLX4_CMD_HW2SW_EQ= 0x14,
+   MLX4_CMD_QUERY_EQ= 0x15,
+
+   /* CQ commands */
+   MLX4_CMD_SW2HW_CQ= 0x16,
+   MLX4_CMD_HW2SW_CQ= 0x17,
+   MLX4_CMD_QUERY_CQ= 0x18,
+   MLX4_CMD_RESIZE_CQ   = 0x2c,
+
+   /* SRQ commands */
+   MLX4_CMD_SW2HW_SRQ   = 0x35,
+   MLX4_CMD_HW2SW_SRQ   = 0x36,
+   MLX4_CMD_QUERY_SRQ   = 0x37,
+   MLX4_CMD_ARM_SRQ = 0x40,
+
+   /* QP/EE commands */
+   MLX4_CMD_RST2INIT_QP = 0x19,
+   MLX4_CMD_INIT2RTR_QP = 0x1a,
+   MLX4_CMD_RTR2RTS_QP  = 0x1b,
+   MLX4_CMD_RTS2RTS_QP  = 0x1c,
+   MLX4_CMD_SQERR2RTS_QP= 0x1d,
+   MLX4_CMD_2ERR_QP = 0x1e,
+   MLX4_CMD_RTS2SQD_QP  = 0x1f,
+   MLX4_CMD_SQD2SQD_QP  = 0x38,
+   MLX4_CMD_SQD2RTS_QP  = 0x20,
+   MLX4_CMD_2RST_QP = 0x21,
+   MLX4_CMD_QUERY_QP= 0x22,
+   MLX4_CMD_INIT2INIT_QP= 0x2d,
+   MLX4_CMD_SUSPEND_QP  = 0x32,
+   MLX4_CMD_UNSUSPEND_QP= 0x33,
+   /* special QP and management commands */
+   MLX4_CMD_CONF_SPECIAL_QP = 0x23,
+   MLX4_CMD_MAD_IFC = 0x24,
+
+   /* multicast commands */
+   MLX4_CMD_READ_MCG= 0x25,
+   MLX4_CMD_WRITE_MCG   =

[PATCH 0/6] [RFC]IB/mlx4: Mellanox ConnectX adapter driver

2007-04-20 Thread Roland Dreier

As promised, here is a series of patches adding the mlx4_core and
mlx4_ib drivers for the new Mellanox ConnectX adapter.  These patches
are split up in an ad hoc way to avoid mailing list size limits, but
when this driver is finally merged, I will give it to Linus to pull in
a single changeset.  The full driver is also available via git from

master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git connectx

and it is also in my for-mm patch, so Andrew will pick it up for -mm
kernels automatically.

The driver is split into two kernel modules because the ConnectX
adapter can be used as an InfiniBand adapter, 1G/10G ethernet NIC, and
a Fibre Channel HBA at the same time, and so resource management and
basic tasks such as issuing commands to the firmware are handled in a
mlx4_core module, while everything InfiniBand-specific is in mlx4_ib.
In the not-too-distant future, an mlx4_eth module that handles ethernet
NIC stuff will be released.

My goal is to merge this for 2.6.22.  If you feel that would not be
appropriate, please do let me know and I will hold off.  And of course
all criticisms, suggestions, comments, etc. are very much appreciated.
My feeling is that the driver is fairly clean already (and I will do
some further cleanup before merging) and seems to be reasonably
usable, and I trust myself to continue cleaning things up, so there's
not much to be gained by waiting a release cycle.

The overall driver is not too huge -- 11371 insertions in the diffstat:

 drivers/infiniband/Kconfig|2 +
 drivers/infiniband/Makefile   |1 +
 drivers/infiniband/hw/mlx4/Kconfig|9 +
 drivers/infiniband/hw/mlx4/Makefile   |3 +
 drivers/infiniband/hw/mlx4/ah.c   |  100 +++
 drivers/infiniband/hw/mlx4/cq.c   |  525 ++
 drivers/infiniband/hw/mlx4/doorbell.c |  215 ++
 drivers/infiniband/hw/mlx4/mad.c  |  339 +
 drivers/infiniband/hw/mlx4/main.c |  612 
 drivers/infiniband/hw/mlx4/mlx4_ib.h  |  285 
 drivers/infiniband/hw/mlx4/mr.c   |  184 +
 drivers/infiniband/hw/mlx4/qp.c   | 1263 +
 drivers/infiniband/hw/mlx4/srq.c  |  334 +
 drivers/infiniband/hw/mlx4/user.h |   91 +++
 drivers/net/Kconfig   |   14 +
 drivers/net/Makefile  |1 +
 drivers/net/mlx4/Makefile |4 +
 drivers/net/mlx4/alloc.c  |  179 +
 drivers/net/mlx4/cmd.c|  429 +++
 drivers/net/mlx4/cq.c |  254 +++
 drivers/net/mlx4/eq.c |  704 ++
 drivers/net/mlx4/fw.c |  758 
 drivers/net/mlx4/fw.h |  165 +
 drivers/net/mlx4/icm.c|  379 ++
 drivers/net/mlx4/icm.h|  135 
 drivers/net/mlx4/intf.c   |  142 
 drivers/net/mlx4/main.c   |  939 
 drivers/net/mlx4/mcg.c|  370 ++
 drivers/net/mlx4/mlx4.h   |  334 +
 drivers/net/mlx4/mr.c |  482 +
 drivers/net/mlx4/pd.c |  102 +++
 drivers/net/mlx4/profile.c|  238 +++
 drivers/net/mlx4/qp.c |  270 +++
 drivers/net/mlx4/reset.c  |  172 +
 drivers/net/mlx4/srq.c|  227 ++
 include/linux/mlx4/cmd.h  |  178 +
 include/linux/mlx4/cq.h   |  123 
 include/linux/mlx4/device.h   |  323 +
 include/linux/mlx4/doorbell.h |   97 +++
 include/linux/mlx4/driver.h   |   59 ++
 include/linux/mlx4/qp.h   |  288 
 include/linux/mlx4/srq.h  |   42 ++
 42 files changed, 11371 insertions(+), 0 deletions(-)

[PATCH 6/6] [RFC]mlx4 build system stuff

2007-04-20 Thread Roland Dreier

Hook up mlx4_core and mlx4_ib drivers to Kconfig and Makefiles.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 infiniband/Kconfig  |2 ++
 infiniband/Makefile |1 +
 infiniband/hw/mlx4/Kconfig  |9 +
 infiniband/hw/mlx4/Makefile |3 +++
 net/Kconfig |   14 ++
 net/Makefile|1 +
 net/mlx4/Makefile   |4 
 7 files changed, 34 insertions(+)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 82afba5..37deaae 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -45,6 +45,8 @@ source "drivers/infiniband/hw/ehca/Kconfig"
 source "drivers/infiniband/hw/amso1100/Kconfig"
 source "drivers/infiniband/hw/cxgb3/Kconfig"
 
+source "drivers/infiniband/hw/mlx4/Kconfig"
+
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
 source "drivers/infiniband/ulp/srp/Kconfig"
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index da2066c..75f325e 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_INFINIBAND_IPATH)  += hw/ipath/
 obj-$(CONFIG_INFINIBAND_EHCA)  += hw/ehca/
 obj-$(CONFIG_INFINIBAND_AMSO1100)  += hw/amso1100/
 obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
+obj-$(CONFIG_MLX4_INFINIBAND)  += hw/mlx4/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
 obj-$(CONFIG_INFINIBAND_SRP)   += ulp/srp/
 obj-$(CONFIG_INFINIBAND_ISER)  += ulp/iser/
diff --git a/drivers/infiniband/hw/mlx4/Kconfig b/drivers/infiniband/hw/mlx4/Kconfig
new file mode 100644
index 000..b8912cd
--- /dev/null
+++ b/drivers/infiniband/hw/mlx4/Kconfig
@@ -0,0 +1,9 @@
+config MLX4_INFINIBAND
+   tristate "Mellanox ConnectX HCA support"
+   depends on INFINIBAND
+   select MLX4_CORE
+   ---help---
+ This driver provides low-level InfiniBand support for
+ Mellanox ConnectX PCI Express host channel adapters (HCAs).
+ This is required to use InfiniBand protocols such as
+ IP-over-IB or SRP with these devices.
diff --git a/drivers/infiniband/hw/mlx4/Makefile b/drivers/infiniband/hw/mlx4/Makefile
new file mode 100644
index 000..70f09c7
--- /dev/null
+++ b/drivers/infiniband/hw/mlx4/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_MLX4_INFINIBAND)  += mlx4_ib.o
+
+mlx4_ib-y :=   ah.o cq.o doorbell.o mad.o main.o mr.o qp.o srq.o
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c3f9f59..842f020 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2493,6 +2493,20 @@ config PASEMI_MAC
  This driver supports the on-chip 1/10Gbit Ethernet controller on
  PA Semi's PWRficient line of chips.
 
+config MLX4_CORE
+   tristate
+   depends on PCI
+   default n
+
+config MLX4_DEBUG
+   bool "Verbose debugging output" if (MLX4_CORE && EMBEDDED)
+   default y
+   ---help---
+ This option causes debugging code to be compiled into the
+ mlx4_core driver.  The output can be turned on via the
+ debug_level module parameter (which can also be set after
+ the driver is loaded through sysfs).
+
 endmenu
 
 source "drivers/net/tokenring/Kconfig"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 33af833..1604e1a 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -197,6 +197,7 @@ obj-$(CONFIG_SMC911X) += smc911x.o
 obj-$(CONFIG_DM9000) += dm9000.o
 obj-$(CONFIG_FEC_8XX) += fec_8xx/
 obj-$(CONFIG_PASEMI_MAC) += pasemi_mac.o
+obj-$(CONFIG_MLX4_CORE) += mlx4/
 
 obj-$(CONFIG_MACB) += macb.o
 
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
new file mode 100644
index 000..4f18889
--- /dev/null
+++ b/drivers/net/mlx4/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_MLX4_CORE)    += mlx4_core.o
+
+mlx4_core-y := alloc.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o mr.o \
+   pd.o profile.o qp.o reset.o srq.o

Re: [PATCH] Check for error returned by kthread_create on creating journal thread

2007-04-20 Thread Andrew Morton

On Mon, 16 Apr 2007 11:41:14 +0400
Pavel Emelianov <[EMAIL PROTECTED]> wrote:

> If the thread failed to create the subsequent wait_event
> will hang forever.
> 
> This is likely to happen if kernel hits max_threads limit.
> 
> Will be critical for virtualization systems that limit the
> number of tasks and kernel memory usage within the container.
> 
> 
> [diff-jbd-check-start-journal-thread-return-value  text/plain (1.7KB)]
> --- ./fs/jbd/journal.c.jbdthreads 2007-04-16 11:17:36.0 +0400
> +++ ./fs/jbd/journal.c2007-04-16 11:30:09.0 +0400
> @@ -211,10 +211,16 @@ end_loop:
>   return 0;
>  }
>  
> -static void journal_start_thread(journal_t *journal)
> +static int journal_start_thread(journal_t *journal)
>  {
> - kthread_run(kjournald, journal, "kjournald");
> + struct task_struct *t;
> +
> + t = kthread_run(kjournald, journal, "kjournald");
> + if (IS_ERR(t))
> + return PTR_ERR(t);
> +
>   wait_event(journal->j_wait_done_commit, journal->j_task != 0);
> + return 0;
>  }

Thanks.   Please don't forget those Signed-off-by:s

I assume that you runtime tested this and that the mount failed in
an appropriate fashion?
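
The error-pointer check the patch adds can be illustrated with a small userspace sketch. The helpers below are hypothetical stand-ins written for this example, not kernel code: err_ptr(), is_err() and ptr_err() imitate ERR_PTR(), IS_ERR() and PTR_ERR(), spawn_thread() imitates a kthread_run() that can fail, and -EAGAIN is just an assumed failure code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, as the kernel's ERR_PTR() does. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

/* True if the pointer lies in the top MAX_ERRNO addresses, i.e. holds an errno. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

struct fake_task { int pid; };		/* hypothetical stand-in for task_struct */
static struct fake_task the_task = { 42 };

/* Hypothetical thread creation that fails once a thread limit is reached. */
static struct fake_task *spawn_thread(int limit_reached)
{
	if (limit_reached)
		return err_ptr(-EAGAIN);
	return &the_task;
}

/* Mirrors the patched function: report the failure instead of waiting forever. */
static int start_thread(int limit_reached)
{
	struct fake_task *t = spawn_thread(limit_reached);

	if (is_err(t))
		return (int)ptr_err(t);
	/* the real code would wait here until the new thread announces itself */
	return 0;
}
```

With the return value propagated, the caller can fail the mount cleanly rather than hang in the wait.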

[PATCH 4/6] [RFC]mlx4_ib main files

2007-04-20 Thread Roland Dreier

Main include file and .c file for mlx4_ib.

Signed-off-by: Roland Dreier <[EMAIL PROTECTED]>

---

 main.c|  612 ++
 mlx4_ib.h |  285 
 2 files changed, 897 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
new file mode 100644
index 000..6f7165f
--- /dev/null
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -0,0 +1,612 @@
+/*
+ * Copyright (c) 2006, 2007 Cisco Systems, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+
+#include "mlx4_ib.h"
+#include "user.h"
+
+#define DRV_NAME       "mlx4_ib"
+#define DRV_VERSION    "0.01"
+#define DRV_RELDATE    "May 1, 2006"
+
+MODULE_AUTHOR("Roland Dreier");
+MODULE_DESCRIPTION("Mellanox ConnectX HCA InfiniBand driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+
+static const char mlx4_ib_version[] __devinitdata =
+   DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
+   DRV_VERSION " (" DRV_RELDATE ")\n";
+
+static void init_query_mad(struct ib_smp *mad)
+{
+   mad->base_version  = 1;
+   mad->mgmt_class= IB_MGMT_CLASS_SUBN_LID_ROUTED;
+   mad->class_version = 1;
+   mad->method= IB_MGMT_METHOD_GET;
+}
+
+static int mlx4_ib_query_device(struct ib_device *ibdev,
+   struct ib_device_attr *props)
+{
+   struct mlx4_ib_dev *dev = to_mdev(ibdev);
+   struct ib_smp *in_mad  = NULL;
+   struct ib_smp *out_mad = NULL;
+   int err = -ENOMEM;
+
+   in_mad  = kzalloc(sizeof *in_mad, GFP_KERNEL);
+   out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL);
+   if (!in_mad || !out_mad)
+   goto out;
+
+   init_query_mad(in_mad);
+   in_mad->attr_id = IB_SMP_ATTR_NODE_INFO;
+
+   err = mlx4_MAD_IFC(to_mdev(ibdev), 1, 1, 1, NULL, NULL, in_mad, out_mad);
+   if (err)
+   goto out;
+
+   memset(props, 0, sizeof *props);
+
+   props->fw_ver = dev->dev->caps.fw_ver;
+   props->device_cap_flags= IB_DEVICE_CHANGE_PHY_PORT |
+   IB_DEVICE_PORT_ACTIVE_EVENT |
+   IB_DEVICE_SYS_IMAGE_GUID|
+   IB_DEVICE_RC_RNR_NAK_GEN;
+   if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR)
+   props->device_cap_flags |= IB_DEVICE_BAD_PKEY_CNTR;
+   if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR)
+   props->device_cap_flags |= IB_DEVICE_BAD_QKEY_CNTR;
+   if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_APM)
+   props->device_cap_flags |= IB_DEVICE_AUTO_PATH_MIG;
+   if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_UD_AV_PORT)
+   props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE;
+
+   props->vendor_id   = be32_to_cpup((__be32 *) (out_mad->data + 36)) &
+   0xff;
+   props->vendor_part_id  = be16_to_cpup((__be16 *) (out_mad->data + 30));
+   props->hw_ver  = be32_to_cpup((__be32 *) (out_mad->data + 32));
+   memcpy(&props->sys_image_guid, out_mad->data +  4, 8);
+
+   props->max_mr_size = ~0ull;
+   props->page_size_cap   = dev->dev->caps.page_size_cap;
+   props->max_qp  = dev->dev->caps.num_qps - dev->dev->caps.reserved_qps;
+   props->max_qp_wr   = dev->dev->caps.max_wqes;
+   props->max_sge = min(dev->dev->caps.max_sq_sg,
+dev->dev->caps.max_rq_sg);
+   props->max_cq

/proc/sef/fd/0 is a socket ?!

2007-04-20 Thread J.A. Magallón

Hi all...

After a big update on my systems, two of them just do not let me ssh into them.
It says that stdin is not a terminal. The same happens if I try to open any
terminal emulator, like aterm.

It finally let me do something like ssh [EMAIL PROTECTED] /bin/bash -i, to get
a terminal, and I saw this:

nada:/etc/rc.d# ll /proc/self/fd/
total 0
lrwx-- 1 root root 64 2007.04.21 00:13 0 -> socket:[23705]
lrwx-- 1 root root 64 2007.04.21 00:13 1 -> socket:[23705]
lrwx-- 1 root root 64 2007.04.21 00:13 2 -> socket:[23707]
lr-x-- 1 root root 64 2007.04.21 00:13 3 -> /proc/6814/fd/

What's that? Did udev really mess something up?

One other box is working just fine.
Any ideas ?

--
J.A. Magallon  \   Software is like sex:
 \ It's better when it's free
Mandriva Linux release 2008.0 (Cooker) for i586
Linux 2.6.20-jam10 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP 
PREEMPT
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH -mm] freezer: Document task_lock in thaw_process

2007-04-20 Thread Rafael J. Wysocki

From: Rafael J. Wysocki <[EMAIL PROTECTED]>

The task_lock() in include/linux/freezer.h:thaw_process() looks as though it
were protecting p->flags, which is not the case.  Add a comment that explains
why it's there.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 include/linux/freezer.h |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6.21-rc6-mm1/include/linux/freezer.h
===
--- linux-2.6.21-rc6-mm1.orig/include/linux/freezer.h   2007-04-09 15:24:25.0 +0200
+++ linux-2.6.21-rc6-mm1/include/linux/freezer.h        2007-04-21 00:17:30.0 +0200
@@ -37,6 +37,12 @@ static inline void do_not_freeze(struct 
 
 /*
  * Wake up a frozen process
+ *
+ * task_lock() is taken to prevent the race with refrigerator() which may
+ * occur if the freezing of tasks fails.  Namely, without the lock, if the
+ * freezing of tasks failed, thaw_tasks() might have run before a task in
+ * refrigerator() could call frozen_process(), in which case the task would be
+ * frozen and no one would thaw it.
  */
 static inline int thaw_process(struct task_struct *p)
 {

Sleep during spinlock in TPM driver

2007-04-20 Thread David Kyle


I've been working with the TPM driver, and I found that if I opened,
used, then closed the TPM char device very frequently, I would get a
kernel BUG message saying that the kernel tried to sleep while holding
a spinlock.  I think I've isolated the problem to this function, in
drivers/char/tpm/tpm.c:

int tpm_release(struct inode *inode, struct file *file)
{
	struct tpm_chip *chip = file->private_data;

	spin_lock(&driver_lock);
	file->private_data = NULL;
	chip->num_opens--;
	del_singleshot_timer_sync(&chip->user_read_timer);
	flush_scheduled_work();
	atomic_set(&chip->data_pending, 0);
	put_device(chip->dev);
	kfree(chip->data_buffer);
	spin_unlock(&driver_lock);
	return 0;
}
EXPORT_SYMBOL_GPL(tpm_release);

I believe that flush_scheduled_work can sleep, correct?  Does anyone
know why this function is called while the spinlock is held?
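
To illustrate the ordering concern, here is a minimal user-space model, not kernel code. Every name in it (model_spin_lock, model_flush_scheduled_work, and so on) is a hypothetical stand-in invented for this sketch. It only counts how often a call that may sleep runs while the modelled lock is held.

```c
/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
int model_lock_depth;        /* > 0 while the modelled spinlock is held */
int model_sleep_violations;  /* counts "sleep while atomic" events */

void model_spin_lock(void)   { model_lock_depth++; }
void model_spin_unlock(void) { model_lock_depth--; }

/* Models a call that may sleep, which is what the post asks about. */
void model_flush_scheduled_work(void)
{
	if (model_lock_depth > 0)
		model_sleep_violations++;
}

/* Ordering as in the posted tpm_release(): the flush runs under the lock. */
void model_release_as_posted(void)
{
	model_spin_lock();
	model_flush_scheduled_work();
	model_spin_unlock();
}

/* Alternative ordering: do the possibly sleeping work first, then lock. */
void model_release_reordered(void)
{
	model_flush_scheduled_work();
	model_spin_lock();
	/* only non-sleeping bookkeeping would remain in this section */
	model_spin_unlock();
}
```

Calling model_release_as_posted() records one violation and model_release_reordered() records none. Whether such a reordering would be safe in tpm.c depends on what the spinlock actually protects, which this sketch does not model.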

-David

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Andrew Morton

On Fri, 20 Apr 2007 17:38:06 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> 
> > I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
> > 
> > - Nick's patch also will help this problem.  It could be that your patch
> >   no longer offers a 2x speedup when combined with Nick's patch.
> > 
> >   It could well be that the combination of the two is even better, but it
> >   would be nice to firm that up a bit.  
> 
> I'll test that.

Thanks.

> >   I do go on about that.  But we're adding page flags at about one per
> >   year, and when we run out we're screwed - we'll need to grow the
> >   pageframe.
> 
> If you want, I can take a look at folding this into the
> ->mapping pointer.  I can guarantee you it won't be
> pretty, though :)

Well, let's see how fugly it ends up looking?

> > - I need to update your patch for Nick's patch.  Please confirm that
> >   down_read(mmap_sem) is sufficient for MADV_FREE.
> 
> It is.  MADV_FREE needs no more protection than MADV_DONTNEED.
> 
> > Stylistic nit:
> > 
> >> +  if (PageLazyFree(page) && !migration) {
> >> +  /* There is new data in the page.  Reinstate it. */
> >> +  if (unlikely(pte_dirty(pteval))) {
> >> +  set_pte_at(mm, address, pte, pteval);
> >> +  ret = SWAP_FAIL;
> >> +  goto out_unmap;
> >> +  }
> > 
> > The comment should be inside the second `if' statement.  As it is, It
> > looks like we reinstate the page if (PageLazyFree(page) && !migration).
> 
> Want me to move it?

I did that, thanks.

Re: AGPGart / AMD K7

2007-04-20 Thread Preston A. Elder

Dave Jones wrote:
> On Fri, Apr 20, 2007 at 04:22:06PM -0400, Preston A. Elder wrote:
>  > Dave, Greg,
>  > 
>  > Here is the trace with 2.6.20.6
>  > 
>  > I added back in my trace code, as you see.  As you can also see,
>  > agp_amdk7_probe is still not called.
>
> Try looking down in __driver_attach()
> The fact that we're not calling the ->probe function is quite bizarre.
>
> It could be this in __driver_attach
>
> if (!dev->driver)
> driver_probe_device(drv, dev);
>
> Though that'd be odd.
>
> Putting a #define DEBUG 1 in drivers/base/dd.c may also yield some clues.
>
>   Dave
>
>   
OK, I found it!

Here is more trace:
Linux agpgart interface v0.101 (c) Dave Jones
agp_amdk7_init: In function
agp_amdk7_init: Before pci_register_driver
__pci_register_driver: In Function (driver = agpgart-amdk7, multithread = 0)
__pci_register_driver: Before Spinlock
__pci_register_driver: Before Init List Head
__pci_register_driver: Before driver_register
bus_add_driver: In Function (c0492920)
bus_add_driver: Before kobject_set_name
bus_add_driver: error = 0
bus_add_driver: Before kobject_register
bus_add_driver: error = 0
bus_add_driver: Before driver_attach
__driver_attach (:00:00.0,1): Before Down (parent) (c21c8600)
__driver_attach (:00:00.0, 1): Before Down
__driver_attach (:00:00.0, 1): Before Probe Device (c049fe54)
__driver_attach (:00:00.0, 1): alreay registered with driver amd76x_edac
__driver_attach (:00:00.0, 1): Before Up
__driver_attach (:00:00.0, 1): Before Up (parent) (c21c8600)
__driver_attach (:00:00.0, 1): Returning 0
bus_add_driver: error = 0
bus_add_driver: Before klist_add_tail
bus_add_driver: Before module_add_driver
bus_add_driver: Before driver_add_attrs
bus_add_driver: error = 0
bus_add_driver: Before add_bind_files
bus_add_driver: error = 0
bus_add_driver: Returning 0
__pci_register_driver: error = 0
__pci_register_driver: Before pci_create_newid_file
__pci_register_driver: error = 0
__pci_register_driver: Returning 0

I snipped some since __driver_attach is called many times.

But the long and short of it is that 00:00:00 is already bound to the
'amd76x_edac' driver, so the agp probe function is never called for it.
What is this edac, btw?
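
A simplified model of the behaviour seen in the trace, for illustration only. The structures and the function model_driver_attach are hypothetical stand-ins, not the driver-core code; the sketch mirrors only the "if (!dev->driver)" check quoted above and assumes every probe succeeds.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver-core structures. */
struct model_driver {
	const char *name;
	int probe_calls;               /* how often this driver was probed */
};

struct model_device {
	struct model_driver *driver;   /* NULL while the device is unbound */
};

/*
 * Probe only when the device has no driver yet.
 * Returns 1 if the driver was probed and bound, 0 if the device was skipped.
 */
int model_driver_attach(struct model_driver *drv, struct model_device *dev)
{
	if (dev->driver)
		return 0;              /* already claimed, probe is skipped */
	drv->probe_calls++;            /* assumption: the probe succeeds */
	dev->driver = drv;
	return 1;
}
```

With this model, whichever driver attaches first keeps the device: a later attach by a second driver returns 0 and its probe counter stays at zero, which matches the trace above.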

PreZ :)



Re: 2.6.21-rc7: HPET enabled freeze my machine at boot

2007-04-20 Thread john stultz


On 4/19/07, guilherme <[EMAIL PROTECTED]> wrote:

> Hi,
>
> If i enable "High Resolution Timer Support", my machine stops here at boot:
>
> Clocksource tsc unstable (delta = -297340790165 ns)
> Time: hpet clocksource has been installed.
>
> If i disable HPET, it boots fine.


Hmmm.. What happens if you boot w/ clocksource=acpi_pm ?

BUG: Dentry still in use during umount in 2.6.21-rc5-git6

2007-04-20 Thread Andi Kleen


One of my autoboot test clients gave me this during shutdown. It used
reiserfs and autofs and NFS heavily.

Unmounting file systems
BUG: Dentry 8100f3693a40{i=2352220,n=xattrs} still in use (1) [unmount of reiserfs sda9]
[ cut here ]
kernel BUG at /mnt/dm-2/newautoboot/autoboot/lsrc/mainline/linux/fs/dcache.c:623!
invalid opcode:  [1] SMP
CPU 1
Modules linked in:
Pid: 15791, comm: umount Not tainted 2.6.21-rc5-git6 #44
RIP: 0010:[]  [] shrink_dcache_for_umount_subtree+0x178/0x250
RSP: 0018:8100f5f67e18  EFLAGS: 00010292
RAX: 0060 RBX: 8100f3693a40 RCX: 5207
RDX:  RSI: 0046 RDI: 00014661
RBP: 8100f6dc9cc0 R08: 00a0 R09: 0005
R10:  R11:  R12: 8100f3693aa0
R13: 00014661 R14: 0050ea70 R15: 0050ead0
FS:  2adc863a86d0() GS:8100f7fdc1c0() knlGS:b7be38d0
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 2adc8626a688 CR3: f628b000 CR4: 06e0
Process umount (pid: 15791, threadinfo 8100f5f66000, task 8100f7a08100)
Stack:  810004dab218 810004dab000 80558860 810004dab000
  8028815b 810004dab000 8027a1a5
  8100f6c50980 806c1600 8027a2a4
Call Trace:
 [] shrink_dcache_for_umount+0x2f/0x3d
 [] generic_shutdown_super+0x19/0xf2
 [] kill_block_super+0x26/0x3b
 [] deactivate_super+0x47/0x60
 [] sys_umount+0x1f7/0x22a
 [] sys_newstat+0x19/0x31
 [] system_call+0x7e/0x83


Code: 0f 0b eb fe 48 8b 6b 28 48 39 dd 75 04 31 ed eb 04 f0 ff 4d 
RIP  [] shrink_dcache_for_umount_subtree+0x178/0x250
 RSP 
/etc/init.d/boot.d/K14boot.localfs: line 93: 15791 Segmentation fault  umount -avt noproc,nonfs,nonfs4,nosmbfs,nocifs,notmpfs



-Andi

Re: how to tell linux (on x86) to ignore 1M or memory

2007-04-20 Thread Bodo Eggert

Rene Herman <[EMAIL PROTECTED]> wrote:
> On 04/19/2007 04:18 PM, Bart Trojanowski wrote:

>> I need to preserve some state from the bios before entering protected
>> mode.  For now I want to copy it into some ram accessible by real-mode,
>> say the last megabyte visible in real-mode.
>> 
>> What's the easiest way to have linux ignore the megabyte starting at 15M?
> 
> Note that real-mode can only access the first megabyte (*) and not the first
> 16. 16MB is the 16-bit protected mode (286) limit.
> 
> (*) well, the first 1M + 64K - 16 bytes using segment  assuming A20 is
> enabled and x > 1 in x86...

Interrupt 15h, function 87h allows copying from/to extended memory.
You might like to look into Ralf Brown's Interrupt List for more details.

You could also cpio-gzip the data and append it to the initramfs.
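
As a side note on the real-mode limit quoted above: a real-mode linear address is segment * 16 + offset, so with A20 enabled the highest reachable byte comes from the largest segment and offset, 0xFFFF:0xFFFF. A small sketch of that arithmetic follows; the function name real_mode_linear is made up for this example.

```c
#include <stdint.h>

/*
 * Real-mode address translation: linear = segment * 16 + offset.
 * This models the A20-enabled case, where the sum is not wrapped at 1M.
 */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
	return ((uint32_t)segment << 4) + offset;
}
```

real_mode_linear(0xFFFF, 0xFFFF) gives 0x10FFEF, the last addressable byte, so the addressable range is 0x10FFF0 bytes, which is 1M + 64K - 16.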
-- 
Fun things to slip into your budget
Does that line item say 'Personal Massage System' Oops, it's supposed to be
'Message'. Go ahead and sign the authorization, Boss; I'll correct it later.
(Like Hell I will)

Re: [PATCH] cciss: Fix warnings during compilation under 32bitenvironment

2007-04-20 Thread John Anthony Kazos Jr.

On Fri, 20 Apr 2007, Andrew Morton wrote:

> On Fri, 20 Apr 2007 16:20:59 -0400
> James Bottomley <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2007-04-20 at 12:30 -0700, Andrew Morton wrote:
> > > On Fri, 20 Apr 2007 14:50:06 -0400
> > > James Bottomley <[EMAIL PROTECTED]> wrote:
> > > 
> > > > > CONFIG_LBD=y gives us an additional 3kb of instructions on i386
> > > > > allnoconfig.  Other architectures might do less well.  It's not a huge
> > > > > difference, but that's the way in which creeping bloatiness happens.
> > > > 
> > > > OK, sure, but if we really care about this saving, then unconditionally
> > > > casting to u64 is therefore wrong as well ... this is starting to open
> > > > quite a large can of worms ...
> > > > 
> > > > For the record, if we have to do this, I fancy sector_upper_32() ... we
> > > > should already have some similar accessor for dma_addr_t as well.
> > > 
> > > hm.  How about this?
> > > 
> > > --- a/include/linux/kernel.h~upper-32-bits
> > > +++ a/include/linux/kernel.h
> > > @@ -40,6 +40,17 @@ extern const char linux_proc_banner[];
> > >  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> > >  #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> > >  
> > > +/**
> > > + * upper_32_bits - return bits 32-63 of a number
> > > + * @n: the number we're accessing
> > > + *
> > > + * A basic shift-right of a 64- or 32-bit quantity.  Use this to suppress
> > > + * the "right shift count >= width of type" warning when that quantity is
> > > + * 32-bits.
> > > + */
> > > +#define upper_32_bits(n) (((u64)(n)) >> 32)
> > 
> > Won't this have the unwanted side effect of promoting everything in a
> > calculation to long long on 32 bit platforms, even if n was only 32
> > bits?
> 
> bummer.
> 
> > > +
> > > +
> > >  #define  KERN_EMERG  "<0>"   /* system is unusable               */
> > >  #define  KERN_ALERT  "<1>"   /* action must be taken immediately */
> > >  #define  KERN_CRIT   "<2>"   /* critical conditions              */
> > > _
> > > 
> > > It seems to generate the desired code.  I avoided Alan's ((n >> 31) >> 1)
> > > trick because it'll generate peculiar results with signed 64-bit
> > > quantities.
> > 
> > I've seen the trick done similarly with ((n >> 16) >> 16) which
> > shouldn't have the issue.
> 
> That works if we know the caller is treating the return value as 32 bits,
> but we don't know that.
> 
> If we have
> 
> #define upper_32_bits(x)  ((x >> 16) >> 16)
> 
> then
> 
>   upper_32_bits(0x)
> 
> will return 0x if it's treated as 32-bits, but it'll return
> 0x if the caller is using 64-bits.
> 
> I spose
> 
> #define upper_32_bits(x)  ((u32)((x >> 16) >> 16))
> 
> will do the trick.

What about this?

#define upper_32_bits(x) (sizeof(x) == 8 ? (u64)(x) >> 32 : 0)

The u64 cast prevents the sign bit from being carried over and therefore 
eliminates the need for a subsequent cast to u32 since the upper 32 of the 
result will be 0. Shouldn't be any case where an integer gets promoted if 
64 bits is the largest possible promotion.

Assuming, of course, I'm not an idiot. Which I somewhat frequently am.
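
For comparison, a user-space sketch of the three variants discussed in this thread. The macro names with suffixes (_cast, _shift, _sizeof) and the stdint-based u32/u64 typedefs are assumptions made for this example so all three can coexist in one file; in the thread each variant is simply called upper_32_bits.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Variant from the quoted patch: always widen to u64, then shift. */
#define upper_32_bits_cast(n)   (((u64)(n)) >> 32)

/* Variant from the quoted follow-up: two 16-bit shifts, truncated to u32. */
#define upper_32_bits_shift(n)  ((u32)(((n) >> 16) >> 16))

/* Variant proposed in this message: shift only if the operand is 8 bytes. */
#define upper_32_bits_sizeof(n) (sizeof(n) == 8 ? (u64)(n) >> 32 : 0)

/*
 * Note: the conditional expression in the _sizeof variant mixes u64 and
 * int operands, so its result type is u64 even for a 32-bit operand.
 */
```

All three agree on unsigned operands. They differ for a negative 32-bit value: the _cast variant sign-extends first and returns 0xffffffff for (int32_t)-1, while the _sizeof variant returns 0. The _shift variant is left out of that comparison because right-shifting a negative value is implementation-defined in C.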

Re: Acecad USB Tablet: usbmouse takeover and odd motion

2007-04-20 Thread Giuseppe Bilotta


On 4/20/07, Vojtech Pavlik <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 20, 2007 at 06:09:55PM +0200, Giuseppe Bilotta wrote:
> > On 4/20/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> > >On 4/20/07, Giuseppe Bilotta <[EMAIL PROTECTED]> wrote:
> > >>
> > >> Sorry, it seems I was wrong, it's not usbhid but usbmouse taking over.
> > >> After a fresh plug (e.g. at bootup) I get the following:
> > >>
> > >
> > >Well, the question is - why do you have usbmouse module on your system?
> >
> > Stock Debian kernel 2.6.18 comes with it.
> >
> > With my custom kernels I can probably skip compiling it at all, if you
> > so suggest; should I blacklist it for the distro kernel? Or is there a
> > chance that some random USB mouse plugged in would fail to function by
> > doing so?
>
> usbmouse and usbkbd are only intended for embedded systems where the
> full usbhid doesn't fit and for testing purposes: Normal distros
> shouldn't have them enabled.


Oh, I see. I'll blacklist those modules, maybe also issue a ticket on
the Debian BTS.

Thanks all.

--
Giuseppe "Oblomov" Bilotta

Re: [RFC PATCH(experimental) 2/2] Fix freezer-kthread_stop race

2007-04-20 Thread Rafael J. Wysocki

On Friday, 20 April 2007 23:20, Oleg Nesterov wrote:
> On 04/20, Gautham R Shenoy wrote:
> >
> > On Fri, Apr 20, 2007 at 10:54:36AM +0200, Rafael J. Wysocki wrote:
> > > 
> > > Hmm, can't we do something like this instead:
> > > 
> > > ---
> > >  kernel/kthread.c |   10 ++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > Index: linux-2.6.21-rc7/kernel/kthread.c
> > > ===
> > > --- linux-2.6.21-rc7.orig/kernel/kthread.c
> > > +++ linux-2.6.21-rc7/kernel/kthread.c
> > > @@ -13,6 +13,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > > 
> > >  /*
> > > @@ -232,6 +233,15 @@ int kthread_stop(struct task_struct *k)
> > > 
> > >   /* Now set kthread_should_stop() to true, and wake it up. */
> > >   kthread_stop_info.k = k;
> > > + if (!(current->flags & PF_NOFREEZE)) {
> > > + /* If we are freezable, the freezer will wait for us */
> > > + task_lock(k);
> > > + k->flags |= PF_NOFREEZE;
> > > + if (frozen(k))
> > > + k->flags &= ~PF_FROZEN;
> > > +
> > > + task_unlock(k);
> > > + }
> > 
> > Yes, we can do this for now since the tasks have only two freeze states,
> > namely Freezeable and Non Freezeable. 
> 
> No, we can't change k->flags, k owns its ->flags, and it is not atomic.

Yes, but if we move PF_FROZEN to a separate field in task_struct with
appropriate locking, then it won't be a problem any more IMO.
 
> Rafael, may I suggest you to document task_lock() in thaw_process() ? This
> looks really confusing, as if task_lock() protects "p->flags &= ~PF_FROZEN".
> 
> Actually, task_lock() is needed to prevent the race with refrigerator()
> when the freezing fails, but this is not obvious.

Sure, I will.

Greetings,
Rafael

Re: [patch] CFS scheduler, v4

2007-04-20 Thread mdew .


Any chance of supporting 2.6.20?

On 4/21/07, Ingo Molnar <[EMAIL PROTECTED]> wrote:


> i'm pleased to announce release -v4 of the CFS patchset. The patch
> against v2.6.21-rc7 can be downloaded from:
>
> http://redhat.com/~mingo/cfs-scheduler/
>
> this CFS release too is mainly about fixing regressions and improving
> interactivity, so the rate of change is relatively low:
>
> 11 files changed, 136 insertions(+), 72 deletions(-)
>
> in particular the preemption fix could resolve the 'desktop slows down
> under IO load' reports and the 'firefox does not switch tabs fast
> enough' reports as well. The suspend2 crash and the yield related
> Kaffeine hangs should be resolved as well.
>
> Changes since -v3:
>
>  - usability fix: automatic renicing of kernel threads such as keventd,
>    OOM tasks and tasks doing privileged hardware access (such as Xorg).
>    (This is a substitute for group scheduling until the group scheduling
>    details have been worked out.)
>
>  - bugfix: buggy yield() caused suspend2 problems
>
>  - preemption fix: it caused desktop app latencies
>
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,
>
> Ingo

Re: [patch] CFS scheduler, v4

2007-04-20 Thread Gene Heskett

On Friday 20 April 2007, Ingo Molnar wrote:
>i'm pleased to announce release -v4 of the CFS patchset. The patch
>against v2.6.21-rc7 can be downloaded from:
>
>http://redhat.com/~mingo/cfs-scheduler/
>
>this CFS release too is mainly about fixing regressions and improving
>interactivity, so the rate of change is relatively low:
>
>11 files changed, 136 insertions(+), 72 deletions(-)
>
>in particular the preemption fix could resolve the 'desktop slows down
>under IO load' reports and the 'firefox does not switch tabs fast
>enough' reports as well. The suspend2 crash and the yield related
>Kaffeine hangs should be resolved as well.
>
>Changes since -v3:
>
> - usability fix: automatic renicing of kernel threads such as keventd,
>   OOM tasks and tasks doing privileged hardware access (such as Xorg).
>   (This is a substitute for group scheduling until the group scheduling
>details have been worked out.)
>
> - bugfix: buggy yield() caused suspend2 problems
>
> - preemption fix: it caused desktop app latencies
>
>As usual, any sort of feedback, bugreport, fix and suggestion is more
>than welcome,

I've been running this one for several hours now, with amanda running in the
background due to a typo in one of my scripts, so now it's playing catchup.

This one is another keeper IMO, or as we are fond of saying around here, it's
good enough for the girls I go with.  If this isn't the best one so far, it's
very very close and I'm getting pickier.  kmail is the only thing that's
lagging, and that's just kmail, which I believe is single threaded.  Even
with gzip eating 95% of the cpu, graphics animations like the cards in
patience are moving at at least 80% speed.  Nice, keep this one and use it
for the reference.

>   Ingo



-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
In order to dial out, it is necessary to broaden one's dimension.

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Rik van Riel


Andrew Morton wrote:


> I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".
>
> - Nick's patch also will help this problem.  It could be that your patch
>   no longer offers a 2x speedup when combined with Nick's patch.
>
>   It could well be that the combination of the two is even better, but it
>   would be nice to firm that up a bit.

I'll test that.

>   I do go on about that.  But we're adding page flags at about one per
>   year, and when we run out we're screwed - we'll need to grow the
>   pageframe.

If you want, I can take a look at folding this into the
->mapping pointer.  I can guarantee you it won't be
pretty, though :)

> - I need to update your patch for Nick's patch.  Please confirm that
>   down_read(mmap_sem) is sufficient for MADV_FREE.

It is.  MADV_FREE needs no more protection than MADV_DONTNEED.

> Stylistic nit:
>
>> +   if (PageLazyFree(page) && !migration) {
>> +   /* There is new data in the page.  Reinstate it. */
>> +   if (unlikely(pte_dirty(pteval))) {
>> +   set_pte_at(mm, address, pte, pteval);
>> +   ret = SWAP_FAIL;
>> +   goto out_unmap;
>> +   }
>
> The comment should be inside the second `if' statement.  As it is, It
> looks like we reinstate the page if (PageLazyFree(page) && !migration).

Want me to move it?

--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

Re: [PATCH] ieee1394: update MAINTAINERS database

2007-04-20 Thread Ben Collins

On Fri, 2007-04-20 at 23:21 +0200, Stefan Richter wrote:
> - update Ben's address
>   - replace Ben's contact by mine as raw1394's 2nd contact
>   - eth1394's and pcilynx's maintenance doesn't really differ from that
> of other parts of the stack like video1394
> 
> Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
> ---
> 
> Ben, is this correct?

Looks good to me.

-- 
Ubuntu:http://www.ubuntu.com/
Linux1394: http://www.linux1394.org/


Re: Getting the new RxRPC patches upstream

2007-04-20 Thread Oleg Nesterov

On 04/20, Andrew Morton wrote:
>
> On Fri, 20 Apr 2007 11:41:46 +0100
> David Howells <[EMAIL PROTECTED]> wrote:
> 
> > There are only two non-net patches that AF_RXRPC depends on:
> > 
> >  (1) The key facility changes.  That's all my code anyway, and shouldn't be a
> >      problem to merge unless someone else has put some changes in there that I
> >      don't know about.
> > 
> >  (2) try_to_cancel_delayed_work().  I suppose I could use
> >      cancel_delayed_work() instead, but that's less efficient as it waits for
> >      the timer completion function to finish.
> 
> There are significant workqueue changes in -mm and I plan to send them
> in for 2.6.22.  I doubt if there's anything in there which directly
> affects cancel_delayed_work(), but making changes of this nature against
> 2.6.21 might lead to grief.

I think it is better to use cancel_delayed_work(), but change it to use
del_timer(). I believe cancel_delayed_work() doesn't need del_timer_sync().

We only care when del_timer() returns true. In that case, if the timer
function still runs (possible for single-threaded wqs), it has already
passed __queue_work().

Oleg.


Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2

2007-04-20 Thread Ulrich Drepper


On 4/20/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

> OK, we need to flesh this out a lot please.  People often get confused
> about what our MADV_DONTNEED behaviour is.


Well, there's not really much to flesh out.  The current MADV_DONTNEED
is useful in some situations.  The behavior cannot be changed, even
glibc will rely on it for the case when MADV_FREE is not supported.

What might be nice to have is to have a POSIX-compliant
POSIX_MADV_DONTNEED implementation.  We currently do nothing which is
OK since no test suite can detect that.  But some code might want to
use the real behavior and we're missing an optimization possibility.

Just for reference: the current MADV_DONTNEED behavior is to throw away
data in the range.  The POSIX_MADV_DONTNEED behavior is to never lose data.
I.e., file backed data is written back, anon data is at most swapped
out.
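
The "throw away data" behavior can be observed from user space. The following is a minimal sketch for Linux, assuming a private anonymous mapping; the function name byte_after_dontneed is made up for this example. It writes a byte, calls madvise() with MADV_DONTNEED, and reads the byte back.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Returns the first byte of a private anonymous page after it was written
 * and then dropped with MADV_DONTNEED, or -1 if mmap() or madvise() fails.
 */
int byte_after_dontneed(void)
{
	const size_t len = 4096;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int value;

	if (p == MAP_FAILED)
		return -1;

	p[0] = 0xAB;                    /* dirty the page */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}
	value = p[0];                   /* the old contents are gone here */
	munmap(p, len);
	return value;
}
```

On Linux the read after the madvise() call faults in a fresh zero-filled page, so the function returns 0 rather than 0xAB. Under the POSIX_MADV_DONTNEED semantics described above, the written data would have to survive.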

[PATCH] ieee1394: update MAINTAINERS database

2007-04-20 Thread Stefan Richter

  - update Ben's address
  - replace Ben's contact by mine as raw1394's 2nd contact
  - eth1394's and pcilynx's maintenance doesn't really differ from that
of other parts of the stack like video1394

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---

Ben, is this correct?


 MAINTAINERS |   22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

Index: linux/MAINTAINERS
===
--- linux.orig/MAINTAINERS
+++ linux/MAINTAINERS
@@ -1681,7 +1681,7 @@ S:Maintained
 
 IEEE 1394 SUBSYSTEM
 P: Ben Collins
-M: [EMAIL PROTECTED]
+M: [EMAIL PROTECTED]
 P: Stefan Richter
 M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
@@ -1689,25 +1689,11 @@ W:  http://www.linux1394.org/
 T: git kernel.org:/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git
 S: Maintained
 
-IEEE 1394 IPV4 DRIVER (eth1394)
-P: Stefan Richter
-M: [EMAIL PROTECTED]
-L: [EMAIL PROTECTED]
-S: Odd Fixes
-
-IEEE 1394 PCILYNX DRIVER
-P: Jody McIntyre
-M: [EMAIL PROTECTED]
-P: Stefan Richter
-M: [EMAIL PROTECTED]
-L: [EMAIL PROTECTED]
-S: Odd Fixes
-
-IEEE 1394 RAW I/O DRIVER
-P: Ben Collins
-M: [EMAIL PROTECTED]
+IEEE 1394 RAW I/O DRIVER (raw1394)
 P: Dan Dennedy
 M: [EMAIL PROTECTED]
+P: Stefan Richter
+M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
 S: Maintained
 

-- 
Stefan Richter
-=-=-=== -=-- =-=--
http://arcgraph.de/sr/


Re: [RFC PATCH(experimental) 2/2] Fix freezer-kthread_stop race

2007-04-20 Thread Oleg Nesterov

On 04/20, Gautham R Shenoy wrote:
>
> On Fri, Apr 20, 2007 at 10:54:36AM +0200, Rafael J. Wysocki wrote:
> > 
> > Hmm, can't we do something like this instead:
> > 
> > ---
> >  kernel/kthread.c |   10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > Index: linux-2.6.21-rc7/kernel/kthread.c
> > ===
> > --- linux-2.6.21-rc7.orig/kernel/kthread.c
> > +++ linux-2.6.21-rc7/kernel/kthread.c
> > @@ -13,6 +13,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> > 
> >  /*
> > @@ -232,6 +233,15 @@ int kthread_stop(struct task_struct *k)
> > 
> > /* Now set kthread_should_stop() to true, and wake it up. */
> > kthread_stop_info.k = k;
> > +   if (!(current->flags & PF_NOFREEZE)) {
> > +   /* If we are freezable, the freezer will wait for us */
> > +   task_lock(k);
> > +   k->flags |= PF_NOFREEZE;
> > +   if (frozen(k))
> > +   k->flags &= ~PF_FROZEN;
> > +
> > +   task_unlock(k);
> > +   }
> 
> Yes, we can do this for now since the tasks have only two freeze states,
> namely Freezeable and Non Freezeable. 

No, we can't change k->flags, k owns its ->flags, and it is not atomic.

Rafael, may I suggest you to document task_lock() in thaw_process() ? This
looks really confusing, as if task_lock() protects "p->flags &= ~PF_FROZEN".

Actually, task_lock() is needed to prevent the race with refrigerator()
when the freezing fails, but this is not obvious.
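
To make the "->flags is not atomic" point concrete, here is a deterministic user-space sketch of a lost update. It replays one possible interleaving of two read-modify-write sequences on a plain flags word. The bit values and all function names are illustrative assumptions, not the kernel's definitions.

```c
/* Illustrative bit values only; the kernel's definitions differ. */
#define MODEL_PF_NOFREEZE 0x1u
#define MODEL_PF_FROZEN   0x2u

/* One in-flight read-modify-write: the value held in a "register". */
struct model_rmw {
	unsigned int tmp;
};

void model_load(struct model_rmw *op, const unsigned int *flags)
{
	op->tmp = *flags;
}

void model_store(const struct model_rmw *op, unsigned int *flags)
{
	*flags = op->tmp;
}

/*
 * The owning task sets MODEL_PF_FROZEN while another context sets
 * MODEL_PF_NOFREEZE, with both loads happening before either store.
 * Returns the resulting flags word.
 */
unsigned int model_interleaved_update(unsigned int flags)
{
	struct model_rmw owner, other;

	model_load(&owner, &flags);     /* owner reads the old value */
	model_load(&other, &flags);     /* other context reads it too */
	owner.tmp |= MODEL_PF_FROZEN;
	other.tmp |= MODEL_PF_NOFREEZE;
	model_store(&owner, &flags);    /* owner writes back first */
	model_store(&other, &flags);    /* overwrites it, FROZEN is lost */
	return flags;
}
```

Starting from 0, the result contains only MODEL_PF_NOFREEZE: the owner's update has been silently overwritten. This is the kind of race that makes modifying another task's flags word unsafe without some form of serialization.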

Oleg.


Re: [PATCH] cciss: Fix warnings during compilation under 32bitenvironment

2007-04-20 Thread Andrew Morton

On Fri, 20 Apr 2007 16:20:59 -0400
James Bottomley <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-04-20 at 12:30 -0700, Andrew Morton wrote:
> > On Fri, 20 Apr 2007 14:50:06 -0400
> > James Bottomley <[EMAIL PROTECTED]> wrote:
> > 
> > > > CONFIG_LBD=y gives us an additional 3kb of instructions on i386
> > > > allnoconfig.  Other architectures might do less well.  It's not a huge
> > > > difference, but that's the way in which creeping bloatiness happens.
> > > 
> > > OK, sure, but if we really care about this saving, then unconditionally
> > > casting to u64 is therefore wrong as well ... this is starting to open
> > > quite a large can of worms ...
> > > 
> > > For the record, if we have to do this, I fancy sector_upper_32() ... we
> > > should already have some similar accessor for dma_addr_t as well.
> > 
> > hm.  How about this?
> > 
> > --- a/include/linux/kernel.h~upper-32-bits
> > +++ a/include/linux/kernel.h
> > @@ -40,6 +40,17 @@ extern const char linux_proc_banner[];
> >  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> > >  #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> >  
> > +/**
> > + * upper_32_bits - return bits 32-63 of a number
> > + * @n: the number we're accessing
> > + *
> > + * A basic shift-right of a 64- or 32-bit quantity.  Use this to suppress
> > + * the "right shift count >= width of type" warning when that quantity is
> > + * 32-bits.
> > + */
> > +#define upper_32_bits(n) (((u64)(n)) >> 32)
> 
> Won't this have the unwanted side effect of promoting everything in a
> calculation to long long on 32 bit platforms, even if n was only 32
> bits?

bummer.

> > +
> > +
> > >  #define KERN_EMERG  "<0>"   /* system is unusable   */
> > >  #define KERN_ALERT  "<1>"   /* action must be taken immediately */
> > >  #define KERN_CRIT   "<2>"   /* critical conditions  */
> > _
> > 
> > It seems to generate the desired code.  I avoided Alan's ((n >> 31) >> 1)
> > trick because it'll generate peculiar results with signed 64-bit
> > quantities.
> 
> I've seen the trick done similarly with ((n >> 16) >> 16) which
> shouldn't have the issue.

That works if we know the caller is treating the return value as 32 bits,
but we don't know that.

If we have

#define upper_32_bits(x)  ((x >> 16) >> 16)

then

upper_32_bits(0x)

will return 0x if it's treated as 32-bits, but it'll return
0x if the caller is using 64-bits.

I spose

#define upper_32_bits(x)  ((u32)((x >> 16) >> 16))

will do the trick.
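
A minimal userspace sketch of that last variant, for anyone who wants to try it outside the kernel. The u32/u64 typedefs are stand-ins built on <stdint.h>, split_addr() is a hypothetical helper, and the macro argument is parenthesized here, which the one-liner above does not do.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's u32/u64 typedefs (assumption). */
typedef uint32_t u32;
typedef uint64_t u64;

/*
 * Two 16-bit shifts never reach the width of a 32-bit operand, so the
 * "right shift count >= width of type" warning does not trigger, and the
 * final cast keeps the result at 32 bits whatever the caller does with it.
 */
#define upper_32_bits(x) ((u32)(((x) >> 16) >> 16))

/* Hypothetical helper: split a 64-bit address into two 32-bit halves. */
static void split_addr(u64 addr, u32 *hi, u32 *lo)
{
	*hi = upper_32_bits(addr);
	*lo = (u32)addr;
}
```

For a 32-bit argument the macro evaluates to 0 and the expression stays 32 bits wide, so nothing around it is promoted to long long.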



Re: [RFC PATCH(experimental) 2/2] Fix freezer-kthread_stop race

2007-04-20 Thread Oleg Nesterov

On 04/19, Gautham R Shenoy wrote:
>
> @@ -63,12 +74,16 @@ void refrigerator(void)
>   recalc_sigpending(); /* We sent fake signal, clean it up */
>   spin_unlock_irq(&current->sighand->siglock);
>  
> + task_lock(current);
>   for (;;) {
>   set_current_state(TASK_UNINTERRUPTIBLE);
>   if (!frozen(current))
>   break;
> + task_unlock(current);
>   schedule();
> + task_lock(current);
>   }
> + task_unlock(current);
>   pr_debug("%s left refrigerator\n", current->comm);
>   current->state = save;

Just curious, why this change?

> +int hold_freezer_for_task(struct task_struct *p)
> +{
> + int ret = 0;
> + spin_lock(&freezer_status.lock);
> + if (freezer_status.count >= 0)
> + {
> + set_tsk_thread_flag(p, TIF_FREEZER_HELD);
> + thaw_process(p);
> + freezer_status.count++;
> + ret = 1;
> + }
> + spin_unlock(&freezer_status.lock);
> +
> + return ret;
> +}

I think this can work if it is used only in kthread_stop(). But what if
another task wants to do hold_freezer_for_task(p) ? freezer_status.count
is recursive, but TIF_FREEZER_HELD is not. IOW, I believe this is not
generic enough.

Also, you are planning to add different freezing states (FE_HOTPLUG_CPU,
FE_SUSPEND, etc). In that case each of them needs a separate .count, because
it should be negative when try_to_freeze_tasks() returns. Now consider
the case when we are doing freeze_processes(FE_A | FE_B) ...

Oleg.


Re: [RFC PATCH(experimental) 2/2] Fix freezer-kthread_stop race

2007-04-20 Thread Rafael J. Wysocki

On Friday, 20 April 2007 20:31, Ingo Molnar wrote:
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > > I mean, we already have four of them (PF_NOFREEZE, PF_FROZEN, 
> > > > PF_FREEZER_SKIP, TIF_FREEZE), and you will need to introduce two 
> > > > more for the freezer-based CPU hotplug, so if yet another one is 
> > > > needed, that will make up almost a separate u8 field ...
> > > 
> > > I am perfectly ok with it. But I am not sure if everybody would 
> > > agree to have another field in the task struct, though in this case 
> > > it does make sense :-)
> > 
> > > OK by me.  You might want to consider making that field's locking 
> > protocol be set_bit(), clear_bit(), etc rather than task_lock().
> 
> is OK to me too, the extra field isnt a problem.

OK, so I'll try to prepare a patch introducing it over the weekend. :-)

Greetings,
Rafael

Re: [PATCH] lazy freeing of memory through MADV_FREE 2/2

2007-04-20 Thread Andrew Morton

On Thu, 19 Apr 2007 17:15:28 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> Restore MADV_DONTNEED to its original Linux behaviour.  This is still
> not the same behaviour as POSIX, but applications may be depending on
> the Linux behaviour already. Besides, glibc catches POSIX_MADV_DONTNEED
> and makes sure nothing is done...

OK, we need to flesh this out a lot please.  People often get confused
about what our MADV_DONTNEED behaviour is.  I regularly forget, then look
at the code, then get it wrong.  That's for mainline, let alone older
kernels whose behaviour is gawd-knows-what.

So...  For the changelog (and the manpage) could we please have a full
description of the 2.6.21 behaviour and the 2.6.21-post-rik behaviour (and
the 2.4 behaviour, if it differs at all)?  Also some code comments to
demystify all of this once and for all?

Thanks.

[PATCH] dev_dbg: check dev_dbg() arguments

2007-04-20 Thread Dan Williams

Duplicate what Zach Brown did for pr_debug in commit
8b2a1fd1b394c60eaa2587716102dd5e9b4e5990

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 include/linux/device.h |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5cf30e9..b6825d0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -554,7 +554,11 @@ extern const char *dev_driver_string(struct device *dev);
 #define dev_dbg(dev, format, arg...)   \
dev_printk(KERN_DEBUG , dev , format , ## arg)
 #else
-#define dev_dbg(dev, format, arg...) do { (void)(dev); } while (0)
+static inline int __attribute__ ((format (printf, 2, 3)))
+dev_dbg(struct device * dev, const char * fmt, ...)
+{
+   return 0;
+}
 #endif
 
 #define dev_err(dev, format, arg...)   \
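
For illustration, a standalone sketch of why the stub is written as a function with a printf format attribute rather than an empty macro. This is a userspace approximation that assumes GCC or Clang; struct device here is a stand-in with a single name field, and toy_probe() is a hypothetical caller.

```c
#include <stdio.h>

/* Stand-in for the kernel's struct device (assumption for this sketch). */
struct device {
	const char *name;
};

#ifdef DEBUG
#define dev_dbg(dev, format, arg...) \
	printf("%s: " format, (dev)->name, ## arg)
#else
/*
 * Compiles to nothing useful at run time, but the format attribute makes
 * the compiler check the format string against the arguments even when
 * DEBUG is off.
 */
static inline int __attribute__ ((format (printf, 2, 3)))
dev_dbg(struct device *dev, const char *fmt, ...)
{
	(void)dev;
	(void)fmt;
	return 0;
}
#endif

/* Hypothetical caller: the arguments are type-checked in both builds. */
static int toy_probe(struct device *dev, int irq)
{
	dev_dbg(dev, "using irq %d\n", irq);
	return irq;
}
```

With the old empty macro, a call such as dev_dbg(dev, "irq %s\n", irq) would only be diagnosed in DEBUG builds; with the stub, -Wformat flags it in every build.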

Re: [PATCH] lazy freeing of memory through MADV_FREE

2007-04-20 Thread Andrew Morton

On Tue, 17 Apr 2007 03:15:51 -0400
Rik van Riel <[EMAIL PROTECTED]> wrote:

> Make it possible for applications to have the kernel free memory
> lazily.  This reduces a repeated free/malloc cycle from freeing
> pages and allocating them, to just marking them freeable.  If the
> application wants to reuse them before the kernel needs the memory,
> not even a page fault will happen.
> 
> This patch, together with Ulrich's glibc change, increases
> MySQL sysbench performance by a factor of 2 on my quad core
> test system.
> 
> Signed-off-by: Rik van Riel <[EMAIL PROTECTED]>
> 
> ---
> Ulrich Drepper has test glibc RPMS for this functionality at:
> 
>  http://people.redhat.com/drepper/rpms
> 
> Andrew, I have stress tested this patch for a few days now and
> have not been able to find any more bugs.  I believe it is ready
> to be merged in -mm, and upstream at the next merge window.
> 
> When the patch goes upstream, I will submit a small follow-up
> patch to revert MADV_DONTNEED behaviour to what it did previously
> and have the new behaviour trigger only on MADV_FREE: at that
> point people will have to get new test RPMs of glibc.
> 
> 

I've also merged Nick's "mm: madvise avoid exclusive mmap_sem".

- Nick's patch also will help this problem.  It could be that your patch
  no longer offers a 2x speedup when combined with Nick's patch.

  It could well be that the combination of the two is even better, but it
  would be nice to firm that up a bit.  Chewing a page flag is an expensive
  thing to do.

  I do go on about that.  But we're adding page flags at about one per
  year, and when we run out we're screwed - we'll need to grow the
  pageframe.

- I need to update your patch for Nick's patch.  Please confirm that
  down_read(mmap_sem) is sufficient for MADV_FREE.


Stylistic nit:

> + if (PageLazyFree(page) && !migration) {
> + /* There is new data in the page.  Reinstate it. */
> + if (unlikely(pte_dirty(pteval))) {
> + set_pte_at(mm, address, pte, pteval);
> + ret = SWAP_FAIL;
> + goto out_unmap;
> + }

The comment should be inside the second `if' statement.  As it is, it
looks like we reinstate the page if (PageLazyFree(page) && !migration).


Re: why UDF have so ugly filesize limit?

2007-04-20 Thread Jan Kara

> from fs/udf/super.c:
> in function udf_fill_super
> sb->s_maxbytes = 1<<30; (1 GB)
> 
> Why sb->s_maxbytes is not equal to MAX_LFS_FILESIZE?
  Because UDF had some flaws and a user could crash the kernel with a
larger file size. The -mm kernel has patches fixing the flaw and also
raising the limit back to MAX_LFS_FILESIZE.

> So, in include/linux/fs.h written that the filesystems should put that
> (MAX_LFS_FILESIZE) into their s_maxbytes, otherwise bad things can
> happen in VM.
  Bad things can happen only if you set it to more than
MAX_LFS_FILESIZE. With smaller values only users are disappointed ;).

Honza
-- 
Jan Kara <[EMAIL PROTECTED]>
SuSE CR Labs

[PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure

2007-04-20 Thread Linas Vepstas


Implement the so-called "first failure data capture" (FFDC) for the
symbios PCI error recovery.  After a PCI error event is reported,
the driver requests that MMIO be enabled. Once enabled, it 
then reads and dumps assorted status registers, and concludes
by requesting the usual reset sequence.

(includes a whitespace fix for bad indentation).
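
The callback order described above can be sketched as a toy userspace model. Everything here is hypothetical (the toy_* names, the enum, the counters); it only mimics the sequence error_detected -> mmio_enabled -> slot_reset -> resume and is not the PCI core's actual logic.

```c
/* Toy result codes, loosely modelled on the PCI_ERS_RESULT_* values. */
enum toy_ers_result {
	TOY_CAN_RECOVER,
	TOY_NEED_RESET,
	TOY_RECOVERED,
	TOY_DISCONNECT
};

struct toy_dev {
	int regs_dumped;	/* times the register dump ran */
	int resets;		/* times the slot was reset */
	int resumed;		/* set once normal operation resumes */
};

struct toy_err_handlers {
	enum toy_ers_result (*error_detected)(struct toy_dev *);
	enum toy_ers_result (*mmio_enabled)(struct toy_dev *);
	enum toy_ers_result (*slot_reset)(struct toy_dev *);
	void (*resume)(struct toy_dev *);
};

/* Ask for MMIO to be re-enabled so that a register dump can be taken. */
static enum toy_ers_result toy_error_detected(struct toy_dev *d)
{
	(void)d;
	return TOY_CAN_RECOVER;
}

/* MMIO works again: capture the first-failure data, then ask for a reset. */
static enum toy_ers_result toy_mmio_enabled(struct toy_dev *d)
{
	d->regs_dumped++;
	return TOY_NEED_RESET;
}

static enum toy_ers_result toy_slot_reset(struct toy_dev *d)
{
	d->resets++;
	return TOY_RECOVERED;
}

static void toy_resume(struct toy_dev *d)
{
	d->resumed = 1;
}

static const struct toy_err_handlers toy_handlers = {
	.error_detected = toy_error_detected,
	.mmio_enabled = toy_mmio_enabled,
	.slot_reset = toy_slot_reset,
	.resume = toy_resume,
};

/* Simplified stand-in for the platform code that drives the callbacks. */
static enum toy_ers_result toy_recover(const struct toy_err_handlers *h,
				       struct toy_dev *d)
{
	enum toy_ers_result r = h->error_detected(d);

	if (r == TOY_CAN_RECOVER && h->mmio_enabled)
		r = h->mmio_enabled(d);
	if (r == TOY_NEED_RESET)
		r = h->slot_reset(d);
	if (r == TOY_RECOVERED)
		h->resume(d);
	return r;
}
```

In this model the dump happens exactly once, between the error report and the reset, which is the window the patch uses for the register dump.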

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/scsi/sym53c8xx_2/sym_glue.c |   15 +++
 drivers/scsi/sym53c8xx_2/sym_glue.h |1 +
 drivers/scsi/sym53c8xx_2/sym_hipd.c |   18 ++
 3 files changed, 30 insertions(+), 4 deletions(-)

Index: linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.c
===
--- linux-2.6.21-rc4-git4.orig/drivers/scsi/sym53c8xx_2/sym_glue.c  2007-04-20 12:52:01.0 -0500
+++ linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.c   2007-04-20 15:25:35.0 -0500
@@ -1987,6 +1987,20 @@ static pci_ers_result_t sym2_io_error_de
disable_irq(pdev->irq);
pci_disable_device(pdev);
 
+   /* Request that MMIO be enabled, so register dump can be taken. */
+   return PCI_ERS_RESULT_CAN_RECOVER;
+}
+
+/**
+ * sym2_io_slot_dump -- Enable MMIO and dump debug registers
+ * @pdev: pointer to PCI device
+ */
+static pci_ers_result_t sym2_io_slot_dump (struct pci_dev *pdev)
+{
+   struct sym_hcb *np = pci_get_drvdata(pdev);
+
+   sym_dump_registers(np);
+
/* Request a slot reset. */
return PCI_ERS_RESULT_NEED_RESET;
 }
@@ -2241,6 +2255,7 @@ MODULE_DEVICE_TABLE(pci, sym2_id_table);
 
 static struct pci_error_handlers sym2_err_handler = {
.error_detected = sym2_io_error_detected,
+   .mmio_enabled = sym2_io_slot_dump,
.slot_reset = sym2_io_slot_reset,
.resume = sym2_io_resume,
 };
Index: linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.h
===
--- linux-2.6.21-rc4-git4.orig/drivers/scsi/sym53c8xx_2/sym_glue.h  2007-04-20 12:15:07.0 -0500
+++ linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.h   2007-04-20 15:21:31.0 -0500
@@ -270,5 +270,6 @@ void sym_xpt_async_bus_reset(struct sym_
 void sym_xpt_async_sent_bdr(struct sym_hcb *np, int target);
 int  sym_setup_data_and_start (struct sym_hcb *np, struct scsi_cmnd *csio, struct sym_ccb *cp);
 void sym_log_bus_error(struct sym_hcb *np);
+void sym_dump_registers(struct sym_hcb *np);
 
 #endif /* SYM_GLUE_H */
Index: linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_hipd.c
===
--- linux-2.6.21-rc4-git4.orig/drivers/scsi/sym53c8xx_2/sym_hipd.c  2007-04-20 12:18:59.0 -0500
+++ linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_hipd.c   2007-04-20 15:18:01.0 -0500
@@ -1180,10 +1180,10 @@ static void sym_log_hard_error(struct sy
scr_to_cpu((int) *(u32 *)(script_base + script_ofs)));
}
 
-printf ("%s: regdump:", sym_name(np));
-for (i=0; i<24;i++)
-printf (" %02x", (unsigned)INB_OFF(np, i));
-printf (".\n");
+   printf ("%s: regdump:", sym_name(np));
+   for (i=0; i<24;i++)
+   printf (" %02x", (unsigned)INB_OFF(np, i));
+   printf (".\n");
 
/*
 *  PCI BUS error.
@@ -1192,6 +1192,16 @@ static void sym_log_hard_error(struct sy
sym_log_bus_error(np);
 }
 
+void sym_dump_registers(struct sym_hcb *np)
+{
+   u_short sist;
+   u_char dstat;
+
+   sist = INW(np, nc_sist);
+   dstat = INB(np, nc_dstat);
+   sym_log_hard_error(np, sist, dstat);
+}
+
 static struct sym_chip sym_dev_table[] = {
  {PCI_DEVICE_ID_NCR_53C810, 0x0f, "810", 4, 8, 4, 64,
  FE_ERL}

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen


Ingo Molnar wrote:

( Lets be cautious though: the jury is still out whether people actually 
  like this more than the current approach. While CFS feedback looks 
  promising after a whopping 3 days of it being released [ ;-) ], the 
  test coverage of all 'fairness centric' schedulers, even considering 
  years of availability is less than 1% i'm afraid, and that < 1% was 
  mostly self-selecting. )


All of my testing has been on desktop machines, although in most cases 
they were really loaded desktops which had load avg 10..100 from time to 
time, and none were low memory machines. Up to CFS v3 I thought 
nicksched was my winner, now CFSv3 looks better, by not having stumbles 
under stupid loads.


I have not tested:
  1 - server loads, nntp, smtp, etc
  2 - low memory machines
  3 - uniprocessor systems

I think this should be done before drawing conclusions. Or if someone 
has tried this, perhaps they would report what they saw. People are 
talking about smoothness, but not how many pages per second come out of 
their overloaded web server.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: Acecad USB Tablet: usbmouse takeover and odd motion

2007-04-20 Thread Vojtech Pavlik

On Fri, Apr 20, 2007 at 06:09:55PM +0200, Giuseppe Bilotta wrote:
> On 4/20/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:
> >On 4/20/07, Giuseppe Bilotta <[EMAIL PROTECTED]> wrote:
> >>
> >> Sorry, it seems I was wrong, it's not usbhid but usbmouse taking over.
> >> After a fresh plug (e.g. at bootup) I get the following:
> >>
> >
> >Well, the question is - why do you have usbmouse module on your system?
> 
> Stock Debian kernel 2.6.18 comes with it.
> 
> With my custom kernels I can probably skip compiling it at all, if you
> so suggest; should I blacklist it for the distro kernel? Or is there a
> chance that some random USB mouse plugged in would fail to function by
> doing so?
 
usbmouse and usbkbd are only intended for embedded systems where the
full usbhid doesn't fit and for testing purposes: Normal distros
shouldn't have them enabled.

-- 
Vojtech Pavlik
Director SuSE Labs

Re: [PATCH 7/8] Kconfig: silicon backplane dependency.

2007-04-20 Thread Michael Buesch

On Friday 20 April 2007 13:35, Martin Schwidefsky wrote:
> From: Martin Schwidefsky <[EMAIL PROTECTED]>
> 
> Make the "Sonics Silicon Backplane" menu dependent on the two buses
> it can be found on.
> Goes on top of git-wireless.patch.
> 
> Cc: Michael Buesch <[EMAIL PROTECTED]>
> Cc: John W. Linville <[EMAIL PROTECTED]>
> Signed-off-by: Martin Schwidefsky <[EMAIL PROTECTED]>
> ---
> 
>  drivers/ssb/Kconfig |1 +
>  1 files changed, 1 insertion(+)
> 
> diff -urpN linux-2.6/drivers/ssb/Kconfig linux-2.6-patched/drivers/ssb/Kconfig
> --- linux-2.6/drivers/ssb/Kconfig 2007-04-19 15:24:40.0 +0200
> +++ linux-2.6-patched/drivers/ssb/Kconfig 2007-04-19 15:55:44.0 +0200
> @@ -1,4 +1,5 @@
>  menu "Sonics Silicon Backplane"
> + depends on PCI || PCMCIA

This is wrong. SSB does not depend on PCI or PCMCIA.
SSB can (and does) stand very well on its own feet and
can be the main system bus.
Most Linksys WRT routers work that way. They have no
PCI bus, but an SSB bus instead.

Nevertheless, I'm not sure what your problem really is.
Does an s390 machine with a B44 card exist? I doubt it.
So what about the following: we simply add
a "depends on !S390" to both SSB and B44.
I really think that is the right fix for this.
The patch above is clearly not, as it breaks things for
embedded devices without a PCI or PCMCIA bus.

-- 
Greetings Michael.

Re: [d_path 0/7] Fixes to d_path: Respin

2007-04-20 Thread Miklos Szeredi

> > I gave a chroot example that showed that in the current
> > implementation, you can get pretty random clashes between mounts; there are
> > other cases with lazy unmounts as well.
> 
> Irrelevant as well.  If you create chroot problems it's your problem.
> 
> The fact is that if you have a normal setup the code works fine.  All
> other situations cannot be handled with the current kernel interface.
> 
> This does not give anybody the right to say "since the code doesn't
> always work we can break it completely".  That's completely
> unacceptable.

I'm not sure I understand the situation completely.  What exactly is
broken in libc by removing unreachable mounts from /proc/mounts?

Is it the situation when
 - file descriptor is opened
 - process does chroot
 - process does fstatvfs on file descriptor
?

In that case currently fstatvfs() _usually_ gives the correct results,
but can give wrong results if mount paths accidentally clash in
/proc/mounts?

Also isn't it the case, that fstatvfs() or statvfs() performed within
the chroot could also give incorrect result for a _reachable_ mount if
it clashes with an unreachable mount?

If this is the case, I would think that removing the unreachable
mounts from /proc/mounts, would actually be fixing this second case,
which is more likely to be used anyway.

BTW, this patch, or at least a predecessor, is in -mm, and it very much
feels like the Right Thing(tm).  The /proc/mounts under a chroot
environment actually looks sane, instead of the random crap that it
was previously.

While we should make every effort to keep the kernel interfaces
stable, this shouldn't prevent us from fixing bugs.  And this one is
clearly a bug, even if not a very serious one.

Miklos

[PATCH 1/2]: PCI Error Recovery: Symbios SCSI base support

2007-04-20 Thread Linas Vepstas



Hi Matthew,

After a long hiatus, I took another stab at pci error recovery 
for the symbios. This is very nearly the same patch as before, 
with only an update to enable MWI, and to support chip workarounds.
I think I've addressed all the other issues that came up. Thus,
again, I'll ask that the patch go in (for 2.6.22 of course).


To recap the only outstanding issue:

>> @@ -657,6 +657,10 @@ static irqreturn_t sym53c8xx_intr(int ir
>> + /* Avoid spinloop trying to handle interrupts on frozen device */
>> + if (pci_channel_offline(np->s.device))
>> + return IRQ_HANDLED;
>
>Just wondering ... should we really be returning HANDLED?  What if the
>IRQ is shared?  Will the hardware de-assert the level interrupt when it
>puts the device in reset (ie is this a transitory glitch?), or do we
>have to cope with a screaming interrupt?

This routine *always* returns HANDLED anyway, so this patch does
not change semantics. For a symbios device plugged into a shared
irq line, this is a problem with or without my patch.

Yes, irq's will typically scream until handled. Yes, the device
reset will eventually clear the irq, assuming the system doesn't 
deadlock on a screaming irq. 

--linas

Here's the formal changelog entry:

Various PCI bus errors can be signaled by newer PCI controllers.  
This patch adds the PCI error recovery callbacks to the Symbios 
SCSI device driver.  The patch has been tested, and appears to 
work well.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

--
 drivers/scsi/sym53c8xx_2/sym_glue.c |  136 
 drivers/scsi/sym53c8xx_2/sym_glue.h |4 +
 drivers/scsi/sym53c8xx_2/sym_hipd.c |6 +
 3 files changed, 146 insertions(+)

Index: linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.c
===
--- linux-2.6.21-rc4-git4.orig/drivers/scsi/sym53c8xx_2/sym_glue.c  2007-04-20 12:07:38.0 -0500
+++ linux-2.6.21-rc4-git4/drivers/scsi/sym53c8xx_2/sym_glue.c   2007-04-20 12:52:01.0 -0500
@@ -657,6 +657,10 @@ static irqreturn_t sym53c8xx_intr(int ir
unsigned long flags;
struct sym_hcb *np = (struct sym_hcb *)dev_id;
 
+   /* Avoid spinloop trying to handle interrupts on frozen device */
+   if (pci_channel_offline(np->s.device))
+   return IRQ_HANDLED;
+
if (DEBUG_FLAGS & DEBUG_TINY) printf_debug ("[");
 
spin_lock_irqsave(np->s.host->host_lock, flags);
@@ -726,6 +730,20 @@ static int sym_eh_handler(int op, char *
 
dev_warn(&cmd->device->sdev_gendev, "%s operation started.\n", opname);
 
+   /* We may be in an error condition because the PCI bus
+* went down. In this case, we need to wait until the
+* PCI bus is reset, the card is reset, and only then
+* proceed with the scsi error recovery.  There's no
+* point in hurrying; take a leisurely wait.
+*/
+#define WAIT_FOR_PCI_RECOVERY  35
+   if (pci_channel_offline(np->s.device)) {
+   int finished_reset = wait_for_completion_timeout(
+   &np->s.io_reset_wait, WAIT_FOR_PCI_RECOVERY*HZ);
+   if (!finished_reset)
+   return SCSI_FAILED;
+   }
+
spin_lock_irq(host->host_lock);
/* This one is queued in some place -> to wait for completion */
FOR_EACH_QUEUED_ELEMENT(&np->busy_ccbq, qp) {
@@ -1510,6 +1528,7 @@ static struct Scsi_Host * __devinit sym_
np->maxoffs = dev->chip.offset_max;
np->maxburst= dev->chip.burst_max;
np->myaddr  = dev->host_id;
+   init_completion(&np->s.io_reset_wait);
 
/*
 *  Edit its name.
@@ -1948,6 +1967,116 @@ static void __devexit sym2_remove(struct
attach_count--;
 }
 
+/**
+ * sym2_io_error_detected() -- called when PCI error is detected
+ * @pdev: pointer to PCI device
+ * @state: current state of the PCI slot
+ */
+static pci_ers_result_t sym2_io_error_detected (struct pci_dev *pdev,
+ enum pci_channel_state state)
+{
+   struct sym_hcb *np = pci_get_drvdata(pdev);
+
+   /* If slot is permanently frozen, turn everything off */
+   if (state == pci_channel_io_perm_failure) {
+   sym2_remove(pdev);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   init_completion(&np->s.io_reset_wait);
+   disable_irq(pdev->irq);
+   pci_disable_device(pdev);
+
+   /* Request a slot reset. */
+   return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * sym2_reset_workarounds -- hardware-specific work-arounds
+ *
+ * This routine is similar to sym_set_workarounds(), except
+ * that, at this point, we already know that the device was 
+ * successfully initialized at least once before, and so most
+ * of the steps taken there are un-needed here. 
+ */
+static void sym2_reset_workarounds (struct pci_dev *pdev)
+{
+   u_char revision;
+   u_short status_reg;
+   struct

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-20 Thread Bill Davidsen


Mike Galbraith wrote:

On Tue, 2007-04-17 at 05:40 +0200, Nick Piggin wrote:

On Tue, Apr 17, 2007 at 04:29:01AM +0200, Mike Galbraith wrote:
 

Yup, and progress _is_ happening now, quite rapidly.

Progress as in progress on Ingo's scheduler. I still don't know how we'd
decide when to replace the mainline scheduler or with what.

I don't think we can say Ingo's is better than the alternatives, can we?


No, that would require massive performance testing of all alternatives.


If there is some kind of bakeoff, then I'd like one of Con's designs to
be involved, and mine, and Peter's...


The trouble with a bakeoff is that it's pretty darn hard to get people
to test in the first place, and then comes weighting the subjective and
hard performance numbers.  If they're close in numbers, do you go with
the one which starts the least flamewars or what?

Here we disagree... I picked a scheduler not by running benchmarks, but 
by running loads which piss me off with the mainline scheduler. And then 
I ran the other schedulers for a while to find the things, normal things 
I do, which resulted in bad behavior. And when I found one which had (so 
far) no such cases I called it my winner, but I haven't tested it under 
server load, so I can't begin to say it's "the best."


What we need is for lots of people to run every scheduler in real life, 
and do "worst case analysis" by finding the cases which cause bad 
behavior. And if there were a way to easily choose another scheduler, 
call it pluggable, modular, or Russian Roulette, people who found a worst 
case would report it (aka bitch about it) and try another. But the 
average user is better able to boot with an option like "sched=cfs" (or 
sc, or nick, or ...) than to patch and build a kernel. So if we don't 
get easily switched schedulers people will not test nearly as well.


The best scheduler isn't the one 2% faster than the rest, it's the one 
with the fewest jackpot cases where it sucks. And if the mainline had 
multiple schedulers this testing would get done, authors would get more 
reports and have a better chance of fixing corner cases.


Note that we really need multiple schedulers to make people happy, 
because fairness is not the most desirable behavior on all machines, and 
adding knobs probably isn't the answer. I want a server to degrade 
gently, I want my desktop to show my movie and echo my typing, and if 
that's hard on compiles or the file transfer, so be it. Con doesn't want 
to compromise his goals, I agree but want to have an option if I don't 
share them.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

[PATCH 3/5] NFS: Fix the 'desynchronized value of nfs_i.ncommit' error

2007-04-20 Thread Trond Myklebust

From: Trond Myklebust <[EMAIL PROTECTED]>

Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().

Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/write.c |   47 ++-
 1 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8e94246..ce5b4a9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -38,7 +38,6 @@
 static struct nfs_page * nfs_update_request(struct nfs_open_context*,
struct page *,
unsigned int, unsigned int);
-static void nfs_mark_request_dirty(struct nfs_page *req);
 static long nfs_flush_mapping(struct address_space *mapping, struct writeback_control *wbc, int how);
 static const struct rpc_call_ops nfs_write_partial_ops;
 static const struct rpc_call_ops nfs_write_full_ops;
@@ -255,7 +254,8 @@ static void nfs_end_page_writeback(struct page *page)
 static int nfs_page_mark_flush(struct page *page)
 {
struct nfs_page *req;
-   spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
+   struct nfs_inode *nfsi = NFS_I(page->mapping->host);
+   spinlock_t *req_lock = &nfsi->req_lock;
int ret;
 
spin_lock(req_lock);
@@ -279,11 +279,23 @@ static int nfs_page_mark_flush(struct page *page)
return ret;
spin_lock(req_lock);
}
-   spin_unlock(req_lock);
+   if (test_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+   /* This request is marked for commit */
+   spin_unlock(req_lock);
+   nfs_unlock_request(req);
+   return 1;
+   }
if (nfs_set_page_writeback(page) == 0) {
nfs_list_remove_request(req);
-   nfs_mark_request_dirty(req);
-   }
+   /* add the request to the inode's dirty list. */
+   radix_tree_tag_set(&nfsi->nfs_page_tree,
+   req->wb_index, NFS_PAGE_TAG_DIRTY);
+   nfs_list_add_request(req, &nfsi->dirty);
+   nfsi->ndirty++;
+   spin_unlock(req_lock);
+   __mark_inode_dirty(page->mapping->host, I_DIRTY_PAGES);
+   } else
+   spin_unlock(req_lock);
ret = test_bit(PG_NEED_FLUSH, &req->wb_flags);
nfs_unlock_request(req);
return ret;
@@ -406,24 +418,6 @@ static void nfs_inode_remove_request(struct nfs_page *req)
nfs_release_request(req);
 }
 
-/*
- * Add a request to the inode's dirty list.
- */
-static void
-nfs_mark_request_dirty(struct nfs_page *req)
-{
-   struct inode *inode = req->wb_context->dentry->d_inode;
-   struct nfs_inode *nfsi = NFS_I(inode);
-
-   spin_lock(&nfsi->req_lock);
-   radix_tree_tag_set(&nfsi->nfs_page_tree,
-   req->wb_index, NFS_PAGE_TAG_DIRTY);
-   nfs_list_add_request(req, &nfsi->dirty);
-   nfsi->ndirty++;
-   spin_unlock(&nfsi->req_lock);
-   __mark_inode_dirty(inode, I_DIRTY_PAGES);
-}
-
 static void
 nfs_redirty_request(struct nfs_page *req)
 {
@@ -438,7 +432,7 @@ nfs_dirty_request(struct nfs_page *req)
 {
struct page *page = req->wb_page;
 
-   if (page == NULL)
+   if (page == NULL || test_bit(PG_NEED_COMMIT, &req->wb_flags))
return 0;
return !PageWriteback(req->wb_page);
 }
@@ -456,6 +450,7 @@ nfs_mark_request_commit(struct nfs_page *req)
spin_lock(&nfsi->req_lock);
nfs_list_add_request(req, &nfsi->commit);
nfsi->ncommit++;
+   set_bit(PG_NEED_COMMIT, &(req)->wb_flags);
spin_unlock(&nfsi->req_lock);
inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
@@ -470,7 +465,7 @@ int nfs_write_need_commit(struct nfs_write_data *data)
 static inline
 int nfs_reschedule_unstable_write(struct nfs_page *req)
 {
-   if (test_and_clear_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+   if (test_bit(PG_NEED_COMMIT, &req->wb_flags)) {
nfs_mark_request_commit(req);
return 1;
}
@@ -557,6 +552,7 @@ static void nfs_cancel_commit_list(struct list_head *head)
req = nfs_list_entry(head->next);
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
nfs_list_remove_request(req);
+   clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
nfs_inode_remove_request(req);
nfs_unlock_request(req);
}
@@ -1295,6 +1291,7 @@ static void nfs_commit_done(struct rpc_task *task, void 
*calldata)
while

Re: [PATCH 12/15] ide: make ide_hwif_t.ide_dma_host_on void

2007-04-20 Thread Sergei Shtylyov


Hello, once I wrote:


[PATCH] ide: make ide_hwif_t.ide_dma_host_on void


* since ide_hwif_t.ide_dma_host_on is called either when drive->using_dma == 1
  or when return value is discarded make it void, also drop "ide_" prefix
* make __ide_dma_host_on() void and drop "__" prefix


   BTW, it would also make sense to make hwif->ide_dma_timeout() and 
hwif->ide_dma_lostirq void too (and possibly drop the ide_ prefix). 
Their results are *explicitly* ignored.


  I've started preparing the patches and found out that aec62xx has completely 
bogus ide_dma_timeout() -- the same as ide_dma_lostirq() and it doesn't even 
call __ide_dma_timeout()... :-/
  Don't know whether to deal with this in a separate patch...

MBR, Sergei
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 5/5] RPC: Fix the TCP resend semantics for NFSv4

2007-04-20 Thread Trond Myklebust

From: Trond Myklebust <[EMAIL PROTECTED]>

Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
requests over TCP"

The assumption made in xprt_transmit() that the condition
"req->rq_bytes_sent == 0 and request is on the receive list"
should imply that we're dealing with a retransmission is false.
Firstly, it may simply happen that the socket send queue was full
at the time the request was initially sent through xprt_transmit().
Secondly, doing this for each request that was retransmitted implies
that we disconnect and reconnect for _every_ request that happened to
be retransmitted irrespective of whether or not a disconnection has
already occurred.

Fix is to move this logic into the call_status request timeout handler.
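
The false positive described above can be illustrated with a small userspace
sketch. This is a toy model, not the sunrpc code: struct toy_req and both
helper functions are invented for this example, and only the condition
"bytes sent == 0 and request is on the receive list" is taken from the
description. Garbage replies are left out of the model for brevity.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few rpc_rqst fields that matter here. */
struct toy_req {
	unsigned int bytes_sent;	/* plays the role of req->rq_bytes_sent */
	bool on_recv_list;		/* request is already on the receive list */
	bool timed_out;			/* set once the request has timed out */
};

/* Old heuristic: treat "on receive list, nothing sent" as a retransmission. */
static inline bool old_looks_like_retransmit(const struct toy_req *req)
{
	return req->on_recv_list && req->bytes_sent == 0;
}

/* New rule in this model: disconnect only when a timeout was observed. */
static inline bool new_should_disconnect(const struct toy_req *req, bool discrtry)
{
	return discrtry && req->timed_out;
}
```

A request whose first send was deferred because the socket send queue was
full has on_recv_list set and bytes_sent == 0, so the old heuristic fires
even though nothing was ever retransmitted; the timeout-based rule does not.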

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 net/sunrpc/clnt.c |    4 ++++
 net/sunrpc/xprt.c |   10 ----------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6d7221f..396cdbe 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1046,6 +1046,8 @@ call_status(struct rpc_task *task)
rpc_delay(task, 3*HZ);
case -ETIMEDOUT:
task->tk_action = call_timeout;
+   if (task->tk_client->cl_discrtry)
+   xprt_disconnect(task->tk_xprt);
break;
case -ECONNREFUSED:
case -ENOTCONN:
@@ -1169,6 +1171,8 @@ call_decode(struct rpc_task *task)
 out_retry:
req->rq_received = req->rq_private_buf.len = 0;
task->tk_status = 0;
+   if (task->tk_client->cl_discrtry)
+   xprt_disconnect(task->tk_xprt);
 }
 
 /*
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ee6ffa0..456a145 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -735,16 +735,6 @@ void xprt_transmit(struct rpc_task *task)
xprt_reset_majortimeo(req);
/* Turn off autodisconnect */
del_singleshot_timer_sync(&xprt->timer);
-   } else {
-   /* If all request bytes have been sent,
-* then we must be retransmitting this one */
-   if (!req->rq_bytes_sent) {
-   if (task->tk_client->cl_discrtry) {
-   xprt_disconnect(xprt);
-   task->tk_status = -ENOTCONN;
-   return;
-   }
-   }
}
} else if (!req->rq_bytes_sent)
return;

[PATCH 4/5] NFS: Fix race in nfs_set_page_dirty

2007-04-20 Thread Trond Myklebust

From: Trond Myklebust <[EMAIL PROTECTED]>

Protect nfs_set_page_dirty() against races with nfs_inode_add_request.
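
As a rough illustration of the locking pattern, here is a toy userspace model.
It is a sketch under simplifying assumptions, not the NFS code: struct
toy_page, toy_set_page_dirty() and toy_add_request() are hypothetical names, a
pthread mutex plays the role of req_lock, and plain booleans stand in for
PG_NEED_FLUSH and the page dirty bit. Unlike the real code, toy_add_request()
takes the lock itself.

```c
#include <pthread.h>
#include <stdbool.h>

struct toy_page {
	pthread_mutex_t lock;	/* plays the role of req_lock */
	bool has_req;		/* a write request is attached to the page */
	bool req_need_flush;	/* like PG_NEED_FLUSH on that request */
	bool dirty;		/* like the page's dirty flag */
};

/* Returns 1 if this call newly marked the request or the page, else 0. */
static inline int toy_set_page_dirty(struct toy_page *p)
{
	int ret;

	pthread_mutex_lock(&p->lock);
	if (p->has_req) {
		/* Lookup and flag update happen under one lock hold. */
		ret = !p->req_need_flush;
		p->req_need_flush = true;
	} else {
		ret = !p->dirty;
		p->dirty = true;
	}
	pthread_mutex_unlock(&p->lock);
	return ret;
}

/* A newly attached request inherits the dirty state of its page. */
static inline void toy_add_request(struct toy_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->has_req = true;
	if (p->dirty)
		p->req_need_flush = true;
	pthread_mutex_unlock(&p->lock);
}
```

Because both paths take the same lock, a page dirtied just before a request
is attached cannot lose its "needs flush" state in this model.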

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/write.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce5b4a9..7975589 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -388,6 +388,8 @@ static int nfs_inode_add_request(struct inode *inode, 
struct nfs_page *req)
}
SetPagePrivate(req->wb_page);
set_page_private(req->wb_page, (unsigned long)req);
+   if (PageDirty(req->wb_page))
+   set_bit(PG_NEED_FLUSH, &req->wb_flags);
nfsi->npages++;
atomic_inc(&req->wb_count);
return 0;
@@ -407,6 +409,8 @@ static void nfs_inode_remove_request(struct nfs_page *req)
set_page_private(req->wb_page, 0);
ClearPagePrivate(req->wb_page);
radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
+   if (test_and_clear_bit(PG_NEED_FLUSH, &req->wb_flags))
+   __set_page_dirty_nobuffers(req->wb_page);
nfsi->npages--;
if (!nfsi->npages) {
spin_unlock(&nfsi->req_lock);
@@ -1527,15 +1531,22 @@ int nfs_wb_page(struct inode *inode, struct page* page)
 
 int nfs_set_page_dirty(struct page *page)
 {
+   spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
struct nfs_page *req;
+   int ret;
 
-   req = nfs_page_find_request(page);
+   spin_lock(req_lock);
+   req = nfs_page_find_request_locked(page);
if (req != NULL) {
/* Mark any existing write requests for flushing */
-   set_bit(PG_NEED_FLUSH, &req->wb_flags);
+   ret = !test_and_set_bit(PG_NEED_FLUSH, &req->wb_flags);
+   spin_unlock(req_lock);
nfs_release_request(req);
+   return ret;
}
-   return __set_page_dirty_nobuffers(page);
+   ret = __set_page_dirty_nobuffers(page);
+   spin_unlock(req_lock);
+   return ret;
 }
 
 

Re: PCI bridge range sizing bug

2007-04-20 Thread Jesse Barnes

On Friday, April 20, 2007 11:28 am Linus Torvalds wrote:
> On Fri, 20 Apr 2007, Jesse Barnes wrote:
> > Sounds good, hopefully reassigning the bridge resources won't cause
> > too much trouble.  Do you have time to hack this up?  If not, I
> > could give it a try, as long as ajax is willing to test...
>
> Actually, I would suggest we not do it automatically (because the
> need for it is just so low, and the downsides are potentially huge -
> there are just too many resources that are "hidden" from us through
> ACPI tricks and having hardware that doesn't actually expose their
> PCI resources fully through the normal PCI resource setup).

Yeah, that's probably prudent.  OTOH we should probably let the user 
know in no uncertain terms that some of the stuff behind one of their 
bridges will be inaccessible.

Thanks,
Jesse

[PATCH 1/5] NFS: clean up the unstable write code

2007-04-20 Thread Trond Myklebust

From: Trond Myklebust <[EMAIL PROTECTED]>

Get rid of the inlined #ifdefs.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/write.c   |  117 --
 include/linux/nfs_page.h |   30 
 2 files changed, 71 insertions(+), 76 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ad2e91b..3ed4feb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -460,6 +460,43 @@ nfs_mark_request_commit(struct nfs_page *req)
inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 }
+
+static inline
+int nfs_write_need_commit(struct nfs_write_data *data)
+{
+   return data->verf.committed != NFS_FILE_SYNC;
+}
+
+static inline
+int nfs_reschedule_unstable_write(struct nfs_page *req)
+{
+   if (test_and_clear_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+   nfs_mark_request_commit(req);
+   return 1;
+   }
+   if (test_and_clear_bit(PG_NEED_RESCHED, &req->wb_flags)) {
+   nfs_redirty_request(req);
+   return 1;
+   }
+   return 0;
+}
+#else
+static inline void
+nfs_mark_request_commit(struct nfs_page *req)
+{
+}
+
+static inline
+int nfs_write_need_commit(struct nfs_write_data *data)
+{
+   return 0;
+}
+
+static inline
+int nfs_reschedule_unstable_write(struct nfs_page *req)
+{
+   return 0;
+}
 #endif
 
 /*
@@ -746,26 +783,12 @@ int nfs_updatepage(struct file *file, struct page *page,
 
 static void nfs_writepage_release(struct nfs_page *req)
 {
-   nfs_end_page_writeback(req->wb_page);
 
-#if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-   if (!PageError(req->wb_page)) {
-   if (NFS_NEED_RESCHED(req)) {
-   nfs_redirty_request(req);
-   goto out;
-   } else if (NFS_NEED_COMMIT(req)) {
-   nfs_mark_request_commit(req);
-   goto out;
-   }
-   }
-   nfs_inode_remove_request(req);
-
-out:
-   nfs_clear_commit(req);
-   nfs_clear_reschedule(req);
-#else
-   nfs_inode_remove_request(req);
-#endif
+   if (PageError(req->wb_page) || !nfs_reschedule_unstable_write(req)) {
+   nfs_end_page_writeback(req->wb_page);
+   nfs_inode_remove_request(req);
+   } else
+   nfs_end_page_writeback(req->wb_page);
nfs_clear_page_writeback(req);
 }
 
@@ -1008,22 +1031,28 @@ static void nfs_writeback_done_partial(struct rpc_task 
*task, void *calldata)
nfs_set_pageerror(page);
req->wb_context->error = task->tk_status;
dprintk(", error = %d\n", task->tk_status);
-   } else {
-#if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-   if (data->verf.committed < NFS_FILE_SYNC) {
-   if (!NFS_NEED_COMMIT(req)) {
-   nfs_defer_commit(req);
-   memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
-   dprintk(" defer commit\n");
-   } else if (memcmp(&req->wb_verf, &data->verf, sizeof(req->wb_verf))) {
-   nfs_defer_reschedule(req);
-   dprintk(" server reboot detected\n");
-   }
-   } else
-#endif
-   dprintk(" OK\n");
+   goto out;
}
 
+   if (nfs_write_need_commit(data)) {
+   spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
+
+   spin_lock(req_lock);
+   if (test_bit(PG_NEED_RESCHED, &req->wb_flags)) {
+   /* Do nothing we need to resend the writes */
+   } else if (!test_and_set_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+   memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
+   dprintk(" defer commit\n");
+   } else if (memcmp(&req->wb_verf, &data->verf, sizeof(req->wb_verf))) {
+   set_bit(PG_NEED_RESCHED, &req->wb_flags);
+   clear_bit(PG_NEED_COMMIT, &req->wb_flags);
+   dprintk(" server reboot detected\n");
+   }
+   spin_unlock(req_lock);
+   } else
+   dprintk(" OK\n");
+
+out:
if (atomic_dec_and_test(&req->wb_complete))
nfs_writepage_release(req);
 }
@@ -1064,25 +1093,21 @@ static void nfs_writeback_done_full(struct rpc_task 
*task, void *calldata)
if (task->tk_status < 0) {
nfs_set_pageerror(page);
req->wb_context->error = task->tk_status;
-   nfs_end_page_writeback(page);
-   nfs_inode_remove_request(req);
dprintk(", error = %d\n", task->tk_status);
-   goto next;
+   goto remove_request;
}
-   nfs_end_page_writeback(page);
 
-#if

[PATCH 2/5] NFS: Don't clear PG_writeback until after we've processed unstable writes

2007-04-20 Thread Trond Myklebust

From: Trond Myklebust <[EMAIL PROTECTED]>

Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/write.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 3ed4feb..8e94246 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -920,8 +920,8 @@ out_bad:
list_del(&data->pages);
nfs_writedata_release(data);
}
-   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_clear_page_writeback(req);
return -ENOMEM;
 }
@@ -966,8 +966,8 @@ static int nfs_flush_one(struct inode *inode, struct 
list_head *head, int how)
while (!list_empty(head)) {
struct nfs_page *req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
-   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_clear_page_writeback(req);
}
return -ENOMEM;
@@ -1002,8 +1002,8 @@ out_err:
while (!list_empty(head)) {
req = nfs_list_entry(head->next);
nfs_list_remove_request(req);
-   nfs_end_page_writeback(req->wb_page);
nfs_redirty_request(req);
+   nfs_end_page_writeback(req->wb_page);
nfs_clear_page_writeback(req);
}
return error;

[PATCH 0/5] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Trond Myklebust

I've split the issues introduced by the 2.6.21-rcX write code up into 4
subproblems.

The first patch is just a cleanup in order to ease review.

Patch number 2 ensures that we never release the PG_writeback flag until
_after_ we've either discarded the unstable request altogether, or put it
on the nfs_inode's commit or dirty lists.

Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
may be redirtied.

Patch number 4 protects the NFS '.set_page_dirty' address_space operation
against races with nfs_inode_add_request.

Finally, patch number 5 fixes an issue with the RPC code that is supposed
to ensure that NFSv4 disconnects before resending a request. The current code
will disconnect for every request it resends (and has a bunch of false
positive cases), instead of just ensuring that it disconnects once every
time a timeout or a garbage reply occurs.

My thanks to the various patient victim^Wpeople who helped with extensive
testing.

Cheers
  Trond

Re: qla2xxx hba crashes with older 2310 cards

2007-04-20 Thread James Bottomley

On Fri, 2007-04-20 at 13:24 -0700, David Miller wrote:
> From: Robert Peterson <[EMAIL PROTECTED]>
> Date: Fri, 20 Apr 2007 10:40:30 -0500
> 
> > I've seen some chatter about the qla2xxx driver but not paid attention, so
> > I'm sorry if this is a known issue.  I've got an older qlogic hba, and 
> > recent
> > drivers don't seem to play nice with it.  I've got the latest firmware from
> > qlogic's web site.  I'm using a 2.6.21-rc6 kernel from Steve Whitehouse's
> > -nmw git tree.  Reverting to an older driver (but same kernel) and it works.
> > The current driver gives this:
> 
> Yes, known problem, I'm sorry these guys broke the driver for
> you as well, please see this thread:
> 
>   http://marc.info/?l=linux-kernel&m=117671067701124&w=2
> 
> This was really a stupid change to make.

OK,OK, we heard you the first time ... the maintainers will try to fix
this in a manner acceptable to all concerned ... could we try to cool
down the public traductions now?

James



Re: AGPGart / AMD K7

2007-04-20 Thread Dave Jones

On Fri, Apr 20, 2007 at 04:22:06PM -0400, Preston A. Elder wrote:
 > Dave, Greg,
 > 
 > Here is the trace with 2.6.20.6
 > 
 > I added back in my trace code, as you see.  As you can also see,
 > agp_amdk7_probe is still not called.

Try looking down in __driver_attach()
The fact that we're not calling the ->probe function is quite bizarre.

It could be this in __driver_attach

if (!dev->driver)
driver_probe_device(drv, dev);

Though that'd be odd.

Putting a #define DEBUG 1 in drivers/base/dd.c may also yield some clues.
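
The effect of that check can be modelled in a few lines of userspace C. This
is a toy sketch, not the code in drivers/base/dd.c: the struct layouts,
toy_probe_device(), toy_driver_attach() and the probe_calls counter are
invented for illustration, and only the "if (!dev->driver)" test mirrors the
snippet above.

```c
#include <stddef.h>

struct toy_driver {
	const char *name;
	int probe_calls;		/* how often ->probe would have run */
};

struct toy_device {
	struct toy_driver *driver;	/* non-NULL once a driver has bound */
};

/* Simplified stand-in for driver_probe_device(): always binds and counts. */
static inline void toy_probe_device(struct toy_driver *drv, struct toy_device *dev)
{
	drv->probe_calls++;
	dev->driver = drv;
}

/* Mirrors the quoted check: a device that already has a driver is skipped. */
static inline void toy_driver_attach(struct toy_driver *drv, struct toy_device *dev)
{
	if (!dev->driver)
		toy_probe_device(drv, dev);
}
```

In this model, a device that was claimed earlier by some other driver is
skipped silently: registration still reports success, but the second
driver's probe never runs, which would be consistent with the trace.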

Dave

-- 
http://www.codemonkey.org.uk

Re: PCI bridge range sizing bug

2007-04-20 Thread Ivan Kokshaysky

On Fri, Apr 20, 2007 at 11:28:42AM -0700, Linus Torvalds wrote:
> Actually, I would suggest we not do it automatically (because the need for 
> it is just so low, and the downsides are potentially huge - there are just 
> too many resources that are "hidden" from us through ACPI tricks and 
> having hardware that doesn't actually expose their PCI resources fully 
> through the normal PCI resource setup).

Definitely. I was intending to enable that *only* with some boot option.

> Ivan, want to add some way to force that allocation (something like 
> "pci=assign-bus-resources")

Yes, hopefully I'll have something in the next couple of days.

Ivan.

Re: qla2xxx hba crashes with older 2310 cards

2007-04-20 Thread David Miller

From: Robert Peterson <[EMAIL PROTECTED]>
Date: Fri, 20 Apr 2007 10:40:30 -0500

> I've seen some chatter about the qla2xxx driver but not paid attention, so
> I'm sorry if this is a known issue.  I've got an older qlogic hba, and recent
> drivers don't seem to play nice with it.  I've got the latest firmware from
> qlogic's web site.  I'm using a 2.6.21-rc6 kernel from Steve Whitehouse's
> -nmw git tree.  Reverting to an older driver (but same kernel) and it works.
> The current driver gives this:

Yes, known problem, I'm sorry these guys broke the driver for
you as well, please see this thread:

http://marc.info/?l=linux-kernel&m=117671067701124&w=2

This was really a stupid change to make.

Re: AGPGart / AMD K7

2007-04-20 Thread Preston A. Elder

Dave, Greg,

Here is the trace with 2.6.20.6

I added back in my trace code, as you see.  As you can also see,
agp_amdk7_probe is still not called.

Linux agpgart interface v0.101 (c) Dave Jones
agp_amdk7_init: In function
agp_amdk7_init: Before pci_register_driver
__pci_register_driver: In Function (driver = agpgart-amdk7, multithread = 0)
__pci_register_driver: Before Spinlock
__pci_register_driver: Before Init List Head
__pci_register_driver: Before driver_register
bus_add_driver: In Function (c048e920)
bus_add_driver: Before kobject_set_name
bus_add_driver: error = 0
bus_add_driver: Before kobject_register
bus_add_driver: error = 0
bus_add_driver: Before driver_attach
bus_add_driver: error = 0
bus_add_driver: Before klist_add_tail
bus_add_driver: Before module_add_driver
bus_add_driver: Before driver_add_attrs
bus_add_driver: error = 0
bus_add_driver: Before add_bind_files
bus_add_driver: error = 0
bus_add_driver: Returning 0
__pci_register_driver: error = 0
__pci_register_driver: Before pci_create_newid_file
__pci_register_driver: error = 0
__pci_register_driver: Returning 0

Even when I start X (using the fglrx driver) I still do not see the
probe function being called.

Everything looks successful, too :(

I will try with 2.6.21rc7, but I don't hold out too much hope.

PreZ

Dave Jones wrote:
> On Fri, Apr 20, 2007 at 02:31:01PM -0400, Preston A. Elder wrote:
>
>  > Here is the code for __pci_register_driver:
>  > ...
>  > 
>  > So in the above case, we ARE saying if driver_register returns 0 then
>  > pci_create_newid_file.
>  > 
>  > Is it different to the code you have?  As I said, this IS 2.6.19.
>
> Yes, .20 changed this in this way..
>
> @@ -445,9 +442,12 @@ int __pci_register_driver(struct pci_driver *drv, struct 
> module *owner)
>  
> /* register with core */
> error = driver_register(&drv->driver);
> +   if (error)
> +   return error;
>  
> -   if (!error)
> -   error = pci_create_newid_file(drv);
> +   error = pci_create_newid_file(drv);
> +   if (error)
> +   driver_unregister(&drv->driver);
>  
> return error;
>  }
>
>
> Retry your tracing with .20 (or better yet, .21rc7/todays git)
>
>   Dave
>
>   


Re: [PATCH] cciss: Fix warnings during compilation under 32bit environment

2007-04-20 Thread James Bottomley

On Fri, 2007-04-20 at 12:30 -0700, Andrew Morton wrote:
> On Fri, 20 Apr 2007 14:50:06 -0400
> James Bottomley <[EMAIL PROTECTED]> wrote:
> 
> > > CONFIG_LBD=y gives us an additional 3kb of instructions on i386
> > > allnoconfig.  Other architectures might do less well.  It's not a huge
> > > difference, but that's the way in which creeping bloatiness happens.
> > 
> > OK, sure, but if we really care about this saving, then unconditionally
> > casting to u64 is therefore wrong as well ... this is starting to open
> > quite a large can of worms ...
> > 
> > For the record, if we have to do this, I fancy sector_upper_32() ... we
> > should already have some similar accessor for dma_addr_t as well.
> 
> hm.  How about this?
> 
> --- a/include/linux/kernel.h~upper-32-bits
> +++ a/include/linux/kernel.h
> @@ -40,6 +40,17 @@ extern const char linux_proc_banner[];
>  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>  #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
>  
> +/**
> + * upper_32_bits - return bits 32-63 of a number
> + * @n: the number we're accessing
> + *
> + * A basic shift-right of a 64- or 32-bit quantity.  Use this to suppress
> + * the "right shift count >= width of type" warning when that quantity is
> + * 32-bits.
> + */
> +#define upper_32_bits(n) (((u64)(n)) >> 32)

Won't this have the unwanted side effect of promoting everything in a
calculation to long long on 32 bit platforms, even if n was only 32
bits?

> +
> +
>  #define  KERN_EMERG  "<0>"   /* system is unusable */
>  #define  KERN_ALERT  "<1>"   /* action must be taken immediately */
>  #define  KERN_CRIT   "<2>"   /* critical conditions */
> _
> 
> It seems to generate the desired code.  I avoided Alan's ((n >> 31) >> 1)
> trick because it'll generate peculiar results with signed 64-bit
> quantities.

I've seen the trick done similarly with ((n >> 16) >> 16) which
shouldn't have the issue.
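
For comparison, the two forms discussed above can be tried in a small
userspace sketch. The u32/u64 typedefs stand in for the kernel types, and
upper_32_bits_2x16 plus the four helper functions are hypothetical names used
only in this example. The macro with the u64 cast yields a 64-bit value, so
the helpers cast the result back to u32 at the point of use.

```c
#include <stdint.h>

typedef uint32_t u32;	/* userspace stand-ins for the kernel types */
typedef uint64_t u64;

/* Form from the patch above: widen to 64 bits first, then shift by 32. */
#define upper_32_bits(n) (((u64)(n)) >> 32)

/* Double 16-bit shift: each shift count stays below the width of a
 * 32-bit operand (hypothetical name, for illustration only). */
#define upper_32_bits_2x16(n) (((n) >> 16) >> 16)

/* A 32-bit quantity has no upper half, so the result is 0. */
static inline u32 hi_of_u32(u32 sector)
{
	return (u32)upper_32_bits(sector);
}

static inline u32 hi_of_u64(u64 sector)
{
	return (u32)upper_32_bits(sector);
}

static inline u32 hi_of_u32_2x16(u32 sector)
{
	return (u32)upper_32_bits_2x16(sector);
}

static inline u32 hi_of_u64_2x16(u64 sector)
{
	return (u32)upper_32_bits_2x16(sector);
}
```

For unsigned inputs both forms return the same value; they differ only in the
type of the intermediate result, which is the promotion concern raised above.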

James


